Celebrating 60 Years of the Cambridge Structural Database

Image of Olga Kennard

“The beginnings of the Cambridge Structural Database (CSD) go back to 1965, when J.D. Bernal and Olga Kennard had a vision that the collective use of data would lead to the discovery of new knowledge. 60 years later, the CSD is the world’s largest database of experimentally observed small molecule crystal structures.

Today, the CSD contains more than 1.3 million organic and metal-organic experimental crystal structures, contributed by over half a million authors from more than 110 countries. Data is shared via scientific articles, patents, theses, institutional repositories, and directly through CSD Communications. Each entry is curated and enriched by editors at the Cambridge Crystallographic Data Centre (CCDC), the non-profit organisation that curates the CSD.”

Read the full text: RSC CICAG Newsletter Winter 2025-26, pages 64-66. (PDF)

CSD Growth by Year

Cambridge Structural Database Growth by year

Access Structures is a free service from the CCDC and FIZ Karlsruhe for viewing and retrieving structures from the Cambridge Structural Database and the Inorganic Crystal Structure Database.

Milestone: >20,000 Structures Deposited to the PDB in 2025

“The Worldwide Protein Data Bank (wwPDB) is proud to announce that it surpassed the milestone of 20,000 new structure depositions this year. This achievement reflects the impact of rapid advances in cryo-electron microscopy, high-throughput crystallography, enhanced computational and AI-driven modeling tools, and a long-standing commitment to open data sharing.

This milestone was supported by the expertise of wwPDB biocurators, who carefully review, validate, and enhance each structure with value-added annotations, ensuring entries are accurate, discoverable, and useful to researchers across disciplines.

We extend our sincere thanks to the global scientific community for their continued support and invaluable contributions. We look forward to reaching new milestones together for the advancement of science worldwide.”

Source: wwPDB 2025 News (12-16-2025)

InChI Update and Request for Interesting Molecules

Contribution from Prof. Jonathan Goodman, University of Cambridge, et al.

“The InChI developers are keen to have feedback on the extension of the InChI so it can become an even more effective identifier for organometallic and molecular inorganic structures.

Please try out the latest version. Remember to switch the version to “Latest with Molecular Inorganics” to experiment with the new features. This will generate a non-standard InChI which is being tested for all molecules including those with bonds to metal atoms.

If you find something that looks wrong, or that you particularly like, please let us know. Please send the InChI, the molfile of the structure (use the Save as… button on the web demo to generate a molfile from a sketch of a molecule) and the reason for your interest.”

All feedback welcome! Please send comments to: test@inchi-trust.org

Read the full text: RSC CICAG Newsletter Winter 2025-26, pages 14-15. (PDF)

Cheminformatics: A Digital History – Part 7. A Personal Recollection of 35 Years in the Chemical Information Trenches by Phil McHale

“In 1963 PM Harold Wilson warned that for the UK to prosper, it would need to be remade in the “white heat” of a “scientific revolution”. I was a product of this scientific revolution, earning my DPhil in synthetic organic chemistry in 1972, along with a huge cohort of other white heat-forged science PhDs.

In the 1970s the UK pharmaceutical industry – Glaxo, Wellcome, Boots, Imperial Chemical Industries (ICI), May & Baker, Fisons, Allen & Hanburys, Smith, Kline & French, Beecham; plus UK subsidiaries: Pfizer, Roche, Merck/Merck, Sharpe & Dohme, Eli Lilly, Bayer, Ciba-Geigy – organised the traditional “milk round” of job interviews, but none of these landed me a job as a research chemist.

So as an alternative, I became an assistant editor at the erstwhile Chemical Society in London. The Society itself was housed in prestigious Burlington House on Piccadilly, but its editorial staff were crammed into poky offices on the upper floors of an equally prestigious tailor on Savile Row. I don’t think the tailor appreciated the gaggle of scruffy assistant editors traipsing in through the main door every day.

The basic job requirements were an understanding of PhD level organic chemistry (I worked on J.C.S. Perkin I) and a good grasp of written English; everything else (editing, house style, printers’ marks, selecting referees, etc.) was taught on the job. And this was 53 years ago, so before personal computers, word processors, widespread email, and chemical drawing programs.

My editorial stint taught me IUPAC nomenclature, rigorous English grammar and spelling, manuscript mark up for the printer, and how to use edge-punched cards. Some of these skills have served me well (and lasted longer) while others infected me with long-term GPS (Grammar Pedantry syndrome). I also took an extramural interest in developments in computer handling of chemical structures and attended Wiswesser Line Notation (WLN) courses organised by the Chemical Notation Association (UK).”

Read the full text: CICAG Distillate Winter 2025-26, 9-13. (PDF)

Professor Michael Felix Lynch

February 21, 1932 – November 15, 2024

One of the pioneers of chemical information, Michael (Mike) Felix Lynch, has died at the age of 92. Mike obtained B.Sc. and Ph.D. degrees in chemistry at University College, Dublin before conducting postdoctoral research at ETH Zürich with Vladimir Prelog, the recipient of the 1975 Nobel Prize in Chemistry. He then worked for two years with Ciba-Geigy in Cambridge, U.K., before moving to the United States in 1961 to work for Chemical Abstracts Service (CAS). At this time, CAS was conducting some of the first research anywhere on the use of computers to build searchable databases of both journal abstracts and of the structures of chemical compounds, and Mike directed much of this work when he became Head of the Basic Research Department [1,2].

On returning to the United Kingdom in 1965, he was awarded a research grant by the Office of Scientific and Technical Information to conduct studies of the automatic indexing of textual documents. He joined the University of Sheffield’s Postgraduate School of Librarianship (as it was then named, now the Information School) to conduct this research, and spent the rest of his career there until his retirement in 1997. Mike can truly be regarded as the founding father of information science research at Sheffield: starting as a Senior Research Fellow, he joined the permanent staff in 1968, and in 1974 he was appointed to the personal chair which made him the first professor of information science in the United Kingdom.

Mike’s initial research in Sheffield involved the development of methods for the automated production of articulated subject indexes such as those used in the CAS subject indexes [3]. This work was followed by initial studies of techniques for the indexing, storage, and retrieval of chemical reaction information. These studies, which focused on the identification of the reaction center involved in a transformation, continued for over a decade before the development of graph-matching techniques that, after extensive subsequent refinement, underlie modern reaction retrieval systems [4,5]. While this work was under way he started to develop and test systematic approaches for the selection of fragment substructures [6] for screening chemical substructure searches, and was also responsible for probably the first book anywhere on the computer handling of chemical structures [7]. In addition to their use in chemical information systems, the selection techniques developed for screening subsequently found application in several types of textual processing, for example, for compression, for sorting, and for searching online catalogs [8].

Much of the second half of Mike’s Sheffield career, from 1980 onwards, was devoted to the development of techniques for the representation, storage, and retrieval of the generic (Markush) structures that characterize many chemical patents. This work extended over some 15 years and was carried out in collaboration with CAS, Derwent, and International Documentation for Chemistry (IDC) [9,10]. Other areas of interest in this period included the use of parallel computer architectures for database searching [11] and the application of natural language processing methods to the textual components of chemical patents [12].

When Mike arrived in Sheffield in the mid-1960s, there was very little computer-related research being undertaken in library schools in the United Kingdom. His work established a tradition that has enabled the Sheffield school to play a leading role over the years in the development not just of chemical information but also of information science more generally, with several of the researchers that he supervised subsequently following academic careers in Sheffield.

During his career he received many awards, perhaps most notably the 1989 Herman Skolnik Award of the ACS Division of Chemical Information for his “pioneering research of more than two decades on the development of methods for the storage, manipulation, and retrieval of chemical structures and reactions as well as related bibliographic information”. A further mark of recognition was the 2002 creation by the Chemical Structure Association Trust of an award in his name to recognize and encourage “outstanding accomplishments in education, research, and development activities that are related to the systems and methods used to store, process, and retrieve information about chemical structures, reactions, and properties”.

Mike had a huge impact not just on our own careers, but upon the whole profession, yet he will be remembered not only for his outstanding professional achievements, but also for his remarkable personal qualities. He was a practicing Roman Catholic and a devoted family man. He loved classical chamber music. He was kind and gently spoken. To journal editors who needed papers reviewing, Mike was known as a real “softie”: he looked for the value in everyone’s work and was never excessively critical. These qualities are well illustrated by the comments of one of his doctoral students, John Barnard: “Mike was generous in the independence he gave his students, without in any way stinting in the support he offered. When he was invited to talk about the Sheffield Generic Chemical Structures project at a conference, just six months after its inception, he immediately suggested that my fellow student on the project, Steve Welford, and I should do the talking instead. That immediately projected Steve and me into the center of the international chemical information community and provided us with contacts that directly benefited our subsequent careers. Mike involved us in all aspects of the development of the project, including discussions with potential funders, recruitment of additional staff, and as shareholders in its commercial exploitation. Mike’s easygoing and gentle Irish character made him an ideal companion on social occasions, whether in the bar at conferences, for pub lunches, or at the informal dinners at his home so often given for scientists visiting our project. I literally owe my entire career in chemical information systems to Mike, and it was a privilege and a pleasure to have been one of his students”.

Michael’s first wife, Mary, died of a heart condition in 1993. In 1995, he married Mary Dykstra, a professor at Dalhousie University in Halifax, Nova Scotia, Canada, who had studied in Michael’s department at the University of Sheffield. They had a long and happy retirement, traveling extensively and living in both Sheffield and Nova Scotia. Mary survives him. Our sympathies also go to Catherine and Kevin, his children by his first wife; his stepsons, Mark and Jeffery Dykstra; his eight grandchildren; and other members of his extended family worldwide.

References

  1. Dyson, G. M.; Lynch, M. F. Chemical-biological activities: a computer-produced express digest. J. Chem. Doc. 1963, 3, 81-85. https://doi.org/10.1021/c160009a011
  2. Cossum, W. E.; Krakiwsky, M. L.; Lynch, M. F. Advances in automatic chemical substructure searching techniques. J. Chem. Doc. 1965, 5 (1), 33-35. https://doi.org/10.1021/c160016a006
  3. Armitage, J. E.; Lynch, M. F. Articulation in the generation of subject indexes by computer. J. Chem. Doc. 1967, 7 (3), 170-178. https://doi.org/10.1021/c160026a010
  4. Armitage, J. E.; Lynch, M. F. Automatic detection of structural similarities among chemical compounds. J. Chem. Soc. C 1967, 521-528. https://doi.org/10.1039/j39670000521
  5. Lynch, M. F.; Willett, P. The automatic detection of chemical reaction sites. J. Chem. Inf. Comput. Sci. 1978, 18 (3), 154-159. https://doi.org/10.1021/ci60015a009
  6. Adamson, G. W.; Cowell, J.; Lynch, M. F.; McLure, A. H. W.; Town, W. G.; Yapp, A. M. Strategic considerations in the design of a screening system for substructure of chemical structure files. J. Chem. Doc. 1973, 13 (3), 153-157. https://doi.org/10.1021/c160050a013
  7. Lynch, M.; Harrison, J. M.; Town, W. G.; Ash, J. Computer Handling of Chemical Structure Information, Macdonald, London and American Elsevier, New York, 1971. http://openlibrary.org/books/OL4770313M/Computer_handling_of_chemical_structure_information
  8. Lynch, M. F. Variety generation—a reinterpretation of Shannon’s mathematical theory of communication, and its implications for information science. J. Am. Soc. Inf. Sci. 1977, 28 (1), 19-25. https://doi.org/10.1002/asi.4630280104
  9. Lynch, M. F.; Barnard, J. M.; Welford, S. M. Computer storage and retrieval of generic chemical structures in patents. 1. Introduction and general strategy. J. Chem. Inf. Comput. Sci. 1981, 21 (3), 148-150. https://doi.org/10.1021/ci00031a009
  10. Lynch, M. F.; Holliday, J. D. The Sheffield generic structures project – a retrospective review. J. Chem. Inf. Comput. Sci. 1996, 36 (5), 930-936. https://doi.org/10.1021/ci950173l
  11. Brint, A. T.; Gillet, V. J.; Lynch, M. F.; Willett, P.; Manson, G. A.; Wilson, G. A. Chemical graph matching using transputer networks. Parallel Comput. 1988, 8 (1), 295-300. https://doi.org/10.1016/0167-8191(88)90133-0
  12. Lawson, M.; Kemp, N.; Lynch, M. F.; Chowdhury, G. G. Automatic extraction of citations from the text of English-language patents – an example of template mining. J. Inf. Sci. 1996, 22 (6), 423-436. https://doi.org/10.1177/016555159602200604

Wendy Warr and Peter Willett

January 18, 2025

ChemTalks: A New (Free) Meeting

Submitted by Dr. Wendy A. Warr

The first ever ChemTalks is a step up from Chemaxon’s user group meetings to a fully-fledged conference in Basel, Switzerland on 25th September 2024. This free, one-day live event will bring you insights from renowned industry experts on using technology to bridge silos in early-stage drug discovery. It will also provide you with a quick look at Chemaxon’s product updates, and make sure you have lots of networking opportunities. The meeting will be followed by a free reception. The speakers are:

  • Karl-Heinz Baringhaus, Site Director R&D Frankfurt, Sanofi
  • Nessa Carson, Associate Principal Scientist, Digital Champion, AstraZeneca
  • Josef Eiblmaier, Head of Research, Discovery and Preclinical, PharmaLex, a Cencora company
  • Peter Ertl, Formerly Director of Cheminformatics, Biomedical Research, Novartis
  • Jeremy Frey, Head of Computational Systems Chemistry, University of Southampton
  • Thrasyvoulos Karydis, Co-founder, Chief Technology Officer, DeepCure
  • Jessica Lanini, Biomedical Research, Novartis
  • Timur Madzhidov, Senior Product Manager in Chemistry Innovation, Elsevier
  • Adrian Stevens, Chief Product Officer, Chemaxon
  • Becky Upton, President, Pistoia Alliance

There is no charge for attendance, but registration is required: https://chemaxon.com/chemtalks-2024

InChI Version 1.07 Approved

IUPAC InChI moves to GitHub to support sustainable chemical standards development

“The InChI (International Chemical Identifier) is a widely used standard chemical identifier that enables the connection and interoperability of chemistry data across the web. The core code and development framework of the InChI has now been migrated to GitHub, providing a foundation to support future extensions of the standard and associated applications, and to broaden the expertise supporting the standard. The first milestone of this work is the 1.07 version, recently approved by IUPAC (International Union for Pure and Applied Chemistry) and the InChI Trust and available for download at GitHub.

The InChI has been used to identify unique chemical structures for over 20 years and is implemented in chemistry toolkits and in databases across the world. The work to maintain and develop the software rested on the shoulders of one expert, Igor Pletnev, and following his sad death in 2021, it became urgent to move the code to a more open model. Completion of this move with a new fully tested and approved release makes the development of this digital standard more transparent. It will also enable the extension of the InChI to further classes of chemical compounds, in particular inorganic and organo-metallic compounds, as well as future modernisation of the codebase.”  

Read more: InChI Trust News Release, July 16, 2024.

“For over 20 years, InChI (International Chemical Identifier) has uniquely identified chemical structures in databases, journal articles, and supplier catalogues, across the web. It helps quickly determine if a chemical compound in a library is present in an internal repository or match data about chemical compounds across different databases. Reliable standards like InChI are essential for the FAIR principles (Findable, Accessible, Interoperable, Re-usable) in chemical data exchange.”  

Read more: Fit for Future: InChI Standard Moves to GitHub. ChemistryViews, July 28, 2024.
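
The cross-database matching described in the excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two record lists, their field names, and the `match_by_inchikey` helper are hypothetical, and the InChIKeys are entered by hand rather than generated by the InChI software. Because an InChIKey is a hash of the full InChI, a production workflow would probably also compare the full InChI strings before treating two records as the same compound.

```python
# Hypothetical records from two independent sources. Local identifiers and
# names differ, but each record carries the standard InChIKey of its compound.
supplier_catalogue = [
    {"catalogue_no": "SUP-001", "name": "ethanol", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
    {"catalogue_no": "SUP-002", "name": "benzene", "inchikey": "UHOVQNZJYSORNB-UHFFFAOYSA-N"},
    {"catalogue_no": "SUP-003", "name": "caffeine", "inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N"},
]

internal_repository = [
    {"registry_id": "REG-0007", "name": "water", "inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},
    {"registry_id": "REG-0042", "name": "EtOH", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
    {"registry_id": "REG-0099", "name": "1,3,7-trimethylxanthine", "inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N"},
]


def match_by_inchikey(left, right):
    """Return (left_record, right_record) pairs that share an InChIKey."""
    # Index the right-hand records by InChIKey for constant-time lookup.
    index = {record["inchikey"]: record for record in right}
    return [
        (record, index[record["inchikey"]])
        for record in left
        if record["inchikey"] in index
    ]


matches = match_by_inchikey(supplier_catalogue, internal_repository)
for supplier_record, internal_record in matches:
    print(
        f'{supplier_record["catalogue_no"]} ({supplier_record["name"]}) matches '
        f'{internal_record["registry_id"]} ({internal_record["name"]})'
    )
```

The point of the sketch is that the records are joined on the identifier alone, so differences in naming ("ethanol" versus "EtOH") do not prevent a match.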

Summer 2024 Newsletter for RSC CICAG Published

The Chemical Information and Computer Applications Group (CICAG), an RSC interest group, aims to keep its members abreast of the latest activities, services and developments in all aspects of chemical information, from generation through to archiving, and in the computer applications used in this rapidly changing area, through meetings, newsletters and professional networking.

RSC CICAG publishes a quarterly Newsletter to keep members in touch with the Group’s activities; it includes articles, reviews of interest, news and events. The Summer 2024 issue of the newsletter contains 97 pages. Dr. Helen Cooke FRSC is the Newsletter Editor, and she does an excellent job! Selected articles of potential interest include:

  • Cheminformatics: A Digital History – Part 5. Cheminformatics at Indiana University by Dr. Gary D. Wiggins
  • 100 Years of Markush by Dr Anne Jones and Stuart Newbold
  • Molecule Normalisation with InChI and SMILES Processing by Prof. Jonathan M. Goodman, Prof. Yusuf Hamied, and Vincent F. Scalfani
  • Dr George W.A. Milne (‘Bill’ Milne), 1 May 1937 – 22 November 2023, by Dr. Wendy Warr

Note: Prof. Jonathan M. Goodman, Vincent F. Scalfani, and Dr. Wendy Warr are CSA Trustees.

Chemical Information and Computation 2023, Number Two. Fall 2023 ACS National Meeting in San Francisco

I am excited to announce the upcoming issue in my renowned report series. My reports are essential reading if you want to keep up to date with recent developments in cheminformatics and computational chemistry. As usual, I have transcribed technical presentations in detail, and have added value by including web links, expanding abbreviations, correcting errors, and carefully checking the literature references. License the current report and you will benefit from reading about symposia with these themes:

  • machine learning and AI in chemistry
  • AI and predictive analytics for chemical reactions
  • algorithm development and data analysis in chemical space
  • helping chemists manage their data
  • cheminformatics: toward democratization and open science
  • enhance your data: smart ways to metadata and knowledge graphs.

The report starts with news of recent developments at 50 organizations in the computational chemistry, cheminformatics, chemical information, and publishing markets, plus news about people, awards, and more. Posters and more material appear in appendices. Waste no time. Order now for delivery at the beginning of March 2024.

The 50% discount for academia and government continues, and within large organizations, academic or industrial, licensees can share the report with hundreds, or even thousands of colleagues for little more than twice the basic price.

The contents list is at https://www.warr.com/morepubs.html.

Order forms are at https://www.warr.com.

Dr. Wendy A. Warr

Report by Wendy Warr on Herman Skolnik Award Symposium 2023 Honoring Patrick Walters is Freely Available

At the ACS National Fall Meeting for 2023, Patrick Walters received the Herman Skolnik Award for his contributions to the fields of chemical information and cheminformatics applied to computer-aided drug discovery research. Wendy Warr (wendy@warr.com) wrote a report on the talks given at this award symposium for the ACS CINF Chemical Information Bulletin, where it is scheduled to appear in March 2024. This report is also freely available from her website.

Learn more:

Oral presentations at this award symposium

  • New ways of working. How virtual compound generators and retrosynthesis prediction are changing the way we work. Speaker: Clara Christ of Bayer, Germany (with co-authors Yannic Alber, Hans Briem, Michael Hahn, Gary Hermann, Florian Koelling, Mario Lobell, Georg Mogk, Florian Mrugalla, and Michael Schimeczek).
  • Complementing the medicinal chemists’ toolbox with cloud-based deep learning models. Speaker: Barry Bunin of Collaborative Drug Discovery (CDD).
  • Unlocking the potential of computational chemistry via integration of tools, data, and people. Speakers: Yakov Pechersky and Eric Manas of Treeline Biosciences.
  • Hierarchical splitting: a novel method for data splitting that improves model performance in real-world drug discovery. Speaker: Ankit Gupta of Reverie Labs.
  • Binding structures and strengths through generative machine learning. Speakers: Hannes Stärk and Bowen Jing of Massachusetts Institute of Technology.
  • Machine learning in computer-aided drug discovery is harder than you might think. Speakers: Ajay Jain and co-author Ann Cleves, Vice Presidents of Research and of Application Science, respectively, in the BioPharmics Division of Optibrium.
  • Molecular modeling evolves beyond metrics of similarity. “A sea-change into something rich and strange.” Speaker: Anthony Nicholls of OpenEye, Cadence Molecular Sciences.
  • A rising tide lifts all boats. Speaker: Georgia McGaughey of Vertex Pharmaceuticals.
  • Explainable machine learning in drug discovery. Speaker: Jürgen Bajorath of the University of Bonn, Germany.
  • Tributes to Pat Walters. Speaker: Mark Murcko of Relay Therapeutics.
  • Artificial intelligence in drug discovery: revolution, evolution, or complete nonsense. Speaker: Pat Walters of Relay Therapeutics (award address).