Fantastic Four & the Summer Data Sharing

It is that time of year again, when things get even busier in the Data Team thanks to our annual summer programme, which aims to archive legacy data with the help of young scientists. In this blog we want to highlight the impact this initiative has had on the wealth of structural data you can now access, and share some insights from our amazing young scientists.

 

Recruiting our fantastic four

The pandemic reality of working from home had a positive effect on this project, as it allowed us to recruit more widely. The response to our recruitment was overwhelming, and it was surprising even for us, keen CIF aficionados, just how many beautiful young minds there are who care about data, and about preserving and sharing it, as much as we do! We had a tough time narrowing down the candidates, but in July along came four of them: Aaron Horner, Matthew Stout, Rosie Lester and Sanziana Foia.

 

Data preservation

In just three weeks they created over 300 CIF datasets ready for curation into the Cambridge Structural Database (CSD). We again had many sources of hardcopy data this year: structures published in scientific journals (for various reasons not available in electronic format), historic data received directly from authors that for years had been stored only on paper (some even handwritten), and previously undiscovered data published in Russian papers that have been waiting to be digitised for over 30 years.

 

Perspectives from our young scientists

To understand what this work means for users of the CSD and for our early-career scientists, we asked our summer students a few questions.

First up we asked Rosie Lester and Sanziana Foia: what's the most memorable structure you have helped to convert so far?

 

For Rosie a structure from Russia stood out:

“The structure itself was a fairly nondescript organic molecule, but it was amazing to finally produce a digital record of a structure determined 35 years ago in Russia. Visualizing the atomic coordinates and being able to see someone’s hand drawn sketch in a sharable format was definitely a moment to remember - I really felt I had contributed to the crystallography community when that was deposited!”

Image of OXOVUV (DOI: 10.5517/ccdc.csd.cc28bjdk), converted by Rosie Lester. Structure published by V. N. Nesterov, V. E. Shklover, Yu. T. Struchkov, V. V. Korshak, A. L. Rusanov, Z. B. Shifrina, Izvestiya Akademii Nauk SSSR, Seriya Khimicheskaya, 1988, 37, 66. DOI: 10.1007/BF00962659

 

Sanziana, meanwhile, was clearly taken with a structure that could help to treat cancer:

“I helped convert the molecule of Epacadostat, an IDO-1 inhibitor involved in cancer immunotherapy. IDO-1, or indoleamine-2,3-dioxygenase, is an immunomodulator that is central to fetal protection from the maternal immune system. However, some cancers can hijack this immunomodulatory mechanism to promote local tolerance to cancer. Epacadostat promises to subvert this mechanism by blocking IDO-1 in cancer patients.”

CSD refcode EYALUO (DOI: 10.5517/ccdc.csd.cc28fb9c), converted by Sanziana Foia and visualised using Mercury. Structure published by Eddy W. Yue, Richard Sparks, Padmaja Polam, Dilip Modi, Brent Douty, Brian Wayland, Brian Glass, Amy Takvorian, Joseph Glenn, Wenyu Zhu, Michael Bower, Xiangdong Liu, Lynn Leffet, Qian Wang, Kevin J. Bowman, Michael J. Hansbury, Min Wei, Yanlong Li, Richard Wynn, Timothy C. Burn, Holly K. Koblish, Jordan S. Fridman, Tom Emm, Peggy A. Scherle, Brian Metcalf, Andrew P. Combs, ACS Medicinal Chemistry Letters, 2017, 8, 486. DOI: 10.1021/acsmedchemlett.6b00391

 

Converting the data into electronic format is only one part of the journey that legacy data takes once it is added to our system. It also undergoes all the normal procedures, such as curation and validation, because every dataset is as valuable to us as any other. Users of our database may not always think of a single dataset the way we do, since they use the CSD as a collection, but we bet Matthew Stout will be very pleased to see the structures he typed out when he uses the CSD in the future. We asked him: what has given you the most satisfaction during the summer project so far?

"Being able to contribute to the database that I had used throughout my studies at university, especially in my final year project, and to understand how and why it works. It has been satisfying learning what goes on behind the scenes and how the database is compiled, helping to complete the picture of what the CCDC is all about."

 

To Sanziana Foia we asked: what surprised you about the project?

“I was impressed about the thoroughness that each structure is treated with at the CCDC, from deposition to editing, curation and validation. I learned how multiple teams work together to hold each database entry at a high standard of scientific accuracy. Being immersed in this project made me realize the amount of work going on behind “the curtains” that keeps running such a large database of structures.”

 

The benefits of this initiative for our users and the CCDC are obvious, and as long as we have the resources and there are still unshared datasets, our summer programme will continue in the years to come. Not only are the students helping us increase the wealth of datasets available through our free Access Structures service, but the programme also looks to be helping us build a new generation of scientists who know the importance of data completeness, because they learned it the hard way!

This is what Aaron Horner had to say when we asked: will any parts of the project be useful for your future scientific career?

"Knowing the woes of improper data (with aspects like temperature, R factor etc.) missing. I will certainly ensure all crystallographic data is complete, legible & in-line with CIF standards (assuming I don’t submit it myself!)"

 

With their first three weeks complete, the summer students have now moved on to helping us make scientific improvements to the database and create new educational videos, so you may come across more of their work in the CSD or on our YouTube channel soon! With our summer now almost over, Matt, Sanziana, Aaron and Rosie will soon move on to pursue their future careers, either in academia or industry. We are certainly hoping they deposit some of their own structures in the future too – we know they will be good! In the meantime, the team at the CCDC will continue working on the handmade CIFs, and we wish our Fantastic Four all the best for the future.

  

We would also like to extend our huge thanks to the researchers who have already been in touch with us so that we could help share their data through this year’s summer programme, and to Dr Anna Vologzhanina for helping us to identify structures in Russian-language articles that had not been added to the CSD.

  

If you have historic data that is not currently available in CIF format and you would like help sharing it through the CSD as a CSD Communication, find out more about how we can help you here.