Today, it is possible to salvage precious data from protein crystals that have been damaged by synchrotron radiation, which can have a direct effect on the analysis and treatment of deadly diseases such as COVID-19. The ability to squeeze every bit of data from an experiment can make all the difference. In macromolecular crystallography, however, incomplete datasets from small and radiation-sensitive crystals have been a persistent problem. A research team at Diamond has developed software, named xia2.multiplex, which now allows users to analyse small slices of structural data and assemble them on the fly to know how many more slices are needed to create a complete dataset.
Things didn’t use to be like this, though. Ten years ago, sitting down to collect data at a macromolecular crystallography (MX) beamline was a very different experience. At that time, most structures came from complete datasets collected for a given protein. But growing perfect, diffraction-quality crystals is hard and, in some cases, impossible. Even when crystals can be formed, the conditions required to form them can sometimes alter the natural structure of a protein.
This inevitably meant that the structures of some proteins were significantly more challenging to solve, which was a huge limitation in many areas of biological research. However, as scientific techniques advanced, it became possible to collect data from smaller and smaller crystals, and at more physiologically relevant temperatures.
Innovations often bring new challenges with them, and this was no exception. While we had the ability to collect data from difficult samples, interpreting the data was another matter. Collecting complete data from large single crystals was the gold standard for crystallography, so much of the experience, methods and software has been developed around it. As users attempt to obtain structures for ever more challenging systems, we need new tools to properly analyse this new and incomplete data.
Dr Sam Horrell, Beamline Scientist on I24, explained:
“For example, with a large single crystal at cryogenic temperatures it is a fairly routine experiment to collect a complete dataset from a single crystal. With small crystals or radiation-sensitive protein crystals it is often necessary to collect small wedges of data (up to 20 degrees) instead of a complete dataset, which means we require multiple crystals for a complete dataset and a quick and efficient way to know when we have collected enough data for the dataset to be complete.”
This led to several small, incomplete datasets which had to be combined to get a complete picture. But it was hard to know when you had a good dataset or whether radiation damage was affecting the sample. In theory, problems with data collection could be fixed if you knew about them in time. However, beamtimes are short, and key information about the dataset was often only uncovered after the experiment was over.
These challenges in MX have meant that software and automation have played a pivotal role in the development of the field. Continually updating and upgrading the software tools available to synchrotron users means that more can be done with the precious data in a shorter amount of time. A good example of this is the DIALS software package developed by Diamond, CCP4 and Lawrence Berkeley National Laboratory. This collection of tools is designed to make data collection and analysis easier and faster.
In MX, however, there was a specific problem related to the incomplete datasets from radiation-sensitive and small crystals. There were solutions available, but they required a lot of manual input, which took a long time. In a publication in Acta Crystallographica Section D, the Diamond team explore how, if these manual steps can be automated, researchers at the beamline will be able to get almost real-time information on their data collection.
This rapid analysis allows scientists to make important decisions about data acquisition while they are still at the beamline, to make sure there won’t be any gaps in their results. The software, named xia2.multiplex, allows users to analyse small slices of structural data and assemble them on the fly to know how many more slices are needed to create a complete dataset.
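The idea of tracking completeness as slices arrive can be illustrated with a toy sketch. This is not how xia2.multiplex works internally. It simply assumes that each slice (wedge) can be reduced to a set of unique reflection labels and that the full set of expected reflections is known. The function names and the integer labels are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Set


def running_completeness(wedges: List[Set[int]], expected: Set[int]) -> List[float]:
    """Completeness after each wedge: fraction of expected reflections seen so far.

    Toy model: reflections are plain integer labels, not real Miller indices.
    """
    observed: Set[int] = set()
    history: List[float] = []
    for wedge in wedges:
        # Only count reflections that belong to the expected unique set.
        observed |= wedge & expected
        history.append(len(observed) / len(expected))
    return history


def wedges_needed(history: List[float], target: float = 0.95) -> Optional[int]:
    """Number of wedges after which the target completeness is first reached.

    Returns None if the target has not been reached yet.
    """
    for count, value in enumerate(history, start=1):
        if value >= target:
            return count
    return None


if __name__ == "__main__":
    expected = set(range(10))  # assumed: 10 unique reflections in total
    wedges = [{0, 1, 2, 3}, {2, 3, 4, 5}, {6, 7}, {5, 6, 7, 8, 9}]
    history = running_completeness(wedges, expected)
    print(history)                 # [0.4, 0.6, 0.8, 1.0]
    print(wedges_needed(history))  # 4
```

In this toy run the overlap between wedges means the second wedge adds less new information than the first, which is why a running tally is more useful at the beamline than simply counting wedges.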
Developing software for the real world is a challenging process and this was no different. A lot of scientific software is developed around good-quality, well-characterised sample data that behaves entirely as expected, all the time. Shifting over to real-world data can be the death of a promising software idea, and there were many challenges in getting xia2.multiplex ready for release into the wild.
Lead author of the study, Dr Richard Gildea, a Data Analysis Scientist for MX, noted:
“All of the rare ‘corner case’ datasets that didn’t fit perfectly with the models had to be analysed and accounted for, which turned out to be a long and complicated process.”
However, the work and perseverance paid off. The software was developed in the middle of the COVID-19 pandemic and was ultimately tested on SARS-CoV-2 proteins over the course of four beamtimes.
Dr Sam Horrell, who worked on the data collection, described the process:
“At the beginning, there was a lot of manual intervention required to make sure the software properly processed the data. However, at some point in the middle of the beamtimes, we realised that the software could be left to ‘just work’ in the background.”
At this point the real benefit of the software could be seen. Users collecting the data were suddenly getting real-time feedback on how to collect the best data possible. In high-stakes experiments such as those with SARS-CoV-2, the benefit of this is clear.
This software is now freely available for anyone in the world to use. Work like this has propelled the area of multi-crystal data collection from being a highly specialised technique for a select few to a valuable tool at the disposal of many molecular biologists. Innovations like this help to make complex techniques more accessible.
Gildea R. J. et al. xia2.multiplex: a multi-crystal data-analysis pipeline. Acta Cryst. D 78, 752-769 (2022). DOI: 10.1107/S2059798322004399
Diamond Light Source is the UK's national synchrotron science facility, located at the Harwell Science and Innovation Campus in Oxfordshire.