Usability of publicly available datasets to help combat antimicrobial resistance
15 September 2020 | Quentin Leclerc
Quentin Leclerc is a third-year PhD student at the London School of Hygiene & Tropical Medicine (LSHTM), focusing on antimicrobial resistance (AMR). His project combines mathematical modelling and lab work to improve our understanding of how AMR genes spread between bacteria. In this blog post, Quentin discusses the article he and his colleagues published in Wellcome Open Research, and explains why they chose to use not one but four open datasets, and the benefits of doing so.
AMR is a complex public health challenge that requires interdisciplinary work from multiple angles. In this paper, which I wrote with colleagues from LSHTM, we set out to inform empiric antibiotic prescribing using publicly available AMR surveillance datasets. Empiric therapy is given before the pathogen infecting a patient, and any resistance it carries, is known. It is therefore an informed guess based on two things: the pathogens most likely to be causing the patient's symptoms, and the local resistance rates of those pathogens to different antibiotics.
Strength in numbers
We used four AMR surveillance datasets: the ECDC Surveillance Atlas, CDDEP ResistanceMap, WHO GLASS and the Pfizer ATLAS dataset. All of them report resistance rates in different countries for various combinations of bacteria and antibiotics. Using all four instead of only one was essential, because it let us focus on patient syndromes rather than on individual bacterial species, which is more useful for empiric prescribing.
For example, if a patient has a bloodstream infection, the physician might decide that the most likely causative pathogens are E. coli or S. aureus. To maximise the chance of prescribing an effective antibiotic, the physician needs local resistance rates for both of these species. No single dataset covers every such combination everywhere, so combining the four gave us this level of information in as many countries as possible and allowed us to build our AMR index around patient syndromes.
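To make the syndrome-level idea concrete, here is a minimal Python sketch, assuming surveillance records from several sources have already been reduced to (country, pathogen, antibiotic, resistance rate) tuples. The syndrome-to-pathogen mapping, the function names `pool_sources` and `syndrome_resistance`, and the choice to average rates when sources overlap are all hypothetical illustrations, not the method used in the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from a patient syndrome to its most likely pathogens.
SYNDROME_PATHOGENS = {
    "bloodstream infection": ["E. coli", "S. aureus"],
}

def pool_sources(*sources):
    """Pool records from several datasets, keyed by (country, pathogen, antibiotic).

    Each source is an iterable of (country, pathogen, antibiotic, rate) tuples,
    with rate as a proportion in [0, 1]. When several sources report the same
    combination, their rates are averaged (an illustrative assumption).
    """
    pooled = defaultdict(list)
    for source in sources:
        for country, pathogen, antibiotic, rate in source:
            pooled[(country, pathogen, antibiotic)].append(rate)
    return {key: mean(rates) for key, rates in pooled.items()}

def syndrome_resistance(pooled, country, syndrome, antibiotic):
    """Return {pathogen: rate or None} for every likely pathogen of a syndrome.

    None marks a pathogen with no data for this country and antibiotic, so
    gaps in coverage stay visible instead of being silently dropped.
    """
    return {
        pathogen: pooled.get((country, pathogen, antibiotic))
        for pathogen in SYNDROME_PATHOGENS[syndrome]
    }
```

In this sketch, if one source covers E. coli and another covers S. aureus for the same country, `syndrome_resistance` returns a rate for both, whereas either source alone would leave one pathogen as `None`. That gap is the reason combining datasets helps when reasoning at the level of syndromes.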
A transparent approach
I think the key benefit of using these open-access datasets is the transparency it brings to our approach. Because readers can examine the entire datasets themselves, they can better understand what we did with them and how we calculated our results. For researchers, this makes it easier to build on our conclusions. Ultimately, anyone should be able to reproduce our results freely if they wish, and that is only possible with publicly available datasets.
Open and easy access to tackle public health issues
These open datasets were central to developing our antibiotic resistance surveillance tool. Thanks to them, we were able to cover a large number of countries, antibiotics and bacteria. For antibiotic resistance surveillance in general, open datasets are crucial. Fighting this public health challenge requires awareness of the extent of the problem, so all researchers should be able to access information on resistance rates worldwide easily.
I would say this is even more important for low- and middle-income countries, where localised estimates of resistance rates (e.g. at city level) might not be readily available. In such settings, having at least openly accessible national data is an undeniable advantage.
The more the merrier, but we need to remove barriers first
Open data facilitates transparency and reproducibility. For the scientific community, giving more people access to data allows many different research questions to be addressed at the same time, speeding up the overall fight against public health threats.
However, there are currently barriers that complicate this. For example, although the four AMR surveillance datasets we used are all open access, their formats differ considerably. This is because many open datasets are still released only in a format that suits the researchers who created them and the question they set out to answer, which is not necessarily a format that others can easily re-use.
To remove these barriers, the scientific community must increasingly adopt shared data practices. For example, agreeing on formatting conventions for commonly shared data, such as AMR surveillance datasets, would make these open datasets usable by as many researchers as possible.
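As an illustration of what a shared convention would save, the Python sketch below assumes two hypothetical source layouts: one reporting the percentage of isolates resistant alongside a sample size, the other reporting raw counts of isolates tested and resistant. The field names and the `ResistanceRecord` schema are invented for this example and do not describe the actual formats of the datasets named above. Without a convention, each source needs its own small adapter into a common schema like this one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResistanceRecord:
    """Hypothetical common schema for one country/pathogen/antibiotic entry."""
    country: str
    pathogen: str
    antibiotic: str
    n_tested: int
    n_resistant: int

    @property
    def rate(self) -> float:
        # Proportion of tested isolates that were resistant.
        return self.n_resistant / self.n_tested

def from_percentage_row(row: dict) -> ResistanceRecord:
    """Adapter for a hypothetical layout with a percentage and a sample size."""
    n_tested = int(row["isolates"])
    # Recover the resistant count from the percentage; this needs the sample size.
    n_resistant = round(float(row["pct_resistant"]) / 100 * n_tested)
    return ResistanceRecord(row["country"], row["organism"], row["drug"],
                            n_tested, n_resistant)

def from_count_row(row: dict) -> ResistanceRecord:
    """Adapter for a hypothetical layout that reports raw counts."""
    return ResistanceRecord(
        row["Country"], row["Species"], row["Antibiotic"],
        int(row["Tested"]), int(row["Resistant"]),
    )
```

Note that the percentage layout can only be converted because it also reports how many isolates were tested. Deciding which fields every release must include, such as sample sizes, is exactly the kind of detail a formatting convention would settle.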
Discovering the value of a new dataset
In 2018, Wellcome organised the Wellcome Data Reuse Prize competition, in which participants formulated an AMR research question and tried to answer it using the newly available ATLAS dataset from Pfizer. Our paper describes the work our LSHTM team submitted, which went on to win the prize. Together with the other participants' submissions, our entry illustrated how a new dataset from the pharmaceutical industry could be useful for AMR research. Hopefully, this will lead to further data-sharing agreements between pharmaceutical companies and organisations such as the Wellcome Trust, which I am sure could greatly benefit AMR research!
‘I would like to congratulate the authors with this original, comprehensive article,’ says Marlieke de Kraker, University of Geneva Hospitals, Switzerland, in her review of the article. Read Quentin Leclerc et al.’s full Research Article, along with the reviewer comments, here. This piece is part of the #ECROpenData campaign; you can find out more about it, and read others’ stories on data sharing, here.