TY - JOUR
T1 - Factors influencing taxonomic unevenness in scientific research
T2 - A mixed-methods case study of non-human primate genomic sequence data generation: Why do scientists study certain species?
AU - Hernandez, Margarita
AU - Shenk, Mary K.
AU - Perry, George H.
N1 - Funding Information:
Ethics. The human subjects research component of this study was approved by Penn State’s Institutional Review Board (IRB) under the study number STUDY00008181. As per our IRB agreement and to protect the confidentiality of our research participants, interviews from the qualitative portion of our work are not available. All metadata associated with our grounded theory analysis of the interview data are presented within the manuscript. Data accessibility. Data and relevant code for this research work are stored in GitHub: https://github.com/maggiehern/PrimateGenomeProject and have been archived within the Zenodo repository: https://doi.org/10.5281/zenodo.4011305 [35]. Authors’ contributions. M.H. and G.H.P. came up with the research project and design. M.H. carried out the research and analyses. G.H.P. and M.K.S. provided analytical advice. M.H. and G.H.P. wrote the manuscript. M.H., G.H.P. and M.K.S. edited the manuscript. Competing interests. We declare we have no competing interests. Acknowledgements. We thank the participants of this research study who took the time out of their days to speak with us and contribute to our project. Additionally, we thank Nicholas Triozzi, Dylan Davis, Kathleen Grogan, Christina Bergey, Julie White and Stephanie Marciniak for their discussion and analytical advice. We also thank Max Fancourt from the IUCN Red List and Sarah Zehr from the Duke Lemur Center for their assistance in data access. This material is based upon work supported by the National Science Foundation (NSF) Graduate Research Fellowship Program under grant no. DGE1255832 (to M.H.) and by NSF grant BCS-1554834 (to G.H.P.). 
Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Publisher Copyright:
© 2020 The Authors.
PY - 2020/9/1
Y1 - 2020/9/1
N2 - Scholars have noted major disparities in the extent of scientific research conducted among taxonomic groups. Such trends may cascade if future scientists gravitate towards study species with more data and resources already available. As new technologies emerge, do research studies employing these technologies continue these disparities? Here, using non-human primates as a case study, we identified disparities in massively parallel genomic sequencing data and conducted interviews with scientists who produced these data to learn their motivations when selecting study species. We tested whether variables including publication history and conservation status were significantly correlated with publicly available sequence data in the NCBI Sequence Read Archive (SRA). Of the 179.6 terabases (Tb) of sequence data in SRA for 519 non-human primate species, 135 Tb (approx. 75%) were from only five species: rhesus macaques, olive baboons, green monkeys, chimpanzees and crab-eating macaques. The strongest predictors of the amount of genomic data were the total number of non-medical publications (linear regression; r² = 0.37; p = 6.15 × 10⁻¹²) and number of medical publications (r² = 0.27; p = 9.27 × 10⁻⁹). In a generalized linear model, the number of non-medical publications (p = 0.00064) and closer phylogenetic distance to humans (p = 0.024) were the most predictive of the amount of genomic sequence data. We interviewed 33 authors of genomic data-producing publications and analysed their responses using grounded theory. Consistent with our quantitative results, authors mentioned their choice of species was motivated by sample accessibility, prior published work and relevance to human medicine. 
Our mixed-methods approach helped identify and contextualize some of the driving factors behind species-uneven patterns of scientific research, which can now be considered by funding agencies, scientific societies and research teams aiming to align their broader goals with future data generation efforts.
AB - Scholars have noted major disparities in the extent of scientific research conducted among taxonomic groups. Such trends may cascade if future scientists gravitate towards study species with more data and resources already available. As new technologies emerge, do research studies employing these technologies continue these disparities? Here, using non-human primates as a case study, we identified disparities in massively parallel genomic sequencing data and conducted interviews with scientists who produced these data to learn their motivations when selecting study species. We tested whether variables including publication history and conservation status were significantly correlated with publicly available sequence data in the NCBI Sequence Read Archive (SRA). Of the 179.6 terabases (Tb) of sequence data in SRA for 519 non-human primate species, 135 Tb (approx. 75%) were from only five species: rhesus macaques, olive baboons, green monkeys, chimpanzees and crab-eating macaques. The strongest predictors of the amount of genomic data were the total number of non-medical publications (linear regression; r² = 0.37; p = 6.15 × 10⁻¹²) and number of medical publications (r² = 0.27; p = 9.27 × 10⁻⁹). In a generalized linear model, the number of non-medical publications (p = 0.00064) and closer phylogenetic distance to humans (p = 0.024) were the most predictive of the amount of genomic sequence data. We interviewed 33 authors of genomic data-producing publications and analysed their responses using grounded theory. Consistent with our quantitative results, authors mentioned their choice of species was motivated by sample accessibility, prior published work and relevance to human medicine. 
Our mixed-methods approach helped identify and contextualize some of the driving factors behind species-uneven patterns of scientific research, which can now be considered by funding agencies, scientific societies and research teams aiming to align their broader goals with future data generation efforts.
UR - http://www.scopus.com/inward/record.url?scp=85093862173&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85093862173&partnerID=8YFLogxK
U2 - 10.1098/rsos.201206
DO - 10.1098/rsos.201206
M3 - Article
AN - SCOPUS:85093862173
SN - 2054-5703
VL - 7
JO - Royal Society Open Science
JF - Royal Society Open Science
IS - 9
M1 - 1206
ER -