TY - GEN
T1 - COVIDSeer
T2 - 20th ACM Symposium on Document Engineering, DocEng 2020
AU - Rohatgi, Shaurya
AU - Karishma, Zeba
AU - Chhay, Jason
AU - Keesara, Sai Raghav Reddy
AU - Wu, Jian
AU - Caragea, Cornelia
AU - Giles, C. Lee
N1 - Funding Information:
We gratefully acknowledge partial support from the National Science Foundation.
Publisher Copyright:
© 2020 ACM.
PY - 2020/9/29
Y1 - 2020/9/29
N2 - We develop an enhanced version of the CORD-19 dataset released by the Allen Institute for AI. Tools in the SeerSuite project are used to exploit information in the original articles that is not directly provided in the CORD-19 dataset. We add 728 new abstracts, 70,102 figures, and 31,446 tables with captions that are not provided in the current data release. We also built a vertical search engine, COVIDSeer, based on the new dataset we created. COVIDSeer has a relatively simple architecture with features such as keyword filtering and similar-paper recommendation. The goal was to provide a system and dataset that can help scientists better navigate the literature concerning COVID-19. The enriched dataset can serve as a supplement to the existing dataset. The search engine, which offers keyphrase-enhanced search, will hopefully help biomedical and life science researchers, medical students, and the general public explore coronavirus-related literature more effectively. The entire dataset and the system will be made open source.
AB - We develop an enhanced version of the CORD-19 dataset released by the Allen Institute for AI. Tools in the SeerSuite project are used to exploit information in the original articles that is not directly provided in the CORD-19 dataset. We add 728 new abstracts, 70,102 figures, and 31,446 tables with captions that are not provided in the current data release. We also built a vertical search engine, COVIDSeer, based on the new dataset we created. COVIDSeer has a relatively simple architecture with features such as keyword filtering and similar-paper recommendation. The goal was to provide a system and dataset that can help scientists better navigate the literature concerning COVID-19. The enriched dataset can serve as a supplement to the existing dataset. The search engine, which offers keyphrase-enhanced search, will hopefully help biomedical and life science researchers, medical students, and the general public explore coronavirus-related literature more effectively. The entire dataset and the system will be made open source.
UR - http://www.scopus.com/inward/record.url?scp=85093096224&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85093096224&partnerID=8YFLogxK
U2 - 10.1145/3395027.3419597
DO - 10.1145/3395027.3419597
M3 - Conference contribution
AN - SCOPUS:85093096224
T3 - Proceedings of the ACM Symposium on Document Engineering, DocEng 2020
BT - Proceedings of the ACM Symposium on Document Engineering, DocEng 2020
PB - Association for Computing Machinery, Inc
Y2 - 29 September 2020 through 1 October 2020
ER -