TY - GEN
T1 - CiteSeerX data
T2 - 2016 International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference
AU - Wu, Jian
AU - Liang, Chen
AU - Yang, Huaiyu
AU - Giles, C. Lee
N1 - Funding Information:
We gratefully acknowledge partial support from the National Science Foundation and AllenAI.
Publisher Copyright:
© 2016 Copyright held by the owner/author(s).
PY - 2016/6/26
Y1 - 2016/6/26
N2 - Scholarly big data is, for many, an important instance of Big Data. Digital library search engines have been built to acquire, extract, and ingest large volumes of scholarly papers. This paper provides an overview of the scholarly big data released by CiteSeerX, as of the end of 2015, and discusses various aspects such as how the data is acquired, its size, general quality, data management, and accessibility. Preliminary results on extracting semantic entities from body text of scholarly papers with Wikifier show biases towards general terms appearing in Wikipedia and against domain-specific terms. We argue that the latter will play a more important role in extracting important facts from scholarly papers.
UR - http://www.scopus.com/inward/record.url?scp=85054857077&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054857077&partnerID=8YFLogxK
U2 - 10.1145/2928294.2928306
DO - 10.1145/2928294.2928306
M3 - Conference contribution
AN - SCOPUS:85054857077
SN - 9781450342995
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
BT - Proceedings of the International Workshop on Semantic Big Data, SBD 2016, in conjunction with the 2016 ACM SIGMOD/PODS Conference
A2 - Gruenwald, Le
A2 - Groppe, Sven
PB - Association for Computing Machinery
Y2 - 1 July 2016
ER -