TY - GEN
T1 - SimSeerX
T2 - 2014 ACM Symposium on Document Engineering, DocEng 2014
AU - Williams, Kyle
AU - Wu, Jian
AU - Giles, C. Lee
N1 - Publisher Copyright: © 2014 ACM.
PY - 2014
Y1 - 2014
N2 - The need to find similar documents occurs in many settings, such as in plagiarism detection or research paper recommendation. Manually constructing queries to find similar documents may be overly complex, thus motivating the use of whole documents as queries. This paper introduces SimSeerX, a search engine for similar document retrieval that receives whole documents as queries and returns a ranked list of similar documents. Key to the design of SimSeerX is that it is able to work with multiple similarity functions and document collections. We present the architecture and interface of SimSeerX, show its applicability with 3 different similarity functions, and demonstrate its scalability on a collection of 3.5 million academic documents.
AB - The need to find similar documents occurs in many settings, such as in plagiarism detection or research paper recommendation. Manually constructing queries to find similar documents may be overly complex, thus motivating the use of whole documents as queries. This paper introduces SimSeerX, a search engine for similar document retrieval that receives whole documents as queries and returns a ranked list of similar documents. Key to the design of SimSeerX is that it is able to work with multiple similarity functions and document collections. We present the architecture and interface of SimSeerX, show its applicability with 3 different similarity functions, and demonstrate its scalability on a collection of 3.5 million academic documents.
UR - http://www.scopus.com/inward/record.url?scp=84908651001&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84908651001&partnerID=8YFLogxK
U2 - 10.1145/2644866.2644895
DO - 10.1145/2644866.2644895
M3 - Conference contribution
AN - SCOPUS:84908651001
T3 - DocEng 2014 - Proceedings of the 2014 ACM Symposium on Document Engineering
SP - 143
EP - 146
BT - DocEng 2014 - Proceedings of the 2014 ACM Symposium on Document Engineering
PB - Association for Computing Machinery, Inc
Y2 - 16 September 2014 through 19 September 2014
ER -