TY - GEN
T1 - Automatic categorization of figures in scientific documents
AU - Lu, Xiaonan
AU - Mitra, Prasenjit
AU - Wang, James Z.
AU - Giles, C. Lee
PY - 2006
Y1 - 2006
N2 - Figures are very important non-textual information contained in scientific documents. Current digital libraries do not provide users tools to retrieve documents based on the information available within the figures. We propose an architecture for retrieving documents by integrating figures and other information. The initial step in enabling integrated document search is to categorize figures into a set of pre-defined types. We propose several categories of figures based on their functionalities in scholarly articles. We have developed a machine-learning-based approach for automatic categorization of figures. Both global features, such as texture, and part features, such as lines, are utilized in the architecture for discriminating among figure categories. The proposed approach has been evaluated on a testbed document set collected from the CiteSeer scientific literature digital library. Experimental evaluation has demonstrated that our algorithms can produce acceptable results for real-world use. Our tools will be integrated into a scientific-document digital library.
UR - http://www.scopus.com/inward/record.url?scp=34247258424&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34247258424&partnerID=8YFLogxK
U2 - 10.1145/1141753.1141778
DO - 10.1145/1141753.1141778
M3 - Conference contribution
AN - SCOPUS:34247258424
SN - 1595933549
SN - 9781595933546
T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
SP - 129
EP - 138
BT - 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006
T2 - 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006: Opening Information Horizons, JCDL '06
Y2 - 11 June 2006 through 15 June 2006
ER -