Classification of Source Code Archives

Robert Krovetz, Secil Ugurel, C. Lee Giles

Research output: Contribution to journal › Conference article › peer-review

7 Scopus citations


The World Wide Web contains a number of source code archives. Programs are usually classified into various categories within the archive by hand. We report on experiments for automatic classification of source code into these categories. We examined a number of factors that affect classification accuracy. Weighting features by expected entropy loss makes a significant improvement in classification accuracy. We show a Support Vector Machine can be trained to classify source code with a high degree of accuracy. We feel these results show promise for software reuse.
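
The abstract does not state the formula used for expected entropy loss. As a minimal sketch, assuming the common definition (the prior entropy of a class minus the expected entropy after observing whether a feature is present), the weighting for one binary feature could look like the following. The function names and the toy token sets are hypothetical and are not taken from the paper.

```python
import math


def entropy(pos, total):
    """Binary entropy in bits of a class seen `pos` times among `total` items."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_entropy_loss(docs, labels, feature):
    """Prior class entropy minus expected entropy after splitting on `feature`.

    docs   : list of token sets, one per program (hypothetical representation)
    labels : list of 0/1 flags, 1 if the program belongs to the target category
    """
    n = len(docs)
    with_f = [y for d, y in zip(docs, labels) if feature in d]
    without_f = [y for d, y in zip(docs, labels) if feature not in d]

    prior = entropy(sum(labels), n)
    posterior = (
        (len(with_f) / n) * entropy(sum(with_f), len(with_f))
        + (len(without_f) / n) * entropy(sum(without_f), len(without_f))
    )
    return prior - posterior


# Toy archive: two "networking" programs (label 1) and two "math" programs (label 0).
docs = [
    {"socket", "printf"},
    {"socket", "malloc"},
    {"sqrt", "printf"},
    {"sqrt", "malloc"},
]
labels = [1, 1, 0, 0]

socket_loss = expected_entropy_loss(docs, labels, "socket")
printf_loss = expected_entropy_loss(docs, labels, "printf")
```

In this toy data "socket" separates the two categories perfectly and so receives the maximum score, while "printf" appears equally in both and receives none. Features ranked this way could then be fed to a classifier such as a Support Vector Machine.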

Original language: English (US)
Pages (from-to): 425-426
Number of pages: 2
Journal: SIGIR Forum (ACM Special Interest Group on Information Retrieval)
Issue number: SPEC. ISS.
State: Published - 2003
Event: Proceedings of the Twenty-Sixth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2003 - Toronto, Ont., Canada
Duration: Jul 28 2003 - Aug 1 2003

All Science Journal Classification (ASJC) codes

  • Management Information Systems
  • Hardware and Architecture


