Abstract
The World Wide Web contains a number of source code archives. Programs are usually classified into various categories within the archive by hand. We report on experiments for automatic classification of source code into these categories. We examined a number of factors that affect classification accuracy. Weighting features by expected entropy loss significantly improves classification accuracy. We show that a Support Vector Machine can be trained to classify source code with a high degree of accuracy. We feel these results show promise for software reuse.
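The abstract names expected entropy loss as the feature-weighting criterion but does not give a formula. As a rough illustration only, the sketch below uses the common textbook definition for a binary feature and a binary category label: the entropy of the label before the feature is observed, minus the expected entropy after it is observed. The representation of programs as sets of tokens, the function names, and the toy data are all assumptions made for this example, not details taken from the paper.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a two-outcome variable with probability p of one outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def expected_entropy_loss(docs, labels, feature):
    """Reduction in label entropy from knowing whether `feature` is present.

    docs    : list of sets of tokens (hypothetical representation of programs)
    labels  : list of bools, True when the program belongs to the target category
    feature : the token whose usefulness is being scored
    """
    n = len(docs)
    present = [lab for doc, lab in zip(docs, labels) if feature in doc]
    absent = [lab for doc, lab in zip(docs, labels) if feature not in doc]

    def group_entropy(group):
        # An empty group contributes nothing to the expectation.
        return binary_entropy(sum(group) / len(group)) if group else 0.0

    prior = binary_entropy(sum(labels) / n)
    posterior = (len(present) / n) * group_entropy(present) \
        + (len(absent) / n) * group_entropy(absent)
    return prior - posterior


# Toy data (invented for illustration): two "networking" programs, two others.
docs = [{"socket", "printf"}, {"socket"}, {"printf"}, {"malloc", "printf"}]
labels = [True, True, False, False]

for token in ("socket", "printf", "malloc"):
    print(token, round(expected_entropy_loss(docs, labels, token), 4))
```

In this toy set, `socket` separates the two groups perfectly and so receives the highest score, while `printf` and `malloc` carry less information about the category. Scores of this kind could then be used to rank or weight features before training a classifier such as a Support Vector Machine.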
Original language | English (US) |
---|---|
Pages (from-to) | 425-426 |
Number of pages | 2 |
Journal | SIGIR Forum (ACM Special Interest Group on Information Retrieval) |
Issue number | SPEC. ISS. |
State | Published - 2003 |
Event | Proceedings of the Twenty-Sixth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2003 - Toronto, Ont., Canada. Duration: Jul 28 2003 → Aug 1 2003 |
All Science Journal Classification (ASJC) codes
- Management Information Systems
- Hardware and Architecture