TY - GEN
T1 - Biber redux
T2 - 25th International Conference on Computational Linguistics, COLING 2014
AU - Passonneau, Rebecca J.
AU - Ide, Nancy
AU - Su, Songqiao
AU - Stuart, Jesse
PY - 2014
Y1 - 2014
N2 - Genre classification has been found to improve performance in many applications of statistical NLP, including language modeling for spoken language, domain adaptation of statistical parsers, and machine translation. It has also been found to benefit retrieval of spoken or written documents. At its base, however, classification assumes separability. This paper revisits an assumption that genre variation is continuous along multiple dimensions, and an early use of principal component analysis to find these dimensions. Results on a very heterogeneous corpus of post-1990s American English reveal four major dimensions, three of which echo those found in prior work and the fourth depending on features not used in the earlier study. The resulting model can provide a basis for more detailed analysis of sub-genres and the relation between genre and situations of language use, as well as a means to predict distributional properties of new genres.
UR - http://www.scopus.com/inward/record.url?scp=84959872741&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959872741&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84959872741
T3 - COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014: Technical Papers
SP - 565
EP - 576
BT - COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014
PB - Association for Computational Linguistics, ACL Anthology
Y2 - 23 August 2014 through 29 August 2014
ER -