TY - JOUR
T1 - Contextual self-organizing map
T2 - Software for constructing semantic representations
AU - Zhao, Xiaowei
AU - Li, Ping
AU - Kohonen, Teuvo
N1 - Funding Information:
Preparation of this article was made possible by a grant from the National Science Foundation (BCS-0642586) to P.L. and by a faculty discretionary research grant from Colgate University to X.Z. during the 2009–2010 academic year. We thank Hua Shu and Jianfeng Yang for providing the BNU corpus, and Hongbing Xing for providing the MCRC corpus. X.Z. also thanks Zachary Helft, who assisted in the preparation of some of the figures.
PY - 2011/3
Y1 - 2011/3
AB - In this article, we introduce a software package that applies a corpus-based algorithm to derive semantic representations of words. The algorithm relies on analyses of contextual information extracted from a text corpus, specifically analyses of word co-occurrences in a large-scale electronic database of text. Here, a target word is represented as the combination of the average of all words preceding the target and the average of all words following it in a text corpus. The semantic representations of the target words can be further processed by a self-organizing map (SOM; Kohonen, Self-organizing maps, 2001), an unsupervised neural network model that provides efficient data extraction and representation. Due to its topography-preserving features, the SOM projects the statistical structure of the context onto a 2-D space, such that words with similar meanings cluster together, forming groups that correspond to lexically meaningful categories. Such a representation system has applications in a variety of contexts, including computational modeling of language acquisition and processing. In this report, we present specific examples from two languages (English and Chinese) to demonstrate how the method is applied to extract the semantic representations of words.
UR - http://www.scopus.com/inward/record.url?scp=79953729941&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79953729941&partnerID=8YFLogxK
U2 - 10.3758/s13428-010-0042-z
DO - 10.3758/s13428-010-0042-z
M3 - Article
C2 - 21287105
AN - SCOPUS:79953729941
SN - 1554-351X
VL - 43
SP - 77
EP - 88
JO - Behavior Research Methods
JF - Behavior Research Methods
IS - 1
ER -