Many tasks in natural language processing can be performed with only a very shallow understanding of text.
The vector space model is one example of a useful, but shallow, data representation that has been successfully used for many tasks, including detecting synonyms, finding analogies, and learning properties of noun phrases.
The vector space model represents the meaning of a noun phrase (NP), e.g., "the New York Yankees" or "house", as a vector of co-occurrence counts with contexts.
A context is a short snippet of text like "alex rodriguez plays for _" or "_ on the street", where the underscore marks the slot that the noun phrase fills.
The model is essentially a (very) large matrix A, whose rows represent noun phrases and whose columns represent contexts.
The value of entry A_{i,j} is the number of times noun phrase i occurred with context j in a large corpus of documents (e.g., the Web).
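As a minimal sketch of how such a matrix could be assembled, the Python snippet below counts (noun phrase, context) pairs and arranges the counts into rows and columns. The pairs, the function name build_cooccurrence_matrix, and the alphabetical ordering of rows and columns are all assumptions made for illustration; a real corpus would yield a far larger and much sparser matrix.

```python
from collections import Counter

# Hypothetical toy observations: (noun phrase, context) pairs as they might be
# extracted from a corpus. These pairs are invented for illustration only.
observations = [
    ("the New York Yankees", "alex rodriguez plays for _"),
    ("the New York Yankees", "alex rodriguez plays for _"),
    ("house", "_ on the street"),
    ("the New York Yankees", "_ on the street"),
]


def build_cooccurrence_matrix(pairs):
    """Return (noun_phrases, contexts, A), where A[i][j] is the number of
    times noun phrase i was observed with context j."""
    counts = Counter(pairs)
    # Sorting is an arbitrary choice that makes the row/column order stable.
    noun_phrases = sorted({phrase for phrase, _ in pairs})
    contexts = sorted({context for _, context in pairs})
    A = [[counts[(phrase, context)] for context in contexts]
         for phrase in noun_phrases]
    return noun_phrases, contexts, A


noun_phrases, contexts, A = build_cooccurrence_matrix(observations)

# Print each row of A next to the noun phrase it represents.
for phrase, row in zip(noun_phrases, A):
    print(phrase, row)
```

A dense list of lists is used here only for readability; at Web scale most entries are zero, so a sparse representation would be the more practical choice.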
Intuitively, this model contains useful information because some contexts occur only with certain types of noun phrases; for example, the context "athletes, such as _" occurs only with athletes.
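One way to make this intuition concrete is to compare the row vectors of two noun phrases: phrases that occur in the same contexts should have similar rows. The text above does not name a similarity measure, so the sketch below assumes cosine similarity, a common choice for vector space models. The counts and the noun phrase "derek jeter" are invented for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors of equal length.
    Returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical rows of A. Columns, in order:
#   "athletes, such as _", "_ on the street"
rows = {
    "alex rodriguez": [12, 1],
    "derek jeter": [9, 0],
    "house": [0, 7],
}

athlete_vs_athlete = cosine_similarity(rows["alex rodriguez"], rows["derek jeter"])
athlete_vs_house = cosine_similarity(rows["alex rodriguez"], rows["house"])

print(athlete_vs_athlete, athlete_vs_house)
```

With these made-up counts, the two athlete rows point in nearly the same direction, while the "house" row is almost orthogonal to them. This is the kind of signal that tasks such as synonym detection rely on.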
Web pages with related data and software:
http://www.cs.cmu.edu/~tom/10709_fall09/RTWdata.html
http://qwone.com/~jason/20Newsgroups/
Reference: Peter D. Turney and Patrick Pantel (2010). From Frequency to Meaning: Vector Space Models of Semantics. Journal of Artificial Intelligence Research 37, pp. 141-188.