大工至善|大学至真分享 http://blog.sciencenet.cn/u/lcj2212916


[Repost] [Machine Learning Open Project] Vector Space Models for Natural Language Processing

2019-2-1 09:59 | Category: Research notes | Source: Repost

Many tasks in natural language processing can be performed with only a very shallow understanding of text.

The vector space model is one example of a useful but shallow data representation. It has been used successfully for many tasks, including detecting synonyms, finding analogies, and learning properties of noun phrases.

The vector space model represents the meaning of a noun phrase (NP), such as "the New York Yankees" or "house", as a vector. Each entry of the vector counts how often the noun phrase co-occurs with one particular context.
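Below is a minimal sketch of this representation. It assumes the corpus has already been reduced to a list of (noun phrase, context) observations, one per occurrence. Each noun phrase then maps to a `Counter` of context counts. The helper `cooccurrence_vectors` and the sample observations are hypothetical.

```python
from collections import Counter, defaultdict

def cooccurrence_vectors(pairs):
    """Map each noun phrase to a Counter of context -> co-occurrence count.

    `pairs` is an iterable of (noun_phrase, context) observations,
    one per occurrence of the noun phrase in that context (hypothetical input format).
    """
    vectors = defaultdict(Counter)
    for np_text, context in pairs:
        vectors[np_text][context] += 1
    return vectors

# Hypothetical observations, as if extracted from a corpus.
observations = [
    ("the new york yankees", "alex rodriguez plays for _"),
    ("the new york yankees", "alex rodriguez plays for _"),
    ("the new york yankees", "_ won the game"),
    ("house", "_ on the street"),
]
vectors = cooccurrence_vectors(observations)
print(vectors["the new york yankees"])
# Counter({'alex rodriguez plays for _': 2, '_ won the game': 1})
```

Contexts that never co-occur with a noun phrase are simply absent from its `Counter`. This is equivalent to a count of zero.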

A context is a short snippet of text in which "_" marks the slot filled by the noun phrase, such as "alex rodriguez plays for _" or "_ on the street".
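The source does not say how contexts are extracted. The sketch below assumes a simple fixed-size word window on each side of a noun phrase whose token span is already known. A real pipeline would first need a noun-phrase chunker to find those spans. The function `extract_contexts` and its `window` parameter are hypothetical.

```python
def extract_contexts(tokens, start, end, window=4):
    """Return the left and right contexts of the noun phrase tokens[start:end].

    The noun phrase is replaced by "_" and at most `window` neighbouring
    tokens are kept on each side (a hypothetical extraction rule).
    A side with no neighbouring tokens produces no context.
    """
    contexts = []
    left = tokens[max(0, start - window):start]
    right = tokens[end:end + window]
    if left:
        contexts.append(" ".join(left + ["_"]))
    if right:
        contexts.append(" ".join(["_"] + right))
    return contexts

tokens = "alex rodriguez plays for the new york yankees".split()
# The noun phrase "the new york yankees" spans tokens 4..7.
print(extract_contexts(tokens, 4, 8))
# ['alex rodriguez plays for _']

# "house" is token 1; a 3-word window yields one context on each side.
print(extract_contexts("the house on the street is old".split(), 1, 2, window=3))
# ['the _', '_ on the street']
```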

The model is essentially a (very) large matrix A whose rows represent noun phrases and whose columns represent contexts.
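The sketch below shows one way to lay the counts out as the matrix A. It assumes the same kind of hypothetical list of (noun phrase, context) observations. `build_matrix` is an illustrative name, and rows and columns are numbered in order of first appearance.

```python
import numpy as np

def build_matrix(pairs):
    """Build the count matrix A from (noun_phrase, context) observations.

    Returns A plus the row index (noun phrase -> row) and
    column index (context -> column).
    """
    pairs = list(pairs)  # iterated twice below
    row_index, col_index = {}, {}
    for np_text, context in pairs:
        row_index.setdefault(np_text, len(row_index))
        col_index.setdefault(context, len(col_index))
    A = np.zeros((len(row_index), len(col_index)), dtype=np.int64)
    for np_text, context in pairs:
        A[row_index[np_text], col_index[context]] += 1
    return A, row_index, col_index

pairs = [
    ("the new york yankees", "alex rodriguez plays for _"),
    ("the new york yankees", "alex rodriguez plays for _"),
    ("house", "_ on the street"),
]
A, rows, cols = build_matrix(pairs)
print(A.tolist())  # [[2, 0], [0, 1]]
```

A dense NumPy array keeps the example short. A real noun-phrase-by-context matrix built from Web-scale data would be very large and mostly zeros. A sparse format such as those in `scipy.sparse` would be the practical choice there.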

The entry A_{i,j} is the number of times noun phrase i occurred with context j in a large corpus of documents (e.g., the Web).
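The same definition can be written as a formula. Here $\mathcal{D}$ denotes the document collection, and $n_d(i,j)$ is the number of times noun phrase $i$ appears in context $j$ within document $d$. Both symbols are notation introduced for illustration. Row $i$ of A is then the vector for noun phrase $i$.

```latex
A_{i,j} = \sum_{d \in \mathcal{D}} n_d(i, j)
```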

Intuitively, this model contains useful information because some contexts occur only with certain types of noun phrases. For example, the context "athletes, such as _" occurs only with athletes.
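Selective contexts co-occur mostly with noun phrases of one type. As a result, noun phrases of the same type tend to get similar rows in A, which is what tasks such as synonym detection exploit. The source does not prescribe a similarity measure. The sketch below assumes cosine similarity, a common choice for comparing count vectors. It uses a made-up 3 x 3 matrix; the noun phrases, contexts, and counts are all hypothetical.

```python
import numpy as np

# Hypothetical toy matrix: rows are noun phrases, columns are contexts.
noun_phrases = ["derek jeter", "alex rodriguez", "house"]
contexts = ["athletes, such as _", "_ hit a home run", "_ on the street"]
A = np.array([
    [5, 3, 0],
    [4, 6, 0],
    [0, 0, 7],
], dtype=float)

def cosine(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / norm) if norm else 0.0

# Which noun phrases occur with the selective context "athletes, such as _"?
col = contexts.index("athletes, such as _")
print([noun_phrases[i] for i in np.nonzero(A[:, col])[0]])
# ['derek jeter', 'alex rodriguez']

# The two athletes share contexts, so their rows are far more similar
# to each other than either is to "house".
print(cosine(A[0], A[1]) > cosine(A[0], A[2]))  # True
```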

Web pages with related data and software:

http://www.cs.cmu.edu/~tom/10709_fall09/RTWdata.html

http://qwone.com/~jason/20Newsgroups/

Reference: Peter D. Turney and Patrick Pantel (2010). From Frequency to Meaning: Vector Space Models of Semantics. Journal of Artificial Intelligence Research 37, pp. 141-188.


https://blog.sciencenet.cn/blog-69686-1160222.html
