Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective
Recently, significant advances have been witnessed in the area of distributed word representations based on neural networks, which are also known as word embeddings. Among the new word embedding models, skip-gram negative sampling (SGNS) in the word2vec toolbox has attracted much attention due to its simplicity and effectiveness. However, the principles of SGNS remain not well understood, except for a recent work that explains SGNS as an implicit matrix factorization of the pointwise mutual information (PMI) matrix. In this paper, we provide a new perspective for further understanding SGNS. We point out that SGNS is essentially a representation learning method, which learns to represent the co-occurrence vector for a word. Based on the representation learning view, SGNS is in fact an explicit matrix factorization (EMF) of the words' co-occurrence matrix. Furthermore, extended supervised word embedding can be established based on our proposed representation learning view.
This is a foundational study of the SGNS problem. The paper argues that SGNS is essentially a representation learning method, and that what it learns is a word's co-occurrence vector. In other words, does the distance between two learned vectors then represent the distance between two "co-occurrence" vectors?
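The matrix-factorization view in the abstract (SGNS as a factorization of a PMI or co-occurrence matrix) can be illustrated with a minimal sketch: count word-word co-occurrences, form a positive-PMI matrix, and factorize it with truncated SVD to obtain word vectors whose distances reflect co-occurrence similarity. This follows the Levy-Goldberg implicit-MF reading cited in the abstract, not the paper's own EMF algorithm; the toy corpus, window size, and rank are illustrative choices, not from the paper.

```python
# Sketch of the matrix-factorization view of word embeddings:
# co-occurrence counts -> positive PMI -> truncated SVD -> word vectors.
# Toy corpus, window size, and rank k are made-up illustrative choices.
import numpy as np

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/- 2-word window.
window = 2
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

# Positive PMI: max(0, log(p(w,c) / (p(w) p(c)))); zero counts give
# log(0) = -inf, which the max() clips to 0.
total = C.sum()
pw = C.sum(axis=1, keepdims=True) / total
pc = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore"):
    pmi = np.log((C / total) / (pw * pc))
ppmi = np.maximum(pmi, 0.0)

# Rank-k truncated SVD; embed each word as U_k * sqrt(S_k).
U, S, Vt = np.linalg.svd(ppmi)
k = 2
vecs = U[:, :k] * np.sqrt(S[:k])

def sim(a, b):
    """Cosine similarity between the embeddings of words a and b."""
    va, vb = vecs[idx[a]], vecs[idx[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-9))
```

SGNS itself reaches a related factorization through stochastic updates rather than an explicit SVD; the point of the sketch is only that the learned vectors are low-rank representations of rows of a co-occurrence-derived matrix.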
Powered by ScienceNet.cn