cjpnudt's personal blog http://blog.sciencenet.cn/u/cjpnudt


[Reading Papers]---069 Inside Out: Two Jointly Predictive Models for Word Representations and Phrase Representations

Posted 2016-7-20 14:20 | Category: Research notes

Inside Out: Two Jointly Predictive Models for Word Representations and Phrase Representations

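(Title as I first translated it: "Turned Inside Out": Two Jointly Predictive Models for Word Representations and Phrase Representations)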

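(Having translated the abstract, I think "Inside Out" here really means "inside and out", i.e., using information both internal and external to a word together!)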

Distributional hypothesis lies in the root of most existing word representation models by inferring word meaning from its external contexts. However, distributional models cannot handle rare and morphologically complex words very well and fail to identify some fine-grained linguistic regularity as they are ignoring the word forms. On the contrary, morphology points out that words are built from some basic units, i.e., morphemes. Therefore, the meaning and function of such rare words can be inferred from the words sharing the same morphemes, and many syntactic relations can be directly identified based on the word forms. However, the limitation of morphology is that it cannot infer the relationship between two words that do not share any morphemes. Considering the advantages and limitations of both approaches, we propose two novel models to build better word representations by modeling both external contexts and internal morphemes in a jointly predictive way, called BEING and SEING. These two models can also be extended to learn phrase representations according to the distributed morphology theory. We evaluate the proposed models on similarity tasks and analogy tasks. The results demonstrate that the proposed models can outperform state-of-the-art models significantly on both word and phrase representation learning.

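My translation: the distributional hypothesis, which infers a word's meaning from its external contexts, is the foundation of most existing word representation models. However, distributional models handle rare and morphologically complex words poorly, and because they ignore word forms, they fail to pick up some fine-grained linguistic regularities. Morphology, by contrast, holds that words are built from basic units, namely morphemes (roots?). The meaning and function of a rare word can therefore be inferred from words that share its morphemes, and many syntactic relations can be identified directly from word forms. The limitation of morphology is that it cannot relate two words that share no morphemes. Weighing the strengths and weaknesses of both approaches, we propose two new models, BEING and SEING, that build better word representations by jointly predicting with external contexts and internal morphemes. Following the theory of distributed morphology, both models can also be extended to learn phrase representations. We evaluate the proposed models on similarity tasks and analogy tasks. The results show that they significantly outperform state-of-the-art models.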
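To see for myself why internal morphemes help with rare words, I wrote a minimal Python sketch. It is not the paper's BEING or SEING model: the 2-D vectors, the segmentation `un + believ + able`, and the averaging rule in the hypothetical `compose` function are all my own illustrative assumptions. It only shows the internal half of the idea, namely that a barely trained rare word can borrow meaning from a morpheme it shares with a frequent word.

```python
import math

# Toy 2-D embeddings, set by hand for illustration (a real model learns them).
# "unbelievable" plays the rare word whose own vector is barely trained.
WORD_VECS = {
    "unbelievable": [0.0, -0.1],
    "believe": [1.0, 0.2],
}

# Hypothetical morpheme vectors and segmentation; a real system would
# get the segmentation from a morphological analyzer.
MORPHEME_VECS = {
    "un": [-0.5, 0.5],
    "believ": [0.9, 0.1],
    "able": [0.1, 0.8],
}
SEGMENTATION = {
    "unbelievable": ["un", "believ", "able"],
    "believe": ["believ"],
}


def compose(word):
    """Average a word's own vector with the vectors of its morphemes."""
    parts = [WORD_VECS[word]] + [MORPHEME_VECS[m] for m in SEGMENTATION.get(word, [])]
    dim = len(parts[0])
    return [sum(p[i] for p in parts) / len(parts) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


raw = cosine(WORD_VECS["unbelievable"], WORD_VECS["believe"])
composed = cosine(compose("unbelievable"), compose("believe"))
print(f"word vectors only:       {raw:.3f}")
print(f"word + morpheme vectors: {composed:.3f}")
```

With these toy numbers the raw vectors have a negative cosine, while the composed vectors share the `believ` component and reach a cosine of about 0.5. In the actual models the morpheme and context information are learned jointly from a corpus rather than set by hand, and words sharing no morphemes still have to rely on the external contexts.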

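This is an AAAI paper by students of Prof. Cheng Xueqi. The contribution is very clear: it learns word representations from a word's context information and its morpheme (root) information at the same time. The authors evaluate the models on two kinds of tasks: similarity tasks and analogy tasks. And naturally, the conclusion is that our method beats everyone else's, right?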
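Both evaluation tasks are standard in word-embedding work, so I sketched how they are usually scored. This is a generic minimal version under my own assumptions, not the paper's evaluation code: similarity tasks correlate the model's cosine scores with human ratings using Spearman's rho, and analogy tasks answer a:b :: c:? with the vector offset b - a + c (often called 3CosAdd). The vectors and "human" ratings below are invented toy data; public benchmarks such as WordSim-353 supply real ones.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def analogy(vecs, a, b, c):
    """Answer a:b :: c:? with the offset b - a + c (3CosAdd): return the
    word closest to it by cosine, excluding the three query words."""
    target = [vb - va + vc for va, vb, vc in zip(vecs[a], vecs[b], vecs[c])]
    candidates = [w for w in vecs if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vecs[w], target))


# Toy vectors and ratings, invented for illustration only.
VECS = {
    "man": [1.0, 0.0], "woman": [1.0, 1.0],
    "king": [3.0, 0.0], "queen": [3.0, 1.0],
    "apple": [0.0, 5.0],
}
pairs = [("king", "queen"), ("man", "woman"), ("king", "apple")]
human = [8.5, 8.0, 1.0]  # hypothetical gold similarity scores
model = [cosine(VECS[x], VECS[y]) for x, y in pairs]

print("similarity (Spearman):", spearman(model, human))
print("man:king :: woman:?", analogy(VECS, "man", "king", "woman"))
```

Excluding the three query words from the candidates matters: without that filter, the nearest neighbor of b - a + c is very often b or c itself, which would make the analogy score meaningless.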




https://blog.sciencenet.cn/blog-656867-991782.html
