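Sharing 《镜子大全》 (Complete Collection of Mirrors) and 《朝华午拾》 (Dawn Blossoms Plucked at Noon): http://blog.sciencenet.cn/u/liwei999 Once a Little Red Guard, later sent down to the countryside to farm the land; left home and country in 1991, not knowing where I would end up.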


Notes on Building and Using Lexical Semantic Knowledge Bases

Posted 2014-8-12 09:15 | Personal category: 立委科普 (Liwei's Popular Science) | System category: Research Notes | Keywords: knowledge base, semantics, ontology, taxonomy, MindNet, HowNet, WordNet, Cyc

As a follow-up to my Chinese blog post on the three semantic giants and their remarkable work on knowledge bases, here are some notes in English on how such knowledge bases are built (by hand or by automatic acquisition) and how they are applied.

Traditionally, knowledge networks, especially lexical knowledge bases, have been hand-encoded by human experts. More recently, knowledge acquisition from corpora has been reported as a very promising area. I think the more challenging task is to use a knowledge base effectively and efficiently, whether it is hand-coded or learned from a corpus.

I did not say lexical knowledge acquisition is easy. I said it is promising, at least for the basic ISA-type concept/word taxonomy. Long ago, the Microsoft NLP group automatically parsed dictionary and encyclopedia definitions to build MindNet, with impressive results and a very fancy demo. More recently, some Google researchers managed to uncover rich ISA relationships from huge raw corpora using only two surface patterns: "NP1 and other NP2" and "NP2 such as NP1", both of which yield NP1 ISA NP2 (for example, both "apples and other fruits" and "fruits such as apples" yield apple ISA fruit). We replicated this second line of research and obtained similar results. What makes it interesting is that the acquired knowledge automatically adapts to the corpus it is trained on. In other words, a general knowledge base could potentially be adapted to a chosen domain automatically, by acquiring knowledge from the domain data.
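The two surface patterns can be sketched as a toy extractor. This is a minimal illustration under loud assumptions, not the Google or our own implementation: a noun phrase is approximated by a single lowercase word (a real system would use an NP chunker or parser and would lemmatize plurals), and the names extract_isa and hypernyms are hypothetical. Counting pattern hits per corpus also shows, in miniature, why the acquired taxonomy adapts to whatever corpus it is run on.

```python
import re
from collections import Counter

# Assumption: a noun phrase is approximated by a single lowercase word.
# A real system would use an NP chunker or parser and lemmatize plurals.
NP = r"[a-z]+"

# Pattern 1: "NP1 and other NP2"  =>  NP1 ISA NP2
AND_OTHER = re.compile(rf"\b({NP}) and other ({NP})\b")
# Pattern 2: "NP2 such as NP1"    =>  NP1 ISA NP2
SUCH_AS = re.compile(rf"\b({NP}) such as ({NP})\b")


def extract_isa(text: str) -> Counter:
    """Count (hyponym, hypernym) pairs matched by the two surface patterns."""
    text = text.lower()
    pairs: Counter = Counter()
    for m in AND_OTHER.finditer(text):
        pairs[(m.group(1), m.group(2))] += 1  # group 1 is NP1, group 2 is NP2
    for m in SUCH_AS.finditer(text):
        pairs[(m.group(2), m.group(1))] += 1  # group 1 is NP2, group 2 is NP1
    return pairs


def hypernyms(pairs: Counter, word: str) -> list:
    """Hypernyms of `word` in the given pair counts, most frequent first."""
    counts = Counter({hyper: n for (hypo, hyper), n in pairs.items() if hypo == word})
    return [hyper for hyper, _ in counts.most_common()]


if __name__ == "__main__":
    # Toy snippets standing in for two domain corpora (illustrative only).
    tech = extract_isa("Languages such as Python are popular. "
                       "Java and other languages compile to bytecode.")
    zoo = extract_isa("Snakes such as python can grow large. "
                      "Cobras and other snakes are venomous.")
    print(hypernyms(tech, "python"))  # ['languages']
    print(hypernyms(zoo, "python"))   # ['snakes']
```

On the programming-flavored snippet, python lands under languages; on the zoology-flavored one, under snakes. Keeping the per-corpus counters separate gives domain-specific taxonomies, while summing them (tech + zoo) gives a more general one with both readings.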

Beyond these automatic efforts, there are the hand-coded resources. Dr. Lenat encoded common sense, with reasoning capability, into the Cyc knowledge base; Prof. Dong built HowNet to connect concepts through deep semantic relationships; Prof. Fillmore built FrameNet, a general-purpose semantic hierarchy of pragmatic situations (scenarios); and there is the long-standing quasi-lexical knowledge base WordNet. All of these were hand-coded. So we are not really short of knowledge bases, at least for general-domain knowledge.

That is why I said that using knowledge properly is even more challenging and important. After all, knowledge is not the end; using it to solve problems is.

Unfortunately, successful uses of knowledge bases are few. Microsoft's MindNet has been around for well over a decade without being put to any scalable, practical use. Cyc faces even greater challenges in real-world applications. FrameNet has been adopted in academia as a standard for computational semantics research but is hardly seen in industry. WordNet has been tried in many applications, so far with limited success.

Notably, Prof. Dong has been experimenting with using HowNet to improve machine translation quality and disambiguation, and this work is starting to show promise. It is encouraging and important progress to watch.

[Related]

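《科研笔记:自然语言处理领域中的语义路线及其代表人物》 (Research Notes: The Semantic Approach in Natural Language Processing and Its Leading Figures)

【Pinned: Index of Liwei's NLP posts on the ScienceNet blog (regularly updated)】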



http://blog.sciencenet.cn/blog-362400-818903.html

