To follow up on my Chinese blog post about the three semantic giants and their amazing work on knowledge bases, here are some of my notes in English on how knowledge bases are constructed or acquired, and how they are applied.
Traditionally, knowledge networks, especially in the form of lexical knowledge bases, have been hand-encoded by human experts. More recently, knowledge acquisition from corpora has been reported as a very promising area. I think the more challenging task is to use the acquired knowledge base effectively and efficiently, whether it is hand-coded or learned from a corpus.
I did not say lexical knowledge acquisition is easy. I said it is promising, at least for the basic ISA-type concept/word taxonomy. A long time ago, Microsoft NLP used automatic parsing of dictionary and encyclopedia definitions to build their MindNet, with impressive results and a very fancy demo. More recently, some Google researchers managed to uncover rich ISA relationships from huge raw corpora based on only two surface patterns ("NP1 and other NP2" and "NP2 such as NP1", each yielding NP1 ISA NP2). For this second line of research, we managed to replicate the process and obtained similar results. The interesting part of this research is that the resulting knowledge is automatically adapted to the training corpus. In other words, there is potential for automatically adapting a general knowledge base to a chosen domain by acquiring knowledge from the domain data.
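To make the two surface patterns concrete, here is a minimal sketch in Python. It is not the original researchers' implementation. It assumes, purely for illustration, that a noun phrase (NP) can be approximated by a single alphabetic word, and the names `extract_isa`, `AND_OTHER`, and `SUCH_AS` are hypothetical. A real system would need a chunker or parser to find NP boundaries, plus frequency filtering over a large corpus.

```python
import re

# Assumption for illustration: a noun phrase (NP) is approximated by a single
# alphabetic word. A real system would use a chunker or parser to find NPs.
NP = r"[A-Za-z]+"

# Pattern 1: "NP1 and other NP2"  =>  NP1 ISA NP2
AND_OTHER = re.compile(rf"\b(?P<hypo>{NP}) and other (?P<hyper>{NP})\b")
# Pattern 2: "NP2 such as NP1"    =>  NP1 ISA NP2
SUCH_AS = re.compile(rf"\b(?P<hyper>{NP}) such as (?P<hypo>{NP})\b")


def extract_isa(text):
    """Return a sorted list of (hyponym, hypernym) pairs found in text."""
    pairs = set()
    for pattern in (AND_OTHER, SUCH_AS):
        for m in pattern.finditer(text):
            pairs.add((m.group("hypo").lower(), m.group("hyper").lower()))
    return sorted(pairs)


if __name__ == "__main__":
    sample = "We saw dogs and other animals. He plays instruments such as guitars."
    for hypo, hyper in extract_isa(sample):
        print(f"{hypo} ISA {hyper}")
```

Because the pairs come straight from whatever text is fed in, running the same patterns over domain-specific data yields a domain-specific taxonomy, which is the adaptation effect described above.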
In addition to the above automatic efforts, Dr. Lenat encoded common sense into the Cyc knowledge base with reasoning capability, Prof. Dong has worked out HowNet to connect concepts with deep semantic relationships, and Prof. Fillmore has worked out FrameNet as a general-purpose semantic hierarchy of pragmatic situations (scenarios). Add to these the long-standing quasi-lexical knowledge base WordNet. All of these were built by hand-coding. So we are not really lacking knowledge bases, at least in terms of general domain knowledge.
That is why I said the proper use of knowledge is even more challenging and important. After all, knowledge is not the end; using knowledge to solve problems is.
Unfortunately, successful uses of knowledge bases are few. Microsoft's MindNet has been around for decades without being put to any scalable practical use. Cyc faces even more challenges in real-world applications. FrameNet has been adopted in academia as a standard for computational semantics research but is hardly seen anywhere in industry. WordNet has been tried in many applications, so far with limited success.
It is noteworthy that Prof. Dong has been experimenting with using his HowNet to improve machine translation quality and disambiguation, and this is starting to show promise. It is encouraging and important progress to watch.