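Sharing 《镜子大全》 and 《朝华午拾》 http://blog.sciencenet.cn/u/liwei999 Once a Little Red Guard, later sent down to the countryside to "repair the earth", left the homeland in 1991 and has been adrift ever since.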


Taipei Talk Slides Posted Online (1/2)

3488 views 2013-4-6 06:42 | Personal category: 立委科普 (Liwei's popular science) | System category: Paper exchange | Tags: slides, Taipei, online

Towards robust large-scale Chinese parsing


Wei Li


March 29, 2013

Institute of Information Science Academia Sinica






Chinese Parsing Background:

Four Layer System Architecture


I: Design Philosophy


Indexing system (backend engine for offline processing)

vs Retrieval system (frontend engine for on-the-fly run)


Parser-IE architecture:

 deeper parsing, shallow IE

 domain-independent, app specific

 linguistic,  domain

 bridge, end results

 

Two engines, four layers



I: System Architecture for Core Engine


II: Parsing-based Information Extraction



III: Text Mining

IV: App-level
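
The four layers listed above can be pictured as stages that each enrich the output of the previous one. The following is only a minimal sketch under loud assumptions: the slides do not show any implementation, so every function name and every toy rule here (whitespace tokenization standing in for parsing, capitalized words standing in for extracted entities) is hypothetical and chosen purely to illustrate the layering.

```python
from typing import Callable, Dict, List

# Hypothetical model: each layer maps a document dict to an enriched dict.
Layer = Callable[[dict], dict]


def core_parse(doc: dict) -> dict:
    # Layer I (core engine): whitespace tokenization stands in for real parsing.
    return {**doc, "tokens": doc["text"].split()}


def extract(doc: dict) -> dict:
    # Layer II (information extraction): toy rule, capitalized tokens are entities.
    return {**doc, "entities": [t for t in doc["tokens"] if t[:1].isupper()]}


def mine(doc: dict) -> dict:
    # Layer III (text mining): aggregate the extracted entities into counts.
    counts: Dict[str, int] = {}
    for entity in doc["entities"]:
        counts[entity] = counts.get(entity, 0) + 1
    return {**doc, "entity_counts": counts}


def app(doc: dict) -> dict:
    # Layer IV (app level): pick the most frequent entity as the end result.
    counts = doc["entity_counts"]
    top = max(counts, key=counts.get) if counts else None
    return {**doc, "top_entity": top}


def run_layers(text: str, layers: List[Layer]) -> dict:
    # Feed the text through the layers in order.
    doc = {"text": text}
    for layer in layers:
        doc = layer(doc)
    return doc


result = run_layers("Taipei hosts a talk and Taipei listens",
                    [core_parse, extract, mine, app])
print(result["top_entity"])
```

The point of the sketch is only the shape: the lower layers are linguistic and reusable, while the upper layers consume their output and are where app-specific logic would live.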

【立委科普: A Map of NLP】


Development Environment for Parsing


Language engineering is not fundamentally different from other software engineering

Follow software development best practice

 1. unit test: environment for debugging, data search etc.

 2. regression test: baselines, millions of checking points

 3. QA (quality assurance) test

 4. several layers of regression protection: nightly build, release build

 5. NLP-specific language

 6. NLP platform & environment

 7. Platform extension & support

 8. code review: readability and maintainability are No. 1

 9. help from statistics and learning
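
The regression-test item above (baselines with many checking points) can be sketched as follows. This is an illustration under assumptions, not the tooling described in the talk: `toy_parse`, `changed_parse`, and `run_regression` are hypothetical names, and the "parser" is just a whitespace tokenizer so that the example stays self-contained.

```python
from typing import Callable, Dict, List, Tuple

ParseFn = Callable[[str], List[str]]
Diff = Tuple[str, List[str], List[str]]


def toy_parse(sentence: str) -> List[str]:
    # Hypothetical stand-in for a real parser: whitespace tokens only.
    return sentence.split()


def run_regression(parse: ParseFn, baseline: Dict[str, List[str]]) -> List[Diff]:
    """Return (sentence, expected, actual) for every checking point that changed."""
    diffs: List[Diff] = []
    for sentence, expected in baseline.items():
        actual = parse(sentence)
        if actual != expected:
            diffs.append((sentence, expected, actual))
    return diffs


# The baseline records previously accepted output for each checking point.
baseline = {
    "we parse text": ["we", "parse", "text"],
    "trees are nice": ["trees", "are", "nice"],
}

# An unchanged parser reproduces the baseline, so nothing is reported.
diffs = run_regression(toy_parse, baseline)


def changed_parse(sentence: str) -> List[str]:
    # Simulates a code change that alters the output for every sentence.
    return sentence.upper().split()


# The changed parser is flagged at both checking points.
regressions = run_regression(changed_parse, baseline)
print(len(diffs), len(regressions))
```

In a nightly or release build, a non-empty list of differences would be reviewed: each one is either a regression to fix or an improvement to accept into the baseline.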


Development vs testing

 1. roughly 1:1 in terms of developer's time

 2. 1:0.5 in terms of developer and QA resources


Avoid unnecessary work and overdoing it

 1. linguists need to be kept in check: they get lost in the trees without seeing the forest

 2. data-driven development

 3. stay goal-oriented


Dependency Trees as Representation
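
As a rough illustration of a dependency tree as a data structure, each token can simply record the index of its head. The sketch below is an assumption-laden example rather than the representation used in the talk: the `Token` fields, the relation labels, and the sample sentence 我喜欢语言学 ("I like linguistics") are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    index: int      # 1-based position in the sentence
    form: str       # surface word
    head: int       # index of the governing token, 0 for the root
    relation: str   # illustrative label of the dependency on the head


def children(tokens: List[Token], head_index: int) -> List[Token]:
    # All tokens directly governed by the token at head_index.
    return [t for t in tokens if t.head == head_index]


def render(tokens: List[Token], head_index: int = 0, depth: int = 0) -> List[str]:
    # Depth-first listing of the tree, indenting each level by two spaces.
    lines: List[str] = []
    for t in children(tokens, head_index):
        lines.append("  " * depth + f"{t.form} ({t.relation})")
        lines.extend(render(tokens, t.index, depth + 1))
    return lines


# Hypothetical analysis: the verb is the root and governs both nouns.
sentence = [
    Token(1, "我", 2, "subject"),
    Token(2, "喜欢", 0, "root"),
    Token(3, "语言学", 2, "object"),
]

tree_lines = render(sentence)
print("\n".join(tree_lines))
```

Storing only head indices keeps the structure flat and easy to serialize, while the tree shape can still be recovered on demand, as `render` does here.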


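Quote
If Einstein saw the beauty of the Creator in space-time and all things, and if Mendeleev saw the simplicity of the periodic table behind matter in its myriad forms, then linguists see the beauty of logical structure in the ever-changing phenomena of language. This experience of beauty accompanies our sweat and encourages us, like the Foolish Old Man who moved the mountains, to level language barriers for the benefit of humanity.

Quoted from: 立委科普: The Beauty of Syntactic Structure Trees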

 

 

 

【立委科普: The Beauty of Syntactic Structure Trees (Part 2)】


Part 2 of the Taipei talk slides:

http://blog.sciencenet.cn/blog-362400-677358.html



【Pinned: Overview of NLP posts on Liwei's ScienceNet blog (regularly updated)】



https://blog.sciencenet.cn/blog-362400-677352.html
