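Sharing 《镜子大全》 and 《朝华午拾》 http://blog.sciencenet.cn/u/liwei999 Once a Little Red Guard, then sent down to the countryside to "repair the earth"; left home and country in 1991, destination unknown.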


A sudden sense of urgency: if I do not get into Chinese NLP now, I may miss the opportunity of this era

6752 reads | 2011-12-10 20:29 | Personal category: 立委科普 | System category: Research notes | NLP, Chinese processing, sense of urgency

A conversation with an old friend in the field: work hard on "use"

Ringing in my ears is Vice Chairman Lin's earnest instruction on system development:

Quote
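Build with problems in mind, build and apply flexibly, combine building with use, build first what is urgently needed, get immediate results, and work hard on "use".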

This came out of a private exchange with a friend. I am simply following the fashion of making up quotes from famous people.
~~~~~~~~~~~~

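After I posted 【坚持四项基本原则,开发鲁棒性NLP系统】 ("Uphold the Four Cardinal Principles and Develop Robust NLP Systems"), a senior old friend in the field found it very interesting and suggested that I collect and polish my NLP blog posts as a series, with a view to publishing a book: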

Quote
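A good piece of hard-won experience. Somehow it reminds me of this:
Study with problems in mind, study and apply flexibly, combine study with application, study first what is urgently needed, get immediate results, and work hard on "use".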

You made a hidden preamble -- a given type of application in a given domain.

A recommendation: expand your blog a bit as a series, heading to a book.

My friend 吴军 did that quite successfully. Of course, he has a statistics background, so he approached NLP from a mathematical perspective in his 数学之美 ("The Beauty of Mathematics") series.

You have very good thoughts and raw material. You just need to put in a bit more time to make your writing more approachable. I am referring to reader comments like "学习不了。" ("I can't learn from this.") and "读起来鸭梨很大" ("Reading this is a lot of pressure").

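I know you said: "Sometimes I think it should not be made too readable either. This is all the experience of many years, and if younger people want to learn it, they should suffer a little too. :=)"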

But since you have already put in the effort, why not make it more approachable?

The issue is, even if I am willing to suffer a little, I still would not know where to start suffering if I had never built a real-life NLP system.

For example, lexicalism (词汇主义) by itself is enough for an article. You need to mention its opponents and its history to put it into context. Then you need to give some examples.


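Writing is a matter for the ages; how would I dare turn online scribbles into a book? This is not false modesty. The main reason is that publishing a book is too much trouble and cannot keep pace with the times. I replied: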


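吴军's series is super popular. When I first read one of his articles on the Google Blackboard, recommended by a friend, I was amazed at how well he structured and carried the content. It is intriguing. (Side note: of course, his article on PageRank is one-sided. It gives young people the impression that success in IT is dominated by technology, when in fact technology always comes second. A so-called high-tech company cannot do without technology, but the key to a company's success is not technology; that is by now an obvious fact.)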

For me, to be honest, I do not aim that high. I have never bothered polishing things in pursuit of perfection, although I did make an effort to link my posts into a series for the convenience of cross-referencing related pieces. There are missing links that I know I want to write about, but whether I do depends on my mood and available time. I guess I am just not pressed or motivated to do the writing part. Popularizing the technology is only an occasional side effect of my blogging hobby. The way I prove myself is to show that I can build products worth millions, or even hundreds of millions, of dollars.

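What I write online follows my whims. I never write assigned essays, not even on topics I assign myself. Sometimes, when the interest strikes, I announce that my next post will be about such and such; that counts as a self-assigned topic, a sign that a subject has caught my attention. But two days later something else cuts in, the mood and the time are gone, and I let it drop.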

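I write about whatever I happen to run into; that is my attitude toward being online. My day job is tiring enough, and I refuse to take on any extra burden online.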

So far I have been fairly straightforward in what I write about. If there is a readability issue, it is mainly due to my lack of time. Young people should be able to benefit from my writing, especially once they start getting their hands dirty building a system.

Your discussion is fun. You can see and appreciate the things hidden behind my work better than other readers can. After all, you have published in THE CL, and you have almost brought segmentation to a close as a scientific research area. Seriously, my view is that there is not much left to do there after your work on tokenization, both in theory and in practice.

I now feel some urgency to work on Chinese NLP as soon as possible. Not many people have been through as much as I have, so I am in a position to potentially build a much more powerful system and make an impact on Chinese NLP, and hopefully on the IT landscape as well. But time passes fast. That is why my focus is now on Chinese processing, day and night. I am also keeping my hands dirty with a couple of European languages, but they are less challenging and less exciting.


https://blog.sciencenet.cn/blog-362400-517079.html
