You have a hidden premise -- a given type of application in a given domain.
A recommendation: expand your blog a bit into a series, heading toward a book.
My friend 吴军 (Wu Jun) did that quite successfully. Of course, he has a statistics background, so he approached NLP from a math perspective -- the 数学之美 (The Beauty of Mathematics) series.
You have very good thoughts and raw material. You just need to put in a bit more time to make your writing more approachable -- I am responding to comments like "I can't learn from this." and "Reading it is a lot of pressure."
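I know you said: "Sometimes I think it should not be made too readable either. It is all years of experience, and if the younger generation wants to learn it, they should suffer a little too. :=)"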
But since you have already put in the effort, why not make it more approachable?
The issue is, even if I am willing to suffer a little, I still would not know where to start suffering IF I have never built a real-life NLP system.
For example, lexicalism by itself is enough for an article. You need to mention its opponents and its history to put it into context. Then you need to give some examples.
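Writing is a matter for the ages; how would online scribbling dare to become a book? This is not false modesty. The main thing is that publishing a book is too much trouble and cannot keep pace with the times. I replied: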
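吴军's series is super popular. When I first read one of his articles on the Google Blackboard, recommended by a friend, I was amazed at how well he structured and carried the content. It is intriguing. (Side note: of course, his article on PageRank is one-sided. It gives young people the impression that success in the IT business is determined by technology, when in fact technology always comes second. A so-called high-tech company cannot do without technology, but technology is not the key to a company's success; that much is obvious.)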
For me, to be honest, I do not aim that high. I have never bothered polishing things in pursuit of perfection, although I did make an effort to link my posts into a series for convenient cross-reference between related pieces. There are missing links that I know I want to write about, but whether I do depends on my mood and available time. I guess I am just not pressed or motivated enough to do the writing part. Popularizing the technology is at times only a side effect of the blogging hobby. The way I prove myself is to show that I can build products worth millions, or even hundreds of millions, of dollars.
So far I have been fairly straightforward about what I write. If there is a readability issue, it is mainly due to my lack of time. Young people should be able to benefit from my writings, especially once they start getting their hands dirty building a system.
Your discussion is fun. You can see and appreciate the things hidden behind my work better than other readers. After all, you have published in THE CL, and you have almost closed off segmentation as a scientific area. Seriously, it is my view that there is not much left to do there after your work on tokenization, both in theory and in practice.
I feel some urgency now about doing Chinese NLP as soon as possible. Not many people have been through as much as I have, so I am in a position to potentially build a much more powerful system and make an impact on Chinese NLP, and hopefully on the IT landscape as well. But time passes fast. That is why my focus is on Chinese processing now, day and night. I am also keeping my hands dirty with a couple of European languages, but they are less challenging and less exciting.