(Minutes from AI Salon Discussion 06/2015)
Today I feel the urge to argue against Chomsky, the de facto God of Linguistics. It has been on my mind for a very long time.
Here is my thesis: Chomsky’s misleading or negative effect on the field of (computational) linguistics is almost as big as his revolutionary contributions.
His hierarchy is extremely insightful, as if it revealed an untold secret from the Lord. It is a perfect theory, a cornerstone of computer science that guides the invention, interpretation and compilation of computer languages.
However, a perfect theory that oversteps by one inch can well lead to fallacy. Inspired by the mathematical study of human languages, Chomsky invented this wonderful formal language theory, which works well for computer languages. He then seemed to try to apply the theory back to human languages, leading the field into meaningless and misplaced arguments over whether natural language is context-free, context-sensitive, or mildly context-sensitive. Too many people have been misled into believing that, because natural languages are complex, we need a more powerful grammar. “Powerful” is perhaps the most misleading word in NLP (Natural Language Processing).
Engineers have found the finite-state mechanism fairly effective in practice, but they have not been able to rebut the criticism from theorists following Chomsky: finite-state is too low-level, not powerful enough to handle natural language. Period.
In fact, people who have architected big engineering projects know well that the complexity of an object is not a legitimate reason for using a complex mechanism. The greatest work often comes from masters who know how to use a simple mechanism to handle complex objects.
One of Chomsky’s most influential and misleading arguments states that since natural language involves “center recursion” by nature, the finite-state formalism is not adequate for natural language. Chomsky cites a couple of English examples as evidence for this so-called center recursion. Although the cited examples are rare in real language, and look unreal and far-fetched as support for the claim that recursion is the basic nature of human language, many scholars chose to believe him, willingly brainwashed by his argument. Many researchers thus take it for granted that a grammar more powerful than finite state is needed for parsing natural language. It follows, on this view, that the recursive nature of language calls for a “stack” as a storage device in memory, whether in the human brain or in a computer. Hardly anything is further from the truth than this assumption, which has little evidence from real data. It is not supported by observable language phenomena, and it is also incompatible with the known limits of human short-term memory. Where in the world can we find people who speak in such a way that doors keep being opened without getting closed, or left brackets pile up without the prompt use of right brackets, leaving things hanging? Three layers of embedding are almost the ceiling in observed data. Even if you are a superman who can stand more layers, your audience cannot; they cannot parse you. Doesn’t speech serve the purpose of human communication? It would be really weird if one spoke not to communicate with people but to deliberately make comprehension difficult. It simply does not make sense.
That being the case, what justifies Chomsky in treating natural language as if it were infinitely recursive, despite the observed fact that it almost never goes beyond three levels of embedding?
(M: Recursion became his religion.)
Yes, it is religion-like. Chomsky’s recursion theory has been misleading NLP as well as linguistics for too long. When a man this powerful and intelligent misleads, he can derail a generation. The derailed generation is the one before mine (1970s-1980s): their work in natural language understanding resulted in toy systems confined to the lab, without exception, and few of them had any practical merit or impact on real-life applications. This situation led directly to the next generation of rebels. The older generation had no strength to compete with the aggressive new generation and eventually faded out of the mainstream arena. Over the past 30 years, each achievement of statistical NLP has been a practical critique of Chomsky, because almost all of these new models are based on a finite-state n-gram mechanism that Chomsky heavily criticized as inappropriate.
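The point about bounded embedding can be made concrete. What follows is only a minimal sketch, not anything presented at the salon: it assumes that "(" stands for opening an embedded clause and ")" for closing it, and it takes the ceiling of three levels from the observation above. Under that bound, bracket matching needs no stack at all, because the current depth itself can serve as the state of a finite-state machine. The names `MAX_DEPTH`, `build_dfa` and `accepts` are made up for the illustration.

```python
MAX_DEPTH = 3  # assumption: the observed ceiling of about three levels of embedding


def build_dfa(max_depth):
    """Return an explicit, finite transition table.

    States are the integers 0..max_depth (the current embedding depth).
    A missing (state, symbol) entry means the input is rejected.
    """
    table = {}
    for depth in range(max_depth + 1):
        if depth < max_depth:
            table[(depth, "(")] = depth + 1  # open one more clause
        if depth > 0:
            table[(depth, ")")] = depth - 1  # close the innermost clause
    return table


def accepts(table, symbols):
    """Run the finite-state machine; accept only if every clause is closed."""
    state = 0
    for sym in symbols:
        key = (state, sym)
        if key not in table:
            return False  # too deep, or a close without an open
        state = table[key]
    return state == 0


DFA = build_dfa(MAX_DEPTH)

if __name__ == "__main__":
    for text in ["()", "((()))", "(((())))", "(()"]:
        print(text, accepts(DFA, text))
```

The table has a fixed number of entries regardless of how long the input is, which is what makes this finite state. Raising `MAX_DEPTH` only adds states; it never calls for a stack.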
(H: from what Chomsky did, we can see the drastic difference in difficulty between the construction of machine intelligence and that of human intelligence. In the 1950s, Chomsky quickly and seemingly effortlessly established his formal language theory, which soon became a cornerstone of computer science. On the other hand, it has taken him his entire life to repeatedly rebuild his linguistic theories for natural language, and there still seems to be a long way to go.
M: If we do not have his formal language theory, are we able to make a compiler for a computer language?
H: mapping syntax to semantics was always needed, and doing it does not require a formal language theory, just as people who do NLP do not have to know linguistics. However, people like David Marr and David Rumelhart are ambitious, and have always persisted in trying to find a common device for both machine intelligence and human intelligence.
L: Marr is also a God-like man.
M: Similar to my previous question on Chomsky’s theory and compiler: if we do not have theories from Turing and von Neumann, can we construct computers?
H: Well, Babbage's Analytical Engine was designed without them. Ada’s programs/algorithms were also practical and verifiable in the early days before Turing. In fact, the issue is not whether we have John von Neumann or Peter von Neumann; the point is that no matter how different their theories look on the surface, they are probably equivalent, subject to the same constraints. And Chomsky studied these constraints.
M: didn’t Turing go even deeper?
H: yes, Turing studied it from the perspective of machines while Chomsky did this from the human side.)
Yes, it is true that over the past three decades NLPers who do not know linguistics have far outnumbered linguists pursuing NLP. But that is an unhealthy state. Of course, there are also serious problems within the linguistics field, which can easily confuse newcomers or casual visitors. Nevertheless, linguistics involves some guiding principles, and there is a big difference between NLP researchers who know them and those who do not. For example, Saussure is worth studying. What Saussure says is mostly a matter of principles, with a degree of philosophical flavor, which provides high-level guidance. His work elaborates on the relationships between commonality and individuality, language and speech, rules and idioms, all very insightful. We are thus reminded not to get trapped in the details and lose direction.
(B: I think linear speed and flexibility are key to NLP; multi-level recursion and long-distance correlations must get solved, whatever the approach is.)
These are basically all taken care of already: parsing that handles recursion, is flexible and robust, and runs at linear speed. Even the multiple-parse “pseudo-ambiguity” problem, which has been a huge challenge for traditional CFG-type parsers, is now solved. A seasoned mechanism called cascaded FSAs, when put to practical use in grammar engineering, can nail these down. This assumes, of course, that the architect/leader keeps the global architecture in mind in directing the program, following Saussure’s philosophy instead of Chomsky’s ever-changing linguistic theories. After the architecture and the modular design with proper interfaces are done, the remaining work is just the development of modules, mainly the work on linguistic resources (lexicons and rule sets), which can lead to very deep parsing, close to a logical level, much deeper than what Chomsky-style CFG parsers can reach. The traditional rule systems implemented in some variation of CFG (including unification grammars such as HPSG) suffer too much from Chomsky’s influence and are inefficient, with no linear-time implementation. In fact, almost all one-level NLP systems, including traditional neural networks and statistical parsers, face the common challenge of some type of combinatorial explosion in search paths. Just as multi-level deep learning is now considered a breakthrough in AI, once the seasoned FSA formalism is properly cascaded over multiple levels, the classical parsing problems such as recursion, long-distance dependencies and pseudo-ambiguity just fade away, or at least become highly tractable.
(B: like a string of mathematical functions.)
It's that simple a truth, which does not seem to be well understood in the community yet. Too often, people who criticize rule systems are attacking a straw man, without knowing that not all rule systems are as “silly” and inefficient as single-level CFG.
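To make the "string of mathematical functions" picture concrete, here is a toy sketch of a cascade. It is not the system described above: the part-of-speech tag set, the four rules and the names `CASCADE`, `parse` and `trace` are all assumptions made up for the illustration, and Python's `re` module stands in for real finite-state automata (the patterns themselves are plain regular expressions). Each level is one finite-state rewrite over the output of the level below it, and the parser is simply the composition of the levels.

```python
import re
from functools import reduce

# Each level is a finite-state pattern over space-separated tags.
# Hypothetical toy rules; a real grammar would have many more levels.
CASCADE = [
    ("NP", r"\b(?:DT )?(?:JJ )*NN\b"),  # level 1: base noun phrases
    ("PP", r"\bIN NP\b"),               # level 2: prepositional phrases
    ("VP", r"\bVBD NP(?: PP)*"),        # level 3: verb phrases
    ("S", r"\bNP VP\b"),                # level 4: clause
]


def make_level(label, pattern):
    """Turn one rule into a function from tag string to tag string."""
    compiled = re.compile(pattern)
    return lambda tags: compiled.sub(label, tags)


LEVELS = [make_level(label, pattern) for label, pattern in CASCADE]


def parse(tags):
    """Compose the levels: the output of each pass feeds the next."""
    return reduce(lambda current, level: level(current), LEVELS, tags)


def trace(tags):
    """Return the tag string after every level, for inspection."""
    steps = [tags]
    for level in LEVELS:
        steps.append(level(steps[-1]))
    return steps


if __name__ == "__main__":
    # "the big dog chased the cat in the garden"
    for step in trace("DT JJ NN VBD DT NN IN DT NN"):
        print(step)
```

Each level makes one deterministic decision and passes a shorter string upward, so there is no search over competing parses within a level. In this toy, that is the property the cascade relies on to avoid the pseudo-ambiguity problem.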
Another way in which Chomsky has misled NLP is his phrase structure tree representation. It is by no means an adequate representation for natural language understanding: it is too complex and based on too many structural assumptions (presumably from principles of universal grammar). Compared with dependency trees, Chomsky’s phrase structure representation is neither easy to use nor economical, yet it remains the mainstream of parsing, having misled so many people for so long. At the very least, it misled the de facto community standard, the Penn Treebank, through which it has been misleading the entire parsing community.
(W: In a sense, NLP is application-driven. When appropriate for a target application, n-grams cannot be judged as misleading. When found inadequate for the application, even HPSG can be very misleading. Misleading or not at an abstract level is not a meaningful question; let linguists fight with each other. Look at the arguments over the “topic” role (in Chinese grammar): the fighting has been going on for so many years now.)
We all know linguists like to fight, mostly over meaningless arguments. This situation also has quite a lot to do with the negative impact of Chomsky. Take the reflexive “self” and its related Binding Theory. That alone has led to countless wars and papers, mostly junk, which are directly related to Chomsky’s misguidance. Chomsky’s extreme pursuit of abstract universal grammar and semantic generalizations has led the field into many unnecessary wars of words. Such fighting does not seem to significantly advance the science of understanding human linguistic abilities, nor does it help language applications. The only benefit is that it has supplied topics for many linguistics degrees and produced many linguistics PhDs. However, most linguists still cannot land a decent job in the field after many years of linguistic training. Due to Chomsky’s extraordinary authority and fame, the status of linguistics has been promoted to a fairly salient level, but Chomsky has no way to influence the job market. The result is a worldwide surplus of linguists. Who can offer them professional jobs?
The goal of all linguistic analysis is to help semantics/understanding. Parsing for parsing’s sake is meaningless unless it can help decode the meaning of the utterance. But Chomsky’s emphasis has been on self-sufficient syntax, or syntax independence. He believes in pure syntactic study, without interference from semantics, to pursue the ultimate mechanism of universal grammar in human linguistic capabilities. This position has revolutionized the field and has academic value, but it also falls very easily into impractical metaphysics and a one-sided view of linguistic mechanisms. The direct consequence is that it is now the norm rather than the exception in linguistics to study language structures without a real need. Syntacticians have been debating endlessly for one analysis or model over another without a tangible purpose or a measurable standard. Such debates are bound to lead to meaningless wars. (Many times, competing models or analyses are essentially equivalent or very similar, as long as they are configured coherently system-internally.)
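The "economical" claim can be illustrated on a single toy sentence. The sketch below is only an illustration under stated assumptions: the bracketing follows common Penn Treebank-style labels, the relation names (`nsubj`, `obj`, `det`) follow a Universal Dependencies-style convention, and both analyses are hand-written for this example rather than produced by any parser. The function names are hypothetical.

```python
WORDS = ["the", "dog", "chased", "the", "cat"]

# Phrase-structure tree as nested tuples: (label, child, ...); leaves are words.
PHRASE_TREE = (
    "S",
    ("NP", ("DT", "the"), ("NN", "dog")),
    ("VP", ("VBD", "chased"), ("NP", ("DT", "the"), ("NN", "cat"))),
)

# Dependency analysis: (head index, relation, dependent index) into WORDS.
ARCS = [
    (2, "nsubj", 1),  # chased -> dog
    (2, "obj", 4),    # chased -> cat
    (1, "det", 0),    # dog -> the
    (4, "det", 3),    # cat -> the
]


def count_nonterminals(node):
    """Count the labelled nodes a phrase-structure tree adds on top of the words."""
    if isinstance(node, str):
        return 0
    return 1 + sum(count_nonterminals(child) for child in node[1:])


def who_did_what(words, arcs):
    """Read (subject, verb, object) directly off the dependency arcs."""
    subject = verb = obj = None
    for head, relation, dependent in arcs:
        if relation == "nsubj":
            subject, verb = words[dependent], words[head]
        elif relation == "obj":
            obj = words[dependent]
    return subject, verb, obj


if __name__ == "__main__":
    print("extra phrase-structure nodes:", count_nonterminals(PHRASE_TREE))
    print("dependency arcs:", len(ARCS))
    print("who did what:", who_did_what(WORDS, ARCS))
```

For these five words, the phrase-structure tree introduces nine labelled nodes, while the dependency analysis needs four arcs, one per non-root word. The subject-verb-object relation can also be read directly off the arcs without walking any intermediate constituents.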
(L: getting rid of semantics, directly into pragmatics?)
Not really. What Chomsky wants is to get directly into a “communist society”, one unified linguistic world in harmony. He is not even interested in semantics, let alone pragmatics. To his mind, semantics essentially belongs to logic; it is not linguistics in the strict sense. Syntax and semantics are two distinct categories, and syntax must be self-sufficient.
(B: self-sufficient syntax is incorrect.)
Admittedly, against traditional linguistics, which mixed syntax and semantics together, Chomsky’s linguistic revolution had real force, and it did deepen research on language structures. However, anything pushed to the extreme backfires, and Chomsky was no exception. He went too far. Syntax independent of semantics, once pushed to the extreme, causes linguistic study to lose its purpose, turning means and ends upside down.
As intelligent and revolutionary as Chomsky is, he is not immune to misleading others.
(H: Chomsky is not God, he just did the work to let us see the light of reason from the Creator.)
Yes, his formal language theory is unbelievably magical, like an untold secret revealed from heaven. To my mind, this alone would make him a God-man.
Well, after all these rebellious complaints today, the sun will still rise tomorrow, and he will remain my idol.
[Related original posts in my Chinese Blog]
《泥沙龙笔记:狗血的语言学》 2015-11-20
泥沙龙笔记:从乔姆斯基大战谷歌Norvig说起
乔姆斯基批判
Seeing linguistics god Chomsky by accident: 巧遇语言学上帝乔姆斯基
[转载]特大新闻:乔姆斯基新婚一周年接受采访,谈上帝礼物
从 colorless green ideas sleep furiously 说开去
【Church - 钟摆摆得太远(2):乔姆斯基论】
乔氏 X 杠杠理论 以及各式树形图表达法
《立委随笔:乔姆斯基的“世界语”》
《立委随笔:自然语言是递归的么?》
【科普小品:文法里的父子原则】
立委随笔:Chomsky meets Gates
《立委推荐:乔姆斯基》
Dad, can you explain Chomsky's X-bar Theory to me?
科学网—【立委科普:语言学的基本概念】