For new readers and those who request to be my “good friends” (好友), please read my bulletin board (公告栏) first.
I attended a talk by a distinguished speaker, Alfred Spector, who is a member of the US National Academy of Engineering (NAE), a former CMU professor, and now an executive in the knowledge industry. The talk was full of facts, data, and anecdotes about “Computer Science + X”, where X can be sociology, medicine, traffic, literature, or anything else about our civilization. In fact, I had trouble taking notes for this write-up because of the sheer amount of data the speaker offered.
So what are my takeaways from this lecture? Let me use an example. Spell checking in word processing is familiar to every computer user. One would think it works by checking your typing against a giant dictionary stored in the computer's memory. Nothing could be further from the truth. Instead, the word processor uses a giant database containing a vast collection of mistypings and their corrections. Take the first name of the US president, Barack. People sometimes type “Baraque” or “Barak” and then correct it to “Barack”. Such streams are stored in the cloud. When other typists make similar mistypings, the corrections are provided automatically and quickly. Music recommendation works in a similar way, combining data from users, musicians' knowledge, and signal analysis of the music itself.
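To make the idea concrete, here is a minimal sketch of correction-log-based spelling suggestion, assuming the service keeps a log of (typed, corrected) pairs. The log contents and the names `CORRECTION_LOG`, `build_correction_table`, and `suggest` are hypothetical; this is not how any particular word processor is implemented, and real systems also use context and other signals. It only shows the principle that suggestions come from what many users actually corrected rather than from a dictionary lookup.

```python
from collections import Counter, defaultdict

# Hypothetical correction log: (what the user typed, what they changed it to).
CORRECTION_LOG = [
    ("Baraque", "Barack"),
    ("Barak", "Barack"),
    ("Barak", "Barack"),
    ("Barak", "Barak"),  # user kept the original spelling: not counted as a correction
    ("teh", "the"),
]


def build_correction_table(log):
    """Count, for each mistyped word, how often users changed it to each correction."""
    table = defaultdict(Counter)
    for typed, corrected in log:
        if typed != corrected:  # skip entries where nothing was corrected
            table[typed.lower()][corrected] += 1
    return table


def suggest(table, word):
    """Return the most frequent past correction for `word`, or `word` itself if none is known."""
    counts = table.get(word.lower())  # .get does not insert new keys into the defaultdict
    if not counts:
        return word
    return counts.most_common(1)[0][0]


table = build_correction_table(CORRECTION_LOG)
print(suggest(table, "Barak"))    # Barack
print(suggest(table, "Baraque"))  # Barack
print(suggest(table, "Obama"))    # Obama (no logged corrections)
```

The more users' corrections accumulate in the log, the better the suggestions get, which is why the data, not a fixed dictionary, does the work.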
To give an idea of the scale of Big Data, consider the following: there are 3×10^9 searches per day on the Internet, and 3×10^13 bytes (about 30 TB) of astronomical data are collected every night, never mind the data in other disciplines.
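The two figures from the talk become more tangible when converted to other units. The short calculation below assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and, purely for illustration, that astronomical data is collected every night of the year; the variable names are my own.

```python
SEARCHES_PER_DAY = 3e9         # figure quoted in the talk
ASTRO_BYTES_PER_NIGHT = 3e13   # figure quoted in the talk
SECONDS_PER_DAY = 24 * 60 * 60

# Average search rate, assuming searches are spread evenly over the day.
searches_per_second = SEARCHES_PER_DAY / SECONDS_PER_DAY

# Decimal units: 1 TB = 1e12 bytes, 1 PB = 1e15 bytes.
astro_tb_per_night = ASTRO_BYTES_PER_NIGHT / 1e12
# Assumption: data is collected every night of a 365-day year.
astro_pb_per_year = ASTRO_BYTES_PER_NIGHT * 365 / 1e15

print(f"{searches_per_second:,.0f} searches per second")
print(f"{astro_tb_per_night:.0f} TB of astronomical data per night")
print(f"{astro_pb_per_year:.2f} PB of astronomical data per year")
```

That is roughly 35,000 searches every second and on the order of 11 PB of astronomical data a year under these assumptions.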
What are the challenges?
Look at it as a two-dimensional matrix. Along the horizontal axis, plot how unclear or contentious the objective is: the objective of correct spelling is very clear, while that of enforcing prostitution laws is not nearly as clear-cut. Along the vertical axis, plot how serious a failure would be: again, a spelling error is not serious, while an error in medical diagnosis is. In this picture, spelling correction sits at the origin of the matrix, while global warming and economic growth may lie farthest from it.
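The matrix can be sketched as points on a plane, ranked by distance from the origin. The scores below are hypothetical numbers on a 0-to-1 scale, chosen only to reproduce the relative placements described above; using Euclidean distance as a single measure of overall difficulty is my own simplifying assumption, not something the speaker proposed.

```python
import math

# Hypothetical (contentiousness, failure_severity) scores on a 0-to-1 scale.
# Only the relative placements from the talk are intended; the numbers are made up.
PROBLEMS = {
    "spelling correction": (0.0, 0.0),
    "medical diagnosis": (0.2, 0.9),
    "prostitution law enforcement": (0.9, 0.4),
    "economic growth": (0.9, 0.9),
    "global warming": (1.0, 1.0),
}


def distance_from_origin(point):
    """Euclidean distance from the origin, used here as a rough 'difficulty' measure."""
    contentiousness, severity = point
    return math.hypot(contentiousness, severity)


ranked = sorted(PROBLEMS, key=lambda name: distance_from_origin(PROBLEMS[name]))
for name in ranked:
    print(f"{name:30s} {distance_from_origin(PROBLEMS[name]):.2f}")
```

Problems near the origin are where Big Data methods can be applied with little controversy; the farther out a problem lies, the more a data-driven answer needs human judgment.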
Privacy, security, and resilience are conflicting requirements. Optimized systems are often fragile rather than resilient. Recall the work on HOT (Highly Optimized Tolerance) systems, which are robust to the disturbances they were designed for but fragile to unexpected ones.
Machine Learning (ML) is heavily used in Big Data science. However, ML does not explain “why” something works, nor does it distinguish causation from correlation.
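A small simulation illustrates the correlation-versus-causation point. It assumes a hypothetical hidden common cause `z` that drives both `x` and `y`, while `x` has no effect on `y` at all. In observational data the two are strongly correlated, so a learned model would happily use `x` to predict `y`; yet when `x` is set independently, as in a randomized experiment, the correlation vanishes. The setup and variable names are illustrative only.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(42)  # fixed seed so the run is reproducible
N = 10_000

# Hidden common cause z drives both x and y; x itself never influences y.
z = [rng.gauss(0, 1) for _ in range(N)]
x_observed = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]
r_observed = pearson(x_observed, y)  # theory: 1 / (1 + 0.25) = 0.8

# "Intervention": set x independently of z, as a randomized experiment would.
# y is unchanged, because x was never a cause of y in this model.
x_intervened = [rng.gauss(0, 1) for _ in range(N)]
r_intervened = pearson(x_intervened, y)

print(f"observational correlation: {r_observed:.2f}")
print(f"correlation after intervention: {r_intervened:.2f}")
```

Nothing in the observational data alone tells the model which of these two worlds it is in; that distinction requires experiments or causal assumptions from outside the data.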
Finally, the speaker speculated on the effect of data science on civilization, and on whether “meritocracy” is really fair, since data science gives educated users too great an advantage.