何毓琦's personal blog http://blog.sciencenet.cn/u/何毓琦 Harvard (1961-2001), Tsinghua (2001-date)


Data Science (5)

Read 4280 times | 2016-12-4 22:21 | Personal category: S and T | System category: Overseas Observation

For new readers and those who request to be “好友” (good friends): please read my 公告 (announcement) first.



I went to the talk below, which featured a distinguished speaker, Alfred Spector, who is a member of the US NAE, a former CMU professor, and now an executive in the knowledge industry. The talk was full of facts, data, and anecdotes about “Computer Science + X”, where X can be sociology, medicine, traffic, literature, or anything else about our civilization. In fact, I had trouble taking notes for this write-up because of the massive amount of data offered by the speaker.

"Opportunities and Perils in Data Science"
presented by
Alfred Spector
CTO, Two Sigma Investments

Friday, December 2, 2016
1-2pm

SEAS Campus
Maxwell Dworkin G115
33 Oxford Street
Cambridge, MA 02138

Over the last few decades, empiricism has become the third leg of computer science, adding to the field's traditional bases in mathematical analysis and engineering. This shift has occurred due to the sheer growth in the scale of computation, networking and usage as well as progress in machine learning and related technologies. Resulting data-driven approaches have led to extremely powerful prediction and optimization techniques and hold great promise, even in the humanities and social sciences. However, no new technology arrives without complications. In this presentation, Dr. Spector will balance the opportunities provided by big data and associated A.I. approaches with a discussion of the various challenges. He'll provide many example problems, and make suggestions on how to address some of the unanticipated consequences of Big Data. Co-sponsored with the Center for Research on Computation and Society (CRCS).


IACS SEMINAR SERIES

This semester's seminar series will focus on a wide range of topics including machine learning, data visualization, and the perils of data science.

So what are my takeaways from this lecture? Let me use an example. Spell checking in word processing is well known to every computer user. One would think it works by checking your typing against a giant dictionary stored in the computer's memory. Nothing could be further from the truth. Instead, the word processor uses a giant database containing a vast collection of mistypings and their corrections. Take, for example, the first name of the US president, Barack. Sometimes people type “Baraque” or “Barak” and then correct it to “Barack”. Such streams are stored in the Cloud. When other typists make similar mistypings, the automatic corrections are quickly provided. Music recommendation also works in ways that combine users, musicians' knowledge, and signal analysis of the music.
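To make the spelling example concrete, here is a minimal sketch in Python of a corrector that learns only from logged (typed, corrected) pairs instead of a dictionary. The class name `CorrectionLog`, its methods, and the `min_count` threshold are my own illustrative assumptions; the talk did not describe an implementation, and real systems are certainly far more elaborate.

```python
from collections import Counter, defaultdict


class CorrectionLog:
    """Toy corrector built from observed user corrections (hypothetical design)."""

    def __init__(self, min_count=2):
        # Assumed threshold: suggest a fix only after it was seen this many times.
        self.min_count = min_count
        # Maps a lowercased mistyping to a tally of what users changed it to.
        self._counts = defaultdict(Counter)

    def record(self, typed, corrected):
        """Log one event in which a user typed `typed` and then fixed it."""
        if typed != corrected:
            self._counts[typed.lower()][corrected] += 1

    def suggest(self, typed):
        """Return the most frequent logged correction, or None if too rare."""
        candidates = self._counts.get(typed.lower())
        if not candidates:
            return None
        best, count = candidates.most_common(1)[0]
        return best if count >= self.min_count else None


log = CorrectionLog(min_count=2)
events = [
    ("Baraque", "Barack"),
    ("Barak", "Barack"),
    ("Baraque", "Barack"),
    ("Barak", "Barack"),
]
for typed, fixed in events:
    log.record(typed, fixed)

print(log.suggest("baraque"))  # a correction seen twice in the log
print(log.suggest("Obamma"))   # never logged, so no suggestion
```

The point of the sketch is that no dictionary appears anywhere: the suggestions come entirely from what earlier typists did.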


To give an idea of the scale of Big Data, consider the following: there are 3x10^9 searches per day on the Internet and 3x10^13 bytes of astronomical data collected every night, never mind the data in other disciplines.
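To put these two figures in more familiar units, here is a small back-of-the-envelope conversion in Python. It uses only the numbers quoted above; the variable names are mine, and the terabyte is taken as the decimal 10^12 bytes.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

searches_per_day = 3e9        # figure quoted in the talk
astro_bytes_per_night = 3e13  # figure quoted in the talk

# Average search rate, assuming searches are spread evenly over the day.
searches_per_second = searches_per_day / SECONDS_PER_DAY

# Decimal terabytes (1 TB = 10^12 bytes).
astro_terabytes = astro_bytes_per_night / 1e12

print(f"{searches_per_second:,.0f} searches per second")
print(f"{astro_terabytes:.0f} TB of astronomical data per night")
```

That works out to roughly 35,000 searches every second and 30 TB of astronomical data every night.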


What are the challenges?

Look at it from the viewpoint of a matrix. Horizontally, you plot how unclear and contentious the objectives are (the objective of correct spelling is very clear, while the enforcement of prostitution laws is not that clear-cut). Vertically, you consider how serious failures are (again, a spelling error is not that serious, while a wrong medical diagnosis is). In this sense, spelling correction is located at the origin of this 2D matrix, while global warming and economic growth may be located farthest from the origin.


Privacy, security, and resilience are contradictory requirements. Optimized systems are often fragile and not resilient. Remember the work on HOT (Highly Optimized Tolerance) systems.


Machine Learning (ML) is heavily used in Big Data Science. However, ML does not explain “why” something works, nor does it distinguish causation from correlation.
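A small simulation shows why correlation alone cannot settle causation. The sketch below is my own illustration on synthetic data, not something from the talk: a hidden factor `z` drives both `x` and `y`, so they correlate strongly in observed data (about 0.8 in theory for the noise levels assumed here), yet setting `x` by hand has no effect on `y`.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Observed world: hidden factor z drives both x and y; x never influences y.
z = [rng.gauss(0, 1) for _ in range(n)]
x_obs = [zi + rng.gauss(0, 0.5) for zi in z]
y_obs = [zi + rng.gauss(0, 0.5) for zi in z]

# Intervention: x is set independently of z, while y is generated as before.
x_do = [rng.gauss(0, 1) for _ in range(n)]
y_do = [zi + rng.gauss(0, 0.5) for zi in z]

r_obs = pearson(x_obs, y_obs)
r_do = pearson(x_do, y_do)

print(f"correlation in observed data:   {r_obs:.2f}")
print(f"correlation after intervention: {r_do:.2f}")
```

A model trained only on the observed data would happily predict `y` from `x`, but it would have no way to tell that changing `x` does nothing.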

Finally, the speaker speculated on the effect of Data Science on civilization and on whether “meritocracy” is really fair, since Data Science gives the educated user too much of an advantage.




http://blog.sciencenet.cn/blog-1565-1018681.html
