
[Posing my own questions] Some questions related to big data, machine learning, data mining, and other "data science" topics

3379 reads | 2022-7-25 14:55 | Personal category: Basic mathematics, logic, physics | System category: Research notes

Chinese is one of the six equally authoritative official languages of the United Nations. Please do not discriminate against Chinese!


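According to reports circulating online, Professor Shing-Tung Yau (丘成桐) said:

Although mathematics in China has developed rapidly over the past 40 years or so, this is still far from satisfactory. The biggest problem is that people mostly solve problems and seldom pose problems of their own.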

https://new.qq.com/omn/20220411/20220411A0DCY700.html

https://new.qq.com/omn/20220301/20220301A0565L00.html

[Image: Gao Hong, 2022-07-22, Shing-Tung Yau's assessment of the current state of mathematics in China (cropped)]

https://blog.sciencenet.cn/blog-3418723-1348363.html

I. Famous problem lists in the history of mathematics

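(1) In 1900, at the International Congress of Mathematicians in Paris, Hilbert delivered an address setting out 23 problems facing 20th-century mathematics (see Hilbert problems). Work on these problems strongly advanced the development of mathematics in the 20th century.

(2) In 1998, Smale likewise posed 18 mathematical problems in "Mathematical problems for the next century".

(3) In 2000, the Clay Mathematics Institute in the United States posed 7 mathematical problems (with prize money): The Millennium Prize Problems.

(4) Nick Trefethen, president of SIAM (Society for Industrial and Applied Mathematics), posed 2 problems in 2012.

(5) There are also the two lists of 125 scientific questions published by Science magazine in 2005 and 2021.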

II. May I ask questions? How many questions would be appropriate for me to pose?

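I simply have too many questions: in philosophy, in mathematics, in physics, in the life sciences, in the earth sciences, in meteorology, in history, in photography, in ... more than I could ever list.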

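Proverbs:

When one fool asks questions, ten wise men cannot keep up with the answers.

One fool can ask ten times more questions than ten wise men can answer.

One fool can ask so many questions that even ten wise men cannot answer them all.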

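III. [Posing my own questions] Some questions related to big data, machine learning, data mining, and other "data science" topics

Below, this fool tries to pose the following questions in big data, machine learning, and data mining: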

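(1) Do linear (strictly speaking, affine) transformations such as normalization and centering introduce errors into the analysis and prediction of nonlinear systems?

(2) What does a good method for computing "correlation/distance" look like?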

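One observation: only quantitative metrics whose range is (-∞, +∞) are good metrics in machine learning.

Correlation metrics whose range is [-1, +1] are certainly not good metrics. Such metrics inevitably lose "sensitivity" in most cases! This may be related to their intrinsic nonlinearity.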

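(3) Can linear methods such as linear regression achieve the "high-accuracy" regression and classification associated with "deep" learning?

It seems possible, but I have neither the time nor the energy to do it.

Deep learning of the support-vector-machine type already exists, that is, deep learning built on linear support vector machines.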

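(4) Are linear models necessarily more robust than nonlinear models?

(5) How can the limits of the "impossibility theorem for clustering" be overcome?

(6) What regression method is more accurate than deep learning?

(7) What regression methods are effective with small samples?

(8) Data analysis with small samples: how can the "true value/true solution" be found?

(9) What new statistical thinking would be more practical and reliable than the classical frequentist and Bayesian schools?

(10) Can pseudorandom numbers be found that are better than true random numbers?

(11) The uniform distribution or the normal distribution: which is more fundamental?

(12) Extreme value distributions or the normal distribution: which is actually more common in the real world?

(13) What intrinsic equivalence is there between confidence intervals in mathematical statistics and the Shannon sampling theorem (Fourier analysis)?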

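This probably does not fall within the circle of the Langlands program, does it? The thinking behind this question actually follows the line of the 1872 Erlangen program (Erlanger Programm), Nicolas Bourbaki from the 1930s onward, and the 1967 Langlands program: certain different branches of mathematics share an intrinsic unity. The counterpart of this idea in physics is, roughly, Einstein's unified field theory.

The circle of the Langlands program: Galois representations (arithmetic objects), automorphic representations (analytic objects), and the various cohomology theories of algebraic varieties (geometric objects).

It does not seem to include: probability theory, mathematical statistics, signal analysis, Fourier analysis, or the Laplace transform.

Defining "geometric curves" as a "number field of the second kind" would then express a still broader unity within mathematics.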
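Returning to the observation under question (2): the loss of "sensitivity" in a score bounded by [-1, +1] can be illustrated with a small sketch. One standard way to map a correlation r in (-1, 1) onto (-∞, +∞) is the Fisher z-transformation, z = artanh(r). This is offered only as an illustration, not as the method this post has in mind; the helper names `pearson_r` and `fisher_z` and the sample numbers are assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fisher_z(r):
    """Fisher z-transformation: maps r in (-1, 1) onto (-inf, +inf)."""
    return math.atanh(r)


# The same gap of 0.09 in r, once near zero and once near the upper bound.
gap_mid = fisher_z(0.09) - fisher_z(0.00)
gap_top = fisher_z(0.99) - fisher_z(0.90)

if __name__ == "__main__":
    # Illustrative data only (an assumption, not taken from the post).
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [1.1, 1.9, 3.2, 3.9, 5.1]
    r = pearson_r(xs, ys)
    print(f"r = {r:.4f}, z = {fisher_z(r):.4f}")
    print(f"gap in z near r = 0: {gap_mid:.4f}")
    print(f"gap in z near r = 1: {gap_top:.4f}")
```

On the z scale, the step from r = 0.90 to r = 0.99 is more than ten times larger than the step from r = 0.00 to r = 0.09, even though both steps are equal on the r scale. Whether an unbounded scale is actually "better" for a given learning task remains the open question posed above.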

[Image: Gao Hong, 2022-07-22, Posing problems is more important than solving them (small)]

https://blog.sciencenet.cn/blog-3418723-1348332.html

[Image: Einstein and Leopold Infeld, The Evolution of Physics, page 92, drawing a curve]

Einstein, Leopold Infeld. The Evolution of Physics [M]. Originally published in 1938 by Cambridge University Press

References:

[1] Li Wenlin. Hilbert, D. (David Hilbert). Encyclopedia of China (中国大百科全书)

https://www.zgbk.com/ecph/words?SiteID=1&ID=44260&Type=bkzyb&SubID=61734

[2] Hilbert problems. Encyclopedia of Mathematics

https://encyclopediaofmath.org/wiki/Hilbert_problems

[3] Steve Smale. Mathematical problems for the next century[J]. Mathematical Intelligencer, 1998, 20(2): 7-15.

doi: 10.1007/BF03025291

https://link.springer.com/article/10.1007/BF03025291

[4] Stephen Smale, American mathematician, britannica

https://www.britannica.com/biography/Stephen-Smale

[5] Stephen Smale, MacTutor History of Mathematics Archive

https://mathshistory.st-andrews.ac.uk/Biographies/Smale/

[6] The Millennium Prize Problems | Clay Mathematics Institute

https://www.claymath.org/millennium-problems/millennium-prize-problems

[7] Nick Trefethen. The smart money is on numerical analysts [J]. SIAM News, 2012, 45(9): (surprisingly, no page numbers?)

doi: (none either?)

https://archive.siam.org/news/news.php?id=2024

[8] Erlangen program. Encyclopedia of Mathematics.

https://encyclopediaofmath.org/wiki/Erlangen_program

[9] Nicolas Bourbaki, MacTutor History of Mathematics Archive

https://mathshistory.st-andrews.ac.uk/Biographies/Bourbaki/

[10] Science Think Tank (科学智库): Research on the Langlands program, 2017-02

https://thinktank.sciencereading.cn/booklib/v/subLibPreview/122/249/1557382.html

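The Langlands program is a very active research direction in mathematics today. It connects three kinds of mathematical objects of different origins: Galois representations (arithmetic objects), automorphic representations (analytic objects), and the various cohomology theories of algebraic varieties (geometric objects), so that the three corresponding invariants [Artin L-functions, automorphic L-functions, Hasse-Weil L-functions] match. The combination of these three areas provides powerful leverage for problems in number theory; the Taniyama-Shimura conjecture proved by Wiles, Taylor, and others is one example. The core problem of the Langlands program is the functoriality conjecture, which implies many famous conjectures, such as the Artin conjecture, the Ramanujan conjecture, the Sato-Tate (Misak...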

[11] Langlands program (朗兰兹纲领), Baidu Baike

https://baike.baidu.com/item/%E6%9C%97%E5%85%B0%E5%85%B9%E7%BA%B2%E9%A2%86

[12] Langlands Program | Institute for Advanced Study

Robert Langlands: Far-Reaching Mathematics, 2007, Kelly Devine Thomas

https://www.ias.edu/ideas/2007/langlands-mathematics

In his letter, Langlands proposed a grand unifying theory that relates seemingly unrelated concepts in number theory, algebraic geometry, and the theory of automorphic forms. “There were some fine points that were right that rather surprise me to this day,” says Langlands. “There was evidence that these L-functions were good but that they would have these consequences for algebraic number theory was by no means certain.”

[13] Langlands Program -- from Wolfram MathWorld

https://mathworld.wolfram.com/LanglandsProgram.html

A grand unified theory of mathematics which includes the search for a generalization of Artin reciprocity (known as Langlands reciprocity) to non-Abelian Galois extensions of number fields. In a January 1967 letter to André Weil, Langlands proposed that the mathematics of algebra (Galois representations) and analysis (automorphic forms) are intimately related, and that congruences over finite fields are related to infinite-dimensional representation theory. In particular, Langlands conjectured that the transformations behind general reciprocity laws could be represented by means of matrices (Mackenzie 2000).

[14] Shing-Tung Yau (丘成桐), Academic Divisions of the Chinese Academy of Sciences (CAS)

http://www.casad.cas.cn/sourcedb_ad_cas/zw2/ysxx/wjysmd/200906/t20090624_1808840.html

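[15] Tencent, 2022-04-11, Shing-Tung Yau: "If our education does not change soon, China's science and technology will fall back by at least 20 years!" What has gone wrong with our education?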

https://new.qq.com/omn/20220411/20220411A0DCY700.html

https://new.qq.com/omn/20220301/20220301A0565L00.html

[16] Gao Hong, 2022-07-22, Shing-Tung Yau's assessment of the current state of mathematics in China

https://blog.sciencenet.cn/blog-3418723-1348363.html

[17] Gao Hong, 2022-07-22, Posing problems is more important than solving them

https://blog.sciencenet.cn/blog-3418723-1348332.html

[18] Yang Lijian, 2018-08-15, A statistical perspective 1: from arranged marriage (linear regression) to open marriage (machine learning)

http://blog.sciencenet.cn/blog-941132-1129277.html

[19] Bradley Efron, Trevor Hastie. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science [M]. Cambridge University Press, 2016.