A Discussion with a Fellow Blogger
Wu Yishan (武夷山)
2014-09-10
XX, well spotted! All three of these are exactly the points where I hesitated myself.
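1. The original text says "error", but in my view measurements have 误差 (error in the sense of deviation from the true value), while data have 错误 (error in the sense of mistakes). A typical example: Soochow University (东吴大学) in Taiwan, China, uses the English name "Soochow University", which is also the English name of Suzhou University (苏州大学) in mainland China. In our early publication statistics, we at the Institute of Scientific and Technical Information of China found that the papers the SCI database attributed to Suzhou University had been mixed with quite a few papers from Soochow University in Taiwan, because the database processors did not realize these were two different universities. That is a mistake of misattribution (literally "putting Zhang's hat on Li's head"), not a measurement error.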
2. I accept "统计功效" as the rendering of "statistical power".
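3. I have not found a standard definition of data snooping. When it is rendered as "数据探测" ("data probing"), the term clearly carries no negative connotation, whereas this editorial takes a negative view of data snooping. I suspect the editorial's author would endorse the following definition: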
White, H., 2000. A Reality Check for Data Snooping. Econometrica.
"Data snooping occurs when a given set of data is used more than once for purposes of inference or model selection. When such data reuse occurs, there is always the possibility that any satisfactory results obtained may simply be due to chance rather than to any merit inherent in the method yielding the results. This problem is practically unavoidable in the analysis of time-series data, as typically only a single history measuring a given phenomenon of interest is available for analysis. It is widely acknowledged by empirical researchers that data snooping is a dangerous practice to be avoided, but in fact it is endemic. The main problem has been a lack of sufficiently simple practical methods capable of assessing the potential dangers of data snooping in a given situation. Our purpose here is to provide such methods by specifying a straightforward procedure for testing the null hypothesis that the best model encountered in a specification search has no predictive superiority over a given benchmark model. This permits data snooping to be undertaken with some degree of confidence that one will not mistake results that could have been generated by chance for genuinely good results."
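White's passage describes his procedure only in words. As a rough illustration (a simplified sketch, not White's exact algorithm and not code from the editorial or from this post), the Python below shows the idea, assuming the per-period loss differentials between each candidate model and the benchmark have already been computed. It resamples time indices with the Politis-Romano stationary bootstrap, where the mean block length of 10 is an arbitrary assumption, and compares the best model's scaled mean gain with its bootstrap distribution recentred so that the null hypothesis holds. The function names `stationary_bootstrap_indices` and `reality_check` are hypothetical.

```python
import random
import statistics


def stationary_bootstrap_indices(n, mean_block, rng):
    """Stationary bootstrap: resample indices 0..n-1 in blocks whose lengths
    are geometric with mean `mean_block`, wrapping around the series end."""
    p = 1.0 / mean_block  # probability of starting a new block at each step
    idx = [rng.randrange(n)]
    while len(idx) < n:
        if rng.random() < p:
            idx.append(rng.randrange(n))   # jump to a new random starting point
        else:
            idx.append((idx[-1] + 1) % n)  # continue the current block circularly
    return idx


def reality_check(diffs, n_boot=500, mean_block=10.0, seed=0):
    """Simplified sketch of White's (2000) Reality Check.

    diffs[k][t] = loss(benchmark, t) - loss(model k, t), so a positive value
    means model k beat the benchmark at time t. Returns (statistic, p-value)
    for H0: no model has positive expected gain over the benchmark.
    Hypothetical helper for illustration; mean_block is an assumed tuning value.
    """
    rng = random.Random(seed)
    n = len(diffs[0])
    means = [statistics.fmean(d) for d in diffs]
    # Test statistic: the best model's mean gain, scaled by sqrt(n).
    v_stat = max(n ** 0.5 * m for m in means)
    exceed = 0
    for _ in range(n_boot):
        idx = stationary_bootstrap_indices(n, mean_block, rng)
        # Recentre each bootstrap mean at its sample mean so the null holds,
        # then take the maximum over ALL models searched, not just the winner.
        v_star = max(
            n ** 0.5 * (statistics.fmean(d[i] for i in idx) - m)
            for d, m in zip(diffs, means)
        )
        if v_star >= v_stat:
            exceed += 1
    return v_stat, exceed / n_boot
```

The point that matters for data snooping is that `diffs` must contain every model tried during the specification search, not only the one that looked best: the maximum over all of them is what accounts for the search. If `diffs` holds, say, twenty models whose differentials are pure noise, the winner may look impressive on its own, yet its Reality Check p-value will typically not be small.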
Powered by ScienceNet.cn
Copyright © 2007- China Science Daily (中国科学报社)