https://blog.sciencenet.cn/home.php?mod=attachment&id=1195890
'Why Most Published Research Findings are False' Part I
https://blog.sciencenet.cn/home.php?mod=attachment&id=1195891
'Why Most Published Research Findings are False' Part II
https://blog.sciencenet.cn/home.php?mod=attachment&id=1195892
'Why Most Published Research Findings are False' Part III
https://blog.sciencenet.cn/home.php?mod=attachment&id=1195893
'You should do solid work, that's priority one' Bruce Beutler, Nobel Laureate
https://blog.sciencenet.cn/home.php?mod=attachment&id=1195894
The Statistical Crisis in Science and How to Move Forward by Professor Andrew Gelman
https://blog.sciencenet.cn/home.php?mod=attachment&id=1195895
The Problem of Bad Research!
The Challenges of Evidence-Based Medicine (Part 1)
The Challenges of Evidence-Based Medicine (Part 2)
Can I trust what’s written in scientific journals? Nobel Laureate Tim Hunt
Can science be objective? John Ioannidis, Claudia de Rham & Harry Collins
The Problem of Bad Research!
In 2005, John Ioannidis, now a professor of medicine at Stanford, published an essay titled “Why Most Published Research Findings Are False”, in which he argued that many published medical research findings would not hold up when other researchers tried to replicate them. This is obviously a problem!
A subsequent 2016 survey by the scientific journal Nature found that more than 70% of researchers have tried and failed to reproduce another scientist's experiments; not only that, but more than half admit to having failed to reproduce their own experiments.
During a decade as head of global cancer research at Amgen, C. Glenn Begley identified 53 “landmark” publications (papers in top journals, from reputable labs) for his team to reproduce. He wanted to double-check the findings before trying to build on them for drug development. He found that 47 of the 53 could not be replicated, causing huge problems for those trying to produce new medicines based on the findings.
So, what might be causing this problem? Partway through his project to reproduce these landmark cancer studies, Begley met with the lead scientist of one of the problematic studies. He told the scientist that he had gone through the paper line by line, figure by figure, and had redone the experiment 50 times without ever getting the published result. The scientist told him that they had done the experiment six times, got the published result once, and put it in the paper because it made the best story. Such selective publication is just one reason that the scientific literature is peppered with incorrect results.
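The “one good run out of six” in that story is a form of selective reporting, and its effect on the literature can be sketched with a small simulation. Everything in it is an assumption chosen for illustration, not data from Begley's project: a real but small effect of 0.2 standard deviations, 30 participants per group, a normal approximation with known variance, and a crude “publish only positive, significant results” rule standing in for publication bias. The constant and function names are hypothetical.

```python
import random
import statistics

# Assumed, illustrative parameters (not taken from any real study).
TRUE_EFFECT = 0.2        # real difference between groups, in SD units
N_PER_GROUP = 30         # participants per group
SE = (2 / N_PER_GROUP) ** 0.5                    # standard error of the difference (SD = 1)
Z_CRIT = statistics.NormalDist().inv_cdf(0.975)  # about 1.96: two-sided 5% test


def simulate(n_studies=5000, seed=1):
    """Return (mean estimate over all studies, mean over 'published' studies).

    Each study's estimated effect is drawn from its sampling distribution,
    N(TRUE_EFFECT, SE). A study is 'published' only if it is significant in
    the positive direction, a crude stand-in for selective publication."""
    rng = random.Random(seed)
    estimates = [rng.gauss(TRUE_EFFECT, SE) for _ in range(n_studies)]
    published = [e for e in estimates if e / SE > Z_CRIT]
    return statistics.fmean(estimates), statistics.fmean(published)


all_mean, published_mean = simulate()
print(f"true effect:              {TRUE_EFFECT:.2f}")
print(f"average, all studies:     {all_mean:.2f}")
print(f"average, published only:  {published_mean:.2f}")
```

The average over all studies sits close to the true effect, but the studies that survive the filter report a much larger one, which is one plausible reason published effects tend to shrink when others try to replicate them.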
Many blame the hypercompetitive academic environment, in which researchers compete for diminishing funding. The surest ticket to a grant or a good job is publication in a high-profile journal, and this can push scientists toward sensationalism and sometimes even dishonest behavior.
Obviously, this is most concerning in medicine, but the same problem turns up in other areas of research too. In recent years, highly influential and widely accepted theories have been found to be false when retested more rigorously.
In 2011, Joseph Simmons, a psychologist at the University of Pennsylvania, published a paper in the journal Psychological Science reporting that people who listened to the Beatles song "When I'm Sixty-Four" became nearly 18 months younger. The result was obviously ridiculous, but the point the paper made was serious: it showed how standard scientific methods, when abused, could generate statistically significant support for just about anything.
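The paper's point can be illustrated with a single flexible choice: looking at the data as it comes in and collecting more only if the result is not yet significant. The sketch below simulates studies in which there is no real effect at all. The checkpoints (20, 30, 40 and 50 per group), the known-variance z-test and all function names are assumptions made for illustration; this is not a reproduction of the Beatles study's design.

```python
import random
import statistics

NORMAL = statistics.NormalDist()


def z_test_p(a, b):
    """Two-sided p-value for a difference in means, assuming both groups have
    a known standard deviation of 1 (a simplification for this sketch)."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return 2 * (1 - NORMAL.cdf(abs(z)))


def null_study_is_significant(rng, checkpoints, alpha=0.05):
    """Simulate one study with NO real effect. The researcher tests after
    reaching each sample size in `checkpoints` and stops at the first p < alpha."""
    a, b = [], []
    for n in checkpoints:
        while len(a) < n:                  # collect more data up to this checkpoint
            a.append(rng.gauss(0, 1))
            b.append(rng.gauss(0, 1))
        if z_test_p(a, b) < alpha:
            return True                    # a false positive gets reported
    return False


def false_positive_rate(checkpoints, n_sims=5000, seed=7):
    """Share of simulated null studies that end up 'significant'."""
    rng = random.Random(seed)
    hits = sum(null_study_is_significant(rng, checkpoints) for _ in range(n_sims))
    return hits / n_sims


fixed = false_positive_rate([50])                 # sample size fixed in advance
flexible = false_positive_rate([20, 30, 40, 50])  # peek, and add data if not significant
print(f"fixed n:           {fixed:.3f}")
print(f"optional stopping: {flexible:.3f}")
```

With one pre-planned test, about 5% of these null studies come out significant, as intended; allowing the extra peeks pushes that share noticeably higher, even though no single step looks like cheating.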
Scientists have been shocked to discover that what they used to consider reasonable research practices were flawed and likely to generate false positives. This discovery has been labeled the “replication crisis” by the press.
Campbell Harvey, a professor of finance at Duke University, argues that at least half of the 400 supposedly market-beating strategies identified in top financial journals over the years are false. “It’s a huge issue,” he told the Financial Times. “Step one in dealing with the replication crisis in finance is to accept that there is a crisis. And right now, many of my colleagues are not there yet.”
Harvey is a former editor of the Journal of Finance, a former president of the American Finance Association, and an adviser to investment firms such as Research Affiliates and Man Group. He has written more than 150 papers on finance, several of which have won prestigious prizes. This is not like a child saying that the emperor has no clothes; Harvey’s criticism of the rigor of academic research in finance is more like the emperor himself announcing that he has no clothes.
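Harvey's point about hundreds of published strategies is, at heart, a multiple-testing problem: test enough strategies and some will clear p < 0.05 by luck. A common remedy, shown here as a sketch and not as Harvey's own procedure, is to tighten the threshold when many hypotheses are tested. The p-values below are invented backtest results for four hypothetical strategies; the Bonferroni and Benjamini-Hochberg procedures themselves are standard.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if p < alpha / m, where m is the number of tests.
    Controls the chance of making even one false discovery."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure: find the largest rank k with
    p_(k) <= k * q / m and reject the k smallest p-values. Controls the
    expected share of false discoveries among the rejections at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# Invented backtest p-values for four hypothetical strategies.
strategy_p = [0.001, 0.02, 0.04, 0.20]
print("naive p < 0.05:    ", [p < 0.05 for p in strategy_p])
print("Bonferroni:        ", bonferroni(strategy_p))
print("Benjamini-Hochberg:", benjamini_hochberg(strategy_p))
```

Bonferroni guards against even one false discovery and becomes very strict with 400 candidate strategies (a cutoff of 0.05 / 400 = 0.000125), while Benjamini-Hochberg controls the expected proportion of false discoveries and is less conservative.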
Obviously, the stakes of the replication crisis are much higher in medicine, where people’s health can be at risk, than in finance. But flawed financial research is often pitched to the public, either through the press or by fund management companies looking to raise assets. Bad financial research makes its way into people’s portfolios and can affect their wealth and the comfort of their retirement.
Although Ioannidis’s 2005 paper has been criticized over time for its dramatic and exaggerated language, most academics agree with its conclusions and recommendations. So, let's look at some of the issues he raised.
In statistics, we don’t try to prove that something is definitely true. Instead, we show how unlikely it would be to obtain our test results if the underlying process were random; if that is unlikely enough, we reject the null hypothesis. This approach closely mirrors the principle of falsification introduced by the philosopher Karl Popper. According to Popper, we can never prove that something is definitely true; we can only prove that something is false. Statistical hypothesis tests therefore never prove that a model is correct; instead, they show how unlikely it would be to get our test results if the idea being tested were incorrect.
The p-value that we calculate in statistical hypothesis testing measures the evidence against the null hypothesis: the smaller the p-value, the stronger the evidence that our results are not attributable to randomness. P-values help us decide, in medicine, whether a given drug is actually helpful or, in finance, whether cheap stocks outperform over time.
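To make the definition concrete, here is one way to compute a p-value for a question like “do cheap stocks outperform?”: a permutation test, which asks how often randomly relabelling the data produces a difference in average returns at least as large as the one observed. The return figures are invented, the function name is hypothetical, and a real study would need far more data; the sketch only shows the logic.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    If the null hypothesis is true, the 'cheap'/'expensive' labels carry no
    information, so shuffling them should often produce a difference as large
    as the observed one. The p-value is the share of shuffles that do."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Count the observed labelling as one of the permutations so p is never 0.
    return (extreme + 1) / (n_permutations + 1)


# Invented annual returns (%) for two hypothetical groups of stocks.
cheap = [12.1, 8.4, 15.0, 9.7, 11.3, 7.9, 13.5, 10.2]
expensive = [7.5, 9.1, 6.8, 10.4, 5.9, 8.2, 7.0, 9.6]
print(f"p = {permutation_p_value(cheap, expensive):.4f}")
```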
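P-values below 0.05 are generally considered statistically significant and worthy of publication. Strictly speaking, p < 0.05 tells us that if the null hypothesis were true, results at least as extreme as ours would occur less than 5% of the time; it is not the probability that our results are due to randomness. This 5% threshold was proposed as a reasonable convention by Ronald Fisher, an important statistician, in his 1925 book Statistical Methods for Research Workers.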
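The correct reading of the 5% threshold is easy to check by simulation. In the sketch below every study is generated with the null hypothesis true (pure noise), yet roughly 5% of them still come out “significant”. That 5% is the false-positive rate when there is nothing to find, which is not the same as the probability that a given significant result is a fluke. The sample size of 40, the known-variance z-test and the function name are assumptions of the sketch.

```python
import random
import statistics

NORMAL = statistics.NormalDist()


def null_study_p(rng, n=40):
    """p-value from one study in which the null hypothesis is TRUE: n draws of
    pure noise (mean 0, known SD 1), z-testing whether the mean differs from 0."""
    sample = [rng.gauss(0, 1) for _ in range(n)]
    z = statistics.fmean(sample) * n ** 0.5   # mean divided by its standard error 1/sqrt(n)
    return 2 * (1 - NORMAL.cdf(abs(z)))


rng = random.Random(42)
p_values = [null_study_p(rng) for _ in range(10_000)]
share_below_05 = sum(p < 0.05 for p in p_values) / len(p_values)
print(f"null studies with p < 0.05: {share_below_05:.1%}")
```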
The term p-hacking describes the deliberate or accidental manipulation of data in a study until it produces a sufficiently small p-value. It is the misuse of data analysis to find patterns that can be presented as statistically significant, which dramatically increases the risk of false positives while understating it. If you took random data and tested enough hypotheses on it, you would eventually come up with a study that appears to prove something that is actually false.
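The last point can be made quantitative. If each test has a 5% chance of a false positive when there is nothing to find, the chance that at least one of k tests “works” is 1 - 0.95^k; with 20 hypotheses that is already about 64%. The sketch assumes the tests are independent, which real p-hacking (many analyses of the same data set) usually is not, so treat it as an idealised illustration; the function names are hypothetical.

```python
import random


def chance_of_false_positive(k, alpha=0.05):
    """Probability that at least one of k independent tests of true null
    hypotheses comes out 'significant' at level alpha."""
    return 1 - (1 - alpha) ** k


def simulated_chance(k, alpha=0.05, n_sims=20_000, seed=3):
    """Same quantity by simulation. Under a true null hypothesis (and a
    continuous test statistic) a p-value is uniform on [0, 1], so each test
    is modelled as a single rng.random() draw."""
    rng = random.Random(seed)
    hits = sum(any(rng.random() < alpha for _ in range(k)) for _ in range(n_sims))
    return hits / n_sims


for k in (1, 5, 20, 100):
    print(f"{k:>3} hypotheses: formula {chance_of_false_positive(k):.2f}, "
          f"simulated {simulated_chance(k):.2f}")
```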
Harvey (the former editor of the Journal of Finance mentioned earlier) attributes the scourge of p-hacking to incentives in academia. Getting a paper with a sensational finding published in a prestigious journal can earn an ambitious young professor the ultimate academic prize: tenure. Wasting months of work on a theory that does not hold up to scrutiny would frustrate anyone, so it is tempting to torture the data until it yields something interesting, even if other researchers are later unable to duplicate the results. Therein lies the problem of incentives: scientists have huge incentives to publish papers; in fact, their careers depend on it. As the psychologist Brian Nosek puts it: "There is no cost to getting things wrong, the cost is not getting things published."
But isn't science supposed to self-correct, with other scientists replicating the findings of an initial discovery? Replicating other people’s studies is a lot less glamorous: scientists want to find their own breakthroughs, not check other scientists’ homework. Additionally, many journals don’t publish replication studies. So, if you're a scientist, the winning strategy is clear: don’t waste your time on replication studies; do the kind of work that will get you published, and if you can find a result that is surprising and unusual, maybe it will get picked up in the popular press too.
Now, I don't want this to be seen as a negative piece on science or the scientific method, because people are more aware of this problem today than in the past, and things have started changing for the better. Many scientists acknowledge the problems I’ve outlined and are starting to take steps to correct them: more large-scale replication studies are under way, a site called Retraction Watch publicizes research that has been withdrawn, and there are online databases of unpublished negative results.
There has been a move in many fields toward preregistration of studies, in which researchers write up what they plan to study and the methods they will use. A journal then decides whether to accept the paper in principle. After the work is completed, reviewers simply check whether the researchers stuck to their own recipe; if so, the paper is published, regardless of what the data show. This eliminates publication bias, promotes higher-powered studies, and lessens the incentive for p-hacking.
The thing I find most striking about the replication crisis in academia is not the prevalence of incorrect information in published scientific journals; after all, getting to the truth is hard, and, statistically, not everything that is published can be correct. What gets me is this: if we use our best scientific and statistical tools and still make this many mistakes, how often do we delude ourselves when we're not using the scientific method? As flawed as our research methods may be, they are significantly more reliable than any other approach we can use.
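The preregistration discussion credits the approach with encouraging higher-powered studies. Power is the probability that a study detects an effect that really exists, and it depends mostly on sample size. The sketch uses the textbook normal-approximation formula for comparing two group means, n ≈ 2(z_{1-α/2} + z_{1-β})² / d² per group; the effect sizes (Cohen's d) are arbitrary examples, the function names are hypothetical, and exact t-test calculations give slightly larger sample sizes.

```python
import math
import statistics

NORMAL = statistics.NormalDist()


def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate participants per group for a two-sided, two-sample z-test
    to detect a standardized effect (Cohen's d) with the requested power."""
    z_alpha = NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def achieved_power(effect_size, n, alpha=0.05):
    """Approximate power of the same test with n per group (ignores the
    negligible chance of rejecting in the wrong direction)."""
    z_alpha = NORMAL.inv_cdf(1 - alpha / 2)
    return NORMAL.cdf(effect_size * math.sqrt(n / 2) - z_alpha)


for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: about {n_per_group(d)} per group for 80% power")
print(f"power with d = 0.5 and only 20 per group: {achieved_power(0.5, 20):.2f}")
```

Small effects need hundreds of participants per group, so studies run with a few dozen are likely to miss real effects, and the ones that do reach significance tend to overstate them.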
Amusingly, around nine years after John Ioannidis wrote his essay “Why Most Published Research Findings Are False”, a team of biostatisticians, Jager and Leek, tested his conclusion empirically: using p-values reported in the abstracts of major medical journals, they estimated the rate of false positives among published biomedical findings at around 14%, not the 50% or more that Ioannidis had asserted. So, things are possibly not quite as bad as people thought 16 years ago, and science has moved in a positive direction: researchers are more aware of the mistakes they might make than they were in the past.
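Ioannidis's argument rests on a simple ratio. The share of “significant” findings that are true (the positive predictive value) depends on the pre-study odds R that a tested relationship is real, the study's power 1 - β, and the significance level α: PPV = (1 - β)R / ((1 - β)R + α). The sketch implements that formula without the bias term his essay also considers; the three scenarios are illustrative inputs, not estimates for any field, and the function name is hypothetical.

```python
def ppv(prior_odds, power, alpha=0.05):
    """Positive predictive value (no bias term): the share of significant
    findings that are true, where prior_odds = R is the ratio of true to null
    relationships among those tested and power = 1 - beta."""
    true_hits = power * prior_odds   # expected true positives per null relationship tested
    false_hits = alpha               # expected false positives per null relationship tested
    return true_hits / (true_hits + false_hits)


# Illustrative inputs only, not estimates for any real field.
scenarios = {
    "well-powered, plausible hypotheses (R = 1, power 0.8)": (1.0, 0.8),
    "well-powered, long-shot hypotheses (R = 0.1, power 0.8)": (0.1, 0.8),
    "underpowered, long-shot hypotheses (R = 0.1, power 0.2)": (0.1, 0.2),
}
for name, (r, pw) in scenarios.items():
    print(f"{name}: {1 - ppv(r, pw):.0%} of significant findings are false")
```

Across these three inputs the false share ranges from about 6% to about 71%, which shows how strongly any such estimate depends on the assumptions about prior odds and power.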
Today’s video is based on my book Statistics for the Trading Floor, which concludes with a chapter on common errors in statistical analysis and how to avoid them. There is a link to the book in the video description. If you enjoyed this video, you should watch my video on chart crimes next.
See you later, bye.
Powered by ScienceNet.cn
Copyright © 2007- China Science Daily (中国科学报社)