yueliusd07017's personal blog http://blog.sciencenet.cn/u/yueliusd07017

[Repost] Why Most Published Research Findings are False (collection of listening materials)

2024-2-15 09:30 | Personal category: Scientific English | System category: Popular science collection | Source: repost

https://blog.sciencenet.cn/home.php?mod=attachment&id=1195890

'Why Most Published Research Findings are False' Part I

https://blog.sciencenet.cn/home.php?mod=attachment&id=1195891

'Why Most Published Research Findings are False' Part II

https://blog.sciencenet.cn/home.php?mod=attachment&id=1195892

'Why Most Published Research Findings are False' Part III

https://blog.sciencenet.cn/home.php?mod=attachment&id=1195893

'You should do solid work, that's priority one' Bruce Beutler, Nobel Laureate

https://blog.sciencenet.cn/home.php?mod=attachment&id=1195894

The Statistical Crisis in Science and How to Move Forward by Professor Andrew Gelman

https://blog.sciencenet.cn/home.php?mod=attachment&id=1195895

The Problem of Bad Research!

The Challenges of Evidence-Based Medicine (Part 1)

The Challenges of Evidence-Based Medicine (Part 2)

Can I trust what’s written in scientific journals? Nobel Laureate Tim Hunt

Can science be objective? John Ioannidis, Claudia de Rham & Harry Collins

Is Science Reliable

Transcript (English; the original post pairs it with a Chinese machine translation)

https://blog.sciencenet.cn/home.php?mod=attachment&id=1195895

The Problem of Bad Research!

In 2005, Stanford medical professor

John Ioannidis published an essay titled

“Why Most Published Research Findings Are False”,

where he argued that the results of many medical

research studies could not be replicated by

other researchers. This is obviously a problem!

A subsequent survey by the science

journal Nature showed that more than

70% of researchers have tried and failed to

reproduce another scientist's experiments.

Not only that, but more than half admit to having

failed to reproduce their own experiments.

During a decade as head of global cancer research

at Amgen, C. Glenn Begley identified 53 “landmark”

publications -- papers in top journals, from

reputable labs -- for his team to reproduce.

He sought to double-check the findings before

trying to build on them for drug development.

He found that 47 of the 53 could not be

replicated, causing huge problems for those trying

to produce new medicines based upon the findings.

So, what might be causing this problem? Well,

part way through his project to reproduce these

landmark cancer studies, Begley met with the lead

scientist of one of the problematic studies.

He told the scientist that he had gone through

the paper line by line, figure by figure, and

redid the experiment 50 times and never got

the published result. The scientist told him

that they’d done the experiment six times,

got the published result once and put it in

the paper because it made the best story.
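
The arithmetic behind this anecdote can be sketched in a few lines. This is only an illustration under stated assumptions, not a reconstruction of what the lab did: it assumes there is no real effect, that each run of the experiment is independent, and that a run counts as a "hit" when it reaches the conventional 5% significance level. The function name is hypothetical.

```python
def chance_of_at_least_one_hit(attempts: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `attempts` independent tests of a
    true null hypothesis comes out significant at level `alpha`.

    Assumption: the runs are independent and there is no real effect.
    """
    return 1.0 - (1.0 - alpha) ** attempts


if __name__ == "__main__":
    # One honest attempt, six attempts (as in the anecdote), and fifty attempts.
    for attempts in (1, 6, 50):
        chance = chance_of_at_least_one_hit(attempts)
        print(f"{attempts:>2} attempts: {chance:.3f}")
```

Under these assumptions, six tries give roughly a one-in-four chance of at least one publishable-looking result even when nothing is there, which is why reporting only the best run is so misleading.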

Such selective publication is just one

reason that the scientific literature

is peppered with incorrect results.

Many blame the hypercompetitive academic

environment, as researchers compete for

diminishing funding. The surest ticket to

getting a grant or a good job is getting published

in a high-profile journal, and this can lead

a scientist to engage in sensationalism

and sometimes even dishonest behavior.

Obviously, this is most concerning in the

world of medicine, but the same problem

can be found in all other areas of research.

Incredibly influential and commonly accepted

theories have been found in recent years

to be false under more rigorous retests.

In 2011, Joseph Simmons, a psychologist at the

University of Pennsylvania, published a paper in

the journal Psychological Science, where he showed

that people who listened to the Beatles song "When

I'm Sixty-Four" grew younger, by nearly 18 months.

The result was obviously ridiculous but the point

the paper made was serious. It showed how standard

scientific methods, when abused, could generate

scientific support for just about anything.

Scientists have been shocked to discover that

what they used to consider reasonable research

practices were flawed and likely to generate

false positives. This discovery has been

labeled the “replication crisis” by the press.

Campbell Harvey, a professor of finance at Duke

University, argues that at least half of the 400

supposedly market-beating strategies identified in

top financial journals over the years are false.

“It’s a huge issue,” he told the

Financial Times. “Step one in dealing

with the replication crisis in finance is to

accept that there is a crisis. And right now,

many of my colleagues are not there yet.”

Harvey is the former editor of the Journal

of Finance, a former president of the American

Finance Association, and an adviser to investment

firms like Research Affiliates and Man Group.

He has written more than 150 papers on finance,

several of which have won prestigious

prizes. This is not like a child saying

that the emperor has no clothes. Harvey’s

criticism of the rigor of academic research

in finance is more like the emperor himself

announcing that he has no clothes.

Obviously, the stakes of the replication crisis

are much higher in medicine, where people’s health

can be at risk, than in the world of finance, but

flawed financial research is often pitched to

the public either through the press or by fund

management companies looking to raise assets.

Bad financial research makes its way into

people’s portfolios and can affect their

wealth and the comfort of their retirement.

While Ioannidis’s 2005 paper has been criticized

over time for its use of dramatic and exaggerated

language, most academics do agree with his paper's

conclusions and its recommendations. So, let’s

look at some of the issues that he raised.

In statistics, we don’t try to prove that

something is definitely true. Instead, we show

how unlikely it is that we would have found our

test results if the underlying process was random;

when they are unlikely enough, we reject the null hypothesis.

This approach is based on the principle of

falsification introduced by the philosopher,

Karl Popper. According to Popper,

we can never prove that something is definitely

true, we can only prove that something is

false. Statistical hypothesis tests thus never

prove a model is correct; they instead show how

unlikely it is that we would have gotten our test

results if the idea being tested was incorrect.

The p-value that we calculate in statistical

hypothesis testing measures the evidence against a

null hypothesis. The smaller the p-value, the

stronger the evidence is that our results are

not attributable to randomness. P-values

are used to help us decide in medicine

whether a given drug is actually helpful, or in

finance if cheap stocks outperform over time.

P-values less than .05 are generally considered

significant and worthy of publication.

They tell us that, if only randomness were at work, results

this extreme would turn up less than 5% of the time.

This 5% threshold was picked by Ronald Fisher,

an important statistician, in a book he published

in 1925, as being a reasonable threshold.
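
To make the p-value concrete, here is a minimal sketch using a made-up example that is not from the video: a coin is flipped 100 times and we ask how surprising the head count would be if the coin were fair. The function name and the numbers are illustrative assumptions; the calculation itself is the standard exact two-sided binomial test.

```python
from math import comb


def two_sided_binomial_p_value(heads: int, flips: int) -> float:
    """Exact p-value for the null hypothesis 'the coin is fair'.

    It is the probability, under a fair coin, of a head count at least as far
    from flips / 2 as the one actually observed.
    """
    observed_gap = abs(heads - flips / 2)
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= observed_gap
    )
    return extreme_outcomes / 2 ** flips


if __name__ == "__main__":
    # Hypothetical observations out of 100 flips.
    for heads in (55, 60, 61):
        p = two_sided_binomial_p_value(heads, 100)
        verdict = "significant" if p < 0.05 else "not significant"
        print(f"{heads} heads: p = {p:.4f} ({verdict} at the 5% level)")
```

Note what the number does and does not say: it is the chance of data this extreme assuming the coin is fair, not the chance that the coin is fair given the data.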

The term p-hacking describes the deliberate or

accidental manipulation of data in a study until

it produces a sufficiently small p-value. It is the misuse

of data analysis to find patterns in data that

can be presented as statistically significant,

thus dramatically increasing the risk of false

positives while understating it. If you took random data

and tested enough hypotheses on it, you would

eventually come up with a study that appears to

prove something that is actually false.
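
The claim about random data can be checked with a small simulation. The sketch below is an illustration under assumptions of my own choosing: each "hypothesis" is a sample of pure standard-normal noise, the test is a simple two-sided z-test of "the true mean is zero" with the variance taken as known, and the sample sizes, seed, and function names are arbitrary.

```python
import math
import random


def p_value_for_zero_mean(sample):
    """Two-sided z-test p-value for 'the true mean is 0', assuming unit variance."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def count_false_positives(n_hypotheses=1000, sample_size=50, alpha=0.05, seed=0):
    """Test many hypotheses on pure noise and count the 'significant' ones."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_hypotheses):
        noise = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        if p_value_for_zero_mean(noise) < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_false_positives()
    print(f"{hits} of 1000 tests on pure noise reached p < 0.05")
```

Because every null hypothesis here is true by construction, each "significant" result is a false positive, and about 5% of the tests should come out that way. Reporting only those hits, without the count of tests that were run, is the essence of p-hacking.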

Harvey (the former editor of the Journal of

Finance who we mentioned earlier) attributes the

scourge of p-hacking to incentives in academia.

Getting a paper with a sensational

finding published in a prestigious journal

can earn an ambitious young professor the ultimate

academic prize — tenure. Wasting months of work

on a theory that does not hold up to scrutiny

would frustrate anyone. It is therefore tempting

to torture the data until it yields something

interesting, even if other researchers are later

unable to duplicate the results. And

therein lies the problem of incentives:

scientists have huge incentives to publish

papers, in fact their careers depend on it;

as one scientist Brian Nosek puts it:

"There is no cost to getting things wrong,

the cost is not getting things published".

But isn't science supposed to self-correct by

having other scientists replicate the findings of

an initial discovery? It is a lot less glamorous

to just replicate other people’s studies.

Scientists want to find their own breakthrough,

not check other scientists’ homework.

Additionally, many journals don’t publish

replication studies. So, if you're a scientist,

the successful strategy is clear: don’t waste your

time on replication studies, do the kind of work

that will get you published, and if you can find

a result that is surprising and unusual, maybe

it will get picked up in the popular press too.

Now I don't want this to be seen as a negative

piece on science or the scientific method,

because people are more aware of this problem

today than in the past and things have started

changing for the better. Many scientists

acknowledge the problems I’ve outlined and

are starting to take steps to correct them: there

are more large-scale replication studies going on;

there's a site, Retraction Watch, that publicizes

research that has been withdrawn; and there are online

databases of unpublished negative results.

There has been a move in many fields towards

preregistration of studies, where researchers

write up what they plan on studying

and the methods they will use. A journal then

decides whether to accept it in principle.

After the work is completed, reviewers

simply check whether the researchers

stuck to their own recipe; if so, the paper is

published, regardless of what the data show.

This eliminates publication bias, promotes higher-powered

studies and lessens the incentive for

p-hacking. The thing I find most striking

about the replication crisis in academia

is not the prevalence of incorrect information in

published scientific journals. After all, getting

to the truth, we know, is hard, and mathematically

not everything that is published can be correct.

What gets me is that if we use our

best scientific and statistical tools,

and still make this many mistakes,

how frequently do we delude ourselves

when we're not using the scientific method?

As flawed as our research methods may be,

they are significantly more reliable than

any other approach that we can use.
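
The remark that "mathematically not everything that is published can be correct" can be made concrete with the basic formula from Ioannidis's essay, in its version without the bias term: the share of significant findings that are true is PPV = (1 - β)R / (R - βR + α), where R is the ratio of true to false relationships among those tested, 1 - β is the study's power, and α is the significance level. The sketch below simply evaluates that formula; the scenarios are illustrative numbers of my own, not figures from the essay or the video.

```python
def positive_predictive_value(prior_odds: float, power: float, alpha: float = 0.05) -> float:
    """Share of statistically significant findings that are actually true.

    prior_odds: R, the ratio of true to false relationships among those tested.
    power:      1 - beta, the chance a real effect is detected.
    alpha:      the significance threshold (false positive rate under the null).
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Illustrative scenarios (assumed values, chosen only to show the trend).
    scenarios = [
        ("even odds, well powered", 1.0, 0.8),
        ("long-shot hypotheses, well powered", 0.1, 0.8),
        ("long-shot hypotheses, underpowered", 0.1, 0.2),
    ]
    for label, prior_odds, power in scenarios:
        ppv = positive_predictive_value(prior_odds, power)
        print(f"{label}: PPV = {ppv:.2f}")
```

Even with honest analysis, a field that tests mostly long-shot hypotheses with small studies will publish many significant results that are false; that is a consequence of the arithmetic, not of misconduct.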

Amusingly, around nine years after

John Ioannidis wrote his essay “Why

Most Published Research Findings Are False”,

a team of biostatisticians, Jager and Leek,

attempted to replicate his findings and

estimated the false positive rate in biomedical studies

at around 14%, not the 50% that

Ioannidis had asserted. So, things are possibly

not quite as bad as people thought 16 years ago,

and science has moved in a positive direction

where researchers are more aware of the mistakes

they might make than they were in the past.

Today’s video is based on my book Statistics

for the Trading Floor, where I conclude with

a chapter on common errors in statistical

analysis and how to avoid them. There is a

link to the book in the video description.

If you enjoyed this video, you should watch my

video on chart crimes next.

See you later, bye.


To repost this article, please contact the original author for permission and state that it comes from Liu Yue's ScienceNet blog.
Link: https://blog.sciencenet.cn/blog-3589443-1421697.html
