Huang Xiaolei's blog http://blog.sciencenet.cn/u/book Nature is teacher | Be honest, be simple, be professional


A failed study that cost 1 billion US dollars?

2014-3-26 21:10 | Personal category: Science stories | System category: Research notes | Randomized controlled trials


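In 1991 the US NIH launched a very large randomized controlled trial called the Women's Health Initiative (WHI). The study ran for about 15 years and cost nearly 1 billion US dollars. One of the WHI's tasks was to test whether a low-fat diet can effectively prevent breast cancer, colorectal cancer, cardiovascular disease and other conditions in postmenopausal women. The researchers randomly assigned 48,835 women (mean age 62.3) to an experimental group (dietary intervention) or a control group (no dietary intervention) and tracked their health from 1993 to 2005. After more than a decade of work, however, the study did not produce the result the scientists expected: the two groups showed no clear difference in disease rates. The results from other parts of the WHI were similar (see the figure below). In other words, many people consider this study, which cost nearly 1 billion US dollars, a failure. The project still appears to be active, and its website (https://cleo.whi.org/SitePages/Home.aspx) continues to publish new content.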


from http://en.wikipedia.org/wiki/Women's_Health_Initiative

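Using this project as an example, Dean Ornish, president of the Preventive Medicine Research Institute, argues that large-scale randomized controlled trials may not be reliable for medical research. The key problem is that scientists cannot control the errors that can arise in a supposedly randomized trial. For example, people in the experimental group may not follow the dietary intervention strictly, while people in the control group may change their own eating habits. Reading his views below carefully will help with designing experiments and with understanding the results of medical research.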


Dean Ornish

Founder and president of the non-profit Preventive Medicine Research Institute

It is a commonly held but erroneous belief that a larger study is always more rigorous or definitive than a smaller one, and a randomized controlled trial is always the gold standard. However, there is a growing awareness that size does not always matter and a randomized controlled trial may introduce its own biases. We need more creative experimental designs.

In any scientific study, the question is: "What is the likelihood that observed differences between the experimental group and the control group are due to the intervention or due to chance?" By convention, if the probability is less than 5% that the results are due to chance, then it is considered statistically significant, i.e., a real finding.
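The 5% convention can be illustrated with a small sketch. It assumes a pooled two-proportion z-test, which is only one of several possible tests and is not something the essay specifies; the event counts are hypothetical and do not come from any study. The p-value computed here is the probability of seeing a difference at least this large if the intervention truly had no effect.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value for a difference in event rates (pooled z-test)."""
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    # Under the null hypothesis both groups share one underlying rate.
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical counts: 40 events among 1000 treated, 60 among 1000 controls.
p = two_proportion_p_value(events_a=40, n_a=1000, events_b=60, n_b=1000)
print(f"p = {p:.4f}; below the 5% convention: {p < 0.05}")
```

With identical event rates in both groups the same function returns 1.0, meaning the data give no evidence of a difference.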

A randomized controlled trial (RCT) is based on the idea that if you randomly-assign subjects to an experimental group that receive an intervention or to a control group that does not, then any known or unknown differences between the groups that might bias the study are as likely to affect one group as another.
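The balancing idea behind randomization can be sketched with a short simulation. Everything here is an assumption made for illustration: the `randomize` helper, the seeds, and the "hidden risk" values, which stand in for any characteristic the investigators never measured.

```python
import random
from statistics import mean


def randomize(subjects, seed=0):
    """Shuffle subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = subjects[:]  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical unmeasured characteristic for 10,000 subjects.
source = random.Random(1)
hidden_risk = [source.gauss(60, 7) for _ in range(10_000)]

treated, control = randomize(hidden_risk)
gap = abs(mean(treated) - mean(control))
print(f"difference in group means: {gap:.3f}")
```

Because assignment ignores the hidden value, the two group means end up close to each other, and the expected gap shrinks as the groups grow. Randomization balances what subjects bring into the study; it does nothing about what they do after assignment.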

While that sounds good in theory, in practice a RCT can often introduce its own set of biases and thus undermine the validity of the findings.

For example, a RCT may be designed to determine if dietary changes may prevent heart disease and cancer. Investigators identify patients who meet certain selection criteria, e.g., that they have heart disease. When they meet with prospective study participants, investigators describe the study in great detail and ask, "If you are randomly-assigned to the experimental group, would you be willing to change your lifestyle?" In order to be eligible for the study, the patient needs to answer, "Yes."

However, if that patient is subsequently randomly-assigned to the control group, it is likely that this patient may begin to make lifestyle changes on their own, since they have already been told in detail what these lifestyle changes are. If they're studying a new drug that only is available to the experimental group, then it is less of an issue. But in the case of behavioral interventions, those who are randomly-assigned to the control group are likely to make at least some of these changes because they believe that the investigators must think that these lifestyle changes are worth doing or they wouldn't be studying them.

Or, they may be disappointed that they were randomly-assigned to the control group, and so they are more likely to drop out of the study, creating selection bias.

Also, in a large-scale RCT, it is often hard to provide the experimental group enough support and resources to be able to make lifestyle changes. As a result, adherence to these lifestyle changes is often less than the investigators may have predicted based on earlier pilot studies with smaller groups of patients who were given more support.

The net effect of the above is to (a) reduce the likelihood that the experimental group will make the desired lifestyle changes, and (b) increase the likelihood that the control group will make similar lifestyle changes. This reduces the differences between the groups and makes it less likely to show statistically significant differences between them.

As a result, the conclusion that the intervention had no significant effect may be misleading. This is known as a "type 2 error" meaning that there was a real difference but these design issues obscured the ability to detect them.
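The link between diluted group differences and type 2 errors can be sketched numerically. The sketch below assumes a deliberately simple linear model in which the benefit scales with the fraction of each group that actually changes behavior, and it uses a normal approximation for the power of a two-proportion test. The function names, the baseline rate, the effect size, and the adherence and contamination fractions are all hypothetical and are not WHI figures.

```python
from math import sqrt
from statistics import NormalDist


def diluted_rates(base_rate, full_effect, adherence, contamination):
    """Event rates when only part of each group follows the intervention.

    full_effect is the relative risk reduction for someone who fully changes;
    adherence and contamination are the fractions of the experimental and
    control groups that actually make the change (assumed linear model).
    """
    treated_rate = base_rate * (1 - full_effect * adherence)
    control_rate = base_rate * (1 - full_effect * contamination)
    return treated_rate, control_rate


def power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    std_err = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    shift = abs(p1 - p2) / std_err
    # Probability of landing in either rejection region.
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


# Hypothetical scenario: 10% baseline risk, 20% relative reduction if followed.
ideal = power(*diluted_rates(0.10, 0.20, adherence=1.0, contamination=0.0), 2000)
diluted = power(*diluted_rates(0.10, 0.20, adherence=0.5, contamination=0.3), 2000)
print(f"power with full adherence: {ideal:.2f}")
print(f"power with dilution:       {diluted:.2f}")
print(f"type 2 error with dilution: {1 - diluted:.2f}")
```

Under these assumed numbers the intervention still works exactly as well for anyone who follows it, yet the diluted trial is far less likely to detect it. The type 2 error rate is one minus the power.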

That's just what happened in the Women's Health Initiative study, which followed nearly 49,000 middle-aged women for more than eight years. The women in the experimental group were asked to eat less fat and more fruits, vegetables, and whole grains each day to see if it could help prevent heart disease and cancer. The women in the control group were not asked to change their diets.

However, the experimental group participants did not reduce their dietary fat as recommended: over 29 percent of their diet was comprised of fat, not the study's goal of less than 20 percent. Also, they did not increase their consumption of fruits and vegetables very much. In contrast, the control group reduced its consumption of fat almost as much and increased its consumption of fruits and vegetables, diluting the between-group differences to the point that they were not statistically significant. The investigators reported that these dietary changes did not protect against heart disease or cancer when the hypothesis was not really tested.

Paradoxically, a small study may be more likely to show significant differences between groups than a large one. The Women's Health Initiative study cost almost a billion dollars yet did not adequately test the hypotheses. A smaller study provides more resources per patient to enhance adherence at lower cost.
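One way to see why adherence can matter more than headcount is the standard normal-approximation formula for the sample size needed to compare two proportions. The sketch below is illustrative only: the event rates are hypothetical, chosen to contrast a well-supported study with one whose between-group difference has been diluted, and `n_per_group` is a name invented here.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Subjects needed per group for a two-sided two-proportion z-test."""
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    z_beta = normal.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical event rates (experimental group, control group).
n_ideal = n_per_group(0.08, 0.10)       # good adherence, no contamination
n_diluted = n_per_group(0.090, 0.094)   # poor adherence plus contamination
print(f"per-group size with full contrast:    {n_ideal}")
print(f"per-group size with diluted contrast: {n_diluted}")
```

The required size grows with the inverse square of the difference, so shrinking the contrast to a fifth multiplies the needed enrollment roughly twenty-five-fold. With equal adherence a larger study is never less powerful than a smaller one; the small study's advantage in the essay's argument comes entirely from the better adherence it can afford.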

Also, the idea in RCTs that you're changing only one independent variable (the intervention) and measuring one dependent variable (the result) is often a myth. For example, let's say you're investigating the effects of exercise and its effects on preventing cancer. You devise a study whereby you randomly assign one group to exercise and the other group to no exercise. On paper, it appears that you're only working with one independent variable.

In actual practice, however, when you place people on an exercise program, you're not just getting them to exercise; you're actually affecting other factors that may confound the interpretation of your results even if you're not aware of them.

For example, people often exercise with other people, and there's increasing evidence that enhanced social support significantly reduces the risk of most chronic diseases. You're also enhancing a sense of meaning and purpose by participating in a study, and these also have therapeutic benefits. And when people exercise, they often begin to eat healthier foods.

We need new, more thoughtful experimental designs and systems approaches that take into account these issues. Also, new genomic insights will make it possible to better understand individual variations to treatment rather than hoping that this variability will be "averaged out" by randomly-assigning patients.


