Personal space of 葡萄皮 http://blog.sciencenet.cn/u/Hadron74


Introducing Several Bioinformaticians from Abroad (2): Russell F. Doolittle

8502 views | 2011-5-18 09:34 | Personal category: Bioinformatics | System category: Research notes | Bioinformatics

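Russell F. Doolittle was born in Connecticut, USA, in 1931 and is currently a research professor at the Center for Molecular Genetics, University of California, San Diego. His main research interests center on the evolution of protein structure and function. He received his PhD in biochemistry from Harvard University in 1962 and then did postdoctoral research in Sweden. He was among the first to use computers as an aid to characterizing proteins.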

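Some may find it hard to imagine a time when the Internet did not exist and not every scholar had a computer on the desk; it is harder still to imagine how primitive computer hardware and software were around 1978, at the time of the recombinant DNA revolution. It was in this period that Russell Doolittle, using a DEC PDP11 computer and a set of programs he had written himself, began systematically searching sequences to uncover evolutionary and other biological relationships. In 1983 he reported that the newly determined sequence of platelet-derived growth factor (PDGF) was virtually identical to the sequence of a previously reported cancer-related gene (v-sis), a finding that stunned cancer biologists [13]. This sensational news also served as a wake-up call to molecular biologists: comparing every new sequence against up-to-date databases is the first thing to do.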

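Doolittle had in fact begun computer studies of protein sequences much earlier. Fascinated by the idea that the history of all life might be traced through sequence analysis, he began determining and aligning sequences in the early 1960s. After landing a job at UCSD in 1964, he tried to interest the consultants at the university computer center in the problem, but the gap in culture and language between them was plainly too wide. Because the computer scientists had no interest in learning molecular biology, he had to learn computing himself. He took an introductory course in FORTRAN programming and, with the help of his older son, wrote some simple programs for comparing sequences. That was an era when data were entered on eighty-column cards with a keypunch machine; the cards were dropped off at the computer center in the hope that the output could be collected the next day.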

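In the mid-1960s, Richard Eck and Margaret Dayhoff began the Atlas of Protein Sequence and Structure, the forerunner of the Protein Identification Resource (PIR) database. Their original intention was to publish an annual volume of "all the sequences that could fit between two covers." Clearly, no one foresaw the flood of sequences that would follow once methods for directly sequencing DNA had been developed. In 1978 the entire atlas, which could be purchased on magnetic tape, held 1081 entries. Realizing that this set of protein sequences was heavily biased, Doolittle started his own database in the format of the atlas and called it NEWAT ("new atlas"). Around the same time he acquired a PDP11 computer whose maximum capacity was only 100 kilobytes, much of it taken up by a mini-UNIX operating system. With the help of his secretary and his younger son (only eleven years old at the time), Doolittle began typing in every new sequence he could get hold of and searching each one against every sequence already in the collection. This followed from his view that all new proteins come from old proteins, mostly by gene duplication. In the first few years of this small enterprise, Doolittle and his son established a large number of unexpected connections between proteins.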

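Doolittle admits that in 1978 he knew almost nothing about cancer viruses, but a series of chance events drew him into the field. First, Ted Friedmann and Gernot Walter (then at the Salk Institute) came to Doolittle for help in comparing the sequences of two DNA tumor viruses: simian virus 40 (SV40) and polyoma virus. This brought him into contact with Inder Verma's group at Salk, which was studying retroviruses and had just sequenced an "oncogene" called v-mos from a retrovirus that causes sarcomas in mice. They asked Doolittle to search it against his collection, but no significant match turned up. Not long afterward (in 1980), Doolittle read a report of the nucleotide sequence of an oncogene from an avian sarcoma virus, the famous Rous sarcoma virus. The report mentioned that the Salk team had given its authors a copy of their still unpublished mouse sarcoma gene sequence, but that no resemblance had been detected. Since this fit his own project, Doolittle promptly typed the new avian sequence into his computer to see whether it matched anything else. To his astonishment, it matched the unpublished Salk sequence of the mouse retrovirus oncogene. He immediately telephoned Inder Verma: "Hey, these two sequences are in fact homologous. These proteins must be doing the same thing." Verma, who had just packaged up a manuscript describing the new sequence, promptly unwrapped it and added the new finding. He was so pleased with the result that he listed Doolittle as a coauthor.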

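How did the group studying the Rous sarcoma virus miss this match? It reflects how people thought at the time. They had compared the DNA sequences of the two genes without translating them into the corresponding amino acid sequences, and so lost most of the information. It was another simple but urgent lesson about how to think about sequence comparison.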
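
The point about translating before comparing can be illustrated with a small Python sketch. The two DNA fragments below are invented for illustration (they are not the v-mos or Rous sarcoma virus sequences), and the helper names `translate` and `identity` are hypothetical. Because the genetic code is degenerate, two genes can encode the same peptide while sharing few nucleotides, so a nucleotide-level comparison can look like noise where the protein-level comparison shows a clear match.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical fragments: same peptide, different codon choices.
gene_a = "CTGTCTCGTGGTGCT"
gene_b = "TTAAGCAGAGGCGCA"

dna_identity = identity(gene_a, gene_b)
protein_identity = identity(translate(gene_a), translate(gene_b))

print(f"nucleotide identity: {dna_identity:.0%}")
print(f"protein identity:    {protein_identity:.0%}")
```

For these made-up fragments the nucleotide identity is 40% while the translated peptides are identical. A real comparison would also need to align the sequences and choose the reading frame; this sketch assumes both fragments are already in frame and of equal length.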

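In May 1983, an article in Science described the characterization of a growth factor isolated from human blood platelets. Harry Antoniades and Michael Hunkapiller had determined 28 amino acid residues at the N-terminal end of PDGF (obtaining enough growth factor for this took nearly 100,000 units of human blood). The article noted that the authors had run a limited search of known sequences and found no similar protein.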

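By then, Doolittle could connect by modem to a departmental VAX computer, where he kept his database. He typed in the partial PDGF sequence and set the search running. Twenty minutes later he had the result: the sequence of human PDGF was virtually identical to that of an oncogene isolated from a woolly monkey. Doolittle recalls remarking on that thrilling moment to his younger son, then fifteen: "Will, this experiment took us five years and twenty minutes." As it turned out, he was not alone in enjoying the discovery. Researchers at the Imperial Cancer Laboratory (ICL) in London were also sequencing PDGF, and in the spring of 1983 they had written to Doolittle asking for a tape of his sequence collection. He sent them his newest version, which happened to contain the woolly monkey v-sis sequence. Just a few weeks before the Science article appeared, the ICL group wrote back with effusive thanks, without mentioning why the tape had been so valuable. Meanwhile, Doolittle wrote to both the PDGF researchers and the v-sis team, suggesting that they compare notes. As a result, news of the match spread quickly, and a spirited race to publication followed: the report in Science appeared only a week ahead of the British one in Nature. Doolittle went on to find many other matches during the mid-1980s, several more of them involving oncogenes. For example, he found a relationship between the oncogene v-jun and the gene regulator GCN4. He describes those days as unusual in that an amateur could still occasionally compete with the professionals. Although he kept up his interest in protein evolution, he increasingly retreated to the laboratory and left the field to bioinformaticians with more formal training.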


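[13] Oncogenes are genes in viruses that cause a cancer-like transformation of infected cells. The oncogene v-sis, found in simian sarcoma virus, causes uncontrolled cell growth and leads to cancer in monkeys. The seemingly unrelated growth factor PDGF is a protein that stimulates cell growth.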


Original text:

   Russell F. Doolittle, born 1931 in Connecticut, is currently a research professor at the Center for Molecular Genetics, University of California, San Diego. His principal research interests center around the evolution of protein structure and function. He has a PhD in biochemistry from Harvard (1962) and did postdoctoral work in Sweden. He was an early advocate of using computers as an aid to characterizing proteins.

   For some it may be difficult to envision a time when the World Wide Web did not exist and every academician did not have a computer terminal on his or her desk. It may be even harder to imagine the primitive state of computer hardware and software at the time of the recombinant DNA revolution, which dates back to about 1978. It was in this period that Russell Doolittle, using a DEC PDP11 computer and a suite of home-grown programs, began systematically searching sequences in an effort to find evolutionary and other biological relationships. In 1983 he stunned cancer biologists when he reported that a newly reported sequence for platelet derived growth factor (PDGF) was virtually identical to a previously reported sequence for the oncogene known as v-sis [13]. This was big news, and the finding served as a wake-up call to molecular biologists: searching all new sequences against up-to-date databases is your first order of business.
   Doolittle had actually begun his computer studies on protein sequences much earlier. Fascinated by the idea that the history of all life might be traceable by sequence analysis, he had begun determining and aligning sequences in the early 1960s. When he landed a job at UCSD in 1964, he tried to interest consultants at the university computer center in the problem, but it was clear that the language and cultural divide between them was too great. Because computer people were not interested in learning molecular biology, he would have to learn about computing. He took an elementary course in FORTRAN programming, and, with the help of his older son, developed some simple programs for comparing sequences. These were the days when one used a keypunch machine to enter data on eighty-column cards, packs of which were dropped off at the computer center with the hope that the output could be collected the next day.
    In the mid-1960s, Richard Eck and Margaret Dayhoff had begun the Atlas of Protein Sequence and Structure, the forerunner of the Protein Identification Resource (PIR) database. Their original intention was to publish an annual volume of "all the sequences that could fit between two covers." Clearly, no one foresaw the deluge of sequences that was to come once methods had been developed for directly sequencing DNA. In 1978, for example, the entire holding of the atlas, which could be purchased on magnetic tape, amounted to 1081 entries. Realizing that this was a very biased collection of protein sequences, Doolittle began his own database, which, because it followed the format of the atlas, he called NEWAT ("new atlas"). At about the same time he acquired a PDP11 computer, the maximum capacity of which was only 100 kilobytes, much of that occupied by a mini-UNIX operating system. With the help of his secretary and his younger son (eleven years old at the time), Doolittle began typing in every new sequence he could get his hands on, searching each against every other sequence in the collection as they went. This was in keeping with his view that all new proteins come from old proteins, mostly by way of gene duplications. In the first few years of their small enterprise, Doolittle & Son established a number of unexpected connections.
   Doolittle admits that in 1978 he knew hardly anything about cancer viruses, but a number of chance happenings put him in touch with the field. For one, Ted Friedmann and Gernot Walter (who was then at the Salk Institute) had sought Doolittle’s aid in comparing the sequences of two DNA tumor viruses, simian virus 40 (SV40) and the polyoma virus. This led indirectly to contacts with Inder Verma’s group at Salk, which was studying retroviruses and had sequenced an “oncogene” called v-mos in a retrovirus that caused sarcomas in mice. They asked Doolittle to search it for them, but no significant matches were found. Not long afterward (in 1980), Doolittle read an article reporting the nucleotide sequence of an oncogene from an avian sarcoma virus, the famous Rous sarcoma virus. It was noted in that article that the Salk team had provided the authors with a copy of their still unpublished mouse sarcoma gene sequence, but no resemblances had been detected. In line with his own project, Doolittle promptly typed the new avian sequence into his computer to see if it might match anything else. He was astonished to find that in fact a match quickly appeared with the still unpublished Salk sequence for the mouse retrovirus oncogene. He immediately telephoned Inder Verma: "Hey, these two sequences are in fact homologous. These proteins must be doing the same thing." Verma, who had just packaged up a manuscript describing the new sequence, promptly unwrapped it and added the new feature. He was so pleased with the outcome that he added Doolittle’s name as one of the coauthors.
   How was it that the group studying the Rous sarcoma virus had missed this match? It’s a reflection on how people were thinking at the time. They had compared the DNA sequences of the two genes without translating them into the corresponding amino acid sequences, losing most of the information as a result. It was another simple but urgent message to the community about how to think about sequence comparisons.
   In May of 1983, an article appeared in Science describing the characterization of a growth factor isolated from human blood platelets. Harry Antoniades and Michael Hunkapiller had determined 28 amino acid residues from the N-terminal end of PDGF. (It had taken almost 100,000 units of human blood to obtain enough of the growth factor material to get this much sequence.) The article noted that the authors had conducted a limited search of known sequences and hadn’t found any similar proteins.
   By this time, Doolittle had modem access to a department VAX computer where he now stored his database. He typed in the PDGF partial sequence and set it searching. Twenty minutes later he had the results of the search: human PDGF had a sequence that was virtually identical to that of an oncogene isolated from a woolly monkey. Doolittle describes it as an electrifying moment, enriched greatly by his prior experiences with the other oncogenes. He remembers remarking to his then fifteen-year-old son, “Will, this experiment took us five years and twenty minutes.” As it happened, he was not alone in enjoying the thrill of this discovery. Workers at the Imperial Cancer Laboratory in London were also sequencing PDGF, and in the spring of 1983 had written to Doolittle asking for a tape of his sequence collection. He had sent them his newest version, fortuitously containing the v-sis sequence from the woolly monkey. Just a few weeks before the Science article appeared, Antoniades and Hunkapiller replied with an effusive letter of thanks, not mentioning just why the tape had been so valuable to them. Meanwhile, Doolittle had written to both the PDGF workers and the v-sis team, suggesting that they compare notes. As a result, the news of the match was quickly made known, and a spirited race to publication occurred, the report from the Americans appearing in Science only a week ahead of the British effort in Nature. Doolittle went on to make many other matches during the mid-1980s, including several more involving oncogenes. For example, he found a relationship between the oncogene v-jun and the gene regulator GCN4. He describes those days as unusual in that an amateur could still occasionally compete with the professionals. Although he continued with his interests in protein evolution, he increasingly retreated to the laboratory and left bioinformatics to those more formally trained in the field.


13. Oncogenes are genes in viruses that cause a cancer-like transformation of infected cells. Oncogene v-sis in the simian sarcoma virus causes uncontrolled cell growth and leads to cancer in monkeys. The seemingly unrelated growth factor PDGF is a protein that stimulates cell growth.



http://blog.sciencenet.cn/blog-565112-445340.html
