Science Blog of Dr. Yuan http://blog.sciencenet.cn/u/albumns This blog is mainly about molecular modelling and simulations.


Protein-Protein Docking Summary and Tools

22759 reads | 2010-1-1 17:09 | Personal category: Reposted articles | System category: Research notes

 

Wiki about Protein Docking

Macromolecular docking

From Wikipedia, the free encyclopedia


Macromolecular docking is the computational modelling of the quaternary structure of complexes formed by two or more interacting biological macromolecules. Protein–protein complexes are the most commonly attempted targets of such modelling, followed by protein–nucleic acid complexes.

The ultimate goal of docking is the prediction of the three dimensional structure of the macromolecular complex of interest as it would occur in a living organism. Docking itself only produces plausible candidate structures. These candidates must be ranked using methods such as scoring functions to identify structures that are most likely to occur in nature.

The term "docking" originated in the late 1970s, with a more restricted meaning: at that time, "docking" meant refining a model of a complex structure by optimizing the separation between the interactors while keeping their relative orientations fixed. Later, the relative orientations of the interacting partners were allowed to vary, but the internal geometry of each partner was held fixed. This type of modelling is sometimes referred to as "rigid docking". With further increases in computational power, it became possible to model the changes in internal geometry of the interacting partners that may occur when a complex is formed. This type of modelling is referred to as "flexible docking".


Introduction

The biological roles of most proteins, as characterized by which other macromolecules they interact with, are known at best incompletely. Even proteins that participate in a well-studied biological process (e.g., the Krebs cycle) may have unexpected interaction partners or functions unrelated to that process. Moreover, vast numbers of "hypothetical" proteins have emerged from the genomic revolution of the late 1990s; apart from their amino acid sequences, these proteins are a complete mystery.

In cases of known protein–protein interactions, other questions arise. Genetic diseases (e.g., cystic fibrosis) are known to be caused by misfolded or mutated proteins, and there is a desire to understand what, if any, anomalous protein–protein interactions a given mutation can cause. In the distant future, proteins may be designed to perform biological functions, and a determination of the potential interactions of such proteins will be essential.

For any given set of proteins, the following questions may be of interest, from the point of view of technology or natural history:

  • Do they bind in vivo?

If they do bind:

  • What is the spatial configuration which they adopt in their bound state?
  • How strong or weak is their interaction?

If they do not bind, can they be made to bind by inducing a mutation?

Protein–protein docking is ultimately envisaged to address all these issues. Furthermore, since docking methods can be based on purely physical principles, even proteins of unknown function (or which have been studied relatively little) may be docked. The only prerequisite is that their molecular structure either has been determined experimentally or can be estimated by a protein structure prediction technique.

Protein–nucleic acid interactions feature prominently in the living cell. Transcription factors, which regulate gene expression, and polymerases, which catalyse replication, are composed of proteins, and the genetic material they interact with is composed of nucleic acids. Modeling protein–nucleic acid complexes presents some unique challenges, as described below.

History

In the 1970s, complex modelling revolved around manually identifying features on the surfaces of the interactors, and interpreting the consequences for binding, function and activity; any computer programmes were typically used at the end of the modelling process, to discriminate between the relatively few configurations which remained after all the heuristic constraints had been imposed. The first use of computers was in a study on hemoglobin interaction in sickle-cell fibres.[1] This was followed in 1978 by work on the trypsin-BPTI complex.[2] Computers discriminated between good and bad models using a scoring function which rewarded large interface area, and pairs of molecules in contact but not occupying the same space. The computer used a simplified representation of the interacting proteins, with one interaction centre for each residue. Favorable electrostatic interactions, including hydrogen bonds, were identified by hand.

In the early 1990s, more structures of complexes were determined, and available computational power had increased substantially. With the emergence of bioinformatics, the focus moved towards developing generalized techniques which could be applied to an arbitrary set of complexes at acceptable computational cost. The new methods were envisaged to apply even in the absence of phylogenetic or experimental clues; any specific prior knowledge could still be introduced at the stage of choosing between the highest-ranking output models, or be framed as input if the algorithm catered for it. 1992 saw the publication of the correlation method,[3] an algorithm which used the fast Fourier transform to give vastly improved scalability for evaluating coarse shape complementarity on rigid-body models. This was extended in 1997 to cover coarse electrostatics.[4]

In 1996 the results of the first blind trial were published,[5] in which six research groups attempted to predict the complexed structure of TEM-1 Beta-lactamase with Beta-lactamase inhibitor protein (BLIP). The exercise brought into focus the necessity of accommodating conformational change and the difficulty of discriminating between conformers. It also served as the prototype for the CAPRI assessment series, which debuted in 2001.

Rigid-body docking vs. flexible docking

If the bond angles, bond lengths and torsion angles of the components are not modified at any stage of complex generation, the procedure is known as rigid-body docking. Whether rigid-body docking is good enough for most docking problems remains a matter of debate. When substantial conformational change occurs within the components at the time of complex formation, rigid-body docking is inadequate. However, scoring all possible conformational changes is prohibitively expensive in computer time. Docking procedures which permit conformational change, or flexible docking procedures, must intelligently select a small subset of possible conformational changes for consideration.
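To make the rigid-body definition concrete, here is a minimal sketch, assuming Cartesian atom coordinates stored as plain tuples: a rigid move is a rotation followed by a translation, which changes where a partner sits but leaves every internal distance (and hence every bond length, bond angle and torsion) unchanged. The function names and the three-atom fragment are hypothetical, not taken from any docking package.

```python
import math
from itertools import combinations

def rotation_z(theta: float) -> list[list[float]]:
    """3x3 matrix rotating by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rigid_move(coords, rot, shift):
    """Apply x' = R x + t to every atom; no bond length, angle or torsion is edited."""
    moved = []
    for x in coords:
        rx = [sum(rot[i][j] * x[j] for j in range(3)) for i in range(3)]
        moved.append(tuple(rx[i] + shift[i] for i in range(3)))
    return moved

def internal_distances(coords):
    """All pairwise interatomic distances, i.e. the partner's internal geometry."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]

# Hypothetical three-atom fragment of a docking partner (coordinates in Å).
partner = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3)]
moved = rigid_move(partner, rotation_z(math.pi / 3), (4.0, -2.0, 1.0))

unchanged = all(math.isclose(a, b, abs_tol=1e-9)
                for a, b in zip(internal_distances(partner), internal_distances(moved)))
print(unchanged)  # True
```

A flexible docking procedure would, in addition, alter torsion angles inside `partner`, which would change some of these pairwise distances.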

Methods

Successful docking must meet two criteria:

  • Generating a set of configurations which reliably includes at least one nearly correct one.
  • Reliably distinguishing nearly correct configurations from the others.

For many interactions, the binding site is known on one or more of the proteins to be docked. This is the case for antibodies and for competitive inhibitors. In other cases, a binding site may be strongly suggested by mutagenic or phylogenetic evidence. Configurations where the proteins interpenetrate severely may also be ruled out a priori.
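Prior knowledge and stereochemistry act as cheap filters before any expensive scoring. The sketch below uses assumed, illustrative thresholds (a 3.0 Å clash cutoff and a 6.0 Å contact cutoff): it rejects a candidate pose if any receptor–ligand atom pair interpenetrates and, when a binding-site residue list is known (as for antibodies or competitive inhibitors), also requires the ligand to touch at least one of those residues. Residue labels and coordinates are hypothetical.

```python
import math
from typing import Optional

Atom = tuple[str, tuple[float, float, float]]  # (residue label, x/y/z in Å)

CLASH_CUTOFF = 3.0    # assumed: closer than this = severe interpenetration
CONTACT_CUTOFF = 6.0  # assumed: closer than this = residue is at the interface

def passes_prior_filters(receptor: list[Atom], ligand: list[Atom],
                         site_residues: Optional[set[str]] = None) -> bool:
    """Reject poses that clash, or that miss a known binding site if one is given."""
    touches_site = site_residues is None  # no site restraint -> nothing to satisfy
    for res_id, r_xyz in receptor:
        for _, l_xyz in ligand:
            d = math.dist(r_xyz, l_xyz)
            if d < CLASH_CUTOFF:
                return False  # severe interpenetration: ruled out a priori
            if not touches_site and d < CONTACT_CUTOFF and res_id in site_residues:
                touches_site = True
    return touches_site

receptor = [("A:45", (0.0, 0.0, 0.0)), ("A:46", (3.8, 0.0, 0.0)), ("A:90", (20.0, 0.0, 0.0))]
site = {"A:45", "A:46"}  # hypothetical site from mutagenesis data
good_pose = [("B:1", (2.0, 4.5, 0.0))]
clash_pose = [("B:1", (0.5, 0.5, 0.0))]
far_pose = [("B:1", (20.0, 4.5, 0.0))]

print(passes_prior_filters(receptor, good_pose, site))   # True
print(passes_prior_filters(receptor, clash_pose, site))  # False (clash)
print(passes_prior_filters(receptor, far_pose, site))    # False (misses the site)
```

Only the poses that survive such filters need to be sampled densely and scored.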

After making exclusions based on prior knowledge or stereochemical clash, the remaining space of possible complexed structures must be sampled exhaustively, evenly and with sufficient coverage to guarantee a near hit. Each configuration must be scored with a measure that is capable of ranking a nearly correct structure above at least 100,000 alternatives. This is a computationally intensive task, and a variety of strategies have been developed.
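Sampling the rotational degrees of freedom "evenly" is less trivial than it looks: drawing three Euler angles uniformly over-samples some orientations. One standard way to draw uniformly distributed random rotations is Shoemake's method, which builds a unit quaternion from three uniform random numbers. The sketch below uses it purely as an illustration (it is not claimed to be what any particular docking program does), converts each quaternion into a rotation matrix, and can be checked by confirming that every matrix is a proper rotation (orthonormal, determinant +1).

```python
import math
import random

def random_unit_quaternion(rng: random.Random) -> tuple[float, float, float, float]:
    """Shoemake's uniform random unit quaternion, returned as (w, x, y, z)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    x, y = a * math.sin(2 * math.pi * u2), a * math.cos(2 * math.pi * u2)
    z, w = b * math.sin(2 * math.pi * u3), b * math.cos(2 * math.pi * u3)
    return w, x, y, z

def quaternion_to_matrix(q):
    """Convert a unit quaternion (w, x, y, z) into a 3x3 rotation matrix."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def det3(m):
    """Determinant of a 3x3 matrix (+1 for a proper rotation)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

rng = random.Random(0)  # fixed seed so the sample is reproducible
rotations = [quaternion_to_matrix(random_unit_quaternion(rng)) for _ in range(1000)]
print(all(math.isclose(det3(r), 1.0, abs_tol=1e-9) for r in rotations))  # True
```

Each orientation would then be combined with a set of translations of the second partner to cover the full six-dimensional rigid-body space.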

Reciprocal space methods

Each of the proteins may be represented on a simple cubic lattice. Then, for the class of scores which are discrete convolutions, configurations related to each other by translation of one protein by an exact lattice vector can all be scored almost simultaneously by applying the convolution theorem.[3] It is possible to construct reasonable, if approximate, convolution-like scoring functions representing both stereochemical and electrostatic fitness.
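The convolution-theorem trick can be sketched with NumPy, assuming both proteins have already been mapped onto the same small cubic grid. Loosely following the idea of the correlation method,[3] the receptor grid marks a one-cell surface layer with +1 and its interior with a large negative penalty, while the ligand grid marks its occupied cells with +1; the correlation then rewards surface contact and punishes interpenetration. Grid size, cell values and the cube-shaped "proteins" are illustrative assumptions, and only translations (one ligand orientation) are scanned.

```python
import numpy as np

def receptor_grid(n: int) -> np.ndarray:
    """Receptor cube: surface layer = +1, interior = -15 (interpenetration penalty), outside = 0."""
    g = np.zeros((n, n, n))
    g[4:10, 4:10, 4:10] = 1.0   # whole body, one-cell surface layer included
    g[5:9, 5:9, 5:9] = -15.0    # core cells overwrite the body with the penalty
    return g

def ligand_grid(n: int) -> np.ndarray:
    """Ligand cube: every occupied cell = +1, outside = 0."""
    g = np.zeros((n, n, n))
    g[0:2, 0:2, 0:2] = 1.0
    return g

def fft_scores(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Score every cyclic translation t at once: c[t] = sum_x a[x + t] * b[x].

    By the correlation (convolution) theorem, c = IFFT(FFT(a) * conj(FFT(b))).
    """
    return np.real(np.fft.ifftn(np.fft.fftn(a) * np.conj(np.fft.fftn(b))))

def direct_score(a: np.ndarray, b: np.ndarray, t: tuple[int, int, int]) -> float:
    """Brute-force score of a single translation t, for checking the FFT result."""
    return float(np.sum(a * np.roll(b, shift=t, axis=(0, 1, 2))))

n = 16  # assumed grid size; real docking grids are far larger
rec, lig = receptor_grid(n), ligand_grid(n)
scores = fft_scores(rec, lig)  # all n**3 translations from three FFTs
print(round(float(scores.max()), 6))  # 4.0: ligand lying flat against one receptor face
best = np.unravel_index(np.argmax(scores), scores.shape)
print("best translation:", tuple(int(i) for i in best))
```

Many translations tie on such a tiny grid; the point is that all 16³ of them are scored together instead of one sum per translation. In a full search the FFT step is repeated for each sampled ligand rotation, which is also why the speed advantage disappears once torsional changes must be enumerated.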

Reciprocal space methods have been used extensively for their ability to evaluate enormous numbers of configurations. They lose their speed advantage if torsional changes are introduced. Another drawback is that it is impossible to make efficient use of prior knowledge. The question also remains whether convolutions are too limited a class of scoring function to identify the best complex reliably.

Monte Carlo methods

In Monte Carlo, an initial configuration is refined by taking random steps which are accepted or rejected based on their induced improvement in score (see the Metropolis criterion), until a certain number of steps have been tried. The assumption is that convergence to the best structure should occur from a large class of initial configurations, only one of which needs to be considered. Initial configurations may be sampled coarsely, and much computation time can be saved. Because of the difficulty of finding a scoring function which is both highly discriminating for the correct configuration and also converges to the correct configuration from a distance, the use of two levels of refinement, with different scoring functions, has been proposed.[6] Torsion can be introduced naturally to Monte Carlo as an additional property of each random move.
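A minimal Metropolis Monte Carlo loop is sketched below, assuming a lower-is-better score and a one-dimensional toy "configuration" so that it stays readable; in real docking each random move would perturb the six rigid-body degrees of freedom and, optionally, torsions. The toy score function, step size and temperature `kT` are illustrative assumptions, not values from any docking program.

```python
import math
import random

def toy_score(x: float) -> float:
    """Hypothetical rugged landscape: a broad funnel with many local minima."""
    return (x - 2.0) ** 2 + 1.5 * math.sin(5.0 * x)

def metropolis_mc(score, x0, n_steps=5000, step=0.5, kT=0.3, seed=0):
    """Random-step refinement: accept downhill moves, accept uphill ones with prob exp(-dE/kT)."""
    rng = random.Random(seed)
    x, e = x0, score(x0)
    best_x, best_e = x, e
    accepted = 0
    for _ in range(n_steps):            # stop after a fixed number of trial steps
        x_new = x + rng.uniform(-step, step)
        e_new = score(x_new)
        d_e = e_new - e
        if d_e <= 0 or rng.random() < math.exp(-d_e / kT):  # Metropolis criterion
            x, e = x_new, e_new
            accepted += 1
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e, accepted / n_steps

best_x, best_e, acc_rate = metropolis_mc(toy_score, x0=-3.0)
print(f"best x = {best_x:.3f}, score = {best_e:.3f}, acceptance = {acc_rate:.2f}")
```

Running `metropolis_mc` twice in sequence, first with a smooth long-range score and then with a sharper, more discriminating one, is one way to mimic the two-level refinement scheme described above; both score functions would be hypothetical stand-ins here.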

Monte Carlo methods are not guaranteed to search exhaustively, so that the best configuration may be missed even using a scoring function which would in theory identify it. How severe a problem this is for docking has not been firmly established.

Selecting the docked complex structure

To find a score which forms a consistent basis for selecting the best configuration, studies are carried out on a standard benchmark (see below) of protein–protein interaction cases. Scoring functions are assessed on the rank they assign to the best structure (ideally the best structure should be ranked 1), and on their coverage (the proportion of the benchmark cases for which they achieve an acceptable result). Types of scores studied include:

It is usual to create hybrid scores by combining one or more categories above in a weighted sum whose weights are optimized on cases from the benchmark. To avoid bias, the benchmark cases used to optimize the weights must not overlap with the cases used to make the final test of the score.
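The evaluation protocol described in the two paragraphs above can be sketched with made-up data. Each benchmark case is a list of candidate configurations, each carrying two score terms (lower is better, by assumption) and a flag marking whether it is nearly correct. The hybrid score is a weighted sum of the terms; the weights are chosen by a coarse grid search on training cases only and then judged on disjoint test cases by the rank of the best structure and by coverage. The term meanings, the "acceptable" cutoff of rank ≤ 10 and all numbers are hypothetical.

```python
from itertools import product

# One case = list of (score terms, nearly_correct). Terms are hypothetical,
# e.g. (shape-complementarity penalty, electrostatic energy); lower = better.
Case = list[tuple[tuple[float, float], bool]]

def hybrid(terms, weights) -> float:
    """Weighted sum of the individual score terms."""
    return sum(w * t for w, t in zip(weights, terms))

def rank_of_best(case: Case, weights) -> int:
    """1-based rank of the highest-ranked nearly correct configuration (ideally 1)."""
    ordered = sorted(case, key=lambda c: hybrid(c[0], weights))
    return next(i for i, (_, ok) in enumerate(ordered, start=1) if ok)

def coverage(cases, weights, max_rank=10) -> float:
    """Fraction of cases whose best near-correct structure ranks within max_rank (assumed cutoff)."""
    return sum(rank_of_best(c, weights) <= max_rank for c in cases) / len(cases)

def fit_weights(train_cases, grid=(0.0, 0.5, 1.0, 2.0)):
    """Coarse grid search minimising the mean rank over the *training* cases only."""
    candidates = [w for w in product(grid, repeat=2) if any(w)]
    return min(candidates,
               key=lambda w: sum(rank_of_best(c, w) for c in train_cases) / len(train_cases))

train = [
    [((1.0, 5.0), False), ((2.0, -3.0), True), ((0.5, 2.0), False)],
    [((3.0, -4.0), True), ((1.0, 1.0), False), ((2.0, 0.0), False)],
]
test = [
    [((1.5, -2.0), True), ((0.8, 3.0), False), ((2.5, 1.0), False)],
]
assert not any(case in test for case in train), "training and test cases must not overlap"

weights = fit_weights(train)
print(weights, [rank_of_best(c, weights) for c in test], coverage(test, weights))
# (0.0, 0.5) [1] 1.0
```

With so few cases the fitted weights are degenerate; the design point is only that the weight fit never sees the test cases.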

Benchmark

A benchmark of 84 protein–protein interactions with known complexed structures has been developed for testing docking methods.[7] The set is chosen to cover a wide range of interaction types, and to avoid repeated features, such as the profile of interactors' structural families according to the SCOP database. Benchmark elements are classified into three levels of difficulty (the most difficult containing the largest change in backbone conformation). The protein–protein docking benchmark contains examples of enzyme-inhibitor, antigen-antibody and homomultimeric complexes.

The CAPRI assessment

The Critical Assessment of PRediction of Interactions[8] is an ongoing series of events in which researchers throughout the community try to dock the same proteins, as provided by the assessors. Rounds take place approximately every 6 months. Each round contains between one and six target protein–protein complexes whose structures have been recently determined experimentally. The coordinates are held privately by the assessors, with the cooperation of the structural biologists who determined them. The assessment of submissions is double blind.
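One measure commonly used when comparing submitted models with the experimental structure in CAPRI-style assessment is f_nat, the fraction of native residue–residue contacts that a model reproduces. A minimal sketch, assuming each structure has already been reduced to a set of interface contacts, i.e. (receptor residue, ligand residue) pairs within some distance cutoff; the residue labels are hypothetical.

```python
def native_contact_fraction(native: set[tuple[str, str]],
                            model: set[tuple[str, str]]) -> float:
    """f_nat: fraction of native interface contacts that also appear in the model."""
    if not native:
        raise ValueError("native structure has no interface contacts")
    return len(native & model) / len(native)

native_contacts = {("A:45", "B:12"), ("A:46", "B:12"), ("A:47", "B:15"), ("A:90", "B:20")}
model_contacts = {("A:45", "B:12"), ("A:47", "B:15"), ("A:50", "B:30")}
print(native_contact_fraction(native_contacts, model_contacts))  # 0.5
```

The spurious contact ("A:50", "B:30") does not lower f_nat, which is why such a measure is normally combined with others, for example RMSD-based ones, when classifying model quality.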

CAPRI attracts a high level of participation (37 groups participated worldwide in round seven) and a high level of interest from the biological community in general. Although CAPRI results are of little statistical significance owing to the small number of targets in each round, the role of CAPRI in stimulating discourse is significant. (The CASP assessment is a similar exercise in the field of protein structure prediction.)

Deciding whether a complex actually occurs in nature and measuring its affinity

A reliable method for affinity prediction has the potential to transform biochemistry and cell biology. Though a distant prospect, affinity prediction may be considered as the ultimate achievement in protein–protein docking.

Protein–protein docking and molecular docking

The field of protein–protein docking is highly computationally oriented, and it shares approaches with small-molecule docking. Proteins complexed with polynucleotide molecules are widely studied using similar or identical approaches to protein–protein docking, although if the nucleotide molecule is small enough, the case may be framed as a small-molecule docking problem. See the Wikipedia articles "Scoring functions for docking" and "Searching the conformational space for docking" for more information.

References

  1. Levinthal C, Wodak SJ, Kahn P, Dadivanian AK (1975). "Hemoglobin Interactions in Sickle Cell Fibers: I. Theoretical Approaches to the Molecular Contacts". Proc. Natl. Acad. Sci. U.S.A. 72 (4): 1330. doi:10.1073/pnas.72.4.1330. PMID 1055409.
  2. Wodak SJ, Janin J (1978). "Computer Analysis of Protein-Protein Interactions". J. Mol. Biol. 124 (2): 323–42. doi:10.1016/0022-2836(78)90302-9. PMID 712840.
  3. Katchalski-Katzir E, Shariv I, Eisenstein M, Friesem AA, Aflalo C, Vakser IA (1992). "Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques". Proc. Natl. Acad. Sci. U.S.A. 89 (6): 2195–9. doi:10.1073/pnas.89.6.2195. PMID 1549581.
  4. Gabb HA, Jackson RM, Sternberg MJ (1997). "Modelling protein docking using shape complementarity, electrostatics and biochemical information". J. Mol. Biol. 272 (1): 106–20. doi:10.1006/jmbi.1997.1203. PMID 9299341.
  5. Strynadka NC, Eisenstein M, Katchalski-Katzir E, Shoichet BK, Kuntz ID, Abagyan R, Totrov M, Janin J, Cherfils J, Zimmerman F, Olson A, Duncan B, Rao M, Jackson R, Sternberg M, James MN (1996). "Molecular Docking Programs Successfully Predict the Binding of a Beta-lactamase Inhibitory Protein to TEM-1 Beta-Lactamase". Nat. Struct. Biol. 3 (3): 233–9. doi:10.1038/nsb0396-233. PMID 8605624.
  6. Gray JJ, Moughon S, Wang C, Schueler-Furman O, Kuhlman B, Rohl CA, Baker D (2003). "Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations". J. Mol. Biol. 331 (1): 281–99. doi:10.1016/S0022-2836(03)00670-3. PMID 12875852.
  7. Mintseris J, Wiehe K, Pierce B, Anderson R, Chen R, Janin J, Weng Z (2005). "Protein-Protein Docking Benchmark 2.0: an update". Proteins 60 (2): 214–6. doi:10.1002/prot.20560. PMID 15981264.
  8. Janin J, Henrick K, Moult J, Eyck LT, Sternberg MJ, Vajda S, Vakser I, Wodak SJ (2003). "CAPRI: a Critical Assessment of PRedicted Interactions". Proteins 52 (1): 2–9. doi:10.1002/prot.10381. PMID 12784359.

 

Current Tools

ZDOCK    http://zlab.bu.edu/zdock/index.shtml
Free for academic users; integrated together with RDOCK in the Discovery Studio package, where ZDOCK handles the docking and RDOCK the scoring.

RDOCK    http://zlab.bu.edu/zdock/index.shtml
Companion scoring program for ZDOCK; requires the CHARMM force field.

3D-DOCK    http://www.bmm.icnet.uk/docking/index.html
Free for academic users; includes FTDOCK and MultiDock.

DOT    http://www.sdsc.edu/CCMS/Papers/DOT_sc95.html
Free; graphical interface.

ESCHER    http://www.ddl.unimi.it/vega/index.htm
A module of the VEGA ZZ package (see VEGA ZZ).

Gramm    http://vakser.bioinformatics.ku.edu/resources/gramm
Free; text (command-line) interface.

Haddock    http://www.nmr.chem.uu.nl/haddock
Completely free.

HEX    http://www.loria.fr/~ritchied/hex/
Free; graphical interface.

MolFit    http://www.weizmann.ac.il/Chemical_Research_Support//molfit/
Free.

Monty    http://www.ddl.unimi.it/escherng/index.htm
Protein–DNA docking; development has been discontinued and it has been superseded by ESCHER.


Rosetta    http://graylab.jhu.edu/docking/rosetta
Completely free.



https://blog.sciencenet.cn/blog-355217-283305.html
