Sources:
https://www.biostars.org/p/13535/
https://www.researchgate.net/post/Interpretation_of_low_Bootstrap_value
In general, you should (1) get the protein sequences, (1a) remove pseudogenes if you can, (2) align them (e.g. with ClustalX), (3) manually select the conserved region(s), (4) iterate between steps (2) and (3) until the alignment is stable, and then (5) bootstrap, (6) produce trees and (7) build a consensus tree.
I suspect you might have skipped steps (3) and (4). If, after doing them, you still have low bootstrap values, I suspect you'll have some branches with good support but lower values in the "high branches". Focus on clusters of well-conserved genes: pull them out, build a tree for each cluster on its own, and then re-cluster the remaining sequences together with only one or two representative genes from each group.
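As a rough illustration of steps (5) to (7), the sketch below counts how often each clade appears across bootstrap replicate trees and keeps the clades found in more than half of them (a majority-rule consensus). It assumes a simplified, hypothetical representation in which every tree is a set of clades and every clade is a frozenset of taxon names; the taxa and replicates are made-up toy data, and real tree-building programs do this for you.

```python
from collections import Counter

def clade_support(replicate_trees):
    """Fraction of replicate trees that contain each clade.

    Each tree is a set of clades; each clade is a frozenset of taxon
    names (a hypothetical, simplified representation).
    """
    counts = Counter()
    for clades in replicate_trees:
        counts.update(set(clades))
    n = len(replicate_trees)
    return {clade: count / n for clade, count in counts.items()}

def majority_rule(replicate_trees, threshold=0.5):
    """Keep only clades present in more than `threshold` of the replicates."""
    support = clade_support(replicate_trees)
    return {c: s for c, s in support.items() if s > threshold}

# Toy data: four bootstrap replicates over taxa A-E (made up for illustration).
AB, CD, DE = frozenset("AB"), frozenset("CD"), frozenset("DE")
replicates = [{AB, CD}, {AB, DE}, {AB, CD}, {AB, CD}]

support = clade_support(replicates)
consensus = majority_rule(replicates)
for clade, value in sorted(support.items(), key=lambda kv: -kv[1]):
    print(sorted(clade), f"{value:.0%}")
```

A clade that appears in every replicate gets 100% support, while a clade that competes with an alternative grouping gets a correspondingly lower value and may drop out of the consensus.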
Phylogenetic trees are a bit of a craft...
How to interpret a low bootstrap value?
Also, don't forget the alignment itself and the distance measures. Recombination can mess things up but I suspect a more common issue is sequence and alignment quality. You might have more joy if you restrict your alignments to regions where the sequences are visually aligning well, otherwise you may well just be modelling noise. Gap treatment can have a big influence too. If some of your sequences have large deletions (e.g. missing termini or exons), then they can get dragged to different parts of the tree depending on which parts of the alignment get re-sampled in the bootstrapping. If it's a distance-based tree, that might also have an influence as you may have saturated your distance calculations. (Or, at the other extreme, there may be almost no changes.)
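To make the point about saturated distances concrete, here is a minimal sketch for nucleotide data. It computes the raw proportion of differing sites (the p-distance, skipping gapped sites) and applies the Jukes-Cantor (JC69) correction, d = -3/4 ln(1 - 4p/3), which grows without bound as p approaches 0.75 and is undefined beyond it. The function names and the choice to return None for a saturated pair are assumptions made for this illustration; other substitution models, and protein data, use different formulas.

```python
import math

GAP_CHARS = set("-?")

def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping any site with a gap."""
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared

def jukes_cantor(p):
    """JC69 corrected distance; None when the pair is saturated."""
    if p >= 0.75:
        return None  # log argument would be <= 0, so no finite estimate
    return -0.75 * math.log(1 - 4 * p / 3)

for p in (0.01, 0.10, 0.50, 0.70, 0.74):
    print(f"p = {p:.2f}  ->  JC69 d = {jukes_cantor(p):.3f}")
```

At small p the corrected distance is almost equal to p, so near-identical sequences carry little information; close to 0.75 a tiny change in p produces a huge change in d, so the estimate is dominated by noise.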
Interpretation of low Bootstrap value - ResearchGate. Available from: https://www.researchgate.net/post/Interpretation_of_low_Bootstrap_value [accessed Jan 13, 2017].
Low bootstrap values indicate a lack of consistent signal across your alignment. This could be due to different parts of the alignment having different trees but it could also be due to a poor signal:noise ratio (few variable/informative sites and/or poor alignment) and/or homoplasy, i.e. independent shared mutations, which will be more common in mutation hotspots. Because mtDNA is non-recombining, you can essentially rule out gene flow/recombination as the explanation. As all of your sequences are from the same species, there is a fair chance that the sequence diversity is low and your low bootstrap values might be indicative of this lack of information. You might have more joy if you concentrate on the variable regions of the mtDNA - remember that bootstrapping is a random sampling method, and so if the random samples are likely to pull out predominantly invariant sites, the chances of getting the "right" tree are going to be small.
The other problem is if you have a more "star-like" phylogeny, where the evolutionary time since the last divergence is much higher than the evolutionary time between splits. (i.e. short basal branches and long terminal ones). In this scenario, homoplasy is relatively high versus informative sites and the signal is too weak to get decent bootstrap values. I can dig out a reference for this latter issue if you like?
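The remark that bootstrapping randomly samples sites can be illustrated with a small sketch. It resamples alignment columns with replacement and counts parsimony-informative sites (columns in which at least two character states each occur in at least two sequences). The toy alignment below is made up: 18 invariant columns plus 2 informative columns that support conflicting groupings. All function names are hypothetical.

```python
import random
from collections import Counter

def is_informative(column):
    """Parsimony-informative: >= 2 states, each present in >= 2 sequences."""
    counts = Counter(c for c in column if c not in "-?")
    return sum(1 for n in counts.values() if n >= 2) >= 2

def informative_count(alignment):
    """Number of parsimony-informative columns in the alignment."""
    return sum(is_informative(col) for col in zip(*alignment))

def bootstrap_columns(alignment, rng):
    """One bootstrap replicate: resample columns with replacement."""
    length = len(alignment[0])
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in picks) for seq in alignment]

# Toy alignment: 18 invariant columns followed by 2 informative ones.
base = "ACGTACGTACGTACGTAC"
alignment = [base + "AA", base + "AG", base + "GA", base + "GG"]

rng = random.Random(0)
replicates = [bootstrap_columns(alignment, rng) for _ in range(1000)]
empty = sum(informative_count(rep) == 0 for rep in replicates)

print("informative sites in original:", informative_count(alignment))
print("replicates with no informative site:", empty, "of", len(replicates))
```

With so few informative columns, a noticeable share of replicates contains none at all, and the rest are split between two conflicting signals, so no grouping can reach high support.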
How to improve?
1 Improve the quality of the MSA. For example, try Gblocks or trimAl to remove poorly aligned columns.
2 Use the nucleotide sequences. If your nucleotide sequences are difficult to align, you may want to use a reverse alignment approach, i.e. align the translated protein sequences and map the nucleotides back onto that alignment (e.g. http://www.ncbi.nlm.nih.gov/pubmed/20435676).
3 Try multiple methods and parameters, including model selection.
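As a very crude stand-in for what point 1 is aiming at, the sketch below drops alignment columns whose gap fraction exceeds a threshold. This is not how Gblocks or trimAl work internally (they use more elaborate criteria); the function name, the 0.5 default threshold and the toy protein alignment are all assumptions made for illustration.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop columns whose fraction of '-' characters exceeds the threshold."""
    n_seqs = len(alignment)
    kept = [
        col for col in zip(*alignment)
        if sum(c == "-" for c in col) / n_seqs <= max_gap_fraction
    ]
    if not kept:
        return [""] * n_seqs
    # Transpose the kept columns back into one string per sequence.
    return ["".join(chars) for chars in zip(*kept)]

# Toy alignment (made up): columns 4 and 5 are mostly gaps.
alignment = [
    "MKT--LLV",
    "MKTA-LLV",
    "MK---LIV",
    "MKT--LLV",
]
trimmed = trim_gappy_columns(alignment)
for seq in trimmed:
    print(seq)
```

Comparing bootstrap support before and after trimming gives a quick check of whether poorly aligned, gap-rich regions were contributing noise to the tree.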