Planktonic bacterial lineages with streamlined genomes are prevalent in the ocean. The base composition of their DNA is often highly biased towards low G+C content, a possible source of systematic error in phylogenetic reconstruction. A total of 228 orthologous protein families were sampled that are shared among major lineages of Alphaproteobacteria, including the marine free-living SAR11 clade and the obligate endosymbiotic Rickettsiales. These two ecologically distinct lineages share genome sizes of < 1.5 Mbp and genomic G+C content of < 30%. Statistical analyses showed that only 28 protein families are composition-homogeneous, while the other 200 families significantly violate the composition-homogeneous assumption included in most phylogenetic methods. RAxML analysis based on the concatenation of 24 ribosomal proteins that fall into the heterogeneous protein category clustered the SAR11 and Rickettsiales lineages at the base of the Alphaproteobacteria tree, while that based on the concatenation of 28 homogeneous proteins (including 19 ribosomal proteins) disassociated the lineages and placed SAR11 at the base of the non-endosymbiotic lineages. When the two data sets were concatenated, only a model that accounted for compositional bias yielded a tree identical to the tree built with composition-homogeneous proteins. These results are strong evidence that the clustering of the planktonic SAR11 bacteria and the endosymbiont Rickettsiales is an artifact.
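The abstract does not name the statistical test used to sort protein families into composition-homogeneous and heterogeneous sets. As a minimal sketch only, assuming a plain chi-square test of independence on a taxa-by-residue count table, the idea of checking compositional homogeneity in one aligned protein family might look like the following. The function name `composition_chi_square` and the choice of ignored characters are hypothetical, not taken from the paper.

```python
from collections import Counter


def composition_chi_square(sequences):
    """Chi-square statistic and degrees of freedom for a taxa x residue table.

    Assumption: a standard chi-square test of independence, used here only
    to illustrate the idea; the paper's actual test is not specified.
    """
    # Count residues per sequence, skipping gaps and unknown characters.
    counts = [Counter(ch for ch in seq.upper() if ch not in "-?X")
              for seq in sequences]
    residues = sorted(set().union(*counts))
    row_totals = [sum(c.values()) for c in counts]
    col_totals = {r: sum(c[r] for c in counts) for r in residues}
    grand_total = sum(row_totals)

    chi2 = 0.0
    for c, row_total in zip(counts, row_totals):
        for r in residues:
            # Expected count if every taxon shared the pooled composition.
            expected = row_total * col_totals[r] / grand_total
            chi2 += (c[r] - expected) ** 2 / expected

    dof = (len(sequences) - 1) * (len(residues) - 1)
    return chi2, dof


# Toy usage with made-up sequences (not real data).
stat, dof = composition_chi_square(["MKALV", "MKAIV", "MKKIN"])
print(stat, dof)
```

A large statistic relative to a chi-square distribution with `dof` degrees of freedom would suggest that the family violates the homogeneity assumption. This simple test treats alignment columns as independent and ignores shared ancestry among taxa, so it should be read as an illustration rather than as the procedure behind the 28/200 split reported above.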