Background: During evolution, proteins containing newly emerged domains and the increasing proportion of multi-domain proteins in the full Genome-Encoded Proteome (GEP) have contributed substantially to increasing biological complexity. However, it was not known how these two structural factors are preferentially utilized at given physiological states until our most recent study found that multi-domain proteins or proteins containing only older domains are significantly overrepresented in certain-state proteomes (CSPs, i.e., all the proteins expressed at certain physiological states) compared with the GEP. This indicates that, under certain conditions, biological complexity is realized more through diverse domain organization than through the emergence of new types of domain. To further explore this utilization pattern from the viewpoint of protein spatial distribution, at levels ranging from organelles to organs, we investigated the relationships between protein domain characteristics and protein subcellular localization width (SLW) or tissue specificity (TS).

Methods: To calculate SLW, we collected high-confidence subcellular location information from Swiss-Prot, GOA, the Human Protein Atlas (HPA), MitoP2, the Human Liver Organelle Proteome (HLOP) and related references. Domains were identified by RPS-BLAST against Pfam and SMART (E < 0.001). TS is represented by the number of OTCs (organs, tissues or cell types) in which the gene is expressed.

Results: The integrated atlas contains high-confidence subcellular localization information for more than 13,000 human proteins, including, as completely as possible, the subcellular locations of different splice isoforms. Based on this atlas, we found that SLW correlates positively with the DN value and negatively with the DA value; that is, multi-domain proteins or proteins containing only older domains tend to be localized to more subcellular components. In addition, SLW correlates negatively with TS, i.e., proteins with more subcellular locations tend to be expressed widely across OTCs. These findings are consistent with our previous report that multi-domain proteins or proteins containing only older domains tend to be expressed widely across OTCs.

Conclusions: At given physiological states, biological complexity depends more on diverse domain organization than on new types of domain. The relationships of DN and DA with SLW and TS confirm this conclusion from the viewpoint of the spatial distribution of proteins, and reveal that the complexity of protein molecular function is consistent with the functional requirements at the cell and organism levels.
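
The SLW analysis described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: SLW is taken as the number of distinct subcellular locations annotated for a protein, the correlation is computed as a Spearman rank correlation, and the protein identifiers, location names and domain counts are hypothetical toy values, not data from the study.

```python
from collections import defaultdict
from math import sqrt


def localization_width(annotations):
    """Return {protein: number of distinct subcellular locations}.

    Assumption: SLW is the count of distinct annotated locations.
    """
    locations = defaultdict(set)
    for protein, location in annotations:
        locations[protein].add(location)
    return {protein: len(locs) for protein, locs in locations.items()}


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical toy annotations: (protein ID, subcellular location).
annotations = [
    ("P1", "nucleus"), ("P1", "cytoplasm"), ("P1", "membrane"),
    ("P2", "nucleus"), ("P2", "nucleus"),  # duplicate entries count once
    ("P3", "cytoplasm"), ("P3", "mitochondrion"),
]
# Hypothetical domain counts per protein (stand-in for a DN-like value).
domain_counts = {"P1": 4, "P2": 1, "P3": 2}

slw = localization_width(annotations)
proteins = sorted(slw)
rho = spearman([slw[p] for p in proteins], [domain_counts[p] for p in proteins])
print(slw, round(rho, 3))
```

In a real analysis the annotation pairs would come from the integrated atlas and the domain counts from the RPS-BLAST hits; a library routine such as a statistics package's rank correlation would normally replace the hand-written `spearman` helper, which is spelled out here only to keep the sketch self-contained.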