Objective: To annotate a 478 kb finished genomic sequence at 3p24-p25, sequenced as part of the human genome sequencing project undertaken by China. Methods: Protein-coding genes in the genomic sequence were identified by ab initio prediction, database similarity searches, and alignment of full-length or partial mRNA sequences against the genomic sequence. The compositional features of the sequence were analyzed with the EMBOSS software package. Results: Two known protein-coding genes, SLC6A1 and SLC6A11, were identified in the region; the latter had not been mapped in the draft genome sequence. Compositional analysis showed that the sequence has an average GC content of 47% and contains three putative CpG islands, located at 130685-131516 bp, 307090-307870 bp, and 415585-416308 bp. Conclusion: The 478 kb genomic sequence at 3p24-p25 was correctly annotated using these methods, revealing its gene structure, GC content, CpG islands, and related features.
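The compositional analysis described in the abstract (average GC content and putative CpG islands) can be illustrated with a minimal Python sketch. This is not the EMBOSS implementation, and the abstract does not state which parameters were used. The thresholds below (100 bp window, minimum island length 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) are assumptions based on commonly cited CpG island criteria, and the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the A/C/G/T bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def find_cpg_islands(seq, window=100, min_len=200, min_gc=50.0, min_oe=0.6):
    """Return candidate islands as (start, end), 1-based and inclusive.

    Consecutive windows that pass both thresholds are merged into one
    region; regions shorter than min_len are discarded. All thresholds
    are assumed values, not taken from the abstract.
    """
    islands = []
    run_start = None  # 0-based start of the first passing window in a run
    last_pass = None  # 0-based start of the latest passing window in a run

    def close_run():
        # Region spans from the first passing window to the end of the last.
        end = last_pass + window  # 0-based exclusive == 1-based inclusive
        if end - run_start >= min_len:
            islands.append((run_start + 1, end))

    for i in range(len(seq) - window + 1):
        w = seq[i:i + window]
        if gc_content(w) > min_gc and cpg_obs_exp(w) > min_oe:
            if run_start is None:
                run_start = i
            last_pass = i
        elif run_start is not None:
            close_run()
            run_start = None
    if run_start is not None:
        close_run()
    return islands


if __name__ == "__main__":
    # Synthetic sequence: a CpG-rich stretch flanked by AT-only sequence.
    demo = "AT" * 200 + "CG" * 150 + "AT" * 200
    print(f"GC content: {gc_content(demo):.1f}%")
    print("Candidate CpG islands:", find_cpg_islands(demo))
```

The scan recomputes base counts for every window, which is simple but slow; for a sequence of several hundred kilobases, a running count updated as the window slides would be the more practical choice.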