Transcriptome sequencing was performed on four seed-kernel samples collected 70 and 160 days after flowering from two nationally approved elite cultivars of Eucommia ulmoides, ‘Huazhong 6’ (‘华仲6号’) and ‘Huazhong 10’ (‘华仲10号’). The sequencing data were assembled, functionally annotated, and classified, and the microsatellite characteristics of the resulting unigenes were analyzed. The samples were sequenced on the Illumina HiSeq™ 2000 high-throughput platform and assembled with Trinity; the unigene sequences were compared against the Nr, GO, COG, and KEGG databases using BLAST; and MISA was used to search the 96,469 unigenes of the transcriptome for SSRs. Sequencing yielded 72,791,399 high-quality reads (clean reads) comprising 14,702,548,161 bp. Assembly of the reads produced 96,469 unigenes with an average length of 690 bp, for a total of 66.56 Mb of sequence. Homology analysis annotated 49,856 unigenes with homologs in other species, accounting for 51.68% of all unigenes. Comparison with the GO database classified 38,983 annotated unigenes into 56 branches within three major categories (cellular component, molecular function, and biological process); 14,796 annotated unigenes were assigned to 25 COG functional categories; and, with the KEGG database as the reference, 11,260 annotated unigenes were mapped to 117 metabolic pathway branches. The SSR search showed that the 96,469 unigenes contained 9,621 perfect SSR loci, accounting for 84.14% of all SSR loci. The perfect SSR loci comprised 55 repeat motif types; the most frequent was the mononucleotide repeat A/T (4,597), followed by AG/CT (2,597) and AT/AT (439).
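The perfect-SSR search and motif grouping summarized above can be illustrated with a minimal Python sketch. It is not MISA's code and not the authors' pipeline. The minimum repeat counts in `MIN_REPEATS` are an assumption (the abstract does not state the thresholds used), and the function names `find_perfect_ssrs` and `motif_class` are hypothetical. Motifs are grouped with their reverse complements, which is how labels such as A/T, AG/CT, and AT/AT arise.

```python
import re
from collections import Counter

# Assumed minimum repeat counts per motif length (1 = mono ... 6 = hexa).
# The abstract does not report the thresholds actually used.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_perfect_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA").
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


def motif_class(motif):
    """Group a motif with its rotations and reverse complement, e.g. 'AG/CT'."""
    def min_rotation(s):
        return min(s[i:] + s[:i] for i in range(len(s)))

    rev_comp = motif.translate(COMPLEMENT)[::-1]
    first, second = sorted((min_rotation(motif), min_rotation(rev_comp)))
    return f"{first}/{second}"


# Toy sequence (made up for illustration, not data from the study).
toy = "GGC" + "A" * 12 + "CCT" + "AG" * 7 + "TTC"
ssrs = find_perfect_ssrs(toy)
counts = Counter(motif_class(motif) for _, motif, _ in ssrs)
print(ssrs)
print(counts)
```

Applied to every unigene in an assembly, a tally like `counts` would give per-motif frequencies of the kind reported in the abstract, although the exact numbers depend on the thresholds chosen.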