Chinese Bulletin of Botany, 2019, Vol. 54, Issue (3): 316-327. doi: 10.11983/CBB18176

• Research Article •

Origin and Evolution of Soybean Protein-coding Genes

Tang Kang, Yang Ruolin

  1. College of Life Sciences, Northwest A&F University, Yangling 712100, China
  • Received: 2018-08-14 Accepted: 2018-12-10 Online: 2019-05-01 Published: 2019-11-24
  • Contact: Yang Ruolin E-mail: desert.ruolin@gmail.com
  • Funding:
    Shaanxi Province "Hundred Talents Program" (SXBR8025)

Abstract:

The gene composition of a species evolves dynamically: lineage- and species-specific genes of relatively recent origin are continuously integrated into the existing network of older genes. These young genes play important roles in shaping genome architecture and can improve the adaptability of a species. Gene duplication and de novo origination are two ways in which new genes arise and gene family sizes change. How, and to what extent, the evolutionary pattern of a gene depends on when it originated remains largely unexplored in soybean (Glycine max). In this study, we selected 19 representative angiosperm genomes and analyzed the potential relationship between gene content dynamics and the origination of soybean genes. Using the gene emergence approach, we found that about 58.7% of soybean genes could be dated to ~150 million years ago, whereas 21.7% were recently originated orphan genes. Compared with young genes, older genes were subject to stronger purifying selection and were more conserved. Older genes also had higher expression levels and were more likely to undergo alternative splicing. Genes with different copy numbers likewise differed markedly in these features. These findings may help clarify the evolutionary patterns of genes of different ages.

Key words: angiosperms, gene duplication, gene family, gene origin, soybean
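The gene emergence approach named in the abstract dates a gene family by the most distant lineage in which a homolog is found. The following is a minimal sketch of that idea, not the authors' pipeline. The function name, the choice of one representative outgroup per stratum, and the outgroup-to-stratum mapping are illustrative assumptions based on standard angiosperm taxonomy and the PS1-PS7 labels used in this article.

```python
# Illustrative assumption: one representative outgroup per phylostratum,
# numbered by the node at which it diverged from the soybean lineage.
OUTGROUP_STRATUM = {
    "Amborella trichopoda": 1,  # Angiosperm (PS1)
    "Oryza sativa": 2,          # Mesangiosperm (PS2)
    "Solanum lycopersicum": 3,  # Eudicot (PS3)
    "Arabidopsis thaliana": 4,  # Rosid (PS4)
    "Medicago truncatula": 5,   # Legume (PS5)
    "Phaseolus vulgaris": 6,    # Phaseoleae (PS6)
}
SOYBEAN_ONLY = 7  # Soybean (PS7): no homolog detected outside Glycine max


def assign_phylostratum(species_with_homologs):
    """Return the oldest stratum (smallest PS number) among species
    that carry a homolog of a soybean gene family (hypothetical helper)."""
    strata = [
        OUTGROUP_STRATUM[s]
        for s in species_with_homologs
        if s in OUTGROUP_STRATUM
    ]
    # A family seen only in soybean falls back to PS7 (orphan genes).
    return min(strata, default=SOYBEAN_ONLY)


if __name__ == "__main__":
    family = {"Glycine max", "Phaseolus vulgaris", "Medicago truncatula"}
    print(assign_phylostratum(family))
```

In this sketch, a family with homologs in Medicago truncatula but in no more distant outgroup is placed at the Legume node, while a family found only in soybean is treated as an orphan.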

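Figure 1

Gene family size distribution in 19 angiosperms. (A) Phylogenetic tree showing the evolutionary relationships among the 19 angiosperms; (B) orthologous gene family sizes; (C) orphan gene family sizes. White, gray, and black indicate the proportions of single-copy, two-copy, and multicopy genes, respectively.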

Table 1

Number of orthologous gene families (and genes) in 19 angiosperms. Values in parentheses are gene counts.

| Species | Singletons | Two-gene families (genes) | Multigene families (genes) | Total gene families | Maximum gene family size |
| --- | --- | --- | --- | --- | --- |
| Amborella trichopoda | 9823 | 1061 (2122) | 523 (2935) | 11407 | 207 |
| Ananas comosus | 9059 | 2087 (4174) | 1007 (4916) | 12153 | 124 |
| Oryza sativa | 11966 | 2269 (4538) | 1167 (5805) | 15402 | 64 |
| Brachypodium distachyon | 11455 | 2264 (4528) | 1209 (6066) | 14928 | 50 |
| Sorghum bicolor | 12663 | 2529 (5058) | 1399 (8749) | 16591 | 416 |
| Zea mays | 10277 | 3568 (7136) | 1964 (10539) | 15809 | 297 |
| Solanum tuberosum | 11592 | 2390 (4780) | 1399 (10741) | 15381 | 1051 |
| S. lycopersicum | 12210 | 2448 (4896) | 1371 (7277) | 16029 | 72 |
| Vitis vinifera | 10408 | 1931 (3862) | 1104 (6483) | 13443 | 100 |
| Populus trichocarpa | 6550 | 5476 (10952) | 2337 (13368) | 14363 | 108 |
| Gossypium raimondii | 7582 | 3700 (7400) | 2960 (14806) | 14242 | 90 |
| Carica papaya | 10776 | 1505 (3010) | 667 (3948) | 12948 | 194 |
| Arabidopsis thaliana | 13278 | 2485 (4970) | 1194 (6144) | 16957 | 125 |
| A. lyrata | 12767 | 2605 (5210) | 1327 (6596) | 16699 | 67 |
| Cucumis sativus | 10152 | 1691 (3382) | 795 (4038) | 12638 | 38 |
| Prunus persica | 10822 | 1876 (3752) | 1106 (7192) | 13804 | 217 |
| Medicago truncatula | 9936 | 2812 (5624) | 1948 (13673) | 14696 | 308 |
| Glycine max | 4241 | 7735 (15470) | 4206 (23027) | 16182 | 153 |
| Phaseolus vulgaris | 11324 | 2873 (5746) | 1430 (7626) | 15569 | 132 |
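To make the copy-number contrast in Table 1 concrete, the following sketch computes the share of orthologous gene families that are single-copy for three species. The counts are copied from Table 1. The dictionary layout and the function name are illustrative assumptions, not part of the original analysis.

```python
# (singletons, two-gene families, multigene families, total families),
# copied from Table 1 for three of the 19 species.
TABLE1_FAMILIES = {
    "Glycine max": (4241, 7735, 4206, 16182),
    "Medicago truncatula": (9936, 2812, 1948, 14696),
    "Arabidopsis thaliana": (13278, 2485, 1194, 16957),
}


def singleton_share(species):
    """Percentage of orthologous gene families that are single-copy
    (hypothetical helper for illustration)."""
    single, two, multi, total = TABLE1_FAMILIES[species]
    # The total column should equal the sum of the three family classes.
    if single + two + multi != total:
        raise ValueError(f"family counts do not add up for {species}")
    return 100 * single / total


if __name__ == "__main__":
    for name in TABLE1_FAMILIES:
        print(f"{name}: {singleton_share(name):.1f}% single-copy families")
```

With these counts, roughly a quarter of soybean orthologous families are single-copy, well below the share in the other two species.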

Table 2

Number of orphan gene families (and genes) in 19 angiosperms. Values in parentheses are gene counts.

| Species | Singletons | Two-gene families (genes) | Multigene families (genes) | Species-specific genes | Maximum gene family size |
| --- | --- | --- | --- | --- | --- |
| Amborella trichopoda | 7892 | 547 (1094) | 502 (3447) | 12433 | 105 |
| Ananas comosus | 5685 | 483 (966) | 297 (2224) | 8875 | 94 |
| Oryza sativa | 10774 | 686 (1372) | 292 (1224) | 13370 | 29 |
| Brachypodium distachyon | 3485 | 235 (470) | 125 (548) | 4503 | 15 |
| Sorghum bicolor | 5682 | 350 (700) | 254 (1644) | 8026 | 103 |
| Zea mays | 7253 | 813 (1626) | 552 (2643) | 11522 | 65 |
| Solanum tuberosum | 7278 | 471 (942) | 376 (3688) | 11908 | 163 |
| S. lycopersicum | 7836 | 308 (616) | 177 (950) | 9402 | 51 |
| Vitis vinifera | 7238 | 445 (890) | 229 (1006) | 9134 | 44 |
| Populus trichocarpa | 7923 | 593 (1186) | 281 (1398) | 10507 | 31 |
| Gossypium raimondii | 5495 | 408 (816) | 293 (1406) | 7717 | 26 |
| Carica papaya | 7680 | 307 (614) | 224 (1653) | 9947 | 88 |
| Arabidopsis thaliana | 2751 | 105 (210) | 57 (261) | 3222 | 21 |
| A. lyrata | 5413 | 461 (922) | 366 (1759) | 8094 | 83 |
| Cucumis sativus | 3458 | 125 (250) | 54 (223) | 3931 | 13 |
| Prunus persica | 3347 | 242 (484) | 195 (2483) | 6314 | 838 |
| Medicago truncatula | 12763 | 962 (1924) | 820 (6524) | 21211 | 145 |
| Glycine max | 9961 | 476 (952) | 118 (523) | 11436 | 23 |
| Phaseolus vulgaris | 2013 | 85 (170) | 58 (318) | 2501 | 19 |

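Table 3

Number of soybean gene families (and genes) assigned to each phylostratum. Values in parentheses are gene counts unless marked as percentages.

| Phylostratum internode | Genes (%) | Singletons | Two-gene families (genes) | Multigene families (genes) |
| --- | --- | --- | --- | --- |
| Angiosperm (PS1) | 30932 (58.7%) | 1982 | 5150 (10300) | 3400 (18650) |
| Mesangiosperm (PS2) | 4057 (7.7%) | 508 | 708 (1416) | 359 (2133) |
| Eudicot (PS3) | 2356 (4.5%) | 303 | 521 (1042) | 206 (1011) |
| Rosid (PS4) | 582 (1.1%) | 109 | 181 (362) | 31 (111) |
| Legume (PS5) | 1780 (3.4%) | 460 | 452 (904) | 87 (416) |
| Phaseoleae (PS6) | 1590 (3.0%) | 568 | 400 (800) | 49 (222) |
| Soybean (PS7) | 11436 (21.7%) | 9961 | 476 (952) | 118 (523) |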
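The percentages in Table 3 follow directly from the per-stratum gene counts. As a worked check, the sketch below recomputes them; the counts are copied from the table, and the variable names are illustrative.

```python
# Soybean gene counts per phylostratum, copied from Table 3.
GENES_PER_STRATUM = {
    "PS1": 30932,  # Angiosperm
    "PS2": 4057,   # Mesangiosperm
    "PS3": 2356,   # Eudicot
    "PS4": 582,    # Rosid
    "PS5": 1780,   # Legume
    "PS6": 1590,   # Phaseoleae
    "PS7": 11436,  # Soybean (orphan genes)
}

# Total number of soybean genes assigned to any stratum.
TOTAL_GENES = sum(GENES_PER_STRATUM.values())

# Share of each stratum as a percentage, rounded to one decimal place.
SHARES = {
    stratum: round(100 * count / TOTAL_GENES, 1)
    for stratum, count in GENES_PER_STRATUM.items()
}

if __name__ == "__main__":
    print(f"total genes: {TOTAL_GENES}")
    for stratum, share in SHARES.items():
        print(f"{stratum}: {share}%")
```

The recomputed shares for PS1 and PS7 match the 58.7% and 21.7% quoted in the abstract.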

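Figure 2

Origin of soybean genes. (A) Number of genes at each origination node (PS1-PS7); (B) proportion of genes; (C) gene copy-number status; (D) GO annotation of genes.

Figure 3

Sequence divergence of soybean genes. Selective pressure (dN/dS) (A), synonymous substitution rate (dS) (B), and nonsynonymous substitution rate (dN) (C) were estimated from homologous gene pairs between soybean and common bean.

Figure 4

Expression of soybean genes. (A) Expressed genes; (B) expression level; (C) expression specificity.

Figure 5

Alternative splicing (AS) of soybean genes. (A) AS events; (B) proportion of genes undergoing AS; (C) genes undergoing AS by copy-number status; (D) number of AS events per gene.

Figure 6

Functional enrichment analysis of core angiosperm genes.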
