Chin Bull Bot, 2019, Vol. 54, Issue 3: 316-327. doi: 10.11983/CBB18176

RESEARCH ARTICLE

Origin and Evolution of Soybean Protein-coding Genes

Tang Kang, Yang Ruolin

  1. College of Life Sciences, Northwest A&F University, Yangling 712100, China
  • Received: 2018-08-14; Accepted: 2018-12-10; Online: 2019-11-24; Published: 2019-05-01
  • Contact: Yang Ruolin, E-mail: desert.ruolin@gmail.com

Abstract:

The gene composition of a species evolves through a highly dynamic process, in which relatively recently originated lineage- and species-specific genes are continuously integrated into the existing network of older genes. These young genes play important roles in shaping genome architecture and can thereby improve the adaptation of organisms. Gene duplication and de novo origination are two ways of creating new genes, giving rise to gene families with various copy numbers. How, and to what extent, the evolutionary pattern of a gene depends on when it originated remains largely unexplored in soybean. In this study, we selected 19 representative angiosperms and analyzed how gene content dynamics relate to the origination of soybean (Glycine max) genes. Using the gene emergence approach, we found that 58.7% of soybean genes could be dated to ~150 million years ago, whereas 21.7% were recently originated orphan genes. As expected, older genes were subject to stronger purifying selection and were more conserved than young genes. In addition, older genes had higher expression levels and were more likely to undergo alternative splicing. Genes with different copy numbers also differed in these respects. These findings may help us understand the evolutionary models of genes of different ages.

Key words: angiosperms, gene duplication, gene family, gene origin, soybean

Figure 1

Gene family size distribution of 19 angiosperm species. (A) Phylogenetic tree showing the relationships among the 19 angiosperm species used in this study; (B) sizes of homologous gene families; (C) sizes of orphan gene families. Colors indicate the proportions of genes: white for singletons, grey for two-gene families and black for multigene families.

Table 1

Number of homologous gene families (and genes) in 19 angiosperm species

| Species | Singletons | Two-gene families (genes) | Multigene families (genes) | Total gene families | Maximum gene family size |
|---|---|---|---|---|---|
| Amborella trichopoda | 9823 | 1061 (2122) | 523 (2935) | 11407 | 207 |
| Ananas comosus | 9059 | 2087 (4174) | 1007 (4916) | 12153 | 124 |
| Oryza sativa | 11966 | 2269 (4538) | 1167 (5805) | 15402 | 64 |
| Brachypodium distachyon | 11455 | 2264 (4528) | 1209 (6066) | 14928 | 50 |
| Sorghum bicolor | 12663 | 2529 (5058) | 1399 (8749) | 16591 | 416 |
| Zea mays | 10277 | 3568 (7136) | 1964 (10539) | 15809 | 297 |
| Solanum tuberosum | 11592 | 2390 (4780) | 1399 (10741) | 15381 | 1051 |
| S. lycopersicum | 12210 | 2448 (4896) | 1371 (7277) | 16029 | 72 |
| Vitis vinifera | 10408 | 1931 (3862) | 1104 (6483) | 13443 | 100 |
| Populus trichocarpa | 6550 | 5476 (10952) | 2337 (13368) | 14363 | 108 |
| Gossypium raimondii | 7582 | 3700 (7400) | 2960 (14806) | 14242 | 90 |
| Carica papaya | 10776 | 1505 (3010) | 667 (3948) | 12948 | 194 |
| Arabidopsis thaliana | 13278 | 2485 (4970) | 1194 (6144) | 16957 | 125 |
| A. lyrata | 12767 | 2605 (5210) | 1327 (6596) | 16699 | 67 |
| Cucumis sativus | 10152 | 1691 (3382) | 795 (4038) | 12638 | 38 |
| Prunus persica | 10822 | 1876 (3752) | 1106 (7192) | 13804 | 217 |
| Medicago truncatula | 9936 | 2812 (5624) | 1948 (13673) | 14696 | 308 |
| Glycine max | 4241 | 7735 (15470) | 4206 (23027) | 16182 | 153 |
| Phaseolus vulgaris | 11324 | 2873 (5746) | 1430 (7626) | 15569 | 132 |
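Table 1 bins homologous gene families by size and reports each bin as "families (genes)". Below is a minimal Python sketch of that tabulation. It assumes the clustering step (for example an OrthoMCL-style run) has already produced a mapping from family ID to member genes. The function name `summarize_families` and the family and gene IDs are hypothetical and for illustration only.

```python
def summarize_families(
    families: dict[str, list[str]],
) -> tuple[dict[str, tuple[int, int]], int, int]:
    """Tabulate gene families the way Table 1 does (illustrative sketch).

    Returns ({bin: (n_families, n_genes)}, total_families, max_family_size),
    where the bins are singletons (1 gene), two-gene families (2 genes) and
    multigene families (3 or more genes). Empty families are ignored.
    """
    bins = {"singleton": [0, 0], "two-gene": [0, 0], "multigene": [0, 0]}
    max_size = 0
    for members in families.values():
        size = len(members)
        if size == 0:
            continue
        label = "singleton" if size == 1 else "two-gene" if size == 2 else "multigene"
        bins[label][0] += 1      # one more family in this bin
        bins[label][1] += size   # and its member genes
        max_size = max(max_size, size)
    total = sum(n_fam for n_fam, _ in bins.values())
    return {k: (v[0], v[1]) for k, v in bins.items()}, total, max_size


# Hypothetical clustering output: family ID -> member gene IDs.
families = {"F1": ["g1"], "F2": ["g2", "g3"], "F3": ["g4", "g5", "g6", "g7"], "F4": ["g8"]}
bins, total, max_size = summarize_families(families)
print(bins)             # {'singleton': (2, 2), 'two-gene': (1, 2), 'multigene': (1, 4)}
print(total, max_size)  # 4 4
```

In this layout the Total gene families column is the sum of the three family counts. For Glycine max, for example, 4241 + 7735 + 4206 = 16182.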

Table 2

Number of orphan gene families (and genes) in 19 angiosperm species

| Species | Singletons | Two-gene families (genes) | Multigene families (genes) | Species-specific genes | Maximum gene family size |
|---|---|---|---|---|---|
| Amborella trichopoda | 7892 | 547 (1094) | 502 (3447) | 12433 | 105 |
| Ananas comosus | 5685 | 483 (966) | 297 (2224) | 8875 | 94 |
| Oryza sativa | 10774 | 686 (1372) | 292 (1224) | 13370 | 29 |
| Brachypodium distachyon | 3485 | 235 (470) | 125 (548) | 4503 | 15 |
| Sorghum bicolor | 5682 | 350 (700) | 254 (1644) | 8026 | 103 |
| Zea mays | 7253 | 813 (1626) | 552 (2643) | 11522 | 65 |
| Solanum tuberosum | 7278 | 471 (942) | 376 (3688) | 11908 | 163 |
| S. lycopersicum | 7836 | 308 (616) | 177 (950) | 9402 | 51 |
| Vitis vinifera | 7238 | 445 (890) | 229 (1006) | 9134 | 44 |
| Populus trichocarpa | 7923 | 593 (1186) | 281 (1398) | 10507 | 31 |
| Gossypium raimondii | 5495 | 408 (816) | 293 (1406) | 7717 | 26 |
| Carica papaya | 7680 | 307 (614) | 224 (1653) | 9947 | 88 |
| Arabidopsis thaliana | 2751 | 105 (210) | 57 (261) | 3222 | 21 |
| A. lyrata | 5413 | 461 (922) | 366 (1759) | 8094 | 83 |
| Cucumis sativus | 3458 | 125 (250) | 54 (223) | 3931 | 13 |
| Prunus persica | 3347 | 242 (484) | 195 (2483) | 6314 | 838 |
| Medicago truncatula | 12763 | 962 (1924) | 820 (6524) | 21211 | 145 |
| Glycine max | 9961 | 476 (952) | 118 (523) | 11436 | 23 |
| Phaseolus vulgaris | 2013 | 85 (170) | 58 (318) | 2501 | 19 |
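Table 1's last count column counts families. In Table 2, the Species-specific genes column instead counts genes. It equals the number of singletons plus the gene counts, shown in parentheses, of the two-gene and multigene families. The short check below parses the "families(genes)" notation and recomputes that column for three rows copied from Table 2. The helper names `parse_cell` and `species_specific_genes` are hypothetical and are not part of the authors' pipeline.

```python
import re

# Rows copied from Table 2: singletons, two-gene "families(genes)",
# multigene "families(genes)", reported species-specific genes.
ROWS = {
    "Arabidopsis thaliana": (2751, "105(210)", "57(261)", 3222),
    "Medicago truncatula": (12763, "962(1924)", "820(6524)", 21211),
    "Glycine max": (9961, "476(952)", "118(523)", 11436),
}


def parse_cell(cell: str) -> tuple[int, int]:
    """Split a 'families(genes)' cell, with or without a space, into (families, genes)."""
    match = re.fullmatch(r"\s*(\d+)\s*\((\d+)\)\s*", cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return int(match.group(1)), int(match.group(2))


def species_specific_genes(singletons: int, two_gene: str, multigene: str) -> int:
    """Orphan genes = singleton genes + genes in two-gene and multigene families."""
    return singletons + parse_cell(two_gene)[1] + parse_cell(multigene)[1]


for species, (single, two, multi, reported) in ROWS.items():
    print(species, species_specific_genes(single, two, multi), reported)
```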

Table 3

Number of soybean gene families (and genes) assigned to each phylostratum

| Phylostratum (internode) | Genes (%) | Singletons | Two-gene families (genes) | Multigene families (genes) |
|---|---|---|---|---|
| Angiosperm (PS1) | 30932 (58.7%) | 1982 | 5150 (10300) | 3400 (18650) |
| Mesangiosperm (PS2) | 4057 (7.7%) | 508 | 708 (1416) | 359 (2133) |
| Eudicot (PS3) | 2356 (4.5%) | 303 | 521 (1042) | 206 (1011) |
| Rosid (PS4) | 582 (1.1%) | 109 | 181 (362) | 31 (111) |
| Legume (PS5) | 1780 (3.4%) | 460 | 452 (904) | 87 (416) |
| Phaseoleae (PS6) | 1590 (3.0%) | 568 | 400 (800) | 49 (222) |
| Soybean (PS7) | 11436 (21.7%) | 9961 | 476 (952) | 118 (523) |
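The percentages in Table 3 are each phylostratum's gene count divided by the total over PS1-PS7. Summing the column gives that total as 52,733 genes. The quick recomputation below uses only the counts copied from the table.

```python
# Soybean gene counts per phylostratum, copied from Table 3.
GENES_PER_STRATUM = {
    "Angiosperm (PS1)": 30932,
    "Mesangiosperm (PS2)": 4057,
    "Eudicot (PS3)": 2356,
    "Rosid (PS4)": 582,
    "Legume (PS5)": 1780,
    "Phaseoleae (PS6)": 1590,
    "Soybean (PS7)": 11436,
}

total = sum(GENES_PER_STRATUM.values())  # 52733 genes across all strata
percent = {ps: round(100 * n / total, 1) for ps, n in GENES_PER_STRATUM.items()}

for ps, pct in percent.items():
    print(f"{ps}: {GENES_PER_STRATUM[ps]} genes ({pct}%)")
```

The recomputed values match the table: 58.7% for PS1 and 21.7% for the soybean-specific orphan genes (PS7), as stated in the abstract.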

Figure 2

Origination of soybean genes. (A) Numbers in parentheses denote the number of genes in each phylostratum (PS1-PS7); (B) gene fraction; (C) gene copy status; (D) Gene Ontology annotation.
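Each soybean gene is placed in one of seven phylostrata. A common way to do this is the phylostratigraphy approach of Domazet-Lošo et al. (2007), which assigns a gene to the oldest node of the species tree at which a homolog is still detected. The sketch below makes several assumptions:

* Homology calls (for example from BLAST searches) have already been reduced to the set of species in which each gene has a homolog.
* The grouping of the 18 other species by node is an illustrative, hypothetical reading of the species tree, not the authors' exact configuration.
* Gene IDs are made up.

```python
from collections import Counter

# Hypothetical grouping of the non-soybean species by the node at which
# they diverge from the soybean lineage, ordered OLDEST FIRST.
STRATA = [
    ("PS1 Angiosperm", {"Amborella trichopoda"}),
    ("PS2 Mesangiosperm", {"Ananas comosus", "Oryza sativa", "Brachypodium distachyon",
                           "Sorghum bicolor", "Zea mays"}),
    ("PS3 Eudicot", {"Solanum tuberosum", "Solanum lycopersicum"}),
    ("PS4 Rosid", {"Vitis vinifera", "Populus trichocarpa", "Gossypium raimondii",
                   "Carica papaya", "Arabidopsis thaliana", "Arabidopsis lyrata",
                   "Cucumis sativus", "Prunus persica"}),
    ("PS5 Legume", {"Medicago truncatula"}),
    ("PS6 Phaseoleae", {"Phaseolus vulgaris"}),
]
YOUNGEST = "PS7 Soybean"  # no homolog outside soybean: orphan gene


def assign_phylostratum(homolog_species: set[str]) -> str:
    """Return the oldest stratum containing at least one species with a homolog."""
    for stratum, members in STRATA:  # oldest first, so the first hit is the oldest
        if homolog_species & members:
            return stratum
    return YOUNGEST


# Hypothetical homology calls: gene ID -> species with a detected homolog.
homologs = {
    "geneA": {"Amborella trichopoda", "Oryza sativa", "Phaseolus vulgaris"},
    "geneB": {"Medicago truncatula", "Phaseolus vulgaris"},
    "geneC": set(),
}
strata = {gene: assign_phylostratum(sp) for gene, sp in homologs.items()}
print(strata)                   # geneA -> PS1, geneB -> PS5, geneC -> PS7
print(Counter(strata.values()))
```

Counting the assigned strata across all genes yields per-phylostratum totals of the kind reported in Table 3.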

Figure 3

Degree of divergence of soybean genes, estimated between soybean and common bean. (A) Selection pressure (dN/dS); (B) synonymous substitution rate (dS); (C) nonsynonymous substitution rate (dN).
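Panel A shows ω = dN/dS, the ratio of the nonsynonymous to the synonymous substitution rate. By convention:

* Values below 1 indicate purifying selection.
* Values near 1 indicate neutral evolution.
* Values above 1 indicate positive selection.

The abstract's finding that older genes are under stronger purifying selection therefore corresponds to lower ω. The helper below assumes dN and dS have already been estimated for each soybean/common bean gene pair (for example with codeml from PAML). The `tol` band around 1 is an arbitrary illustrative choice, not a threshold taken from the study.

```python
import math


def omega(dn: float, ds: float) -> float:
    """Return dN/dS, or NaN when dS is zero and the ratio is undefined."""
    if ds == 0:
        return math.nan
    return dn / ds


def selection_regime(dn: float, ds: float, tol: float = 0.05) -> str:
    """Classify a gene pair by omega; `tol` is an illustrative band around 1."""
    w = omega(dn, ds)
    if math.isnan(w):
        return "undefined"
    if w < 1 - tol:
        return "purifying"
    if w > 1 + tol:
        return "positive"
    return "neutral"


print(selection_regime(0.02, 0.20))  # omega = 0.1 -> purifying
print(selection_regime(0.30, 0.20))  # omega = 1.5 -> positive
```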

Figure 4

Expression of soybean genes. (A) Expressed genes; (B) expression level; (C) expression specificity.
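Panel C concerns expression specificity. A widely used measure is the tissue-specificity index τ of Yanai et al. (2005):

tau = sum_i (1 - x_i / max_j x_j) / (n - 1), over n tissues.

It ranges from 0 (uniform expression) to 1 (expression confined to one tissue). This section does not state exactly which index the authors computed, so treat the function below as a sketch. It assumes one non-negative expression value per tissue (for example FPKM) for each gene.

```python
def tau(expression: list[float]) -> float:
    """Tissue-specificity index tau (Yanai et al. 2005).

    `expression` holds one non-negative value per tissue. Returns 0 for
    uniform expression and 1 for expression restricted to a single tissue.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("gene is not expressed in any tissue")
    return sum(1 - x / peak for x in expression) / (n - 1)


print(tau([5, 5, 5, 5]))   # 0.0, broadly expressed
print(tau([10, 0, 0, 0]))  # 1.0, tissue-specific
```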

Figure 5

Alternative splicing (AS) of soybean genes. (A) AS events; (B) ratio of AS genes; (C) AS genes by copy status; (D) number of AS events per gene.
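Panels B and C compare the fraction of genes that undergo alternative splicing across gene groups. The sketch below assumes a simplified definition: an AS gene is any gene with at least two distinct assembled isoforms, for example from a StringTie assembly. The event-level classification in panel A would normally come from a dedicated tool such as ASTALAVISTA instead.

The function takes two hypothetical inputs:

* a map from gene to group label (phylostratum or copy status);
* a map from gene to isoform count.

```python
from collections import defaultdict


def as_gene_ratio(group_of: dict[str, str], isoforms_of: dict[str, int]) -> dict[str, float]:
    """Fraction of genes per group that have two or more isoforms.

    group_of maps gene ID -> group label (e.g. phylostratum or copy status);
    isoforms_of maps gene ID -> number of assembled isoforms. Genes missing
    from isoforms_of are treated as having a single isoform (no AS).
    """
    total: dict[str, int] = defaultdict(int)
    spliced: dict[str, int] = defaultdict(int)
    for gene, group in group_of.items():
        total[group] += 1
        if isoforms_of.get(gene, 1) >= 2:
            spliced[group] += 1
    return {group: spliced[group] / total[group] for group in total}


# Hypothetical example: six genes in two phylostrata.
group_of = {"g1": "PS1", "g2": "PS1", "g3": "PS1", "g4": "PS1", "g5": "PS7", "g6": "PS7"}
isoforms_of = {"g1": 3, "g2": 1, "g3": 2, "g4": 1, "g5": 1}
print(as_gene_ratio(group_of, isoforms_of))  # {'PS1': 0.5, 'PS7': 0.0}
```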

Figure 6

Functional enrichment analyses of the core angiosperm genes
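Functional enrichment of a gene set, such as the core angiosperm genes, against a genomic background is commonly tested with a one-sided hypergeometric test for each GO term. This section does not describe the test used for Figure 6, so the function below is a generic sketch rather than the authors' method. It uses four counts:

* N, the number of background genes;
* K, the number of background genes annotated with the term;
* n, the number of genes in the study set;
* k, the number of study-set genes carrying the term.

It returns P(X >= k).

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)  # X cannot exceed the draws or the annotated genes
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy numbers: 10 background genes, 3 annotated, all 3 study genes annotated.
print(hypergeom_enrichment_p(k=3, n=3, K=3, N=10))  # 1/120, about 0.0083
```

When many GO terms are tested, the resulting p-values are usually corrected for multiple testing, for example with the Benjamini-Hochberg procedure.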
