Chinese Bulletin of Botany ›› 2025, Vol. 60 ›› Issue (1): 1-16. DOI: 10.11983/CBB24034  cstr: 32102.14.CBB24034

• Research Article •

Gaps in Plastid DNA Data of Vascular Plants Across Species and Regions (with a long English abstract)

Yan Deng1,2, Limin Lu2,3, Qiang Zhang4,*, Zhiduan Chen2,3, Haihua Hu2,3,*

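    1College of Life Sciences, Guangxi Normal University, Guilin 541006
    2State Key Laboratory of Plant Diversity and Specialty Crops, Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093
    3China National Botanical Garden, Beijing 100093
    4Guangxi Key Laboratory of Plant Conservation and Restoration Ecology in Karst Terrain, Guangxi Institute of Botany, Guangxi Zhuang Autonomous Region and Chinese Academy of Sciences, Guilin 541006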
  • Received: 2024-03-06 Accepted: 2024-05-27 Online: 2025-01-10 Published: 2024-05-27
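  • Corresponding authors: Haihua Hu, Associate Professor at the Institute of Botany, Chinese Academy of Sciences. Hu has long worked on reconstructing the vascular plant tree of life and on biodiversity patterns and conservation, and has published more than 20 research papers, including 4 as first author (or co-first author) in journals such as National Science Review, Fundamental Research, and Journal of Systematics and Evolution. Hu's research team on plant big data and biodiversity conservation uses multidisciplinary approaches to investigate the phylogenetic relationships and evolution of plants at higher taxonomic levels, and combines results from morphology, paleobotany, and molecular systematics to study the origin, diversification, and present-day geographic distribution patterns of plant groups and their causes. In recent years, building on the tree of life and combining it with massive species distribution data, the team has explored the evolutionary history of floras, diversity patterns and their causes, and biodiversity conservation strategies across temporal and spatial dimensions. E-mail: huhh@ibcas.ac.cn; Qiang Zhang, Professor at the Guangxi Institute of Botany, Guangxi Zhuang Autonomous Region and Chinese Academy of Sciences. Zhang works mainly on the phylogeny and biogeography of angiosperms (Chloranthaceae and other groups), the adaptive evolution of karst plants (such as Semiaquilegia in Ranunculaceae, and Gesneriaceae), and methods in molecular evolution and bioinformatics. Zhang has published more than 30 research papers and developed alignmentFilter, a software tool for quality filtering of homologous sequence alignment matrices. E-mail: qiangzhang04@126.com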
  • Funding:
    National Natural Science Foundation of China (32200190); National Natural Science Foundation of China (32122009)

A Comprehensive Evaluation of the Plastid DNA Data Gaps of Vascular Plants in Species and Geographic Area

Yan Deng1,2, Limin Lu2,3, Qiang Zhang4,*, Zhiduan Chen2,3, Haihua Hu2,3,*

    1College of Life Sciences, Guangxi Normal University, Guilin 541006, China
    2Key Laboratory of Systematic and Evolutionary Botany, State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
    3China National Botanical Garden, Beijing 100093, China
    4Guangxi Key Laboratory of Plant Conservation and Restoration Ecology in Karst Terrain, Guangxi Institute of Botany, Chinese Academy of Sciences, Guilin 541006, China
  • Received: 2024-03-06 Accepted: 2024-05-27 Online: 2025-01-10 Published: 2024-05-27
  • Contact: * E-mail: qiangzhang04@126.com; huhh@ibcas.ac.cn

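Abstract: In the era of plant big data, sequencing data underpin many biological studies, and understanding their current state helps researchers make better use of them. Plastid DNA data are widely used because they are easy to obtain, uniparentally inherited, and have a moderate rate of variation. Based on the public GenBank database, we comprehensively evaluated and analyzed the sampling of plastid DNA data for vascular plants worldwide. The results show that only 33.75% of vascular plant species have been sequenced. Sequenced species are unevenly sampled among lineages, and the proportion of missing data is broadly and significantly positively correlated with lineage diversity. The orders with the most severe gaps are Paracryphiales, Piperales, and Dilleniales, and the families with the most severe gaps are Triuridaceae, Pentaphragmataceae, and Xyridaceae. Geographically, the degree of missing data for vascular plants decreases from the equator toward the poles, and gaps are more severe in regions of high biodiversity, including several biodiversity hotspots. In addition, data for species endemic to each region are generally severely lacking. Based on these results, we recommend prioritizing collection and sequencing for lineages with a high degree of missing molecular data and for regions of high biodiversity, with particular attention to supplementary sampling of endemic species, to improve the representativeness of genetic data for these groups.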

Keywords: plastid DNA, vascular plants, missing data, plant big data, GenBank

Abstract: INTRODUCTION: Molecular data are among the most important foundations of many biological studies, including phylogeny, ecology, and biogeography. Incomplete sampling may lead to biased results and unsound conclusions. However, few studies have comprehensively evaluated the current sampling density of DNA sequence data. Plastid DNA sequences have been used extensively in plant research because they are easy to obtain, uniparentally inherited, and have a moderate mutation rate. It is therefore essential to assess the current sampling density of plastid DNA data across species and geographic areas so that researchers can make better use of these data.

RATIONALE: GenBank is the largest and most commonly used database of DNA sequence data. In this study, we used GenBank to investigate gaps in plastid DNA data for vascular plants across species and geographic areas. First, plastid DNA data for vascular plant species were downloaded from GenBank and cleaned. Second, species names were standardized according to the World Checklist of Vascular Plants (WCVP) database. Third, to evaluate the current sampling density, we counted the number of species with sequenced plastid DNA and calculated the proportion of missing data for each order and family. We also mapped the proportion of missing data in each region to evaluate sampling density geographically. To investigate factors that may influence the plastid DNA data gap, we calculated Spearman's correlations between the proportion of missing data and species diversity among major groups of vascular plants and among regions.
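The counting and correlation steps described in the RATIONALE can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the checklist, the family names, the set of sequenced species, and the helper functions `missing_proportion`, `average_ranks`, and `spearman` are all hypothetical, and Spearman's coefficient is computed here as the Pearson correlation of tie-averaged ranks.

```python
from collections import defaultdict


def missing_proportion(checklist, sequenced):
    """Return {family: (richness, proportion of species without a sequence)}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for species, family in checklist.items():
        totals[family] += 1
        if species not in sequenced:
            missing[family] += 1
    return {fam: (totals[fam], missing[fam] / totals[fam]) for fam in totals}


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy checklist: accepted species name -> family.
checklist = {
    "sp_a1": "FamA", "sp_a2": "FamA", "sp_a3": "FamA", "sp_a4": "FamA",
    "sp_b1": "FamB", "sp_b2": "FamB",
    "sp_c1": "FamC",
}
# Hypothetical set of species with at least one plastid DNA record.
sequenced = {"sp_a1", "sp_b1", "sp_c1"}

summary = missing_proportion(checklist, sequenced)
families = sorted(summary)
richness = [summary[f][0] for f in families]
missing = [summary[f][1] for f in families]
rho = spearman(richness, missing)
print(summary, rho)
```

In this toy data the richest family also has the largest share of unsequenced species, so the rank correlation is positive. A real analysis would start from name-standardized GenBank records and would also need a significance test, which this sketch omits.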

RESULTS: Only 33.75% of vascular plant species have at least one plastid DNA record in GenBank, covering 139 005 vascular plant species (angiosperms: 131 220 species, gymnosperms: 1 154 species, and pteridophytes: 6 631 species). Across species, sequenced species were unevenly sampled among lineages, and the proportion of missing data was generally positively correlated with species richness within lineages. The three orders with the highest proportion of missing data were Paracryphiales, Piperales, and Dilleniales, and the three families with the highest proportion were Triuridaceae, Pentaphragmataceae, and Xyridaceae. Across geographic areas, the proportion of missing plastid DNA data for vascular plant species followed a latitudinal gradient, decreasing from the equator to the poles. Regions with a high proportion of missing data usually have high biodiversity and include many biodiversity hotspots. In addition, endemic species generally had a high proportion of missing data in most regions.
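The reported counts can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures given in the RESULTS paragraph. The implied checklist size is an inference from two rounded numbers, not a value stated in the abstract, and the variable names are illustrative.

```python
# Counts of sequenced species as reported in the RESULTS paragraph.
sequenced = {"angiosperms": 131_220, "gymnosperms": 1_154, "pteridophytes": 6_631}
total_sequenced = sum(sequenced.values())

# Reported share of vascular plant species with a record (33.75%).
reported_share = 0.3375

# Inferred, not reported: the number of species the share would correspond to.
implied_checklist_size = round(total_sequenced / reported_share)

# Share of each major group among the sequenced species.
shares = {group: n / total_sequenced for group, n in sequenced.items()}

print(total_sequenced, implied_checklist_size, shares)
```

The three group counts sum to the stated total of 139 005 species, which is consistent with the "about 140 000" and "1/3" figures quoted in the CONCLUSION.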

CONCLUSION: Our research comprehensively evaluated the current sampling density of plastid DNA data across species and geographic areas. Our results suggest that about 140 000 vascular plant species have sequenced plastid DNA. However, large gaps remain in the plastid DNA data of vascular plants in three respects: (1) only about one-third of vascular plant species have been sequenced; (2) the proportion of species with sequenced plastid DNA is uneven among lineages; (3) the proportion of missing data decreases from the equator to the poles, with larger deficiencies in biodiversity hotspots and among endemic species. Based on these results, we propose prioritizing the collection and sequencing of vascular plants in groups with a high proportion of missing data and in regions with high biodiversity, particularly endemic species. Our research indicates where effort is needed to fill the plastid DNA data gap and will benefit biodiversity conservation.

Key words: plastid DNA, vascular plants, missing data, plant big data, GenBank