植物学报 (Chinese Bulletin of Botany)


维管植物质体DNA数据在物种和区域上的空缺研究

邓言1, 2, 鲁丽敏2, 3, 张强4*, 陈之端2, 3, 胡海花2, 3*
  

  1广西师范大学生命科学学院, 广西 桂林 541006; 2中国科学院植物研究所 植物多样性与特色经济作物重点实验室, 北京 100093; 3国家植物园, 北京 100093; 4广西壮族自治区中国科学院广西植物研究所 广西喀斯特植物保育与恢复生态学重点实验室, 广西 桂林 541006



  • 收稿日期:2024-03-06 修回日期:2024-05-16 出版日期:2024-05-27 发布日期:2024-05-27
  • 通讯作者: 张强, 胡海花

A Comprehensive Evaluation of the Plastid DNA Data Gaps of Vascular Plants in Species and Geographic Space

Yan Deng1, 2, Limin Lu2, 3, Qiang Zhang4*, Zhiduan Chen2, 3, Haihua Hu2, 3*

  1College of Life Sciences, Guangxi Normal University, Guilin 541006, China; 2State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China; 3China National Botanical Garden, Beijing 100093, China; 4Guangxi Key Laboratory of Plant Conservation and Restoration Ecology in Karst Terrain, Guangxi Institute of Botany, Chinese Academy of Sciences, Guilin 541006, China



  • Received: 2024-03-06 Revised: 2024-05-16 Online: 2024-05-27 Published: 2024-05-27
  • Contact: Qiang Zhang, Haihua Hu

摘要: 在植物大数据时代, 测序数据成为众多生物学研究的重要基础, 了解测序数据的现状有利于科研人员更好地利用这些数据。质体DNA数据因其易获取、单亲遗传、变异速率适中等特点应用十分广泛。本研究基于GenBank公共数据库对全世界维管植物质体DNA数据取样情况进行了全面评估和分析。结果表明, 仅有33.75%的维管植物已测序。已测序的物种在不同类群间取样不均衡, 缺失率大致与类群多样性呈显著正相关, 其中缺失最严重的目和科分别是盔被花目(Paracryphiales)、胡椒目(Piperales)和五桠果目(Dilleniales), 以及霉草科(Triuridaceae)、五膜草科(Pentaphragmataceae)和黄眼草科(Xyridaceae)。在地理空间上, 维管植物数据缺失程度由赤道向两极递减, 且生物多样性高的地区缺失比较严重, 包括多个生物多样性热点地区。此外, 各地区特有种的数据缺失普遍严重。基于上述结果, 本文建议针对分子数据缺失程度较高的类群和区域进行重点采集和测序, 尤其注重特有种的补充取样, 以增加这些类群遗传数据的代表性。

关键词: 质体DNA, 维管植物, 数据缺失, 植物大数据, GenBank

Abstract: In the era of plant big data, sequencing data underpin many biological studies, and understanding their current state helps researchers use them more effectively. Plastid DNA sequences are widely used in plant research because they are easy to obtain, uniparentally inherited, and evolve at a moderate rate. In this study, we comprehensively evaluated the sampling of plastid DNA data for vascular plants worldwide, based on the GenBank public database. The proportion of sequenced species is low: only 33.75% of vascular plant species have plastid DNA data. Sequenced species are unevenly sampled among lineages, and the missing-data ratio is generally positively correlated with the species richness of a lineage. The three orders with the highest missing-data ratios are Paracryphiales, Piperales, and Dilleniales, and the three families with the highest ratios are Triuridaceae, Pentaphragmataceae, and Xyridaceae. Geographically, the missing-data ratio for plastid DNA of vascular plants follows a latitudinal gradient, decreasing from the equator toward the poles. Regions with high missing-data ratios usually have high biodiversity and include many biodiversity hotspots. In addition, endemic species have a high proportion of missing data in most regions. Based on these results, we suggest prioritizing collection and sequencing for groups with high missing-data ratios and for regions with high biodiversity, with particular attention to endemic species, to improve the representativeness of genetic data for these species and regions.

Key words: Plastid DNA, vascular plants, molecular data gaps, plant big data, GenBank