

RESEARCH ARTICLE

Complete chloroplast genome of Sophora alopecuroides (Papilionoideae): molecular structures, comparative genome analysis and phylogenetic analysis

XI ZHA, XIAOYANG WANG, JINRONG LI, FEI GAO* and YIJUN ZHOU*

College of Life and Environmental Sciences, Minzu University of China, #27, Zhongguancun South Street, Haidian, Beijing 100081, People’s Republic of China

*For correspondence. E-mail: Fei Gao, [email protected]; Yijun Zhou, [email protected].

Received 4 June 2019; revised 9 August 2019; accepted 13 November 2019

Abstract. Sophora alopecuroides belongs to the genus Sophora of the subfamily Papilionoideae. It is mainly distributed in the desert and semi-desert areas of northern China, and has high medicinal value and ecological function. Previous studies have reported the chemical composition and ecological functions of S. alopecuroides. However, only a few reports are available on the genomic information of S. alopecuroides, especially the chloroplast genome, which greatly limits the study of its evolutionary relationships with other species of Papilionoideae. Here, we report the complete chloroplast genome of S. alopecuroides. The size of the chloroplast genome is 155,207 bp, and the GC content is 36.44%. The S. alopecuroides chloroplast genome consists of 132 genes, including 83 protein-coding genes, 41 transfer RNA (tRNA) genes, and eight ribosomal RNA (rRNA) genes. Phylogenetic analysis revealed the taxonomic position of S. alopecuroides in Papilionoideae, and showed that the genus Sophora and the genus Ammopiptanthus are closely related. Comparative genomics analysis revealed gene rearrangements in the evolution of S. alopecuroides. The comparison between S. alopecuroides and other species of Papilionoideae identified a novel 23 kb inversion between trnC-GCA and trnF-GAA, which occurred before the divergence of Sophora and Ammopiptanthus of Thermopsideae. This study provides essential data for understanding the phylogenetic status of S. alopecuroides.

Keywords. chloroplast genome; comparative genomics; phylogenetic analysis; IR region; Sophora alopecuroides.

Introduction

Sophora alopecuroides, a perennial herb belonging to the genus Sophora of Papilionoideae, is mainly distributed in the northwestern regions of China: Xinjiang, Qinghai, Ningxia and Tibet. Owing to its well-developed underground root system, S. alopecuroides plays an important role in soil conservation and fixation, and in biological nitrogen fixation (Iinuma et al. 1995; Tanaka et al. 1998; Liang et al. 2012). S. alopecuroides contains alkaloids, flavonoids, organic acids, polysaccharides and lignans, and studies have shown that the alkaloids extracted from S. alopecuroides have various pharmacological effects such as anti-tumour, anti-arrhythmia, antibacterial, anti-toxin and immune regulation (Zhang et al. 2014a). Due to its high medicinal value and ecological significance, more and more attention has been paid to the protection and exploitation of S. alopecuroides resources (Lu et al. 2014; Li et al. 2016).

In general, chloroplasts are important semiautonomous organelles in plants that provide the necessary energy (Daniell et al. 2016). The chloroplast genome of angiosperms is a double-stranded circular DNA molecule with a highly conserved structure, including a large single copy region (LSC), a small single copy region (SSC), and two inverted repeat (IR) sequences of the same size. Usually the genome size is about 120–160 kb and it contains about 130 genes. Due to the slow evolution of the plant chloroplast genome, it has long been used for plant classification and molecular evolution studies. However, the multiple differentiation of chloroplast DNA led to gene rearrangement (Jansen and Palmer 1987; Yang et al. 2019) and gene loss, which altered the conservation of the chloroplast genome (Tangphatsornruang et al. 2009; Tonti-Filippini et al. 2017). Because of the maternal inheritance and conserved structural features of the chloroplast genome, intact chloroplast DNA can be used in plant systematics and evolution studies (Wicke et al. 2011), especially in comparative analysis of chloroplast genomes in closely related species. Previous studies have shown that chloroplast genome sequences increase phylogenetic resolution at lower taxonomic levels and are an effective tool for plant phylogeny and population genetic analysis (Parks et al. 2009). In recent years, sequencing and phylogenetic analysis of the complete chloroplast genome has been an efficient and relatively low-cost method for improving intraclass classification and population analysis. In particular, the comparative analysis of the complete chloroplast genomes of single or several species provides research ideas for the study of species phylogeny and genetic evolution (Lemieux et al. 2016).

Journal of Genetics (2020) 99:13 © Indian Academy of Sciences https://doi.org/10.1007/s12041-019-1173-3

Papilionoideae is the largest subfamily of the Leguminosae, including 32 tribes, 440 genera and about 12,000 species (http://frps.iplant.cn/frps/PAPILIONOIDEAE). Papilionoideae is a monophyletic group widely distributed throughout the world. In China, Papilionoideae comprises 128 genera and more than 1000 species, and the Sophoreae in China include six genera and 21 species (http://frps.iplant.cn/frps/Trib.%20SOPHOREAE). Almost all species of Papilionoideae form nodules and can fix nitrogen (Wunderlin 1982). Papilionoideae contains many economically valuable plants such as Glycine max, Phaseolus vulgaris, Medicago sativa and Ormosia hosiei. Early phylogenetic studies showed that Crotalarieae, Euchresteae, Genisteae, Podalyrieae, Sophoreae, and Thermopsideae belong to the ‘core genistoids’, but the relationships among the groups within this branch are still far from being resolved. Determination of the chloroplast genome of S. alopecuroides will provide important molecular evidence for understanding the phylogenetic status of Sophoreae in the ‘core genistoids’.

Figure 1. Schematic representation of the S. alopecuroides chloroplast genome. The predicted genes are shown and colours represent functional classifications, which are shown at the bottom left. The genes drawn outside the circle are transcribed clockwise, whereas those drawn inside the circle are transcribed counterclockwise. The inner circle shows the GC content. The LSC, SSC and IR regions are shown in the inner circle.


Here, we report the chloroplast genome of S. alopecuroides and analyse its simple-sequence repeats (SSRs) and long repeat sequences. We conducted a phylogenetic analysis of S. alopecuroides and analysed its collinearity with other Papilionoideae species. These results provide important information for future research on the phylogenetic status of S. alopecuroides.

Materials and methods

Plant material and chloroplast DNA purification

Fresh leaves of S. alopecuroides were collected from adult plants in Erdos, Inner Mongolia Autonomous Region, China. Total genomic DNA was extracted from the leaves using a Plant Genomic DNA kit (Tiangen Biotech, Beijing, China), according to the manufacturer’s instructions. The DNA quality was estimated with a NanoDrop 2000 spectrophotometer (Nanodrop Technologies, Wilmington, USA).

Chloroplast genome sequencing, assembly, annotation and gap filling

SOAPdenovo2 (v2.04, Hong Kong, China) (Luo et al. 2012) was used to assemble the reads, and the optimal assembly was obtained after adjusting the parameters multiple times. The reads were then mapped back to the assembled contigs, and the assembly was locally reassembled and optimized according to the paired-end and overlap relationships of the reads. The GapCloser software (v1.12, Hong Kong, China) was used to repair the assembly and remove redundant sequence segments to give the final assembly. Gene annotation was performed using the online program Dual Organellar GenoMe Annotator (DOGMA, performing a full annotation of the chloroplast genome) and GeSeq, combined with manual correction (Wyman et al. 2004). tRNAscan-SE and RNAmmer were used to identify tRNA and rRNA genes (Lowe and Eddy 1997; Lagesen et al. 2007). The circular chloroplast genome map was generated by OrganellarGenomeDRAW (OGDRAW) v1.2 (Lohse et al. 2007; Tillich et al. 2017). We used the NCBI software Sequin to produce a Sequin-format file. The sequence was submitted to NCBI by uploading the sequence data and the complete

Table 1. Characteristics of S. alopecuroides chloroplast genome.

Item                               Chloroplast genome characteristics
Size of chloroplast genome (bp)    155,207
Size of LSC (bp)                   85,350
Size of SSC (bp)                   18,107
Size of IRA (bp)                   25,875
Size of IRB (bp)                   25,875
No. of genes                       132
No. of protein-coding genes        83
No. of tRNA genes                  41
No. of rRNA genes                  8

Table 2. List of genes in the S. alopecuroides chloroplast genome.

Category                   Group of genes                         Name of genes
Genes for photosynthesis   Subunits of photosystem I              psaA, psaB, psaC, psaI, psaJ
                           Subunits of photosystem II             psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT
                           Subunits of ATP synthase               atpA, atpB, atpE, atpF*, atpH, atpI
                           Subunits of cytochrome b/f complex     petA, petB, petD, petG, petL
                           Subunits of NADH-dehydrogenase         ndhA*, ndhB*^a, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
                           Large subunit of RuBisCO               rbcL
Self-replication           Ribosomal RNAs                         rrn16^a, rrn23^a, rrn4.5^a, rrn5^a
                           Transfer RNAs                          trnA-UGC*^a, trnC-GCA, trnD-GUC, trnE-UUC*^a, trnF-GAA, trnG-UCC, trnH-GUG, trnI-CAU, trnI-GAU, trnK-UUU*, trnL-UAA*, trnY-GUA, trnL-CAA^a, trnL-UAG, trnM-CAU^a, trnfM-CAU, trnN-GUU^a, trnP-GGG, trnP-UGG, trnQ-UUG, trnR-ACG, trnR-CCG, trnR-UCU^a, trnS-GCU, trnS-UGA, trnS-CGA, trnT-CGU*, trnT-UGU, trnV-GAC^a, trnV-UAC*, trnW-CCA
                           Proteins of small ribosomal subunit    rps2, rps3, rps4, rps7^a, rps8, rps11, rps12*^a, rps14, rps15, rps16*, rps18, rps19
                           Proteins of large ribosomal subunit    rpl2*^a, rpl14, rpl16, rpl20, rpl23^a, rpl32, rpl33, rpl36
                           Subunits of RNA polymerase             rpoA, rpoB, rpoC1*, rpoC2
Other genes                Acetyl-CoA carboxylase                 accD
                           Cytochrome c biogenesis                ccsA
                           Envelope membrane protein              cemA
                           Maturase                               matK
                           Protease                               clpP**
Unknown                    Conserved                              ycf1, ycf2^a, ycf3**, ycf4

*Gene with one intron. **Gene with two introns. ^a Duplicated gene (genes present in the IR regions).


S. alopecuroides chloroplast genome sequence was deposited in GenBank (accession number: MK_114100).

Repeat sequence analysis

REPuter (Kurtz et al. 2001) was used to identify the size and location of repeat sequences in the chloroplast genomes of four species. The minimal repeat size was set at 30 bp, and the cutoff for similarity among the repeat units was set at 90%. SSRs were detected using the MISA Perl script available at http://pgrc.ipk-gatersleben.de/misa/, with the following thresholds: 10 repeat units for mononucleotide SSRs, six repeat units for dinucleotide and trinucleotide SSRs, and five repeat units for tetranucleotide, pentanucleotide and hexanucleotide SSRs. Tandem repeats were analysed using Tandem Repeats Finder (v4.09) with parameter settings of two for matches and seven for mismatches and indels (Benson 1999). The minimum alignment score and maximum period size were set at 50 and 500, respectively. All the identified repeats were manually verified, and nested or redundant results were removed.
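The SSR thresholds described above can be illustrated with a minimal sketch. This is not the MISA script itself; it is a small regular-expression search for perfect repeats, and the function name `find_ssrs` and the toy sequence in the usage note are hypothetical. Only the minimum repeat counts are taken from the text.

```python
import re

# Minimum repeat counts per motif length, following the thresholds stated
# in the text: 10 (mono), 6 (di, tri), 5 (tetra, penta, hexa).
MIN_REPEATS = {1: 10, 2: 6, 3: 6, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Positions are 0-based. This is a simplified illustration, not MISA:
    compound and interrupted repeats are not handled.
    """
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves built from a shorter unit
            # (for example "AA"), so a run is reported at one motif length only.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)
```

For instance, calling `find_ssrs("GC" + "A" * 12 + "GCGC" + "AT" * 6 + "CC")` reports one mononucleotide repeat (A, 12 units) and one dinucleotide repeat (AT, 6 units); the two-unit `GCGC` stretch falls below the dinucleotide threshold and is ignored.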

Phylogenetic analysis

A total of 43 complete chloroplast DNA sequences belonging to the Papilionoideae subfamily were obtained from the RefSeq database. For the phylogenetic analysis, 61 protein sequences (ccsA, cemA, clpP, atpA, atpB, atpE, atpF, atpH, atpI, ndhA, ndhB, ndhC, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK, psaA, psaB, psaC, psaJ, psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, petA, petB, petD, petG, petL, rbcL, rpl2, rpl14, rpl16, rpl20, rpoA, rpoB, rpoC1, rpoC2, rps11, rps14, rps15, rps3, rps4, rps7, rps8, ycf1, ycf2 and ycf3) shared among all these 43 species and S. alopecuroides were aligned using the web-based multiple sequence alignment program Clustal Omega. The alignment was manually examined and adjusted. The evolutionary history was inferred using the maximum likelihood method implemented in MEGA X (Kumar et al. 2018). The bootstrap consensus tree inferred from 1000 replicates was taken to represent the evolutionary history of the taxa analysed.

Comparative genomics analysis

Whole-genome alignment was conducted using MAUVE (Darling et al. 2004), with N. tabacum and A. thaliana as reference species. The conserved sequences were identified between the chloroplast genome of S. alopecuroides and those of N. tabacum, P. vulgaris, T. subterraneum, I. tinctoria, M. truncatula, A. mongolicus, C. arietinum, T. repens, L. albus, L. japonicus, R. pseudoacacia, A. nanus and M. floribunda. The comparison method was BLASTN, with the E-value cutoff set to 1e-10. The homologous regions and gene annotations were visualized using a web-based genome synteny viewer, AutoGRAPH (Derrien et al. 2006).

Figure 2. Comparison of the predicted SSRs in S. alopecuroides and five other Papilionoideae species. (a) The number and distribution of SSRs in the chloroplast genomes of six Papilionoideae species. (b) The length of SSRs in the chloroplast genomes of six Papilionoideae species. (c) The proportion of SSRs in S. alopecuroides.

Figure 3. The type and number of repeats in the chloroplast genomes of six Papilionoideae species. F, forward; P, palindromic; T, tandem; R, reverse.

Results and discussion

Gene content and structure

Figure 4. Molecular phylogenetic analysis of the Papilionoideae subfamily. The tree was constructed with the sequences of 61 proteins present in all 43 species (S. alopecuroides, A. thaliana, L. japonicus, N. tabacum, G. max, P. vulgaris, C. arietinum, T. subterraneum, V. radiata, L. sativus, P. sativum, M. pinnata, V. unguiculata, V. angularis, G. tomentella, G. cyrtoloba, G. stenophita, G. canescens, G. dolichocarpa, G. falcata, G. syndetika, G. soja, L. luteus, T. aureum, T. repens, T. grandiflorum, G. glabra, T. meduseum, T. glanduliferum, A. americana, T. boissieri, T. strictum, A. hypogaea, I. tinctoria, L. albus, P. erosus, R. pseudoacacia, M. truncatula, A. mongolicus, A. nanus, S. bouffordiana, S. flavescens and M. floribunda), using the maximum likelihood method implemented in MEGA X. Two taxa, N. tabacum and A. thaliana, were used as outgroups. The tribe to which each species belongs is shown to the right of the tree. Bootstrap supports were calculated from 1000 replicates.

The chloroplast genome of S. alopecuroides was sequenced on an Illumina HiSeq 2000 platform and assembled using SOAPdenovo2 (Luo et al. 2012). A complete chloroplast genome of S. alopecuroides was finally obtained by gap filling using the GapCloser software. The length of the complete S. alopecuroides chloroplast genome sequence was 155,207 bp, including an LSC (85,350 bp), an SSC (18,107 bp), and two IRs (25,875 bp each) (figure 1). The chloroplast genome of S. alopecuroides consists of 132 coding regions, including 83 protein-coding genes, 41 tRNAs and eight rRNAs (tables 1–2). Of the 132 genes, 115 were single-copy, and 17 were duplicated in the IR regions, including seven tRNAs, six protein-coding genes, and four rRNAs. Of the 115 single-copy genes, three spanned chloroplast genome region boundaries: trnH-GUG crossed the IRA and LSC regions, ycf1 crossed the IRA and SSC regions, and rps19 crossed the IRB and LSC regions. Of the remaining 112 genes, 82 were located in the LSC region (including 57 protein-coding genes and 25 tRNAs), 12 were located in the SSC region (including 11 protein-coding genes and one tRNA), and 17 were located in the IR region (including seven tRNAs, six protein-coding genes, and four rRNAs). A total of 13 genes (atpF, ndhA, ndhB, trnK-UUU, trnL-UAA, trnT-CGU, trnA-UGC, trnE-UUC, trnV-UAC, rps16, rps12, rpl2 and rpoC1) contained one intron, while the ycf3 and clpP genes contained two introns (table 2).
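The reported totals can be cross-checked with simple arithmetic. The following is a minimal sketch using only the region sizes and gene counts given in the text; the variable and function names are illustrative.

```python
# Region sizes (bp) reported for the S. alopecuroides chloroplast genome.
REGION_SIZES = {"LSC": 85_350, "SSC": 18_107, "IRA": 25_875, "IRB": 25_875}

# Gene counts by class, as reported in the text.
GENE_COUNTS = {"protein_coding": 83, "tRNA": 41, "rRNA": 8}

# Genes duplicated in the IR regions, by class.
IR_DUPLICATED = {"tRNA": 7, "protein_coding": 6, "rRNA": 4}


def total(parts):
    """Sum the values of a dictionary of counts or sizes."""
    return sum(parts.values())


genome_size = total(REGION_SIZES)      # LSC + SSC + IRA + IRB
gene_total = total(GENE_COUNTS)        # protein-coding + tRNA + rRNA
duplicated = total(IR_DUPLICATED)      # genes present in the IR regions
single_copy = gene_total - duplicated  # remaining single-copy genes

print(genome_size, gene_total, duplicated, single_copy)
```

The four regions sum to the reported genome size of 155,207 bp, and the three gene classes sum to the reported 132 genes, of which 17 are duplicated and 115 are single-copy.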

Repeat and SSR analysis

SSRs are molecular markers with high variation within the same species and have been used in population genetics and polymorphism studies (Powell et al. 1995; Yang et al. 2016). Here, we analysed the occurrence, type and distribution of SSRs in the S. alopecuroides chloroplast genome (figure 2). The data for S. alopecuroides were compared with those for Ammopiptanthus mongolicus, A. nanus, Maackia floribunda, Lupinus luteus and L. albus, which are closely related to S. alopecuroides. In total, 115, 116, 118, 67, 99 and 99 SSRs were detected in S. alopecuroides, A. mongolicus, A. nanus, M. floribunda, L. luteus and L. albus, respectively. Mononucleotide repeats were the most abundant SSRs in each species, accounting for 85%, 95%, 94%, 84%, 90% and 90% of the total SSRs, respectively. Most of the dinucleotide repeats were AT/TA repeats, and all trinucleotide repeats were AAT/ATT repeats. There were no tetranucleotide, pentanucleotide or hexanucleotide repeats in these six species. In these six species, SSR lengths were mainly concentrated in the range 10–16 bp. For S. alopecuroides, SSRs were mainly distributed in the LSC, accounting for 72.28% of the total number of SSRs.

Dispersed repeat sequences, which play an important role in genome rearrangement, have been used as a source for understanding the phylogenetic relationships of species (Park et al. 2017). These repeats were mainly distributed in intergenic spacers (IGS) and introns. There were 22 forward repeats, 25 palindromic repeats, 49 tandem repeats and three reverse repeats in the chloroplast genome of S. alopecuroides (figure 3). Similarly, 17, 20, 8, 21, 17 forward repeats, 22, 12, 13, 29, 33 palindromic

Table 3. Chloroplast genome information of 43 species used for phylogenetic analysis.

Species                      Accession number   Length (bp)   Number of protein-coding genes
Sophora alopecuroides        MK_114100          155207        83
Arabidopsis thaliana         NC_000932          154478        85
Lotus japonicus              NC_002694          150519        82
Nicotiana tabacum            NC_001879          155943        98
Glycine max                  NC_007942          152218        83
Phaseolus vulgaris           NC_009259          150285        83
Cicer arietinum              NC_011163          125319        75
Trifolium subterraneum       NC_011828          144763        74
Vigna radiata                NC_013843          151271        82
Lathyrus sativus             NC_014063          121020        74
Pisum sativum                NC_014057          122169        74
Millettia pinnata            NC_016708          152968        83
Vigna unguiculata            NC_018051          152415        84
Vigna angularis              NC_021091          151683        81
Glycine tomentella           NC_021636          152728        82
Glycine cyrtoloba            NC_021645          152381        81
Glycine stenophita           NC_021646          152618        82
Glycine canescens            NC_021647          152518        82
Glycine dolichocarpa         NC_021648          152804        82
Glycine falcata              NC_021649          153023        82
Glycine syndetika            NC_021650          152783        82
Glycine soja                 NC_022868          152217        83
Lupinus luteus               NC_023090          151894        83
Trifolium aureum             NC_024035          126970        76
Trifolium repens             NC_024036          132120        75
Trifolium grandiflorum       NC_024034          125628        75
Glycyrrhiza glabra           NC_024038          127943        76
Trifolium meduseum           NC_024166          142595        75
Trifolium glanduliferum      NC_025744          126149        75
Apios americana              NC_025909          148772        82
Trifolium boissieri          NC_025743          125740        74
Trifolium strictum           NC_025745          125834        75
Arachis hypogaea             NC_026676          156395        81
Indigofera tinctoria         NC_026680          158367        82
Lupinus albus                NC_026681          154140        83
Pachyrhizus erosus           NC_026682          151947        83
Robinia pseudoacacia         NC_026684          154835        80
Medicago truncatula          NC_003119          124033        76
Ammopiptanthus mongolicus    KY_034453          153935        85
Ammopiptanthus nanus         KY_034454          154140        85
Maackia floribunda           NC_034774          154541        82
Sophora flavescens           MH_748034          154378        84
Salweenia bouffordiana       MF_449303          153730        83

Figure 5. Gene order comparison of Papilionoideae genomes, with N. tabacum and A. thaliana chloroplast genomes as reference, using MAUVE software. The boxes above the line represent gene sequences in the clockwise direction, and the boxes below the line represent gene sequences in the opposite orientation. Within each of the alignments, local collinear blocks are represented by blocks of the same colour connected by lines.


repeats, and 39, 50, 64, 36, 45 tandem repeats were identified in the chloroplast genomes of A. mongolicus, A. nanus, M. floribunda, L. luteus and L. albus, respectively. It is worth mentioning that reverse repeats were found only in the chloroplast genomes of S. alopecuroides and A. nanus.
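The dispersed repeat classes counted above differ in how the second copy relates to the first: a forward repeat is the same sequence again, a reverse repeat is the sequence read backwards, and a palindromic repeat is its reverse complement. The sketch below illustrates these definitions for exact matches only; the study allowed 90% similarity between repeat units, which this simplified example does not model, and the function names are hypothetical.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_repeat(first, second):
    """Classify the second copy relative to the first (exact matches only).

    Returns "forward", "reverse", "palindromic", or None when the two
    segments are not related in any of these ways.
    """
    if second == first:
        return "forward"
    if second == first[::-1]:
        return "reverse"
    if second == reverse_complement(first):
        return "palindromic"
    return None
```

For the toy segment `AAGCT`, the copies `AAGCT`, `TCGAA` and `AGCTT` are classified as forward, reverse and palindromic, respectively.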

Phylogenetic analysis of S. alopecuroides based on conserved protein sequences

To determine the phylogenetic position of S. alopecuroides in Papilionoideae, we downloaded 42 complete chloroplast genome sequences from the GenBank database, and constructed a phylogenetic tree using 61 protein sequences from 40 species, with N. tabacum and A. thaliana as the outgroup (figure 4; table 3). These 40 species belong to Sophoreae, Robinieae, Cicereae, Thermopsideae, Dalbergieae, Fabeae, Galegeae, Indigofereae, Loteae, Millettieae, Genisteae, Trifolieae, and Phaseoleae. The phylogenetic tree showed that the closest relatives of S. alopecuroides were A. mongolicus, A. nanus and S. bouffordiana.

Comparative genomics analysis

During the process of evolution, several inversions occurred in certain regions of the chloroplast genome in Papilionoideae, which changed the gene order. A 50 kb inversion between rbcL and rps16 was reported earlier in the legume family, and many legume branches have this inversion, such as the Ormosia clade, the Mirbelioid clade, and the Core Genistoid clade (Doyle et al. 1996; Cardoso et al. 2013). To reveal the dynamic changes in gene order caused by inversion during evolution and to identify the conserved and nonconserved regions between the chloroplast genomes of S. alopecuroides and other related species of Papilionoideae, whole-genome alignment analysis was conducted using MAUVE software (Darling et al. 2004), with N. tabacum and A. thaliana as reference (figure 5). The chloroplast genomes of 18 plant species in Papilionoideae were used for this comparative analysis, including those of G. soja, L. sativus, M. pinnata, A. hypogaea, G. glabra, I. tinctoria, M. truncatula, A. mongolicus, C. arietinum, L. luteus, L. albus, L. japonicus, R. pseudoacacia, A. nanus, S. bouffordiana, S. flavescens, and M. floribunda. The results showed that, although the chloroplast genome sizes of these species were different, the IR regions were relatively similar.

To further analyse the evolutionary relationships between S. alopecuroides and the other related species, gene rearrangement at the genome level was analysed using the web-based genomic collinearity visualization software AutoGRAPH (Derrien et al. 2006). Firstly, we performed a collinear comparison between S. alopecuroides and N. tabacum and found three inversions of different sizes between rbcL and rps16 in the LSC region of S. alopecuroides (figure 6). Then, we performed collinear comparisons between S. alopecuroides and 15 species of Papilionoideae (figure 7). The 15 chloroplast genomes were classified into six groups (groups A–F) based on the results of the collinear comparison. Group A comprises the chloroplast genomes of A. mongolicus, A. nanus, S. flavescens and M. floribunda, which belong to Thermopsideae and Sophoreae. We observed a high level of collinearity between the chloroplast genome of S. alopecuroides and those of A. mongolicus, A. nanus, and M. floribunda. There is a 12 kb inversion between ndhF and ndhH in the LSC region of S. alopecuroides and S. flavescens (figure 7a), which may have occurred during the evolution of plants in Sophora. Group B comprises the chloroplast genomes of L. albus and L. luteus, two species in Genisteae. There is a 23 kb inversion in the LSC region between the chloroplast genome of S. alopecuroides and group B. In addition, a 24 kb inversion exists in the SSC region between S. alopecuroides and L. luteus. Group C comprises L. japonicus, I. tinctoria and A. hypogaea, which belong to Loteae, Indigofereae and Dalbergieae, respectively. There is a 20 kb inversion in the LSC region between S. alopecuroides and group C. Group D includes the representative species of Trifolieae, Cicereae, Galegeae

Figure 6. Synteny analyses of chloroplast genomes from S. alopecuroides and N. tabacum.


and Phaseoleae: M. truncatula, C. arietinum, G. glabra and G. soja, respectively. There are at least two inversions in the LSC region and one in the SSC region. Groups E and F comprise two species in Fabeae and Robinieae, L. sativus and R. pseudoacacia, respectively. There is a very complicated gene rearrangement between S. alopecuroides and these two species, which may be caused by multiple inversions.

To further reveal the gene rearrangement history between N. tabacum and S. alopecuroides, we conducted serial collinear comparisons of N. tabacum and L. japonicus, L. japonicus and L. albus, and L. albus and S. alopecuroides. L. japonicus was selected as a representative legume after the 50 kb inversion that occurred early in Papilionoideae (Palmer et al. 1988), and L. albus was used as a representative legume after the 36 kb inversion that occurred early in the core genistoids (figure 8) (Martin et al. 2014). The results clearly revealed a 23 kb inversion between trnC-GCA and trnF-GAA in the LSC region between L. albus and S. alopecuroides. This study demonstrates for the first time that the novel 23 kb inversion occurred before the divergence of Sophoreae and Ammopiptanthus (Thermopsideae). This result provides new insight into the phylogenetic status of Sophoreae and its evolutionary lineage.
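The effect of an inversion on gene order can be illustrated with a small sketch: the genes inside the inverted block appear in reverse order and on the opposite strand, while flanking genes are unchanged. The gene names `g1` to `g5` and the function `invert_segment` are hypothetical placeholders, not the actual gene order of any genome discussed here.

```python
def invert_segment(order, start_gene, end_gene):
    """Invert the block from start_gene to end_gene (inclusive).

    order is a list of (gene, strand) tuples with strand +1 or -1.
    Genes inside the block are reversed in order and flipped in strand;
    a new list is returned and the input is left unchanged.
    """
    names = [gene for gene, _ in order]
    i, j = names.index(start_gene), names.index(end_gene)
    if i > j:
        i, j = j, i
    inverted = [(gene, -strand) for gene, strand in reversed(order[i:j + 1])]
    return order[:i] + inverted + order[j + 1:]


# Toy signed gene order (hypothetical names).
ancestral = [("g1", 1), ("g2", 1), ("g3", -1), ("g4", 1), ("g5", 1)]
derived = invert_segment(ancestral, "g2", "g4")
print(derived)
```

Applying the same inversion twice restores the original order, which is why a serial comparison across representative genomes, as used above, can help place a single inversion on a particular branch.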

Gene losses among the chloroplast genomes of S. alopecuroides and other relative species

During the process of evolution, the loss of some genes often occurs in the chloroplast genome (Downie et al. 1996). Here, we summarize the gene losses in the chloroplast genomes of S. alopecuroides and other relative species (figure 9). rpl22 was lost in S. alopecuroides and the other 43 chloroplast genomes (Gantt et al. 1991; Jansen et al. 2010). The ycf4-psaI-accD-rps16 region of chloroplast genomes in the inverted repeat loss clade (IRLC) has a high mutation rate, and these four genes are lost in various species such as Taxus chinensis (Zhang et al. 2014b). Of the 44 species, the rps16 gene is lost in 23 species, the ycf4 gene is lost in 16 species, and the accD gene is lost in six species, all of which belong to Trifolium, whereas all four genes, i.e. rps16, ycf4, accD, and psaI, are present in the S. alopecuroides chloroplast genome.

IR expansion/contraction in the S. alopecuroides chloroplast genome

IRs are the most conserved regions in the chloroplast genome, and contraction and expansion at their boundaries are common evolutionary events, representing one of the main factors affecting chloroplast genome size (Civan et al. 2014). Using N. tabacum and A. thaliana as the reference species, we compared the IR boundaries of the chloroplast genomes of A. mongolicus, A. nanus, M. floribunda, L. albus and L. sativus (figure 10). The results showed that in these species, the rps19 gene is present at the LSC/IRB border, the trnH gene is present at the IRA/LSC boundary, and ycf1 and the ycf1 pseudogene span the IRB/SSC and SSC/IRA boundaries. The rps19 gene in the chloroplast genome of S. alopecuroides is present in the LSC region, whereas the rps19 of A. thaliana spans the LSC and IRB regions. The location of rps19 in the chloroplast genome of S. alopecuroides is similar to that of N. tabacum, L. albus, L. sativus, A. mongolicus, A. nanus and M. floribunda. Compared with N. tabacum, there is a ycf1 pseudogene crossing the border in S. alopecuroides at the IRB/SSC boundary, which is also seen in A. thaliana, L. albus, L. sativus, A. mongolicus, A. nanus and M. floribunda. Compared with N. tabacum and A. thaliana, the distance between the ycf1 gene, which spans the SSC and IRA regions, in S. alopecuroides becomes shorter. In addition, the distance between the trnH gene and the border of the LSC in the S. alopecuroides chloroplast genome is increased to 59 bp, compared with 4–6 bp in N. tabacum and A. thaliana. This increased distance observed in S. alopecuroides is also present in L. albus (46 bp) and L. sativus (46 bp).

Figure 7. Synteny analyses of chloroplast genomes from S. alopecuroides and the representative species of Cicereae, Thermopsideae, Dalbergieae, Fabeae, Galegeae, Indigofereae, Loteae, Millettieae, Robinieae, Genisteae, Sophoreae, Trifolieae and Phaseoleae. (a) Synteny analyses of chloroplast genomes from S. alopecuroides with A. mongolicus, A. nanus and M. floribunda. (b) Synteny analyses of chloroplast genomes from S. alopecuroides with L. albus and L. luteus. (c) Synteny analyses of chloroplast genomes from S. alopecuroides with L. japonicus, I. tinctoria and A. hypogaea. (d) Synteny analyses of chloroplast genomes from S. alopecuroides with M. truncatula, C. arietinum, G. glabra and G. soja. (e) Synteny analyses of chloroplast genomes from S. alopecuroides with L. sativus. (f) Synteny analyses of chloroplast genomes from S. alopecuroides with R. pseudoacacia.
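The gene-to-border distances compared in this section can be computed from annotation coordinates, as in the following minimal Python sketch. It rests on stated assumptions: coordinates are 1-based and inclusive, a junction is represented by the last base of the region on its left, and the function name `gap_to_junction` and all coordinates are hypothetical illustrations rather than values from the annotated genomes.

```python
def gap_to_junction(gene_start, gene_end, last_base_left):
    """Number of bases between a gene and a region junction.

    The junction lies between base last_base_left and the next base.
    Returns 0 when the gene spans or directly abuts the junction.
    """
    if gene_start <= last_base_left < gene_end:
        return 0  # gene spans the junction
    if gene_end <= last_base_left:
        return last_base_left - gene_end  # gene lies left of the junction
    return gene_start - last_base_left - 1  # gene lies right of the junction


# Toy coordinates only, chosen to exercise the three cases.
print(gap_to_junction(100, 200, 210))  # 10 (bases 201 to 210)
print(gap_to_junction(220, 300, 210))  # 9 (bases 211 to 219)
print(gap_to_junction(200, 220, 210))  # 0 (gene spans the junction)
```

Computing the same quantity for every species with one function keeps the comparison free of off-by-one differences between genomes.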

In conclusion, in the present study, the chloroplast genome of S. alopecuroides was sequenced on an Illumina HiSeq 2000 platform. The SSRs and long repeats were identified: 115 SSR loci were found in the chloroplast genome of S. alopecuroides, most of the identified SSRs were mononucleotides, and among the long repeats, the palindromic repeat was the most common. Based on the 61 conserved proteins, we performed phylogenetic analysis on 43 chloroplast genomes and collinear analysis on representative species of different genera of the Papilionoideae. The genes lost in the chloroplast genomes of the 44 Papilionoideae species are summarized, and the contraction and expansion of the IR regions of the five species closely related to S. alopecuroides were compared. The results illustrate the genetic relationship and evolutionary status of S. alopecuroides in the subfamily Papilionoideae and, to some extent, confirm that S. alopecuroides has a high affinity with A. mongolicus, A. nanus and M. floribunda. In addition, there is a 23 kb inversion that occurred before the divergence of Sophoreae and Ammopiptanthus of Thermopsideae, which overlaps with the 36 kb inversion occurring in core genistoids. Our study provides an essential data basis for determining the phylogenetic status of S. alopecuroides.

Figure 8. Phylogenetic relationship analysis and comparative plastome maps demonstrating large structural variations between S. alopecuroides and other plant plastomes. The plastome map of N. tabacum represents the structure of most angiosperms, L. japonicus represents that of the Papilionoideae subfamily of legumes, and L. albus represents the genistoid clade.


Acknowledgements

This research was funded by the National Natural Science Foundation of China, grant numbers 31770363 and 31670335, and the Ministry of Education of China through 111 and Double First-Class Projects, grant numbers B08044 and Yldxxk201819.

References

Benson G. 1999 Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580.

Cardoso D., Pennington R. T., De Queiroz L. P., Boatwright J. S., Van Wyk B. E., Wojciechowski M. F. et al. 2013 Reconstructing the deep-branching relationships of the papilionoid legumes. S. Afr. J. Bot. 89, 58–75.

Civan P., Foster P. G., Embley M. T., Seneca A. and Cox C. J. 2014 Analyses of charophyte chloroplast genomes help characterize the ancestral chloroplast genome of land plants. Genome Biol. Evol. 6, 897–911.

Daniell H., Lin C. S., Yu M. and Chang W. J. 2016 Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biol. 17, 134.

Darling A. C., Mau B., Blattner F. R. and Perna N. T. 2004 Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 14, 1394–1403.

Derrien T., Andre C., Galibert F. and Hitte C. 2006 AutoGRAPH: an interactive web server for automating and visualizing comparative genome maps. Bioinformatics 23, 498–499.

Downie S. R., Llanas E. and Katz-Downie D. S. 1996 Multiple independent losses of the rpoC1 intron in angiosperm chloroplast DNAs. Syst. Bot. 21, 135–151.

Doyle J. J., Doyle J. L., Ballenger J. A. and Palmer J. D. 1996 The distribution and phylogenetic significance of a 50-kb chloroplast DNA inversion in the flowering plant family Leguminosae. Mol. Phylogenet. Evol. 5, 429–438.

Figure 9. Comparison of the border positions of the LSC, SSC and IR regions among eight chloroplast genomes. W: pseudogenes, / \: distance from the edge.

Figure 10. Gene losses among the chloroplast genomes in Papilionoideae.

Gantt J. S., Baldauf S. L., Calie P. J., Weeden N. F. and Palmer J. D. 1991 Transfer of rpl22 to the nucleus greatly preceded its loss from the chloroplast and involved the gain of an intron. EMBO J. 10, 3073–3078.

Iinuma M., Ohyama M. and Tanaka T. 1995 Six flavonostilbenes and a flavanone in roots of Sophora alopecuroides. Phytochemistry 38, 519–525.

Jansen R. K. and Palmer J. D. 1987 Chloroplast DNA from lettuce and Barnadesia (Asteraceae): structure, gene localization, and characterization of a large inversion. Curr. Genet. 11, 553–564.

Jansen R. K., Saski C., Lee S. B., Hansen A. K. and Daniell H. 2010 Complete plastid genome sequences of three rosids (Castanea, Prunus, Theobroma): evidence for at least two independent transfers of rpl22 to the nucleus. Mol. Biol. Evol. 28, 835–847.

Kumar S., Stecher G., Li M., Knyaz C. and Tamura K. 2018 MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547–1549.

Kurtz S., Choudhuri J. V., Ohlebusch E., Schleiermacher C., Stoye J. and Giegerich R. 2001 REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29, 4633–4642.

Lagesen K., Hallin P., Rødland E. A., Stærfeldt H. H., Rognes T. and Ussery D. W. 2007 RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108.

Lemieux C., Otis C. and Turmel M. 2016 Comparative chloroplast genome analyses of streptophyte green algae uncover major structural alterations in the Klebsormidiophyceae, Coleochaetophyceae and Zygnematophyceae. Front. Plant Sci. 7, 697.

Li J. G., Yang X. Y. and Huang W. 2016 Total alkaloids of Sophora alopecuroides inhibit growth and induce apoptosis in human cervical tumor HeLa cells in vitro. Pharmacogn. Mag. 12, S253.

Liang L., Wang X. Y., Zhang X. H., Ji B., Yan H. C., Deng H. Z. et al. 2012 Sophoridine exerts an anti-colorectal carcinoma effect through apoptosis induction in vitro and in vivo. Life Sci. 91, 1295–1303.

Lohse M., Drechsel O. and Bock R. 2007 OrganellarGenomeDRAW (OGDRAW): a tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes. Curr. Genet. 52, 267–274.

Lowe T. M. and Eddy S. R. 1997 tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964.

Lu X., Lin B., Tang J. G., Cao Z. and Hu Y. 2014 Study on the inhibitory effect of total alkaloids of Sophora alopecuroides on osteosarcoma cell growth. Afr. J. Tradit. Complement. Altern. Med. 11, 172–175.

Luo R., Liu B., Xie Y., Li Z., Huang W., Yuan J. 2012 SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18.

Martin G. E., Rousseau-Gueutin M., Cordonnier S., Lima O., Michon-Coudouel S., Naquin D. et al. 2014 The first complete chloroplast genome of the Genistoid legume Lupinus luteus: evidence for a novel major lineage-specific rearrangement and new insights regarding plastome evolution in the legume family. Ann. Bot. 113, 1197–1210.

Palmer J. D., Osorio B. and Thompson W. F. 1988 Evolutionary significance of inversions in legume chloroplast DNAs. Curr. Genet. 14, 65–74.

Park I., Yang S., Choi G., Kim W. J. and Moon B. C. 2017 The complete chloroplast genome sequences of Aconitum pseudolaeve and Aconitum longecassidatum, and development of molecular markers for distinguishing species in the Aconitum subgenus Lycoctonum. Molecules 22, 2012.

Parks M., Cronn R. and Liston A. 2009 Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes. BMC Biol. 7, 84.

Powell W., Morgante M., McDevitt R., Vendramin G. G. and Rafalski J. A. 1995 Polymorphic simple sequence repeat regions in chloroplast genomes: applications to the population genetics of pines. Proc. Natl. Acad. Sci. USA 92, 7759–7763.

Tanaka T., Ohyama M., Iinuma M., Shirataki Y. and Komatsu M. 1998 Isoflavonoids from Sophora secundiflora, S. arizonica and S. gypsophila. Phytochemistry 48, 1187–1193.

Tangphatsornruang S., Sangsrakru D., Chanprasert J., Uthaipaisanwong P., Yoocha T., Jomchai N. et al. 2009 The chloroplast genome sequence of mungbean (Vigna radiata) determined by high-throughput pyrosequencing: structural organization and phylogenetic relationships. DNA Res. 17, 11–22.

Tillich M., Lehwark P., Pellizzer T., Ulbricht-Jones E. S., Fischer A., Bock R. et al. 2017 GeSeq–versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 45, W6–W11.

Tonti-Filippini J., Nevill P. G., Dixon K. and Small I. 2017 What can we do with 1000 plastid genomes? Plant J. 90, 808–818.

Wyman S. K., Jansen R. K. and Boore J. R. 2004 Automatic annotation of organellar genomes with DOGMA. Bioinformatics 20, 3252–3255.

Wicke S., Schneeweiss G. M., Muller K. F., dePamphilis C. W. and Quandt D. 2011 The evolution of the plastid chromosome in land plants: gene content, gene order, gene function. Plant Mol. Biol. 76, 273–297.

Wunderlin R. 1982 The Leguminosae: a source book of characteristics, uses, and nodulation. Econ. Bot. 36, 224–224.

Yang Y., Zhou T., Duan D., Yang J., Feng L. and Zhao G. 2016 Comparative analysis of the complete chloroplast genomes of five Quercus species. Front. Plant Sci. 7, 959.

Yang Z., Huang Y., An W., Zheng X., Huang S. and Liang L. 2019 Sequencing and structural analysis of the complete chloroplast genome of the medicinal plant Lycium chinense Mill. Plants 8, 87.

Zhang L., Zheng Y., Deng H., Liang L. and Peng J. 2014a Aloperine induces G2/M phase cell cycle arrest and apoptosis in HCT116 human colon cancer cells. Int. J. Mol. Med. 33, 1613–1620.

Zhang Y., Ma J., Yang B., Li R., Zhu W., Sun L. et al. 2014b The complete chloroplast genome sequence of Taxus chinensis var. mairei (Taxaceae): loss of an inverted repeat region and comparative analysis with related species. Gene 540, 201–209.

Corresponding editor: H. A. RANGANATH
