Species and genetic variability of sweet potato viruses in China

China is the world’s largest producer of sweet potato (Ipomoea batatas (L.) Lam.). Considering that there are numerous sweet potato-producing regions in China and sweet potato is a vegetatively propagated crop, the genetic diversity of sweet potato viruses could be high in the country. However, studies on species and genetic variabilities of sweet potato viruses in China are limited, making it difficult to prevent and control viral diseases in this crop. During 2014–2019, sweet potato samples with viral disease-like symptoms were randomly collected from sweet potato fields in 25 provinces in China. Twenty-one virus species, including 12 DNA and 9 RNA viruses, were identified in the samples using next-generation sequencing, polymerase chain reaction and rolling-circle amplification methods. One novel sweepovirus species, Sweet potato leaf curl Hubei virus (SPLCHbV), was identified. Two species, Sweet potato collusive virus and Tobacco mosaic virus, were identified for the first time in sweet potato in China. Full-length or nearly full-length genomic sequences of 111 isolates belonging to 18 viral species were obtained. Genome sequence comparisons of potyvirus isolates obtained in this study indicate that the genome of sweet potato virus 2 is highly conserved, whereas the other four potyviruses, sweet potato feathery mottle virus, sweet potato virus G, sweet potato latent virus and sweet potato virus C, exhibited a high genetic variability. The similarities among the 40 sweepovirus genomic sequences obtained from eight sweepovirus species are 67.0–99.8%. The eight sweepoviruses include 14 strains, of which 4 novel strains were identified from SPLCHbV and 1 from sweet potato leaf curl Guangxi virus. Five sweet potato chlorotic stunt virus (SPCSV) isolates obtained belong to the WA strain, and the genome sequences of SPCSV are highly conserved. Together, this study for the first time comprehensively reports the variability of sweet potato viruses in China.


Background
Sweet potato, an important food crop worldwide, ranks third in production and fifth in calorific contribution to human diet among all crops globally (de Albuquerque et al. 2019). China is the world's largest producer of sweet potato, with a plantation area of 2.37 × 10^6 ha and yield of 5.20 × 10^7 tons in 2019 (FAOSTAT 2019).
In China, sweet potato is produced in the northern, Yangtze River and southern regions, which span tropical, subtropical, warm temperate and temperate zones and cover mountainous areas, hills, plains and coastal areas, resulting in complex ecotypes. Accordingly, sweet potato viruses could be highly diverse in China. In practice, controlling and preventing sweet potato viral diseases remains a major challenge owing to a lack of comprehensive research on the causal viruses. Advances in molecular detection techniques for plant viruses, particularly next-generation sequencing (NGS), have provided critical tools for studying viral diversity (Kreuze et al. 2009). More than 100 novel DNA and RNA plant virus species have been discovered via NGS in recent years (Gu et al. 2014; Hadidi et al. 2016). The aim of the present study was to investigate the sweet potato virus species and their genetic variability using molecular detection techniques. Here, 21 virus species were identified in sweet potato samples in China, and the genome variability of the major viruses was analyzed. The results provide a basis for future research on the control and management of sweet potato viral diseases in China.

Sweet potato virus species and their distribution in China
In this study, 21 virus species were identified in sweet potato samples using NGS, PCR, RT-PCR and RCA methods (Table 1 and Additional file 1: Table S1). These comprise 12 DNA and 9 RNA viruses belonging to nine genera, and include one novel sweepovirus species, Sweet potato leaf curl Hubei virus (SPLCHbV). SPLCHbV was recently approved and ratified by the International Committee on Taxonomy of Viruses (ICTV) and is listed on the ICTV website. In addition, to the best of our knowledge, this is the first identification of tobacco mosaic virus (TMV) and sweet potato collusive virus (SPCV) in sweet potato in China. In our investigation, SPFMV was the most common virus infecting sweet potato and was detected in 23 provinces, followed by SPVG, SPLCV and SPCSV, which were detected in 22, 18 and 18 provinces, respectively (Table 1 and Additional file 1: Table S1).

Genetic diversity of potyviruses
Full-length or nearly full-length genomic sequences of 14 isolates of SPFMV were obtained from 9 provinces, 18 isolates of SPVG were obtained from 8 provinces, 10 isolates of SPLV were obtained from 4 provinces, 3 isolates of SPV2 were obtained from 1 province, and 11 isolates of SPVC were obtained from 5 provinces (Additional file 1: Table S2).

Genetic diversity of sweepoviruses
Full-length genomic sequences of 40 sweepovirus isolates from 12 provinces (Additional file 1: Table S2) were obtained, with the nt sequence identity among these isolates ranging from 68.0 to 99.8%. According to the criteria set by the ICTV for begomovirus species demarcation (91% nt sequence identity) and the SDT v1.2 analysis result (Brown et al. 2015), 29 of these 40 isolates were classified into seven species: SPLCCV, SPLCGoV, SPLCGV, SPLCSiV1, SPLCSiV2, SPLCV and SPLCCNV (Additional file 1: Tables S2, S4). The remaining 11 isolates shared the highest nt sequence identity, 73.4%, with known begomoviruses, which is lower than the 91% species demarcation threshold (Additional file 1: Table S4). Furthermore, the full-length genomic sequence identity among the 11 isolates ranged from 89.6% to 99.8%. Therefore, these 11 isolates belong to the same novel species, with Sweet potato leaf curl Hubei virus (SPLCHbV) as the proposed name. According to the strain demarcation criteria for begomoviruses (91-94% nt sequence identity), as proposed by the ICTV (Haible et al. 2006), these 11 isolates of SPLCHbV comprise four strains, with SPLCHbV-Hb, SPLCHbV-Js, SPLCHbV-Hn and SPLCHbV-Sd as the proposed names (Fig. 2, Additional file 1: Table S5 and Additional file 2: Figure S3). Complete genome sequence comparisons using SDT v1.2 revealed that the genomic sequence identity of the 6 SPLCGV isolates from 3 provinces ranged from 90.4 to 99.4%. Further phylogenetic analysis revealed that four of these isolates (MK951965, MK951966, MK951967 and MK951968) share an nt sequence identity of 98.2-99.4% with one another (Additional file 1: Table S6 and Additional file 2: Figure S4) and are phylogenetically distant from the other SPLCGV isolates analyzed (Fig. 2). Their highest identity with other known SPLCGV isolates is 93.1%, which is below the 94% threshold for begomovirus strain classification; therefore, these four isolates constitute a novel strain, designated SPLCGV-Gd.
In addition, the complete genomic sequences of 7 SPLCV isolates were obtained from 6 provinces, with their nt sequence identity ranging from 82.4 to 98.1%. The SDT and phylogenetic analyses revealed that these 7 isolates belong to strains SPLCV-US, SPLCV-KR and SPLCV-Fu (Fig. 2 and Additional file 1: Table S7). The genome sequences of the 5 other sweepoviruses, namely SPLCCV, SPLCGoV, SPLCSiV1, SPLCSiV2 and SPLCCNV, are conserved, with 94.8-99.7% nt sequence identity.
The 40 sweepovirus isolates showed the typical genomic structural features of sweepoviruses: a genome size ranging from 2770 to 2821 bp and six open reading frames (ORFs), two on the viral-sense strand (AV1 and AV2) and four on the complementary-sense strand (AC1, AC2, AC3 and AC4). The genomes of these 40 sweepovirus isolates all encompass a highly conserved stem-loop structure containing the TAA TAT TAC/T sequence.
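The demarcation logic described in this section can be illustrated with a minimal Python sketch. It assumes that an isolate is judged only by its highest pairwise nt identity to previously known isolates, and that values exactly at a threshold fall into the higher category; both are simplifying assumptions, and the function name `classify` is hypothetical rather than part of SDT.

```python
# Thresholds quoted in the text for begomoviruses (percent nt identity).
SPECIES_THRESHOLD = 91.0  # below this: candidate novel species
STRAIN_THRESHOLD = 94.0   # below this (but >= 91): candidate novel strain


def classify(best_identity: float) -> str:
    """Classify an isolate from its highest pairwise nt identity (%)
    to known isolates. Boundary handling is an assumption of this sketch."""
    if best_identity < SPECIES_THRESHOLD:
        return "novel species"
    if best_identity < STRAIN_THRESHOLD:
        return "novel strain"
    return "known strain"


# Values taken from the text: 73.4% (the 11 SPLCHbV isolates versus known
# begomoviruses) and 93.1% (four SPLCGV isolates versus known SPLCGV isolates).
for identity in (73.4, 93.1):
    print(identity, "->", classify(identity))
```

In practice, the classification also depends on the phylogenetic analysis, so a threshold check of this kind is only a first screen.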

Molecular variation of SPCSV
Nearly full-length genomic sequences of 5 SPCSV isolates were obtained from 5 provinces (Additional file 1: Table S2). The RNA1 and RNA2 segments of the 5 isolates have 4 and 9 ORFs, respectively, similar to those of the Jiangsu isolate of the SPCSV WA strain (Qin et al. 2013b). The nt sequence identities of RNA1 and RNA2 among the five isolates are 98.9-100% and 98.8-99.9%, respectively. The phylogenetic analysis revealed that the five isolates belong to the WA strain (Figs. 3, 4). Therefore, the WA strain could be the dominant strain in sweet potatoes in China, and this is consistent with the findings of our previous study (Qin et al. 2013b).
A nearly full-length genomic sequence of one TMV isolate was obtained (Additional file 1: Table S2). The isolate has the highest nt sequence identity, 99.8%, with a Nicotiana benthamiana isolate from Spain (MK087763). To the best of our knowledge, this is the first study to report TMV infecting sweet potato in China.
The full-length genomic sequences of 5 SPSMV-1 isolates were obtained (Additional file 1: Table S2). Sequence comparisons indicate that the nt sequence identity of these 5 isolates with other isolates of the virus in GenBank ranged from 97.4 to 99.8% (Additional file 1: Table S8 and Additional file 2: Figure S7), suggesting that the SPSMV-1 genomic sequences are highly conserved.

Discussion
In the present study, viruses infecting sweet potato in China were identified using NGS, PCR and RCA approaches. Twenty-one virus species were identified, including 1 novel virus and 5 novel viral strains, and 2 viruses were found to infect sweet potato in China for the first time. The results highlight the high diversity of the sweet potato viromes in China. The full-length or nearly full-length genomic sequences of 111 isolates were obtained, belonging to 18 sweet potato virus species, and the molecular diversity of major sweet potato viruses was analyzed. The present study for the first time comprehensively reports the variability of sweet potato viruses in China.
Genomic sequence comparisons indicate that three SPFMV isolates (19-2, 32-1 and 33-1) obtained in this study share nt and aa sequence identities of 86.0-90.7% and 92.3-95.8%, respectively, with the O, RC and EA strain isolates. Previous studies have reported that the polyprotein nt and aa sequence identities between different SPFMV strains are lower than 94.0% and 96.0%, respectively (Adams et al. 2005; Yamasaki et al. 2010; Wylie et al. 2017). Therefore, these three isolates could represent a novel SPFMV strain, and further research on their biological characteristics is warranted.
The eight sweepoviruses detected in sweet potato in this study represent the highest number of sweepovirus species detected in a single study in China to date. The eight sweepoviruses include one novel species, which was identified in the present study, and four novel species recently identified in China (Luan et al. 2006; Liu et al. 2013, 2014, 2017). The findings indicate that there is a great diversity of sweepoviruses on sweet potato in China. Sweepoviruses are transmitted by the whitefly (Bemisia tabaci), an important agricultural pest that transmits numerous economically important plant viruses and whose populations have recently been increasing in sweet potato cultivation regions in China (Gilbertson et al. 2015). The diversity of sweepoviruses could be associated with the activities of whiteflies on sweet potato in China. SPCSV is another whitefly-transmitted virus. In the present study, SPCSV was detected in 18 provinces. SPCSV was reported in Guangdong Province of China for the first time in 2011 (Qiao et al. 2011). The rapid spread of SPCSV could also be associated with the expansion of whitefly populations in China. Therefore, controlling whitefly populations in sweet potato fields is a potential strategy to manage SPCSV incidence and proliferation. SPCSV can also synergistically infect sweet potato with several other viruses (Karyeija et al. 2000; Mukasa et al. 2006; Cuellar et al. 2011, 2015). Co-infection by SPCSV and SPFMV causes SPVD, a destructive disease of sweet potato (Schaefers and Terry 1976; Clark et al. 2012). According to the NGS results, the SPVD detection rate was as high as 48.4% (Additional file 1: Table S9), suggesting that SPVD is a common disease and poses a major threat to sweet potato production in China. Therefore, it is vital to bolster SPVD monitoring and control activities.
In the present study, sweet potato samples were collected randomly from sweet potato fields in 25 provinces, where the areas under sweet potato cultivation vary considerably. The number of sweet potato plants sampled was not proportional to the sweet potato cultivation area in each province. Although as many as 21 virus species were identified in the present study, their distribution and incidence in China remain unclear. Consequently, more samples should be collected and more representative sampling schemes should be implemented in future studies. Nonetheless, our results provide an important basis for research on the control and management of sweet potato viral diseases in China.

Conclusions
We identified 21 virus species, including 12 DNA and 9 RNA viruses, in the sweet potato samples from China. One novel sweepovirus, SPLCHbV, was identified. Two species, SPCV and TMV, were identified for the first time in sweet potato in China. Genome sequence comparisons of five potyviruses indicate that the genome of SPV2 is highly conserved, whereas the other four potyviruses, SPFMV, SPVG, SPLV and SPVC, exhibit high genetic variability. The similarity among the sweepovirus genomic sequences obtained from eight sweepovirus species varied greatly. The genome sequences of SPCSV are highly conserved. The present study for the first time comprehensively reports the variability of sweet potato viruses in China. The results provide a basis for future research on the control and management of sweet potato viral diseases in China.

Collection of samples
Approximately 2000 samples with viral disease-like symptoms such as leaf shriveling, curling, malformation, vein clearing, chlorosis and mosaic were randomly collected from sweet potato fields in 25 provinces in China during 2014-2019. Most of the samples were leaves, and only a few were storage root and seed samples. The collected leaf and storage root samples were stored at −80 °C until further analyses, and the seed samples were stored under dry conditions at room temperature until use.

Primers
Primers for virus detection and viral genome sequence amplification were designed based on the nucleotide sequences of sweet potato viruses obtained from GenBank or NGS-assembled sequences (Additional file 1: Tables S10, S11). All primers were synthesized by Sangon Bioengineering and Technology Service Co., Ltd. (Shanghai, China).

Nucleic acid extraction
DNA and RNA were extracted from samples ground in liquid nitrogen using the DNA extraction Kit (Omega Bio-tek, Norcross, USA) and the Total Plant RNA Extraction Miniprep System (Sangon, Shanghai, China), respectively, according to the manufacturers' instructions. The extracted nucleic acid samples were stored at −80 °C until use.

Next-generation sequencing and data processing
Sixty-four samples were used for NGS (Additional file 1: Table S9). Total RNA was extracted from sweet potato samples using the TRIzol Kit (Invitrogen, Carlsbad, CA, USA), and its concentration and purity were measured using the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA) and by agarose gel electrophoresis. Subsequently, small RNAs were enriched using the sRNAeasy kit (Qiagen, Dusseldorf, Germany) for small RNA sequencing (sRNA-seq), and the sRNA libraries were built using the TruSeq Small RNA Sample Prep Kit (Illumina, San Diego, USA). An Illumina HiSeq 2500 platform (Illumina) set to 50-bp read lengths was used for sequencing (BGI Tech Company, Shenzhen, China). For RNA sequencing (RNA-Seq), ribosomal RNA was removed from the total RNA using the RiboZero Magnetic Kit (Epicenter, USA) to construct ribo-depleted RNA libraries, which were built using the TruSeq RNA Sample Prep Kit (Illumina). An Illumina HiSeq X Ten platform (Illumina) set to 150-bp paired-end reads was used for sequencing (BGI Tech Company). After deep sequencing, adaptor and low-quality sequences were removed from the raw reads using Trimmomatic (Bolger et al. 2014), yielding more than 8 Gb of clean reads for each sample library. The remaining RNA-Seq reads were mapped to the genome sequences of sweet potato (downloaded from http://public-genomes-ngs.molgen.mpg.de/SweetPotato/) (Yang et al. 2017) using CLC Genomics Workbench 10.0 (Qiagen). Reads with sequence similarities of >60% to the sweet potato genome sequences were eliminated to reduce interference from the host background, and the remaining unique reads were de novo assembled using CLC Genomics Workbench 10.0 (Qiagen). The clean sRNA-Seq reads of 18-26 nt were likewise subjected to de novo assembly using CLC Genomics Workbench 10.0 (Qiagen).
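The 18-26 nt size selection applied to the clean sRNA-Seq reads can be sketched in plain Python. This is an illustration only, not the CLC workflow used in the study: it assumes uncompressed four-line FASTQ records, and the function names and toy reads are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)

MIN_LEN, MAX_LEN = 18, 26  # size window stated in the text (nt)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from four-line FASTQ input (assumed well-formed)."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # '+' separator line, ignored
        qual = next(it).strip()
        yield header.strip(), seq, qual


def size_select(records: Iterable[Record],
                lo: int = MIN_LEN, hi: int = MAX_LEN) -> List[Record]:
    """Keep reads whose length lies within [lo, hi] nt, inclusive."""
    return [r for r in records if lo <= len(r[1]) <= hi]


# Toy input: four reads of 17, 18, 26 and 27 nt.
toy = []
for n in (17, 18, 26, 27):
    toy += [f"@read_len{n}", "A" * n, "+", "I" * n]

kept = size_select(parse_fastq(toy))
print([len(seq) for _, seq, _ in kept])  # lengths of the retained reads
```

Only the 18-nt and 26-nt toy reads pass the filter, because the window is treated as inclusive at both ends.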
The resulting contigs were subjected to BLASTx and BLASTn searches against viral (taxid:10239) and viroidal (taxid:2559587) sequences of local datasets retrieved from the National Center for Biotechnology Information (NCBI) databank (Wu et al. 2015;Zhang et al. 2021). These processes enabled the identification of contigs with viral sequence attributes.
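As an illustration of how contigs with viral hits might be picked out of the search results, the sketch below parses BLAST tabular output (the standard 12-column `-outfmt 6` layout) and keeps the best-scoring hit per contig. The E-value cutoff of 1e-5, the function name and the toy contig and reference identifiers are assumptions of this sketch; the source does not state the cutoffs used.

```python
import csv
import io
from typing import Dict

# Standard column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_viral_hits(tsv_text: str, max_evalue: float = 1e-5) -> Dict[str, str]:
    """Map each contig to its highest-bitscore subject among hits
    that pass the (assumed) E-value cutoff."""
    best: Dict[str, tuple] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(FIELDS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        score = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or score > current[1]:
            best[hit["qseqid"]] = (hit["sseqid"], score)
    return {contig: subject for contig, (subject, _) in best.items()}


# Toy table with hypothetical identifiers.
toy_table = "\n".join([
    "contig_1\tref_A\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-90\t350",
    "contig_1\tref_B\t90.0\t180\t18\t0\t1\t180\t1\t180\t1e-60\t250",
    "contig_2\tref_C\t70.0\t50\t15\t0\t1\t50\t1\t50\t0.5\t30",
])
print(best_viral_hits(toy_table))
```

Here `contig_2` is dropped because its only hit has an E-value above the cutoff, while `contig_1` is assigned to the subject with the higher bit score.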

Rolling-circle amplification
One hundred samples with leaf curl symptoms were used in rolling-circle amplification (RCA). DNA extracted from the samples was amplified using the TempliPhi™ 100 Amplification Kit (GE Healthcare, USA) according to the method of Haible et al. (2006). The RCA products (5 µL) were digested in a 30-µL reaction volume with the restriction enzyme BamHI (10 U) (TaKaRa, Japan) at 37 °C for 4 h. The digested products were separated on a 1% agarose gel. The DNA fragments (2.8 kb) were purified using the Gel Extraction Kit (AXYGEN, USA), ligated into the pUC118 vector, and then transformed into Escherichia coli strain JM109. Positive clones were identified by PCR and sequenced by TaKaRa Bioengineering Co., Ltd. (Dalian, China).
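The reasoning behind recovering a 2.8-kb fragment can be sketched as follows: if a circular genome carries a single BamHI site (recognition sequence GGATCC), digestion yields one linear fragment of genome length. The Python sketch below locates sites on a circular sequence, including sites that span the origin, and reports the expected fragment lengths. The function names and toy sequences are hypothetical, and real RCA products are concatemers, which this simplified model ignores.

```python
from typing import List

BAMHI_SITE = "GGATCC"  # BamHI recognition sequence


def circular_sites(seq: str, site: str = BAMHI_SITE) -> List[int]:
    """Return start positions of `site` on a circular sequence."""
    seq = seq.upper()
    # Append the first len(site)-1 bases so origin-spanning sites are found.
    extended = seq + seq[:len(site) - 1]
    return [i for i in range(len(seq)) if extended.startswith(site, i)]


def fragment_lengths(seq: str, site: str = BAMHI_SITE) -> List[int]:
    """Fragment lengths after complete digestion of a circular molecule."""
    n = len(seq)
    sites = circular_sites(seq, site)
    if not sites:
        return []      # no site: the circle stays uncut
    if len(sites) == 1:
        return [n]     # one site: a single genome-length linear fragment
    return [(sites[(k + 1) % len(sites)] - sites[k]) % n
            for k in range(len(sites))]


one_site = "TCCAAAAAAGGA"          # site spans the origin of the circle
two_sites = "GGATCCAAAAGGATCCTT"   # two sites, at positions 0 and 10
print(fragment_lengths(one_site), fragment_lengths(two_sites))
```

A genome with no BamHI site, or with more than one, would not give a single unit-length band in this model, which is one reason a different enzyme or back-to-back primers may be needed for some isolates.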

(RT-)PCR and sequencing
For RNA viruses, the extracted RNA samples were reverse transcribed into cDNA using M-MLV reverse transcriptase (TaKaRa) and reverse primers of the corresponding viral species (Additional file 1: Table S10), followed by PCR amplification with the synthesized cDNA as the template. The PCR mixture (25 µL) consisted of 2 µL of cDNA, 2.5 µL of 10× Ex Taq or LA Taq Buffer (MgCl2), 2 µL of 2.5 mM dNTP mixture, 2 µL each of the forward and reverse primers (Additional file 1: Table S10), and 0.2 µL of Ex Taq DNA polymerase or LA Taq DNA polymerase. PCR amplifications were performed under the following conditions: an initial denaturation for 5 min at 94 °C, followed by 35 cycles of 94 °C for 1 min, 55 °C for 1 min and 72 °C for 1 min/kb, and a final extension at 72 °C for 10 min. The PCR products were separated on 1% agarose gels for 30 min. Fragments of interest were recovered using the Gel Extraction Kit (Omega Bio-tek) according to the manufacturer's recommendation. The eluted DNA was ligated into the pMD19-T vector (TaKaRa) according to the manufacturer's instructions and cloned into E. coli strain JM109. Plasmids containing the amplified viral sequence were sequenced by TaKaRa Bioengineering Co., Ltd. For DNA viruses, DNA was extracted and used as the template for PCR amplification. The PCR mixture, cycling conditions and sequencing procedures were similar to those for RNA viruses.
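The arithmetic implied by the reaction setup and cycling program can be checked with a short Python sketch. It assumes that the remainder of the 25-µL reaction is made up with water, which the text does not state explicitly, and it counts only hold times, ignoring ramping between temperatures. The function names are hypothetical.

```python
REACTION_VOLUME_UL = 25.0

# Component volumes (µL) as listed in the text.
COMPONENTS_UL = {
    "cDNA template": 2.0,
    "10x buffer": 2.5,
    "dNTP mixture (2.5 mM)": 2.0,
    "forward primer": 2.0,
    "reverse primer": 2.0,
    "DNA polymerase": 0.2,
}


def water_volume_ul(components=COMPONENTS_UL, total=REACTION_VOLUME_UL) -> float:
    """Volume needed to bring the mix to the final volume (assumed to be water)."""
    return round(total - sum(components.values()), 1)


def program_minutes(amplicon_kb: float, cycles: int = 35) -> float:
    """Total hold time: 5 min initial denaturation, then `cycles` rounds of
    1 min at 94 °C, 1 min at 55 °C and 1 min/kb at 72 °C, then 10 min final
    extension."""
    per_cycle = 1.0 + 1.0 + 1.0 * amplicon_kb
    return 5.0 + cycles * per_cycle + 10.0


print(water_volume_ul())     # µL of water to add
print(program_minutes(1.0))  # minutes of hold time for a 1-kb amplicon
```

With the listed components summing to 10.7 µL, 14.3 µL remains to be filled, and a 1-kb amplicon gives 120 min of hold time.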

Cloning and sequencing of the genome of sweet potato viruses
Full-length genomic sequences of potyviruses were cloned by subsection amplification, and the primers used are listed in Additional file 1: Table S11. RT-PCR was performed as described above. Overlapping fragments of 500-1000 bp were amplified and sequenced after cloning. At least three independent clones of each fragment were sequenced. 5ʹ-RACE and 3ʹ-RACE were conducted by TaKaRa Bioengineering Co., Ltd. Full-length genomic sequences of sweepoviruses and SPSMV-1 were cloned using back-to-back primers (Additional file 1: Table S11) or RCA. The genomic sequences of the other viruses, including TMV, CMV, SPCFV and SPCSV, were obtained by NGS assembly.
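Sequencing at least three independent clones per fragment allows cloning or polymerase errors to be outvoted. The text does not say how the clone sequences were reconciled, so the following Python sketch shows one common approach only as an assumption: a per-position majority vote over clone sequences that are already aligned to equal length, with ties reported as N. The function name and toy sequences are hypothetical.

```python
from collections import Counter
from typing import Sequence


def majority_consensus(clones: Sequence[str]) -> str:
    """Per-column majority vote over pre-aligned, equal-length sequences.
    Columns without a unique most frequent base are written as 'N'."""
    if len({len(c) for c in clones}) != 1:
        raise ValueError("clone sequences must be aligned to equal length")
    consensus = []
    for column in zip(*clones):
        (top_base, top_count), *others = Counter(column).most_common()
        tie = bool(others) and others[0][1] == top_count
        consensus.append("N" if tie else top_base)
    return "".join(consensus)


# Three toy clones; the second carries a single discordant base.
clones = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
print(majority_consensus(clones))
```

With three clones, a single discordant base is always outvoted, which is the practical benefit of the three-clone minimum.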