
Identification of new banana endogenous virus sequences highlights the hallmark gene encoded by retroviruses integrated in banana genomes

Abstract

Endogenous pararetrovirus sequences (EPRVs), which originated from DNA viruses of the family Caulimoviridae, are widely present in plant genomes. Banana streak viruses (BSVs) are a group of circular double-stranded DNA viruses in the genus Badnavirus of the family Caulimoviridae. Banana endogenous virus sequences (BEVs) are derived from the ancestral genes of badnaviruses and are fixed in the genomes of various bananas. However, the genomic characteristics of BEVs remain unknown. In this study, we identified two new BEV variants, GZ5 and GZ13, by sequence analyses, Southern blot, and fluorescent in situ hybridization (FISH). BEV GZ5 had one integrated copy in the BB genome of bananas, while BEV GZ13 was only present in the genome of the variety Dajiao. Importantly, BEV GZ5 contained a complete gene encoding reverse transcriptase (RT) and ribonuclease H (RNase H) (RT/RNase H). In addition, a 340-bp inverted repeat sequence partially overlapping with RNase H was found upstream and downstream of BEV GZ5. However, the amino acid sequence of BEV GZ5 had deletions and mutations compared with those of BSVs. The bioinformatics analyses showed that the BEV GZ5 protein was composed of 412 amino acids, with a molecular weight of 47.37 kDa and an isoelectric point of 9.40. Leucine, isoleucine, and lysine (Lys) were the main amino acids of the BEV GZ5 protein. The analyses revealed that the BEV GZ5 protein contained 35 potential phosphorylation sites. Additionally, it was a hydrophilic protein without a signal peptide or transmembrane region. The secondary structure of the BEV GZ5 protein consisted of 37.26% α-helix, followed by 36.25% random coil. To our knowledge, this is the first report characterizing novel BEVs with the complete RT/RNase H gene, which provides a basis for further exploration of the function and integration mechanism of BEVs in bananas.

Background

Endogenous pararetrovirus sequences (EPRVs), which are derived from the family Caulimoviridae, are present in a variety of plant genomes, including rice (Chen and Kishima 2016), tobacco (Mette et al. 2002), banana (Chabannes et al. 2021), and other plants. Most EPRVs are fragmented and rearranged compared with the corresponding virus genomes. They likely became integrated in the host genome via illegitimate recombination during the repair of breaks in host DNA (Feschotte and Gilbert 2012), and were then retained in the plant genome (Harper et al. 2002; Gayral and Iskra-Caruana 2009; Iskra-Caruana et al. 2014) as a component of the host plant genome (Yu et al. 2019). Integrated EPRVs persisted in host genomes because they enhanced the resistance of plants against virus infection by inducing transcriptional or post-transcriptional gene silencing of homologous sequences (Hull et al. 2000). For instance, Staginnus et al. (2007) observed a significant increase in the transcript levels of EPRVs after viral infection. Mette et al. (2002) demonstrated that the enhancer of tobacco EPRVs was silenced in transgenic tobacco but expressed in transgenic Arabidopsis, suggesting that EPRVs played a crucial role in the resistance of host plants to viruses.

EPRVs were relics of ancient viruses that infected host plants (Chen and Kishima 2016). These EPRVs coevolved with their host plants, resulting in a clear concordance between EPRVs and their hosts. Therefore, the integration time of EPRVs was estimated by analyzing co-evolutionary genes between endogenous viruses and their hosts (Feschotte and Gilbert 2012; Chen et al. 2017; Diop et al. 2018), especially the divergence time of the ancestor host. For example, the origin of lentivirus could potentially be traced back to more than 12 million years, when the host range of rabbit endogenous lentivirus type K expanded to rabbits and hares (Katzourakis et al. 2007; van der Loo et al. 2009). Schmidt et al. (2021) estimated that the integration of beetEPRV3 occurred approximately 13.4–7.2 million years ago based on the divergence times of Corollinae and Nanae. Therefore, EPRVs served as markers to clarify the phylogenetic relationship between the virus and its host.

Banana streak virus (BSV) belongs to the genus Badnavirus in the family Caulimoviridae (Jaufeerally-Fakim et al. 2006; Staginnus et al. 2009). The International Committee on Taxonomy of Viruses (ICTV) recommended 80% nucleotide sequence identity in the RT/RNase H-coding region as a criterion for distinguishing species of badnaviruses (Geering et al. 2014). At present, thirteen distinct banana streak viruses (BSVs) have been identified (Lheureux et al. 2007; Geering et al. 2011), such as banana streak OL virus (BSOLV), banana streak GF virus (BSGFV), banana streak VN virus (BSVNV), etc. Additionally, endogenous banana streak virus (eBSV) and banana endogenous badnavirus sequences (BEVs) were discovered in the banana genomes (Geering et al. 2005; Harper et al. 2005; Gayral and Iskra-Caruana 2009; Chabannes et al. 2021). Based on partial sequences of the RT/RNase H genes, BSVs and BEVs were classified into three distinct clades. Clade I and III contained BSVs (Harper et al. 2005; Gayral and Iskra-Caruana 2009; Li et al. 2020). However, BSVs in Clade III were endemic in East Africa (Chabannes et al. 2021). Clade II only comprised different BEVs (Geering et al. 2005; Chabannes et al. 2021).

BEVs were present in various banana genomes (Gayral and Iskra-Caruana 2009; D’Hont et al. 2012; Iskra-Caruana et al. 2014). Following the classification criteria for badnavirus recommended by ICTV (Geering et al. 2014; Chabannes et al. 2021), many BEVs have been identified (Geering et al. 2005). Similarly, BSVs and BEVs from bananas were discovered in Uganda (Harper et al. 2005). At present, BEV NGA (D’Hont et al. 2012), UC, UD, UF, UG, UH, P, and Q (Chabannes et al. 2021) were related to the BEVs reported by Geering et al. (2005). The presence of BEVs in banana genomes correlated with genotypes and varieties of bananas. Chabannes et al. (2021) investigated BEVs among diverse genotypes of bananas in Uganda and discovered that BEV UC and UG were commonly present in bananas with A and B genotypes, respectively. In contrast, BEV UD, UH, and NGA were found only in the A genomes of bananas. Additionally, BEV P and Q were solely in the B genomes of bananas.

BSVs are episomal in bananas, whereas eBSVs exist in the banana genomes; some eBSVs can release infectious BSVs, but others cannot. An increasing number of BEVs are being found in banana genomes. However, none of the BEVs can be activated as viruses because only a limited portion of their genes is currently known, namely the partial RT/RNase H gene of BSVs (D’Hont et al. 2012; Geering et al. 2014), and their gene sequences and structural characteristics in banana genomes are still unknown. In a previous study, we found that BEV GZ5 and GZ13 were new BEVs. BEV GZ5 was located on chromosome 5 of Musa balbisiana subsp. PKW (BB) (Rao et al. 2023). However, BEV GZ13 was not found on any chromosomes (Rao et al. 2023) of M. balbisiana subsp. PKW (BB) (Davey et al. 2013), M. acuminata subsp. DH-Pahang (AA) (Belser et al. 2021), or M. schizocarpa subsp. HN8 (SS) (Belser et al. 2018). In this study, we confirmed that BEV GZ5 and GZ13 were new BEVs by sequence analyses, Southern blot, and fluorescent in situ hybridization (FISH). Additionally, we characterized the upstream and downstream sequences of BEV GZ5 and performed bioinformatics analyses, which will lay a foundation for further research on the function of BEVs in bananas.

Results

Nucleotide and amino acid sequences analyses of new BEVs

According to the classification criteria of badnaviruses, BEV GZ5 and GZ13 were identified from different banana samples. BEV GZ5 was detected in the samples collected from Maoming, Qinzhou, and Yulin, while GZ13 was found in the sample from Wuzhou. To determine whether BEV GZ5 and GZ13 were BSVs, their nucleotide and amino acid sequences were compared with those of BSVs from GenBank. BEV GZ5 shared 62.4%–67.9% nucleotide sequence identity and 61.8%–71.4% amino acid sequence identity with BSVs. Similarly, BEV GZ13 shared 64.9%–68.5% sequence identity at the nucleotide level and 62.6%–70.9% at the amino acid level with BSVs. Because all nucleotide identities were below the 80% threshold used to distinguish badnavirus species, these results suggested that BEV GZ5 and GZ13 were not BSVs.

To determine whether BEV GZ5 and GZ13 were new BEVs, their nucleotide and amino acid sequences were compared with the BEVs from GenBank and those reported by Rao et al. (2023). BEV GZ5 and GZ13 shared 59.8%–77.2% and 61.5%–77.6% nucleotide sequence identity with BEVs from GenBank (Fig. 1a), and 63.9%–80.1% and 65.4%–83.8% amino acid sequence identity (Fig. 1b), respectively. In addition, when compared with other BEVs reported by Rao et al. (2023), the nucleotide identities of BEV GZ5 and GZ13 were 64.9%–75.9% and 65.0%–77.7%, while the amino acid identities were 65.2%–78.6% and 63.2%–84.4%, respectively. These results suggested that BEV GZ5 and GZ13 were new BEVs.
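The comparison above rests on pairwise percent identity and the 80% nucleotide threshold for badnavirus species. A minimal sketch of that calculation follows. It assumes the two sequences are already aligned and that columns containing a gap are skipped; the software used in the study may treat gaps differently, and the function names and the toy sequences in the usage note are hypothetical.

```python
SPECIES_THRESHOLD = 80.0  # percent nucleotide identity in the RT/RNase H region


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gap-containing columns are ignored. Other tools may count
    them as mismatches, which gives lower values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def is_below_species_threshold(identities) -> bool:
    """True if every identity to the reference sequences is below 80%."""
    return all(value < SPECIES_THRESHOLD for value in identities)
```

For example, `percent_identity("ACGT-A", "ACGA-A")` compares five ungapped columns, of which four match. A candidate whose identities to all references stay below the threshold, such as the 59.8%–77.2% range reported for BEV GZ5, passes `is_below_species_threshold`.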

Fig. 1

Analyses of BEV GZ5 and GZ13 based on the partial sequences of RT/RNase H with BEVs from GenBank. a Alignment of nucleotide sequence. b Alignment of amino acid sequence. c Phylogenetic tree based on the partial RT/RNase H region

To show the evolutionary relationships of BEV GZ5 and GZ13 with other BEVs, a phylogenetic tree was constructed based on the partial sequences of the RT/RNase H gene. The BEVs in this study and from GenBank, as well as BSVs from GenBank, were grouped into three distinct clades. Both Clade I and II contained BEVs, while Clade I and III contained BSVs. BEV GZ5 clustered on a different branch from other BEVs in Clade II, and GZ13 was on a separate branch that was closer to BEV 23 (Bat36, AY189430) than to other BEVs in Clade II (Fig. 1c). The phylogenetic relationships based on the partial sequences of the RT/RNase H-encoding gene therefore confirmed that BEV GZ5 and GZ13 were new BEVs.

Southern blot analyses of BEV GZ5

To analyze the endogenous characteristic of BEV GZ5, Southern hybridization was performed on the main banana cultivars in Guangdong. The BEV GZ5 probe only generated hybridization signals in bananas with BB genomes, such as Fenza 1 and Guangfen 1. The size of the hybridization band corresponded to that of the banana genomic DNA. However, the bananas with AAA genomes, such as Williams and Brazilian bananas, did not show any hybridization signals (Fig. 2a). In contrast to the undigested genomic DNA of Fenza 1 (ABBB) and Guangfen 1 (ABB) (lanes 1 and 2) (Fig. 2b), the digested genomic DNA of Fenza 1 (ABBB) and Guangfen 1 (ABB) each produced only one hybridization band of approximately 9400 bp (lanes 3 and 4) (Fig. 2b). These findings suggested that BEV GZ5 had only one integration site in bananas with the BB genome.

Fig. 2

Identification of BEV GZ5 and GZ13. a Southern blot analysis of BEV GZ5 in undigested total banana genomic DNA. 1, Brazilian (AAA); 2, Williams (AAA); 3, Fenza 1 (ABBB); 4, Guangfen 1 (ABB). b Southern blot analyses of BEV GZ5 in banana total genomic DNA. 1–2, Undigested total genomic DNA of Fenza 1 and Guangfen 1; 3–4, Total genomic DNA of Fenza 1 and Guangfen 1 digested by EcoR V. c Localization of BEV GZ5 integrations in the chromosomes of Guangfen 1 by FISH. The bright spot indicated by the red arrows is BEV GZ5, while blue represents the banana chromosomes counterstained with DAPI. d Southern blot analyses of BEV GZ13 in undigested banana total genomic DNA. 1, Brazilian; 2, Williams; 3, Dajiao (ABB); 4, Fenza 1; 5, Guangfen 1; 6, Jinfen (ABB). M, Digoxigenin-labeled marker

Identification of BEV GZ5 by FISH

To confirm the chromosomal location of BEV GZ5, we performed FISH using the digoxigenin-labeled BEV GZ5 probes on the mitotic chromosomes of Guangfen 1 (ABB), one of the main banana cultivars in Guangdong. We observed only one hybridization signal on the chromosomes of Guangfen 1 (Fig. 2c), which was consistent with the chromosomal location reported previously (Rao et al. 2023) and with the Southern blot analyses in this study. These results suggested that BEV GZ5 was only present in the BB genomes of bananas, with one integration site.

Analyses of endogeny and location of BEV GZ13

To confirm that BEV GZ13 originated from banana genomes, Southern blot was performed. The BEV GZ13 probes only hybridized with the Dajiao (ABB) genome and not with those of the other five tested banana varieties, namely Brazilian (AAA), Williams (AAA), Fenza 1 (ABBB), Guangfen 1 (ABB), and Jinfen (ABB) (Fig. 2d). Additionally, the size of the hybridization band was similar to that of the banana genomic DNA. These results suggested that BEV GZ13 was integrated only in the genome of Dajiao (ABB).

Analyses of the upstream and downstream sequences of BEV GZ5 in Guangfen 1

To investigate the upstream and downstream sequences of BEV GZ5, a fragment of about 10,000 bp was amplified by PCR based on the BEV GZ5 location in the banana BB genome, then cloned and sequenced. After assembly, a 9734-bp nucleotide sequence was obtained, which contained the 1743-bp BEV GZ5 and 340-bp inverted repeat sequences (IRS) upstream and downstream of BEV GZ5, parts of which overlapped with the RNase H gene of BEV GZ5 (Fig. 3a). In addition, a gene encoding a zinc finger protein (ZinF) was discovered upstream of BEV GZ5, and a 621-bp sequence (GBV1-1) was found downstream of BEV GZ5, which shared 70.6% nucleotide sequence identity with that of grapevine badnavirus 1 isolate VLJ-178 (GBV1-VLJ-178); however, no functional proteins were predicted.
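The notion of an inverted repeat flanking an element can be made concrete with a short check. The sketch below assumes the conventional definition, in which the downstream copy read on the same strand matches the reverse complement of the upstream copy; the text does not describe how the IRS were detected, so the function names, the ungapped comparison, and the identity cutoff are illustrative assumptions only.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_inverted_repeat_pair(left: str, right: str,
                            min_identity: float = 1.0) -> bool:
    """Check whether `right` is an inverted copy of `left`.

    Assumption: both arms have the same length and are compared without
    gaps; `min_identity` is the fraction of matching positions required.
    """
    if not left or len(left) != len(right):
        return False
    expected = reverse_complement(left).upper()
    matches = sum(a == b for a, b in zip(expected, right.upper()))
    return matches / len(left) >= min_identity
```

With the toy arms `"ATGCC"` and `"GGCAT"`, `is_inverted_repeat_pair` returns `True`, because `GGCAT` is the reverse complement of `ATGCC`. Real repeats usually diverge, so a cutoff below 1.0 would be needed in practice.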

Fig. 3

Schematic diagram of BEV GZ5 and the corresponding protein alignment of BEV GZ5 and BSVs. a Schematic diagram of BEV GZ5. GZ5YC1–GZ5YC4 indicate the four segments in which BEV GZ5 was amplified. The yellow box represents the zinc finger protein (ZinF), the gray box containing RT/RNase H indicates BEV GZ5, the light blue box represents RT, the green box represents RNase H, and the purple box denotes GBV1-1. The arrows show the inverted repeat sequences (IRS); the green box in the arrows marks the region of the IRS that overlaps with RNase H, and the dark blue box in the arrows represents the non-coding region. b Amino acid sequence of BEV GZ5 compared with the corresponding protein of BSVs. Orange indicates 100% sequence identity, yellow represents more than 75% sequence identity, blue denotes greater than 50% sequence identity, and white indicates less than 25% sequence identity. The red triangles mark the amino acid mutation sites, and the blue stars mark the amino acid deletion sites

Analyses of the upstream and downstream sequences of BEV GZ5 revealed that the upstream sequence (1–3853 bp) showed 99.9% nucleotide sequence identity with that of M. balbisiana subsp. PKW (BB). Similarly, the downstream sequence (5597–9734 bp) shared 99.0% nucleotide sequence identity with that of M. balbisiana subsp. PKW (BB). Therefore, the upstream and downstream sequences of BEV GZ5 had the highest nucleotide sequence identities with the BB genomes of bananas.
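The coordinates given for the flanking regions can be checked against the reported sizes of the assembly and of BEV GZ5. The short sketch below assumes 1-based, inclusive coordinates, which is the usual convention for such ranges but is not stated explicitly; the variable and function names are illustrative.

```python
ASSEMBLY_LENGTH = 9734      # assembled fragment, bp
UPSTREAM = (1, 3853)        # flanking region matching the BB genome
DOWNSTREAM = (5597, 9734)   # flanking region matching the BB genome


def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def enclosed_interval(upstream, downstream):
    """Interval lying strictly between the two flanking regions."""
    return upstream[1] + 1, downstream[0] - 1


bev_start, bev_end = enclosed_interval(UPSTREAM, DOWNSTREAM)
bev_length = span_length(bev_start, bev_end)
flank_total = span_length(*UPSTREAM) + span_length(*DOWNSTREAM)
```

Under that assumption the enclosed interval is 3854–5596, which is 1743 bp long and agrees with the reported size of BEV GZ5, and the three parts add up to the 9734-bp assembly.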

When comparing the nucleotide sequence of BEV GZ5 with the corresponding region of the badnaviruses, the highest sequence identity was 71.67% at the nucleotide level with grapevine roditis leaf discoloration-associated virus isolate w4. Furthermore, the amino acid sequence of BEV GZ5 was aligned with the corresponding region of BSVs; it was found that BEV GZ5 contained the intact encoding gene of RT/RNase H, which was common to all the badnaviruses (Fig. 3b). Additionally, the amino acid sequence of BEV GZ5 included typical motifs of the RT/RNase H domain. However, it differed from BSVs in that there were 13 amino acid mutation sites and 2 deletion sites (Fig. 3b). Among them, 9 of the 13 amino acid mutation sites and 1 deletion site were in the RT region, 2 of them in RNase H region, and 1 deletion site in the intergenic region, respectively. Motif 3 in the RT region was the most variable motif in BEV GZ5 compared with the corresponding region of the BSVs.

Amino acid sequence composition and physicochemical properties of the BEV GZ5 protein

To identify the fundamental characteristics and potential structures of the protein encoded by BEV GZ5, we conducted bioinformatics analyses, which will aid in determining the function of the BEVs. The amino acid sequence composition and physicochemical properties of the BEV GZ5 protein were analyzed (Table 1). The results revealed that the proteins of BEV GZ5 and the ten chosen BSVs consisted of 410–414 amino acid residues with a relative molecular weight of 46.57–47.56 kDa. Their isoelectric points ranged from 9.14 to 9.42, with 44 to 53 negatively charged residues, namely aspartic acid (Asp) and glutamic acid (Glu), and 61 to 67 positively charged residues, namely arginine (Arg) and lysine (Lys). The aliphatic amino acid coefficient ranged from 85.75 to 95.17. The instability coefficient of the BEV GZ5 and the ten BSV proteins ranged from 30.27 to 46.16, indicating that they were unstable (with coefficients more than 30). The predicted hydrophilicity values ranged from -0.454 to -0.311, which suggested that they were hydrophilic (with negative values). All the proteins of BEV GZ5 and the ten BSVs consisted of 20 amino acids in similar proportions (Additional file 1: Table S1). Among them, leucine, isoleucine, and Lys were the most abundant, while cysteine was the least abundant. The differences in amino acid composition might be related to the properties of the viral proteins.
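Two of the quantities in this paragraph, the hydrophilicity value and the aliphatic coefficient, follow simple published formulas. The sketch below computes the grand average of hydropathy (GRAVY) with the Kyte-Doolittle scale and the aliphatic index as defined by Ikai, which are the definitions commonly used by tools such as ExPASy ProtParam. It is an illustration under those assumptions, not the authors' pipeline, and the sequences in the usage note are toy inputs rather than the BEV GZ5 protein.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Negative values indicate an overall hydrophilic protein.
    """
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)


def aliphatic_index(protein: str) -> float:
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),

    where X is the mole percent of each residue.
    """
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * protein.count(aa) / len(protein)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

For instance, `gravy("KKDD")` is negative because lysine and aspartic acid are both strongly hydrophilic on this scale, while `aliphatic_index("AVIL")` is high because every residue is aliphatic.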

Table 1 Primary structure analysis of corresponding proteins BEV GZ5 and BSVs

Prediction of subcellular localization, transmembrane regions, secondary structure, and signal peptide regions of BEV GZ5 proteins

The subcellular localizations of the proteins encoded by BEV GZ5 and the ten BSVs were predicted by WoLF PSORT; the results indicated that they were primarily present in peroxisomes and the cytoplasm (Table 2). The transmembrane region prediction showed that the BEV GZ5 protein lacked a transmembrane region. Furthermore, the signal peptide prediction indicated that the BEV GZ5 protein did not possess a signal peptide region (Additional file 2: Figure S1). The hydropathy profile indicated that the BEV GZ5 protein was hydrophilic, with negative scores at most amino acid positions (Fig. 4a). The hydrophilicity results obtained from the ProtScale online tool were consistent with the results from the online software ExPASy Proteomic. The secondary structure of the BEV GZ5 protein was composed of 37.26% α-helix, 36.25% random coil, 20.28% extended strand, and 6.41% β-fold.

Table 2 Subcellular localization prediction of BEV GZ5 and corresponding proteins of BSVs
Fig. 4

Prediction of hydrophilicity and potential phosphorylation sites of BEV GZ5 protein. a Hydrophobicity or hydrophilicity prediction. The abscissa represented the position of the amino acid, while the ordinate represented the hydrophobicity score. The graph displayed hydrophobicity as positive values and hydrophilicity as negative values. b Potential phosphorylation sites prediction

Phosphorylation sites of BEV GZ5 protein

The prediction of phosphorylation sites showed that the BEV GZ5 protein had 35 potential phosphorylation sites (Fig. 4b). Of these, 17 were identified as potential serine phosphorylation sites, 11 as threonine phosphorylation sites, and 7 as tyrosine phosphorylation sites. These results suggested that serine was the primary site for potential phosphorylation on the BEV GZ5 protein.

Discussion

Endogenous viral sequences are common in host plant genomes (Ndowora et al. 1999; Kunii et al. 2004; Geering et al. 2005; Yu et al. 2019), and play a significant role in shaping the genome of the host plant (Richert-Poggeler et al. 2021). BEVs with partial RT/RNase H sequences have been found in various genotypes of banana (Geering et al. 2005; Chabannes et al. 2021). However, we found that BEV GZ5 had the complete genes of RT/RNase H. To our knowledge, this is the first report in which BEVs with the complete genes of RT/RNase H have been characterized. Our findings provide a theoretical foundation for exploring the integration mechanisms of BEVs.

As molecular fossils, BEVs provide strong evidence for the viral ancestors infecting bananas. However, it is important to note that viruses are mainly classified based on their relevant biological information such as infectivity, morphology, and transmission. This suggests that it is difficult to recognize ancestral viruses without biological information (Vassilieff et al. 2023). To distinguish between endogenous pararetroviral sequences and homologous episomal viruses, Staginnus et al. (2009) suggested adding the prefix 'E-' or 'e-' or the suffix '-EPRS' to the integrated viral sequences. Assuming that EPRVs were molecular fossils of ancestral viruses, Geering et al. (2010) classified BEVs into tentative genera based on the classification criterion of badnaviruses. Therefore, many tentative species of BEVs were identified based on the threshold of 80% nucleotide sequence identity of the RT/RNase H gene (Geering et al. 2010; D’Hont et al. 2012; Chabannes et al. 2021). According to this classification criterion, we identified two new tentative species of BEVs, BEV GZ5 and GZ13. Southern hybridization confirmed the endogeny of the two BEVs. However, BEV GZ5 was distinct from BEV GZ13 and the BEVs from GenBank (Geering et al. 2010; D’Hont et al. 2012; Chabannes et al. 2021) in that it contained intact encoding genes of RT/RNase H and had an inverted repeat sequence at both of its ends. The nucleotide sequences upstream and downstream of BEV GZ5 showed the highest identities with the BB genomes of bananas. Furthermore, the genetic structure of BEV GZ5 was different from those of both eBSVs (Cote et al. 2010; Iskra-Caruana et al. 2010; Chabannes et al. 2013) and episomal BSVs (Lheureux et al. 2007; James et al. 2011). Therefore, BEV GZ5 was classified as a novel BEV.

Endogenous pararetroviruses could be identified as a distinct category of transposable elements (TEs), as they lacked the repeats of retrotransposons with long terminal repeats (LTRs) (Jakowitsch et al. 1999). EPRVs could potentially be domesticated as TEs and integrated in host genomes during evolution (Yu et al. 2019). DNA transposons carrying terminal inverted repeat sequences were generally transposed by transposases encoded by their autonomous elements, and the target site sequences at both ends of the element were duplicated when a transposon was actively inserted into the genome (Liu et al. 2012). In tobacco, Jakowitsch et al. (1999) discovered a 63-bp inverted repeat sequence that might be related to the recombination process of EPRVs in tobacco genes. In this study, we found a pair of IRS at both ends of BEV GZ5, parts of which overlapped with the RNase H gene. This arrangement resembled a transposable unit, suggesting that the IRS might play a role in the integration of BSVs in the banana genome or in the acquisition of episomal BSV genes by bananas. Although there was no direct evidence of the integration and acquisition of BSV genes, the results of Southern hybridization and FISH demonstrated the integration of BEV GZ5 in the bananas with BB genomes. In addition, the IRS at both ends of BEV GZ5 might help the replication of endogenous viral genes via homologous recombination (White et al. 1994) or promote chromosomal rearrangements through the same mechanism (Hughes and Coffin 2001) to improve the environmental adaptability of bananas.

BSVs had integrated in the genomes of bananas in a rearranged and fragmented manner (D’Hont et al. 2012). The integration of EPRVs in host plants had enhanced their resistance by silencing genes at the transcriptional and post-transcriptional levels (Hull et al. 2000). For example, the significant quantity of small RNAs derived from EPRVs may improve host resistance to related viral infections (Huang and Li 2018). Small RNAs derived from EPRVs might suppress potentially pathogenic EPRVs (Schmidt et al. 2021). Therefore, the EPRVs integration in host plants was considered a beneficial component of host resistance against viruses (Zhang et al. 2015; Yang et al. 2016). It is interesting to know that the RT/RNase H gene was present in both retroviruses and retrotransposons (Xiong and Eickbush 1990; Harper et al. 2002). Unlike most BEVs from GenBank, BEV GZ5 contained the intact encoding genes of both RT and RNase H. Importantly, both BEV GZ5 and BEVs from GenBank contained the conserved structural domain of RT/RNase H, which might protect bananas against the pathogens with RT/RNase H genes. It suggests that the hallmark gene of RT and RNase H encoded by retroviruses has been integrated in banana genomes.

RT-like sequences were found in various elements, including plant and animal DNA viruses, transposable elements in fruit flies, yeast, trypanosomes, slime mold, and mitochondrial introns (Xiong and Eickbush 1988), suggesting that they might share an ancestor. Rao et al. (2023) demonstrated that BEVs and BSVs coevolved with bananas and originated from a common badnavirus ancestor, and that some BEVs predated the differentiation of the Musa ancestor. This study revealed that different types of BEVs shared a partial RT/RNase H region, indicating that BEVs had a common origin with retroviruses, transposable elements, and mitochondrial introns, which advanced the coevolutionary timeline among BEVs, BSVs, and bananas.

The distribution of BEVs varied across the chromosomes of bananas, despite their presence in multiple banana genomes (Geering et al. 2005; Perrier et al. 2011; Chabannes et al. 2021). Chabannes et al. (2021) discovered different distribution patterns of BEVs in bananas. This study found that BEV GZ5 was present in the BB genomes of bananas at a low copy number, as confirmed by Southern hybridization and FISH. However, it was not found in the AA and SS genomes of bananas. BEV GZ13 was only present in Dajiao (ABB) and not in other ABB banana varieties. We hypothesized that the integration of BEV GZ5 in the BB genomes of bananas occurred earlier than that of BEV GZ13. Additionally, due to external factors, the integration of BEV GZ13 was a low-frequency event that was unlikely to spread among different banana varieties with the same genotype.

Bioinformatics analyses were used to identify the potential functions of the BEV GZ5 protein. The results revealed that the RT/RNase H protein encoded by BEV GZ5 was a hydrophilic protein and lacked a transmembrane structure. It was primarily present in peroxisomes and the cytoplasm. Phosphorylation was the most fundamental and crucial mechanism for regulating and controlling protein viability and function (Keck et al. 2015). The potential phosphorylation sites suggested that the BEV GZ5 protein might play a significant role in the transcriptional regulation of proteins.

Most EPRVs in the host genome frequently showed early termination of the open reading frame (ORF) or translational frame shifts due to nucleotide substitutions, insertions, or deletions (Feschotte and Gilbert 2012). The integration of BSVs in the banana genome led to genetic mutations, rearrangements, and, in some cases, fragmentation of the entire banana genome triggered by natural selection or host interactions. D’Hont et al. (2012) reported the discovery of 24 BEV integrations across ten chromosomes in M. acuminata subsp. DH-Pahang, all of which were highly recombinant and fragmented, and thus failed to form active viruses. Similarly, Chabannes et al. (2021) discovered a significant genetic diversity of BEVs, which often showed translational frame shifts. In this study, compared with the RT and RNase H of BSVs, BEV GZ5 had amino acid mutations and deletions. However, BEVs in this study and from GenBank shared partial RT and RNase H genes, which suggested that the integration patterns of BEVs in the banana genomes differed. Valli et al. (2023) conducted extensive research on viral functional genes and their associated EPRVs in four representative eggplant genomes; they identified four distinct endogenous viral genomes and other associated EPRVs. Therefore, it is important to explore the distribution of the other functional genes of BSVs in the banana genome, which would provide further insight into the genetic evolutionary mechanism of BSVs and their interaction with bananas.

Conclusions

In this study, we identified two new BEVs, BEV GZ5 and GZ13. BEV GZ13 only had a partial RT/RNase H region, while BEV GZ5 had the complete genes of RT/RNase H, with a 340-bp IRS partially overlapping with RNase H located upstream and downstream of BEV GZ5. BEV GZ5 was present in different banana varieties with BB genomes, whereas BEV GZ13 was only present in the Dajiao genome (ABB). The new BEVs were different from BSVs; however, both of them shared partial RT/RNase H sequences with retroviruses. This suggested that the hallmark gene of RT and RNase H encoded by retroviruses had integrated in banana genomes, which would advance the understanding of the co-evolution of BEVs, BSVs, and bananas. This research provides a theoretical foundation for further studying the integration mechanism of different BEVs in bananas.

Methods

DNA extractions, PCR cloning, and sequencing

During 2021–2022, banana samples were collected from Maoming in Guangdong Province and from Qinzhou, Wuzhou, and Yulin in Guangxi Province, and stored at -80°C. To determine whether BSVs infected these samples, immunocapture-PCR (IC-PCR) was conducted on the total DNA (Le Provost et al. 2006). Total DNA from the bananas was extracted using the plant genomic DNA extraction kit following the instructions recommended by Tiangen (Beijing, China). The degenerate primers of Badv-RT were used for PCR amplification with high-fidelity enzymes from Vazyme (Nanjing, China), and the PCR was performed under the conditions described by Rao et al. (2023). The PCR product was analyzed by electrophoresis in a 1% agarose gel, and the target fragments were purified with a gel recovery kit (Axygen, MA, USA). The purified PCR product of each DNA fragment was cloned into pMD 19T (Sangon, Shanghai, China). Five clones of each positive sample were sequenced by Sangon (Shanghai, China).

Phylogenetic tree of BEVs

To determine the phylogenetic relationship of new BEVs with other BEVs and BSVs, BEV GZ5 and GZ13 in this study along with BEVs and BSVs from GenBank were analyzed. These sequences were aligned using Clustal W implemented in MEGA11 software (Tamura et al. 2021) and corrected manually when necessary. The phylogenetic tree was constructed by MEGA11 software (Tamura et al. 2021) using the Neighbor-joining (NJ) method with 1000 Bootstrap replicates (Saitou and Nei 1987).
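Bootstrap support in a neighbor-joining analysis comes from rebuilding the tree on alignments whose columns have been resampled with replacement. The sketch below illustrates only that resampling step and a simple p-distance; tree construction itself is not shown, and this is not the MEGA11 implementation. The function names and the toy alignment in the usage note are hypothetical.

```python
import random


def p_distance(aligned_a: str, aligned_b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap."""
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_replicate(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement.

    `alignment` maps sequence names to equal-length aligned strings.
    The same sampled columns are applied to every sequence, so each
    replicate column is a column of the original alignment.
    """
    names = list(alignment)
    if not names:
        raise ValueError("empty alignment")
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("sequences must have equal length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns)
            for name in names}
```

A run with 1000 replicates would call `bootstrap_replicate` 1000 times, compute a distance matrix and tree for each replicate, and report how often each clade of the original tree recurs.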

Southern blot

To analyze whether BEV GZ5 and GZ13 were located in banana genomes, we conducted Southern blot to examine their endogeny. Genomic DNA was extracted from different banana varieties, namely Brazilian (AAA), Williams (AAA), Fenza 1 (ABBB), Guangfen 1 (ABB), Jinfen (ABB), and Dajiao (ABB), using the new plant genomic DNA extraction kit from Tiangen (Beijing, China) according to the manufacturer's instructions. The probe primers GZ5TZ and GZ13TZ were designed based on the gene sequences of BEV GZ5 and GZ13 following the principles of probe design for Southern hybridization (Table 1), and the probes were synthesized through PCR amplification using the Roche PCR DIG Probe Synthesis kit (Roche, Switzerland). Southern blot analysis was performed as described previously (Rao et al. 2023). The hybridization signals were detected by the bio-macromolecular analyzer (Bio-Rad, USA).

FISH of BEV GZ5

To confirm the presence of endogenous BEV GZ5 in banana genomes, FISH was performed with a BEV GZ5-specific probe on the young roots of Guangfen 1, a major cultivated variety with the BB genome in Guangdong Province. The specific DNA probe of BEV GZ5 was generated using the Roche PCR DIG Probe Synthesis kit (Roche, Basel, Switzerland). Banana chromosome preparation and FISH were conducted on the young growing root tips of Guangfen 1 following the protocols described by Chabannes et al. (2013). Banana chromosomes were counterstained with 4′,6-diamidino-2-phenylindole (DAPI) (Solarbio, Beijing, China) and mounted in anti-fade solution. Observations were made and images were captured using a Leica DMi8 inverted fluorescence microscope (Leica, Wetzlar, Germany).

Sequence amplification and analyses of upstream and downstream sequences of BEV GZ5

BEV GZ5 is located at a single locus on chromosome 5 of M. balbisiana subsp. PKW (BB) (Rao et al. 2023). To analyze the sequence characteristics near BEV GZ5 in banana genomes, primers were designed based on the genome sequence of M. balbisiana subsp. PKW (BB) to amplify the upstream and downstream sequences of BEV GZ5 in the BB genomes of bananas (Table 3). PCR amplification of the extracted DNA was carried out with high-fidelity enzymes in a reaction containing 10 µL 2 × HiFi PCR StarMix, 1 µL 10 µM forward primer, 1 µL 10 µM reverse primer, 7 µL ddH2O, and 1 µL DNA. The PCR profile was as follows: 94℃ for 5 min; 35 cycles of 94℃ for 1 min, primer-specific annealing temperature for 1 min, and 72℃ for 1 min; and 72℃ for 10 min, followed by storage at 4℃. The target fragments were gel purified and cloned into pMD 19T vectors (Sangon, Shanghai, China) according to the manufacturer’s instructions. Plasmid DNA was extracted with a plasmid DNA purification system (Axygen, USA) according to the manufacturer’s instructions. The upstream and downstream sequences of BEV GZ5 were then sequenced, and each clone was sequenced at least twice. Finally, the nucleotide sequences obtained in both directions of BEV GZ5 were assembled separately. The nucleotide and amino acid sequences of BEVs were then analyzed with DNAMAN software. Additionally, amino acid motifs were identified using MEME Suite (http://meme-suite.org/), and protein structural domains were predicted using ProScan (http://www.ebi.ac.uk/interpro/search/sequence/).
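The reaction recipe and cycling profile above imply a few quantities that are easy to check by arithmetic. The Python sketch below totals the component volumes, derives the final primer and mix concentrations from C1·V1 = C2·V2, and adds up the programmed hold times. The `master_mix` helper and its 10% pipetting overage are hypothetical conveniences, not part of the published protocol.

```python
# Component volumes in microlitres, as listed in the text.
reaction_ul = {
    "2x HiFi PCR StarMix": 10.0,
    "forward primer (10 uM)": 1.0,
    "reverse primer (10 uM)": 1.0,
    "ddH2O": 7.0,
    "template DNA": 1.0,
}

# Total reaction volume: 10 + 1 + 1 + 7 + 1 = 20 uL.
total_ul = sum(reaction_ul.values())

# Final concentrations from C1 * V1 = C2 * V2.
primer_final_um = 10.0 * reaction_ul["forward primer (10 uM)"] / total_ul
mix_final_x = 2.0 * reaction_ul["2x HiFi PCR StarMix"] / total_ul


def master_mix(n_reactions, overage=0.10):
    """Scale the shared components for several reactions.

    The overage (extra fraction for pipetting loss) is an assumed value.
    Template DNA is left out because it is added per tube.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in reaction_ul.items()
            if name != "template DNA"}


# Programmed hold time in minutes, excluding temperature ramping:
# initial denaturation + 35 cycles of three 1-min steps + final extension.
cycles = 35
hold_min = 5 + cycles * (1 + 1 + 1) + 10

print(total_ul, primer_final_um, mix_final_x, hold_min)
print(master_mix(8))
```

Under these numbers each primer ends up at 0.5 µM, the StarMix at 1×, and the programmed holds add up to 120 min before the 4℃ storage step.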

Table 3 Primers in this study

Bioinformatics analysis of BEV GZ5 protein

To investigate the characteristics and function of BEV GZ5 protein, its physicochemical properties, amino acid composition, hydrophobicity, transmembrane structure, signal peptide, subcellular localization, phosphorylation sites, and secondary structure were predicted using bioinformatics tools available online. The nucleotide sequences of BEV GZ5 were translated into amino acid sequences using Editseq software (V7.1.0.44). The physicochemical properties of BEV GZ5 and ten BSVs, such as molecular weight, isoelectric point, and amino acid composition, were analyzed using the ExPASy Proteomic (http://web.expasy.org/) and Wolfpsort (https://wolfpsort.hgc.jp). Furthermore, the potential subcellular localization of BEV GZ5 and the ten BSVs was predicted using Wolfpsort (https://wolfpsort.hgc.jp). The transmembrane structural domain of BEV GZ5 protein was analyzed using the TMHMM 2.0 Server (http://web.expasy.org/ProtScal). Additionally, the signal peptide region of BEV GZ5 protein was predicted using the SignalP 4.1 server (http://www.cbs.dtu.dk/servers/SignalP). The hydrophilicity of BEV GZ5 protein was analyzed using the online tool ProtScale (http://web.expasy.org/ProtScale). The potential phosphorylation sites of BEV GZ5 protein were predicted using the NetPhos 3.1 Server (http://www.cbs.dtu.dk/services/NetPhos) with a threshold value of 0.5. The secondary structure of BEV GZ5 protein was predicted with SOPMA (http://npsa-prabi.ibcp.fr/cgi-bin/scpred_sopma.pl) after manual correction.
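To make two of these predictions concrete, the Python sketch below computes the amino acid composition, the grand average of hydropathy (GRAVY), and a sliding-window hydropathy profile for a protein sequence. It assumes the Kyte-Doolittle scale and a window of 9 residues. The text does not state which scale or window was chosen in ProtScale, and the toy peptide is hypothetical rather than the BEV GZ5 sequence.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values (assumed scale; not stated in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition_percent(seq):
    """Percentage of each amino acid present in the sequence."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def gravy(seq):
    """Grand average of hydropathy; negative values indicate a hydrophilic protein."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq, window=9):
    """Mean hydropathy over a sliding window centred on each eligible residue."""
    half = window // 2
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in seq[i - half:i + half + 1]) / window
        for i in range(half, len(seq) - half)
    ]


# Hypothetical toy peptide, not the BEV GZ5 protein.
toy = "MKKLLIEKDK"

comp = composition_percent(toy)
score = gravy(toy)
profile = hydropathy_profile(toy)

print(comp)
print(round(score, 2))
print([round(v, 3) for v in profile])
```

For the toy peptide the GRAVY value is negative, which is the same criterion by which a protein is described as hydrophilic.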

Availability of data and materials

Not applicable.

Abbreviations

Arg: Arginine

Asp: Aspartic acid

BEVs: Banana endogenous virus sequences

BSV: Banana streak virus

BSOLV: Banana streak OL virus

BSGFV: Banana streak GF virus

BSVNV: Banana streak VN virus

eBSV: Endogenous banana streak virus

EPRVs: Endogenous pararetrovirus sequences

FISH: Fluorescent in situ hybridization

GBV1: Grapevine badnavirus 1

Glu: Glutamic acid

ICTV: International Committee on Taxonomy of Viruses

IRS: Inverted repeat sequences

LTRs: Long terminal repeats

Lys: Lysine

ORF: Open reading frame

RT: Reverse transcriptase

RNase H: Ribonuclease H

TEs: Transposable elements

ZinF: Zinc finger protein

Acknowledgements

Not applicable.

Funding

This study was supported by the National Key R&D Program of China (2019YFD1001800) and the China Agriculture Research System of MOF and MARA (CARS-31).

Author information

Contributions

HP and XQ designed the experiments. HZ performed the experiments and analyzed the data. HZ and XQ wrote the manuscript, and HP revised it. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Huaping Li or Xueqin Rao.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Supplementary Information

Additional file 1: Table S1. Amino acid composition and content analyses (%) of BEV GZ5 and corresponding proteins of BSVs.

Additional file 2: Figure S1. Prediction of transmembrane region and signal peptide region of BEV GZ5 protein.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Chen, H., Li, H. & Rao, X. Identification of new banana endogenous virus sequences highlights the hallmark gene encoded by retroviruses integrated in banana genomes. Phytopathol Res 6, 39 (2024). https://doi.org/10.1186/s42483-024-00256-7

  • DOI: https://doi.org/10.1186/s42483-024-00256-7

Keywords