The hypervariable N-terminal of soybean mosaic virus P1 protein influences its pathogenicity and host defense responses

Soybean mosaic virus (SMV; Potyvirus, Potyviridae) is one of the most prevalent and destructive viral pathogens in the world. The P1 protein is the first N-terminal product in the potyvirus genome and shows a high sequence variability that may be related to virus adaptation to hosts. In this work, we focused on the different functions of P1 proteins in two SMV isolates SMVGZL and SMVNB during their infection of plants. Isolate SMVGZL induced weaker symptoms than SMVNB in mechanical inoculation assays, and the accumulation level of SMV CP in SMVGZL-infected leaves was lower than that in leaves infected with SMVNB, especially at the late stage of infection. The isolates SMVGZL and SMVNB had a high similarity in genome sequence except for the P1 region. P1GZL induced a higher salicylic acid (SA) response than P1NB in Nicotiana benthamiana, which may explain the lower virus titers in plants infected with SMVGZL. Our results suggest that the divergence in the P1 proteins of these SMV isolates influenced their virulence via differentially regulating SA signaling.


Background
Soybean, Glycine max (L.) Merr., is one of the world's most important economic crops, providing high-quality vegetable oil and protein for human and animal diets (Wilson 2008; Gao et al. 2015). Soybean is a natural host to many plant viruses, some of which cause significant losses in soybean yield and quality. Soybean mosaic virus (SMV) is the most economically damaging of these and is a long-standing problem in all soybean-growing areas of the world. It typically causes 8-35% yield losses (Hill and Whitham 2014), but losses of 50-100% have been reported during severe outbreaks (Song et al. 2016).
SMV is a member of the genus Potyvirus in the family Potyviridae (Hajimorad et al. 2018; Gao et al. 2020). It is transmitted in the field by aphids in a nonpersistent manner and is also spread through infected seeds (Widyasari et al. 2020). Plants infected by SMV show leaf mosaic, mottling, wrinkling, stem necrosis and plant dwarfing, which greatly reduce yield. Different strains of SMV have been recognized based on the phenotypic responses of a set of soybean cultivars to the virus. Based on the different identification systems, SMV isolates in the United States have been classified into seven strains (G1-G7) (Cho and Goodman 1979), while 22 strains (SC1-SC22) have been recognized in China (Li et al. 2010; Wang et al. 2013). Like the genomes of other potyviruses, the SMV genome consists of a single-stranded RNA that is approximately 10 kb in length and has a single open reading frame (ORF) (Hajimorad et al. 2018). The encoded polyprotein is processed by self-proteolysis into a series of multifunctional proteins known as P1, helper component proteinase (HC-Pro), P3, PIPO (a product of slippage in the P3 coding sequence), 6K1, cylindrical inclusion (CI) protein, 6K2, nuclear inclusion a (NIa-Pro) protein, nuclear inclusion b (NIb) protein and capsid protein (CP) (Urcuqui-Inchima et al. 2001; Gao et al. 2020; Widyasari et al. 2020).

The P1 protein is at the N-terminus of the viral polyprotein and has serine protease activity (Tena Fernandez et al. 2013), which enables it to cleave itself from the adjacent HC-Pro, a protein that is critical for inhibiting host RNA silencing defenses (Verchot et al. 1991; Adams et al. 2005). Earlier studies suggested that P1 enhanced the RNA silencing suppressor activity of HC-Pro. The potato virus Y (PVY) HC-Pro expressed from a P1-HC-Pro sequence increased the accumulation of a reporter gene, whereas protein expressed from an HC-Pro sequence alone did not (Anandalakshmi et al. 1998; Tena Fernandez et al. 2013). The P1 proteins of potyviruses are highly variable both in length and in amino acid sequence, especially at their N-termini (Valli et al. 2007; Shan et al. 2018). These highly disordered N-termini modulate potyviral replication and host defense responses by negatively regulating the self-cleavage activity of P1, which contributes to the long-term replication of the virus (Pasin et al. 2014). Removal of the N-terminus of P1 enhances early amplification of the virus and induces salicylate-dependent defense responses (Pasin et al. 2014). Thus, although P1 is not directly involved in suppressing RNA silencing, it can modulate the function of HC-Pro through its self-cleavage activity, and thereby modulates the level of virus amplification and alleviates the host antiviral responses.
Pathogenesis-related (PR) proteins are components of the plant innate immune system that are induced by various viral infections and environmental stresses, and are recognized as markers for systemic acquired resistance (SAR) (van Loon and van Strien 1999; Nürnberger and Brunner 2002). Salicylic acid (SA) plays an important role in SAR. Following pathogen infection, accumulated SA binds the key regulator NPR1 (Non-expressor of pathogenesis-related gene 1) and thereby induces the expression of multiple PR protein genes (Gaffney et al. 1993; van Loon et al. 2006). PR proteins have been classified into 17 families (PR-1 to PR-17) based on their sequence similarity and molecular biological properties (van Loon et al. 1994; Christensen et al. 2002), and some of these PR protein genes have been characterized in many plants (van Loon et al. 2006; Shin et al. 2014; Almeida-Silva and Venancio 2022). For example, overexpressing the PR-1 gene of rice increases resistance to bacterial infection (Shin et al. 2014). Exogenous SA treatment activates the resistance responses associated with TMV-induced hypersensitivity in tobacco plants by inducing PR-2 and PR-3 transcription (Yalpani et al. 1991; Heitz et al. 1994). Thus, the accumulation of PR proteins is closely associated with the severity of symptoms and the host defense response to pathogen infection.
In a previous study of SMV isolates from different soybean-growing areas in China, we identified isolate SMV_JL_GZL (SMV GZL), which causes leaf crinkling of soybean and clusters separately from other Chinese SMV isolates in phylogenetic analyses (Wei et al. 2021). We now report another SMV isolate, SMV_ZJ_NB (SMV NB), which causes more severe symptoms and accumulates to higher levels in plants than SMV GZL. The P1 protein sequences of these two isolates are divergent, especially at the N-terminus. Both P1 GZL and P1 NB localized in chloroplasts, where P1 GZL-GFP gave a stronger GFP signal. In mechanical inoculation and transient expression assays, SMV NB and its P1 NB protein, respectively, induced weaker expression of salicylate-dependent pathogenesis-related genes than SMV GZL and P1 GZL, and SMV NB caused more severe symptoms. Thus, our results show that the divergence in P1 sequence is associated with the severity of SMV symptoms.

Results
SMV NB and SMV GZL isolates caused different symptoms on soybean
SMV_ZJ_NB (SMV NB), an isolate of SMV newly obtained from soybean in Ningbo, China, was compared with isolate SMV_JL_GZL (SMV GZL, GenBank accession no. MW354946.1), which had been obtained from northeast China through high-throughput sequencing (HTS) in a previous study (Wei et al. 2021). Both isolates were inoculated to soybean cv. Dongnong 51 in a mechanical transmission assay. The plants inoculated with SMV GZL developed leaf crinkling, while SMV NB induced severe leaf curling and crinkling (Fig. 1a). Reverse transcription-quantitative PCR (RT-qPCR) (Fig. 1b) and western blotting analyses (Fig. 1c) confirmed the presence of SMV RNA and coat protein (CP), respectively, with greater accumulation of both in SMV NB-infected leaves than in those inoculated with SMV GZL.
In studies of plants at different time points after inoculation, both SMV NB and SMV GZL induced mild leaf mosaic at 5 days post-inoculation (dpi), but the symptoms subsequently diverged. The isolate SMV NB caused leaf crinkling at 10 dpi, while SMV GZL did not cause leaf crinkling until 15 dpi (Fig. 2a). Western blotting assays further confirmed that accumulation levels of SMV CP were greater in SMV NB-infected leaves than in those inoculated with SMV GZL (Fig. 2b).

Phylogenetic analysis of SMV NB and SMV GZL isolates
To explore the basis of the different symptoms caused by these two SMV isolates, we first obtained the near-complete genome sequence of isolate SMV NB via transcriptome sequencing (GenBank accession no. OK625818). The phylogenetic relationship between the two isolates was then examined by constructing a Maximum-Likelihood (ML) tree based on their genome sequences and those of other representative SMV genomes retrieved from GenBank (G1-G7, United States; SC3, SC7, SC15, SMV_AH_SZ, SMV_N1, SMV_N3, SMV_NE-N1 and SMV_4469-4, China). The 17 SMV isolates segregated into two major clades, with clade I containing three subgroups (Fig. 3). Isolate SMV NB clustered with the Chinese SC7 strain in clade Ib, while isolate SMV GZL grouped with the SC3 strain within clade Ic. The isolates G1-G7 from the United States clustered either in clade Ia or in clade II. This result suggested that the isolates SMV NB and SMV GZL are members of different SMV strains.

The P1 proteins of SMV NB and SMV GZL are very different
The ORF of isolate SMV NB is 9606 nt in length (encoding 3203 amino acids), while that of isolate SMV GZL is shorter (9204 nt; 3068 aa). Most of the mature viral proteins encoded by these two isolates are similar to each other (95.4-100% nt identity in the coding regions), but the similarity was much lower in the P1 cistron (42.8% nt identity) (Fig. 4a). The P1 of the SMV NB isolate (P1 NB) has 444 amino acids, while that of the SMV GZL isolate (P1 GZL) has only 309 amino acids, owing to differences in the N-terminal region of the protein (Fig. 4b). The sequence difference between the two isolates is therefore mainly due to these differences in the N-terminus of the P1 protein.
To determine whether the variability of the P1 protein is common among SMV strains, we performed a phylogenetic analysis based on P1 nucleotide sequences using ML methods. The P1 sequences from different SMV strains clustered into two genotypes (Fig. 5). Type I, represented by P1 NB, encodes longer P1 proteins (1332 nt). The other genotype, similar to P1 GZL, lacks residues at the N-terminus and encodes shorter P1 proteins (924-927 nt). These results suggested that P1 is a highly divergent protein with variable sequences in SMV, and that P1 GZL and P1 NB belong to two distinct genotypes.
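The per-cistron identity values quoted above are column-wise comparisons of aligned sequences. A minimal sketch of such a calculation follows; it assumes the two sequences are already aligned and that gapped columns are excluded from the denominator, which is only one of several conventions and is not necessarily the one used for Fig. 4a. The function name and the toy fragments are hypothetical and are not SMV sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are skipped, not counted as mismatches
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments for illustration only (not real SMV sequence).
toy_a = "ATGGC-ACTT"
toy_b = "ATGGCTACGT"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Counting gapped columns as mismatches instead would lower the value for a pair such as P1 NB and P1 GZL, whose lengths differ substantially, so the convention matters when comparing numbers across studies.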
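As a quick consistency check, the P1 coding lengths given here can be converted to residue counts and compared with the protein sizes reported for the two isolates. The sketch below assumes that the quoted nucleotide lengths cover only the P1 cistron, without a stop codon, since P1 is released from the polyprotein by cleavage; the function name is hypothetical.

```python
def coding_length_to_residues(nt_length: int) -> int:
    """Convert an in-frame coding length (no stop codon) to a residue count."""
    if nt_length <= 0 or nt_length % 3 != 0:
        raise ValueError("coding length must be a positive multiple of 3")
    return nt_length // 3


# Lengths taken from the text: the long genotype and the range of the short genotype.
for label, nt in [("long P1 (P1 NB type)", 1332),
                  ("short P1, upper bound (P1 GZL type)", 927),
                  ("short P1, lower bound", 924)]:
    print(f"{label}: {nt} nt -> {coding_length_to_residues(nt)} aa")
```

Under this assumption, 1332 nt corresponds to 444 residues and 927 nt to 309 residues, in agreement with the sizes stated for P1 NB and P1 GZL.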

Subcellular localization of P1 GZL and P1 NB in N. benthamiana
To determine the subcellular localization of the different P1 proteins, GFP fusions driven by the CaMV 35S promoter were constructed and transiently expressed in N. benthamiana leaves. Free GFP, used as a control, was observed in the cytoplasm and nucleus (Fig. 6c), whereas the fusion proteins were observed in structures resembling chloroplasts, with a weaker GFP signal for P1 NB-GFP (Fig. 6a) than for P1 GZL-GFP (Fig. 6b). Co-localization analysis showed that the GFP signals of P1 GZL-GFP and P1 NB-GFP merged with the autofluorescence of chloroplasts, confirming that both proteins were indeed localized in the chloroplast (Fig. 6).

to viruses (van Loon and van Strien 1999; Nürnberger and Brunner 2002). We therefore compared the accumulation of GmNPR1 and GmNPR2 transcripts in soybean plants infected with SMV GZL or SMV NB. GmNPR1 and GmNPR2 accumulation was significantly higher in plants infected with SMV GZL, even though SMV CP levels were lower than in plants infected with SMV NB (Fig. 7a-d). Quantification of hormone contents further revealed that SA production was higher in SMV GZL-infected plants than in SMV NB-infected plants (Fig. 7e).
In light of the divergence between the SMV P1 proteins, and the fact that the N-terminal region of P1 of another potyvirus (plum pox virus, PPV) negatively regulates viral replication and alleviates the host antiviral responses (Pasin et al. 2014), we tested whether the differences between P1 GZL and P1 NB are related to the expression of SA marker genes. When P1 GZL and P1 NB were transiently expressed in N. benthamiana leaves, transcripts of the SA marker genes NbPR1, NbNPR1 and NbSABP2 accumulated to a significantly greater extent in leaves expressing P1 GZL than in those expressing P1 NB (Fig. 7f and Additional file 1: Figure S1), and similar results were obtained in soybean inoculated with the respective viruses. P1 GZL and P1 NB were then transiently expressed in transgenic N. benthamiana NahG plants (Fig. 7f and Additional file 1: Figure S1), which have a defect in SA-dependent defense response signaling (Govrin and Levine 2002). The accumulation of SA marker gene transcripts was markedly reduced in NahG plants compared with that in wild-type plants (Fig. 7f). Notably, the fold change for the differential expression of SA marker genes induced by P1 GZL and P1 NB was much weaker in NahG plants than in wild-type plants (Fig. 7f), suggesting that SA-mediated

Discussion
Previous studies have suggested that several of the SMV-encoded proteins influence symptom development by interacting with the host plant. For example, SMV P1 plays a role in symptom development and host adaptation via its interaction with the Rieske Fe/S protein of the host (Shi et al. 2007), while CI acts as a virulence determinant and is also involved in inducing severe symptoms on soybean (Zhang et al. 2009). The two Chinese SMV isolates compared here were similar in most parts of their genomes (96.7% nt identity) but differed greatly in the size and sequence of their P1 proteins, prompting us to further investigate the role of SMV P1 in symptom development and host defense response.
P1 is the most divergent of the potyvirus proteins in both sequence and length (Adams et al. 2005), and it is thought that P1 diversification has contributed to host adaptation (Valli et al. 2007). Based on their functional diversity and host factor requirements, P1 proteins of members of the family Potyviridae have been categorized into two types, A and B (Rodamilans et al. 2013, 2021). Type A proteins are highly variable at their N-terminus. They do not have RNA silencing suppression (RSS) activity, and require the assistance of a host factor for self-cleavage (e.g. PVY P1 protein, TuMV P1 protein and CVYV P1a protein) (Rodamilans et al. 2013). Type B P1 proteins have RSS activity and contain a conserved zinc-finger motif at the N-terminus (e.g. WSMV P1 protein and CVYV P1b protein) (Rodamilans et al. 2013, 2021). Although the proteins P1 GZL and P1 NB are divergent at their N-termini, they both cluster with the P1 proteins of all the other members of the genus Potyvirus in clade type A (Additional file 1: Figure S2). The highly variable N-terminus of P1 may play an important role in host adaptation (Shi et al. 2007). Although many functions have been attributed to P1, the location of the SMV P1 protein in plants is still unclear. Our transient expression assay showed that both P1 GZL and P1 NB were localized to the chloroplasts of N. benthamiana (Fig. 6), and P1 GZL-GFP gave a stronger GFP signal (Fig. 6a, b). Therefore, the N-terminus of P1 does not appear to have any significant effect on the localization of the protein but is more likely to affect protein stability or degradation.
Fig. 4 The genome organization and sequence comparisons of SMV isolates SMV NB and SMV GZL. a Diagrams showing the genome organization of isolates SMV NB and SMV GZL and the encoded proteins for each cistron. The percentage of nucleotide sequence identity for each cistron is shown between the two diagrams. b The alignment of P1 amino acid sequences from SMV NB and SMV GZL was performed using BioEdit
The C-terminal region of P1 is relatively well conserved within and even between species and is essential for self-cleavage from the viral polyprotein (Verchot et al. 1991). Recent work has shown that the N-terminal region of the PPV P1 protein acts as a negative regulator of its self-cleavage, modulating viral replication and the host defense response during virus infection (Pasin et al. 2014; Shan et al. 2018). In PPV infection, the first 164 N-terminal residues of P1 have an antagonistic effect on its self-processing, and thus alleviate the host defense responses and regulate the level of virus amplification, contributing to long-term infection with higher replicative capacity. In the later stage of PPV infection, removal of the P1 N-terminus induced the accumulation of PR proteins and reduced the viral loads (Pasin et al. 2014). PR proteins have been found to be inducible by infection with various types of pathogens in many plants, and are recognized as markers for SAR (van Loon and van Strien 1999). Most PRs and related proteins are induced by the signaling compounds SA, jasmonic acid, or ethylene, and further modulated by abscisic acid (van Loon et al. 2006). In our case, transcript levels of GmNPR1 and GmNPR2 were significantly higher in SMV GZL-infected soybean plants than in plants infected by SMV NB, and the levels of viral CP were lower in the later stages of infection. Similar patterns of PR accumulation and viral loads were induced by these P1 proteins in transient expression assays in N. benthamiana. These results may indicate that the P1 N-terminus modulates viral amplification and the host defense response.

Conclusions
In this work, we identified two SMV isolates, SMV GZL and SMV NB, which differed in symptoms and virus accumulation during their infection of host plants. These two isolates shared very high similarity in genome sequence except for the P1 region (especially at the P1 N-terminus), and P1 GZL and P1 NB belong to two typical genotypes that are widely present in SMV isolates. In addition, our results suggested that these two typical P1 proteins influenced the pathogenesis of SMV isolates by differentially regulating SA signaling.

Mechanical transmission assay
Inoculation of soybean plants was performed as previously described (Wei et al. 2021). Sap from symptomatic plants was used to inoculate a susceptible soybean variety (Dongnong 51). Each inoculum was prepared from 1 g of infected leaf tissue, which was homogenized in 10 mL of 0.1 M sodium phosphate buffer, pH 7.0, using a mortar and pestle. Inoculation was performed before the trifoliate leaves emerged. Unifoliate soybean leaves were dusted with carborundum before inoculation, then rubbed softly with a cotton puff to distribute the inoculum, and finally rinsed with tap water. Plants inoculated with phosphate-buffered saline were used as controls. Inoculated plants were grown at 25-28 °C (16-h light/8-h dark). Three independent experiments were conducted to provide data for statistical analysis. Values are means ± standard deviation (SD).
Table S1.
Fig. 5 Phylogenetic relationship among the P1 proteins of soybean mosaic virus (SMV) isolates from different countries. The phylogenetic tree was constructed using MEGAX. The P1 nucleotide sequences were obtained from NCBI (https://www.ncbi.nlm.nih.gov/). The colored bars (orange and green) indicate the different clades observed in the phylogenies. The sequence length of the P1 protein is shown in the right panel

Phylogenetic analysis
The nearly complete genome sequences or amino acid sequences of virus isolates were aligned using Clustal X. Phylogenetic trees were constructed using the Maximum-Likelihood (ML) method with the best-fitting model, GTR + G + I (general time reversible + gamma distributed with invariant sites). Numbers on the nodes indicate the percentage of bootstrap support from 1000 replicates. For the construction of the SMV phylogenetic tree, watermelon mosaic virus (WMV; genus Potyvirus, family Potyviridae) was used as the outgroup.

Agrobacterium tumefaciens-mediated transient expression in N. benthamiana
The recombinant binary constructs of P1 were introduced into A. tumefaciens strain GV3101 by electroporation. Transformed bacteria were grown overnight at 28 °C in Luria-Bertani (LB) medium supplemented with the appropriate antibiotic mixture. The cultures were collected and resuspended in agroinfiltration buffer (10 mM MgCl2, 10 mM morpholineethanesulfonic acid, pH 5.6, and 150 μM acetosyringone). The suspensions were adjusted to an optical density of 0.5 at 600 nm (OD600) before leaf infiltration. The cell suspensions were infiltrated into leaves of 4- to 6-week-old N. benthamiana plants using a 2.5 mL sterile syringe. The plants were kept in a growth chamber for 36-44 h. Fluorescence signals for subcellular localization were observed using a confocal laser scanning microscope (Nikon Microsystems) (Sun et al. 2013).
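Adjusting a resuspended culture to the target OD600 of 0.5 is a simple dilution calculation. The sketch below assumes that optical density scales linearly with cell density, which is only approximately true at higher readings; the measured OD of 2.0 and the 10 mL final volume are hypothetical values chosen for illustration, and the function name is not from the protocol.

```python
def dilution_volumes(od_measured: float, od_target: float,
                     final_volume_ml: float) -> tuple[float, float]:
    """Return (suspension mL, buffer mL) needed to reach od_target in final_volume_ml."""
    if od_measured <= 0 or od_target <= 0 or final_volume_ml <= 0:
        raise ValueError("all inputs must be positive")
    if od_target > od_measured:
        raise ValueError("cannot reach the target by dilution; concentrate the cells first")
    # C1 * V1 = C2 * V2, assuming OD is proportional to cell density.
    suspension_ml = final_volume_ml * od_target / od_measured
    return suspension_ml, final_volume_ml - suspension_ml


# Hypothetical example: a suspension read at OD600 = 2.0, diluted to 0.5 in 10 mL.
suspension_ml, buffer_ml = dilution_volumes(od_measured=2.0, od_target=0.5,
                                            final_volume_ml=10.0)
print(f"{suspension_ml:.2f} mL suspension + {buffer_ml:.2f} mL agroinfiltration buffer")
```

In practice the diluted suspension would be read again and corrected, because dense suspensions often fall outside the linear range of the spectrophotometer.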

SA measurement
Leaves of mock-treated and SMV-infected soybean plants were collected at 15 dpi, ground in liquid nitrogen and used for hormone extraction and analysis as previously described (He et al. 2017). Three biological replicates were conducted to provide data for statistical analysis.
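The replicate summaries reported in this work (means ± SD from three replicates) can be sketched as follows. The example uses the sample standard deviation (n - 1 denominator) from Python's standard library; whether the sample or the population form was used for the published values is not stated, so that choice is an assumption, and the three input values are placeholders rather than measured SA contents.

```python
from statistics import mean, stdev


def summarize_replicates(values: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) for a set of replicate measurements."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed for a standard deviation")
    return mean(values), stdev(values)  # stdev uses the n - 1 denominator


# Placeholder replicate values for illustration only (not measured data).
toy_replicates = [1.0, 2.0, 3.0]
avg, sd = summarize_replicates(toy_replicates)
print(f"{avg:.2f} ± {sd:.2f}")
```

With only three replicates the standard deviation is itself a rough estimate, which is worth keeping in mind when comparing treatments.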