Comparative genomics of Xanthomonas fragariae and Xanthomonas arboricola pv. fragariae reveals intra- and interspecies variations

The quarantine bacterium Xanthomonas fragariae causes angular leaf spots on strawberry. Its population structure was recently found to be divided into four (sub)groups resulting from two distinct main groups. Xanthomonas arboricola pv. fragariae causes bacterial leaf blight, but the bacterium has an unclear virulence status on strawberry. In this study, we use comparative genomics to provide an overview of the genomic variations of a set of 58 X. fragariae and five X. arboricola pv. fragariae genomes with a focus on virulence-related proteins. Structural differences within X. fragariae such as differential plasmid presence and large-scale genomic rearrangements were observed. On the other hand, the virulence-related protein repertoire was found to vary greatly at the interspecies level. In three out of five sequenced X. arboricola pv. fragariae strains, the major part of the Hrp type III secretion system was lacking. An inoculation test with strains from all four X. fragariae (sub)groups and X. arboricola pv. fragariae resulted in an interspecies difference in symptom induction since no symptoms were observed on the plants inoculated with X. arboricola pv. fragariae. Our analysis suggests that all X. fragariae (sub)groups are pathogenic on strawberry plants. On the other hand, the first genomic investigations of X. arboricola pv. fragariae revealed a potential lack of certain key virulence-related factors which may be related to the difficulties to reproduce symptoms on strawberry and could question the plant-host interaction of the pathovar.


Background
Strawberry is a small fruit crop of great economic importance worldwide (Amil-Ruiz et al. 2011). Worldwide strawberry production increased from 2.4 Mt in 1990 to 9.1 Mt in 2016, representing an increase in value from 5.1 to over 10.9 billion US$ yearly (FAOSTAT 2020). Strawberry has become part of the major fruit industry in several countries (Kim et al. 2016). Strawberry cultivars exhibit diverse susceptibilities to a large variety of harmful organisms that reduce fruit quality and plant yield (Simpson 1991; Maas 1998). The resulting diseases cause economic losses in strawberry fields and require the development of corresponding control measures (Amil-Ruiz et al. 2011). One of the main bacterial diseases affecting strawberry is caused by Xanthomonas fragariae, to which all strawberry cultivars commercialized before 2003 were found to be susceptible (Hartung et al. 2003). Numerous strawberry genotypes, including wild species and advanced clones from breeding programs, were assessed for resistance to X. fragariae, and four resistant genotypes were detected (Maas et al. 2000; Hartung et al. 2003; Roach et al. 2016).
X. fragariae is considered a quarantine organism by the European and Mediterranean Plant Protection Organization (OEPP/EPPO 1986), and the symptoms it causes are defined as angular leaf spots (ALS) affecting strawberry plant leaves. The bacterium was first described in 1960 from the USA (Kennedy and King 1960), and was subsequently found in most major strawberry producing regions worldwide (Zimmermann et al. 2004; OEPP/EPPO 2006). The disease begins with X. fragariae invading the plant through natural openings, such as stomata and hydathodes, or through wounds (Bestfleisch et al. 2015). The first symptoms occur as water-soaked lesions that appear angular in shape. Then, the lesions spread over the foliage and form larger necrotic spots before the plants suffer from vascular collapse (Hildebrand et al. 1967). Artificial infection with X. fragariae resulted in different disease incidences among strawberry cultivars, as indicated by the variable severity of symptoms on plant leaves (Bestfleisch et al. 2015).
In the last decade, genomic information of X. fragariae was made publicly available (Vandroemme et al. 2013a; Henry and Leveau 2016; Gétaz et al. 2017b). Four subgroups (Xf-CGr-IA, Xf-CGr-IB, Xf-CGr-IC and Xf-CGr-II) included in two major groups of strains (Xf-CGr-I and Xf-CGr-II) were defined using two types of molecular markers: variable numbers of tandem repeats (VNTRs) and clustered regularly interspaced short palindromic repeats (CRISPRs) (Gétaz et al. 2018b). The genome of X. fragariae is smaller than other Xanthomonas genomes. Absent genes/regions are potentially involved in xylan degradation and metabolism, the β-ketoadipate phenolics catabolism pathway, one of two type II secretion systems (T2SS) and the glyoxylate shunt pathway. The absence of these genes could possibly impact the plant-host interaction (Vandroemme et al. 2013a). However, a type III secretion system (T3SS), known to be essential for bacterial pathogenicity (Galán and Collmer 1999; Ghosh 2004), a distinct type III secretion system effector (T3E) repertoire, a type IV secretion system (T4SS) and a type VI secretion system (T6SS), all of which could play a role in the specific and mostly endophytic association of X. fragariae with its plant host, were observed in the first draft genome (Vandroemme et al. 2013a).
X. fragariae was long considered the only bacterial pathogen causing disease on strawberry (Kennedy and King 1962). However, in 1993, another causal agent, Xanthomonas arboricola pv. fragariae (Janse et al. 2001), was observed on strawberry plants in northern Italy (Scortichini 1996), and later in Turkey (Ustun et al. 2007). In contrast to the water-soaked regions on the leaves resulting from X. fragariae, the symptoms caused by X. arboricola pv. fragariae were reddish-brown lesions on the leaf surface that enlarge and become surrounded by a chlorotic halo (Ferrante and Scortichini 2018). X. arboricola pv. fragariae incites the leaf symptoms mainly in open-field cultivations and during mid-autumn weather conditions characterized by a very high relative air humidity (Scortichini and Rossi 2003). The pathogenicity of X. arboricola pv. fragariae upon artificial inoculation on strawberry plants was not always reproducible in glasshouse or in laboratory experiments, and virulence among strains was variable (Scortichini and Rossi 2003; Vandroemme et al. 2013b; Merda et al. 2016). However, two studies obtained symptoms such as extensive vascular discoloration and wilting leaves after vein inoculations (Janse et al. 2001) as well as extensive necrotic lesions on the major leaf vein (Ferrante and Scortichini 2018). An analysis of genetic variability among Italian strains of X. arboricola pv. fragariae using repetitive PCR genomic fingerprinting revealed a high overall similarity of the patterns but with distinct genomic profiles (Scortichini and Rossi 2003). A multilocus sequence analysis (MLSA) showed that X. arboricola pv. fragariae strains do not form a monophyletic group, but are spread within the X. arboricola clade (Vandroemme et al. 2013b).
In this study, we applied a comparative genomics workflow to assess the genomic variations within and between X. fragariae and X. arboricola pv. fragariae. For X. fragariae, differential plasmid numbers and/or content and genomic rearrangements, but a conserved virulence-related gene repertoire, were revealed among strains, and all the X. fragariae strains tested were pathogenic on strawberry. For X. arboricola pv. fragariae, two groups of strains were distinguished by their virulence-related gene repertoire, but none of the inoculated strains could incite symptoms on the tested strawberry cultivar.

Genome sequences
As the genome sequences of X. fragariae and X. arboricola pv. fragariae strains included in this study were obtained with different sequencing technologies with different read lengths, the genomes vary in their total numbers of contigs and genome size (Table 1). Based on the genome data for X. fragariae, an average of 3960 genes was found through PacBio or Illumina MiSeq sequencing, whereas an average of 3510 genes was obtained when Illumina HiSeq sequencing was applied. The differential gene content of on average 450 CDS, which were lacking in the Illumina HiSeq assemblies, was examined by comparing the complete PacBio assemblies of dually sequenced strains (PD 885 T , NBC 2815, PD 5205) with their respective Illumina HiSeq assemblies. The CDS lacking in the gaps were identified as hypothetical proteins (51%), transposases (37%), integrase proteins (3%), and phage-related proteins (3%). Some other annotated genes (6%) were expected to be located in observed gaps as they were surrounded on both sides by transposases, leading to assembly ambiguities. The combination of highly repetitive regions and too short reads from HiSeq sequencing (120 bp) left more gaps in these genomes, but the lower number of CDS did not influence the current comparative genomics analysis.

Comparative genomics analysis based on whole-genome sequencing data
We have already reported the average nucleotide identities (ANIb) of the X. fragariae strains, and confirmed that they all belonged to the same species with identity values ranging between 99.48 and 99.97%, indicating close clonality of the isolates (Gétaz et al. 2018b). Here, we further evaluated the intraspecies and interspecies relatedness within and between these X. fragariae isolates and X. arboricola pv. fragariae isolates. For the analysis of the annotated genes, an amino acid identity (AAI) comparison between genome sequences of X. fragariae and X. arboricola pv. fragariae was computed with EDGAR (Additional file 1: Table S1). An overall AAI between 91.79 and 91.98% was found for interspecies relatedness. Intraspecies AAI within X. fragariae could discriminate the Xf-CGr-I and Xf-CGr-II groups, with an average of 99.78% between groups and 99.9-100% within groups. The AAI comparison within X. arboricola pv. fragariae isolates suggested that there were two groups, with AAI of 98.1-98.85% between groups and greater than 99% within groups. The X. arboricola pv. fragariae strains CFBP 6773, LMG 19144 and LMG 19145 PT grouped together, whereas CFBP 6762 and LMG 19146 grouped separately. This corresponds well with the grouping of X. arboricola pv. fragariae strains in a partial gyrB sequence-based phylogeny, thus supporting the conclusion that X. arboricola is a polyphyletic group (Vandroemme et al. 2013b).
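As an illustration of what an AAI value summarizes, the following minimal Python sketch averages the percent identity over orthologous protein pairs shared by two genomes. It is not the EDGAR implementation; the function name, the ortholog labels and the identity values are hypothetical and serve only as an example.

```python
from statistics import mean


def average_amino_acid_identity(ortholog_identities):
    """Return the mean percent identity over orthologous protein pairs.

    ortholog_identities maps an ortholog label to the percent amino acid
    identity of that protein pair between two genomes (assumed input format).
    """
    if not ortholog_identities:
        raise ValueError("at least one orthologous pair is required")
    return mean(ortholog_identities.values())


# Hypothetical values for three orthologous pairs, not data from this study.
identities = {"ortholog_A": 99.5, "ortholog_B": 100.0, "ortholog_C": 98.5}
aai = average_amino_acid_identity(identities)
print(f"AAI = {aai:.2f}%")
```

In practice, the set of orthologous pairs and the alignment method determine the result, so values from different tools are not necessarily comparable.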
The alignment with MAUVE of the complete genomes of strains PD 5205, PD 885 T and NBC 2815 (Gétaz et al. 2017b), belonging to three different X. fragariae (sub)groups (Gétaz et al. 2018b), showed an overall conservation of the genomes and illustrated that rearrangements of long syntenic regions between genomes have occurred (Additional file 2: Fig. S1). Two further complete genomes of strains FaP21 and FaP29 (Henry and Leveau 2016) were identical in structure to PD 5205, which belongs to the same (sub)group (i.e. Xf-CGr-IC; Additional file 2: Fig. S2). No large-scale genomic alterations such as insertions, deletions and duplications were observed between the aligned complete genomes, apart from the variable number of plasmids. Possibly, eight large-scale rearrangements have occurred between strains from Xf-CGr-II and Xf-CGr-IA, whereas only two rearrangements were necessary to explain the differences between strains from Xf-CGr-IA and -IB. The synteny of genes flanking a recombination site indicated that these rearrangements were linked to the presence and/or activity of transposable elements.
EDGAR analysis comparing the CDS contents among X. fragariae (sub)groups and X. arboricola pv. fragariae genomes revealed a homogeneous distribution of CDS between X. fragariae (sub)groups, while only 72% of the CDS from X. arboricola pv. fragariae were shared with X. fragariae (Fig. 1). The CDS that were found to be exclusively present in the X. arboricola pv. fragariae core genome (n = 1059, hypothetical = 270) represented 30% of its whole core genome and may reflect genes specific to the X. arboricola species level or even to the pathovar level. Among the CDS harbored uniquely in X. arboricola pv. fragariae strains, two clusters are potentially involved in degradation of lignin compounds. The presence of xylan degradation loci and a β-ketoadipate pathway may indicate that X. arboricola pv. fragariae is able to degrade xylan and metabolize the phenolic monomeric components of lignin, two important elements of the secondary plant cell wall (Déjean et al. 2013). Here, 39 CDS suggested to function as plant cell wall-degrading enzymes (CWDE) were found uniquely in X. arboricola pv. fragariae. On the other hand, a smaller CWDE set was present in X. fragariae compared to other Xanthomonas (Vandroemme et al. 2013a).
In X. arboricola pv. fragariae, a distinct cluster for nitrogen assimilation from nitrate, possibly affecting cell metabolism and cell growth (Snoeijers et al. 2000), was found. Additionally, the gum-associated genes gumO and gumP were also found, but these genes were considered as unessential for xanthan biosynthesis and virulence (Lu et al. 2008). Furthermore, although one T2SS was shared with X. fragariae, a second complete T2SS cluster encoding both pilus (XpsG to XpsK) and membrane system (XpsC to XpsF, XpsL and XpsM) proteins was found in X. arboricola pv. fragariae (Vandroemme et al. 2013a).
Common CDS to all X. fragariae (sub)groups (Fig. 1) represented a high percentage of core genomes in each (sub)group: 81.7% for Xf-CGr-IA, 82.6% for Xf-CGr-IB, 90% of Xf-CGr-IC and 87.7% of Xf-CGr-II. Of the remaining singletons, an overall proportion of 55-63% singleton CDS within each (sub)group was found to be hypothetical proteins. The number of CDS being common to all X. fragariae (sub)groups but absent from the X. arboricola pv. fragariae genomes was 590 CDS. Among them, a full T6SS consisting of 13 genes from both membrane complex (tssJ, tssK, tssL and tssM) and phage-related complex (tssA, tssB, tssC, hcp, tssE, tssF, tssG, clpV and vgrG) required for an operational system (Zoued et al. 2014) was present in all X. fragariae (sub)groups.
Overall, the X. arboricola pv. fragariae genomes did not contain phage-related genes. However, a cluster of phage genes integrated in the chromosome was found in the core genome of X. fragariae, and different clusters or remnants thereof with proteins annotated as phage-related were found in the singletons of X. fragariae.
In order to assess intra-(sub)group CDS variations, the generated pan genome was compared to all 58 X. fragariae strains. It confirmed an overall conservation of the genome content in X. fragariae (Fig. 2). However, in addition to CDS variations between (sub)groups, variations within the (sub)groups were observed, suggesting that genetic variability also exists at a smaller scale. Overall, 13 out of 30 variable sites were found to be clusters of hypothetical proteins. Due to annotation limitations, no conclusion could be drawn about these variations.
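The notions of pan genome, core genome and singletons used in this section can be illustrated with plain set operations. The Python sketch below is a simplified assumption of how such sets relate, not the EDGAR procedure; the genome names and cluster identifiers are hypothetical.

```python
def summarize_gene_sets(genomes):
    """Compute pan genome, core genome and singletons.

    genomes maps a genome (or group) name to the set of ortholog-cluster
    identifiers it contains (assumed input format).
    """
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)          # present in at least one genome
    core = set.intersection(*gene_sets)    # present in every genome
    singletons = {}
    for name, genes in genomes.items():
        others = set().union(*(g for n, g in genomes.items() if n != name))
        singletons[name] = genes - others  # present in this genome only
    return pan, core, singletons


# Hypothetical toy input: three genomes described by cluster identifiers.
toy = {
    "genome_1": {"c1", "c2", "c3"},
    "genome_2": {"c1", "c2", "c4"},
    "genome_3": {"c1", "c5"},
}
pan, core, singletons = summarize_gene_sets(toy)
print(len(pan), len(core), {k: sorted(v) for k, v in singletons.items()})
```

The result depends entirely on how orthologs are clustered beforehand, which is the step where identity and coverage thresholds matter.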

In silico screening of virulence-related genes
Of a collection of 163 virulence-related proteins, only 118 orthologous proteins were present in the strains included in this study (Fig. 3). The group of absent proteins included mostly T3E (Xop), and this repertoire was already reported to be smaller in X. fragariae compared to other Xanthomonas (Vandroemme et al. 2013a). A total of 27 T3SS proteins, 31 flagellar T3SS, 27 T3E, 7 T4SS, 17 T6SS, 2 LPS and 7 EPS synthesis proteins were conserved in all X. fragariae (sub)groups (Fig. 3). The absence of XopE4 from twelve genomes of X. fragariae strains belonging to different (sub)groups can be explained by the presence of a gap at the corresponding position in the draft genome assemblies resulting from both Illumina MiSeq and HiSeq technologies. A cluster of T4SS proteins (VirB2, VirB3, VirB4, VirB6, VirB9 and VirB11) was found to be located on the chromosome and surrounded by transposases, suggesting that it is the remnant of an integrated plasmid. An additional tBLASTn search of these proteins in the five full-genome assemblies (Henry and Leveau 2016; Gétaz et al. 2017b) revealed another set of T4SS proteins with identity below 40% on plasmid regions. Paralogs of virB11 and virB6 were found in all five complete genome sequences, whereas paralogs of virB4, virB8 and virD4 were identified in all of them except strain NBC 2815.
At the amino acid level, a limited number of variations were observed that may discriminate X. fragariae strains to the level of their (sub)group. Most of the virulence-related proteins with non-synonymous SNPs between (sub)groups had only 1 to 4 amino acid changes, which probably represent drift variations (between 0.18 and 1% of total amino acid positions per protein, dependent on the size of the protein). These values correspond to the relative variation obtained by the intraspecies X. fragariae AAI comparison, approximating variations up to 0.25% (Additional file 1: Table S2).

Fig. 1 Five-way genome comparison. Five-set Venn diagram constructed using EDGAR (Blom et al. 2016) and visualizing the common gene pools among the core genomes of 1) Xanthomonas arboricola pv. fragariae, 2) Xanthomonas fragariae CRISPR group IA (Xf-CGr-IA), 3) Xf-CGr-IB, 4) Xf-CGr-IC and 5) Xf-CGr-II. The numbers indicated in the diagram correspond to the amounts of CDS. The table summarizes the pan genome, core genome and singleton information in each X. fragariae (sub)group and in X. arboricola pv. fragariae.

Fig. 2 Circular representation of non-linear pan genome from 58 Xanthomonas fragariae genomes used in this study. The complete genome of strain PD 885 T was used as reference. A total of 10,189 CDS were included in the pan genome and compared to all 58 strains. Each circle corresponds to a single X. fragariae strain. The colors used for the circles correspond to the X. fragariae CRISPR (sub)groups (Gétaz et al. 2018b); Xf-CGr-II: blue, Xf-CGr-IA: red, Xf-CGr-IB: orange, and Xf-CGr-IC: green. Strong colors correspond to a 100% identity, lighter colors correspond to 90% identity, and grey regions correspond to 70% identity. Numbers from 1 to 30 correspond to variable sites between assemblies. The sites 1 and 21 in the red squares correspond to the plasmid regions pXf29 and pXf21, respectively. The site 6 corresponds to CRISPR associated proteins Cas3, Csy1 and Csy2. Site 25 is composed of a phage-related cluster of proteins. Site 30 comprises a cluster of 10 VirB proteins, TrbM and hypothetical proteins and is present only in four strains, ICMP 20572 to ICMP 20575. Finally, sites 2, 7, 8, 10, 14, 15, 17, 19, 20, 22, 26, 28 and 29 include mainly hypothetical genes.

The higher number of non-synonymous SNPs in these proteins may be indicative of a positive selection in these virulence-related genes. Transcription activator-like (TAL) effectors (Boch and Bonas 2010) were not found in the X. fragariae genomes with the exception of the non-TAL effector-like AvrBs2 (Kearney and Staskawicz 1990). Compared to X. fragariae, X. arboricola pv. fragariae harbored a smaller virulence-related protein repertoire (Fig. 3). The genome sequences identified the same two groups of X. arboricola pv. fragariae as resulting from the AAI analysis (Additional file 1: Table S2) based on the differences in the virulence-related genes repertoire size. Strains CFBP 6762 and LMG 19146 encoded 25 and 26 T3SS proteins respectively, whereas only two of these, HrpG and HrpX, were found in the proteome of strains LMG 19144, LMG 19145 PT and CFBP 6773.
As a functional Hrp system is thus absent in strains LMG 19145 PT , CFBP 6773 and LMG 19144, the absence of T3E in these strains is to be expected. On the other hand, the strains CFBP 6762 and LMG 19146, which encode a functional Hrp system, have only four effectors: AvrBs2, XopF1, XopF2 and XopR (Fig. 3). We can hypothesize that the small T3SS and T3E repertoire in both groups of X. arboricola pv. fragariae may not elicit a hypersensitive response (HR) in strawberry plants.
The screened proteins, which were present in both species, harbored interspecies amino acid variations between 1.15 and 39% (Additional file 1: Table S3). The variability of flagellar T3SS, LPS and EPS synthesis proteins ranged between 1.15 and 11%, suggesting purifying selection for the proteins most conserved between species and genetic drift for those with variability similar to the interspecies AAI values, and thus a conservation of functional elements in both species. On the other hand, most of the T3SS cluster showed higher interspecies variability, and this T3SS cluster, which was reported as conserved in X. arboricola strains (Cesbron et al. 2015), could have been independently acquired by horizontal transfer in both species. The four orthologs of T3E protein sequences in both species also greatly varied (between 13.20 and 32.70%), thus suggesting an independent acquisition of effectors. Most of the elements from the T3SS and T3E could therefore result from horizontal transfer as already reported for Gram-negative bacterial pathogens (Brown and Finlay 2011; Puhar and Sansonetti 2014). Similarly, most of the T4SS proteins varied between 19.2 and 39% between species, suggesting similar acquisition patterns.
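The percentages of amino acid variation reported here express the number of differing positions relative to the compared positions of a protein. A minimal Python sketch of that calculation is given below, assuming two pre-aligned sequences of equal length with "-" as the gap character; the function name, the gap handling and the example sequences are illustrative assumptions and not the procedure used in this study.

```python
def percent_variation(aligned_a, aligned_b):
    """Percent of compared alignment columns at which two sequences differ.

    Columns where both sequences carry a gap are skipped; a gap against a
    residue counts as a difference (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differing / compared


# Hypothetical 10-residue fragments differing at one position.
variation = percent_variation("MKTAYIAKQR", "MKTAYLAKQR")
print(f"{variation:.1f}% variation")
```

By the same arithmetic, a fixed number of changes corresponds to a lower percentage in a longer protein, which is why the reported ranges depend on protein size.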

Plasmid diversity
The five complete X. fragariae genomes (Henry and Leveau 2016; Gétaz et al. 2017b) contain either one or two plasmids that we can assign to two distinct plasmid families (Fig. 4). The plasmids from strains FaP21 and FaP29 were identical to those from PD 5205 belonging to the same (sub)group (Additional file 2: Fig. S3).
The plasmid pXf21 family is represented by two different variants. The plasmid from strain PD 885 T contains a 14-kb insert that encodes two T4SS proteins (VirB5 and VirB6), but this is not a complete T4SS. Additionally, the insert includes a relaxase from the MobC superfamily and more specifically in the MobC1 (Gammaproteobacteria) or MoBC_CloDF13 family (Garcillán-Barcia et al. 2009), as well as a third RelE/ParE toxin/antitoxin element. The 14-kb region may have been inserted into the plasmid due to the concerted action of two transposable elements (PD885_RS20135 and PD885_RS20140) located at the border of the region. On the other hand, plasmids pNBC2815-21 and pPD5205-21 have a common 8-kb region instead. They include a RelE/Stb stabilization toxin family and a RelB/DinJ toxin/antitoxin element, in addition to two similar RelE/ParE toxin/antitoxin elements found as well in pPD885-27. Based on an incomplete T4SS in pPD885-27 and the lack of a conjugation system in the other plasmids from the pXf21 family, this plasmid family may be non-transmissible. However, plasmid pPD885-27 is the only plasmid of the family harboring a relaxase and could therefore be mobilizable in the presence of an oriT (Smillie et al. 2010). The 14-kb insert was found to be common and unique to all strains from Xf-CGr-IA (Fig. 2), whereas the 8-kb region was found in the other groups.
The plasmid family pXf29 has a more conserved structure. Only one additional transposase (PD5205_RS19765) was found in plasmid pPD5205-30 from X. fragariae PD 5205. This protein is an IS3 family transposase found several times with high sequence identity in X. fragariae chromosomes. All plasmids from the pXf29 family contain a relaxase from the same MobC1 family (Garcillán-Barcia et al. 2009) as the one found in pPD885-27 and two toxin/antitoxin elements RelE/ParE and RelB/DinJ found as well in the pXf21 plasmids harboring the 8-kb region. Additionally, they all contain a complete T4SS. The pXf29 plasmid family could therefore be effectively mobilizable and conjugative (Garcillán-Barcia et al. 2009; Garcillán-Barcia et al. 2011).

Fig. 3 Variable virulence-related genes within and between X. fragariae and X. arboricola pv. fragariae genomes. Comparison between genomes and protein sequences of virulence-related genes from type III secretion system (T3SS), flagellar-related T3SS, type III secretion effectors (T3E), type IV secretion system (T4SS), type VI secretion system (T6SS), lipopolysaccharide (LPS) and extracellular polysaccharides (EPS) synthesis proteins. Variations were observed between strains and reported with color-coded labels; red: absence, dark green: presence with 100% identity to the reference, light green: truncated sequence due to end of contig, yellow: sequence variation between 1 and 14 amino acids, orange: variation above 14 amino acids.
The origin of replication of both plasmids was used to perform an in silico screening for the presence of the plasmid families within all genomes. Overall, a variation of plasmid number among strains screened in this study was observed as strains harbored either none, one or two plasmids (Table 1). Strains from subgroups Xf-CGr-IA and -IC had both plasmid families, whereas Xf-CGr-IB and -II strains usually had only the plasmid pXf21 family (Additional file 1: Table S4).

Growth behavior of X. fragariae strains
To investigate whether the differences described above in the genome content and arrangement of the different X. fragariae (sub)groups influence growth behavior, the strains were grown under the same culture conditions in liquid Wilbrinks medium, in which sucrose is the carbon source. Average generation times for X. fragariae strains ranged between 2.02 and 5.96 h (Table 1 and Fig. 5). No significant differences in generation time were observed between the X. fragariae (sub)groups (p-value > 0.05). On the other hand, X. arboricola pv. fragariae strains grew significantly faster than X. fragariae strains, with generation times between 1.5 and 1.95 h (p-value < 0.05) in the same growth medium.
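For readers unfamiliar with generation times, the standard relation g = t · ln 2 / ln(N_t / N_0) can be sketched in Python as follows. This assumes two optical density readings taken during exponential growth as a proxy for cell numbers; it is a generic illustration and not necessarily the procedure applied in this study, and the readings below are hypothetical.

```python
import math


def generation_time(od_start, od_end, elapsed_hours):
    """Generation time (h) from two readings taken in exponential phase.

    Uses g = t * ln(2) / ln(N_t / N_0), with optical density standing in
    for cell number (an assumption of this sketch).
    """
    if od_start <= 0 or od_end <= od_start or elapsed_hours <= 0:
        raise ValueError("need positive, increasing readings and elapsed time")
    doublings = math.log(od_end / od_start) / math.log(2)
    return elapsed_hours / doublings


# Hypothetical readings: a fourfold increase in 6 h equals two doublings.
g = generation_time(0.1, 0.4, 6.0)
print(f"generation time = {g:.2f} h")
```

The estimate is only meaningful when both readings fall within the exponential growth phase.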

Bacterial virulence to strawberry plants
To test whether there is a link between bacterial genotype and its phenotype of virulence on strawberry plants, plants were inoculated with a representative set of X. fragariae strains, covering all (sub)groups. X. arboricola pv. fragariae was also included in these assays. All strawberry plants inoculated with X. fragariae strains showed symptoms after 8 to 14 days post inoculation (dpi), dependent on the strain used (Fig. 6), indicating that the strains representing each of the X. fragariae (sub)groups were all pathogenic on strawberry. Symptoms caused by X. fragariae were not uniform at the plant level, as only a limited number of leaves with variable symptom intensities were observed. Approximately one third to half of the leaves per plant showed symptoms. Starting from the first appearance of the symptoms, plants were frequently evaluated for their symptom evolution. At each evaluation day, the symptom intensity corresponding to the leaf with the most advanced symptom reaction per plant was recorded (Fig. 6 and Fig. 7). Leaves that were not yet present at the time of the inoculation did not show any symptoms. A slight variation of intensity was observed between (sub)groups, but this could not be statistically evaluated. On the other hand, in this study it was not possible to provoke symptoms by X. arboricola pv. fragariae strains on the Elsanta strawberry plants inoculated during the 30 days of incubation. The group of inoculated strains included strains with and without intact Hrp system and both types of strains resulted in absence of symptoms.

Fig. 4 The two plasmid families present in three complete Xanthomonas fragariae genome sequences. a The plasmid family pXf21 was variable between strains and an 8-kb region found in both pNBC2815-21 and pPD5205-21 is substituted by a 14-kb region in the plasmid pPD885-27, which includes two conjugal transfer proteins (VirB) and a mobilization protein, not present in the two other plasmids from pXf21 family. b The plasmid family pXf29 showed a greater conservation between the strains and only an additional transposase (PD5205_RS19760) was found in plasmid pPD5205-30. The genes are color-coded; yellow: replication proteins, orange: toxin-antitoxin related genes, green: recombinase, resolvase and transposase, red: type IV secretion systems (VirB cluster), black: relaxase protein MobC, pink: chromosome partitioning, white: hypothetical proteins, and grey: other genes. The grey shadings between the plasmids correspond to > 99% nucleotide identity.
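The scoring rule of the inoculation assay, in which each plant receives at every evaluation day the category of its most advanced leaf on the 0 to 4 symptom scale, can be sketched in Python as below. The data layout, the function name and the example scores are hypothetical and only illustrate the rule.

```python
def plant_scores(leaf_scores):
    """Return, per (plant, dpi) key, the most advanced leaf category.

    leaf_scores maps (plant, days post inoculation) to the list of symptom
    categories (0 to 4) recorded for the inoculated leaves of that plant.
    """
    result = {}
    for key, scores in leaf_scores.items():
        if not scores:
            raise ValueError(f"no leaf scores recorded for {key}")
        if any(s not in range(5) for s in scores):
            raise ValueError(f"categories must be integers 0 to 4 for {key}")
        result[key] = max(scores)
    return result


# Hypothetical observations for one plant at two evaluation days.
observations = {
    ("plant_1", 8): [0, 1, 0],
    ("plant_1", 15): [0, 2, 1],
}
scores = plant_scores(observations)
print(scores)
```

Because only the maximum is kept, this summary does not capture how many leaves of a plant show symptoms.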

Discussion
Our study provides a thorough comparative genomics analysis of the sequence data from 58 X. fragariae genomes obtained in previous studies (Vandroemme et al. 2013a; Henry and Leveau 2016; Gétaz et al. 2017b; Gétaz et al. 2018b). Based on the whole-genome analysis, we could confirm that all the tested X. fragariae were closely related and therefore belonged to the same species. The overall structure of the genomes was highly similar in all X. fragariae strains but some slight differences in genome organization were observed. The main difference resides in the variable number of plasmids in X. fragariae, a feature mainly revealed by long-read sequencing. The plasmid diversity analysis indeed revealed a differential plasmid presence in X. fragariae strains that reflects the X. fragariae population structure previously reported using the same bacterial strains (Gétaz et al. 2018b). This population structure study suggested that the Xf-CGr-I and -II groups separated before the description of the X. fragariae type strain in 1960 (Kennedy and King 1960). In this view, strains from Xf-CGr-IA were considered as more ancestral due to their CRISPR spacer composition (Gétaz et al. 2018b). From these population structure results and following the principle of parsimony with minimal evolutionary changes, we can hypothesize that all strains from Xf-CGr-I could have had two plasmids while the strains from Xf-CGr-IB subsequently lost the pXf29 plasmid. On the other hand, strains from Xf-CGr-II only harbored one plasmid, suggesting either the acquisition of pXf29 in Xf-CGr-I or its loss in Xf-CGr-II, after the separation of both groups.
The comparative genomics approach focused mainly on the identification of known Xanthomonas virulence-related factors. The results revealed that the virulence-related gene repertoire was, with minor exceptions, identical among all X. fragariae genomes. During the thorough in silico screening of virulence-related genes, GumE, GumK, XopAE and XopC were found to have a larger number of non-synonymous SNPs between Xf-CGr-I and Xf-CGr-II. The higher number of non-synonymous SNPs in these proteins may be indicative of a positive selection in these virulence-related genes. Our analysis also showed the absence of the TAL effectors in X. fragariae, a result that was already reported before (Vandroemme et al. 2013a), but this conclusion was based on a genome that was sequenced with Illumina HiSeq technology yielding short reads. However, due to the repetitive sequences of the TAL effectors, the absence might have been due to assembly issues, based on the length of the used reads. Here, we confirm the absence of TAL effectors from the X. fragariae genomes based on the five complete genomes (Henry and Leveau 2016; Gétaz et al. 2017b). These structural variations, gene content and gene sequence differences were hypothesized to influence the ability of a strain to interact with its host. Therefore, plant inoculations using the susceptible strawberry cultivar Elsanta (Kastelein et al. 2014) were performed. These virulence tests revealed a slight variation of intensity between X. fragariae (sub)groups, but this could not be statistically evaluated. For this, additional inoculation tests, including more X. fragariae strains from all (sub)groups and more plant replicates per treatment, would be required, possibly also considering the use of other strawberry cultivars with different susceptibility to the disease. Only then can it be determined whether the described genomic differences between (sub)groups influence the symptom intensity on strawberry.

Fig. 5 Generation time of each Xanthomonas fragariae (sub)group and Xanthomonas arboricola pv. fragariae strain. Each box represents the set of strains of (sub)group Xf-CGr-IA (n = 5), -IB (n = 3), -IC (n = 30) or -II (n = 20) or X. arboricola pv. fragariae (n = 5). The median, error bars as well as outliers per group are represented on this boxplot. Letters A and B are used to show the statistical intraspecies and interspecies relationship and reflect both X. fragariae and X. arboricola pv. fragariae. Intraspecies variation was not significant (p-value > 0.05) whereas interspecies variation was significant (p-value < 0.05).

Fig. 6 Outcome of the inoculation test of strawberry plant Fragaria × ananassa variety Elsanta. The inoculation tests were carried out with 18 Xanthomonas fragariae strains belonging to all X. fragariae (sub)groups. Additionally, inoculations with two X. arboricola pv. fragariae strains and a buffer-only control were carried out. The evolution of the symptoms was color-coded (red (0) = no symptoms; light green (1) = low number of spots, visible through the leaf with a light; middle dark green (2) = extensive spots visible on both sides of the leaf; dark green (3) = extensive spots coloring into yellow, tending to necrosis; brown (4) = necrotic regions on leaves).

Fig. 7 Evolution of symptoms on strawberry leaves inoculated with the Xanthomonas fragariae strain JvD-0051. At 8 dpi (a), single spots are visible on the abaxial side of the leaf (category = 1). From 15 dpi (b) to 19 dpi (c), spots are visible on both sides of the leaf and aggregate (category = 2). At 22 dpi (d), spots turn to yellow color (category = 3), and then turn to extensive necrosis (category = 4) between 26 dpi (e) and 30 dpi (f), the last time point of the experiment. The white line represents 1 cm on the photograph.
The cluster of phage genes and the different clusters of proteins annotated as phage remnants found in all X. fragariae genomes suggest a higher phage pressure on this species than on X. arboricola pv. fragariae. This phage pressure was already illustrated by the presence of CRISPR spacers targeting different phage-related sequences between X. fragariae (sub)groups (Gétaz et al. 2018b). Recently, it was also demonstrated by the isolation of the first phage, which could infect seven of eight X. fragariae strains tested and none of the 14 other Xanthomonas species tested (Miller et al. 2020). More generally, phages also contribute to the diversification of bacterial genome architecture, for instance by horizontal gene transfer. Most of the current evidence for the involvement of phages in shaping bacterial genomes, bacterial fitness, and host-pathogen interactions deals with events at this lowest taxonomic level (Brüssow et al. 2004).
The comparative genomics analysis of X. fragariae genomes was supplemented by the analysis of five X. arboricola pv. fragariae strains (Gétaz et al. 2018a). For this pathovar, pathogenicity upon artificial inoculation on strawberry plants was not always reproducible in glasshouses or in various laboratory experiments (Scortichini and Rossi 2003;Vandroemme et al. 2013b;Merda et al. 2016). The inoculation test performed in this study showed that, in contrast to all X. fragariae (sub)groups, no symptoms were obtained after inoculation with X. arboricola pv. fragariae. The lack of symptoms could be attributed to the smaller virulence gene repertoire of X. arboricola pv. fragariae, where especially the lack of T3SS and T3E genes was hypothesized to influence pathogenicity. However, the lack of most Hrp T3SS proteins in strains LMG 19145 PT , CFBP 6773 and LMG 19144 could not by itself explain the non-pathogenic behavior, since strain LMG 19146, which harbors the full Hrp cluster, was not pathogenic either. A detailed study of the pathogenicity of this pathovar and its host range thus needs to be done, as strain LMG 19145 PT was recently confirmed to cause symptoms on strawberry plants of the cultivars Candonga, Sabrina, and Murano (Ferrante and Scortichini 2018), suggesting that another virulence factor may cause symptoms on some strawberry cultivars. In this study, the same strain, inoculated on the cultivar Elsanta, did not produce any symptoms. The lack of consistency in symptom reproduction on the various cultivars (Scortichini and Rossi 2003;Vandroemme et al. 2013b;Merda et al. 2016) may thus reflect that X. arboricola pv. fragariae has a reduced host range, possibly limited to certain strawberry cultivars only.
In previous studies on X. arboricola, strains from X. arboricola pv. fragariae were shown to be polyphyletic (Vandroemme et al. 2013b). Based on the genome sequences, we identified at least two groups with different gene repertoires among the five strains of this pathovar included in this study. Therefore, it is important that, when strains from X. arboricola pv. fragariae are tested, both groups are included in order to identify whether all strains of this pathovar are pathogenic on strawberry.
Here, X. arboricola pv. fragariae was also shown to grow faster than X. fragariae in the same liquid medium under the tested conditions. We hypothesize that this difference in growth behavior may favor the detection of X. arboricola pv. fragariae when the pathogen is isolated from symptomatic plant material carrying both X. arboricola pv. fragariae and X. fragariae. Indeed, it was already reported that X. arboricola pv. fragariae and X. fragariae could be co-isolated from symptomatic strawberry leaves (Scortichini and Rossi 2003;Vandroemme et al. 2013b). In order to avoid enrichment and biased growth between both bacterial species, the recently designed X. fragariae loop-mediated isothermal amplification (LAMP) assay could be used to detect the pathogen directly in plant tissue (Gétaz et al. 2017a). For detection of X. arboricola pv. fragariae, a new assay would need to be designed using the available genome data.
Finally, the presence of a full T6SS in all X. fragariae (sub)groups is also reported in the present work. This T6SS cluster was recently described as having a particular genomic architecture in which structural genes are split into two clusters located about 300 kb from each other while additional genes are present (Bayer-Santos et al. 2019). The relatively high number of putative T6SS effectors identified in the X. fragariae genomes (Bayer-Santos et al. 2019; Bosis 2019) could also point to an important ecological role in the life of this bacterium. These observations, together with the wide distribution of T6SS clusters in plant-associated bacteria, could suggest that this system is crucial for optimal fitness during plant colonization, as the main role of T6SS in phytobacteria would be interbacterial competition rather than host manipulation (Bernal et al. 2018), as also recently demonstrated in the rice pathogen X. oryzae pv. oryzicola (Zhu et al. 2020). Since it was already reported that X. fragariae and X. arboricola pv. fragariae were co-isolated (Scortichini and Rossi 2003;Vandroemme et al. 2013b), the T6SS found only in X. fragariae could be an advantage when both bacterial pathogens compete with each other in the plant.

Conclusions
Overall, the comparative genomics approach on 58 X. fragariae strains and five X. arboricola pv. fragariae strains revealed both intraspecies and interspecies genomic variations. Within X. fragariae, this included large-scale genetic rearrangements at the full-genome level and the presence of none, one or two plasmids in individual strains, but also a conserved virulence-related gene repertoire. Although all tested X. fragariae strains were pathogenic to strawberry, none of the tested X. arboricola pv. fragariae strains provoked symptoms. The lack of many T3SS and T3E proteins in the genomes of the five X. arboricola pv. fragariae strains suggests that this pathovar may not be able to cause symptoms because important genes required for virulence are missing.
The co-isolation of X. fragariae and X. arboricola pv. fragariae complicates diagnostics from symptomatic plants, and wrong conclusions may be drawn when picking a single colony from a plate. Tools for diagnostics directly from infected plant material, like LAMP, are already available for the specific detection of X. fragariae (Wang and Turechek 2016;Gétaz et al. 2017a). A similar test would also need to be designed for the detection of X. arboricola pv. fragariae. However, the polyphyletic nature of the pathovar may make it difficult to find X. arboricola pv. fragariae-specific genes within the X. arboricola clade. For optimal detection, both assays should be combined in diagnostic settings in order to avoid isolation biases. This may then help to clarify the reported differences in the description of X. arboricola pv. fragariae pathogenicity to strawberries.

Bacterial strains
A total of 58 genomes of X. fragariae strains including five complete and 53 partial genome sequences obtained in a population structure analysis (Gétaz et al. 2018b) were used to assess their intraspecies variations (Table 1). Automatic genome annotation of the X. fragariae genomes was performed using the GenDB platform v.2.4 (Meyer et al. 2003) except for three of them that were directly obtained from GenBank (Table 1). Additionally, five X. arboricola pv. fragariae strains (Gétaz et al. 2018a) were included in the analysis in order to assess variations within and between X. arboricola pv. fragariae and X. fragariae strains (Table 1). All genomes were added to EDGAR v.2.2 (Blom et al. 2016) for whole-genome comparisons.
Virulence-related proteins for Xanthomonas spp.
A list of virulence-related proteins of xanthomonads was compiled from a list of proteins thought to play a role in bacterial virulence in the first X. fragariae draft genome LMG 25863 (Vandroemme et al. 2013a), including T3SS, T4SS, T6SS (Bayer-Santos et al. 2019), lipopolysaccharides (LPS) and extracellular polysaccharides (EPS) synthesis proteins and from an online list of T3SS effectors found in the genus Xanthomonas (Koebnik 2016). Overlaps between both lists were removed, resulting in a final list of 163 proteins (Additional file 1: Table S2) that was used as a reference in this study.

Comparative genomics
Annotated complete genomes were aligned with MAUVE v.2.3.1 (Darling et al. 2004). Using a tBLASTn v.2.3.0+ search (Camacho et al. 2009), all genomes were screened for the presence and sequence identity of virulence-related proteins (Additional file 1: Table S2). Sequences of the proteins found were aligned using Clustal W (Thompson et al. 1994) in MEGA v.6.06 (Tamura et al. 2013) in order to manually screen intraspecies and interspecies amino acid variations, as well as hit length, which may be reduced due to contig edges, a late coding start or an early stop codon. The sequence variation of virulence-related genes between strains was thus assessed at the amino acid sequence level, which does not consider synonymous mutations (Seo and Kishino 2008).
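The screening described above was performed manually in this study; the following Python sketch only illustrates the two quantities that were inspected, namely how much of a query protein a tBLASTn hit covers and how many amino acid positions differ between two aligned sequences. It assumes the tBLASTn results were exported as tab-separated text with the columns qseqid, sseqid, pident, qstart, qend and qlen (for example with -outfmt "6 qseqid sseqid pident qstart qend qlen"). The function names, the contig names and the numeric values are hypothetical and are not taken from the study.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    protein: str     # query protein name (qseqid)
    contig: str      # genome sequence carrying the hit (sseqid)
    identity: float  # percent identity reported by BLAST (pident)
    coverage: float  # fraction of the query protein covered by the hit


def parse_hits(tabular: str) -> list[Hit]:
    """Parse tab-separated lines: qseqid sseqid pident qstart qend qlen."""
    hits = []
    for line in tabular.strip().splitlines():
        qseqid, sseqid, pident, qstart, qend, qlen = line.split("\t")
        covered = int(qend) - int(qstart) + 1
        hits.append(Hit(qseqid, sseqid, float(pident), covered / int(qlen)))
    return hits


def count_aa_differences(seq_a: str, seq_b: str) -> int:
    """Count columns where two aligned protein sequences differ.

    Columns containing a gap ('-') in either sequence are ignored, so only
    amino acid substitutions are counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1 for a, b in zip(seq_a, seq_b)
        if a != b and a != "-" and b != "-"
    )


# Made-up example: a full-length hit and a hit truncated at a contig edge.
example = (
    "GumE\tcontig_1\t98.5\t1\t400\t400\n"
    "XopC\tcontig_7\t95.0\t1\t300\t600"
)
for hit in parse_hits(example):
    print(hit.protein, hit.contig, hit.identity, f"{hit.coverage:.2f}")
```

A hit with low coverage, like the second one in the made-up example, would be a candidate for manual inspection of contig edges, coding start and stop codon, as described above.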
A non-linear pan genome of X. fragariae, containing all CDS of the 58 X. fragariae strains, was generated using EDGAR v.2.2 (Blom et al. 2016) with the strain PD 885 T as reference. In order to visualize variations of CDS at the strain level within X. fragariae, the generated pan genome was used as the reference sequence to compare the 58 X. fragariae strains with BLAST Ring Image Generator (BRIG) v.0.95 (Alikhan et al. 2011). The EDGAR platform v.2.2 (Blom et al. 2016) was used to generate specific subsets of CDS in order to compare the core genomes of each X. fragariae (sub)group and X. arboricola pv. fragariae. Genes were considered orthologous when a reciprocal best BLAST hit was found between two genes, and when both BLAST hits were based on alignments exceeding 70% sequence identity spanning at least 70% of the query gene length (Blom et al. 2009). Subsequently, the average amino acid identity (AAI) was computed for all strains from both bacterial species on the
MLSA: Multilocus sequence analysis; SNP: Single-nucleotide polymorphism; T2SS: Type II secretion system; T3E: Type III secretion system effector; T3SS: Type III secretion system; T4SS: Type IV secretion system; T6SS: Type VI secretion system; TAL: Transcription activator like; VNTR: Variable numbers of tandem repeats; Xf-CGr: Xanthomonas fragariae CRISPR group
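The orthology criterion given in the Comparative genomics section (reciprocal best BLAST hits, each exceeding 70% sequence identity over at least 70% of the query gene length) can be sketched in Python as follows. This is a minimal illustration and not the EDGAR implementation: the tuple layout, the function names and the choice of the bit score to define the best hit are assumptions made for this example.

```python
def best_hits(hits):
    """Map each query gene to its highest-scoring subject gene.

    hits: iterable of (query, subject, identity_pct, query_coverage_pct, bitscore).
    Using the bit score to pick the best hit is an assumption of this sketch.
    """
    best = {}
    for query, subject, identity, coverage, score in hits:
        if query not in best or score > best[query][3]:
            best[query] = (subject, identity, coverage, score)
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a, min_identity=70.0, min_coverage=70.0):
    """Return gene pairs that are reciprocal best hits passing both cutoffs.

    Both directions must exceed min_identity and cover at least
    min_coverage percent of the respective query gene.
    """
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = []
    for gene_a, (gene_b, ident_ab, cov_ab, _) in best_ab.items():
        back = best_ba.get(gene_b)
        if back is None or back[0] != gene_a:
            continue  # best hit is not reciprocal
        _, ident_ba, cov_ba, _ = back
        if min(ident_ab, ident_ba) > min_identity and min(cov_ab, cov_ba) >= min_coverage:
            pairs.append((gene_a, gene_b))
    return sorted(pairs)
```

Calling reciprocal_best_hits with the hits of genome A against genome B and of genome B against genome A returns the list of gene pairs that would be treated as orthologs under these assumptions; pairs whose best hit is not reciprocal, or that fail either cutoff in either direction, are dropped.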