Draft Genomes of Frankia strains AiPa1 and AiPs1 Retrieved from Soil with Monocultures of Picea abies or Pinus sylvestris using Alnus incana as Capture Plant

The genomes of two nitrogen-fixing Frankia strains, AiPa1 and AiPs1, are described as representatives of two novel candidate species. Both strains were isolated from root nodules of Alnus incana, used as capture plants in bioassays on soils from a reforested site at Karttula, Finland, that was devoid of actinorhizal plants but contained 25-year-old monocultures of spruce (Picea abies (L.) Karsten) or pine (Pinus sylvestris L.), respectively. Average nucleotide identity (ANI) analyses indicate that each strain represents a novel Frankia species; genome sizes were 6.98 and 7.35 Mb for AiPa1 and AiPs1, respectively. Both genomes harbored genes typical of many other symbiotic frankiae, including genes essential for nitrogen fixation and for synthesis of hopanoid lipids and iron-sulfur clusters, as well as clusters of orthologous genes, secondary metabolite determinants and transcriptional regulators. Compared to other cultivated Alnus-infective strains with large genomes, the genomes of AiPa1 and AiPs1 had lost 475 and 112 genes, respectively. Lost genes included one hup cluster in AiPa1 and the gvp cluster in AiPs1, suggesting that genome erosion has begun and is proceeding differently in the two strains.


Introduction
Technological advances in whole genome sequencing, in single-cell metagenomics and in comparative bioinformatics have revolutionized the description of microbial genera, species and subspecies [1][2][3]. Comparative sequence analyses of whole genomes and the average nucleotide identity (ANI) metric [4] are now used as foundations for the classification of both cultured and uncultured microbes [5][6][7]. Members of the genus Frankia are soil and nodule actinobacteria that long resisted isolation attempts. The first isolate was described only in 1978 [8], after which many more followed [9][10][11][12]. Differentiation of isolates has also been hampered by the limited availability of phenotypic features that distinguish populations [13]. Consequently, few species of the genus Frankia were described for many years [14,15].
Whole genome sequencing techniques have made it possible to overcome these difficulties, resulting in the description of twelve species in the genus Frankia so far, with type strains deposited in international culture collections [16][17][18]. Five candidate species have also been described using whole genome analyses of uncultured Frankia populations in root nodules [6][18][19][20]. The number of available whole genome sequences for Frankia strains has increased significantly in recent years, with many strains potentially representing new species [21][22][23]. These data indicate that the genus Frankia is probably much more diverse than the twelve species and five candidate species described so far [13][24][25][26][27]. This statement is supported by recent genome analyses of Frankia strains isolated from nodules of Alnus glutinosa as representatives of three as yet undescribed nitrogen-fixing symbiotic species [22], and by the identification of two additional species of non-nitrogen-fixing and non-symbiotic frankiae [21].
Comparative sequence analyses of amplicons of an actinobacteria-specific insertion in the 23S rRNA genes of frankiae identified strains AiPa1 and AiPs1 as additional candidates for the description of new species [28]. Strains AiPa1 and AiPs1 were isolated from root nodules of Alnus incana used as a capture plant in bioassays. These bioassays aimed to determine the effects of 25-year-old monocultures of spruce (Picea abies (L.) Karsten) and pine (Pinus sylvestris L.) at a reforested site at Karttula, Finland (62° 53′, 26° 58′) on the nodulation capacity and diversity of frankiae in soils devoid of actinorhizal plants [29]. Basic soil characteristics were virtually identical under both monocultures: a fine silty sand with 12 to 14% organic matter and a pH of 5.3 to 5.5 [29,30]. The aim of this study was to use whole genome sequence analyses to evaluate and corroborate the potential of strains AiPa1 and AiPs1 for the description of new Frankia species.

Sample preparation
Defined Propionate Medium (DPM), containing propionate and NH4Cl as C and N sources, respectively [31], was used to grow Frankia strains AiPa1 and AiPs1. Cells of both strains had been preserved in 20% v/v glycerol at -80°C since 2003. After two weeks of growth at 30°C, cells were harvested by centrifugation (15,000 × g, 5 min). After a brief sonication to disrupt aggregates of cell filaments (10 s at 20% output in an S-450 sonifier, Branson Ultrasonics, Danbury, CT) [32], followed by an additional centrifugation step, DNA was extracted from cell pellets using the SurePrep™ Soil DNA Isolation Kit (Fisher Scientific, Houston, TX) [33]. DNA was sent to the Microbial Genomics Sequencing Center, Pittsburgh, PA, USA, for library preparation and sequencing using standard protocols for Illumina tagmentation and the Illumina NextSeq platform (2 × 150 bp).

Comparative genomic analysis
We computed the Average Nucleotide Identity (ANI) [4] between the assembled genomes of AiPa1 and AiPs1 and Frankia genomes of type strains of all described species, as well as other selected genomes, using the pyani platform with the b (BLAST) setting ([38]; https://pyani.readthedocs.io). Clusters of orthologous genes (COGs) [39], secondary metabolite clusters and genes specific to or lost in the new genomes were identified through antiSMASH [40] on the Mage platform [41]. A multilocus sequence analysis (MLSA) with AtpD, DnaA, FtsZ, Pgk, and RpoB was used to compute an amino acid (AA) distance matrix as done previously [42], from which a phylogenetic tree was constructed using a rapid Neighbour Joining algorithm [43] and a bootstrap analysis [44].
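The distance-matrix step of an MLSA can be illustrated with a minimal Python sketch. It assumes a simple p-distance (fraction of differing residues over ungapped columns), which may differ from the distance model used in [42]. The strain names and the toy concatenated alignment are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns that are ungapped in both sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    # Keep only columns where neither sequence has a gap character.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment: dict) -> dict:
    """Symmetric pairwise distance matrix keyed by (name, name) tuples."""
    names = sorted(alignment)
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


# Hypothetical toy alignment standing in for concatenated marker proteins.
toy = {
    "strainA": "MKTAY-LV",
    "strainB": "MKSAYQLV",
    "strainC": "MRSAY-IV",
}
matrix = distance_matrix(toy)
for (a, b), d in sorted(matrix.items()):
    if a < b:
        print(f"{a} vs {b}: {d:.3f}")
```

A matrix of this kind is the input that a neighbour-joining implementation would take to build the tree.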

Characteristics of the two Frankia genomes
CheckM completeness scores of 98.6% and 98.5% indicated that the genomes of strains AiPa1 and AiPs1, respectively, could be considered complete, while contamination indices of 0.27 and 4.05 indicated low contamination. Genome sizes of AiPa1 and AiPs1 were 6.98 Mb and 7.35 Mb, with GC contents of 71.12% and 72.13%, respectively. The assemblies comprised 165 and 1,203 contigs, the largest contigs being 310,535 bp and 128,115 bp, respectively (Table 1).
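Summary statistics such as contig count, total size, largest contig and GC content can be derived directly from an assembly in FASTA format. The following Python sketch shows one way to do this; it is not the pipeline used in this study, and the function names and the toy FASTA text are hypothetical.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into a {contig_name: sequence} dictionary."""
    contigs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            contigs[name] = []
        elif name is not None:
            contigs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in contigs.items()}


def assembly_stats(contigs: dict) -> dict:
    """Contig count, total length, largest contig and GC percentage."""
    lengths = [len(seq) for seq in contigs.values()]
    total = sum(lengths)
    gc = sum(seq.count("G") + seq.count("C") for seq in contigs.values())
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "largest_contig_bp": max(lengths),
        # Ambiguous bases (e.g. N) count toward the total length here.
        "gc_percent": 100.0 * gc / total,
    }


# Hypothetical two-contig assembly used only to exercise the functions.
toy_fasta = """>contig_1
GCGCGGCCAT
GCGC
>contig_2
ATGCGC
"""
stats = assembly_stats(parse_fasta(toy_fasta))
print(stats)
```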

Phylogenetic analysis of Frankia spp.
The MLSA with Frankia type strains revealed that strains AiPa1 and AiPs1 were members of Frankia cluster 1 (Figure 1). AiPa1 and AiPs1 represent two distinct lineages within the genus Frankia. Strain AiPa1 has an ANI of 82% to its closest relatives (Ag45/Mut15 and AgPM24), with ANI values to other Frankia strains between 76 and 80%. Strain AiPs1 is most closely related to Frankia alni ACN14a and Frankia torreyi CpI1, yet clearly distinct from them, with ANI values of 90 and 91%, well below the 95% threshold proposed to delineate species (Figure 2). Both strains belong to cluster 1 frankiae, with ANI values ranging from 80% to 91%, while values of 76-77% were obtained with cluster 2 genomes, and 77-78% with cluster 3 and 4 genomes (Figure 2).
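The species-delineation reasoning above reduces to comparing the highest ANI against the 95% threshold. The Python sketch below uses the rounded ANI values quoted in the text. The assignment of 90% to ACN14a and 91% to CpI1 assumes that the values are listed in the same order as the strains. The function names are hypothetical.

```python
SPECIES_ANI_THRESHOLD = 95.0  # percent; threshold proposed to delineate species

# Rounded ANI values (percent) as quoted in the text; pairing order is an assumption.
ani = {
    "AiPa1": {"Ag45/Mut15": 82.0, "AgPM24": 82.0},
    "AiPs1": {"Frankia alni ACN14a": 90.0, "Frankia torreyi CpI1": 91.0},
}


def closest_reference(ani_to_refs: dict) -> tuple:
    """Return (reference_name, ani) for the reference with the highest ANI."""
    return max(ani_to_refs.items(), key=lambda item: item[1])


def is_candidate_new_species(ani_to_refs: dict,
                             threshold: float = SPECIES_ANI_THRESHOLD) -> bool:
    """True if even the closest reference genome falls below the threshold."""
    _, best = closest_reference(ani_to_refs)
    return best < threshold


for strain, refs in ani.items():
    ref, value = closest_reference(refs)
    verdict = "candidate new species" if is_candidate_new_species(refs) else "same species"
    print(f"{strain}: closest {ref} at {value:.0f}% ANI -> {verdict}")
```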
Table notes: * indicates the number of clusters identified by antiSMASH; ** indicates the number of hits (>50%) following a BlastP search. Gene abbreviations: nif is nitrogenase, shc is squalene hopene cyclase, hup is hydrogenase uptake, suf is sulfur-iron cluster, cel is cellulase, glx is glucose oxidase, bcs is cellulose synthase, gvp is gas vesicle cluster, sodF is iron superoxide dismutase, geoA is geosmin synthase, arG is arginine, acc is acetate carboxylase, can is carbonic anhydrase, rhb is rhizobactin, lac is laccase, phd is a phytoene desaturase, dct is a dicarboxylate transporter, tgs is diacylglycerol O-acyltransferase, ddn is F420H(2)-dependent quinone nitroreductase, mop is molybdenum transport, qor is quinone oxidoreductase, glb is hemoglobin.

Figure 1. Phylogenetic tree of complete genomes using Streptomyces coelicolor A3(2) (AL645882) as outgroup. Frankia clusters are indicated on the right. Bootstrap results above 90% are given at nodes. The bar indicates 0.02% substitution/site. The two genomes described in the present study are framed.

Analysis of functional genes in Frankia spp. isolates
Genes considered characteristic of the symbiosis, such as nif, hup, suf, shc, cel, glx and bcsA, were present in the genomes of symbiotic lineages (clusters 1, 2 and 3) but not in those of non-symbiotic lineages (cluster 4) (Table 1; Table S1). Both genomes lacked gvp genes encoding gas vesicle proteins, and one of the two hup clusters commonly found in cluster 1 frankiae was absent in AiPa1 but present in AiPs1.
Larger genomes have commonly been associated with duplications of genes involved in substrate transfers into central metabolic pathways [46]. Strains with larger genomes might therefore be considered to have a higher potential to exploit a large variety of environments [20,46]. Smaller genomes like those of F. casuarinae, F. nodulisporulans and F. coriariae have been linked to genome reductions resulting in reduced saprotrophic potential, though their symbiotic potential is maintained [46]. The genome sizes of 6.98 and 7.35 Mb of strains AiPa1 and AiPs1 fit into recent discoveries of smaller genome sizes for many cluster 1 strains, with a size range from 6.4 Mb to 6.7 Mb (Table 1). Smaller genome sizes coincide with the loss of some duplicated genes such as, for example, the shc gene coding for the synthesis of hopanoid lipids [46], or the hup genes coding for the uptake and recycling of hydrogen derived from nitrogenase activity [51]. Genes such as hup have been lost by many strains (e.g. Ag45/Mut15, AgPM24, AgB32, AgKG'84/4) [22], including AiPa1 but not AiPs1 in this study, while others such as shc are lost less frequently (e.g. only in AgB32 and AgKG'84/4) [23]. The numbers of genes lost by individual strains differ considerably, e.g. 475 or 112 genes for AiPa1 and AiPs1, respectively, 380 or 409 genes for AgB32 and AgKG'84/4, or 459 genes for Ag45/Mut15 and AgPM24 [22]. Thus, genome reductions could be caused by genome erosion, which could be more pronounced in some strains such as Ag45/Mut15, AgPM24, AgB32 and AgKG'84/4 compared to others, including strains AiPa1 and AiPs1.
While smaller genomes have been found so far in frankiae with high symbiotic specificity and reduced saprophytic capabilities (e.g. F. casuarinae, F. nodulisporulans and F. coriariae), potential genome reduction is not indicative of reduced saprophytic growth in strains Ag45/Mut15, AgPM24 and AiPa1. Like species described for clusters 1 and 3, and like strains AgB32, AgKG'84/4 and AiPs1, all three strains grow in the rhizosphere of host and non-host plants; in contrast to these, however, they also proliferate with leaf litter as the sole C and N source [32,52]. These strains are therefore able to grow on complex organic material such as leaf litter, and do not require easily available C resources such as root exudates [53][54][55][56]. Genome erosion could instead have affected other traits of these strains, as indicated by the low competitive ability for nodule formation of strain Ag45/Mut15 compared to F. torreyi [57,58].
Relatives of Frankia strain AiPa1 were exclusively or most prominently found in recently vegetated soils [59,60], while relatives of AiPs1 were only found in the older part of the plantation (65 yrs), together with relatives of AiPa1, suggesting that environmental changes over time might have promoted changes in Frankia populations [59]. Nodulation capacities of soils under spruce and pine monocultures were similar [61], with root nodule populations representing relatives of AiPa1 under spruce and AiPs1 under pine [29]. However, both soils harbored other Frankia populations related to strain Ag45/Mut15, which is closely related to AiPa1 [29,61]. Thus, any conclusions about potential effects of plant species on the development of infective Frankia populations in soils under 25-year-old monocultures of spruce and pine remain highly speculative.
The availability of genome information of strains AiPa1 and AiPs1 as representatives of two new species within cluster 1 of the genus Frankia provides the possibility to assess the importance of gene erosion, but also the presence of unique genes with respect to different physiological and potentially ecological adaptations. Thus, additional studies need to include analyses of unique and additional genes to highlight differences between these two strains and other representatives of cluster 1, and relate those differences to environmental characteristics.