Genomics and Molecular Phylogenetics Tree Analysis of Actinopolyspora iraqiensis

Abstract
Actinopolyspora iraqiensis IQ-H1 is a novel strain of actinobacteria isolated from extremely saline soil samples in Iraq. The whole-genome sequence of this strain is deposited in the National Center for Biotechnology Information (NCBI) GenBank under the accession number NZ_AICW01000000. In this study, the genome features and the molecular phylogenetic tree of Act. iraqiensis IQ-H1 are analyzed. The RAST tool was used for genome annotation. The genomic features were elucidated using the QUAST tool. The circular genome map and the core and pan-genome map of Act. iraqiensis IQ-H1 were generated using the CGView and GView tools, respectively. The JSpeciesWS server was used for the tetranucleotide signature analysis, and the REALPHY server was used to construct the phylogenetic tree based on whole genome sequences. The genome size of the strain was around 4.0 Mbp, with 110 contigs and a GC content of 70.46%. The core genome of Act. iraqiensis IQ-H1 was estimated to be 2.2 Mbp. Based on z-scores of the tetranucleotide signature analysis, Act. halophila DSM 43834, Act. mortivallis DSM 44261 and Act. saharensis DSM 45459 were the strains most closely related to Act. iraqiensis IQ-H1, with z-scores of 0.99784, 0.98943 and 0.99789, respectively. Based on the phylogenetic tree constructed from the whole genome sequences, Act. iraqiensis IQ-H1 was most closely related to Act. saharensis DSM 45459, Act. halophila DSM 43834 and Act. mortivallis DSM 44261. The results suggest that web-based bioinformatics tools such as QUAST, CGView, GView, JSpeciesWS and REALPHY can be used to analyze the genomic features of Act. iraqiensis IQ-H1 and other species of the genus Actinopolyspora.

In our study, the whole genome sequence of Act. iraqiensis strain IQ-H1, which was isolated from extremely saline soil samples in Iraq [9], was analyzed along with those of eight Actinopolyspora strains whose whole genome sequences are available from the National Center for Biotechnology Information (NCBI). The genomic and molecular phylogenetic analyses, carried out with comparative genomics programs and tools, showed that Act. iraqiensis IQ-H1 has unique genomic features and differs at the strain level from other related Actinopolyspora species.

Materials and Methods
Whole genome sequences of Actinopolyspora
As the main objective of this research is the genomic analysis of Act. iraqiensis IQ-H1, the reference genome sequences of the genus Actinopolyspora were used together with the whole genome sequence of Act. iraqiensis. The genome sequences of Act. halophila DSM 43834, Act. mortivallis DSM 44261, Act. erythraea YIM 90600, Act. righensis DSM 45501, Act. alba DSM 45004, Act. saharensis DSM 45459, Act. mzabensis DSM 45460 and Act. xinjiangensis DSM 46732 were obtained from the National Center for Biotechnology Information (NCBI) GenBank database as of March 2020. The genome sequences, along with their accession numbers, genome sizes, numbers of contigs, and GC contents, are listed in Table 1. The whole-genome sequences were downloaded and stored in FASTA format for further analyses.
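The summary quantities reported in Table 1 (genome size, number of contigs, GC content) can be recomputed directly from a downloaded FASTA file. The following is a minimal sketch in plain Python, not the pipeline used in this study. The function names `read_fasta` and `assembly_stats` and the toy two-contig sequence are hypothetical and serve only as illustration. GC content is taken here over all bases in the file, which is an assumption that may differ slightly from how a given tool treats ambiguous bases.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def assembly_stats(records):
    """Return contig count, total length, largest contig and GC percentage."""
    lengths = [len(seq) for _, seq in records]
    total = sum(lengths)
    gc = sum(seq.count("G") + seq.count("C") for _, seq in records)
    return {
        "contigs": len(records),
        "total_bp": total,
        "largest_bp": max(lengths) if lengths else 0,
        "gc_percent": round(100.0 * gc / total, 2) if total else 0.0,
    }


# Hypothetical toy input: two short contigs, not real genome data.
toy = ">contig_1\nGGCC\nAT\n>contig_2\nGCGC\n"
stats = assembly_stats(read_fasta(toy))
print(stats)
```

For a real draft genome, the text of the downloaded FASTA file would be passed to `read_fasta` in place of the toy string.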

Genome features of Act. iraqiensis IQ-H1
To study the genome features of Act. iraqiensis IQ-H1, the whole genome sequence of this strain was first uploaded to the Rapid Annotation using Subsystem Technology (RAST) tool [10] for annotation. The annotated genome was then submitted to the QUAST tool [11] to elucidate its genomic features. tRNA annotation was done using the tRNAscan-SE v2.0 program [12]. The circular genome map of Act. iraqiensis IQ-H1 was generated using the CGView Comparison Tool [13] from the annotated files produced by RAST in the GenBank (.gbk) and Gene-Finding Format (.gff) formats. For the generation of the core and pan-genome map, the GView tool [14] was used. The Act. iraqiensis IQ-H1 genome and all the reference genome sequences were uploaded to the server in GenBank format. Act. erythraea YIM 90600 was selected as the seed genome, and the other genomes were compared to the seed to locate the unique regions. The seed is incrementally built up with the unique features of the queries to become the pan-genome. A BLAST atlas was created to display the presence or absence of features within the query genomes compared to the pan-genome.
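The incremental construction described above, in which a seed is extended with the unique features of each query, can be illustrated with a toy sketch. This is not how GView is implemented: GView works on sequence regions and BLAST hits, whereas the sketch below assumes each genome has already been reduced to a set of abstract feature identifiers. The function names and the genome contents are hypothetical.

```python
def build_pan_and_core(seed, queries):
    """Extend a seed feature set with features unique to each query.

    seed    : set of feature IDs for the seed genome
    queries : list of sets of feature IDs, one per query genome
    Returns (pan, core): pan is an ordered list, core is a set.
    """
    pan = sorted(seed)          # the pan-genome starts as the seed
    seen = set(seed)
    core = set(seed)            # features shared by every genome so far
    for features in queries:
        for feature in sorted(features):
            if feature not in seen:   # unique to this query: append to pan
                seen.add(feature)
                pan.append(feature)
        core &= features
    return pan, core


def presence_matrix(pan, genomes):
    """Presence/absence of each pan-genome feature in each genome."""
    return {name: [f in feats for f in pan] for name, feats in genomes.items()}


# Hypothetical toy genomes described by abstract feature IDs.
genomes = {
    "seed": {"a", "b", "c"},
    "query_1": {"a", "b", "d"},
    "query_2": {"a", "e"},
}
pan, core = build_pan_and_core(
    genomes["seed"], [genomes["query_1"], genomes["query_2"]]
)
matrix = presence_matrix(pan, genomes)
print(pan, core)
print(matrix["query_1"])
```

The presence/absence rows play the role that the slots of a BLAST atlas play in the real tool: one row per genome, one position per pan-genome feature.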

Tetranucleotide Signature Analysis
The tetranucleotide signature analysis computes correlation coefficients between the tetranucleotide usage patterns of DNA sequences, which can be used as an indicator of the relatedness between bacterial genome sequences. The tetranucleotide frequencies of each genome sequence were calculated as described in [15] using the JSpeciesWS server [16]. In brief, because each position in a four-base fragment can hold one of four nucleotides, there are 4^4 = 256 possible tetranucleotide patterns; the observed frequency of each pattern is counted and its expected frequency is computed. The differences between the observed frequencies and the expected values are transformed into Z-scores for each tetranucleotide.
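The calculation outlined above can be sketched as follows. This is an illustrative sketch and not the JSpeciesWS implementation. It assumes that expected counts come from a maximal-order Markov model, E(n1n2n3n4) = N(n1n2n3) N(n2n3n4) / N(n2n3), with the matching variance approximation; this is one common formulation, and the text above does not spell out the exact formula. It also counts a single strand only. The function names and the random test sequences are hypothetical.

```python
import random
from itertools import product
from math import sqrt

BASES = "ACGT"
TETRAS = ["".join(p) for p in product(BASES, repeat=4)]  # 4**4 = 256 patterns


def count_kmers(seq, k):
    """Count overlapping k-mers, skipping any that contain ambiguous bases."""
    counts = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in BASES for base in kmer):
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def tetra_zscores(seq):
    """Z-score of each of the 256 tetranucleotides (assumed Markov model)."""
    seq = seq.upper()
    n2, n3, n4 = (count_kmers(seq, k) for k in (2, 3, 4))
    zscores = []
    for t in TETRAS:
        mid = n2.get(t[1:3], 0)    # N(n2n3)
        left = n3.get(t[:3], 0)    # N(n1n2n3)
        right = n3.get(t[1:], 0)   # N(n2n3n4)
        if mid == 0:
            zscores.append(0.0)
            continue
        expected = left * right / mid
        variance = expected * (mid - left) * (mid - right) / (mid * mid)
        observed = n4.get(t, 0)
        zscores.append((observed - expected) / sqrt(variance) if variance > 0 else 0.0)
    return zscores


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


# Hypothetical random sequences standing in for two genomes.
random.seed(0)
genome_a = "".join(random.choice(BASES) for _ in range(5000))
genome_b = "".join(random.choice(BASES) for _ in range(5000))
z_a, z_b = tetra_zscores(genome_a), tetra_zscores(genome_b)
print(round(pearson(z_a, z_a), 3))  # a genome compared with itself
print(round(pearson(z_a, z_b), 3))  # two unrelated random sequences
```

The correlation coefficient between the two Z-score vectors is the quantity that is compared against the relatedness cut-offs.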

Whole Genome Sequence Based Phylogenetic Tree
To construct a maximum likelihood phylogenetic tree based on whole genome sequences, the REALPHY version 1.12 method was used [17]. The whole genome sequences of all Actinopolyspora strains were submitted to the program in GenBank format. Salinispora tropica was introduced as an outgroup species. The provided sequences were mapped to each other with the bowtie2 alignment tool [18]. The phylogeny was inferred from the resulting sequence alignments using PhyML. The phylogenetic tree was edited using the MEGA-6 program [19].

Results and Discussion
A detailed summary of the genome features of Act. iraqiensis IQ-H1 is shown in Table 2. The genome size of Act. iraqiensis IQ-H1 was around 4.0 Mbp and the number of contigs was 110, the largest being 217989 bp. The data obtained from the whole genome sequences show that genome sizes in the genus Actinopolyspora range from 4.23 Mbp in Act. mortivallis to 5.25 Mbp in Act. halophila [20,21]. The results also show that the GC content of Act. iraqiensis IQ-H1 was 70.46%. Previous studies found that Gram-positive bacteria have a higher GC content than Gram-negative bacteria [22], and that GC content is positively correlated with genome size in bacteria [23]. Although Act. iraqiensis IQ-H1 has the smallest genome of the Actinopolyspora species included in this study (Table 1), it has the highest GC content. This apparent discrepancy most likely arises because the genome of Act. iraqiensis IQ-H1 was not sequenced completely: the sequence of this strain was deposited as a draft genome with 110 contigs, and many regions of the genome might be missing. Within the genus Actinopolyspora, only the genome of Act. erythraea YIM 90600 was sequenced completely [21]. Other genome features, including protein coding genes (CDS), tRNA genes, rRNA genes, open reading frames (ORF) and GC skew as well as GC content, were also examined.

It was necessary to determine the core and the pan-genome of the candidate Actinopolyspora species in order to identify genomic features common to all and to distinguish those that are unique to Act. iraqiensis IQ-H1. The results of the core and pan-genome analysis reveal that the pan-genome of the nine Actinopolyspora strains comprises 15 Mbp (Figure 2).
However, studies have shown that adding more bacterial genome sequences expands the pan-genome size of a bacterial species, a property known as an open pan-genome [24,25]. The outermost slot (red) represents the core genome of Act. iraqiensis IQ-H1. The core genome of Act. iraqiensis IQ-H1 is estimated to be 2.2 Mbp (from 12.8 to 15 Mbp, Figure 2). The core genome slot shows regions where a BLAST hit was present against the reference sequence, Act. erythraea.

In the era of genome sequencing and bioinformatics, it is now generally accepted that genome sequencing has the potential to be a routine approach for measuring genetic relatedness between closely related species. Many studies have demonstrated that the analysis of tetranucleotide usage patterns is often a much more reliable measure of sequence relatedness than GC content or the traditional method of DNA-DNA hybridisation [15,27,28]. A value > 0.999 (above cut-off) indicates that the two genomes belong to the same species, while a value < 0.989 (below cut-off) indicates that the two genomes are distinctly different [15]. Based on these values, the results show that Act. iraqiensis IQ-H1 is a new distinct species in the genus Actinopolyspora (Table 3). However, Act. halophila DSM 43834, Act. mortivallis DSM 44261 and Act. saharensis DSM 45459 appear to be the strains most closely related to Act. iraqiensis IQ-H1, with z-scores of 0.99784, 0.98943 and 0.99789, respectively. Notably, when two genomes are closely related, the difference between their z-score values decreases, whereas when the relatedness between two genomes decreases, the disparity between their z-score values increases [29]. Accordingly, Act. iraqiensis is most closely related to Act. saharensis DSM 45459 and most distinct from Act. righensis DSM 45501, with z-scores of 0.99789 and 0.89345, respectively.

It has been observed that using whole genome sequences for phylogenetic analysis is quite complicated and that phylogenetic trees based on whole-genome analysis are not always similar [30]. In our study, we used the REALPHY bioinformatics program [17] to infer a phylogenetic tree of Act. iraqiensis IQ-H1 and closely related Actinopolyspora strains based on whole genome sequences. The results show that Act. iraqiensis IQ-H1 is most closely related to Act. saharensis DSM 45459, Act. halophila DSM 43834 and Act. mortivallis DSM 44261 (Figure 3).

Figure 3. A maximum-likelihood phylogenetic tree constructed from the nine whole genome sequences in GenBank format using the REALPHY method [17] with the bowtie2 alignment tool [18]. Salinispora tropica was introduced as an outgroup species. The phylogenetic tree was edited using the MEGA-6 program [19]. The phylogeny was tested with 1000 bootstrap replicates.

Conclusion
This study has shown that Actinopolyspora iraqiensis IQ-H1 is a novel strain of actinobacteria from Iraq. Based on z-scores, Act. iraqiensis IQ-H1 was closely related to Act. halophila DSM 43834, Act. mortivallis DSM 44261 and Act. saharensis DSM 45459; based on the phylogenetic tree of whole genome sequences, it was most closely related to Act. saharensis DSM 45459, Act. halophila DSM 43834 and Act. mortivallis DSM 44261. The findings indicate that the whole genome sequences stored in the National Center for Biotechnology Information (NCBI) database, together with the bioinformatics tools used in this study, can be used for molecular phylogenetic and genomic feature analyses of Act. iraqiensis IQ-H1 and related species of the genus Actinopolyspora.