Nucleotide sequencing of the HoxA gene cluster using Gorilla fosmid clones

We sequenced the western gorilla (Gorilla gorilla) HoxA cluster region using seven fosmid clones, and found that the total tiling path sequence was 214,185 bp from the 5' non-genic region of HoxA1 to the 3' non-genic region of Evx1. We compared the nucleotide sequence with the gorilla genome sequence in the NCBI database, and the overall proportion of nucleotide difference was estimated to be 0.0005-0.0007. These estimates are lower than overall genomic polymorphism in gorillas.


Introduction
The euchromatic sequence of the human genome was sequenced using the hierarchical shotgun sequencing strategy, also known as clone-by-clone sequencing, by the International Human Genome Sequencing Consortium [1]. With the advent of next-generation sequencer (NGS), the hierarchical shotgun sequencing strategy is now less commonly used. However, hierarchical shotgun sequencing involving bacterial artificial chromosomes (BACs) and fosmids is still used for some purposes, such as to determine the long complete haploid in a chromosome region.
Kim et al. [2] constructed a western gorilla (Gorilla gorilla) fosmid library and established a simple polymerase chain reaction (PCR) screening system for it. They also selected seven fosmid clones, which constitutes the minimum tiling path for the entire HoxA gene cluster in the gorilla genome [2]. In this study we sequenced these fosmid clones and used them in our analysis.

Materials and Methods
Seven fosmid clones originated from one female gorilla individual "Taiko" (GGFP-562J15, GGFP-367A20, GGFP-347D05, GGFP-175G07, GGFP-452O13, GGFP-012E07, and GGFP-210K06) were screened and selected in a previous study [2]. These seven clones were used for this study. We determined nucleotide sequences primarily using the hierarchical shotgun sequencing method, as previously described [3], by sequencing each clone with more than tenfold-coverage. For base-calling, assembly, and to obtain a quality score for both raw and assembled data we used the Phred-Phrap software package [4]. Editing was performed with Consed [5]. Finishing was carried out by primer walking and PCR-coupled primer walking.
The fosmid sequences were compared with the nucleotide sequences (assembly NC_044609.1 from the Kamilah_GGO_v0 genome) available in the National Center for Biotechnology Information (NCBI) database. Pairwise sequence alignment was done manually using MEGA7 software [6]. Transitions, transversions, synonymous and nonsynonymous substitutions were counted using this software. Synonymous and nonsynonymous Ivyspring International Publisher substitutions were estimated using Nei and Gojobori's method [7]. When counting substitutions, the cluster sequence was divided into coding sequence (CDS), intron, and inter-genic regions. The 53 amino acid residues (e.g. RTNFTTKQLTELEKEFH FNKYLTRARRVEIAASLQLNETQVKIWFQNRRMK QK from HOXA1) were used as the homeodomain regions.
We compared the nucleotide difference (p-distance) between our two concatenated haploid gorilla fosmid sequences and the whole-genome shotgun sequence of the gorilla (assembly NC_044609.1 from the Kamilah_GGO_v0 genome) ( Table 1). The overall nucleotide difference in the HoxA cluster region between concatenated haploid sequence 1 and the shotgun sequence (100,577 bp) was 0.0005, and between concatenated haploid sequence 2 and the shotgun sequence (123,061 bp) was 0.0008. These estimates were lower than the genomic polymorphisms reported from two other western gorillas [Kamilah: 0.189% (0.00189), EB(JC): 0.178% (0.00178)] [8]. These results are reasonable, because nucleotide differences in the HoxA gene cluster region are expected to be lower than in other genomic regions.
In the CDS regions, the average p-distance was 0.0004, ranging from 0 [in HoxA1, HoxA3, HoxA6, HoxA7 (in concatenated haploid sequence 1), HoxA7 (in concatenated haploid sequence 2), HoxA11, HoxA13, and Evx1] to 0.0012 (in HoxA5 and HoxA9). One synonymous difference was observed in HoxA2. One nonsynonymous difference was observed for each of HoxA4, HoxA5, and HoxA9, although these changes were not located in homeobox regions.  An 18 bp gap was observed in exon 1 of HoxA10. The fosmid sequences were 18 bp longer than the whole-genome shotgun sequence of the gorilla (assembly NC_044609.1 from the Kamilah_GGO_v0 genome). When we examined the same region in humans (Homo sapiens), chimpanzees (Pan troglodytes), and orangutans (Pongo abelii), we found that they did not contain the 18 bp insertion. We used PCR to sequence this region in the DNA of the gorilla (Taiko) genome and confirmed that Taiko did have the 18 bp insertion. Because this insertion was observed in the non-homeobox region, we hypothesize that this insertion does not affect the fundamental function of HoxA10.
The estimates for the CDS, introns, and intergenic regions were lower than the genomic polymorphisms reported from two other western gorillas (Kamilah: 0.00189, EB(JC): 0.00178) [8]. It has been previously reported that the four Hox gene clusters have the lowest density of interspersed repeats in the human genome [9], probably because of the large-scale cis-regulatory elements that cannot tolerate being interrupted by insertions [10][11][12]. The lower estimates we observed in the HoxA gene cluster region are therefore reasonable.
In conclusion, we successfully sequenced the HoxA cluster region of the gorilla. This region consists of two stretched haploid sequences that will be available for further analysis, including cis-regulatory element, linkage disequilibrium, and recombination.

Abbreviations
NGS: next-generation sequencer; BAC: bacterial artificial chromosome; PCR: polymerase chain reaction; NCBI: National Center for Biotechnology Information; CDS: coding sequence.