The Cork Oak Genome Sequencing initiative (GENOSUBER) released the first draft genome assembly in 2018 (Ramos et al., 2018). The sequenced genotype (HL8) is located at Herdade dos Leitões (Montargil, Ponte de Sor, Portugal), owned by Fundação João Lopes Fernandes. HL8 was selected from among other cork oak trees for its low level of homozygosity and good cork quality.
DNA sequencing was performed with Illumina technology, using a combination of paired-end and mate-pair libraries. Each paired-end library was assembled individually using Ray, followed by a global assembly using GARM. The mate-pair libraries were then integrated into this assembly to merge contigs and reduce whole-genome fragmentation. This scaffolding step was performed using BESST and SOAPdenovo GapCloser. Alternative heterozygous scaffolds were removed using Redundans.
The first draft genome has a predicted size of 953.3 Mbp and is organized in 23,347 scaffolds, with 94.6% of the genome represented in the 4,730 largest scaffolds (those longer than 10 kbp).
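Summary figures of this kind (scaffold count, total size, share of the assembly held in scaffolds longer than 10 kbp) can be recomputed from any assembly FASTA file. The following is a minimal Python sketch, not the procedure used by the GENOSUBER team; the function names `scaffold_lengths` and `assembly_summary` and the `min_len` parameter are illustrative assumptions.

```python
from typing import Dict, Iterable, Iterator, Tuple


def scaffold_lengths(fasta_lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (name, length) for each record in FASTA-formatted lines."""
    name = None
    length = 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name = parts[0] if parts else ""  # first token of the header
            length = 0
        else:
            length += len(line)  # sequence may span several lines
    if name is not None:
        yield name, length


def assembly_summary(fasta_lines: Iterable[str], min_len: int = 10_000) -> Dict[str, float]:
    """Summarise an assembly; min_len is the 'long scaffold' cutoff (assumed 10 kbp)."""
    lengths = [n for _, n in scaffold_lengths(fasta_lines)]
    total = sum(lengths)
    long_lengths = [n for n in lengths if n > min_len]
    return {
        "scaffolds": len(lengths),
        "total_bp": total,
        "long_scaffolds": len(long_lengths),
        "long_fraction": sum(long_lengths) / total if total else 0.0,
    }
```

Because the functions accept any iterable of lines, an open file handle can be passed directly, for example `assembly_summary(open("assembly.fasta"))`, where the file name is hypothetical.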
Structural annotation of the genome predicted 79,752 genes with complete open reading frames, and 83,814 transcripts. Gene annotation was performed using AUGUSTUS, integrating mRNA expression data from five different tissues (leaf, xylem, inner bark, phellem and pollen) to provide species-specific signature structures, with gene models from Arabidopsis thaliana. Functional annotation was performed using BLASTP (against the NCBI-nr and Swiss-Prot databases), gene ontologies were assigned using eggNOG-mapper, and conserved protein domains were detected using InterProScan. Functional annotations in all databases were obtained for 40,599 transcripts (37,724 genes).
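Gene and transcript totals like these can be tallied from a tab-separated GFF annotation file by counting the feature types in its third column. The sketch below is an illustration only: the function name `count_features` is hypothetical, and the feature labels `gene` and `mRNA` are assumptions that may differ in a given file (some annotation outputs label transcripts `transcript`). It does not check whether a gene has a complete open reading frame.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def count_features(gff_lines: Iterable[str],
                   wanted: Sequence[str] = ("gene", "mRNA")) -> Dict[str, int]:
    """Count GFF records whose feature type (column 3) is in `wanted`."""
    counts = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column GFF record
        if fields[2] in wanted:
            counts[fields[2]] += 1
    # Report every requested type, including those with zero hits
    return {feature: counts[feature] for feature in wanted}
```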
An improved version of the cork oak genome, integrating more advanced long-read sequencing technology, is currently under development and will be made available in a future release of the CorkOakDB.