The genome annotation file (.gff) available on NCBI was modified to improve data search and retrieval in the portal.

These changes include:

  • Renaming the mRNA features as transcripts
    • The transcript features now include mRNA, misc_RNA, ncRNA and transcribed pseudogenes
  • Adding polypeptide features with information identical to the CDS features
  • Adding gene features as "parents" of the polypeptide features
  • Adding the Product field to gene features (previously only present in mRNA and CDS features)
  • Adding the IDs used in the previous iteration of CorkOakDB to the transcript features, under the tag EST_IDs
  • Adding a Description field to all features with curated data (taken from publicly available sources published before the draft genome)
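
As an illustration only, two of the changes listed above (renaming RNA features to transcripts, and adding a polypeptide feature for each CDS) could be sketched in Python as shown here. This is not the script used to build the portal's file: the function name convert_line and the set TRANSCRIPT_TYPES are hypothetical, the input is assumed to be standard tab-separated GFF with nine columns, and transcribed pseudogenes, gene "parent" features, and the Product, EST_IDs and Description fields are not handled.

```python
# Feature types that are relabelled as "transcript" (assumption: transcribed
# pseudogenes need extra logic and are left out of this sketch).
TRANSCRIPT_TYPES = {"mRNA", "misc_RNA", "ncRNA"}


def convert_line(line):
    """Return the output line(s) for one GFF input line (hypothetical helper)."""
    line = line.rstrip("\n")
    # Pass comments, directives and blank lines through unchanged.
    if not line.strip() or line.startswith("#"):
        return [line]
    cols = line.split("\t")
    # A GFF record has exactly nine tab-separated columns; leave anything else alone.
    if len(cols) != 9:
        return [line]
    feature_type = cols[2]
    if feature_type in TRANSCRIPT_TYPES:
        cols[2] = "transcript"
        return ["\t".join(cols)]
    if feature_type == "CDS":
        # Keep the CDS and add a polypeptide record carrying the same information.
        polypeptide = cols.copy()
        polypeptide[2] = "polypeptide"
        return [line, "\t".join(polypeptide)]
    return [line]


if __name__ == "__main__":
    example = "chr1\tGnomon\tmRNA\t100\t500\t.\t+\t.\tID=rna0;Parent=gene0"
    for record in convert_line(example):
        print(record)
```

In this sketch the polypeptide record copies the CDS attributes verbatim; a real conversion would probably also need to assign distinct IDs and set the Parent attribute to the corresponding gene feature.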