The genome annotation file (.gff) available on NCBI was modified to improve data search and retrieval in the portal.
These changes include:
- Renaming the mRNA features as transcripts
- Broadening the transcript features to include mRNA, misc_RNA, ncRNA and transcribed pseudogenes
- Adding polypeptide features with information identical to the CDS features
- Adding gene features as "parents" of the polypeptide features
- Adding the Product field to gene features (previously only present in mRNA and CDS features)
- Adding the IDs used in the previous iteration of CorkOakDB with the tag EST_IDs to the transcript features
- Adding a Description field to all features with curated data (from publicly available sources published before the draft genome)
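
As an illustration only, a few of the changes listed above could be sketched in Python as follows. This is not the script used to build the portal file: the function names, the `est_ids` lookup table and the sample IDs are hypothetical, and the sketch assumes a tab-separated GFF3 line with nine columns. It covers renaming transcript-like features, attaching an `EST_IDs` tag, and emitting a polypeptide feature that copies each CDS feature. Transcribed pseudogenes, the gene "parent" features, and the Product and Description fields are left out, since their exact encoding is not described here.

```python
# Hypothetical sketch; names and IDs are illustrative assumptions.
# Feature types renamed to "transcript" (transcribed pseudogenes omitted).
TRANSCRIPT_TYPES = {"mRNA", "misc_RNA", "ncRNA"}


def parse_attributes(field):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = value
    return attrs


def format_attributes(attrs):
    """Join a dict back into a GFF3 attribute column, keeping key order."""
    return ";".join(f"{key}={value}" for key, value in attrs.items())


def transform_line(line, est_ids=None):
    """Return the output line(s) for one GFF3 line (no trailing newline).

    est_ids is an assumed mapping from a feature ID to a list of
    identifiers from the previous CorkOakDB iteration.
    """
    if line.startswith("#") or not line.strip():
        return [line]  # directives, comments and blank lines pass through
    cols = line.split("\t")
    if len(cols) != 9:
        return [line]  # not a feature line; leave untouched
    feature_type = cols[2]
    if feature_type in TRANSCRIPT_TYPES:
        attrs = parse_attributes(cols[8])
        cols[2] = "transcript"
        if est_ids and attrs.get("ID") in est_ids:
            attrs["EST_IDs"] = ",".join(est_ids[attrs["ID"]])
        cols[8] = format_attributes(attrs)
        return ["\t".join(cols)]
    if feature_type == "CDS":
        # Emit the CDS unchanged plus a polypeptide copy with identical columns.
        polypeptide = cols.copy()
        polypeptide[2] = "polypeptide"
        return [line, "\t".join(polypeptide)]
    return [line]
```

Calling `transform_line` on an mRNA line returns a single line typed `transcript`, while a CDS line returns two lines: the original and its polypeptide copy. A real conversion would probably also assign distinct IDs to the polypeptide features; the sketch keeps the attributes identical to mirror the description above.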