GFF3 and GTF File Formats
GFF3 and GTF are tab-delimited annotation formats used to describe genomic features such as genes, transcripts, and exons.
TL;DR
- Both formats use 1-based, closed genomic coordinates.
- Core records have 9 columns: seqid, source, type, start, end, score, strand, phase, attributes.
- GFF3 uses key=value attributes and explicit parent/child IDs (ID, Parent).
- GTF commonly uses key "value"; attributes, especially gene_id and transcript_id.
- Prefer GFF3 for hierarchical feature models and standards compliance; GTF is common in RNA-seq pipelines.
Structure
Each non-comment line has 9 tab-delimited fields:
- seqid (chromosome/contig)
- source
- feature type (for example gene, transcript, exon, CDS)
- start (1-based)
- end (1-based, inclusive)
- score ("." if missing)
- strand (+, -, or ".")
- phase (0, 1, or 2 for CDS features, otherwise ".")
- attributes
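The following Python sketch splits one feature line into the nine fields above, which makes the column layout concrete. Record and parse_line are hypothetical names used only for illustration; they are not part of any library. The sketch assumes real tab separators and leaves column 9 unparsed, because its syntax differs between GFF3 and GTF.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """One GFF3/GTF feature line split into its nine columns."""
    seqid: str
    source: str
    type: str
    start: int              # 1-based
    end: int                # 1-based, inclusive
    score: Optional[float]  # None when the column is "."
    strand: str             # "+", "-", "." (GFF3 also allows "?" for unknown)
    phase: Optional[int]    # 0, 1, or 2 for CDS; None when "."
    attributes: str         # raw column 9; GFF3 and GTF syntax differ


def parse_line(line: str) -> Optional[Record]:
    """Return a Record, or None for blank, comment, and directive lines."""
    line = line.rstrip("\r\n")
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-delimited columns, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    if strand not in {"+", "-", ".", "?"}:
        raise ValueError(f"invalid strand: {strand!r}")
    rec = Record(
        seqid=seqid,
        source=source,
        type=ftype,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        phase=None if phase == "." else int(phase),
        attributes=attrs,
    )
    # 1-based closed intervals: a single-base feature has start == end
    if rec.start > rec.end:
        raise ValueError(f"start {rec.start} is after end {rec.end}")
    if rec.phase not in (None, 0, 1, 2):
        raise ValueError(f"invalid phase: {rec.phase}")
    return rec
```

Take the exon line of the GFF3 example below, with tabs restored. Passing it to parse_line gives a Record with start=11869, end=12227, score=None and phase=None.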
GFF3 example:
##gff-version 3
chr1 RefSeq gene 11869 14409 . + . ID=gene1;Name=DDX11L1
chr1 RefSeq mRNA 11869 14409 . + . ID=tx1;Parent=gene1
chr1 RefSeq exon 11869 12227 . + . ID=exon1;Parent=tx1
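In GFF3, the feature hierarchy lives entirely in column 9. The sketch below uses hypothetical helper names and does two things:

- It parses tag=value pairs. Values are percent-decoded, and multi-valued tags such as Parent=tx1,tx2 are split on commas.
- It builds a map from each parent to its children.

It assumes well-formed, tab-separated input and is not a full GFF3 validator.

```python
from __future__ import annotations

from collections import defaultdict
from urllib.parse import unquote


def parse_gff3_attributes(col9: str) -> dict[str, list[str]]:
    """Parse GFF3 column 9: tag=value pairs separated by ';'.

    Values may hold several comma-separated entries (e.g. Parent=tx1,tx2),
    and reserved characters are percent-encoded, so each entry is
    decoded only after splitting.
    """
    attrs: dict[str, list[str]] = {}
    for pair in col9.strip().strip(";").split(";"):
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"GFF3 attribute without '=': {pair!r}")
        attrs[key] = [unquote(v) for v in value.split(",")]
    return attrs


def children_by_parent(lines: list[str]) -> dict[str, list[str]]:
    """Map each Parent ID to the IDs of the features that reference it."""
    tree: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        attrs = parse_gff3_attributes(line.rstrip("\r\n").split("\t")[8])
        child_id = attrs.get("ID", ["<no ID>"])[0]
        for parent in attrs.get("Parent", []):
            tree[parent].append(child_id)
    return dict(tree)
```

Applied to the three example lines, with tabs restored, children_by_parent returns {"gene1": ["tx1"], "tx1": ["exon1"]}.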
GTF example:
chr1 ENSEMBL gene 11869 14409 . + . gene_id "GENE1"; gene_name "DDX11L1";
chr1 ENSEMBL transcript 11869 14409 . + . gene_id "GENE1"; transcript_id "TX1";
chr1 ENSEMBL exon 11869 12227 . + . gene_id "GENE1"; transcript_id "TX1"; exon_number "1";
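GTF attributes need a different parser. Each entry is a key, a space, a double-quoted value, and a terminating semicolon, and some keys (for example tag) may repeat. This hypothetical sketch works as follows:

- It uses a regular expression to read each entry.
- It also accepts unquoted values, which some GTF producers emit for numeric fields.
- It then groups exon intervals by transcript_id.

```python
from __future__ import annotations

import re
from collections import defaultdict

# key, whitespace, then a "quoted value" or a bare token, then ';' (or end of string)
_GTF_ATTR = re.compile(r'([^\s;]+)\s+(?:"([^"]*)"|([^\s;"]+))\s*(?:;|$)')


def parse_gtf_attributes(col9: str) -> dict[str, list[str]]:
    """Parse GTF column 9; repeated keys keep every value, in file order."""
    attrs: dict[str, list[str]] = {}
    for m in _GTF_ATTR.finditer(col9):
        key = m.group(1)
        value = m.group(2) if m.group(2) is not None else m.group(3)
        attrs.setdefault(key, []).append(value)
    return attrs


def exons_by_transcript(lines: list[str]) -> dict[str, list[tuple[int, int]]]:
    """Group 1-based exon intervals by their transcript_id attribute."""
    groups: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\r\n").split("\t")
        if f[2] != "exon":
            continue
        tx_id = parse_gtf_attributes(f[8])["transcript_id"][0]
        groups[tx_id].append((int(f[3]), int(f[4])))
    return dict(groups)
```

Storing every attribute as a list keeps repeated keys intact, which a plain key-to-string dict would silently overwrite.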
Coordinate model (important)
GFF3/GTF are 1-based and end-inclusive.
- The interval chr1 11869 12227 has length 12227 - 11869 + 1 = 359 bp.
- Converting to BED (0-based, half-open) requires a coordinate shift: BED_start = start - 1, BED_end = end. The same interval therefore becomes chr1 11868 12227 in BED.
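The arithmetic above can be written as a few small functions. This is a minimal sketch with hypothetical names. gff3_line_to_bed6 assumes GFF3 input and a strand of +, - or ".". It uses the ID attribute as the BED name and writes 0 in the BED score column.

```python
from __future__ import annotations


def feature_length(start: int, end: int) -> int:
    """Length of a 1-based, end-inclusive GFF3/GTF interval."""
    return end - start + 1


def gff_to_bed(start: int, end: int) -> tuple[int, int]:
    """1-based closed [start, end] -> 0-based half-open [start - 1, end)."""
    return start - 1, end


def bed_to_gff(bed_start: int, bed_end: int) -> tuple[int, int]:
    """0-based half-open -> 1-based closed (inverse of gff_to_bed)."""
    return bed_start + 1, bed_end


def gff3_line_to_bed6(line: str) -> str:
    """Convert one GFF3 feature line to BED6: chrom, start, end, name, score, strand."""
    seqid, _src, _type, start, end, _score, strand, _phase, col9 = (
        line.rstrip("\r\n").split("\t")
    )
    # minimal GFF3-only attribute split; no percent-decoding needed for the ID here
    attrs = dict(p.split("=", 1) for p in col9.strip(";").split(";") if "=" in p)
    bed_start, bed_end = gff_to_bed(int(start), int(end))
    name = attrs.get("ID", ".")
    return "\t".join([seqid, str(bed_start), str(bed_end), name, "0", strand])
```

The BED length (end - start) equals the GFF3/GTF length: 12227 - 11868 = 359. Comparing the two is a quick sanity check after any conversion.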
Practical Conventions
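- Keep the feature hierarchy consistent (gene -> transcript -> exon/CDS).
- Use stable feature IDs (especially the GFF3 ID and Parent attributes).
- Keep chromosome naming consistent across FASTA/BAM/VCF/BED (for example, do not mix chr1 and 1).
- Separate columns with tabs only. In GTF, spaces inside an attribute value belong within the double quotes. In GFF3, reserved characters in column 9 (;, =, & and ,) must be percent-encoded.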
- Sort by chromosome and start when possible for reproducible processing.
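Two of these conventions are easy to check programmatically. The hypothetical helpers below do the following:

- sort_gff_lines sorts feature lines by seqid, then start, then longer features first, so a gene precedes a transcript with the same start.
- missing_parents lists GFF3 Parent values that no ID defines.

The sketch assumes two things about the input. Directives and comments appear only in the header, so a "###" separator would be moved. The file has no embedded ##FASTA section. Seqids are sorted lexicographically, which is reproducible but not karyotype order.

```python
from __future__ import annotations


def sort_gff_lines(lines: list[str]) -> list[str]:
    """Keep '#' lines on top, then sort features by (seqid, start, -end)."""
    header = [ln for ln in lines if ln.startswith("#")]
    body = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def key(line: str) -> tuple[str, int, int]:
        f = line.split("\t")
        # -end puts enclosing (longer) features first; sorted() is stable for ties
        return f[0], int(f[3]), -int(f[4])

    return header + sorted(body, key=key)


def missing_parents(lines: list[str]) -> list[str]:
    """Return Parent IDs that no feature in the file declares as its ID."""
    ids: set[str] = set()
    parents: set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        col9 = line.rstrip("\r\n").split("\t")[8]
        attrs = dict(p.split("=", 1) for p in col9.strip(";").split(";") if "=" in p)
        if "ID" in attrs:
            ids.add(attrs["ID"])
        parents.update(p for p in attrs.get("Parent", "").split(",") if p)
    return sorted(parents - ids)
```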
Common Pitfalls
- Mixing 1-based GFF3/GTF coordinates with 0-based BED coordinates.
- Broken parent-child relationships (missing Parent/ID, inconsistent transcript IDs).
- Invalid or inconsistent attributes that downstream parsers cannot interpret.
- Treating GFF3 and GTF attribute syntax as interchangeable.
- Incorrect CDS phase values causing translation/frame issues.
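Phase errors usually come from miscounting bases across exon boundaries. In GFF3, the phase of a CDS segment is the number of bases to skip from its start to reach the first base of the next codon. It therefore follows from the total length of the CDS segments that precede it in translation order.

The sketch below, with hypothetical function names, recomputes the phases of one transcript's CDS segments and reports any that disagree with the recorded values. It takes the recorded phase of the 5'-most segment as given. That allows for 5'-partial models, but it means an error in that first segment is not detected.

```python
from __future__ import annotations


def expected_cds_phases(
    segments: list[tuple[int, int]], strand: str, first_phase: int = 0
) -> list[tuple[int, int, int]]:
    """Recompute GFF3 phases for one transcript's CDS segments (1-based, inclusive).

    Returns (start, end, phase) in translation order:
    ascending coordinates on '+', descending on '-'.
    """
    ordered = sorted(segments, key=lambda s: s[0], reverse=(strand == "-"))
    result: list[tuple[int, int, int]] = []
    consumed = 0  # total length of the CDS segments already processed
    for start, end in ordered:
        if not result:
            phase = first_phase
        else:
            # (consumed - first_phase) % 3 bases already sit in an unfinished codon;
            # this segment must skip the bases needed to complete it
            phase = (3 - (consumed - first_phase) % 3) % 3
        result.append((start, end, phase))
        consumed += end - start + 1
    return result


def phase_mismatches(
    cds: list[tuple[int, int, int]], strand: str
) -> list[tuple[int, int, int, int]]:
    """cds holds (start, end, recorded_phase); returns (start, end, recorded, expected)."""
    ordered = sorted(cds, key=lambda c: c[0], reverse=(strand == "-"))
    expected = expected_cds_phases(
        [(s, e) for s, e, _ in ordered], strand, first_phase=ordered[0][2]
    )
    return [
        (s, e, rec, exp)
        for (s, e, rec), (_, _, exp) in zip(ordered, expected)
        if rec != exp
    ]
```

Here is a worked example on the + strand, with CDS segments of 10, 5 and 11 bases:

- The first segment has phase 0.
- After 10 bases, one base of an incomplete codon remains, so the second segment has phase 2.
- After 15 bases, the codons are complete, so the third segment has phase 0.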
Common Uses
- Gene and transcript annotation
- Feature counting and RNA-seq quantification input
- Region extraction and annotation conversion (for example to BED)
Useful Annotation Tools
gffread
Validate, filter, and convert GFF/GTF annotations.
# convert GFF3 to GTF (-T selects GTF output); add -E to warn about potential problems such as duplicate transcript IDs
gffread annotations.gff3 -T -o annotations.gtf
# convert GTF to GFF3
gffread annotations.gtf -o annotations.gff3
gffcompare
Compare transcript annotations against a reference.
gffcompare -r reference.gtf -o compare_out query.gtf
gffutils
Build/query a feature database from GFF/GTF.
# create sqlite DB from GFF3
python -m gffutils.cli create annotations.gff3 --db annotations.db
gxf2bed
Convert GFF/GTF annotations to BED intervals.
gxf2bed annotations.gff3 > annotations.bed
2026-03-10 19:00 (Last updated: 2026-03-11 02:45)