GFF3 and GTF are tab-delimited annotation formats used to describe genomic features such as genes, transcripts, and exons.

TL;DR

  • Both formats use 1-based, closed (end-inclusive) genomic coordinates.
  • Core records are 9 columns: seqid, source, type, start, end, score, strand, phase, attributes.
  • GFF3 uses key=value attributes and explicit parent/child IDs (ID, Parent).
  • GTF uses space-separated key "value"; attribute pairs; gene_id and transcript_id are the keys most tools rely on.
  • Prefer GFF3 for hierarchical feature models and standards compliance; GTF is common in RNA-seq pipelines.

Structure

Each non-comment line has 9 tab-delimited fields:

  1. seqid (chromosome/contig)
  2. source
  3. feature type (for example gene, transcript, exon, CDS)
  4. start (1-based)
  5. end (1-based, inclusive)
  6. score (. if missing)
  7. strand (+, -, or . for unstranded; GFF3 also allows ? for unknown)
  8. phase (0, 1, or 2 for CDS features, otherwise .; called frame in GTF)
  9. attributes
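
The column layout above can be sketched as a small parser. This is a minimal illustration, not a validator: the names Record and parse_line are hypothetical, and the attributes column is left as a raw string because its syntax differs between GFF3 and GTF.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    """One annotation line split into its 9 columns (hypothetical helper type)."""
    seqid: str
    source: str
    type: str
    start: int               # 1-based
    end: int                 # 1-based, inclusive
    score: Optional[float]   # None when the column is "."
    strand: str
    phase: Optional[int]     # None when the column is "."
    attributes: str          # left unparsed on purpose


def parse_line(line: str) -> Optional[Record]:
    """Return a Record, or None for blank lines and '#' comments/directives."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-delimited fields, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    return Record(
        seqid, source, ftype, int(start), int(end),
        None if score == "." else float(score),
        strand,
        None if phase == "." else int(phase),
        attrs,
    )


rec = parse_line("chr1\tRefSeq\tgene\t11869\t14409\t.\t+\t.\tID=gene1;Name=DDX11L1")
print(rec.type, rec.start, rec.end, rec.strand)
```

Splitting on tabs only (rather than on any whitespace) matters because attribute values may contain spaces.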

GFF3 example:

##gff-version 3
chr1	RefSeq	gene	11869	14409	.	+	.	ID=gene1;Name=DDX11L1
chr1	RefSeq	mRNA	11869	14409	.	+	.	ID=tx1;Parent=gene1
chr1	RefSeq	exon	11869	12227	.	+	.	ID=exon1;Parent=tx1

GTF example:

chr1	ENSEMBL	gene	11869	14409	.	+	.	gene_id "GENE1"; gene_name "DDX11L1";
chr1	ENSEMBL	transcript	11869	14409	.	+	.	gene_id "GENE1"; transcript_id "TX1";
chr1	ENSEMBL	exon	11869	12227	.	+	.	gene_id "GENE1"; transcript_id "TX1"; exon_number "1";
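
The two attribute syntaxes shown in the examples need different parsers. The sketch below is one possible approach, with hypothetical function names; it assumes GFF3 values are percent-encoded and comma-separated when multiple, and that GTF values are either quoted or bare tokens. Real files may contain dialect quirks that it does not handle.

```python
import re
from urllib.parse import unquote


def parse_gff3_attributes(text: str) -> dict:
    """Parse GFF3 attributes such as 'ID=tx1;Parent=gene1' into {key: [values]}."""
    attrs = {}
    for part in text.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        # Multiple values are comma-separated; each value is percent-decoded.
        attrs[key] = [unquote(v) for v in value.split(",")]
    return attrs


# key, whitespace, then a quoted or bare value, ended by ';' or end of string
_GTF_PAIR = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^\s;]+))\s*(?:;|$)')


def parse_gtf_attributes(text: str) -> dict:
    """Parse GTF attributes such as 'gene_id "G1"; transcript_id "T1";'."""
    attrs = {}
    for match in _GTF_PAIR.finditer(text):
        key = match.group(1)
        value = match.group(2) if match.group(2) is not None else match.group(3)
        # A key may repeat in GTF, so values are collected in a list.
        attrs.setdefault(key, []).append(value)
    return attrs


print(parse_gff3_attributes("ID=tx1;Parent=gene1"))
print(parse_gtf_attributes('gene_id "GENE1"; transcript_id "TX1"; exon_number "1";'))
```

Both functions return lists of values so that multi-parent GFF3 features and repeated GTF keys are represented the same way.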

Coordinate Model (Important)

Both GFF3 and GTF use 1-based, end-inclusive coordinates: the start and end positions are both part of the feature.

  • Interval chr1 11869 12227 has length 12227 - 11869 + 1 = 359 bp.
  • Converting to BED (0-based, half-open) shifts only the start: BED_start = start - 1, BED_end = end.
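
The arithmetic above, written out as a short sketch. The function names are illustrative only; the numbers come from the exon in the examples.

```python
def feature_length(start: int, end: int) -> int:
    """Length of a 1-based, end-inclusive interval."""
    return end - start + 1


def gff_to_bed_interval(start: int, end: int) -> tuple:
    """Convert 1-based inclusive (GFF3/GTF) to 0-based half-open (BED)."""
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based interval: {start}-{end}")
    return start - 1, end


bed_start, bed_end = gff_to_bed_interval(11869, 12227)
print(bed_start, bed_end)             # 11868 12227
print(feature_length(11869, 12227))   # 359
print(bed_end - bed_start)            # 359, the length is unchanged
```

In BED the length is simply end minus start, which is a quick sanity check after any conversion.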

Practical Conventions

  • Keep feature hierarchy consistent (gene -> transcript -> exon/CDS).
  • Use stable feature IDs (especially in GFF3 ID and Parent).
  • Keep chromosome naming consistent across FASTA/BAM/VCF/BED.
  • Separate the 9 columns with tabs only; in GTF, spaces within an attribute value must stay inside the quotes.
  • Sort by chromosome and start when possible for reproducible processing.
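
One way to implement the sorting convention is sketched below. It is an assumption-laden illustration: natural_key and sort_annotation_lines are hypothetical names, all '#' lines are moved to the top (so files that rely on ### directives or an embedded ##FASTA section need different handling), and the sort is stable, so features sharing a start keep their original relative order.

```python
import re


def natural_key(seqid: str) -> list:
    """Split 'chr10' into ['chr', 10, ''] so that chr2 sorts before chr10."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", seqid)]


def sort_annotation_lines(lines):
    """Return header lines first, then features sorted by seqid and start."""
    header = [ln for ln in lines if ln.startswith("#")]
    body = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def key(line):
        fields = line.split("\t")
        return (natural_key(fields[0]), int(fields[3]))

    return header + sorted(body, key=key)


lines = [
    "##gff-version 3",
    "chr10\tsrc\tgene\t5\t90\t.\t+\t.\tID=g3",
    "chr2\tsrc\tgene\t100\t200\t.\t+\t.\tID=g2",
    "chr2\tsrc\tgene\t50\t80\t.\t-\t.\tID=g1",
]
for line in sort_annotation_lines(lines):
    print(line)
```

A plain lexical sort would place chr10 before chr2; whichever order is chosen, it should match the order used by the other files in the pipeline.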

Common Pitfalls

  • Mixing 1-based GFF3/GTF coordinates with 0-based BED coordinates.
  • Broken parent-child relationships (missing Parent/ID, inconsistent transcript IDs).
  • Invalid or inconsistent attributes that downstream parsers cannot interpret.
  • Treating GFF3 and GTF attribute syntax as interchangeable.
  • Incorrect CDS phase values causing translation/frame issues.
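
Broken parent-child relationships in GFF3 can be detected with a simple two-step check: collect every ID, then verify that each Parent value refers to one of them. The sketch below uses a hypothetical helper name and a deliberately naive attribute split; it ignores percent-encoding and does not apply to GTF, which has no ID/Parent attributes.

```python
def find_orphan_parents(gff3_lines):
    """Return (child_id, missing_parent_id) pairs for unresolved Parent references."""
    ids = set()
    parent_refs = []  # (child ID or '?', referenced parent ID)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        attrs_field = line.rstrip("\n").split("\t")[8]
        attrs = dict(
            part.split("=", 1) for part in attrs_field.split(";") if "=" in part
        )
        if "ID" in attrs:
            ids.add(attrs["ID"])
        # A feature may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                parent_refs.append((attrs.get("ID", "?"), parent))
    # Checked after the full pass so that line order does not matter.
    return [(child, parent) for child, parent in parent_refs if parent not in ids]


example = [
    "##gff-version 3",
    "chr1\tRefSeq\tgene\t11869\t14409\t.\t+\t.\tID=gene1;Name=DDX11L1",
    "chr1\tRefSeq\tmRNA\t11869\t14409\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\tRefSeq\texon\t11869\t12227\t.\t+\t.\tID=exon1;Parent=tx2",
]
print(find_orphan_parents(example))  # [('exon1', 'tx2')]
```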

Common Uses

  • Gene and transcript annotation
  • Feature counting and RNA-seq quantification input
  • Region extraction and annotation conversion (for example to BED)

Useful Annotation Tools

gffread

Validate, filter, and convert GFF/GTF annotations.

# parse GFF3 and export as GTF (-T)
gffread annotations.gff3 -T -o annotations.gtf

# convert GTF to GFF3
gffread annotations.gtf -o annotations.gff3

gffcompare

Compare transcript annotations against a reference.

gffcompare -r reference.gtf -o compare_out query.gtf

gffutils

Build/query a feature database from GFF/GTF.

# create a SQLite DB from GFF3 using the gffutils Python API
python -c "import gffutils; gffutils.create_db('annotations.gff3', dbfn='annotations.db')"

gxf2bed

Convert GFF/GTF annotations to BED intervals.

gxf2bed --input annotations.gff3 --output annotations.bed