GFF3 and GTF File Formats

GFF3 and GTF are tab-delimited annotation formats used to describe genomic features such as genes, transcripts, and exons.

TL;DR

Both formats use 1-based, closed genomic coordinates.
Core records are 9 columns: seqid, source, type, start, end, score, strand, phase, attributes.
GFF3 uses key=value attributes and explicit parent/child IDs (ID, Parent).
GTF writes attributes as key "value"; pairs and relies on gene_id and transcript_id to group features.
Prefer GFF3 for hierarchical feature models and standards compliance; GTF is common in RNA-seq pipelines.

Structure

Each non-comment line has 9 tab-delimited fields:

seqid (chromosome/contig)
source
feature type (for example gene, transcript, exon, CDS)
start (1-based)
end (1-based, inclusive)
score (. if missing)
strand (+, -, or .)
phase (0, 1, 2 for CDS, else .)
attributes
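
The nine-field layout above can be read with a few lines of Python. This is a minimal sketch, not a full parser: the names Record and parse_line are hypothetical, and it assumes each record is a single tab-delimited line with "." marking a missing score or phase.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    """One GFF3/GTF record; field names follow the column list above."""
    seqid: str
    source: str
    type: str
    start: int               # 1-based
    end: int                 # 1-based, inclusive
    score: Optional[float]   # None when the column is "."
    strand: str
    phase: Optional[int]     # None when the column is "."
    attributes: str          # left unparsed; syntax differs between GFF3 and GTF


def parse_line(line: str) -> Optional[Record]:
    """Return a Record, or None for blank lines and '#' comments/directives."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-delimited fields, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    return Record(
        seqid, source, ftype, int(start), int(end),
        None if score == "." else float(score),
        strand,
        None if phase == "." else int(phase),
        attrs,
    )
```

The attributes column is kept as a raw string on purpose, since it is the one column whose syntax depends on the format.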

GFF3 example:

##gff-version 3
chr1	RefSeq	gene	11869	14409	.	+	.	ID=gene1;Name=DDX11L1
chr1	RefSeq	mRNA	11869	14409	.	+	.	ID=tx1;Parent=gene1
chr1	RefSeq	exon	11869	12227	.	+	.	ID=exon1;Parent=tx1

GTF example:

chr1	ENSEMBL	gene	11869	14409	.	+	.	gene_id "GENE1"; gene_name "DDX11L1";
chr1	ENSEMBL	transcript	11869	14409	.	+	.	gene_id "GENE1"; transcript_id "TX1";
chr1	ENSEMBL	exon	11869	12227	.	+	.	gene_id "GENE1"; transcript_id "TX1"; exon_number "1";
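
The two attribute syntaxes shown in the examples need different parsers. The sketch below uses hypothetical helper names (parse_gff3_attributes, parse_gtf_attributes) and makes simplifying assumptions: it does not decode percent-escaped characters or split comma-separated GFF3 values, and it only picks up GTF values that are double-quoted.

```python
import re


def parse_gff3_attributes(text: str) -> dict:
    """Parse GFF3 'key=value;key=value' attributes into a dict."""
    attrs = {}
    for item in text.strip().split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        key, _, value = item.partition("=")
        attrs[key] = value
    return attrs


# A key, whitespace, then a double-quoted value (assumption: values are quoted).
_GTF_PAIR = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_attributes(text: str) -> dict:
    """Parse GTF 'key "value"; key "value";' attributes into a dict."""
    return dict(_GTF_PAIR.findall(text))
```

Feeding a GTF attribute string to the GFF3 helper (or the reverse) produces incomplete or nonsensical keys, which is why the two syntaxes should not be treated as interchangeable.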

Coordinate model (important)

GFF3/GTF are 1-based and end-inclusive.

Interval chr1 11869 12227 has length 12227 - 11869 + 1 = 359 bp.
Converting to BED requires a coordinate shift because BED is 0-based and half-open: BED_start = start - 1, BED_end = end.
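
As a small illustration of the shift, the following sketch converts an interval in both directions. The function names are hypothetical, and the only assumption is that inputs are well-formed integer coordinates.

```python
from typing import Tuple


def gff_to_bed_interval(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based, end-inclusive interval to 0-based, half-open BED."""
    if start < 1 or end < start:
        raise ValueError(f"invalid GFF/GTF interval: {start}-{end}")
    return start - 1, end


def bed_to_gff_interval(bed_start: int, bed_end: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open BED interval back to 1-based, inclusive."""
    if bed_start < 0 or bed_end <= bed_start:
        raise ValueError(f"invalid BED interval: {bed_start}-{bed_end}")
    return bed_start + 1, bed_end


bed_start, bed_end = gff_to_bed_interval(11869, 12227)
# In BED the length is simply end - start; it matches the 359 bp computed above.
length = bed_end - bed_start
```

Only the start changes. The end keeps its numeric value because the inclusive end of GFF3/GTF and the exclusive end of BED refer to the same boundary.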

Practical Conventions

Keep feature hierarchy consistent (gene -> transcript -> exon/CDS).
Use stable feature IDs (especially in GFF3 ID and Parent).
Keep chromosome naming consistent across FASTA/BAM/VCF/BED.
Separate the nine columns with tabs only, never spaces; spaces may appear inside the attributes column, for example within quoted GTF values.
Sort by chromosome and start when possible for reproducible processing.
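
One way to check hierarchy consistency in a GFF3 file is to confirm that every Parent value refers to an ID defined somewhere in the file. The sketch below is a hypothetical helper, not part of any tool listed here. It assumes plain key=value attributes, treats a comma-separated Parent as several parents, and silently skips lines that do not have nine fields.

```python
def find_orphan_parents(lines):
    """Return the set of Parent values that never appear as an ID."""
    defined, referenced = set(), set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # malformed record; a stricter checker would report it
        for item in fields[8].split(";"):
            key, _, value = item.strip().partition("=")
            if key == "ID":
                defined.add(value)
            elif key == "Parent":
                referenced.update(value.split(","))
    return referenced - defined
```

Collecting all IDs before comparing means the check does not depend on parents appearing before their children in the file.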

Common Pitfalls

Mixing 1-based GFF3/GTF coordinates with 0-based BED coordinates.
Broken parent-child relationships (missing Parent/ID, inconsistent transcript IDs).
Invalid or inconsistent attributes that downstream parsers cannot interpret.
Treating GFF3 and GTF attribute syntax as interchangeable.
Incorrect CDS phase values causing translation/frame issues.
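
For the phase pitfall, the expected values can be derived from segment lengths. Phase is the number of bases to skip at the 5' end of a CDS segment to reach the first base of the next complete codon. The sketch below uses a hypothetical function name and assumes a complete CDS whose first segment begins at a codon boundary (phase 0); partial CDS models need a known starting phase instead.

```python
def cds_phases(segments, strand):
    """Map each (start, end) CDS segment of one transcript to its phase.

    segments: 1-based, end-inclusive (start, end) tuples.
    strand: "+" or "-"; minus-strand segments are walked from the highest
    coordinate down, which is 5' to 3' along the transcript.
    """
    ordered = sorted(segments, reverse=(strand == "-"))
    phases = {}
    consumed = 0  # coding bases contributed by earlier segments
    for start, end in ordered:
        phases[(start, end)] = (3 - consumed % 3) % 3
        consumed += end - start + 1
    return phases
```

For example, a first segment of 4 bases leaves one base of an unfinished codon, so the next segment must skip 2 bases and gets phase 2.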

Common Uses

Gene and transcript annotation
Feature counting and RNA-seq quantification input
Region extraction and annotation conversion (for example to BED)

Useful Annotation Tools

gffread

Validate, filter, and convert GFF/GTF annotations.

# convert GFF3 to GTF (-T selects GTF output); parse problems surface during the run
gffread annotations.gff3 -T -o annotations.gtf

# convert GTF to GFF3 (GFF3 is the default output format)
gffread annotations.gtf -o annotations.gff3

gffcompare

Compare transcript annotations against a reference.

gffcompare -r reference.gtf -o compare_out query.gtf

gffutils

Build/query a feature database from GFF/GTF.

# create a SQLite DB from GFF3 using the gffutils Python API
python -c "import gffutils; gffutils.create_db('annotations.gff3', 'annotations.db')"

gxf2bed

Convert GFF/GTF annotations to BED intervals.

gxf2bed --input annotations.gff3 --output annotations.bed