bwbioinfo · GitHub

Last updated: 2026-03-13 12:43


FASTQ File Format

FASTQ is a plain-text format that stores sequencing reads together with their per-base quality scores.

TL;DR

FASTQ uses 4 lines per read: header, sequence, separator, quality.
Sequence and quality strings must be the same length.
Most modern pipelines assume Sanger/Illumina 1.8+ encoding (Phred+33).
FASTQ is usually compressed as .fastq.gz.
fastqc, fastp, seqkit, and seqtk are common day-to-day tools.

Structure

Each read is represented by 4 lines:

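Header line: starts with @, followed by the read ID
Sequence line: the base calls
Separator line: starts with +
Quality line: one character per base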

@read1
ACGTACGTACGT
+
IIIIIIIIIIII

A typical FASTQ file contains millions of these 4-line records, one after another.
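
As a minimal sketch of how this layout can be read programmatically, the Python snippet below walks a stream four lines at a time. The names FastqRecord, read_fastq, and open_fastq are illustrative and not part of any library, and the reader assumes strict 4-line records with no wrapped sequence lines.

```python
import gzip
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str    # header line without the leading "@"
    sequence: str  # base calls
    quality: str   # one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a strict 4-line-per-read FASTQ text stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        if not header.strip():
            continue  # tolerate blank lines between records or at the end
        sequence = handle.readline()
        separator = handle.readline()
        quality = handle.readline()
        if not quality:
            raise ValueError("truncated record at end of file")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record near {header.strip()!r}")
        yield FastqRecord(
            header[1:].rstrip("\r\n"),
            sequence.rstrip("\r\n"),
            quality.rstrip("\r\n"),
        )


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file as text, based on its extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


# The example record from above, parsed from an in-memory stream.
example = io.StringIO("@read1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n")
records = list(read_fastq(example))
print(records[0].header, len(records[0].sequence))  # prints: read1 12
```

For real files, something like `for rec in read_fastq(open_fastq("sample.fastq.gz"))` would stream records without loading the whole file into memory.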

Practical Conventions

Header lines begin with @ and may include instrument/run metadata.
The + line may repeat the read ID or contain only +.
Quality characters encode Phred scores (commonly Phred+33).
Files are often gzip-compressed (.fastq.gz) to reduce storage.
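
To make the Phred+33 convention concrete, here is a small Python sketch that decodes a quality string. The function names are illustrative; the arithmetic follows the usual definitions, where the score is the character's ASCII code minus the offset and the error probability is 10^(-Q/10).

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert a quality string to integer Phred scores (default offset: Phred+33)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


scores = phred_scores("IIIIIIIIIIII")
print(scores[0])  # 'I' is ASCII 73, so 73 - 33 = 40

# Q40 corresponds to an error probability of about 1 in 10,000.
print(error_probability(scores[0]))
```

Passing offset=64 would decode the older Phred+64 encoding instead.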

Common Pitfalls

Sequence and quality lengths not matching (invalid record).
Mixing quality encodings (older Phred+64 vs modern Phred+33).
Truncated files from interrupted transfers/downloads.
Paired-end files getting out of sync (R1 and R2 order mismatch).
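
The checks below are a rough Python sketch of how these pitfalls could be detected, not a replacement for a dedicated validator. All function names are hypothetical. The encoding check relies on one assumption: quality characters below ASCII 59 cannot occur in the older +64 encodings, so their presence indicates Phred+33, while their absence is treated as ambiguous.

```python
import io
from itertools import zip_longest
from typing import List, Optional, TextIO, Tuple


def check_fastq(handle: TextIO) -> Tuple[List[str], Optional[str]]:
    """Return (problems, lowest quality character seen) for a 4-line FASTQ stream."""
    problems: List[str] = []
    lowest: Optional[str] = None
    record: List[str] = []
    n = 0
    for line in handle:
        record.append(line.rstrip("\r\n"))
        if len(record) < 4:
            continue
        n += 1
        header, seq, sep, qual = record
        record = []
        if not header.startswith("@") or not sep.startswith("+"):
            problems.append(f"record {n}: bad header or separator line")
        if len(seq) != len(qual):
            problems.append(
                f"record {n}: sequence length {len(seq)} != quality length {len(qual)}"
            )
        if qual:
            low = min(qual)
            lowest = low if lowest is None else min(lowest, low)
    if record:
        # Leftover lines mean the file stopped in the middle of a record.
        problems.append(f"file ends mid-record after {n} complete records (truncated?)")
    return problems, lowest


def guess_offset(lowest_char: Optional[str]) -> Optional[int]:
    """Return 33 if the data must be Phred+33, else None (ambiguous)."""
    if lowest_char is not None and ord(lowest_char) < 59:
        return 33
    return None


def read_id(header: str) -> str:
    """Strip '@', any comment after whitespace, and a trailing /1 or /2."""
    parts = header[1:].split()
    name = parts[0] if parts else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def first_pair_mismatch(r1: TextIO, r2: TextIO) -> Optional[int]:
    """Return the 1-based index of the first out-of-sync pair, or None if in sync."""
    headers1 = (line for i, line in enumerate(r1) if i % 4 == 0)
    headers2 = (line for i, line in enumerate(r2) if i % 4 == 0)
    for n, (h1, h2) in enumerate(zip_longest(headers1, headers2), start=1):
        if h1 is None or h2 is None or read_id(h1) != read_id(h2):
            return n
    return None


problems, lowest = check_fastq(io.StringIO("@read1\nACGT\n+\nIII\n@read2\nAC\n"))
for problem in problems:
    print(problem)
```

The sample input deliberately contains a length mismatch and a cut-off second record, so both kinds of problem are reported.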

Common Uses

Raw sequencing output
Read-level QC
Input for alignment and assembly pipelines

Useful FASTQ Tools

fastqc

Standard read-level quality control report.

fastqc sample_R1.fastq.gz sample_R2.fastq.gz

fastp

Fast all-in-one filtering/trimming with QC outputs.

fastp \
  -i sample_R1.fastq.gz -I sample_R2.fastq.gz \
  -o sample_R1.clean.fastq.gz -O sample_R2.clean.fastq.gz \
  -h fastp.html -j fastp.json

seqkit

Convenient FASTA/FASTQ stats and filtering.

# summary stats
seqkit stats sample.fastq.gz

# keep reads of at least 75 bases (output is written uncompressed)
seqkit seq -m 75 sample.fastq.gz > sample.min75.fastq

seqtk

Lightweight toolkit for sampling and format conversion.

# subsample 100000 reads reproducibly (-s sets the seed; use the same seed for R1 and R2)
seqtk sample -s42 sample.fastq.gz 100000 > subset.fastq
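
To illustrate why a fixed seed makes subsampling reproducible, here is a minimal Python sketch of reservoir sampling over 4-line records. This is not seqtk's code, only one common way to draw a fixed-size random sample in a single pass; the function name reservoir_sample is hypothetical.

```python
import io
import random
from typing import List, TextIO, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, separator, quality lines


def reservoir_sample(handle: TextIO, k: int, seed: int) -> List[Record]:
    """Pick k records uniformly at random in one pass (Algorithm R)."""
    rng = random.Random(seed)  # private generator, so the seed fully fixes the result
    reservoir: List[Record] = []
    # zip over four references to the same iterator groups lines into records;
    # an incomplete trailing record is silently dropped.
    for n, record in enumerate(zip(*[iter(handle)] * 4)):
        if n < k:
            reservoir.append(record)
        else:
            j = rng.randint(0, n)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


# Ten tiny records; draw three of them with a fixed seed.
text = "".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(10))
subset = reservoir_sample(io.StringIO(text), k=3, seed=42)
print("".join("".join(rec) for rec in subset), end="")
```

Because the random choices depend only on the seed and the record positions, running the same function with the same seed on R1 and R2 files of equal length selects the same positions in both, which keeps pairs together. Note that the sampled records are not returned in file order.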


2026-03-05 19:00 (Last updated: 2026-03-13 12:43)
