POD5 is Oxford Nanopore’s container format for raw nanopore signal data and read-level run metadata.

TL;DR

  • POD5 stores raw electrical signal traces from ONT sequencing runs.
  • It replaces the HDF5-based FAST5 format in current ONT workflows, with faster I/O and simpler access patterns.
  • A POD5 file typically includes read IDs, signal chunks, timing/scaling info, and run context metadata.
  • Basecalling tools (for example Dorado) use POD5 as direct input.
  • pod5 CLI tools are used to inspect, subset, and convert POD5 datasets.

Structure

POD5 is a binary container format built on Apache Arrow, not line-based text like FASTQ, VCF, or BED, so it cannot be inspected or edited with ordinary text tools.

Conceptually, it stores:

  1. Read records (read_id and per-read metadata)
  2. Raw signal arrays (digitized current samples over time, stored as integers)
  3. Calibration fields (offset and scale, used to convert raw samples to picoamps)
  4. Run/context metadata (flowcell, run identifiers, acquisition details)

Unlike FASTQ (basecalled sequence) or BAM (basecalled and, optionally, aligned reads), POD5 captures the raw signal before basecalling.
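
As a rough mental model only, the four components listed above can be sketched as plain Python data classes. The class and field names below are illustrative assumptions and are not the pod5 library's API; the example values are made up. The conversion shown applies the offset first and then the scale, pA = (raw + offset) * scale.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunContext:
    """Run-level metadata shared by many reads (hypothetical field names)."""
    run_id: str
    flow_cell_id: str
    sample_rate_hz: int


@dataclass
class Calibration:
    """Per-read values used to interpret raw integer samples."""
    offset: float
    scale: float


@dataclass
class ReadRecord:
    """One read: identifier, raw signal, calibration, and run linkage."""
    read_id: str
    raw_signal: List[int]
    calibration: Calibration
    run: RunContext

    def signal_pa(self) -> List[float]:
        # Convert raw samples to picoamps: (raw + offset) * scale
        c = self.calibration
        return [(s + c.offset) * c.scale for s in self.raw_signal]

    def duration_seconds(self) -> float:
        # Number of samples divided by the sampling rate
        return len(self.raw_signal) / self.run.sample_rate_hz


# Made-up example values, for illustration only
run = RunContext(run_id="run-0001", flow_cell_id="FLOWCELL-X", sample_rate_hz=5000)
read = ReadRecord(
    read_id="00000000-0000-0000-0000-000000000001",
    raw_signal=[500, 510, 495, 505],
    calibration=Calibration(offset=10.0, scale=0.5),
    run=run,
)
print(read.signal_pa())
print(read.duration_seconds())
```

The point of the sketch is the linkage: a signal array is only interpretable together with its calibration and run context, which is why those travel with each read.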

Practical Conventions

  • Keep POD5 files immutable once generated to preserve provenance.
  • Track software/basecaller version alongside POD5 datasets.
  • Organize files by run and sample metadata for downstream traceability.
  • Use checksums when moving POD5 across storage systems.
  • Convert/subset with official tooling rather than ad-hoc binary manipulation.
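
The checksum convention above could be implemented along these lines. This is a minimal sketch using only the Python standard library; the function names and the flat one-directory layout are assumptions made for illustration, not part of any ONT tooling.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large POD5 files are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory: Path) -> Dict[str, str]:
    """Map each *.pod5 file name directly under `directory` to its SHA-256 digest."""
    return {p.name: sha256_of(p) for p in sorted(directory.glob("*.pod5"))}


def verify_manifest(directory: Path, manifest: Dict[str, str]) -> Dict[str, bool]:
    """Re-hash files after a transfer and report which ones still match."""
    return {
        name: (directory / name).is_file() and sha256_of(directory / name) == digest
        for name, digest in manifest.items()
    }
```

A typical pattern would be to call build_manifest on the source directory, store the result next to the data, and call verify_manifest on the destination after the copy completes.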

Common Pitfalls

  • Treating POD5 as if it were sequence-level output (it is signal-level data).
  • Losing run metadata linkage when splitting files without consistent naming.
  • Mixing POD5 batches from different chemistries or flowcell types, which expect different basecaller models, without tracking which batch is which.
  • Underestimating storage and I/O requirements for raw signal datasets.
  • Attempting manual parsing without format-aware libraries/tools.
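
As an alternative to manual parsing, the last pitfall can be avoided by reading through a format-aware library. The sketch below assumes the `pod5` Python package is installed (for example with pip install pod5) and uses its Reader interface; the helper name iter_read_summaries and the choice of summary fields are illustrative. The import is placed inside the function only so the definition loads even where the package is absent.

```python
from typing import Dict, Iterator


def iter_read_summaries(pod5_path: str) -> Iterator[Dict[str, object]]:
    """Yield a small per-read summary using the pod5 library's Reader."""
    import pod5  # lazy import; requires the pod5 package to be installed

    with pod5.Reader(pod5_path) as reader:
        for read in reader.reads():
            signal_pa = read.signal_pa  # calibrated signal in picoamps
            yield {
                "read_id": str(read.read_id),
                "num_samples": read.num_samples,
                "sample_rate_hz": read.run_info.sample_rate,
                "mean_pa": float(signal_pa.mean()) if len(signal_pa) else None,
            }
```

Iterating over iter_read_summaries("reads.pod5") would then yield one dictionary per read, without any knowledge of the on-disk layout.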

Common Uses

  • Input to ONT basecalling workflows
  • Modified-base and signal-level analyses
  • Archival of raw nanopore run data

Useful POD5 Tools

pod5 inspect

Inspect summary metadata for POD5 files.

pod5 inspect summary reads.pod5

pod5 view

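Write a table of per-read fields from POD5 files. Use --include to choose the fields. Note that --ids here is a flag that restricts output to the read_id column; it does not take a file of read IDs.

pod5 view reads.pod5 --include "read_id, channel, num_samples"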

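pod5 filter

Create a smaller POD5 from a list of read IDs, one per line. (pod5 subset, by contrast, splits reads into multiple output files according to a mapping table; see pod5 subset --help.)

pod5 filter reads.pod5 --ids read_ids.txt --output subset.pod5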
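
A read ID list such as read_ids.txt is easy to get subtly wrong (duplicates, stray whitespace, truncated IDs). The following is a minimal sketch for producing a clean one, assuming the list is plain text with one UUID-style read ID per line; the function names are hypothetical and nothing here calls the pod5 tools.

```python
import uuid
from pathlib import Path
from typing import Iterable, List


def normalise_read_ids(candidates: Iterable[str]) -> List[str]:
    """Return unique, well-formed read IDs in first-seen order.

    Blank entries are skipped; anything that is not a UUID raises ValueError.
    """
    seen = set()
    result = []
    for raw in candidates:
        text = raw.strip()
        if not text:
            continue
        canonical = str(uuid.UUID(text))  # raises ValueError if malformed
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result


def write_read_id_list(read_ids: Iterable[str], path: Path) -> int:
    """Write one read ID per line and return how many were written."""
    ids = normalise_read_ids(read_ids)
    path.write_text("".join(f"{rid}\n" for rid in ids), encoding="utf-8")
    return len(ids)
```

Validating up front means a malformed ID fails loudly when the list is built, rather than silently matching nothing later.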

dorado

Use POD5 directly as basecalling input.

dorado basecaller hac reads.pod5 > basecalls.bam