POD5 File Format
POD5 is Oxford Nanopore’s container format for raw nanopore signal data and read-level run metadata.
TL;DR
- POD5 stores raw electrical signal traces from ONT sequencing runs.
- It replaces FAST5 in many modern ONT workflows with better performance and simpler access patterns.
- A POD5 file typically includes read IDs, signal chunks, timing/scaling info, and run context metadata.
- Basecalling tools (for example Dorado) use POD5 as direct input.
- pod5 CLI tools are used to inspect, subset, and convert POD5 datasets.
Structure
POD5 is a binary container format built on Apache Arrow (not line-based text like FASTQ/VCF/BED), so it is read through format-aware libraries rather than text tools.
Conceptually, it stores:
- Read records (read_id and per-read metadata)
- Raw signal arrays (current levels across time)
- Calibration/scaling fields (for signal interpretation)
- Run/context metadata (flowcell, run identifiers, acquisition details)
Unlike FASTQ (basecalled sequence) or BAM (aligned reads), POD5 captures pre-basecalling raw signal data.
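The calibration/scaling fields listed above are what turn stored integer samples into physical units. The following is a minimal sketch, assuming the commonly documented convention that picoamps = (raw + offset) * scale. The function name `adc_to_picoamps` and the sample values are illustrative, not taken from a real run or from the pod5 library.

```python
from typing import Iterable, List


def adc_to_picoamps(raw: Iterable[int], offset: float, scale: float) -> List[float]:
    """Convert raw integer samples to picoamps.

    Assumes the convention pA = (raw + offset) * scale; confirm this
    against the calibration documentation for your files.
    """
    return [(sample + offset) * scale for sample in raw]


# Illustrative values only (not from a real run).
raw_signal = [400, 410, 395]
current_pa = adc_to_picoamps(raw_signal, offset=10.0, scale=0.5)
print(current_pa)
```

If the pod5 Python package is available, calibrated signal can usually be read from each record directly, so the function above is mainly a way to see what the offset and scale fields mean.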
Practical Conventions
- Keep POD5 files immutable once generated to preserve provenance.
- Track software/basecaller version alongside POD5 datasets.
- Organize files by run and sample metadata for downstream traceability.
- Use checksums when moving POD5 across storage systems.
- Convert/subset with official tooling rather than ad-hoc binary manipulation.
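The checksum convention above can be sketched with the standard library alone. This is a minimal example, assuming SHA-256 is an acceptable digest for your storage policy; the helper names `sha256_of_file` and `verify_copy` are hypothetical and not part of any POD5 tooling.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source: Path, destination: Path) -> bool:
    """Return True when both files have identical SHA-256 digests."""
    return sha256_of_file(source) == sha256_of_file(destination)
```

Recording the digest next to each file at generation time (for example in a sidecar manifest) lets later copies be checked without access to the original storage system.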
Common Pitfalls
- Treating POD5 as if it were sequence-level output (it is signal-level data).
- Losing run metadata linkage when splitting files without consistent naming.
- Mixing POD5 batches from different chemistries or flow cell types without tracking which basecalling model each batch expects.
- Underestimating storage and I/O requirements for raw signal datasets.
- Attempting manual parsing without format-aware libraries/tools.
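To make the storage pitfall concrete, here is a rough back-of-the-envelope estimate of uncompressed signal volume. It is a sketch under stated assumptions: 16-bit samples, and example figures (512 channels, a 5 kHz sample rate, 70% channel occupancy) that are placeholders to replace with your own run parameters. It ignores compression and metadata, so real POD5 files will differ in size.

```python
def estimate_raw_signal_bytes(
    active_channels: int,
    run_hours: float,
    sample_rate_hz: int,
    bytes_per_sample: int = 2,   # assumption: 16-bit integer samples
    occupancy: float = 1.0,      # assumption: fraction of time channels record reads
) -> float:
    """Estimate uncompressed raw-signal size in bytes for one run."""
    seconds = run_hours * 3600
    return active_channels * occupancy * seconds * sample_rate_hz * bytes_per_sample


# Placeholder run parameters; substitute values from your own run.
estimate = estimate_raw_signal_bytes(
    active_channels=512, run_hours=24, sample_rate_hz=5000, occupancy=0.7
)
print(f"{estimate / 1e9:.1f} GB before compression")
```

Even as an upper-bound style estimate, this kind of calculation is useful for sizing scratch space and transfer windows before a run starts.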
Common Uses
- Input to ONT basecalling workflows
- Modified-base and signal-level analyses
- Archival of raw nanopore run data
Useful POD5 Tools
pod5 inspect
Print summary metadata for a POD5 file. The command expects a sub-command (such as summary, reads, or read) before the file path.
pod5 inspect summary reads.pod5
pod5 view
Write a table of per-read fields from POD5 files. Use --include to limit which columns are written.
pod5 view reads.pod5 --include "read_id, channel, num_samples"
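pod5 filter
Create a smaller POD5 containing only the reads listed in a read-ID file. (The related pod5 subset command splits reads into several output files according to a mapping table.)
pod5 filter reads.pod5 --ids read_ids.txt --output subset.pod5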
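The read-ID list passed with --ids above is easy to get subtly wrong (stray header lines, duplicates, mixed case). Below is a minimal sketch for preparing such a file, assuming read IDs are UUID strings and that the file format is one ID per line. The helper names `clean_read_ids` and `write_read_id_file` are hypothetical, and the IDs in the usage lines are dummy values.

```python
import uuid
from pathlib import Path
from typing import Iterable, List


def clean_read_ids(candidates: Iterable[str]) -> List[str]:
    """Keep well-formed UUID strings, normalised and de-duplicated in input order."""
    seen = set()
    cleaned = []
    for candidate in candidates:
        text = candidate.strip()
        if not text:
            continue  # skip blank lines
        try:
            normalised = str(uuid.UUID(text))  # lower-case, hyphenated form
        except ValueError:
            continue  # skip anything that is not a UUID, e.g. a header line
        if normalised not in seen:
            seen.add(normalised)
            cleaned.append(normalised)
    return cleaned


def write_read_id_file(read_ids: Iterable[str], path: Path) -> int:
    """Write one cleaned read ID per line and return how many were written."""
    ids = clean_read_ids(read_ids)
    path.write_text("".join(f"{read_id}\n" for read_id in ids))
    return len(ids)


# Dummy IDs for illustration only.
example = ["read_id", "00000000-0000-4000-8000-000000000001",
           " 00000000-0000-4000-8000-000000000001 "]
print(clean_read_ids(example))
```

Validating the list before handing it to the CLI keeps selection errors from surfacing later as silently smaller outputs.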
dorado
Use POD5 directly as basecalling input.
dorado basecaller hac reads.pod5 > basecalls.bam
2026-03-10 19:00 (Last updated: 2026-03-11 02:45)