kraken2
Tags: metagenomics taxonomy classification contamination scrubbing k-mer lca sample-scope
Taxonomic classification and host filtering of sequence reads.
Uses Kraken2 to assign taxonomic labels to short DNA reads by matching exact k-mers against a large reference database. Each k-mer maps to the lowest common ancestor (LCA) of the genomes that contain it, which gives high-precision classification and makes the module well suited to metagenomics and to removing host contamination (scrubbing).
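The LCA idea mentioned above can be illustrated with a small sketch. It uses a made-up toy taxonomy (`PARENT`) and hypothetical helper names (`lineage`, `lca`, `kmer_label`). It is not how Kraken2 is implemented internally, since Kraken2 works on a compact hash table of minimizers and NCBI taxonomy IDs, but it shows why a k-mer shared by two species ends up labelled at their common ancestor.

```python
# Toy taxonomy (child -> parent). Names are illustrative, not real NCBI taxids.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Enterobacteriaceae": "Bacteria",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella": "Enterobacteriaceae",
    "E. coli": "Escherichia",
    "S. enterica": "Salmonella",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, starting at the taxon."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa in the toy tree."""
    ancestors_of_a = set(lineage(a))
    # Walk upward from b; the first node also above a is the LCA.
    for taxon in lineage(b):
        if taxon in ancestors_of_a:
            return taxon
    raise ValueError("taxa do not share a root")


def kmer_label(taxa_with_kmer):
    """Label a k-mer with the LCA of every taxon whose genome contains it."""
    taxa = list(taxa_with_kmer)
    label = taxa[0]
    for taxon in taxa[1:]:
        label = lca(label, taxon)
    return label
```

A k-mer found only in `E. coli` keeps the species label, while a k-mer shared by `E. coli` and `S. enterica` is pushed up to `Enterobacteriaceae`. This is why reads from conserved regions tend to be classified at higher ranks.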
Uses explicit positional record fields for reads:
- Input: record(meta, r1, r2, se, lr) where each read slot is Path?
Requires a Kraken2-formatted database (directory or tarball). Memory usage scales with database size; the Standard database needs roughly 50 GB.
Inputs
record (
meta: Record,
r1: Path?,
r2: Path?,
se: Path?,
lr: Path?
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Groovy Record containing sample information |
| r1 | Path? | Illumina R1 reads (paired-end) |
| r2 | Path? | Illumina R2 reads (paired-end) |
| se | Path? | Single-end Illumina reads |
| lr | Path? | Long reads (ONT/PacBio) - not typically used by Kraken2 |
db: Path
| Name | Type | Description |
|---|---|---|
| db | Path | Kraken2 database (Directory or compressed tarball) |
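Because `db` may be either a directory or a tarball, a quick sanity check before launching a run can save time. The sketch below is an assumption-laden helper (`check_kraken2_db` is a hypothetical name and not part of this module). It relies on the fact that a built Kraken2 database contains `hash.k2d`, `opts.k2d`, and `taxo.k2d`, and reports which of those are missing.

```python
import tarfile
from pathlib import Path

# Core files of a built Kraken2 database.
REQUIRED_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def check_kraken2_db(db):
    """Return (kind, missing_files) for a database directory or tarball.

    Hypothetical helper for illustration only; the module's own handling
    of the `db` input is not shown in this document.
    """
    db = Path(db)
    if db.is_dir():
        present = {p.name for p in db.iterdir() if p.is_file()}
        kind = "directory"
    elif db.is_file() and tarfile.is_tarfile(str(db)):
        # Mode "r" lets tarfile detect gzip/bzip2/xz compression itself.
        with tarfile.open(str(db), "r") as tar:
            present = {Path(m.name).name for m in tar.getmembers() if m.isfile()}
        kind = "tarball"
    else:
        raise ValueError(f"{db} is neither a directory nor a tarball")
    missing = [name for name in REQUIRED_FILES if name not in present]
    return kind, missing
```

An empty `missing` list suggests the database is structurally complete. It does not verify that the files themselves are intact.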
Outputs
record (
meta: Record,
special_meta: Record,
kraken2_report: Path,
scrub_report: Path?,
classified: Set<Path?>,
unclassified: Set<Path?>,
results: Set<Path>,
logs: Set<Path?>,
nf_logs: Set<Path>,
versions: Set<Path>
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Sample information record |
| special_meta | Record | A simplified metadata record for internal use |
| kraken2_report | Path | Standard Kraken2 report containing taxonomic abundance counts |
| scrub_report | Path? | Summary report of reads removed during host scrubbing |
| classified | Set<Path?> | Reads assigned to a taxon in the database (FASTQ) |
| unclassified | Set<Path?> | Reads NOT assigned to any taxon (FASTQ) |
| results | Set<Path> | All output files to be published |
| logs | Set<Path?> | Optional program specific log files |
| nf_logs | Set<Path> | Nextflow-specific log files (e.g. .command.{begin |
| versions | Set<Path> | A YAML formatted file with program versions |
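For downstream use of `kraken2_report`, the following is a minimal parsing sketch. It assumes the default six-column, tab-separated Kraken2 report layout (percentage, clade-level read count, directly assigned read count, rank code, NCBI taxonomy ID, name indented two spaces per level), so it does not apply when `--kraken2_use_mpa_style` is enabled. `ReportRow` and `parse_report` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class ReportRow(NamedTuple):
    percent: float      # percentage of reads in the clade rooted at this taxon
    clade_reads: int    # reads assigned to this taxon or its descendants
    direct_reads: int   # reads assigned directly to this taxon
    rank: str           # rank code, e.g. U, R, D, P, C, O, F, G, S
    taxid: int          # NCBI taxonomy ID
    name: str           # scientific name with indentation removed
    depth: int          # tree depth inferred from the two-space indentation


def parse_report(handle):
    """Parse a default-format Kraken2 report from an open text handle."""
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}")
        percent, clade, direct, rank, taxid, name = fields
        stripped = name.lstrip(" ")
        depth = (len(name) - len(stripped)) // 2
        rows.append(
            ReportRow(float(percent), int(clade), int(direct),
                      rank.strip(), int(taxid), stripped, depth)
        )
    return rows
```

Calling `parse_report(open(path))` and filtering on `row.rank == "S"` gives species-level rows, for example.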
Parameters
Kraken2 Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --kraken2_db | string | | A single tarball or path to a Kraken2-formatted database |
| --kraken2_confidence | number | 0.0 | Confidence score threshold between 0 and 1 |
| --kraken2_use_mpa_style | boolean | false | Format report output like Kraken 1's kraken-mpa-report |
| --kraken2_report_zero_counts | boolean | false | Report counts for ALL taxa, even if counts are zero |
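To show how these parameters and the read slots of the input record could translate into a `kraken2` invocation, here is a hedged sketch. The flags used (`--db`, `--confidence`, `--report`, `--output`, `--paired`, `--use-mpa-style`, `--report-zero-counts`) are real Kraken2 command-line options, but the mapping itself, the function name `build_kraken2_command`, and the output file names are assumptions. The module's actual command is not shown in this document.

```python
def build_kraken2_command(db, r1=None, r2=None, se=None, confidence=0.0,
                          use_mpa_style=False, report_zero_counts=False,
                          prefix="sample"):
    """Assemble a kraken2 argument list from record slots and parameters.

    Illustrative only: `prefix` and the output file names are hypothetical.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")

    cmd = [
        "kraken2",
        "--db", str(db),
        "--confidence", str(confidence),
        "--report", f"{prefix}.kraken2.report.txt",
        "--output", f"{prefix}.kraken2.output.txt",
    ]
    if use_mpa_style:
        cmd.append("--use-mpa-style")
    if report_zero_counts:
        cmd.append("--report-zero-counts")

    # Paired-end reads take priority; otherwise fall back to single-end.
    if r1 is not None and r2 is not None:
        cmd += ["--paired", str(r1), str(r2)]
    elif se is not None:
        cmd.append(str(se))
    else:
        raise ValueError("need either r1 and r2, or se")
    return cmd
```

Returning a list rather than a shell string keeps the arguments safe to pass to `subprocess.run` without quoting issues. Long reads (`lr`) are left out of the sketch on purpose, in line with the note in the inputs table.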
Used By
Subworkflows
- kraken2 - Classify metagenomic reads using Kraken2.
Workflows
- kraken2 - Taxonomic classification of metagenomic sequence reads.
Citations
If you use this in your analysis, please cite the following.
- Bactopia
  Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- Kraken2
  Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), 257. (2019)
Source
Version
KRAKEN2:
- bactopia-teton: 1.1.3