teton
Tags: metagenomics taxonomy classification kraken bracken genome-size run-scope
Perform taxonomic classification and estimate bacterial genome sizes.
This subworkflow processes raw sequencing reads through a taxonomic classification pipeline using Kraken2 and Bracken to estimate bacterial genome sizes and separate bacterial from non-bacterial organisms. It first removes host reads using the scrubber subworkflow, then classifies reads, and finally creates sample sheets with genome size estimates for downstream Bactopia analysis.
Uses explicit positional record fields for reads:
- Input: record(meta, r1, r2, se, lr) where each read slot is Path?
Take
reads: Channel<Record>
| Field | Description |
|---|---|
| meta | Groovy Record containing sample information |
| r1 | Illumina R1 reads (paired-end) |
| r2 | Illumina R2 reads (paired-end) |
| se | Single-end Illumina reads |
| lr | Long reads (ONT/PacBio) |
db: Path?
use_srascrubber: Boolean
nohuman_db: Path?
download_nohuman: Boolean
nohuman_save_as_tarball: Boolean
| Name | Type | Description |
|---|---|---|
| db | Path? | Optional Kraken2 database path for taxonomic classification |
| use_srascrubber | Boolean | Flag to use SRA scrubber for host read removal |
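The `reads` record layout listed under Take, `record(meta, r1, r2, se, lr)` with each read slot typed `Path?`, can be pictured with a small Python sketch. This is only an illustration and not part of the subworkflow: the class name `ReadsRecord`, the helper `populated_slots`, the use of a `dict` for `meta`, the `id` key, and the file names are all assumptions made for the example.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass(frozen=True)
class ReadsRecord:
    """Hypothetical Python mirror of record(meta, r1, r2, se, lr)."""

    meta: dict                   # sample information (a Groovy Record in the pipeline)
    r1: Optional[Path] = None    # Illumina R1 reads (paired-end)
    r2: Optional[Path] = None    # Illumina R2 reads (paired-end)
    se: Optional[Path] = None    # single-end Illumina reads
    lr: Optional[Path] = None    # long reads (ONT/PacBio)

    def populated_slots(self) -> List[str]:
        """Return the names of the read slots that hold a path."""
        return [
            name
            for name in ("r1", "r2", "se", "lr")
            if getattr(self, name) is not None
        ]


# Example records with made-up sample names and file paths.
paired = ReadsRecord(
    meta={"id": "sample01"},
    r1=Path("sample01_R1.fastq.gz"),
    r2=Path("sample01_R2.fastq.gz"),
)
long_only = ReadsRecord(meta={"id": "sample02"}, lr=Path("sample02.fastq.gz"))

print(paired.populated_slots())
print(long_only.populated_slots())
```

Every slot is always present in the record, and unused slots are simply empty, so downstream steps can check a fixed set of fields regardless of the sequencing type.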
Emit
Published
The sample_outputs and run_outputs emissions are aggregates of output files that will be published in the entry workflow.
sample_outputs
No sample-scope outputs.
run_outputs
No run-scope outputs.
Subworkflow Composition
This subworkflow calls the following subworkflows:
- scrubber - Remove contaminant sequences from metagenomic data.
- bracken - Estimate species abundance from metagenomic reads.
Module Composition
This subworkflow calls the following modules:
- bactopia_teton - Predict genome size and route samples based on taxonomic classification.
- csvtk_join - Join two CSV or TSV files based on common fields.
- csvtk_concat - Concatenate multiple CSV or TSV files into a single table.
Used By
This subworkflow is used by the following workflows:
- teton - Taxonomic classification and abundance profiling of metagenomic reads.
Citations
If you use this in your analysis, please cite the following.
- Bactopia
  Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- Kraken2
  Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), 257. (2019)
- Bracken
  Lu J, Breitwieser FP, Thielen P, and Salzberg SL. Bracken: estimating species abundance in metagenomics data. PeerJ Computer Science, 3, e104. (2017)