scrubber
Tags: metagenomics decontamination human-removal read-filtering sample-scope
Remove contaminant sequences from metagenomic data.
This subworkflow removes human and other contaminant sequences from metagenomic reads using either the SRA Human Scrubber or nohuman with the HPRC human database. Each sample gets a contamination removal statistics report, and the per-sample reports are aggregated into a single CSV for the run.
Take
reads: Channel<Record>
| Field | Description |
|---|---|
| meta | Groovy Record containing sample information |
| r1 | Illumina R1 reads (paired-end) |
| r2 | Illumina R2 reads (paired-end) |
| se | Single-end Illumina reads |
| lr | Long reads (ONT/PacBio) |
use_srascrubber: Boolean
nohuman_db: Path?
download_nohuman: Boolean
nohuman_save_as_tarball: Boolean
| Name | Type | Description |
|---|---|---|
| use_srascrubber | Boolean | Selects the decontamination tool: SRA Human Scrubber (true) or nohuman (false) |
| nohuman_db | Path? | Path to the nohuman database directory or tarball (used when use_srascrubber is false) |
| download_nohuman | Boolean | Download the nohuman database instead of using the provided path |
| nohuman_save_as_tarball | Boolean | Save the downloaded nohuman database as a tarball |
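How these four inputs interact can be sketched in a few lines of Python. This is an illustration only, not the subworkflow's Nextflow code: `ScrubPlan` and `resolve_scrub_plan` are hypothetical names, and raising an error when nohuman is selected without a database path or a download request is an assumption about reasonable behavior, not something the subworkflow is documented to do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScrubPlan:
    """Hypothetical summary of which tool runs and where its database comes from."""
    tool: str                  # "srahumanscrubber" or "nohuman"
    db_source: Optional[str]   # a path, "download", or None for srahumanscrubber
    save_tarball: bool         # only meaningful when the database is downloaded


def resolve_scrub_plan(
    use_srascrubber: bool,
    nohuman_db: Optional[str] = None,
    download_nohuman: bool = False,
    nohuman_save_as_tarball: bool = False,
) -> ScrubPlan:
    # true selects SRA Human Scrubber; the nohuman inputs are then unused
    if use_srascrubber:
        return ScrubPlan("srahumanscrubber", None, False)
    # download_nohuman takes the place of the provided path
    if download_nohuman:
        return ScrubPlan("nohuman", "download", nohuman_save_as_tarball)
    # Assumption: without a download, a database path must be supplied
    if nohuman_db is None:
        raise ValueError("nohuman needs nohuman_db or download_nohuman=True")
    return ScrubPlan("nohuman", str(nohuman_db), False)


if __name__ == "__main__":
    print(resolve_scrub_plan(False, download_nohuman=True, nohuman_save_as_tarball=True))
```

In this sketch `nohuman_save_as_tarball` only takes effect together with `download_nohuman`, matching its description as applying to the downloaded database.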
Emit
Published
The sample_outputs and run_outputs emissions are aggregates of output files that will be published in the entry workflow.
sample_outputs
| Output | Description |
|---|---|
| special_meta | Simplified metadata record for downstream report joining |
| r1 | Scrubbed paired-end forward reads |
| r2 | Scrubbed paired-end reverse reads |
| se | Scrubbed single-end reads |
| lr | Scrubbed long reads |
| scrub_report | Contamination removal statistics report |
run_outputs
| Output | Description |
|---|---|
| csv | Aggregated contamination reports across all samples |
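The aggregation step can be pictured as stacking per-sample report tables under one shared header. The Python sketch below is a rough stand-in for that idea, not the csvtk_concat module itself: `concat_reports` is a hypothetical helper, and the column names in the usage example (`sample`, `total_reads`, `removed_reads`) are invented for illustration, since the actual report columns are not listed here.

```python
import csv
import io
from typing import List


def concat_reports(reports: List[str]) -> str:
    """Stack CSV texts (each with a header row) into one CSV string.

    Columns are matched by name; a column missing from one report is
    left empty for that report's rows.
    """
    header: List[str] = []
    rows = []
    for text in reports:
        reader = csv.DictReader(io.StringIO(text))
        # Keep first-seen column order across all reports
        for name in reader.fieldnames or []:
            if name not in header:
                header.append(name)
        rows.extend(reader)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical per-sample reports
    s1 = "sample,total_reads,removed_reads\nS1,100,5\n"
    s2 = "sample,total_reads,removed_reads\nS2,200,0\n"
    print(concat_reports([s1, s2]), end="")
```

Matching columns by name rather than by position keeps the merged table consistent even if one report carries an extra column.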
Downstream Inputs
The following emissions are meant to be used as inputs to downstream subworkflows.
scrubbed
| Output | Description |
|---|---|
| r1 | Scrubbed paired-end forward reads |
| r2 | Scrubbed paired-end reverse reads |
| se | Scrubbed single-end reads |
| lr | Scrubbed long reads |
scrubbed_extra
| Output | Description |
|---|---|
| r1 | Scrubbed paired-end forward reads |
| r2 | Scrubbed paired-end reverse reads |
| se | Scrubbed single-end reads |
| lr | Scrubbed long reads |
| fna | Assembly file (passed through) |
special_tsv
| Output | Description |
|---|---|
| special_meta | Simplified metadata record for downstream report joining |
| scrub_report | Contamination removal statistics report |
Subworkflow Composition
This subworkflow calls the following subworkflows:
- srahumanscrubber - Remove human contamination from sequencing reads for SRA submission.
- nohuman - Remove human reads from sequencing data using nohuman.
Module Composition
This subworkflow calls the following modules:
- csvtk_concat - Concatenate multiple CSV or TSV files into a single table.
Used By
This subworkflow is used by the following workflows:
- cleanyerreads - Quality control and optional host read removal from raw sequencing reads.
- scrubber - Removal of human and contaminant sequences from metagenomic reads.
Citations
If you use this in your analysis, please cite the following.
- Bactopia - Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- Kraken2 - Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), 257 (2019)
- SRA Human Scrubber - Katz KS, Shutov O, Lapoint R, Kimelman M, Brister JR, O'Sullivan C. STAT: a fast, scalable, MinHash-based k-mer tool to assess Sequence Read Archive next-generation sequence submissions. Genome Biology, 22(1), 270 (2021)