scrubber

Tags: metagenomics decontamination human-removal read-filtering sample-scope

Remove contaminant sequences from metagenomic data.

This subworkflow removes human and other contaminant sequences from metagenomic reads using either the SRA Human Scrubber or nohuman with the HPRC human database. Each sample gets a contamination removal statistics report, and the per-sample reports are aggregated into a single CSV across all samples.

Take

reads: Channel<Record>
Field | Description
meta | Groovy Record containing sample information
r1 | Illumina R1 reads (paired-end)
r2 | Illumina R2 reads (paired-end)
se | Single-end Illumina reads
lr | Long reads (ONT/PacBio)
use_srascrubber: Boolean
nohuman_db: Path?
download_nohuman: Boolean
nohuman_save_as_tarball: Boolean
Name | Type | Description
use_srascrubber | Boolean | Boolean flag to choose between SRA Human Scrubber (true) or nohuman (false) for decontamination.
nohuman_db | Path? | Path to nohuman database directory or tarball (used when use_srascrubber is false)
download_nohuman | Boolean | Boolean flag to download the nohuman database instead of using the provided path
nohuman_save_as_tarball | Boolean | Boolean flag to save downloaded nohuman database as tarball
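
The way these flags combine can be sketched in Python. This is a minimal illustration, not the subworkflow's actual code: the function name `choose_scrubber`, the returned dictionary keys, and the precedence rules (download wins over a provided path; nohuman options are ignored when the SRA Human Scrubber is selected; a missing database is an error) are assumptions read from the parameter descriptions above.

```python
from typing import Optional


def choose_scrubber(
    use_srascrubber: bool,
    nohuman_db: Optional[str] = None,
    download_nohuman: bool = False,
    nohuman_save_as_tarball: bool = False,
) -> dict:
    """Describe which tool and database source the flags would select.

    Hypothetical helper: the precedence below is an assumption, not
    behavior confirmed from the subworkflow source.
    """
    if use_srascrubber:
        # Assumption: nohuman-specific options have no effect in this branch.
        return {"tool": "srahumanscrubber", "db_source": None, "save_as_tarball": False}

    if download_nohuman:
        # The description says the download is used "instead of" the provided path.
        return {
            "tool": "nohuman",
            "db_source": "download",
            "save_as_tarball": nohuman_save_as_tarball,
        }

    if nohuman_db is None:
        # Assumption: nohuman cannot run without a database from one source or the other.
        raise ValueError("nohuman needs either nohuman_db or download_nohuman=True")

    # Assumption: the tarball option only applies to a freshly downloaded database.
    return {"tool": "nohuman", "db_source": nohuman_db, "save_as_tarball": False}
```

For example, `choose_scrubber(False, nohuman_db="/path/to/db")` selects nohuman with the provided database, while `choose_scrubber(True)` selects the SRA Human Scrubber regardless of the nohuman settings.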

Emit

Published

The sample_outputs and run_outputs emissions are aggregates of output files that will be published in the entry workflow.

sample_outputs

Output | Description
special_meta | Simplified metadata record for downstream report joining
r1 | Scrubbed paired-end forward reads
r2 | Scrubbed paired-end reverse reads
se | Scrubbed single-end reads
lr | Scrubbed long reads
scrub_report | Contamination removal statistics report

run_outputs

Output | Description
csv | Aggregated contamination reports across all samples

Downstream Inputs

The following emissions are meant to be used as inputs to downstream subworkflows.

scrubbed

Output | Description
r1 | Scrubbed paired-end forward reads
r2 | Scrubbed paired-end reverse reads
se | Scrubbed single-end reads
lr | Scrubbed long reads

scrubbed_extra

Output | Description
r1 | Scrubbed paired-end forward reads
r2 | Scrubbed paired-end reverse reads
se | Scrubbed single-end reads
lr | Scrubbed long reads
fna | Assembly file (passed through)

special_tsv

Output | Description
special_meta | Simplified metadata record for downstream report joining
scrub_report | Contamination removal statistics report

Subworkflow Composition

This subworkflow calls the following subworkflows:

  • srahumanscrubber - Remove human contamination from sequencing reads for SRA submission.
  • nohuman - Remove human reads from sequencing data using nohuman.

Module Composition

This subworkflow calls the following modules:

  • csvtk_concat - Concatenate multiple CSV or TSV files into a single table.
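
As a rough picture of what concatenating per-sample reports into one table involves, here is a minimal Python sketch. It is a stand-in for illustration and does not reproduce csvtk itself: the function name `concat_csv` and the column names in the usage note are hypothetical, and the sketch assumes every report shares the same header.

```python
import csv
import io
from typing import Iterable, List, Optional


def concat_csv(texts: Iterable[str]) -> str:
    """Concatenate CSV documents by rows, keeping a single header line.

    Assumption: all inputs have an identical header; a mismatch raises
    ValueError rather than guessing how to align columns.
    """
    header: Optional[List[str]] = None
    rows: List[List[str]] = []

    for text in texts:
        reader = csv.reader(io.StringIO(text))
        file_header = next(reader, None)
        if file_header is None:
            continue  # empty input contributes nothing
        if header is None:
            header = file_header
        elif file_header != header:
            raise ValueError(f"header mismatch: {file_header} != {header}")
        # Skip blank lines, keep every data row in input order.
        rows.extend(row for row in reader if row)

    if header is None:
        return ""

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()
```

Given two one-row reports with a made-up header such as `sample,total_reads,removed_reads`, the result is a single table with that header followed by both rows in input order.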

Used By

This subworkflow is used by the following workflows:

  • cleanyerreads - Quality control and optional host read removal from raw sequencing reads.
  • scrubber - Removal of human and contaminant sequences from metagenomic reads.

Citations

If you use this in your analysis, please cite the following.
