srahumanscrubber
Tags: contamination human scrub sra sequencing fastq sample-scope
Remove human contamination from sequencing reads for SRA submission.
This subworkflow uses the SRA Human Scrubber to identify and remove human reads from sequencing data. It first initializes a human reference database and then scrubs the input reads to ensure they meet SRA submission requirements.
Uses explicit positional record fields for reads:
- Input: record(meta, r1, r2, se, lr), where each read slot (r1, r2, se, lr) is an optional path (Path?)
Take
reads: Channel<Record>
| Field | Description |
|---|---|
| meta | Groovy Record containing sample information |
| r1 | Illumina R1 reads (paired-end) |
| r2 | Illumina R2 reads (paired-end) |
| se | Single-end Illumina reads |
| lr | Long reads (ONT/PacBio) |
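
As a rough illustration of the input shape described above, the record can be modelled outside Nextflow as a structure with one metadata field and four optional read slots. The Python sketch below is only a stand-in and not the subworkflow's actual code: the class `ReadsRecord`, the helper `populated_slots`, the `name` key in `meta`, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class ReadsRecord:
    """Hypothetical Python stand-in for record(meta, r1, r2, se, lr)."""
    meta: dict                  # sample information
    r1: Optional[Path] = None   # Illumina R1 reads (paired-end)
    r2: Optional[Path] = None   # Illumina R2 reads (paired-end)
    se: Optional[Path] = None   # single-end Illumina reads
    lr: Optional[Path] = None   # long reads (ONT/PacBio)


def populated_slots(rec: ReadsRecord) -> List[str]:
    """Return the names of the read slots that actually hold a path."""
    return [name for name in ("r1", "r2", "se", "lr")
            if getattr(rec, name) is not None]


# A paired-end sample fills r1 and r2 and leaves se and lr empty.
paired = ReadsRecord(
    meta={"name": "sample01"},  # assumed key, for illustration only
    r1=Path("sample01_R1.fastq.gz"),
    r2=Path("sample01_R2.fastq.gz"),
)
print(populated_slots(paired))
```

Keeping every slot present but nullable means paired-end, single-end, and long-read samples all share one record shape, so downstream steps can check which slots are set instead of handling differently shaped inputs.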
Emit
Published
The sample_outputs and run_outputs emissions are aggregates of output files that will be published in the entry workflow.
sample_outputs
| Output | Description |
|---|---|
| special_meta | Groovy Record with name for downstream aggregation |
| scrubbed | Scrubbed FASTQ files with human reads removed |
| scrubbed_extra | Placeholder files for pipeline compatibility |
| scrub_report | Report of scrubbing statistics |
run_outputs
No run-scope outputs.
Module Composition
This subworkflow calls the following modules:
- srahumanscrubber_initdb - Initialize human read removal database for SRA Human Scrubber.
- srahumanscrubber_scrub - Scrub human reads from FASTQ files.
Citations
If you use this in your analysis, please cite the following.
- Bactopia: Petit III RA, Read TD. Bactopia: a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- SRA Human Scrubber: Katz KS, Shutov O, Lapoint R, Kimelman M, Brister JR, and O'Sullivan C. STAT: a fast, scalable, MinHash-based k-mer tool to assess Sequence Read Archive next-generation sequence submissions. Genome Biology, 22(1), 270 (2021)