bactopia_qc
Tags: quality-control adapters error-correction subsampling fastq illumina nanopore fastp bbduk nanoq sample-scope
Perform comprehensive quality control on sequencing reads.
This subworkflow runs raw sequencing reads through a quality control pipeline that adapts to the read type:
- Illumina: Adapter/PhiX removal (Fastp or BBDuk), Error Correction (Lighter), and Subsampling (Rasusa)
- Nanopore: Adapter removal (Porechop), Quality filtering (Nanoq), and Subsampling (Rasusa)
- Hybrid: Processes both short and long reads through their respective pipelines
- Assembly: Passes through simulated reads from assemblies
The subworkflow also generates quality metrics using fastq-scan and, optionally, quality reports using FastQC (Illumina) and NanoPlot (ONT).
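The read-type branching described above can be sketched in a few lines of Python. This is an illustration only, not the subworkflow's Nextflow code: the read-type labels (`illumina`, `nanopore`, `hybrid`, `assembly`) and the function name `plan_qc_steps` are hypothetical, and the step names are taken from the list above.

```python
from typing import Dict, List

# Step lists copied from the description above; the tools run as separate
# processes in the real pipeline, here they are just labels.
ILLUMINA_STEPS = [
    "adapter/PhiX removal (fastp or BBDuk)",
    "error correction (Lighter)",
    "subsampling (Rasusa)",
]
NANOPORE_STEPS = [
    "adapter removal (Porechop)",
    "quality filtering (Nanoq)",
    "subsampling (Rasusa)",
]


def plan_qc_steps(read_type: str) -> Dict[str, List[str]]:
    """Return the QC steps applied to each read set for a read type.

    The read_type labels are assumptions made for this sketch and may not
    match the runtype values Bactopia uses internally.
    """
    if read_type == "illumina":
        return {"short": list(ILLUMINA_STEPS)}
    if read_type == "nanopore":
        return {"long": list(NANOPORE_STEPS)}
    if read_type == "hybrid":
        # Short and long reads each go through their own pipeline.
        return {"short": list(ILLUMINA_STEPS), "long": list(NANOPORE_STEPS)}
    if read_type == "assembly":
        # Simulated reads are passed through without further QC steps.
        return {"simulated": []}
    raise ValueError(f"unknown read type: {read_type}")
```

Calling `plan_qc_steps("hybrid")` returns both step lists, which mirrors the statement that hybrid samples are processed through both pipelines.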
Take
samples: Channel<Record>
| Field | Description |
|---|---|
| meta | Groovy Record containing sample information (must include runtype, genome_size, species) |
| r1 | Illumina R1 reads (paired-end forward) |
| r2 | Illumina R2 reads (paired-end reverse) |
| se | Single-end Illumina reads |
| lr | Long reads (ONT) |
| assembly | Assembly file (FASTA) for assembly-based simulations |
adapters: Path?
phix: Path?
| Name | Type | Description |
|---|---|---|
| adapters | Path? | Optional adapter sequences in FASTA format for removal from Illumina reads |
| phix | Path? | Optional PhiX sequences in FASTA format for removal from Illumina reads |
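As a rough illustration of the input shape, the sketch below models one `samples` record in Python and checks that `meta` carries the three required keys. It is not the Groovy Record used by the subworkflow: the class name `SampleRecord`, the helper `missing_meta_keys`, the example values, and the choice to treat every read field as optional are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Keys the documentation says meta must include.
REQUIRED_META_KEYS = ("runtype", "genome_size", "species")


@dataclass
class SampleRecord:
    """Hypothetical stand-in for one record in the samples channel."""

    meta: dict
    # Assumption: a sample fills only the fields that match its read type.
    r1: Optional[str] = None        # Illumina R1 reads (paired-end forward)
    r2: Optional[str] = None        # Illumina R2 reads (paired-end reverse)
    se: Optional[str] = None        # single-end Illumina reads
    lr: Optional[str] = None        # long reads (ONT)
    assembly: Optional[str] = None  # assembly FASTA for simulated reads


def missing_meta_keys(record: SampleRecord) -> List[str]:
    """Return the required meta keys that are absent from the record."""
    return [key for key in REQUIRED_META_KEYS if key not in record.meta]


# Example values are made up for illustration.
sample = SampleRecord(
    meta={"runtype": "paired-end", "genome_size": 2800000},
    r1="sample_R1.fastq.gz",
    r2="sample_R2.fastq.gz",
)
missing = missing_meta_keys(sample)
```

Here `missing` lists `species`, since the example `meta` omits it; a complete record yields an empty list.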
Emit
Published
The sample_outputs and run_outputs emissions are aggregates of output files that will be published in the entry workflow.
sample_outputs
| Output | Description |
|---|---|
| reads_grouped | All output FASTQs for publishing |
| supplemental | QC reports (FastQC/NanoPlot), JSON metrics, and error FASTQs if QC failed |
| error | Captured error messages if QC failed (e.g., reads empty after trimming) |
run_outputs
No run-scope outputs.
Downstream Inputs
The following emissions are meant to be used as inputs to downstream subworkflows.
reads
| Output | Description |
|---|---|
| r1 | QC-filtered Illumina R1 reads |
| r2 | QC-filtered Illumina R2 reads |
| se | QC-filtered single-end reads |
| lr | QC-filtered long reads |
| fna | Assembly file (passed through for assembly-based samples) |
Module Composition
This subworkflow calls the following modules:
- bactopia_qc - Automated quality control, error correction, and read subsampling.
Used By
This subworkflow is used by the following workflows:
- bactopia - Comprehensive bacterial analysis pipeline for complete genomic characterization.
- cleanyerreads - Quality control and optional host read removal from raw sequencing reads.
- staphopia - Comprehensive analysis pipeline for Staphylococcus aureus isolates.
Citations
If you use this in your analysis, please cite the following.
- Bactopia
  Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- BBTools
  Bushnell B. BBMap short read aligner, and other bioinformatic tools. (Link)
- fastp
  Chen S, Zhou Y, Chen Y, and Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34(17), i884-i890. (2018)
- FastQC
  Andrews S. FastQC: a quality control tool for high throughput sequence data. (WebLink)
- fastq-scan
  Petit III RA. fastq-scan: generate summary statistics of input FASTQ sequences. (GitHub)
- Lighter
  Song L, Florea L, Langmead B. Lighter: Fast and Memory-efficient Sequencing Error Correction without Counting. Genome Biol. 15(11):509 (2014)
- NanoPlot
  De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics Volume 34, Issue 15 (2018)
- Nanoq
  Steinig E. Nanoq: Minimal but speedy quality control for nanopore reads in Rust. (GitHub)
- Porechop
  Wick RR, Judd LM, Gorrie CL, Holt KE. Completing bacterial genome assemblies with multiplex MinION sequencing. Microb Genom. 3(10):e000132 (2017)
- Rasusa
  Hall MB. Rasusa: Randomly subsample sequencing reads to a specified coverage. (2019)