bactopia_qc

Tags: quality-control adapters error-correction subsampling fastq illumina nanopore fastp bbduk nanoq sample-scope

Perform comprehensive quality control on sequencing reads.

This subworkflow runs raw sequencing reads through a quality control pipeline whose steps depend on the read type:

  • Illumina: Adapter/PhiX removal (Fastp or BBDuk), Error Correction (Lighter), and Subsampling (Rasusa)
  • Nanopore: Adapter removal (Porechop), Quality filtering (Nanoq), and Subsampling (Rasusa)
  • Hybrid: Processes both short and long reads through their respective pipelines
  • Assembly: Passes through simulated reads from assemblies
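
The routing described in the list above can be sketched as a small lookup. This is an illustrative sketch only, not the subworkflow's actual implementation: the runtype labels (`paired-end`, `single-end`, `ont`, `hybrid`, `assembly`) and the function name `qc_plan` are assumptions made for the example, while the step names are taken from the list.

```python
# Step names come from the list above; the runtype labels are hypothetical.
ILLUMINA_STEPS = [
    "adapter/PhiX removal (Fastp or BBDuk)",
    "error correction (Lighter)",
    "subsampling (Rasusa)",
]
NANOPORE_STEPS = [
    "adapter removal (Porechop)",
    "quality filtering (Nanoq)",
    "subsampling (Rasusa)",
]


def qc_plan(runtype: str) -> dict:
    """Return the QC steps applied to each read set for a given runtype."""
    if runtype in ("paired-end", "single-end"):
        return {"short": ILLUMINA_STEPS}
    if runtype == "ont":
        return {"long": NANOPORE_STEPS}
    if runtype == "hybrid":
        # Short and long reads each go through their own pipeline.
        return {"short": ILLUMINA_STEPS, "long": NANOPORE_STEPS}
    if runtype == "assembly":
        # Simulated reads are passed through without further processing.
        return {"simulated": []}
    raise ValueError(f"unknown runtype: {runtype!r}")


if __name__ == "__main__":
    for read_set, steps in qc_plan("hybrid").items():
        print(read_set, "->", ", ".join(steps))
```

Modeling the hybrid case as the union of the two single-technology plans mirrors the description that both read sets go "through their respective pipelines".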

The subworkflow generates quality metrics with fastq-scan and, optionally, quality reports with FastQC (Illumina) and NanoPlot (ONT).
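
For the Rasusa subsampling step, the usual arithmetic is that a target coverage and a genome size together fix the number of bases to keep. The sketch below assumes the sample's `genome_size` is what feeds this calculation and uses a coverage of 100x purely as an example value; neither the function names nor the treatment of a zero genome size are taken from the subworkflow itself.

```python
def target_bases(genome_size: int, coverage: int) -> int:
    """Bases to keep so the read set averages `coverage`x depth."""
    return genome_size * coverage


def keep_fraction(total_bases: int, genome_size: int, coverage: int) -> float:
    """Fraction of sequenced bases retained after subsampling.

    Returns 1.0 when the read set is already at or below the target.
    Treating a zero target as "no subsampling" is an assumption.
    """
    target = target_bases(genome_size, coverage)
    if target == 0 or total_bases <= target:
        return 1.0
    return target / total_bases


if __name__ == "__main__":
    # Hypothetical sample: 2.8 Mbp genome, 560 Mbp sequenced, 100x target.
    print(target_bases(2_800_000, 100))
    print(keep_fraction(560_000_000, 2_800_000, 100))
```

With these example numbers the target is 280,000,000 bases, so half of the sequenced bases would be kept.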

Take

samples: Channel<Record>

Field       Description
meta        Groovy Record containing sample information (must include runtype, genome_size, species)
r1          Illumina R1 reads (paired-end forward)
r2          Illumina R2 reads (paired-end reverse)
se          Single-end Illumina reads
lr          Long reads (ONT)
assembly    Assembly file (FASTA) for assembly-based simulations

adapters: Path?
phix: Path?

Name        Type     Description
adapters    Path?    Optional adapter sequences in FASTA format for removal from Illumina reads
phix        Path?    Optional PhiX sequences in FASTA format for removal from Illumina reads
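
As a rough illustration of the input shape, the sketch below models one `samples` record as a Python dictionary and checks that `meta` carries the three required keys. The field names follow the description above; the file names, the `runtype` value, the use of `None` for unused fields, and the helper `missing_meta_keys` are hypothetical.

```python
# Keys that the description above says `meta` must include.
REQUIRED_META_KEYS = ("runtype", "genome_size", "species")


def missing_meta_keys(meta: dict) -> list:
    """Return the required meta keys that are absent, in a fixed order."""
    return [key for key in REQUIRED_META_KEYS if key not in meta]


# Hypothetical paired-end sample; every value is a placeholder.
sample = {
    "meta": {
        "runtype": "paired-end",
        "genome_size": 2_800_000,
        "species": "Staphylococcus aureus",
    },
    "r1": "sample_R1.fastq.gz",
    "r2": "sample_R2.fastq.gz",
    "se": None,
    "lr": None,
    "assembly": None,
}

if __name__ == "__main__":
    missing = missing_meta_keys(sample["meta"])
    print("missing meta keys:", missing or "none")
```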

Emit

Published

The sample_outputs and run_outputs emissions are aggregates of output files that will be published in the entry workflow.

sample_outputs

Output           Description
reads_grouped    All output FASTQs for publishing
supplemental     QC reports (FastQC/NanoPlot), JSON metrics, and error FASTQs if QC failed
error            Captured error messages if QC failed (e.g., reads empty after trimming)

run_outputs

No run-scope outputs.

Downstream Inputs

The following emissions are meant to be used as inputs to downstream subworkflows.

reads

Output    Description
r1        QC-filtered Illumina R1 reads
r2        QC-filtered Illumina R2 reads
se        QC-filtered single-end reads
lr        QC-filtered long reads
fna       Assembly file (passed through for assembly-based samples)
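
A downstream consumer typically needs to know which of these fields are actually populated for a given sample. The sketch below shows one way to check that, assuming a `reads` record can be viewed as a mapping in which unused fields are empty or `None`; that representation and the helper name `populated_fields` are assumptions for illustration.

```python
# Field names of the `reads` emission, in the order listed above.
READ_FIELDS = ("r1", "r2", "se", "lr", "fna")


def populated_fields(reads: dict) -> list:
    """Return the names of the reads fields that hold a value."""
    return [name for name in READ_FIELDS if reads.get(name)]


if __name__ == "__main__":
    # Hypothetical hybrid sample with paired short reads and long reads.
    hybrid = {
        "r1": "sample_R1.fastq.gz",
        "r2": "sample_R2.fastq.gz",
        "se": None,
        "lr": "sample.fastq.gz",
        "fna": None,
    }
    print(populated_fields(hybrid))
```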

Module Composition

This subworkflow calls the following modules:

  • bactopia_qc - Automated quality control, error correction, and read subsampling.

Used By

This subworkflow is used by the following workflows:

  • bactopia - Comprehensive bacterial analysis pipeline for complete genomic characterization.
  • cleanyerreads - Quality control and optional host read removal from raw sequencing reads.
  • staphopia - Comprehensive analysis pipeline for Staphylococcus aureus isolates.

Citations

If you use this in your analysis, please cite the following.

Source

View source on GitHub