bactopia_assembler
Tags: bacteria assembly hybrid shovill dragonflye unicycler illumina nanopore sample-scope
Assemble bacterial genomes using short read, long read, or hybrid strategies.
Automatically selects the appropriate assembler based on input read types:
- Short Paired-End Reads: Uses Shovill (SKESA/SPAdes wrapper).
- Short Single-End Reads: Uses Shovill (SKESA/SPAdes wrapper).
- Long Reads: Uses Dragonflye (Flye/Miniasm wrapper).
- Hybrid: Uses Unicycler or Dragonflye (with polishing).
Summary statistics for each assembly are generated using assembly-scan.
The input is a named record with explicit slots (r1, r2, se, lr, fna), each typed as Path?.
For assembly-based runtypes, the original assembly is used without re-assembly.
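The selection rules above can be sketched in a few lines of Python. This is only an illustration of the decision logic as described here, not the subworkflow's actual Nextflow code: the function name `select_strategy`, the returned labels, the precedence given to `fna`, and the requirement that hybrid mode has both `r1` and `r2` are all assumptions.

```python
from pathlib import Path
from typing import Optional


def select_strategy(
    r1: Optional[Path] = None,
    r2: Optional[Path] = None,
    se: Optional[Path] = None,
    lr: Optional[Path] = None,
    fna: Optional[Path] = None,
) -> str:
    """Return a label for the assembly strategy implied by the populated slots."""
    if fna is not None:
        # Assumption: a supplied assembly takes precedence over any reads.
        return "existing-assembly"
    has_paired = r1 is not None and r2 is not None
    if has_paired and lr is not None:
        # Assumption: hybrid mode needs paired-end short reads plus long reads.
        return "hybrid"        # Unicycler or Dragonflye (with polishing)
    if lr is not None:
        return "long-read"     # Dragonflye
    if has_paired:
        return "short-paired"  # Shovill
    if se is not None:
        return "short-single"  # Shovill
    raise ValueError("no usable input slot is populated")


# Hypothetical file names, used only to exercise the function.
print(select_strategy(r1=Path("s_R1.fastq.gz"), r2=Path("s_R2.fastq.gz")))  # short-paired
print(select_strategy(lr=Path("s.fastq.gz")))                               # long-read
```

Which of Unicycler or Dragonflye handles the hybrid case is not stated in this description, so the sketch stops at the strategy label rather than guessing.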
Inputs
record (
meta: Record,
r1: Path?,
r2: Path?,
se: Path?,
lr: Path?,
fna: Path?
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Groovy Record containing sample information |
| r1 | Path? | Illumina R1 reads (paired-end) |
| r2 | Path? | Illumina R2 reads (paired-end) |
| se | Path? | Single-end Illumina reads |
| lr | Path? | Long reads (ONT/PacBio) for long-read or hybrid assembly |
| fna | Path? | Assembly file (FASTA) for assembly-based runtypes |
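For readers more familiar with Python than with Nextflow record types, the input record can be mirrored as a dataclass in which every `Path?` slot becomes an optional path. The class name `AssemblerInput` and the `problems` check are hypothetical. The two rules it enforces (R1 and R2 travel together, and at least one slot is populated) are assumptions inferred from the field descriptions, not documented behavior.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class AssemblerInput:
    """Hypothetical Python stand-in for the input record."""

    meta: dict = field(default_factory=dict)  # sample information
    r1: Optional[Path] = None   # Illumina R1 reads (paired-end)
    r2: Optional[Path] = None   # Illumina R2 reads (paired-end)
    se: Optional[Path] = None   # single-end Illumina reads
    lr: Optional[Path] = None   # long reads (ONT/PacBio)
    fna: Optional[Path] = None  # existing assembly (FASTA)

    def problems(self) -> List[str]:
        """List inconsistencies in the populated slots (assumed rules)."""
        issues = []
        if (self.r1 is None) != (self.r2 is None):
            issues.append("r1 and r2 must be provided together")
        slots = (self.r1, self.r2, self.se, self.lr, self.fna)
        if all(slot is None for slot in slots):
            issues.append("at least one of r1/r2, se, lr, or fna is required")
        return issues


# Hypothetical sample with only long reads: no problems are reported.
sample = AssemblerInput(meta={"id": "sample01"}, lr=Path("sample01.fastq.gz"))
print(sample.problems())  # []
```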
Outputs
record (
meta: Record,
fna: Path?,
r1: Path?,
r2: Path?,
se: Path?,
lr: Path?,
tsv: Path?,
results: Set<Path>,
logs: Set<Path?>,
nf_logs: Set<Path>,
versions: Set<Path>
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Sample information record |
| fna | Path? | Assembled contigs in FASTA format |
| r1 | Path? | Passthrough Illumina R1 reads |
| r2 | Path? | Passthrough Illumina R2 reads |
| se | Path? | Passthrough single-end reads |
| lr | Path? | Passthrough long reads |
| tsv | Path? | Tab-delimited report of assembly statistics (N50, length, coverage) |
| results | Set<Path> | All output files to be published |
| logs | Set<Path?> | Optional program specific log files |
| nf_logs | Set<Path> | Nextflow-specific log files (e.g. .command.{begin |
| versions | Set<Path> | A YAML formatted file with program versions |
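Because the `tsv` output is tab-delimited, it can be loaded with Python's standard `csv` module. The sketch below is a minimal reader. The column names (`sample`, `total_contig_length`, `n50_contig_length`) and the values are made up for illustration, so check the header of a real report before relying on them.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_stats(handle: TextIO) -> List[Dict[str, str]]:
    """Parse a tab-delimited report into one dict per row, keyed by header."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical report content; real column names and values may differ.
example = io.StringIO(
    "sample\ttotal_contig_length\tn50_contig_length\n"
    "sample01\t2800000\t150000\n"
)
rows = read_stats(example)
print(rows[0]["n50_contig_length"])  # 150000
```

With a real file, pass an open handle instead of the in-memory example, for instance `open(path, newline="")`.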
Parameters
Used By
Subworkflows
- bactopia_assembler - Assemble bacterial genomes using automated assembler selection.
Workflows
- bactopia - Comprehensive bacterial analysis pipeline for complete genomic characterization.
- staphopia - Comprehensive analysis pipeline for Staphylococcus aureus isolates.
Citations
If you use this in your analysis, please cite the following.
- Bactopia
  Petit III RA, Read TD Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- any2fasta
  Seemann T any2fasta: Convert various sequence formats to FASTA (GitHub)
- assembly-scan
  Petit III RA assembly-scan: generate basic stats for an assembly (GitHub)
- BWA
  Li H Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv [q-bio.GN] (2013)
- Dragonflye
  Petit III RA Dragonflye: Assemble bacterial isolate genomes from Nanopore reads. (GitHub)
- FLASH
  Magoč T, Salzberg SL FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27.21 2957-2963 (2011)
- Flye
  Kolmogorov M, Yuan J, Lin Y, Pevzner P Assembly of Long Error-Prone Reads Using Repeat Graphs Nature Biotechnology (2019)
- Medaka
  ONT Research Medaka: Sequence correction provided by ONT Research (GitHub)
- MEGAHIT
  Li D, Liu C-M, Luo R, Sadakane K, Lam T-W MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31.10 1674-1676 (2015)
- Miniasm
  Li H Miniasm: Ultrafast de novo assembly for long noisy reads (GitHub)
- Minimap2
  Li H Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34:3094-3100 (2018)
- Nanoq
  Steinig E Nanoq: Minimal but speedy quality control for nanopore reads in Rust (GitHub)
- Pigz
  Adler M. pigz: A parallel implementation of gzip for modern multi-processor, multi-core machines. Jet Propulsion Laboratory (2015)
- Pilon
  Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, Earl AM Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PloS one 9.11 e112963 (2014)
- Racon
  Vaser R, Sović I, Nagarajan N, Šikić M Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res 27, 737-746 (2017)
- Rasusa
  Hall MB Rasusa: Randomly subsample sequencing reads to a specified coverage. (2019).
- Raven
  Vaser R, Šikić M Time- and memory-efficient genome assembly with Raven. Nat Comput Sci 1, 332-336 (2021)
- samclip
  Seemann T Samclip: Filter SAM file for soft and hard clipped alignments (GitHub)
- Samtools
  Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079 (2009)
- Shovill
  Seemann T Shovill: De novo assembly pipeline for Illumina paired reads (GitHub)
- Shovill-SE
  Petit III RA Shovill-SE: A fork of Shovill that includes support for single end reads. (GitHub)
- SKESA
  Souvorov A, Agarwala R, Lipman DJ SKESA: strategic k-mer extension for scrupulous assemblies. Genome Biology 19:153 (2018)
- SPAdes
  Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of computational biology 19.5 455-477 (2012)
- Unicycler
  Wick RR, Judd LM, Gorrie CL, Holt KE Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput. Biol. 13, e1005595 (2017)
- Velvet
  Zerbino DR, Birney E Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome research 18.5 821-829 (2008)
Source
Version
BACTOPIA_ASSEMBLER:
- bactopia-assembler: 1.0.5