bactopia_sketcher
Tags: taxonomy classification minhash sketch mash sourmash refseq gtdb sample-scope
Create genomic sketches and perform rapid taxonomic classification.
This subworkflow generates MinHash sketches from assembled genomes using Mash and Sourmash. The sketches are compared against reference databases to assign a taxonomic classification and to find closely related genomes. Mash queries a RefSeq database (via Mash Screen), while Sourmash queries a GTDB LCA database.
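To make the idea of a MinHash sketch concrete, the following is a minimal Python sketch of the bottom-s approach: hash every k-mer, keep the smallest hash values, and estimate Jaccard similarity from the merged sketches. It is an illustration only and is not how this subworkflow or the tools are implemented. The function names (`kmers`, `hash_kmer`, `sketch`, `jaccard_estimate`), the BLAKE2b hash, and the default sizes are assumptions made for the example, and canonical (strand-independent) k-mers are omitted for brevity.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-length substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer with a stable hash.

    Assumption: BLAKE2b is used here only because it ships with Python.
    """
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=21, size=1000):
    """Bottom-s sketch: the `size` smallest distinct k-mer hashes, sorted."""
    return sorted({hash_kmer(km) for km in kmers(seq, k)})[:size]


def jaccard_estimate(a, b, size=1000):
    """Estimate Jaccard similarity from two bottom-s sketches.

    Take the `size` smallest hashes of the union, then count how many
    of those appear in both sketches.
    """
    set_a, set_b = set(a), set(b)
    merged = sorted(set_a | set_b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)
```

Two sketches built with the same `k` and `size` can be compared without revisiting the original sequences, which is what makes sketch-based comparison against large reference collections fast.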
Take
assembly: Channel<Record>
| Field | Description |
|---|---|
| meta | Groovy Record containing sample information |
| assembly | Assembled contigs in FASTA format |
mash_db: Path
sourmash_db: Path
| Name | Type | Description |
|---|---|---|
| mash_db | Path | Path to the Mash RefSeq database for taxonomic classification |
| sourmash_db | Path | Path to the Sourmash GTDB LCA database for taxonomic classification |
Emit
Published
The sample_outputs and run_outputs emissions are aggregates of output files that will be published in the entry workflow.
sample_outputs
| Output | Description |
|---|---|
| sig | Sourmash signature file |
| msh | Mash sketch files for k=21 and k=31 |
| mash | Mash Screen classification report against RefSeq |
| sourmash | Sourmash LCA classification report against GTDB |
run_outputs
No run-scope outputs.
Module Composition
This subworkflow calls the following modules:
- bactopia_sketcher - Create genomic sketches and perform rapid taxonomic classification.
Used By
This subworkflow is used by the following workflows:
- bactopia - Comprehensive bacterial analysis pipeline for complete genomic characterization.
- staphopia - Comprehensive analysis pipeline for Staphylococcus aureus isolates.
Citations
If you use this in your analysis, please cite the following.
- Bactopia: Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- Mash: Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 17, 132 (2016)
- Sourmash: Brown CT, Irber L. sourmash: a library for MinHash sketching of DNA. JOSS 1, 27 (2016)