bactopia_sketcher
Tags: bacteria taxonomy classification minhash sketch mash sourmash refseq gtdb sample-scope
Create genomic sketches and perform rapid taxonomic classification.
Uses Mash and Sourmash to create MinHash sketches of the input sequences. These sketches are then queried against pre-built databases (RefSeq and GTDB) to identify the closest reference genomes.
Requires the pre-compiled RefSeq (Mash) and GTDB (Sourmash) databases, usually downloaded
by the datasets module.
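To make the idea of a MinHash sketch concrete, here is a minimal, illustrative sketch in Python. It is not how Mash or Sourmash are implemented: the function names are hypothetical, SHA-1 is only a stand-in for the hash functions the real tools use, and the toy sequences and parameters are assumptions chosen to keep the example small. It shows the general technique of keeping only the smallest k-mer hashes per genome and estimating similarity from those subsets.

```python
import hashlib

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (SHA-1 is a stand-in hash for illustration)."""
    digest = hashlib.sha1(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(seq, k=21, size=1000):
    """Bottom-s MinHash sketch: the `size` smallest distinct k-mer hashes, sorted."""
    seq = seq.upper()
    hashes = {
        hash_kmer(canonical(seq[i:i + k]))
        for i in range(len(seq) - k + 1)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=1000):
    """Estimate Jaccard similarity from two bottom-s sketches.

    Take the `size` smallest hashes of the union and count how many of
    them occur in both sketches.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    # Toy sequences (hypothetical); real assemblies are millions of bases long.
    genome_a = "ACGTTGCAAGGCTTAACCGGTTAGCATCGATCGGATCCA"
    genome_b = "ACGTTGCAAGGCTTAACCGGTTAGCATCGATCGGTTCCA"
    a = sketch(genome_a, k=11, size=50)
    b = sketch(genome_b, k=11, size=50)
    print(f"Estimated Jaccard similarity: {jaccard_estimate(a, b, size=50):.2f}")
```

Because each sketch holds a bounded number of hashes regardless of genome size, comparing a sample against a large reference database stays fast, which is what makes this kind of classification rapid.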
Inputs
record (
meta: Record,
fna: Path
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Groovy Record containing sample information |
| fna | Path | Assembled contigs in FASTA format |
mash_db: Path
sourmash_db: Path
| Name | Type | Description |
|---|---|---|
| mash_db | Path | Path to the Mash RefSeq database |
| sourmash_db | Path | Path to the Sourmash GTDB LCA database |
Outputs
record (
meta: Record,
sig: Path,
msh: Set<Path>,
mash: Path,
sourmash: Path,
results: Set<Path>,
logs: Set<Path?>,
nf_logs: Set<Path>,
versions: Set<Path>
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Sample information record |
| sig | Path | The Sourmash signature file (*.sig) |
| msh | Set<Path> | The Mash sketch files for k=21 and k=31 (*.msh) |
| mash | Path | A classification report of Mash Screen results against the RefSeq database |
| sourmash | Path | A classification report from Sourmash LCA against the GTDB database |
| results | Set<Path> | All output files to be published |
| logs | Set<Path?> | Optional program-specific log files |
| nf_logs | Set<Path> | Nextflow-specific log files (e.g. .command.{begin |
| versions | Set<Path> | A YAML-formatted file with program versions |
Parameters
Used By
Subworkflows
- bactopia_sketcher - Create genomic sketches and perform rapid taxonomic classification.
Workflows
- bactopia - Comprehensive bacterial analysis pipeline for complete genomic characterization.
- staphopia - Comprehensive analysis pipeline for Staphylococcus aureus isolates.
Citations
If you use this in your analysis, please cite the following.
- Bactopia
  Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- Mash
  Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 17, 132 (2016)
- Sourmash
  Brown CT, Irber L. sourmash: a library for MinHash sketching of DNA. JOSS 1, 27 (2016)
- Mash Screen
  Ondov BD, Starrett GJ, Sappington A, Kostic A, Koren S, Buck CB, Phillippy AM. Mash Screen: high-throughput sequence containment estimation for genome discovery. Genome Biol 20, 232 (2019)
- NCBI RefSeq Database
  O'Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B, Smith-White B, Ako-Adjei D, Astashyn A, Badretdin A, Bao Y, Blinkova O, Brover V, Chetvernin V, Choi J, Cox E, Ermolaeva O, Farrell CM, Goldfarb T, Gupta T, Haft D, Hatcher E, Hlavina W, Joardar VS, Kodali VK, Li W, Maglott D, Masterson P, McGarvey KM, Murphy MR, O'Neill K, Pujar S, Rangwala SH, Rausch D, Riddick LD, Schoch C, Shkeda A, Storz SS, Sun H, Thibaud-Nissen F, Tolstoy I, Tully RE, Vatsan AR, Wallin C, Webb D, Wu W, Landrum MJ, Kimchi A, Tatusova T, DiCuccio M, Kitts P, Murphy TD, Pruitt KD. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733-45 (2016)
Source
Version
BACTOPIA_SKETCHER:
- bactopia-sketcher: 1.0.3