mashdist
Tags: mash distance minhash comparison reference sample-scope
Calculate Mash distances between sequences and a reference.
This subworkflow uses Mash to calculate MinHash-based distances between query sequences and a reference sequence. It creates Mash sketches of the input sequences and computes distance values, then aggregates all distance calculations into a single consolidated report.
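The idea behind the sketch-then-compare step can be illustrated with a minimal Python sketch. It is not Mash's implementation: SHA-1 stands in for the MurmurHash3 function Mash uses, only forward-strand k-mers are hashed, and the helper names (`sketch`, `jaccard_estimate`, `mash_distance`) are hypothetical. The distance formula, D = -(1/k) ln(2j / (1 + j)), is the one given in the Mash paper cited below.

```python
import hashlib
import math


def kmers(seq, k):
    """Return the set of k-mers in seq (forward strand only, a simplification)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=21, size=1000):
    """Bottom-s MinHash sketch: the `size` smallest k-mer hash values, sorted.

    SHA-1 is an assumption made for portability; Mash itself uses MurmurHash3.
    """
    hashes = {
        int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size):
    """Estimate the Jaccard index from two bottom-s sketches.

    Take the `size` smallest hashes of the union and count how many of
    them occur in both sketches.
    """
    set_a, set_b = set(sketch_a), set(sketch_b)
    merged = sorted(set_a | set_b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)


def mash_distance(j, k):
    """Mash distance from a Jaccard estimate j and k-mer size k.

    Clamping to [0, 1] and returning 1.0 when j is 0 are choices made
    for this sketch.
    """
    if j <= 0:
        return 1.0
    return min(1.0, max(0.0, -math.log(2 * j / (1 + j)) / k))
```

Two identical sequences give a Jaccard estimate of 1 and a distance of 0, while sequences that share no k-mers give a distance of 1.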
Take
assembly: Channel<Record>
| Field | Description |
|---|---|
| meta | Groovy Record containing sample information |
| fna | Sequences in FASTA format to compare against the reference |
reference: Path
| Name | Type | Description |
|---|---|---|
| reference | Path | Reference sequence in FASTA format for distance calculations |
Emit
Published
The sample_outputs and run_outputs emissions are aggregates of output files that will be published in the entry workflow.
sample_outputs
| Output | Description |
|---|---|
| dist | A tab-delimited summary of the Mash distances and p-values |
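A per-sample `dist` file can be read with a few lines of Python. The sketch below assumes the file follows the five-column, tab-delimited layout that `mash dist` prints (reference ID, query ID, distance, p-value, shared hashes as `matched/total`); whether this subworkflow adds a header row is not stated here, so rows that do not parse are skipped. The function name and dictionary keys are hypothetical.

```python
import csv
import io


def parse_mash_dist(handle):
    """Parse tab-delimited Mash distance rows into a list of dictionaries.

    Assumed column order: reference, query, distance, p-value, shared hashes.
    Rows with a different column count or non-numeric values (for example a
    header row) are skipped.
    """
    rows = []
    for record in csv.reader(handle, delimiter="\t"):
        if len(record) != 5:
            continue
        reference, query, distance, p_value, shared = record
        try:
            matched, total = (int(part) for part in shared.split("/"))
            rows.append({
                "reference": reference,
                "query": query,
                "distance": float(distance),
                "p_value": float(p_value),
                "shared": matched,
                "sketch_size": total,
            })
        except ValueError:
            continue
    return rows


# Illustrative input only; these values are made up for the example.
example = "ref.fna\tsample1.fna\t0.0222766\t0\t456/1000\n"
results = parse_mash_dist(io.StringIO(example))
```

Each resulting dictionary holds one reference-versus-query comparison, which makes it straightforward to filter samples by a distance threshold.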
run_outputs
| Output | Description |
|---|---|
| csv | Merged Mash distance results from all samples |
Module Composition
This subworkflow calls the following modules:
- csvtk_concat - Concatenate multiple CSV or TSV files into a single table.
- mash_dist - Calculate genomic distances using MinHash sketches.
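The aggregation step can be pictured as stacking the per-sample tables under one shared header. The Python sketch below is only an illustration of that idea, not how csvtk is implemented; `concat_tables` and the column names in the usage example are hypothetical.

```python
def concat_tables(tables):
    """Stack rows from several tables under one shared header.

    Each table is a list of dictionaries mapping column name to value.
    Columns are kept in first-seen order; cells missing from a table
    are filled with an empty string.
    """
    header = []
    for table in tables:
        for row in table:
            for column in row:
                if column not in header:
                    header.append(column)
    merged = [
        {column: row.get(column, "") for column in header}
        for table in tables
        for row in table
    ]
    return header, merged


# Hypothetical per-sample tables with made-up values.
sample_a = [{"sample": "a", "distance": "0.01"}]
sample_b = [{"sample": "b", "distance": "0.02"}]
header, merged = concat_tables([sample_a, sample_b])
```

Keeping the first-seen column order means the merged table has the same layout as the per-sample tables whenever their headers agree.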
Used By
This subworkflow is used by the following workflows:
- mashdist - Calculate Mash distances between sequences and reference genomes.
Citations
If you use this in your analysis, please cite the following.
- Bactopia
  Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- Mash
  Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 17, 132 (2016)