mash_dist
Tags: mash distance minhash ani comparison taxonomy sample-scope
Calculate genomic distances using MinHash sketches.
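Uses Mash to compute the distance between query sequences and a reference (a sequence file or a Mash sketch). Mash compares MinHash sketches of each input's k-mers to rapidly estimate their Jaccard index, then converts that estimate into the Mash distance. For closely related genomes this distance approximates the per-base mutation rate, so 1 - distance gives a fast approximation of Average Nucleotide Identity (ANI).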
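The idea behind the distance can be illustrated with a small, self-contained sketch: build a bottom-s MinHash sketch of canonical k-mers, estimate the Jaccard index from the smallest hashes of the union of two sketches, and convert it with the Mash distance formula from Ondov et al. (2016). This is an illustration only, not Mash's implementation: the hash function (BLAKE2b instead of Mash's MurmurHash3) and the helper names are assumptions made for the example.

```python
import hashlib
import math

# Complement table used to build reverse complements of DNA k-mers.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int):
    """Yield each k-mer or its reverse complement, whichever sorts first."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        rc = kmer.translate(COMPLEMENT)[::-1]
        yield min(kmer, rc)


def hash64(kmer: str) -> int:
    # Stand-in 64-bit hash (BLAKE2b); Mash itself uses MurmurHash3.
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq: str, k: int = 21, s: int = 1000) -> list[int]:
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hashes."""
    return sorted({hash64(km) for km in canonical_kmers(seq, k)})[:s]


def jaccard_estimate(a: list[int], b: list[int], s: int) -> float:
    """Estimate Jaccard from the s smallest hashes of the union of two sketches."""
    set_a, set_b = set(a), set(b)
    union_bottom = sorted(set_a | set_b)[:s]
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in set_a and h in set_b)
    return shared / len(union_bottom)


def mash_distance(j: float, k: int) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)) from Ondov et al. (2016)."""
    if j == 0:
        return 1.0  # no shared hashes: report the maximum distance
    return math.log((1 + j) / (2 * j)) / k
```

Because only the s smallest hashes are kept, sketch size (and therefore memory and comparison time) is fixed regardless of genome size, which is what makes the comparison fast; larger s trades speed for a more precise Jaccard estimate.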
Inputs
record (
meta: Record,
fna: Path
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Groovy Record containing sample information |
| fna | Path | FASTA, FASTQ, or Mash sketch file to be queried |
reference: Path
| Name | Type | Description |
|---|---|---|
| reference | Path | The reference file (FASTA, FASTQ, or Mash sketch) to compare against |
Outputs
record (
meta: Record,
dist: Path,
results: Set<Path>,
logs: Set<Path?>,
nf_logs: Set<Path>,
versions: Set<Path>
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Sample information record |
| dist | Path | A tab-delimited summary of the Mash distances and p-values |
| results | Set<Path> | All output files to be published |
| logs | Set<Path?> | Optional program-specific log files |
| nf_logs | Set<Path> | Nextflow-specific log files (e.g. .command.{begin |
| versions | Set<Path> | A YAML-formatted file with program versions |
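To work with the `dist` output downstream, a minimal parser might look like the following. It assumes Mash's default `mash dist` row layout (reference ID, query ID, distance, p-value, and matching hashes written as `shared/total`); whether this module's file also carries a header row is an assumption, so any row whose distance field is not numeric is skipped. The `MashHit` and `parse_mash_dist` names are hypothetical.

```python
import csv
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class MashHit:
    reference: str
    query: str
    distance: float
    p_value: float
    shared_hashes: int
    sketch_size: int

    @property
    def approx_ani(self) -> float:
        # Rough ANI estimate for close relatives: ANI ~ 1 - Mash distance.
        return 1.0 - self.distance


def parse_mash_dist(handle: TextIO) -> Iterator[MashHit]:
    """Parse tab-separated mash dist rows: reference, query, distance, p-value, shared/total."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 5:
            continue  # blank or truncated line
        ref, query, dist, pval, shared = row[:5]
        try:
            distance = float(dist)
        except ValueError:
            continue  # header row (assumption: the file may carry one)
        num, den = shared.split("/")
        yield MashHit(ref, query, distance, float(pval), int(num), int(den))
```

The closest reference can then be picked with `min(hits, key=lambda h: h.distance)`; keeping `shared_hashes` and `sketch_size` separate makes it easy to flag hits supported by only a handful of shared hashes.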
Parameters
mashdist Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --mash_sketch | string | | The reference sequence as a Mash sketch (.msh file) |
| --full_merlin | boolean | false | Go full Merlin and run all species-specific tools, no matter the Mash distance |
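The `--full_merlin` description implies that, by default, species-specific tools are gated on the Mash distance to each species' references. The sketch below only illustrates that gating: `species_to_run` and the `max_distance` cutoff are hypothetical names, and Bactopia's actual selection logic and threshold are not described on this page.

```python
from typing import Iterable


def species_to_run(
    hits: Iterable[tuple[str, float]],
    max_distance: float,
    full_merlin: bool = False,
) -> set[str]:
    """Select species whose Mash distance is within a (hypothetical) cutoff.

    With full_merlin=True the distance check is bypassed and every species
    seen in the hits is selected, mirroring the parameter description.
    """
    hits = list(hits)
    if full_merlin:
        return {species for species, _ in hits}
    return {species for species, dist in hits if dist <= max_distance}
```

The trade-off is runtime versus coverage: gating on distance skips tools that are unlikely to apply, while full Merlin runs everything at the cost of extra compute.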
Used By
Subworkflows
- mashdist - Calculate Mash distances between sequences and a reference.
Workflows
- mashdist - Calculate Mash distances between sequences and reference genomes.
Citations
If you use this in your analysis, please cite the following.
- Bactopia: Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- Mash: Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 17, 132 (2016)
Version
MASH_DIST:
- mash: 2.3