mash_dist

Tags: mash distance minhash ani comparison taxonomy sample-scope

Calculate genomic distances using MinHash sketches.

Uses Mash to compute the distance between query sequences and a reference database. Mash compares MinHash sketches to rapidly estimate the Jaccard index of the sequences' k-mer sets, then converts that estimate into a Mash distance, which approximates one minus the Average Nucleotide Identity (ANI).
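
To make the relationship between sketches, the Jaccard index, and the reported distance concrete, here is a minimal Python sketch. It is not Mash's implementation: it hashes forward-strand k-mers with SHA-1 as a stand-in (Mash itself uses MurmurHash3 on canonical k-mers), and the function names `sketch`, `jaccard_estimate`, and `mash_distance` are hypothetical. The distance formula D = -(1/k) ln(2j / (1 + j)) is the one described in the Mash publication.

```python
import hashlib
import math


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of k-mers in a sequence (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq: str, k: int, s: int) -> list[int]:
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hash values.

    SHA-1 is an assumption made for illustration only.
    """
    hashes = {
        int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:s]


def jaccard_estimate(a: list[int], b: list[int], s: int) -> float:
    """Estimate the Jaccard index from the s smallest hashes of both sketches."""
    merged = sorted(set(a) | set(b))[:s]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j: float, k: int) -> float:
    """Convert a Jaccard estimate j into a Mash distance for k-mer size k."""
    if j <= 0.0:
        return 1.0  # nothing shared: maximum distance
    return -math.log(2.0 * j / (1.0 + j)) / k


if __name__ == "__main__":
    query = sketch("ACGTACGTTGCAAGCTTAGCCGATAGCTAGGCT", k=5, s=20)
    reference = sketch("ACGTACGTTGCAAGCTTAGCCGATAGCTAGGAT", k=5, s=20)
    j = jaccard_estimate(query, reference, s=20)
    print(f"jaccard={j:.3f} distance={mash_distance(j, k=5):.4f}")
```

The toy sequences and the small `k` and `s` values are only there to keep the example short; real sketches use much larger values (Mash defaults to k=21 and a sketch size of 1000).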

Inputs

record (
meta: Record,
fna: Path
)

| Field | Type | Description |
|-------|------|-------------|
| meta | Record | Groovy Record containing sample information |
| fna | Path | FASTA, FASTQ, or Mash sketch file to be queried |

reference: Path

| Name | Type | Description |
|------|------|-------------|
| reference | Path | The reference file (FASTA, FASTQ, or Mash sketch) to compare against |

Outputs

record (
meta: Record,
dist: Path,
results: Set<Path>,
logs: Set<Path?>,
nf_logs: Set<Path>,
versions: Set<Path>
)

| Field | Type | Description |
|-------|------|-------------|
| meta | Record | Sample information record |
| dist | Path | A tab-delimited summary of the Mash distances and p-values |
| results | Set<Path> | All output files to be published |
| logs | Set<Path?> | Optional program-specific log files |
| nf_logs | Set<Path> | Nextflow-specific log files (e.g. .command.{begin |
| versions | Set<Path> | A YAML-formatted file with program versions |
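
As an illustration of working with the `dist` output, the following Python sketch reads a tab-delimited distance table. It assumes the file follows the five-column layout that `mash dist` itself prints (reference, query, distance, p-value, shared hashes as `matching/total`); whether this module adds a header line or reorders columns is not stated here, so the parser simply skips any row whose numeric fields do not parse. The `MashHit` type and `parse_mash_dist` function are hypothetical names.

```python
import csv
import io
from typing import NamedTuple, TextIO


class MashHit(NamedTuple):
    reference: str
    query: str
    distance: float
    p_value: float
    shared: int       # matching hashes
    sketch_size: int  # hashes compared


def parse_mash_dist(handle: TextIO) -> list[MashHit]:
    """Parse mash-dist-style rows and return them sorted by distance."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 5:
            continue  # not a five-column row
        ref, query, dist, pval, shared = row
        try:
            num, denom = shared.split("/")
            hit = MashHit(ref, query, float(dist), float(pval), int(num), int(denom))
        except ValueError:
            continue  # header line or malformed row (assumed possible)
        hits.append(hit)
    return sorted(hits, key=lambda h: h.distance)


if __name__ == "__main__":
    # Made-up rows for illustration only, not real results.
    example = (
        "ref_A\tsample_1\t0.05\t1e-10\t200/1000\n"
        "ref_B\tsample_1\t0.01\t0\t700/1000\n"
    )
    for hit in parse_mash_dist(io.StringIO(example)):
        print(hit.reference, hit.distance, f"{hit.shared}/{hit.sketch_size}")
```

Sorting by distance puts the closest reference first, which is usually the row of interest when choosing a best match.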

Parameters

mashdist Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| --mash_sketch | string | | The reference sequence as a Mash Sketch (.msh file) |
| --full_merlin | boolean | false | Go full Merlin and run all species-specific tools, no matter the Mash distance |

Used By

Subworkflows

  • mashdist - Calculate Mash distances between sequences and a reference.

Workflows

  • mashdist - Calculate Mash distances between sequences and reference genomes.

Citations

If you use this in your analysis, please cite the following.

Source

View source on GitHub

Version

MASH_DIST:
- mash: 2.3