mashdist

Tags: mash distance minhash comparison reference sample-scope

Calculate Mash distances between sequences and a reference.

This subworkflow uses Mash to calculate MinHash-based distances between query sequences and a reference sequence. It creates Mash sketches of the input sequences, computes the distance of each one to the reference, and merges the per-sample results into a single report.
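To make the MinHash idea concrete, here is a minimal Python sketch of how a Mash-style distance can be derived from two sketches. It is an illustration under stated assumptions, not Mash's implementation: it hashes k-mers with SHA-1 rather than the MurmurHash3 that Mash uses, it skips canonical k-mers, and the function names (`sketch`, `jaccard_estimate`, `mash_distance`) are hypothetical. The distance formula D = -(1/k) ln(2j / (1 + j)) is the one defined for Mash.

```python
import hashlib
import math


def kmers(seq, k):
    """Yield every k-mer of seq (no canonicalization, for brevity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def sketch(seq, k=21, size=1000):
    """Bottom-s MinHash sketch: the `size` smallest distinct k-mer hashes.

    Assumption: SHA-1 stands in for the hash function used by Mash.
    """
    hashes = {
        int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(a, b, size):
    """Estimate the Jaccard index from the smallest `size` hashes of the union."""
    merged = sorted(set(a) | set(b))[:size]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j, k):
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); 1.0 when nothing is shared."""
    if j <= 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)
```

Identical sequences give a Jaccard estimate of 1 and a distance of 0, while sequences that share no k-mers give a distance of 1.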

Take

assembly: Channel<Record>

Field  Description
meta   Groovy Record containing sample information
fna    Sequences in FASTA format to compare against reference

reference: Path

Name       Type  Description
reference  Path  Reference sequence in FASTA format for distance calculations

Emit

Published

The sample_outputs and run_outputs emissions are aggregates of output files that will be published in the entry workflow.

sample_outputs

Output  Description
dist    A tab-delimited summary of the Mash distances and p-values
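As a hedged illustration of reading such a file, the Python sketch below parses the default `mash dist` layout: headerless, tab-delimited rows of reference ID, query ID, distance, p-value, and shared hashes. It assumes the module keeps that default layout, which this page does not confirm. The `MashHit` type, the `parse_mash_dist` function, and the file names and values in the example row are hypothetical.

```python
import csv
import io
from typing import NamedTuple


class MashHit(NamedTuple):
    reference: str
    query: str
    distance: float
    p_value: float
    shared_hashes: str  # e.g. "456/1000"


def parse_mash_dist(handle):
    """Parse headerless, tab-delimited `mash dist` rows into MashHit records."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 5:
            continue  # skip blank or malformed lines
        ref, query, dist, pval, shared = row
        hits.append(MashHit(ref, query, float(dist), float(pval), shared))
    return hits


# Hypothetical example row; real IDs and values depend on the inputs.
example = io.StringIO("ref.fna\tsample1.fna\t0.0222766\t0\t456/1000\n")
hits = parse_mash_dist(example)
```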

run_outputs

Output  Description
csv     Merged Mash distance results from all samples

Module Composition

This subworkflow calls the following modules:

  • csvtk_concat - Concatenate multiple CSV or TSV files into a single table.
  • mash_dist - Calculate genomic distances using MinHash sketches.
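The aggregation step can be pictured with the simplified Python sketch below, which stacks per-sample tables that share a header row and keeps the header once. This is not how csvtk is implemented, and the real tool offers more options. The `concat_tables` function and the column names in the usage example are hypothetical.

```python
import csv
import io


def concat_tables(handles, delimiter="\t"):
    """Stack tables that share a header row, keeping the header once."""
    header = None
    rows = []
    for handle in handles:
        reader = csv.reader(handle, delimiter=delimiter)
        first = next(reader, None)
        if first is None:
            continue  # empty input
        if header is None:
            header = first
        elif first != header:
            raise ValueError("tables have different headers")
        rows.extend(r for r in reader if r)  # drop blank lines
    return header, rows


# Hypothetical per-sample tables.
t1 = io.StringIO("sample\tdistance\ns1\t0.01\n")
t2 = io.StringIO("sample\tdistance\ns2\t0.02\n")
header, rows = concat_tables([t1, t2])
```

Raising on a header mismatch is a deliberate choice for this sketch: it surfaces inconsistent inputs early rather than silently misaligning columns.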

Used By

This subworkflow is used by the following workflows:

  • mashdist - Calculate Mash distances between sequences and reference genomes.

Citations

If you use this in your analysis, please cite the following.

Source

View source on GitHub