merlin

Tags: species-specific automated mash minmer typing bactopia-tool

MinMER-assisted species-specific tool selection and execution.

This Bactopia Tool, Merlin, uses MinMER distances based on the RefSeq sketch to automatically run species-specific analysis tools. Merlin identifies the closest reference genomes and executes appropriate typing and analysis tools for each detected species.

Usage

Bactopia CLI:

bactopia --wf merlin \
--bactopia /path/to/your/bactopia/results

Nextflow:

nextflow run bactopia/bactopia/workflows/bactopia-tools/merlin/main.nf \
--bactopia /path/to/your/bactopia/results

Outputs

Expected Output Files

<BACTOPIA_DIR>
├── <SAMPLE_NAME>
│   └── tools
│       ├── clermontyping
│       │   ├── <SAMPLE_NAME>.tsv
│       │   ├── logs
│       │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │   │   └── versions.yml
│       │   └── supplemental
│       │       ├── <SAMPLE_NAME>.blast.xml
│       │       ├── <SAMPLE_NAME>.html
│       │       └── <SAMPLE_NAME>.mash.tsv
│       ├── ectyper
│       │   ├── <SAMPLE_NAME>.blast_alleles.txt
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── ectyper.log
│       │       ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── kleborate
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── merlindist
│       │   └── merlin-<TIMESTAMP>
│       │       ├── <SAMPLE_NAME>-dist.txt
│       │       └── logs
│       │           ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │           └── versions.yml
│       ├── shigapass
│       │   ├── <SAMPLE_NAME>.tsv
│       │   ├── logs
│       │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │   │   └── versions.yml
│       │   └── supplemental
│       │       └── ShigaPass_summary.csv
│       ├── shigatyper
│       │   ├── <SAMPLE_NAME>-hits.tsv
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── shigeifinder
│       │   ├── <SAMPLE_NAME>.tsv
│       │   └── logs
│       │       ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       └── stecfinder
│           ├── <SAMPLE_NAME>.tsv
│           └── logs
│               ├── nf.command.{begin,err,log,out,run,sh,trace}
│               └── versions.yml
├── <SAMPLE_NAME>SE
│   └── tools
│       ├── clermontyping
│       │   ├── <SAMPLE_NAME>SE.tsv
│       │   ├── logs
│       │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │   │   └── versions.yml
│       │   └── supplemental
│       │       ├── <SAMPLE_NAME>SE.blast.xml
│       │       ├── <SAMPLE_NAME>SE.html
│       │       └── <SAMPLE_NAME>SE.mash.tsv
│       ├── ectyper
│       │   ├── <SAMPLE_NAME>SE.blast_alleles.txt
│       │   ├── <SAMPLE_NAME>SE.tsv
│       │   └── logs
│       │       ├── ectyper.log
│       │       ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── kleborate
│       │   ├── <SAMPLE_NAME>SE.tsv
│       │   └── logs
│       │       ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── merlindist
│       │   └── merlin-<TIMESTAMP>
│       │       ├── <SAMPLE_NAME>SE-dist.txt
│       │       └── logs
│       │           ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │           └── versions.yml
│       ├── shigapass
│       │   ├── <SAMPLE_NAME>SE.tsv
│       │   ├── logs
│       │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │   │   └── versions.yml
│       │   └── supplemental
│       │       └── ShigaPass_summary.csv
│       ├── shigatyper
│       │   ├── <SAMPLE_NAME>SE-hits.tsv
│       │   ├── <SAMPLE_NAME>SE.tsv
│       │   └── logs
│       │       ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── shigeifinder
│       │   ├── <SAMPLE_NAME>SE.tsv
│       │   └── logs
│       │       ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       └── stecfinder
│           ├── <SAMPLE_NAME>SE.tsv
│           └── logs
│               ├── nf.command.{begin,err,log,out,run,sh,trace}
│               └── versions.yml
├── SRR13039589
│   └── tools
│       ├── clermontyping
│       │   ├── SRR13039589.tsv
│       │   ├── logs
│       │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │   │   └── versions.yml
│       │   └── supplemental
│       │       ├── SRR13039589.blast.xml
│       │       ├── SRR13039589.html
│       │       └── SRR13039589.mash.tsv
│       ├── ectyper
│       │   ├── SRR13039589.blast_alleles.txt
│       │   ├── SRR13039589.tsv
│       │   └── logs
│       │       ├── ectyper.log
│       │       ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── kleborate
│       │   ├── SRR13039589.tsv
│       │   └── logs
│       │       ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── merlindist
│       │   └── merlin-<TIMESTAMP>
│       │       ├── SRR13039589-dist.txt
│       │       └── logs
│       │           ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │           └── versions.yml
│       ├── shigapass
│       │   ├── SRR13039589.tsv
│       │   ├── logs
│       │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │   │   └── versions.yml
│       │   └── supplemental
│       │       └── ShigaPass_summary.csv
│       ├── shigatyper
│       │   ├── SRR13039589-hits.tsv
│       │   ├── SRR13039589.tsv
│       │   └── logs
│       │       ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       ├── shigeifinder
│       │   ├── SRR13039589.tsv
│       │   └── logs
│       │       ├── nf.command.{begin,err,log,out,run,sh,trace}
│       │       └── versions.yml
│       └── stecfinder
│           ├── SRR13039589.tsv
│           └── logs
│               ├── nf.command.{begin,err,log,out,run,sh,trace}
│               └── versions.yml
└── bactopia-runs
    └── merlin-<TIMESTAMP>
        ├── merged-results
        │   ├── clermontyping.tsv
        │   ├── ectyper.tsv
        │   ├── kleborate.tsv
        │   ├── logs
        │   │   ├── clermontyping-concat
        │   │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
        │   │   │   └── versions.yml
        │   │   ├── ectyper-concat
        │   │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
        │   │   │   └── versions.yml
        │   │   ├── kleborate-concat
        │   │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
        │   │   │   └── versions.yml
        │   │   ├── shigapass-concat
        │   │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
        │   │   │   └── versions.yml
        │   │   ├── shigatyper-concat
        │   │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
        │   │   │   └── versions.yml
        │   │   ├── shigeifinder-concat
        │   │   │   ├── nf.command.{begin,err,log,out,run,sh,trace}
        │   │   │   └── versions.yml
        │   │   └── stecfinder-concat
        │   │       ├── nf.command.{begin,err,log,out,run,sh,trace}
        │   │       └── versions.yml
        │   ├── shigapass.tsv
        │   ├── shigatyper.tsv
        │   ├── shigeifinder.tsv
        │   └── stecfinder.tsv
        └── nf-reports
            ├── merlin-dag.dot
            ├── merlin-report.html
            └── merlin-timeline.html
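
Because the set of tool directories differs from sample to sample, it can help to list which tools actually produced output. The following is a minimal sketch that relies only on the `<BACTOPIA_DIR>/<SAMPLE_NAME>/tools/<tool>` layout shown in the tree; `list_sample_tools` is a hypothetical helper name and is not part of Bactopia.

```shell
# Print "<sample><TAB><tool>" for every tool directory found under
# <BACTOPIA_DIR>/<sample>/tools. Hypothetical helper, not a Bactopia command.
list_sample_tools() (
    bactopia_dir=$1
    for tool_dir in "$bactopia_dir"/*/tools/*; do
        [ -d "$tool_dir" ] || continue   # skip plain files and unmatched globs
        tool=$(basename "$tool_dir")
        sample=$(basename "$(dirname "$(dirname "$tool_dir")")")
        printf '%s\t%s\n' "$sample" "$tool"
    done
)
```

For example, `list_sample_tools /path/to/your/bactopia/results` prints one line per sample and tool. The `bactopia-runs` directory is skipped naturally because it has no `tools` subdirectory.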

Species-Specific Analysis

Note: The tools executed depend on the detected species.

| File | Description |
| --- | --- |
| Analysis | Results from all executed species-specific tools |

Merged Results

| File | Description |
| --- | --- |
| merlin.tsv | Merged summary of all species-specific analyses |
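
A quick way to check how many samples each tool reported on is to count the data rows in the merged TSV files found under `bactopia-runs/merlin-<TIMESTAMP>/merged-results` in the output tree. This sketch assumes each merged file starts with a single header line, which is an assumption and not something stated here; `count_merged_rows` is a hypothetical helper name.

```shell
# Print "<file><TAB><data rows>" for each TSV in a merged-results directory.
# Assumption: the first line of every TSV is a header and is not counted.
count_merged_rows() (
    merged_dir=$1
    for tsv in "$merged_dir"/*.tsv; do
        [ -f "$tsv" ] || continue        # skip unmatched globs
        rows=$(awk 'NR > 1 { n++ } END { print n + 0 }' "$tsv")
        printf '%s\t%s\n' "$(basename "$tsv")" "$rows"
    done
)
```

Point it at a specific run, for example `count_merged_rows <BACTOPIA_DIR>/bactopia-runs/merlin-<TIMESTAMP>/merged-results`.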

Audit Trail

Below are files that can assist you in understanding which parameters and program versions were used.

Logs

Each process that is executed will have a folder named logs. In this folder are helpful files for you to review if the need ever arises.

| Extension | Description |
| --- | --- |
| .begin | An empty file used to designate the process started |
| .err | Contains STDERR outputs from the process |
| .log | Contains both STDERR and STDOUT outputs from the process |
| .out | Contains STDOUT outputs from the process |
| .run | The script Nextflow uses to stage/unstage files and queue processes based on given profile |
| .sh | The script executed by bash for the process |
| .trace | The Nextflow trace report for the process |
| versions.yml | A YAML formatted file with program versions |
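
When reviewing a run, a reasonable first step is to find the processes that wrote anything to STDERR. The sketch below searches for non-empty `nf.command.err` files; `find_nonempty_stderr` is a hypothetical helper name. Note that a non-empty STDERR file does not by itself mean the process failed, since many tools write progress messages there.

```shell
# List every nf.command.err file under a directory that is not empty.
# Hypothetical helper, not a Bactopia command.
find_nonempty_stderr() (
    bactopia_dir=$1
    # -size +0 keeps only files that contain at least one byte.
    find "$bactopia_dir" -type f -name 'nf.command.err' -size +0 | sort
)
```

Running `find_nonempty_stderr /path/to/your/bactopia/results` prints the matching paths, which can then be inspected alongside the `.out` and `.sh` files in the same `logs` folder.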

Nextflow Reports

These Nextflow reports provide a great summary of your run. They can be used to optimize resource usage and to estimate expected costs when using cloud platforms.

| Filename | Description |
| --- | --- |
| merlin-dag.dot | The Nextflow DAG visualization |
| merlin-report.html | The Nextflow Execution Report |
| merlin-timeline.html | The Nextflow Timeline Report |
| merlin-trace.txt | The Nextflow Trace report |

Parameters

Required Parameters

Define where the pipeline should find input data and save output data.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --bactopia | string | | The path to bactopia results to use as inputs |

mashdist Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --mash_sketch | string | | The reference sequence as a Mash Sketch (.msh file) |
| --full_merlin | boolean | false | Go full Merlin and run all species-specific tools, no matter the Mash distance |

ClermonTyping Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --clermontyping_threshold | integer | 0 | Do not use contigs under this size |

ECTyper Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --ectyper_opid | integer | 90 | Percent identity required for an O antigen allele match |
| --ectyper_opcov | integer | 90 | Minimum percent coverage required for an O antigen allele match |
| --ectyper_hpid | integer | 95 | Percent identity required for an H antigen allele match |
| --ectyper_hpcov | integer | 50 | Minimum percent coverage required for an H antigen allele match |

emmtyper Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --emmtyper_wf | string | blast | Workflow for emmtyper to use (choices: blast, pcr) |
| --emmtyper_blastdb | string | | Path to custom EMM BLAST DB |
| --emmtyper_cluster_distance | integer | 500 | Distance between cluster of matches to consider as different clusters |
| --emmtyper_percid | integer | 95 | Minimal percent identity of sequence |

hicap Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --hicap_gene_coverage | number | 0.8 | Minimum percentage coverage to consider a single gene complete |
| --hicap_gene_identity | number | 0.7 | Minimum percentage identity to consider a single gene complete |
| --hicap_broken_gene_length | integer | 60 | Minimum length to consider a broken gene |
| --hicap_broken_gene_identity | number | 0.8 | Minimum percentage identity to consider a broken gene |

Mykrobe Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --mykrobe_species | string | | Species panel to use (choices: sonnei, staph, tb, typhi) |
| --mykrobe_opts | string | | Extra Mykrobe options in quotes |

GenoTyphi Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --genotyphi_mykrobe_opts | string | | Extra Mykrobe options in quotes |

Kleborate Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --kleborate_preset | string | kpsc | Preset module to use for Kleborate (choices: kpsc, kosc, escherichia) |
| --kleborate_opts | string | | Extra options in quotes for Kleborate |

legsta Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --legsta_noheader | boolean | false | Don't print header row |

LisSero Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --lissero_min_id | number | 95.0 | Minimum percent identity to accept a match |
| --lissero_min_cov | number | 95.0 | Minimum coverage of the gene to accept a match |

ngmaster Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --ngmaster_csv | boolean | false | Output comma-separated format (CSV) rather than tab-separated |

pasty Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --pasty_min_pident | integer | 95 | Minimum percent identity to count a hit |
| --pasty_min_coverage | integer | 95 | Minimum percent coverage to count a hit |

pbptyper Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --pbptyper_min_pident | integer | 95 | Minimum percent identity to count a hit |
| --pbptyper_min_coverage | integer | 95 | Minimum percent coverage to count a hit |

SeqSero2 Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --seqsero2_run_mode | string | k | Workflow to run: 'a' for allele mode or 'k' for k-mer mode (choices: a, k) |
| --seqsero2_input_type | string | assembly | Input format to analyze: 'assembly' or 'fastq' (choices: assembly, fastq) |

SeroBA Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --seroba_coverage | integer | 20 | Threshold for k-mer coverage of the reference sequence |

SISTR Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --sistr_full_cgmlst | boolean | false | Use the full set of cgMLST alleles which can include highly similar alleles |

AgrVATE Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --agrvate_typing_only | boolean | false | agr typing only. Skips agr operon extraction and frameshift detection |

spaTyper Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --spatyper_do_enrich | boolean | false | Do PCR product enrichment |

sccmec Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --sccmec_min_targets_pident | integer | 90 | Minimum percent identity to count a target hit |
| --sccmec_min_targets_coverage | integer | 80 | Minimum percent coverage to count a target hit |
| --sccmec_min_regions_pident | integer | 85 | Minimum percent identity to count a region hit |
| --sccmec_min_regions_coverage | integer | 93 | Minimum percent coverage to count a region hit |

STECFinder Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --stecfinder_use_reads | boolean | false | Paired-end Illumina reads will be used instead of assemblies |
| --stecfinder_hits | boolean | false | Show detailed gene search results |
| --stecfinder_cutoff | number | 10.0 | Minimum read coverage for gene to be called |
| --stecfinder_length | number | 50.0 | Percentage of gene length needed for positive call |

TB-Profiler Profile Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --tbprofiler_call_whole_genome | boolean | false | Call whole genome |
| --tbprofiler_mapper | string | bwa | Mapping tool to use. If you are using nanopore data it will default to minimap2 (choices: bwa, minimap2, bowtie2, bwa-mem2) |
| --tbprofiler_caller | string | freebayes | Variant calling tool to use (choices: bcftools, gatk, freebayes) |
| --tbprofiler_opts | string | | Extra options in quotes for TBProfiler |

TB-Profiler Collate Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --tbprofiler_itol | boolean | false | Generate itol config files |
| --tbprofiler_full | boolean | false | Output mutations in main result file |
| --tbprofiler_all_variants | boolean | false | Output all variants in variant matrix |
| --tbprofiler_mark_missing | boolean | false | An asterisk will be used to mark predictions which are affected by missing data at a drug resistance position |

Filtering Parameters

Use these parameters to specify which samples to include or exclude.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --include | string | | A text file containing sample names (one per line) to include in the analysis |
| --exclude | string | | A text file containing sample names (one per line) to exclude from the analysis |
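
Both files use the same one-name-per-line format. As a minimal sketch, the sample names can be derived from the directory names under the results directory, assuming (as in the output tree) that `bactopia-runs` is the only non-sample directory there; `list_samples` is a hypothetical helper name and is not part of Bactopia.

```shell
# Print one sample name per line, taken from the directory names under
# a Bactopia results directory. Hypothetical helper, not a Bactopia command.
# Assumption: bactopia-runs is the only directory that is not a sample.
list_samples() (
    bactopia_dir=$1
    for sample_dir in "$bactopia_dir"/*/; do
        [ -d "$sample_dir" ] || continue     # skip unmatched globs
        name=$(basename "$sample_dir")
        if [ "$name" != "bactopia-runs" ]; then
            printf '%s\n' "$name"
        fi
    done
)
```

The output can be filtered and saved, for example `list_samples /path/to/your/bactopia/results | grep '^SRR' > include.txt`, and the resulting file passed with `--include include.txt`.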

Optional Parameters

These optional parameters can be useful in certain settings.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --outdir | string | bactopia | Base directory to write results to |

Nextflow Profile Parameters

Parameters to fine-tune your Nextflow setup.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --datasets_cache | string | <HOME>/.bactopia/datasets | Directory where downloaded datasets should be stored |

Helpful Parameters

Uncommonly used parameters that might be useful.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --wf | string | bactopia | Specify which workflow or Bactopia Tool to execute |
| --list_wfs | boolean | | List the available workflows and Bactopia Tools to use with '--wf' |
| --help_all | boolean | | An alias for --help --show_hidden_params |
| --version | boolean | | Display version text |

Composition

This workflow uses the following subworkflows:

  • bactopia_datasets - Download and provide pre-compiled datasets required by Bactopia.
  • merlin - MinmER assisted species-specific bactopia tool seLectIoN.

Citations

If you use this in your analysis, please cite the following.
