checkm_lineagewf

Tags: quality-control completeness contamination marker-genes lineage bacteria archaea sample-scope

Assess genome quality using lineage-specific marker sets.

Uses CheckM to estimate the completeness and contamination of genome assemblies. It places the genome into a reference tree to select an appropriate set of single-copy marker genes, then calculates quality metrics based on the recovery of these markers.
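
The marker-based calculation described above can be illustrated with a deliberately simplified sketch. It assumes every marker is scored independently, whereas CheckM itself groups markers into collocated sets before averaging, so the numbers it reports will differ. The function name `estimate_quality` and the marker IDs are hypothetical.

```python
from collections import Counter


def estimate_quality(expected_markers, observed_hits):
    """Return (completeness, contamination) as percentages.

    expected_markers: marker IDs expected exactly once per genome.
    observed_hits: marker IDs found in the assembly, one entry per copy.
    Simplification: markers are treated independently (no collocated sets).
    """
    expected = set(expected_markers)
    if not expected:
        raise ValueError("marker set is empty")

    # Count copies of each expected marker; ignore hits outside the set.
    copies = Counter(m for m in observed_hits if m in expected)

    found = sum(1 for m in expected if copies[m] >= 1)
    extra = sum(copies[m] - 1 for m in expected if copies[m] > 1)

    completeness = 100.0 * found / len(expected)
    contamination = 100.0 * extra / len(expected)
    return completeness, contamination


markers = ["m1", "m2", "m3", "m4"]
hits = ["m1", "m2", "m2", "m3"]  # m4 is missing, m2 was found twice
print(estimate_quality(markers, hits))  # (75.0, 25.0)
```

Missing markers lower completeness, while extra copies of a marker expected once raise contamination.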

Database Required

Requires the CheckM reference database (~275 MB as a compressed archive) to be configured via the CHECKM_DATA_PATH environment variable or pre-installed in the container.
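
Before launching a run, it can help to confirm that CHECKM_DATA_PATH points at a real directory. The following is a minimal sketch of such a check. The helper name `resolve_checkm_db` is hypothetical, and the check does not validate the database contents.

```python
import os
from pathlib import Path


def resolve_checkm_db(env=None):
    """Return the database directory from CHECKM_DATA_PATH, or None.

    env defaults to the process environment; pass a dict to test.
    Only checks that the path is an existing directory.
    """
    env = os.environ if env is None else env
    value = env.get("CHECKM_DATA_PATH")
    if not value:
        return None
    path = Path(value)
    return path if path.is_dir() else None


db = resolve_checkm_db()
if db is None:
    print("CHECKM_DATA_PATH is unset or not a directory")
else:
    print(f"Using CheckM database at {db}")
```

A None result does not necessarily mean the run will fail, since the database may instead be pre-installed in the container.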

Inputs

record (
    meta: Record,
    fna: Path
)

| Field | Type | Description |
|-------|------|-------------|
| meta | Record | Groovy Record containing sample information |
| fna | Path | Assembled contigs in FASTA format |

Outputs

record (
    meta: Record,
    tsv: Path,
    results: Set<Path>,
    logs: Set<Path?>,
    nf_logs: Set<Path>,
    versions: Set<Path>
)

| Field | Type | Description |
|-------|------|-------------|
| meta | Record | Sample information record |
| tsv | Path | Tab-delimited genome quality report with completeness and contamination estimates |
| results | Set<Path> | All output files to be published |
| logs | Set<Path?> | Optional program-specific log files |
| nf_logs | Set<Path> | Nextflow-specific log files (e.g. .command.{begin |
| versions | Set<Path> | A YAML-formatted file with program versions |
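
The tsv output can be read with any tab-delimited parser. The sketch below assumes the report carries CheckM's usual `Bin Id`, `Completeness`, and `Contamination` column headers, so confirm them against an actual report before relying on it. The sample row holds made-up values for illustration only.

```python
import csv
import io


def read_quality(handle):
    """Parse a tab-delimited quality report into a list of dicts.

    Assumes columns named 'Bin Id', 'Completeness', 'Contamination'.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {
            "bin": row["Bin Id"],
            "completeness": float(row["Completeness"]),
            "contamination": float(row["Contamination"]),
        }
        for row in reader
    ]


# Made-up example content; a real report has additional columns.
example = io.StringIO(
    "Bin Id\tCompleteness\tContamination\n"
    "sample01\t99.12\t0.45\n"
)
rows = read_quality(example)
print(rows)
```

For a file on disk, pass `open(path, newline="")` in place of the `io.StringIO` object.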

Parameters

CheckM Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| --checkm_unique | integer | 10 | Minimum number of unique phylogenetic markers required to use the lineage-specific marker set. |
| --checkm_multi | integer | 10 | Maximum number of multi-copy phylogenetic markers before defaulting to the domain-level marker set. |
| --checkm_aai_strain | number | 0.9 | AAI threshold used to identify strain heterogeneity. |
| --checkm_length | number | 0.7 | Overlap between target and query, given as a fraction (0.7 = 70%). |
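
These parameters appear to correspond to the `--unique`, `--multi`, `--aai_strain`, and `--length` options of `checkm lineage_wf`. That mapping is an assumption based on the matching names and descriptions, and the module's actual command may pass further options. A minimal sketch of assembling such a command line follows. The function name and the directory names are hypothetical.

```python
import shlex


def build_lineage_wf_cmd(bin_dir, out_dir, unique=10, multi=10,
                         aai_strain=0.9, length=0.7):
    """Build an argument list for `checkm lineage_wf`.

    Defaults mirror the pipeline defaults listed in the table above.
    The parameter-to-flag mapping is assumed, not taken from the module.
    """
    return [
        "checkm", "lineage_wf",
        "--unique", str(unique),
        "--multi", str(multi),
        "--aai_strain", str(aai_strain),
        "--length", str(length),
        bin_dir, out_dir,
    ]


cmd = build_lineage_wf_cmd("bins/", "checkm_out/")
print(shlex.join(cmd))
```

Returning a list rather than a single string lets the command go straight to `subprocess.run` without shell quoting issues.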

Used By

Subworkflows

  • checkm - Assess metagenome bin completeness using CheckM.

Workflows

  • checkm - Assessment of microbial genome assembly quality.

Citations

If you use this in your analysis, please cite the following.

Version

CHECKM_LINEAGEWF:
- checkm-genome: 1.2.5