checkm_lineagewf
Tags: quality-control completeness contamination marker-genes lineage bacteria archaea sample-scope
Assess genome quality using lineage-specific marker sets.
Uses CheckM to estimate the completeness and contamination of genome assemblies. It places the genome into a reference tree to select an appropriate set of single-copy marker genes, then calculates quality metrics based on the recovery of these markers.
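As a rough illustration of how marker recovery becomes a quality metric, the sketch below scores a genome against groups of single-copy markers. It follows the collocated marker set idea described in the CheckM paper, as understood here, and is not CheckM's own code. The function name `estimate_quality` and its arguments are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def estimate_quality(marker_sets: List[Set[str]], hits: Dict[str, int]) -> Tuple[float, float]:
    """Return (completeness, contamination) as percentages.

    marker_sets: groups of single-copy marker genes (illustrative input).
    hits: number of times each marker was found in the genome.
    """
    if not marker_sets or any(len(s) == 0 for s in marker_sets):
        raise ValueError("marker sets must be non-empty")

    completeness = 0.0
    contamination = 0.0
    for markers in marker_sets:
        # A marker counts toward completeness if it was found at least once.
        present = sum(1 for m in markers if hits.get(m, 0) >= 1)
        # Every copy beyond the first counts toward contamination.
        extra = sum(hits.get(m, 0) - 1 for m in markers if hits.get(m, 0) > 1)
        completeness += present / len(markers)
        contamination += extra / len(markers)

    n = len(marker_sets)
    return 100.0 * completeness / n, 100.0 * contamination / n


# Made-up example: marker B is duplicated and marker D is missing.
example_sets = [{"A", "B"}, {"C", "D"}]
example_hits = {"A": 1, "B": 2, "C": 1}
print(estimate_quality(example_sets, example_hits))
```

Averaging per set rather than per gene keeps a single large group of co-located markers from dominating the estimate.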
Requires the CheckM reference database (~275GB uncompressed) to be configured via the CHECKM_DATA_PATH environment variable or pre-installed in the container.
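Before launching a run outside the container, it can help to confirm that the database location is actually set. The helper below is a minimal sketch: `resolve_checkm_db` is a hypothetical name, and it only checks that CHECKM_DATA_PATH points at an existing directory, not that the directory holds a valid CheckM database.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_checkm_db(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the directory named by CHECKM_DATA_PATH, or None if unusable."""
    value = env.get("CHECKM_DATA_PATH", "").strip()
    if not value:
        return None  # variable unset or empty
    path = Path(value).expanduser()
    return path if path.is_dir() else None


db = resolve_checkm_db()
if db is None:
    print("CHECKM_DATA_PATH is not set or is not a directory")
else:
    print(f"Using CheckM database at {db}")
```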
Inputs
record (
meta: Record,
fna: Path
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Groovy Record containing sample information |
| fna | Path | Assembled contigs in FASTA format |
Outputs
record (
meta: Record,
tsv: Path,
results: Set<Path>,
logs: Set<Path?>,
nf_logs: Set<Path>,
versions: Set<Path>
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Sample information record |
| tsv | Path | Tab-delimited genome quality report with completeness and contamination estimates |
| results | Set<Path> | All output files to be published |
| logs | Set<Path?> | Optional program-specific log files |
| nf_logs | Set<Path> | Nextflow-specific log files (e.g. .command.{begin |
| versions | Set<Path> | A YAML-formatted file with program versions |
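The tsv report can be screened programmatically. The sketch below assumes the report uses the column headers `Bin Id`, `Completeness`, and `Contamination`, as in CheckM's tab-delimited table; confirm them against your own file. The function name `passing_bins` and the 90/5 cutoffs are hypothetical choices for illustration, not defaults of this module.

```python
import csv
import io
from typing import Iterable, List


def passing_bins(
    handle: Iterable[str],
    min_completeness: float = 90.0,   # illustrative cutoff, not a module default
    max_contamination: float = 5.0,   # illustrative cutoff, not a module default
) -> List[str]:
    """Return the IDs of genomes that meet both thresholds."""
    reader = csv.DictReader(handle, delimiter="\t")
    keep = []
    for row in reader:
        # Column names are assumed; adjust if your report differs.
        completeness = float(row["Completeness"])
        contamination = float(row["Contamination"])
        if completeness >= min_completeness and contamination <= max_contamination:
            keep.append(row["Bin Id"])
    return keep


# Made-up report content for demonstration only.
example = (
    "Bin Id\tCompleteness\tContamination\n"
    "sampleA\t99.1\t0.4\n"
    "sampleB\t72.5\t12.0\n"
)
print(passing_bins(io.StringIO(example)))
```

For a real run, pass an open file handle for the tsv output instead of the in-memory example.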
Parameters
CheckM Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --checkm_unique | integer | 10 | Minimum number of unique phylogenetic markers required to use the lineage-specific marker set. |
| --checkm_multi | integer | 10 | Maximum number of multi-copy phylogenetic markers before defaulting to the domain-level marker set. |
| --checkm_aai_strain | number | 0.9 | AAI threshold used to identify strain heterogeneity. |
| --checkm_length | number | 0.7 | Overlap between target and query, expressed as a fraction (0.7 = 70%). |
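The interaction between --checkm_unique and --checkm_multi can be sketched as a simple decision rule. This reading is taken from the parameter descriptions only. The function name `choose_marker_set` is hypothetical, and how CheckM treats values exactly equal to a threshold is an assumption here.

```python
def choose_marker_set(
    unique_markers: int,
    multi_copy_markers: int,
    min_unique: int = 10,  # mirrors the --checkm_unique default
    max_multi: int = 10,   # mirrors the --checkm_multi default
) -> str:
    """Return "lineage" or "domain" for the marker set to use.

    Boundary handling (>= and <=) is an assumption, not verified
    against the CheckM source.
    """
    enough_unique = unique_markers >= min_unique
    few_multi_copy = multi_copy_markers <= max_multi
    if enough_unique and few_multi_copy:
        return "lineage"
    # Too few unique markers or too many multi-copy markers:
    # fall back to the broader domain-level set.
    return "domain"


print(choose_marker_set(unique_markers=25, multi_copy_markers=2))
print(choose_marker_set(unique_markers=5, multi_copy_markers=2))
```

Raising --checkm_unique or lowering --checkm_multi makes the fallback to the domain-level set more likely under this reading.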
Used By
Subworkflows
- checkm - Assess metagenome bin completeness using CheckM.
Workflows
- checkm - Assessment of microbial genome assembly quality.
Citations
If you use this in your analysis, please cite the following.
- Bactopia
  Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- CheckM
  Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25, 1043-1055 (2015)
- pplacer
  Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11, 538 (2010)
Source
Version
CHECKM_LINEAGEWF:
- checkm-genome: 1.2.5