checkm2_predict
Tags: quality-control completeness contamination machine-learning bacteria archaea sample-scope
Assess genome quality using machine learning.
Uses CheckM2 to predict the completeness and contamination of genome assemblies. Unlike the original CheckM, it predicts quality with machine learning models (a general gradient boost model and a specific neural network model) rather than lineage-specific marker sets, which makes it more accurate for novel or reduced genomes.
Requires the CheckM2 database (a DIAMOND database file) to be available.
Inputs
record (
meta: Record,
fna: Path
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Groovy Record containing sample information |
| fna | Path | Assembled contigs in FASTA format |
db: Path
| Name | Type | Description |
|---|---|---|
| db | Path | The CheckM2 database file (*.dmnd) |
Outputs
record (
meta: Record,
tsv: Path,
results: Set<Path>,
logs: Set<Path?>,
nf_logs: Set<Path>,
versions: Set<Path>
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Sample information record |
| tsv | Path | Tab-delimited report of quality metrics (Completeness, Contamination) |
| results | Set<Path> | All output files to be published |
| logs | Set<Path?> | Optional program-specific log files |
| nf_logs | Set<Path> | Nextflow-specific log files (e.g. .command.{begin |
| versions | Set<Path> | A YAML-formatted file with program versions |
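
As a rough illustration of how the `tsv` report might be consumed downstream, the sketch below filters genomes by completeness and contamination. It assumes the report has a `Name` column alongside the `Completeness` and `Contamination` columns. The function name, the thresholds (90 and 5), and the sample rows are illustrative choices, not values defined by this module.

```python
import csv
import io


def passing_genomes(tsv_text, min_completeness=90.0, max_contamination=5.0):
    """Return the names of genomes that meet both quality thresholds.

    Assumes a tab-delimited report with 'Name', 'Completeness' and
    'Contamination' columns; the default thresholds are illustrative only.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    passed = []
    for row in reader:
        completeness = float(row["Completeness"])
        contamination = float(row["Contamination"])
        if completeness >= min_completeness and contamination <= max_contamination:
            passed.append(row["Name"])
    return passed


# Made-up rows standing in for a real report
example = (
    "Name\tCompleteness\tContamination\n"
    "sample_a\t99.2\t0.4\n"
    "sample_b\t71.5\t8.3\n"
)
print(passing_genomes(example))  # prints ['sample_a']
```

For a real run, the text would come from reading the `tsv` output path rather than an inline string.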
Parameters
CheckM2 Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --checkm2_lowmem | boolean | | Low memory mode. Reduces DIAMOND blocksize to significantly reduce RAM usage at the expense of longer runtime |
| --checkm2_general | boolean | | Force the use of the general quality prediction model (gradient boost) |
| --checkm2_specific | boolean | | Force the use of the specific quality prediction model (neural network) |
| --checkm2_allmodels | boolean | | Output quality prediction for both models for each genome. |
| --checkm2_genes | boolean | False | Treat input files as protein files. |
| --checkm2_opts | string | | Additional options to pass to CheckM2 |
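
The parameters above can be thought of as switches that end up on the `checkm2 predict` command line. The sketch below shows one way that translation could look. The mapping from each `--checkm2_*` parameter to a CheckM2 flag is an assumption based on the matching names, not taken from the module's source. `build_checkm2_flags` and `FLAG_MAP` are hypothetical names, and `--force` is only an example of a pass-through string.

```python
import shlex

# Assumed mapping from module parameters to CheckM2 command-line flags.
FLAG_MAP = {
    "checkm2_lowmem": "--lowmem",
    "checkm2_general": "--general",
    "checkm2_specific": "--specific",
    "checkm2_allmodels": "--allmodels",
    "checkm2_genes": "--genes",
}


def build_checkm2_flags(params):
    """Turn a dict of module parameters into a list of CLI arguments.

    Boolean parameters that are truthy add their flag; the free-form
    'checkm2_opts' string is split shell-style and appended unchanged.
    """
    flags = [flag for key, flag in FLAG_MAP.items() if params.get(key)]
    flags += shlex.split(params.get("checkm2_opts") or "")
    return flags


print(build_checkm2_flags({
    "checkm2_lowmem": True,
    "checkm2_general": True,
    "checkm2_opts": "--force",
}))  # prints ['--lowmem', '--general', '--force']
```

Splitting `checkm2_opts` with `shlex` keeps any quoted arguments intact instead of breaking them on spaces.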
Used By
Subworkflows
- checkm2 - Assess metagenome bin completeness using CheckM2.
Workflows
- checkm2 - Machine learning-based assessment of microbial genome assembly quality.
Citations
If you use this in your analysis, please cite the following.
- Bactopia
  Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- CheckM2
  Chklovski A. Rapid assessment of genome bin quality using machine learning (GitHub)
Source
Version
CHECKM2_PREDICT:
- checkm2: 1.1.0