checkm2
Tags: metagenome bin completeness contamination mag quality machine-learning sample-scope
Assess metagenome bin completeness using CheckM2.
This subworkflow evaluates the quality and completeness of metagenome-assembled genomes (MAGs) using CheckM2. CheckM2 estimates completeness and contamination with machine learning models trained on high-quality reference genomes. The workflow can either download the required database or use a user-provided database path.
Take
assembly: Channel<Record>
| Field | Description |
|---|---|
| meta | Groovy Record containing sample information |
| assembly | Metagenome-assembled genome bins to evaluate. Each record also contains the sample metadata. |
database: Path
download_checkm2: Boolean
| Name | Type | Description |
|---|---|---|
| database | Path | Path to the CheckM2 database directory. If download_checkm2 is true, this can be a placeholder because the database is downloaded automatically. |
| download_checkm2 | Boolean | When true, downloads the required CheckM2 reference database before prediction. |
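
The interaction between the two inputs can be sketched in Python as follows. This is an illustration of the behavior described in the table, not the subworkflow's own code: `resolve_database` and the `download` callable (a stand-in for the checkm2_download module) are hypothetical names, and the existence check on a user-provided path is an assumption.

```python
from pathlib import Path
from typing import Callable


def resolve_database(database: Path, download_checkm2: bool,
                     download: Callable[[], Path]) -> Path:
    """Return the database path that prediction should use.

    `download` is a hypothetical stand-in for the checkm2_download
    module; it is expected to return the downloaded database path.
    """
    if download_checkm2:
        # The supplied path is not used, so a placeholder is acceptable.
        return download()
    # Assumption: a user-provided path must already exist on disk.
    if not database.exists():
        raise FileNotFoundError(f"CheckM2 database not found: {database}")
    return database
```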
Emit
Published
The sample_outputs and run_outputs emissions are aggregates of output files that will be published in the entry workflow.
sample_outputs
| Output | Description |
|---|---|
| tsv | A tab-delimited report of quality metrics (Completeness, Contamination) |
| supplemental | Directory containing intermediate protein files and Diamond alignments |
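
As a minimal sketch of how the tsv report might be consumed downstream, the Python below keeps the bins that meet a completeness and contamination cutoff. The `Completeness` and `Contamination` columns come from the description above; the `Name` column, the function name `passing_bins`, the default thresholds, and the example rows are assumptions made for illustration.

```python
import csv
import io


def passing_bins(report, min_completeness=90.0, max_contamination=5.0):
    """Return the names of bins that meet both thresholds.

    `report` is any iterable of lines (an open file works). The `Name`
    column and the default thresholds are assumptions, not CheckM2 defaults.
    """
    reader = csv.DictReader(report, delimiter="\t")
    keep = []
    for row in reader:
        completeness = float(row["Completeness"])
        contamination = float(row["Contamination"])
        if completeness >= min_completeness and contamination <= max_contamination:
            keep.append(row["Name"])
    return keep


# Made-up rows, only to show the expected shape of the input.
example = io.StringIO(
    "Name\tCompleteness\tContamination\n"
    "bin.1\t98.5\t1.2\n"
    "bin.2\t72.0\t0.4\n"
    "bin.3\t95.1\t8.9\n"
)
print(passing_bins(example))  # ['bin.1']
```

The cutoffs are parameters because acceptable quality depends on the analysis; adjust them rather than treating the defaults here as a standard.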
run_outputs
| Output | Description |
|---|---|
| csv | Aggregated results in CSV format |
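
The aggregation step can be pictured with the Python sketch below, which merges several tab-delimited reports into a single CSV with one header. It is a stand-in for the idea only and does not reproduce csvtk: `concat_reports` is a hypothetical name, and it assumes every report shares the same columns.

```python
import csv
import io


def concat_reports(reports, out):
    """Write the rows of several tab-delimited reports to one CSV.

    The header is written once, taken from the first report that has
    rows. Reports are assumed to share the same columns; a report with
    extra columns raises ValueError from csv.DictWriter.
    """
    writer = None
    for report in reports:
        reader = csv.DictReader(report, delimiter="\t")
        for row in reader:
            if writer is None:
                writer = csv.DictWriter(
                    out, fieldnames=reader.fieldnames, lineterminator="\n"
                )
                writer.writeheader()
            writer.writerow(row)


# Made-up per-sample reports, only to show the input and output shapes.
sample_a = io.StringIO("Name\tCompleteness\tContamination\nbin.1\t98.5\t1.2\n")
sample_b = io.StringIO("Name\tCompleteness\tContamination\nbin.2\t72.0\t0.4\n")
merged = io.StringIO()
concat_reports([sample_a, sample_b], merged)
print(merged.getvalue())
```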
Module Composition
This subworkflow calls the following modules:
- csvtk_concat - Concatenate multiple CSV or TSV files into a single table.
- checkm2_predict - Assess genome quality using machine learning.
- checkm2_download - Download the pre-trained CheckM2 database.
Used By
This subworkflow is used by the following workflows:
- checkm2 - Machine learning-based assessment of microbial genome assembly quality.
Citations
If you use this in your analysis, please cite the following.
- Bactopia - Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- CheckM2 - Chklovski A. Rapid assessment of genome bin quality using machine learning (GitHub)