gtdb
Tags: taxonomy classification gtdb phylogeny marker-genes sample-scope
Taxonomic classification with the Genome Taxonomy Database.
This subworkflow assigns objective taxonomic classifications to bacterial and archaeal genomes using GTDB-Tk, which is based on the Genome Taxonomy Database (GTDB). The workflow can optionally download the GTDB database and supports both unpacked and tarball database formats. It provides taxonomic placement and phylogenetic marker gene identification.
Take
assembly: Channel<Record>
| Field | Description |
|---|---|
| meta | Groovy Record containing sample information |
| assembly | Assembly files in FASTA format for taxonomic classification |
database: Path
download_gtdb: Boolean
save_as_tarball: Boolean
| Name | Type | Description |
|---|---|---|
| database | Path | Path to the GTDB reference database, or the path to download it to if download_gtdb is true |
| download_gtdb | Boolean | If true, download the GTDB database when one is not provided |
| save_as_tarball | Boolean | If true, use the tarball-format database when downloading |
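One reading of how the three inputs above combine can be sketched in Python. This is an illustration only, not the subworkflow's implementation: the function names `database_format` and `resolve_database`, the returned dictionary keys, and the list of tarball suffixes are all hypothetical.

```python
from pathlib import Path

# Hypothetical suffixes treated as "tarball" for this illustration only.
TARBALL_SUFFIXES = (".tar", ".tar.gz", ".tgz")


def database_format(database: str) -> str:
    """Classify a database path as 'tarball' or 'unpacked' from its name alone."""
    name = Path(database).name.lower()
    return "tarball" if name.endswith(TARBALL_SUFFIXES) else "unpacked"


def resolve_database(database: str, download_gtdb: bool, save_as_tarball: bool) -> dict:
    """Summarize what the three inputs request (assumed semantics, see lead-in)."""
    if download_gtdb:
        # `database` is the download destination; the flag picks the format.
        fmt = "tarball" if save_as_tarball else "unpacked"
        return {"action": "download", "path": database, "format": fmt}
    # `database` points at an existing database; infer the format from the path.
    return {"action": "use_existing", "path": database, "format": database_format(database)}
```

In this sketch, save_as_tarball only matters when download_gtdb is true; for an existing database the format is inferred from the path itself.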
Emit
Published
The sample_outputs and run_outputs emissions are aggregates of output files that will be published in the entry workflow.
sample_outputs
| Output | Description |
|---|---|
| bac_tsv | The bacterial classification summary file containing the taxonomic assignment |
| ar_tsv | The archaeal classification summary file containing the taxonomic assignment |
| supplemental | Directory containing the reference tree, alignments, and detailed logs |
run_outputs
| Output | Description |
|---|---|
| csv | Aggregated results in CSV format |
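The aggregation of per-sample summary tables into a single CSV can be illustrated with a short Python sketch. It is a stand-in for the idea of concatenating tables, not the csvtk_concat module itself: the function name `concat_tables` is hypothetical, and the column names and rows in the usage example are made up for illustration.

```python
import csv
import io


def concat_tables(tsv_texts):
    """Concatenate tab-separated tables into one CSV string.

    Columns are the union of all input headers, in first-seen order;
    cells missing from a given input are left empty.
    """
    fieldnames, rows = [], []
    for text in tsv_texts:
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)
        rows.extend(reader)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Illustrative inputs only; real summary files carry many more columns.
bac = "user_genome\tclassification\nsampleA\td__Bacteria\n"
ar = "user_genome\tclassification\nsampleB\td__Archaea\n"
combined = concat_tables([bac, ar])
```

Taking the union of headers keeps the sketch tolerant of inputs whose columns differ, at the cost of empty cells where a table lacks a column.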
Module Composition
This subworkflow calls the following modules:
- csvtk_concat - Concatenate multiple CSV or TSV files into a single table.
- gtdbtk_classifywf - Taxonomic classification of bacterial and archaeal genomes using GTDB-Tk.
- gtdbtk_download - Download and configure the GTDB-Tk reference database.
Used By
This subworkflow is used by the following workflows:
- gtdb - Identify marker genes and assign taxonomic classifications using GTDB.
Citations
If you use this in your analysis, please cite the following.
- Bactopia
  Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- GTDB-Tk
  Chaumeil PA, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics (2019)