gtdbtk_classifywf
Tags: taxonomy classification phylogeny gtdb bacteria archaea marker-genes sample-scope
Taxonomic classification of bacterial and archaeal genomes using GTDB-Tk.
Uses GTDB-Tk to assign objective taxonomic classifications to genome assemblies based on the Genome Taxonomy Database. It identifies marker genes, aligns them, and places the genome into the reference tree to determine taxonomy.
Requires the GTDB-Tk reference database (roughly 60 GB or larger) to be available.
Inputs
record (
meta: Record,
fna: Path
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Groovy Record containing sample information |
| fna | Path | Assembled contigs in FASTA format |
db: Path
| Name | Type | Description |
|---|---|---|
| db | Path | Path (or Set of paths) to the GTDB-Tk reference database |
Outputs
record (
meta: Record,
bac_tsv: Path?,
ar_tsv: Path?,
results: Set<Path>,
logs: Set<Path?>,
nf_logs: Set<Path>,
versions: Set<Path>
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Sample information record |
| bac_tsv | Path? | The bacterial classification summary file containing the taxonomic assignment |
| ar_tsv | Path? | The archaeal classification summary file containing the taxonomic assignment |
| results | Set<Path> | All output files to be published |
| logs | Set<Path?> | Optional program-specific log files |
| nf_logs | Set<Path> | Nextflow-specific log files (e.g. .command.{begin |
| versions | Set<Path> | A YAML formatted file with program versions |
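
As a rough sketch of how the bac_tsv or ar_tsv summary might be consumed downstream, the Python below reads a tab-separated summary and maps each genome to its rank assignments. It assumes the file has `user_genome` and `classification` columns, as GTDB-Tk summary files generally do; check the output of your GTDB-Tk version. The function name `read_classifications` and the inline sample row are hypothetical.

```python
import csv
import io


def read_classifications(handle):
    """Map genome name -> list of rank assignments from a GTDB-Tk style summary."""
    reader = csv.DictReader(handle, delimiter="\t")
    result = {}
    for row in reader:
        # Classification strings are semicolon-delimited ranks (d__ ... s__).
        result[row["user_genome"]] = row["classification"].split(";")
    return result


# Hypothetical two-column excerpt; real summary files carry many more columns.
sample = (
    "user_genome\tclassification\n"
    "sample01\td__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;"
    "s__Escherichia coli\n"
)

taxonomy = read_classifications(io.StringIO(sample))
print(taxonomy["sample01"][-1])
```

For a real run, pass an open file handle for the summary TSV instead of the `io.StringIO` sample. Because bac_tsv and ar_tsv are both optional, a caller should expect either file to be absent.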
Parameters
GTDB Classify Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --gtdb_min_af | number | 0.65 | Minimum alignment fraction to consider closest genome |
| --gtdb_min_perc_aa | integer | 10 | Filter genomes with an insufficient percentage of AA in the MSA |
| --force_gtdb | boolean | false | Continue processing if an error occurs on a single genome |
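
To illustrate what the `--gtdb_min_perc_aa` threshold expresses, here is a minimal sketch; it is not GTDB-Tk's actual implementation. It computes the percentage of non-gap positions in one aligned sequence and compares it to the threshold. The function names, the gap characters, the toy sequences, and the choice to keep a genome that sits exactly at the threshold are all assumptions made for illustration.

```python
def percent_aa(aligned_seq, gap_chars="-."):
    """Percentage of alignment columns holding a residue rather than a gap."""
    if not aligned_seq:
        return 0.0
    residues = sum(1 for ch in aligned_seq if ch not in gap_chars)
    return 100.0 * residues / len(aligned_seq)


def passes_min_perc_aa(aligned_seq, min_perc_aa=10):
    # Assumption: a genome is kept when its percentage meets or exceeds the threshold.
    return percent_aa(aligned_seq) >= min_perc_aa


sparse = "M" + "-" * 19   # 1 of 20 columns filled (5%)
dense = "MKT" + "-" * 17  # 3 of 20 columns filled (15%)

print(passes_min_perc_aa(sparse), passes_min_perc_aa(dense))
```

With the default of 10, the sparse toy row would be filtered out and the dense one kept.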
Used By
Subworkflows
- gtdb - Taxonomic classification with the Genome Taxonomy Database.
Workflows
- gtdb - Identify marker genes and assign taxonomic classifications using GTDB.
Citations
If you use this in your analysis, please cite the following.
- Bactopia: Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- GTDB-Tk: Chaumeil PA, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics (2019)
- pplacer: Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11, 538 (2010)
Source
Version
GTDBTK_CLASSIFYWF:
- gtdbtk: 2.7.1