bakta

Tags: bacteria annotation genome functional-annotation taxonomy sample-scope

Rapid bacterial genome annotation.

This subworkflow uses Bakta to provide rapid, comprehensive annotation of bacterial genomes. It can download and prepare the Bakta database on demand or use a pre-existing database. Each sample is processed individually, and the outputs include GFF3, GenBank, and EMBL annotations, protein and nucleotide sequences in FASTA format, and a BLAST database.

Take

assembly: Channel<Record>
Field     Description
meta      Groovy Record containing sample information
assembly  Assembled contigs in FASTA format
database: Path?
download_bakta: Boolean
save_as_tarball: Boolean
proteins: Path?
prodigal_tf: Path?
replicons: Path?
Name             Type     Description
database         Path?    Optional pre-existing Bakta database path
download_bakta   Boolean  Boolean flag to trigger automatic database download
save_as_tarball  Boolean  Boolean flag to save downloaded database as tarball
proteins         Path?    Optional trusted protein sequences for homology search
prodigal_tf      Path?    Optional Prodigal training file for improved gene prediction
replicons        Path?    Optional replicon sequences for plasmid identification
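
How database, download_bakta, and save_as_tarball interact can be pictured as a small decision. The sketch below is an illustration in Python, not the subworkflow's own Nextflow code. The function name resolve_database_action, the returned labels, and the precedence (an explicit download request wins, otherwise a database path must be supplied) are all assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def resolve_database_action(
    database: Optional[Path],
    download_bakta: bool,
    save_as_tarball: bool = False,
) -> str:
    """Return a label describing how the Bakta database would be obtained.

    Assumed precedence (not confirmed by the source): a download request
    wins; otherwise a pre-existing database path is required.
    """
    if download_bakta:
        # save_as_tarball only matters when a download actually happens.
        return "download_as_tarball" if save_as_tarball else "download"
    if database is not None:
        return "use_existing"
    raise ValueError("provide a database path or set download_bakta=True")


if __name__ == "__main__":
    # Hypothetical path, used only to exercise the function.
    print(resolve_database_action(Path("/data/bakta-db"), download_bakta=False))
    print(resolve_database_action(None, download_bakta=True, save_as_tarball=True))
```

Under these assumptions, passing neither a database path nor download_bakta is treated as a configuration error rather than silently falling back to a default.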

Emit

Published

The sample_outputs and run_outputs emissions are aggregates of output files that will be published in the entry workflow.

sample_outputs

Output             Description
embl               Annotations and sequences in EMBL format
faa                CDS/sORF amino acid sequences as FASTA
ffn                Feature nucleotide sequences as FASTA
fna                Replicon/contig DNA sequences as FASTA
gbff               Annotations and sequences in GenBank format
gff                Annotations and sequences in GFF3 format
hypotheticals_tsv  Further information on hypothetical protein CDS as tab-separated values
hypotheticals_faa  Hypothetical protein CDS amino acid sequences as FASTA
tsv                Annotations as simple human readable tab-separated values
txt                Broad summary of Bakta annotations
blastdb            A compressed tar.gz archive of BLAST+ databases of the contigs, genes, and proteins
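
When checking a finished run, it can help to map each sample_outputs key to the file name you expect to find. The Python sketch below assumes Bakta's usual prefix-plus-extension naming with the sample name as the prefix. The blastdb archive name and the absence of any extra compression suffix are assumptions, as is the helper name expected_outputs, so adjust the suffixes to match what your run actually publishes.

```python
from typing import Dict

# Suffix per sample_outputs key. The blastdb entry is an assumed archive name.
SAMPLE_OUTPUT_SUFFIXES: Dict[str, str] = {
    "embl": ".embl",
    "faa": ".faa",
    "ffn": ".ffn",
    "fna": ".fna",
    "gbff": ".gbff",
    "gff": ".gff3",
    "hypotheticals_tsv": ".hypotheticals.tsv",
    "hypotheticals_faa": ".hypotheticals.faa",
    "tsv": ".tsv",
    "txt": ".txt",
    "blastdb": "-blastdb.tar.gz",
}


def expected_outputs(sample: str) -> Dict[str, str]:
    """Build the expected file name for every sample-scope output key."""
    return {key: f"{sample}{suffix}" for key, suffix in SAMPLE_OUTPUT_SUFFIXES.items()}


if __name__ == "__main__":
    # "sampleA" is a made-up sample name.
    for key, name in expected_outputs("sampleA").items():
        print(f"{key}\t{name}")
```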

run_outputs

No run-scope outputs.

Downstream Inputs

The following emissions are meant to be used as inputs to downstream subworkflows.

annotations

Output  Description
fna     Annotated nucleotide sequences in FASTA format
faa     Protein sequences in FASTA format
gff     Annotations in GFF3 format
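
As an illustration of consuming the gff emission downstream, the Python sketch below tallies feature types from GFF3 text. It relies only on the GFF3 format itself (nine tab-separated columns, with the feature type in the third column) and stops at a ##FASTA directive if one is present. The function name count_feature_types and the toy input lines are invented for the example and are not real Bakta output.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(gff_lines: Iterable[str]) -> Counter:
    """Count GFF3 feature types (column 3), ignoring comments and directives."""
    counts: Counter = Counter()
    for raw in gff_lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature rows
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # skip rows that are not valid GFF3 feature lines
        counts[columns[2]] += 1
    return counts


if __name__ == "__main__":
    # Toy lines for demonstration only.
    example = [
        "##gff-version 3",
        "contig_1\texample\tregion\t1\t5000\t.\t+\t.\tID=contig_1",
        "contig_1\texample\tCDS\t10\t400\t.\t+\t0\tID=cds_1",
        "contig_1\texample\tCDS\t450\t900\t.\t-\t0\tID=cds_2",
        "contig_1\texample\ttRNA\t1000\t1075\t.\t+\t.\tID=trna_1",
        "##FASTA",
        ">contig_1",
        "ACGT",
    ]
    for feature_type, n in sorted(count_feature_types(example).items()):
        print(f"{feature_type}\t{n}")
```

To run it on a real file, pass an open file handle (for example, the result of open() on the published GFF3 file) instead of the toy list.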

Module Composition

This subworkflow calls the following modules:

  • bakta_download - Download the Bakta annotation database.
  • bakta_run - Rapid and standardized annotation of bacterial genomes and plasmids.

Used By

This subworkflow is used by the following workflows:

  • bactopia - Comprehensive bacterial analysis pipeline for complete genomic characterization.
  • bakta - Rapid annotation of bacterial genomes and plasmids.
  • staphopia - Comprehensive analysis pipeline for Staphylococcus aureus isolates.

Citations

If you use this in your analysis, please cite the following.
