bakta_run

Tags: bacteria annotation genome assembly prodigal compliant genbank ena sample-scope

Rapid and standardized annotation of bacterial genomes and plasmids.

Uses Bakta to annotate genomes via alignment-free sequence identification. It detects CDS, sORFs, tRNAs, tmRNAs, rRNAs, ncRNAs, and CRISPR arrays, assigning functions from a comprehensive database.

Database Required

Requires a Bakta database (directory or tarball) to be available.

Inputs

record (
meta: Record,
fna: Path
)
Field  Type    Description
meta   Record  Groovy Record containing sample information
fna    Path    Assembled contigs in FASTA format
db: Path
proteins: Path?
prodigal_tf: Path?
replicons: Path?
Name         Type   Description
db           Path   Path to the Bakta database (Directory or compressed tarball)
proteins     Path?  FASTA file of trusted proteins to use for first-pass annotation
prodigal_tf  Path?  Prodigal training file for CDS prediction
replicons    Path?  Table (TSV/CSV) of replicon information for origin detection
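
As an illustration of how the optional inputs might reach the tool, the sketch below assembles a Bakta command line in Python. The flags --db, --proteins, --prodigal-tf, and --replicons are Bakta command-line options; the function name build_bakta_command and the one-to-one mapping of module inputs to flags are assumptions made for this example, not a description of the module's actual script. Unpacking a tarball database is left out.

```python
from pathlib import Path
from typing import List, Optional


def build_bakta_command(
    fna: Path,
    db: Path,
    proteins: Optional[Path] = None,
    prodigal_tf: Optional[Path] = None,
    replicons: Optional[Path] = None,
) -> List[str]:
    """Assemble a Bakta argument list (hypothetical helper).

    Optional inputs (typed Path? above) add a flag only when a value is given.
    """
    cmd = ["bakta", "--db", str(db)]
    optional_flags = [
        ("--proteins", proteins),
        ("--prodigal-tf", prodigal_tf),
        ("--replicons", replicons),
    ]
    for flag, value in optional_flags:
        if value is not None:
            cmd += [flag, str(value)]
    # The assembly FASTA is the positional argument and goes last.
    cmd.append(str(fna))
    return cmd


if __name__ == "__main__":
    # File names here are placeholders.
    print(" ".join(build_bakta_command(Path("sample.fna"), Path("bakta-db"))))
```

Returning a list rather than a single string keeps each argument separate, so the result could be passed to subprocess.run without shell quoting concerns.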

Outputs

record (
meta: Record,
blastdb: Path,
faa: Path,
ffn: Path,
fna: Path,
gbff: Path,
gff: Path,
hypotheticals_tsv: Path,
hypotheticals_faa: Path,
inference_tsv: Path,
json: Path,
png: Path,
svg: Path,
tsv: Path,
txt: Path,
results: Set<Path>,
logs: Set<Path?>,
nf_logs: Set<Path>,
versions: Set<Path>
)
Field              Type        Description
meta               Record      Sample information record
blastdb            Path        A compressed tar.gz archive of BLAST+ databases of the contigs, genes, and proteins
faa                Path        CDS/sORF amino acid sequences as FASTA
ffn                Path        Feature nucleotide sequences as FASTA
fna                Path        Replicon/contig DNA sequences as FASTA
gbff               Path        Annotations and sequences in GenBank format
gff                Path        Annotations and sequences in GFF3 format
hypotheticals_tsv  Path        Further information on hypothetical protein CDS as tab-separated values
hypotheticals_faa  Path        Hypothetical protein CDS amino acid sequences as FASTA
inference_tsv      Path        Detailed annotation evidence and database hit information
json               Path        Machine-readable annotations and metadata in JSON format
png                Path        Circular genome plot as PNG image
svg                Path        Circular genome plot as SVG image
tsv                Path        Annotations as simple human readable tab-separated values
txt                Path        Broad summary of Bakta annotations
results            Set<Path>   All output files to be published
logs               Set<Path?>  Optional program specific log files
nf_logs            Set<Path>   Nextflow-specific log files (e.g. .command.{begin
versions           Set<Path>   A YAML formatted file with program versions
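
For downstream use of the tsv output, a minimal Python sketch is given below that tallies annotated features by type. It assumes that comment and header lines begin with "#" and that the feature type is the second tab-separated column; check both against a real output file before relying on them. The function name count_feature_types and the rows in SAMPLE_TSV are hypothetical and are not taken from an actual run.

```python
import io
from collections import Counter
from typing import Iterable

# Hypothetical rows laid out the way this sketch assumes (not real Bakta output).
SAMPLE_TSV = (
    "# illustrative header comment\n"
    "#Sequence Id\tType\tStart\tStop\tStrand\tLocus Tag\tGene\tProduct\tDbXrefs\n"
    "contig_1\tcds\t1\t900\t+\tEX_00001\tgeneA\texample protein\t\n"
    "contig_1\tcds\t950\t1500\t-\tEX_00002\t\thypothetical protein\t\n"
    "contig_1\ttRNA\t1600\t1675\t+\tEX_00003\t\ttRNA-Ala\t\n"
)


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count rows per feature type, skipping blank and '#' lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) > 1:
            # Assumed layout: column 2 holds the feature type.
            counts[fields[1]] += 1
    return counts


if __name__ == "__main__":
    for feature_type, n in sorted(count_feature_types(io.StringIO(SAMPLE_TSV)).items()):
        print(feature_type, n)
```

The function accepts any iterable of lines, so an open file handle for the real tsv path can be passed in place of the in-memory sample.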

Parameters

Bakta Parameters

Parameter            Type    Default  Description
--bakta_proteins     string           FASTA file of trusted proteins to first annotate from
--bakta_prodigal_tf  string           Training file to use for Prodigal
--bakta_replicons    string           Replicon information table (tsv/csv)

Used By

Subworkflows

  • bakta - Rapid bacterial genome annotation.

Workflows

  • bactopia - Comprehensive bacterial analysis pipeline for complete genomic characterization.
  • bakta - Rapid annotation of bacterial genomes and plasmids.
  • staphopia - Comprehensive analysis pipeline for Staphylococcus aureus isolates.

Citations

If you use this in your analysis, please cite the following.

Source

View source on GitHub

Version

BAKTA_RUN:
- bakta: 1.12.0