bakta_run
Tags: bacteria annotation genome assembly prodigal compliant genbank ena sample-scope
Rapid and standardized annotation of bacterial genomes and plasmids.
Uses Bakta to annotate genomes via alignment-free sequence identification. It detects CDS, sORFs, tRNAs, tmRNAs, rRNAs, ncRNAs, and CRISPR arrays, assigning functions from a comprehensive database.
Requires a Bakta database (directory or tarball) to be available.
Inputs
record (
meta: Record,
fna: Path
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Groovy Record containing sample information |
| fna | Path | Assembled contigs in FASTA format |
db: Path
proteins: Path?
prodigal_tf: Path?
replicons: Path?
| Name | Type | Description |
|---|---|---|
| db | Path | Path to the Bakta database (Directory or compressed tarball) |
| proteins | Path? | FASTA file of trusted proteins to use for first-pass annotation |
| prodigal_tf | Path? | Prodigal training file for CDS prediction |
| replicons | Path? | Table (TSV/CSV) of replicon information for origin detection |
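The optional inputs above (`proteins`, `prodigal_tf`, `replicons`) only affect the Bakta call when they are supplied. The following is a minimal sketch of that pattern, not the module's actual script, which this page does not show. The function name `build_bakta_command` and the `prefix` argument are hypothetical, and the sketch assumes `db` already points at an extracted database directory and that the inputs map onto Bakta's `--db`, `--prefix`, `--proteins`, `--prodigal-tf`, and `--replicons` options.

```python
from pathlib import Path
from typing import List, Optional


def build_bakta_command(
    fna: Path,
    db: Path,
    prefix: str,
    proteins: Optional[Path] = None,
    prodigal_tf: Optional[Path] = None,
    replicons: Optional[Path] = None,
) -> List[str]:
    """Assemble a Bakta argument list, skipping optional inputs that are unset."""
    # Assumption: db is an extracted database directory, not a tarball.
    cmd = ["bakta", "--db", str(db), "--prefix", prefix]

    # Each optional input contributes a flag only when a path was provided.
    optional_flags = {
        "--proteins": proteins,
        "--prodigal-tf": prodigal_tf,
        "--replicons": replicons,
    }
    for flag, value in optional_flags.items():
        if value is not None:
            cmd += [flag, str(value)]

    # The assembly FASTA is passed last as the positional argument.
    cmd.append(str(fna))
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_bakta_command(Path("sample.fna"), Path("bakta_db"), "sample")))
```

The returned list can be handed to `subprocess.run` without shell quoting concerns; leaving an optional input as `None` simply drops its flag from the command.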
Outputs
record (
meta: Record,
blastdb: Path,
faa: Path,
ffn: Path,
fna: Path,
gbff: Path,
gff: Path,
hypotheticals_tsv: Path,
hypotheticals_faa: Path,
inference_tsv: Path,
json: Path,
png: Path,
svg: Path,
tsv: Path,
txt: Path,
results: Set<Path>,
logs: Set<Path?>,
nf_logs: Set<Path>,
versions: Set<Path>
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Sample information record |
| blastdb | Path | A compressed tar.gz archive of BLAST+ databases of the contigs, genes, and proteins |
| faa | Path | CDS/sORF amino acid sequences as FASTA |
| ffn | Path | Feature nucleotide sequences as FASTA |
| fna | Path | Replicon/contig DNA sequences as FASTA |
| gbff | Path | Annotations and sequences in GenBank format |
| gff | Path | Annotations and sequences in GFF3 format |
| hypotheticals_tsv | Path | Further information on hypothetical protein CDS as tab-separated values |
| hypotheticals_faa | Path | Hypothetical protein CDS amino acid sequences as FASTA |
| inference_tsv | Path | Detailed annotation evidence and database hit information |
| json | Path | Machine-readable annotations and metadata in JSON format |
| png | Path | Circular genome plot as PNG image |
| svg | Path | Circular genome plot as SVG image |
| tsv | Path | Annotations as simple human-readable tab-separated values |
| txt | Path | Broad summary of Bakta annotations |
| results | Set<Path> | All output files to be published |
| logs | Set<Path?> | Optional program-specific log files |
| nf_logs | Set<Path> | Nextflow-specific log files (e.g. .command.{begin |
| versions | Set<Path> | A YAML-formatted file with program versions |
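As a rough illustration of working with the `tsv` output, the sketch below counts annotated features per type. It rests on assumptions not stated on this page: that comment and header lines start with `#`, and that the second tab-separated column holds the feature type. The function name `count_feature_types` and the sample rows are made up for the example.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count features per type, skipping blank lines and '#' comment/header lines."""
    data = (line for line in lines if line.strip() and not line.startswith("#"))
    reader = csv.reader(data, delimiter="\t")
    # Assumed layout: column 0 is the sequence ID, column 1 is the feature type.
    return Counter(row[1] for row in reader if len(row) > 1)


if __name__ == "__main__":
    # Made-up rows for illustration; real files come from the `tsv` output.
    sample = io.StringIO(
        "#Sequence Id\tType\tStart\tStop\tStrand\tLocus Tag\tGene\tProduct\n"
        "contig_1\tcds\t1\t300\t+\tTAG_00001\tgeneA\texample product\n"
        "contig_1\ttRNA\t400\t475\t-\tTAG_00002\t\ttRNA-Ala\n"
    )
    for feature_type, count in count_feature_types(sample).items():
        print(feature_type, count)
```

To run it on a real file, pass an open file handle, for example `count_feature_types(open("sample.tsv"))`, where the file name is a placeholder.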
Parameters
Bakta Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --bakta_proteins | string | | FASTA file of trusted proteins to first annotate from |
| --bakta_prodigal_tf | string | | Training file to use for Prodigal |
| --bakta_replicons | string | | Replicon information table (tsv/csv) |
Used By
Subworkflows
- bakta - Rapid bacterial genome annotation.
Workflows
- bactopia - Comprehensive bacterial analysis pipeline for complete genomic characterization.
- bakta - Rapid annotation of bacterial genomes and plasmids.
- staphopia - Comprehensive analysis pipeline for Staphylococcus aureus isolates.
Citations
If you use this in your analysis, please cite the following.
- Bactopia
  Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- Bakta
  Schwengers O, Jelonek L, Dieckmann MA, Beyvers S, Blom J, Goesmann A. Bakta - rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microbial Genomics 7(11) (2021)
- Aragorn
  Laslett D, Canback B. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res. 32(1):11-6 (2004)
- DIAMOND
  Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat. Methods. 12, 59-60 (2015)
- HMMER
  Eddy SR. Accelerated Profile HMM Searches. PLoS Comput. Biol. 7, e1002195 (2011)
- Infernal
  Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29(22), 2933-2935 (2013)
- Prodigal
  Hyatt D, Chen G-L, LoCascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11.1 119 (2010)
Source
Version
BAKTA_RUN:
- bakta: 1.12.0