
prokka

Tags: bacteria annotation genome prokaryote functional-annotation genes sample-scope

Annotate bacterial genomes with functional information.

This subworkflow annotates bacterial assemblies using Prokka. It rapidly calls genes, translates them, and searches them against multiple protein databases to produce comprehensive annotation in various standard formats. Optional protein sequences and Prodigal training files can be provided to improve annotation accuracy.

Take

assembly: Channel<Record>
Field     Description
meta      Groovy Record containing sample information
assembly  Bacterial assembly files in FASTA format to be annotated

proteins: Path?
prodigal_tf: Path?
Name         Type   Description
proteins     Path?  Optional protein sequences for homology search to improve annotation accuracy
prodigal_tf  Path?  Optional Prodigal training file for improved gene prediction accuracy
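
The two optional inputs only change the Prokka call when they are supplied. The following is a minimal sketch of that idea, not the subworkflow's actual implementation: the helper name build_prokka_command and its parameters are hypothetical, and it assumes the optional inputs correspond to Prokka's --proteins and --prodigaltf command-line options.

```python
from pathlib import Path
from typing import List, Optional


def build_prokka_command(
    assembly: Path,
    outdir: Path,
    prefix: str,
    proteins: Optional[Path] = None,
    prodigal_tf: Optional[Path] = None,
) -> List[str]:
    """Assemble a Prokka argument list (hypothetical helper, for illustration only)."""
    cmd = ["prokka", "--outdir", str(outdir), "--prefix", prefix]
    if proteins is not None:
        # Optional protein sequences used for homology search
        cmd += ["--proteins", str(proteins)]
    if prodigal_tf is not None:
        # Optional Prodigal training file used for gene prediction
        cmd += ["--prodigaltf", str(prodigal_tf)]
    # The assembly FASTA is the final positional argument
    cmd.append(str(assembly))
    return cmd


# Usage: with no optional inputs, neither optional flag is added
print(build_prokka_command(Path("sample.fna"), Path("out"), "sample"))
```

Returning a list of arguments rather than a single string keeps the sketch safe to pass to subprocess.run without shell quoting.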

Emit

Published

The sample_outputs and run_outputs emissions are aggregates of output files that will be published in the entry workflow.

sample_outputs

Output   Description
gff      Annotation in GFF3 format, containing both sequences and annotations
gbff     Annotation in GenBank format, containing both sequences and annotations
fna      Nucleotide FASTA file of the input contig sequences
faa      Protein FASTA file of the translated CDS sequences
ffn      Nucleotide FASTA file of all predicted transcripts (CDS, rRNA, tRNA, tmRNA, misc_RNA)
sqn      An ASN.1 format "Sequin" file for submission to GenBank
fsa      Nucleotide FASTA file of the input contig sequences, used by tbl2asn
tbl      Feature Table file for NCBI submission
txt      Summary statistics relating to the annotated features found
tsv      Tab-separated file of all features
blastdb  A compressed tar.gz archive of BLAST+ databases
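
The txt summary is convenient for quick sanity checks across samples. The sketch below assumes the file consists of simple "key: value" lines, which is how Prokka summaries typically look but is not stated on this page. The function name parse_summary and the example values are made up for illustration.

```python
from typing import Dict


def parse_summary(text: str) -> Dict[str, str]:
    """Parse "key: value" lines into a dict (assumed layout, illustrative only)."""
    stats: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons
        key, sep, value = line.partition(":")
        if sep and key.strip():
            stats[key.strip()] = value.strip()
    return stats


# Made-up example content, not real results
example = """organism: Genus species strain
contigs: 12
bases: 2800000
CDS: 2600
tRNA: 58
"""

stats = parse_summary(example)
print(stats["CDS"])
```

Values are kept as strings because not every line in a summary is guaranteed to be numeric.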

run_outputs

No run-scope outputs.

Downstream Inputs

The following emissions are meant to be used as inputs to downstream subworkflows.

annotations

Output  Description
fna     Annotated nucleotide sequences in FASTA format
faa     Protein sequences in FASTA format
gff     Annotations in GFF3 format

gffs

Output  Description
gff     GFF3 annotation file for pangenome analysis
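
Because the gff file carries both annotations and sequences, a downstream consumer may need to separate the two. In GFF3, embedded sequences follow a ##FASTA directive. The sketch below splits a GFF3 string at that directive. The function name split_gff3 and the example record are hypothetical, and the feature line is made up rather than real Prokka output.

```python
from typing import List, Tuple


def split_gff3(text: str) -> Tuple[List[str], str]:
    """Return (feature lines, FASTA text) from GFF3 text with an optional ##FASTA section."""
    features: List[str] = []
    fasta_lines: List[str] = []
    in_fasta = False
    for line in text.splitlines():
        if in_fasta:
            # Everything after the directive is sequence data
            fasta_lines.append(line)
        elif line.strip() == "##FASTA":
            in_fasta = True
        elif line and not line.startswith("#"):
            # Skip blank lines, comments, and other directives
            features.append(line)
    return features, "\n".join(fasta_lines)


# Made-up minimal example with one feature and one contig
example = "\n".join([
    "##gff-version 3",
    "contig_1\texample\tCDS\t1\t9\t.\t+\t0\tID=demo_00001",
    "##FASTA",
    ">contig_1",
    "ATGAAATAA",
])

features, fasta = split_gff3(example)
print(len(features), "feature line(s)")
```

If the input has no ##FASTA directive, the function returns an empty string for the sequence part, so callers can detect annotation-only files.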

Module Composition

This subworkflow calls the following modules:

  • prokka - Annotate prokaryotic genomes.

Used By

This subworkflow is used by the following workflows:

  • bactopia - Comprehensive bacterial analysis pipeline for complete genomic characterization.
  • pangenome - Pangenome analysis with optional core-genome phylogeny.
  • prokka - Rapid whole genome annotation of bacterial, archaeal, and viral genomes.
  • staphopia - Comprehensive analysis pipeline for Staphylococcus aureus isolates.

Citations

If you use this in your analysis, please cite the following.
