prokka

Tags: prokka annotation prokaryotic bacteria genbank gff sample-scope

Annotate prokaryotic genomes.

Uses Prokka to rapidly annotate bacterial, archaeal, and viral genomes, producing standards-compliant output files including GFF3, GenBank, and Sequin.

Inputs

record (
meta: Record,
fna: Path
)
| Field | Type | Description |
| --- | --- | --- |
| meta | Record | Groovy Record containing sample information |
| fna | Path | Assembled contigs in FASTA format |
proteins: Path?
prodigal_tf: Path?
| Name | Type | Description |
| --- | --- | --- |
| proteins | Path? | FASTA file of trusted proteins to first annotate from |
| prodigal_tf | Path? | Training file to use for gene prediction |
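The two optional inputs only matter when they are supplied. As a minimal sketch of how such optional paths could map onto a Prokka command line, the hypothetical helper `build_prokka_args` below adds `--proteins` and `--prodigaltf` only when a value is present. The helper and the file names are illustrative assumptions and are not taken from this module's source.

```python
def build_prokka_args(fna, proteins=None, prodigal_tf=None, coverage=80):
    """Assemble an argument list for Prokka (hypothetical helper, not the module's code)."""
    args = ["prokka", "--coverage", str(coverage)]
    # Optional inputs are passed through only when they were provided.
    if proteins is not None:
        args += ["--proteins", str(proteins)]
    if prodigal_tf is not None:
        args += ["--prodigaltf", str(prodigal_tf)]
    # The contig FASTA is the final positional argument.
    args.append(str(fna))
    return args


# Illustrative file names only.
print(build_prokka_args("contigs.fna"))
print(build_prokka_args("contigs.fna", proteins="trusted.faa"))
```

Returning a list rather than a single string keeps each argument separate, which avoids quoting problems if a path contains spaces.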

Outputs

record (
meta: Record,
gff: Path,
gbff: Path,
fna: Path,
faa: Path,
ffn: Path,
sqn: Path,
fsa: Path,
tbl: Path,
txt: Path,
tsv: Path,
blastdb: Path,
results: Set<Path>,
logs: Set<Path?>,
nf_logs: Set<Path>,
versions: Set<Path>
)
| Field | Type | Description |
| --- | --- | --- |
| meta | Record | Sample information record |
| gff | Path | Annotation in GFF3 format, containing both sequences and annotations |
| gbff | Path | Annotation in GenBank format, containing both sequences and annotations |
| fna | Path | Nucleotide FASTA file of the input contig sequences |
| faa | Path | Protein FASTA file of the translated CDS sequences |
| ffn | Path | Nucleotide FASTA file of all prediction transcripts (CDS, rRNA, tRNA, tmRNA, misc_RNA) |
| sqn | Path | An ASN1 format "Sequin" file for submission to GenBank |
| fsa | Path | Nucleotide FASTA file of the input contig sequences, used by tbl2asn |
| tbl | Path | Feature Table file for NCBI submission |
| txt | Path | Summary statistics relating to the annotated features found |
| tsv | Path | Tab-separated file of all features (locus_tag, ftype, len_bp, gene, EC_number, COG, product) |
| blastdb | Path | A compressed tar.gz archive of BLAST+ databases of the contigs, genes, and proteins |
| results | Set<Path> | All output files to be published |
| logs | Set<Path?> | Optional program specific log files |
| nf_logs | Set<Path> | Nextflow-specific log files (e.g. .command.{begin |
| versions | Set<Path> | A YAML formatted file with program versions |
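The `tsv` output is the easiest of these files to inspect programmatically. The sketch below counts features per type using the `ftype` column named in the table above. The function name `count_feature_types` and the three-row example table are made up for illustration; a real run would read the text of the `tsv` file instead.

```python
import csv
from collections import Counter
from io import StringIO


def count_feature_types(tsv_text):
    """Count rows per feature type in a Prokka-style tab-separated table."""
    reader = csv.DictReader(StringIO(tsv_text), delimiter="\t")
    return Counter(row["ftype"] for row in reader)


# Made-up example rows; only the column names come from the output description.
example = (
    "locus_tag\tftype\tlen_bp\tgene\tEC_number\tCOG\tproduct\n"
    "DEMO_00001\tCDS\t1350\tdnaA\t\t\tChromosomal replication initiator protein DnaA\n"
    "DEMO_00002\tCDS\t1101\tdnaN\t\t\tBeta sliding clamp\n"
    "DEMO_00003\ttRNA\t76\t\t\t\ttRNA-Ala\n"
)
counts = count_feature_types(example)
print(dict(counts))
```

Reading by column name rather than by position means the sketch depends only on the `ftype` header being present, so the other columns can vary.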

Parameters

Prokka Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --prokka_proteins | string | ${projectDir}/data/proteins.faa | FASTA file of trusted proteins to first annotate from |
| --prokka_prodigal_tf | string | | Training file to use for Prodigal |
| --prokka_coverage | integer | 80 | Minimum coverage on query protein |
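To make the `--prokka_coverage` threshold concrete, the following sketch assumes that coverage means the percentage of the query protein's residues spanned by an alignment, using 1-based inclusive coordinates. This page does not say how Prokka computes the value internally, so the function names and the formula are illustrative assumptions.

```python
def query_coverage(q_start, q_end, q_len):
    """Percent of the query covered by an alignment (1-based, inclusive coordinates)."""
    if q_len <= 0:
        raise ValueError("q_len must be positive")
    return 100.0 * (q_end - q_start + 1) / q_len


def passes_coverage(q_start, q_end, q_len, min_coverage=80):
    """True when the alignment covers at least min_coverage percent of the query."""
    return query_coverage(q_start, q_end, q_len) >= min_coverage


# An alignment spanning residues 1-80 of a 100-residue query meets the default of 80.
print(passes_coverage(1, 80, 100))
# One spanning residues 11-80 covers only 70 percent and does not.
print(passes_coverage(11, 80, 100))
```

Under this assumption, raising the threshold keeps only hits that span more of the query protein, while lowering it admits partial matches.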

Used By

Subworkflows

  • prokka - Annotate bacterial genomes with functional information.

Workflows

  • bactopia - Comprehensive bacterial analysis pipeline for complete genomic characterization.
  • pangenome - Pangenome analysis with optional core-genome phylogeny.
  • prokka - Rapid whole genome annotation of bacterial, archaeal, and viral genomes.
  • staphopia - Comprehensive analysis pipeline for Staphylococcus aureus isolates.

Citations

If you use this in your analysis, please cite the following.

Source

View source on GitHub

Version

PROKKA:
- prokka: 1.15.6