prokka
Tags: prokka annotation prokaryotic bacteria genbank gff sample-scope
Annotate prokaryotic genomes.
Uses Prokka to rapidly annotate bacterial, archaeal, and viral genomes, producing standards-compliant output files including GFF3, GenBank, and Sequin.
Inputs
record (
meta: Record,
fna: Path
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Groovy Record containing sample information |
| fna | Path | Assembled contigs in FASTA format |
proteins: Path?
prodigal_tf: Path?
| Name | Type | Description |
|---|---|---|
| proteins | Path? | FASTA file of trusted proteins to annotate from first |
| prodigal_tf | Path? | Training file to use for gene prediction |
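The fna input is a plain FASTA file of assembled contigs, so a quick sanity check before annotation is to count its records and total sequence length. The sketch below is a hypothetical helper (fasta_stats is not part of Bactopia or Prokka). It assumes an uncompressed FASTA file and treats every non-header, non-blank line as sequence.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def fasta_stats(path: Path) -> Tuple[int, int]:
    """Return (record count, total residues) for an uncompressed FASTA file.

    Lines starting with '>' are headers; every other non-blank line is
    counted as sequence. Gzipped input is NOT handled in this sketch.
    """
    records = 0
    residues = 0
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                records += 1
            else:
                residues += len(line)
    return records, residues


if __name__ == "__main__":
    # Write a tiny two-contig FASTA to a temporary file and summarise it.
    with tempfile.TemporaryDirectory() as tmp:
        fna = Path(tmp) / "sample.fna"
        fna.write_text(">contig_1\nACGT\nAC\n>contig_2\nGGG\n")
        print(fasta_stats(fna))  # (2, 9)
```

Wrapped (multi-line) sequences are summed per record, so the result does not depend on the line width used by the assembler.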
Outputs
record (
meta: Record,
gff: Path,
gbff: Path,
fna: Path,
faa: Path,
ffn: Path,
sqn: Path,
fsa: Path,
tbl: Path,
txt: Path,
tsv: Path,
blastdb: Path,
results: Set<Path>,
logs: Set<Path?>,
nf_logs: Set<Path>,
versions: Set<Path>
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Sample information record |
| gff | Path | Annotation in GFF3 format, containing both sequences and annotations |
| gbff | Path | Annotation in GenBank format, containing both sequences and annotations |
| fna | Path | Nucleotide FASTA file of the input contig sequences |
| faa | Path | Protein FASTA file of the translated CDS sequences |
| ffn | Path | Nucleotide FASTA file of all predicted transcripts (CDS, rRNA, tRNA, tmRNA, misc_RNA) |
| sqn | Path | ASN.1-format Sequin file for submission to GenBank |
| fsa | Path | Nucleotide FASTA file of the input contig sequences, used by tbl2asn |
| tbl | Path | Feature table file for NCBI submission |
| txt | Path | Summary statistics of the annotated features |
| tsv | Path | Tab-separated file of all features (locus_tag, ftype, len_bp, gene, EC_number, COG, product) |
| blastdb | Path | Compressed tar.gz archive of BLAST+ databases of the contigs, genes, and proteins |
| results | Set<Path> | All output files to be published |
| logs | Set<Path?> | Optional program-specific log files |
| nf_logs | Set<Path> | Nextflow-specific log files (e.g. .command.{begin...}) |
| versions | Set<Path> | YAML-formatted file with program versions |
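The tsv output lists one feature per row, which makes it convenient for quick per-type tallies (how many CDS, tRNA, rRNA, and so on). The sketch below is a hypothetical helper, not part of Bactopia; it assumes only that the file is tab-separated with a header row containing an ftype column, and ignores every other column.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path


def count_feature_types(tsv_path: Path) -> Counter:
    """Count features per 'ftype' value in a Prokka-style TSV file.

    Assumes a tab-separated file whose first row is a header containing an
    'ftype' column; all other columns are ignored. QUOTE_NONE keeps quote
    characters inside product names from being treated as CSV quoting.
    """
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        # The Counter consumes the reader before the file is closed.
        return Counter(row["ftype"] for row in reader)


if __name__ == "__main__":
    # Made-up toy rows for illustration only (not real Prokka output).
    toy = (
        "locus_tag\tftype\tlen_bp\tgene\tEC_number\tCOG\tproduct\n"
        "TOY_00001\tCDS\t900\t\t\t\thypothetical protein\n"
        "TOY_00002\tCDS\t600\t\t\t\thypothetical protein\n"
        "TOY_00003\ttRNA\t76\t\t\t\ttRNA-Ala\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        tsv = Path(tmp) / "toy.tsv"
        tsv.write_text(toy)
        print(count_feature_types(tsv))
```

Reading columns by header name rather than by position keeps the helper working even if the exact column order differs between Prokka versions.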
Parameters
Prokka Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --prokka_proteins | string | ${projectDir}/data/proteins.faa | FASTA file of trusted proteins to annotate from first |
| --prokka_prodigal_tf | string | | Training file to use for Prodigal |
| --prokka_coverage | integer | 80 | Minimum coverage on query protein |
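The optional inputs and parameters above correspond to Prokka's own command-line options: --proteins for trusted proteins, --prodigaltf for a Prodigal training file, and --coverage for minimum query coverage. The sketch below shows one way such values could be turned into a Prokka argument list, adding the optional flags only when a value is supplied. It is a hypothetical helper, not the module's actual script; the real module may pass additional options, and the 0 to 100 range check is an assumption made here, not a documented Prokka constraint.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def build_prokka_args(
    fna: Path,
    prefix: str,
    outdir: Path,
    proteins: Optional[Path] = None,
    prodigal_tf: Optional[Path] = None,
    coverage: int = 80,
) -> List[str]:
    """Assemble a Prokka argument list (hypothetical sketch).

    Optional inputs are added only when provided, mirroring the nullable
    'Path?' inputs of the module.
    """
    # Assumption: coverage is a percentage, so reject values outside 0-100.
    if not 0 <= coverage <= 100:
        raise ValueError(f"coverage must be between 0 and 100, got {coverage}")
    args = [
        "prokka",
        "--outdir", str(outdir),
        "--prefix", prefix,
        "--coverage", str(coverage),
    ]
    if proteins is not None:
        args += ["--proteins", str(proteins)]
    if prodigal_tf is not None:
        args += ["--prodigaltf", str(prodigal_tf)]
    args.append(str(fna))  # the contigs FASTA is the positional argument
    return args


if __name__ == "__main__":
    cmd = build_prokka_args(
        Path("sample.fna"), "sample", Path("results/sample"),
        proteins=Path("trusted.faa"),
    )
    print(" ".join(shlex.quote(a) for a in cmd))
```

Returning a list rather than a single string avoids shell-quoting problems when paths contain spaces; shlex.quote is used only to display the command.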
Used By
Subworkflows
- prokka - Annotate bacterial genomes with functional information.
Workflows
- bactopia - Comprehensive bacterial analysis pipeline for complete genomic characterization.
- pangenome - Pangenome analysis with optional core-genome phylogeny.
- prokka - Rapid whole genome annotation of bacterial, archaeal, and viral genomes.
- staphopia - Comprehensive analysis pipeline for Staphylococcus aureus isolates.
Citations
If you use this in your analysis, please cite the following.
- Bactopia: Petit III RA, Read TD. Bactopia: a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- Prokka: Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068-2069 (2014)
- Aragorn: Laslett D, Canback B. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res. 32(1), 11-16 (2004)
- Barrnap: Seemann T. Barrnap: Bacterial ribosomal RNA predictor (GitHub)
- CD-HIT: Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658-1659 (2006)
- HMMER: Eddy SR. Accelerated Profile HMM Searches. PLoS Comput. Biol. 7, e1002195 (2011)
- Infernal: Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29(22), 2933-2935 (2013)
- MinCED: Skennerton C. MinCED: Mining CRISPRs in Environmental Datasets (GitHub)
- nhmmer: Wheeler TJ, Eddy SR. nhmmer: DNA homology search with profile HMMs. Bioinformatics 29, 2487-2489 (2013)
- Prodigal: Hyatt D, Chen G-L, LoCascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010)
- RNAmmer: Lagesen K, Hallin P, Rødland EA, Stærfeldt H-H, Rognes T, Ussery DW. RNAmmer: consistent annotation of rRNA genes in genomic sequences. Nucleic Acids Res. 35(9), 3100-3108 (2007)
- SignalP: Petersen TN, Brunak S, von Heijne G, Nielsen H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods 8(10), 785 (2011)
Version
PROKKA:
- prokka: 1.15.6