pangenome
Tags: alignment core-genome pan-genome phylogeny comparative-genomics run-scope
Perform pangenome analysis with optional core-genome phylogeny.
This subworkflow builds a pangenome from GFF3 annotation files using one of three tools: Panaroo (default), PIRATE, or Roary. It produces a core-genome alignment and a gene presence/absence matrix, then calculates pairwise SNP distances from the alignment with snp-dists. The pangenome tool is selected by the Boolean parameters use_pirate and use_roary; Panaroo runs when neither is set.
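The tool selection described above can be sketched in a few lines of Python. This is only an illustration of the branching, not the subworkflow's actual Nextflow code. The function name `select_pangenome_tool` is hypothetical, and the precedence of PIRATE over Roary when both flags are set is an assumption, since the documentation does not say which tool wins in that case.

```python
def select_pangenome_tool(use_pirate: bool = False, use_roary: bool = False) -> str:
    """Return the name of the pangenome tool to run.

    Assumption: PIRATE takes precedence if both flags are set; the
    documentation does not specify the behavior for that case.
    """
    if use_pirate:
        return "pirate"
    if use_roary:
        return "roary"
    # Panaroo is the documented default when neither flag is set.
    return "panaroo"


if __name__ == "__main__":
    print(select_pangenome_tool())                 # neither flag set
    print(select_pangenome_tool(use_roary=True))   # Roary requested
```

In this sketch only one tool is ever returned, which mirrors the description that the workflow executes a single selected pangenome tool per run.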
Take
gff: Channel<Record>
| Field | Description |
|---|---|
| meta | Groovy Record containing sample information |
| gff | Set of GFF3 annotation files from assembled genomes |
use_pirate: Boolean
use_roary: Boolean
| Name | Type | Description |
|---|---|---|
| use_pirate | Boolean | Boolean flag to use PIRATE for pangenome analysis |
| use_roary | Boolean | Boolean flag to use Roary for pangenome analysis |
Emit
Published
The sample_outputs and run_outputs emissions are aggregates of output files that will be published in the entry workflow.
sample_outputs
No sample-scope outputs.
run_outputs
| Output | Description |
|---|---|
| aln | Core-genome alignment in FASTA format |
| csv | Gene presence/absence matrix |
| supplemental | Intermediate files and detailed outputs |
| tsv | Pairwise SNP distance matrix from core-genome alignment |
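To make the `tsv` output concrete, the following is a minimal Python sketch of how a pairwise SNP distance matrix can be derived from an alignment. It is not the snp-dists implementation (snp-dists is a separate command-line tool). The sketch assumes that only positions where both sequences carry A, C, G, or T are counted, so gaps and ambiguous bases are ignored; the function names and the toy alignment are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple

VALID_BASES = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences differ.

    Assumption: a position counts only if both bases are A, C, G, or T.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in VALID_BASES and b in VALID_BASES
    )


def pairwise_snp_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Build a symmetric distance lookup keyed by (sample, sample)."""
    names = list(alignment)
    matrix = {(name, name): 0 for name in names}
    for a, b in combinations(names, 2):
        dist = snp_distance(alignment[a], alignment[b])
        matrix[(a, b)] = dist
        matrix[(b, a)] = dist
    return matrix


# Hypothetical three-sample alignment, for illustration only.
toy_alignment = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACG-NCGA"}
distances = pairwise_snp_matrix(toy_alignment)
```

Here `s1` and `s2` differ at one position, while `s2` and `s3` have a distance of zero because their only mismatches involve a gap or an `N`.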
Downstream Inputs
The following emissions are meant to be used as inputs to downstream subworkflows.
alignment
| Output | Description |
|---|---|
| aln | Core-genome alignment for downstream analysis (e.g., recombination detection) |
phylogeny_input
| Output | Description |
|---|---|
| aln | Core-genome alignment with iqtree-ready meta for phylogeny construction |
csv
| Output | Description |
|---|---|
| csv | Gene presence/absence matrix for downstream analysis (e.g., pan-GWAS) |
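As an illustration of how a downstream step might consume a gene presence/absence matrix, the Python sketch below lists genes present in every sample. The layout is a simplified assumption: one `Gene` column followed by one column per sample, with a non-empty cell meaning the gene is present. The files written by Panaroo, PIRATE, and Roary may carry additional annotation columns, so the column handling here would need adjusting for real output. The function name and the toy data are hypothetical.

```python
import csv
import io
from typing import List


def core_genes(csv_text: str, gene_column: str = "Gene") -> List[str]:
    """Return genes with a non-empty entry in every sample column.

    Assumption: every column other than `gene_column` is a sample column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    samples = [col for col in reader.fieldnames if col != gene_column]
    core = []
    for row in reader:
        # Treat missing or blank cells as "gene absent in this sample".
        if all((row[sample] or "").strip() for sample in samples):
            core.append(row[gene_column])
    return core


# Hypothetical matrix, for illustration only.
toy_csv = """Gene,sampleA,sampleB,sampleC
gyrA,a_001,b_001,c_001
mecA,a_002,,c_002
rpoB,a_003,b_003,c_003
"""
core = core_genes(toy_csv)
```

In the toy matrix, `mecA` is missing from `sampleB`, so only `gyrA` and `rpoB` are reported as core genes.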
Subworkflow Composition
This subworkflow calls the following subworkflows:
- pirate - Build a pangenome from GFF3 annotations using PIRATE.
- roary - Build a pangenome from GFF3 annotations using Roary.
- panaroo - Build a pangenome from GFF3 annotations using Panaroo.
- snpdists - Calculate pairwise SNP distances from sequence alignments.
Used By
This subworkflow is used by the following workflows:
- pangenome - Pangenome analysis with optional core-genome phylogeny.
Citations
If you use this in your analysis, please cite the following.
- Bactopia
  Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- PIRATE
  Bayliss SC, Thorpe HA, Coyle NM, Sheppard SK, Feil EJ. PIRATE: A fast and scalable pangenomics toolbox for clustering diverged orthologues in bacteria. Gigascience 8 (2019)
- Panaroo
  Tonkin-Hill G, MacAlasdair N, Ruis C, Weimann A, Horesh G, Lees JA, Gladstone RA, Lo S, Beaudoin C, Floto RA, Frost SDW, Corander J, Bentley SD, Parkhill J. Producing polished prokaryotic pangenomes with the Panaroo pipeline. Genome Biology 21(1), 180 (2020)
- Roary
  Page AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MTG, Fookes M, Falush D, Keane JA, Parkhill J. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics 31, 3691-3693 (2015)
- snp-dists
  Seemann T. snp-dists - Pairwise SNP distance matrix from a FASTA sequence alignment. (GitHub)