ncbigenomedownload
Tags: download ncbi refseq genome assembly database sample-scope
Download bacterial genomes from NCBI's RefSeq database.
This subworkflow downloads complete and draft bacterial genomes using the ncbi-genome-download tool. It fetches genome assemblies in several formats, including GenBank, GFF, and FASTA, along with the associated annotation files and assembly statistics.
Take
accessions: Path?
| Name | Type | Description |
|---|---|---|
| accessions | Path? | A file containing NCBI accession numbers, one per line. If empty, all genomes matching the specified criteria are downloaded. |
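To illustrate the one-accession-per-line input described above, here is a minimal Python sketch that reads such a file before it is handed to the subworkflow. The function name `read_accessions` is hypothetical, and the pattern assumes assembly-style accessions such as `GCF_000005845.2`; the subworkflow itself may accept other accession forms.

```python
import re
from pathlib import Path

# Assumption: accessions are NCBI assembly accessions (GCF_/GCA_ prefix,
# nine digits, a dot, and a version number), e.g. GCF_000005845.2.
ACCESSION_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")


def read_accessions(path):
    """Return (valid, rejected) accessions from a one-per-line text file.

    Blank lines and repeated entries are skipped; order is preserved.
    """
    valid, rejected = [], []
    seen = set()
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line in seen:
            continue
        seen.add(line)
        if ACCESSION_RE.match(line):
            valid.append(line)
        else:
            rejected.append(line)
    return valid, rejected
```

Checking the file up front makes it easier to spot a malformed line before any download starts, rather than discovering it from a failed run.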
Emit
Published
The sample_outputs and run_outputs emissions are aggregates of output files that will be published in the entry workflow.
sample_outputs
| Output | Description |
|---|---|
| gbff | GenBank format genome sequences |
| fna | Genomic nucleotide sequences in FASTA format |
| gff | Genome annotations in GFF3 format |
| faa | Protein sequences in FASTA format |
| gpff | Protein sequences in GenPept format |
| wgs_gbk | WGS master records in GenBank format |
| cds | CDS nucleotide sequences in FASTA format |
| rna | RNA product sequences in FASTA format |
| rna_fna | RNA feature nucleotide sequences in FASTA format |
| features | Feature table with locations and attributes |
| rm | RepeatMasker output (optional) |
| report | Assembly report with unit and sequence relationships |
| stats | Assembly statistics |
| accessions | Generated accession list files |
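The output names in the table above can be related to downloaded files by filename suffix. The Python sketch below assumes the suffixes used on the NCBI FTP site (for example `_genomic.fna.gz`); the mapping and the `classify` helper are illustrative and are not taken from the module's source. The `accessions` output is left out because the name of the generated list file is not stated here.

```python
# Assumed NCBI FTP filename suffix for each output name in the table.
SUFFIXES = {
    "gbff": "_genomic.gbff.gz",
    "fna": "_genomic.fna.gz",
    "gff": "_genomic.gff.gz",
    "faa": "_protein.faa.gz",
    "gpff": "_protein.gpff.gz",
    "wgs_gbk": "_wgsmaster.gbff.gz",
    "cds": "_cds_from_genomic.fna.gz",
    "rna": "_rna.fna.gz",
    "rna_fna": "_rna_from_genomic.fna.gz",
    "features": "_feature_table.txt.gz",
    "rm": "_rm.out.gz",
    "report": "_assembly_report.txt",
    "stats": "_assembly_stats.txt",
}

# Longest suffix first: "_cds_from_genomic.fna.gz" also ends with
# "_genomic.fna.gz", so the more specific suffix must be tested earlier.
ORDERED = sorted(SUFFIXES.items(), key=lambda kv: len(kv[1]), reverse=True)


def classify(filename):
    """Return the output name for a downloaded file, or None if unknown."""
    for name, suffix in ORDERED:
        if filename.endswith(suffix):
            return name
    return None
```

Sorting by suffix length matters here: without it, CDS and RNA feature files would be misclassified as plain genomic FASTA.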
run_outputs
No run-scope outputs.
Downstream Inputs
The following emissions are meant to be used as inputs to downstream subworkflows.
bactopia_tools
Downloaded files formatted for Bactopia Tools workflows
assemblies
| Output | Description |
|---|---|
| fna | Individual downloaded assembly in FASTA format |
reference
First downloaded assembly file for use as a reference genome
Module Composition
This subworkflow calls the following modules:
- ncbigenomedownload - Download assemblies and annotation files from NCBI's Assembly database.
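For orientation, the following Python sketch assembles an argument list for the underlying `ncbi-genome-download` command line tool. It is not the module's actual invocation: the helper `build_command` and its defaults are hypothetical, and only the tool's documented `--section`, `--formats`, and `--assembly-accessions` options and the positional taxonomic group are assumed.

```python
def build_command(accessions_file=None, formats=("fasta",),
                  section="refseq", group="bacteria"):
    """Build an argv list for ncbi-genome-download (illustrative only)."""
    cmd = [
        "ncbi-genome-download",
        "--section", section,            # refseq or genbank
        "--formats", ",".join(formats),  # comma-separated format names
    ]
    if accessions_file:
        # The tool accepts a path to a file with one accession per line.
        cmd += ["--assembly-accessions", str(accessions_file)]
    cmd.append(group)                    # taxonomic group comes last
    return cmd
```

The returned list could be passed to `subprocess.run`; building it as a list rather than a string avoids shell quoting issues with file paths.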
Used By
This subworkflow is used by the following workflows:
- fastani - Fast alignment-free computation of whole-genome Average Nucleotide Identity.
- mashtree - Rapid phylogenetic tree construction using Mash distances.
- pangenome - Pangenome analysis with optional core-genome phylogeny.
- snippy - Rapid haplotype variant calling and core genome alignment.
Citations
If you use this in your analysis, please cite the following.
- Bactopia
  Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- ncbi-genome-download
  Blin K. ncbi-genome-download: Scripts to download genomes from the NCBI FTP servers (GitHub)