
ncbigenomedownload

Tags: download ncbi refseq genome assembly database sample-scope

Download bacterial genomes from NCBI's RefSeq database.

This subworkflow downloads complete and draft bacterial genomes with the ncbi-genome-download tool. It fetches genome assemblies in several formats, including GenBank, GFF, and FASTA, along with associated annotation files and assembly statistics.

Take

accessions: Path?
Name | Type | Description
-----|------|------------
accessions | Path? | A file containing NCBI accession numbers, one per line. If empty, all genomes matching the specified criteria are downloaded.
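The accession file is plain text with one accession per line, which is also a form ncbi-genome-download's `--assembly-accessions` option accepts. The sketch below is a hypothetical pre-flight check; `parse_accessions` and `read_accessions` are not part of Bactopia. It assumes the file lists NCBI assembly accessions such as `GCF_000005845.2`, skips blank lines, and rejects anything else, which may be stricter than the subworkflow itself.

```python
import re
from pathlib import Path
from typing import List, Optional

# Assumption: the file lists NCBI assembly accessions, i.e. GCF_ (RefSeq) or
# GCA_ (GenBank), nine digits, a dot, and a version number.
ACCESSION_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")


def parse_accessions(text: str) -> List[str]:
    """Parse one accession per line, skipping blank lines (hypothetical helper)."""
    accessions = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        value = line.strip()
        if not value:
            continue  # tolerate blank lines, e.g. a trailing newline
        if not ACCESSION_RE.match(value):
            raise ValueError(f"line {lineno}: not an assembly accession: {value!r}")
        accessions.append(value)
    return accessions


def read_accessions(path: Optional[Path]) -> List[str]:
    """Read an accession file; None means no file was given (hypothetical helper)."""
    if path is None:
        return []  # no filter: every genome matching the other criteria is eligible
    return parse_accessions(Path(path).read_text())
```

For example, `parse_accessions("GCF_000005845.2\n")` returns `["GCF_000005845.2"]`, while a nucleotide accession such as `NC_000913.3` raises `ValueError`.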

Emit

Published

The sample_outputs and run_outputs emissions are aggregates of output files that will be published in the entry workflow.

sample_outputs

Output | Description
-------|------------
gbff | GenBank format genome sequences
fna | Genomic nucleotide sequences in FASTA format
gff | Genome annotations in GFF3 format
faa | Protein sequences in FASTA format
gpff | Protein sequences in GenPept format
wgs_gbk | WGS master records in GenBank format
cds | CDS nucleotide sequences in FASTA format
rna | RNA product sequences in FASTA format
rna_fna | RNA feature nucleotide sequences in FASTA format
features | Feature table with locations and attributes
rm | RepeatMasker output (optional)
report | Assembly report with unit and sequence relationships
stats | Assembly statistics
accessions | Generated accession list files
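Most of these outputs correspond to NCBI's standard assembly file suffixes. The sketch below sorts downloaded file names into the output names listed above; the suffix table is an assumption based on NCBI's usual file naming, not the subworkflow's actual matching logic, and the generated `accessions` lists are not covered. Longer suffixes are checked first because `_cds_from_genomic.fna.gz` and `_rna_from_genomic.fna.gz` both also end with `_genomic.fna.gz`.

```python
from typing import Dict, List, Optional

# Assumed mapping from this page's output names to NCBI's standard assembly
# file suffixes; the subworkflow's own matching rules may differ.
SUFFIXES: Dict[str, str] = {
    "gbff": "_genomic.gbff.gz",
    "fna": "_genomic.fna.gz",
    "gff": "_genomic.gff.gz",
    "faa": "_protein.faa.gz",
    "gpff": "_protein.gpff.gz",
    "wgs_gbk": "_wgsmaster.gbff.gz",
    "cds": "_cds_from_genomic.fna.gz",
    "rna": "_rna.fna.gz",
    "rna_fna": "_rna_from_genomic.fna.gz",
    "features": "_feature_table.txt.gz",
    "rm": "_rm.out.gz",
    "report": "_assembly_report.txt",
    "stats": "_assembly_stats.txt",
}

# Check longer suffixes first so "_cds_from_genomic.fna.gz" is not
# misfiled under the shorter "_genomic.fna.gz".
_ORDERED = sorted(SUFFIXES.items(), key=lambda item: len(item[1]), reverse=True)


def classify(filename: str) -> Optional[str]:
    """Return the output name for a downloaded file, or None if unrecognised."""
    for name, suffix in _ORDERED:
        if filename.endswith(suffix):
            return name
    return None


def group_outputs(filenames: List[str]) -> Dict[str, List[str]]:
    """Group file names by output name, skipping files that match no suffix."""
    groups: Dict[str, List[str]] = {}
    for filename in filenames:
        name = classify(filename)
        if name is not None:
            groups.setdefault(name, []).append(filename)
    return groups
```

Sorting by suffix length is a simple way to get longest-match behaviour without hand-ordering the table.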

run_outputs

No run-scope outputs.

Downstream Inputs

The following emissions are meant to be used as inputs to downstream subworkflows.

bactopia_tools

Downloaded files formatted for Bactopia Tools workflows

assemblies

Output | Description
-------|------------
fna | Individual downloaded assembly in FASTA format

reference

First downloaded assembly file for use as a reference genome
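The two downstream emissions differ only in how many assemblies they carry: `assemblies` yields every downloaded FASTA, while `reference` yields just the first one. The sketch below mirrors that split in plain Python. The naming rule (keeping the leading accession of an NCBI file name) and both function names are hypothetical, and list order stands in for download order, which a real pipeline may not guarantee.

```python
from pathlib import PurePath
from typing import List, Optional, Tuple


def assembly_name(path: str) -> str:
    """Hypothetical naming rule: keep the leading accession of an NCBI file name."""
    # "GCF_000005845.2_ASM584v2_genomic.fna.gz" -> "GCF_000005845.2"
    return "_".join(PurePath(path).name.split("_")[:2])


def split_downstream(fna_files: List[str]) -> Tuple[List[Tuple[str, str]], Optional[str]]:
    """Return (name, path) pairs for every assembly plus the reference path.

    The reference is simply the first file in the list; input order is
    treated as download order here.
    """
    assemblies = [(assembly_name(path), path) for path in fna_files]
    reference = fna_files[0] if fna_files else None
    return assemblies, reference
```

If reproducible reference selection matters, sorting `fna_files` before the call is one option, at the cost of no longer matching "first downloaded".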

Module Composition

This subworkflow calls the following modules:

  • ncbigenomedownload - Download assemblies and annotation files from NCBI's Assembly database.
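The module wraps the ncbi-genome-download command-line tool. The exact arguments Bactopia passes are not given on this page, so the sketch below only assembles a plausible invocation; the flags come from ncbi-genome-download's command-line help (check `ncbi-genome-download --help` for your installed version), and `build_ngd_command` and its defaults are hypothetical. When no accession file is supplied, the filter is left off, matching the "download all genomes matching the criteria" behaviour described for the `accessions` input.

```python
from typing import List, Optional, Sequence


def build_ngd_command(
    formats: Sequence[str],
    accessions_file: Optional[str] = None,
    section: str = "refseq",
    assembly_levels: str = "all",
    output_dir: str = "ncbi-genomes",
) -> List[str]:
    """Assemble an ncbi-genome-download argument list (hypothetical helper)."""
    cmd = [
        "ncbi-genome-download",
        "--section", section,                  # refseq or genbank
        "--formats", ",".join(formats),        # e.g. fasta,genbank,gff
        "--assembly-levels", assembly_levels,  # "all" keeps complete and draft genomes
        "--output-folder", output_dir,
        "--flat-output",                       # write files without nested subfolders
    ]
    if accessions_file is not None:
        # Restrict downloads to the accessions listed one per line in this file.
        cmd += ["--assembly-accessions", accessions_file]
    cmd.append("bacteria")  # positional taxonomic group
    return cmd
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run`; `shlex.join(build_ngd_command(["fasta", "genbank"], "accessions.txt"))` renders it as a shell line for inspection.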

Used By

This subworkflow is used by the following workflows:

  • fastani - Fast alignment-free computation of whole-genome Average Nucleotide Identity.
  • mashtree - Rapid phylogenetic tree construction using Mash distances.
  • pangenome - Pangenome analysis with optional core-genome phylogeny.
  • snippy - Rapid haplotype variant calling and core genome alignment.

Citations

If you use this in your analysis, please cite the following.
