ncbigenomedownload

Tags: ncbi download genome assembly fasta genbank utility run-scope

Download assemblies and annotation files from NCBI's Assembly database.

Uses ncbi-genome-download to fetch one or more complete genome assemblies, together with their annotation and report files, from the NCBI FTP site. Assemblies are selected by accession number, species name, or assembly ID.
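
For orientation, here is a minimal sketch of how a command line for the underlying ncbi-genome-download tool could be assembled for a set of accessions. The helper name `build_command`, the `bacteria` group, the `downloads` output folder, and the two example accessions are assumptions made for illustration; this is not necessarily how the module itself invokes the tool.

```python
import shlex


def build_command(accessions, formats="fasta", out_dir="downloads", group="bacteria"):
    """Build an argv list for ncbi-genome-download (hypothetical helper)."""
    if not accessions:
        raise ValueError("at least one accession is required")
    return [
        "ncbi-genome-download",
        # The tool accepts a comma-separated list of assembly accessions.
        "--assembly-accessions", ",".join(accessions),
        "--formats", formats,
        "--output-folder", out_dir,
        # Taxonomic group to search; "bacteria" is an assumption here.
        group,
    ]


# Example accessions chosen for illustration only.
cmd = build_command(["GCF_000005845.2", "GCF_000006945.2"])
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps it safe to pass to `subprocess.run` without shell quoting concerns.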

Inputs

accessions: Path?
| Name | Type | Description |
| --- | --- | --- |
| accessions | Path? | A path to a text file containing a list of NCBI Assembly accession numbers (one per line) |
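
The accession file described above is a plain text file with one accession per line. A minimal sketch of producing such a file follows. The helper name `write_accession_file` is hypothetical, and the regular expression assumes versioned accessions of the form `GCA_` or `GCF_` followed by nine digits and a version number; relax it if your list uses another form.

```python
import re
import tempfile
from pathlib import Path

# Assumed shape of a versioned NCBI Assembly accession, e.g. GCF_000005845.2
ACCESSION_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")


def write_accession_file(accessions, path):
    """Write accessions to `path`, one per line, skipping blanks and duplicates."""
    kept = []
    for raw in accessions:
        acc = raw.strip()
        if not acc or acc in kept:
            continue
        if not ACCESSION_RE.match(acc):
            raise ValueError(f"not an assembly accession: {acc!r}")
        kept.append(acc)
    Path(path).write_text("\n".join(kept) + "\n")
    return kept


# Example accessions chosen for illustration only.
out_file = Path(tempfile.mkdtemp()) / "accessions.txt"
written = write_accession_file(
    ["GCF_000005845.2", " GCA_000005845.2 ", "", "GCF_000005845.2"], out_file
)
```

Validating before writing means a malformed entry fails early, before any download is attempted.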

Outputs

record (
meta: Record,
gbff: Set<Path?>,
fna: Set<Path?>,
rm: Set<Path?>,
features: Set<Path?>,
gff: Set<Path?>,
faa: Set<Path?>,
gpff: Set<Path?>,
wgs_gbk: Set<Path?>,
cds: Set<Path?>,
rna: Set<Path?>,
rna_fna: Set<Path?>,
report: Set<Path?>,
stats: Set<Path?>,
accessions: Set<Path?>,
results: Set<Path>,
logs: Set<Path?>,
nf_logs: Set<Path>,
versions: Set<Path>
)
| Field | Type | Description |
| --- | --- | --- |
| meta | Record | Sample information record |
| gbff | Set<Path?> | GenBank format of the genomic sequence(s) (*_genomic.gbff.gz) |
| fna | Set<Path?> | FASTA format of the genomic nucleotide sequence(s) (*_genomic.fna.gz) |
| rm | Set<Path?> | RepeatMasker output for eukaryotes |
| features | Set<Path?> | Tab-delimited text file reporting locations and attributes for a subset of features |
| gff | Set<Path?> | Annotation of the genomic sequence(s) in GFF3 format (*_genomic.gff.gz) |
| faa | Set<Path?> | FASTA format of the accessioned protein products (*_protein.faa.gz) |
| gpff | Set<Path?> | GenPept format of the accessioned protein products |
| wgs_gbk | Set<Path?> | GenBank flat file format of the WGS master |
| cds | Set<Path?> | FASTA format of the nucleotide sequences corresponding to all CDS features |
| rna | Set<Path?> | FASTA format of accessioned RNA products |
| rna_fna | Set<Path?> | FASTA format of the nucleotide sequences corresponding to all RNA features |
| report | Set<Path?> | Tab-delimited text file reporting assembly unit names, roles, and relationships |
| stats | Set<Path?> | Tab-delimited text file reporting assembly statistics |
| accessions | Set<Path?> | The generated accession list files |
| results | Set<Path> | All output files to be published |
| logs | Set<Path?> | Optional program specific log files |
| nf_logs | Set<Path> | Nextflow-specific log files (e.g. .command.{begin |
| versions | Set<Path> | A YAML formatted file with program versions |
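
Several of the field descriptions above quote a file name pattern. As a rough sketch of how downloaded files could be sorted into those fields, the snippet below matches file names against only the four patterns given in the descriptions. The helper name `classify` is hypothetical, and the exclusion of `_from_genomic` names is an assumption based on how NCBI names its CDS and RNA FASTA files; fields without a quoted pattern are deliberately left out.

```python
from fnmatch import fnmatchcase

# Only the patterns quoted in the field descriptions; other fields are omitted.
FIELD_PATTERNS = {
    "gbff": "*_genomic.gbff.gz",
    "fna": "*_genomic.fna.gz",
    "gff": "*_genomic.gff.gz",
    "faa": "*_protein.faa.gz",
}


def classify(filename):
    """Return the output field a file name falls under, or None if unmatched."""
    # Assumption: CDS/RNA FASTA files end in *_from_genomic.fna.gz and would
    # otherwise be mistaken for the genomic FASTA, so they are skipped here.
    if "_from_genomic." in filename:
        return None
    for field, pattern in FIELD_PATTERNS.items():
        if fnmatchcase(filename, pattern):
            return field
    return None


# Example file names chosen for illustration only.
fields = {
    name: classify(name)
    for name in [
        "GCF_000005845.2_ASM584v2_genomic.fna.gz",
        "GCF_000005845.2_ASM584v2_protein.faa.gz",
        "GCF_000005845.2_ASM584v2_cds_from_genomic.fna.gz",
    ]
}
```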

Parameters

NCBI Genome Download Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --species | string | | Name of the species to download assemblies for |
| --accession | string | | An NCBI Assembly accession to be downloaded |
| --accessions | string | | A file of NCBI Assembly accessions (one per line) to be downloaded |
| --format | string | fasta | Comma-separated list of formats to download |
| --limit | string | | Limit the number of assemblies to download |
| --keep_downloads | boolean | false | Save downloaded files into the bactopia-runs folder |
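
Because `--format` takes a comma-separated list and `--limit` is passed as a string, it can help to normalize both before use. The sketch below shows one way to do that. The helper names `parse_formats` and `parse_limit` are hypothetical, the `genbank` value is used only as an example format name, and treating `--limit` as a positive integer is an assumption about how the value is interpreted.

```python
def parse_formats(value="fasta"):
    """Split a comma-separated --format value into a clean, de-duplicated list."""
    formats = []
    for item in value.split(","):
        item = item.strip()
        if item and item not in formats:
            formats.append(item)
    if not formats:
        raise ValueError("--format must name at least one format")
    return formats


def parse_limit(value=None):
    """Convert an optional --limit string to a positive int, or None when unset."""
    if value is None or value == "":
        return None
    limit = int(value)  # raises ValueError on non-numeric input
    if limit < 1:
        raise ValueError("--limit must be a positive integer")
    return limit


formats = parse_formats("fasta, genbank,fasta")
limit = parse_limit("10")
```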

Used By

Subworkflows

Workflows

  • fastani - Fast alignment-free computation of whole-genome Average Nucleotide Identity.
  • mashtree - Rapid phylogenetic tree construction using Mash distances.
  • pangenome - Pangenome analysis with optional core-genome phylogeny.
  • snippy - Rapid haplotype variant calling and core genome alignment.

Citations

If you use this in your analysis, please cite the following.

Version

NCBIGENOMEDOWNLOAD:
- ncbi-genome-download: 0.3.3