ncbigenomedownload
Tags: ncbi download genome assembly fasta genbank utility run-scope
Download assemblies and annotation files from NCBI's Assembly database.
Uses ncbi-genome-download to efficiently fetch one or more complete genome assemblies and their associated annotation and report files from the NCBI FTP site based on accession numbers, species name, or assembly ID.
Inputs
accessions: Path?
| Name | Type | Description |
|---|---|---|
| accessions | Path? | A path to a text file containing a list of NCBI Assembly accession numbers (one per line) |
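The one-accession-per-line file described above can be checked before it is handed to the module. The following is a minimal sketch, not part of the module itself: the function name `read_accessions`, the rule of skipping blank lines and `#` comments, and the accession pattern (a `GCA_` or `GCF_` prefix, nine digits, a dot, and a version number) are assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed accession shape: GCA_/GCF_ prefix, nine digits, ".", version number.
ACCESSION_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")


def read_accessions(path):
    """Return (valid, invalid) accession lists from a one-per-line text file.

    Hypothetical helper: blank lines and lines starting with '#' are skipped,
    which is an assumption about how such a file might be annotated.
    """
    valid, invalid = [], []
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Sort each entry by whether it matches the assumed accession shape.
        (valid if ACCESSION_RE.match(line) else invalid).append(line)
    return valid, invalid
```

Calling `read_accessions("accessions.txt")` returns the entries that match the assumed pattern and those that do not, so malformed lines can be reported before any download starts.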
Outputs
record (
meta: Record,
gbff: Set<Path?>,
fna: Set<Path?>,
rm: Set<Path?>,
features: Set<Path?>,
gff: Set<Path?>,
faa: Set<Path?>,
gpff: Set<Path?>,
wgs_gbk: Set<Path?>,
cds: Set<Path?>,
rna: Set<Path?>,
rna_fna: Set<Path?>,
report: Set<Path?>,
stats: Set<Path?>,
accessions: Set<Path?>,
results: Set<Path>,
logs: Set<Path?>,
nf_logs: Set<Path>,
versions: Set<Path>
)
| Field | Type | Description |
|---|---|---|
| meta | Record | Sample information record |
| gbff | Set<Path?> | GenBank format of the genomic sequence(s) (*_genomic.gbff.gz) |
| fna | Set<Path?> | FASTA format of the genomic nucleotide sequence(s) (*_genomic.fna.gz) |
| rm | Set<Path?> | RepeatMasker output for eukaryotes |
| features | Set<Path?> | Tab-delimited text file reporting locations and attributes for a subset of features |
| gff | Set<Path?> | Annotation of the genomic sequence(s) in GFF3 format (*_genomic.gff.gz) |
| faa | Set<Path?> | FASTA format of the accessioned protein products (*_protein.faa.gz) |
| gpff | Set<Path?> | GenPept format of the accessioned protein products |
| wgs_gbk | Set<Path?> | GenBank flat file format of the WGS master |
| cds | Set<Path?> | FASTA format of the nucleotide sequences corresponding to all CDS features |
| rna | Set<Path?> | FASTA format of accessioned RNA products |
| rna_fna | Set<Path?> | FASTA format of the nucleotide sequences corresponding to all RNA features |
| report | Set<Path?> | Tab-delimited text file reporting assembly unit names, roles, and relationships |
| stats | Set<Path?> | Tab-delimited text file reporting assembly statistics |
| accessions | Set<Path?> | The generated accession list files |
| results | Set<Path> | All output files to be published |
| logs | Set<Path?> | Optional program-specific log files |
| nf_logs | Set<Path> | Nextflow-specific log files (e.g. .command.{begin |
| versions | Set<Path> | A YAML formatted file with program versions |
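The file name patterns quoted in the table can be used to sort downloaded files into their output fields. This is a rough sketch under stated assumptions: only the four patterns given above are covered, `classify` is a hypothetical helper, and the exclusion of `*_from_genomic.fna.gz` rests on the assumption that NCBI names its CDS and RNA feature FASTA files that way (they would otherwise match the plain `fna` pattern).

```python
from fnmatch import fnmatch

# Patterns quoted in the outputs table; fields without a quoted pattern are omitted.
FIELD_PATTERNS = {
    "gbff": "*_genomic.gbff.gz",
    "fna": "*_genomic.fna.gz",
    "gff": "*_genomic.gff.gz",
    "faa": "*_protein.faa.gz",
}


def classify(filename):
    """Return the output field a file name would fall under, or None."""
    # Assumption: CDS/RNA feature FASTA files end in "_from_genomic.fna.gz",
    # which also matches the "fna" pattern, so they are ruled out first.
    if fnmatch(filename, "*_from_genomic.fna.gz"):
        return None
    for field, pattern in FIELD_PATTERNS.items():
        if fnmatch(filename, pattern):
            return field
    return None
```

A file that matches none of the four patterns returns `None`, which leaves room to extend `FIELD_PATTERNS` once the naming of the remaining fields is confirmed.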
Parameters
NCBI Genome Download Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --species | string | | Name of the species to download assemblies for |
| --accession | string | | An NCBI Assembly accession to be downloaded |
| --accessions | string | | A file of NCBI Assembly accessions (one per line) to be downloaded |
| --format | string | fasta | Comma-separated list of formats to download |
| --limit | string | | Limit the number of assemblies to download |
| --keep_downloads | boolean | false | Save downloaded files into the bactopia-runs folder |
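Because `--format` takes a comma-separated list, it can help to see how such a value might be split and related to the output fields. The sketch below is illustrative only: `parse_formats` is a hypothetical helper, and the mapping from ncbi-genome-download format names to the output field names in this document is an assumption inferred from the field descriptions, not something the module documents. The special value `all` is not handled.

```python
# Assumed mapping: ncbi-genome-download format name -> output field name.
FORMAT_TO_FIELD = {
    "genbank": "gbff",
    "fasta": "fna",
    "rm": "rm",
    "features": "features",
    "gff": "gff",
    "protein-fasta": "faa",
    "genpept": "gpff",
    "wgs": "wgs_gbk",
    "cds-fasta": "cds",
    "rna-fasta": "rna",
    "rna-fna": "rna_fna",
    "assembly-report": "report",
    "assembly-stats": "stats",
}


def parse_formats(value="fasta"):
    """Split a comma-separated --format value and return the matching fields."""
    # Tolerate stray whitespace and empty items such as a trailing comma.
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in FORMAT_TO_FIELD]
    if unknown:
        raise ValueError(f"unrecognized format(s): {', '.join(unknown)}")
    return [FORMAT_TO_FIELD[name] for name in names]
```

With the default value, only the `fna` field would be expected to contain files; the other optional fields stay empty unless their format is requested.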
Used By
Subworkflows
- ncbigenomedownload - Download bacterial genomes from NCBI's RefSeq database.
Workflows
- fastani - Fast alignment-free computation of whole-genome Average Nucleotide Identity.
- mashtree - Rapid phylogenetic tree construction using Mash distances.
- pangenome - Pangenome analysis with optional core-genome phylogeny.
- snippy - Rapid haplotype variant calling and core genome alignment.
Citations
If you use this in your analysis, please cite the following.
- Bactopia
  Petit III RA, Read TD. Bactopia - a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020)
- ncbi-genome-download
  Blin K. ncbi-genome-download: Scripts to download genomes from the NCBI FTP servers (GitHub)
Source
Version
NCBIGENOMEDOWNLOAD:
- ncbi-genome-download: 0.3.3