
bactopia_gather

Tags: fastq validation sra ena download merging simulation art ncbi sample-scope

Search, validate, gather, or simulate input samples.

This process is the entry point for data ingestion. It handles:

  • Validation: Verifies FASTQ formatting and gzip integrity.
  • Merging: Combines multiple runs (lanes) into a single sample.
  • Downloading: Fetches reads (SRA/ENA) or assemblies (NCBI) from accessions.
  • Simulation: Generates synthetic reads from assemblies using ART to enable read-based analysis.
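The validation step can be illustrated with a small Python sketch. It approximates the idea under stated assumptions and is not Bactopia's actual check. `validate_fastq` is a hypothetical helper. It decompresses the whole file, which surfaces truncated or corrupt gzip streams, and it confirms that each four-line record has an `@` header, a `+` separator, and sequence and quality strings of equal length.

```python
import gzip
import zlib
from pathlib import Path


def validate_fastq(path: Path) -> tuple[bool, str]:
    """Hypothetical FASTQ check: returns (is_valid, message)."""
    opener = gzip.open if path.suffix == ".gz" else open
    n_records = 0
    try:
        with opener(path, "rt") as handle:
            while True:
                header = handle.readline()
                if not header:
                    break  # clean end of file
                seq = handle.readline().rstrip("\n")
                plus = handle.readline()
                qual = handle.readline().rstrip("\n")
                rec = n_records + 1
                if not header.startswith("@"):
                    return False, f"record {rec}: header does not start with '@'"
                if not plus.startswith("+"):
                    return False, f"record {rec}: separator does not start with '+'"
                if len(seq) != len(qual):
                    return False, f"record {rec}: sequence/quality length mismatch"
                n_records += 1
    except (OSError, EOFError, zlib.error, UnicodeDecodeError) as exc:
        # Truncated or corrupt gzip streams raise here during decompression.
        return False, f"read error: {exc}"
    if n_records == 0:
        return False, "no records found"
    return True, f"{n_records} records"
```

The check reads the entire stream on purpose: a file truncated mid-transfer often has an intact gzip header, so inspecting only the first bytes would not catch it.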

Reads are passed through explicit named slots on both the input and output records:

  • Input: each slot accepts a Set<Path?> (pre-merge; may hold several files, e.g. one per lane)
  • Output: each slot emits a Path? (post-merge; a single consolidated file, or null)
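The Set<Path?> to Path? contract can be sketched in Python. This sketch assumes that every file in a slot is gzip-compressed FASTQ. `merge_slot` is a hypothetical name, not Bactopia's implementation. Null entries are dropped, and an empty slot yields None. Multiple lanes are joined by byte-level concatenation, which works because concatenated gzip members form a valid multi-member gzip file.

```python
import shutil
from pathlib import Path
from typing import Optional


def merge_slot(files: set[Optional[Path]], out: Path) -> Optional[Path]:
    """Hypothetical: collapse one input slot (Set<Path?>) into one Path?."""
    # Drop null entries; sort so lane order is deterministic.
    real = sorted(p for p in files if p is not None)
    if not real:
        return None  # empty slot -> null output
    if len(real) == 1:
        return real[0]  # nothing to merge; pass the single file through
    # Gzip members can be concatenated byte-for-byte without recompressing.
    with out.open("wb") as dst:
        for path in real:
            with path.open("rb") as src:
                shutil.copyfileobj(src, dst)
    return out
```

Sorting before concatenation keeps merged R1 and R2 files in matching lane order. This assumes the lane files in both slots follow the same naming scheme.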

Inputs

record (
meta: Record,
r1_files: Set<Path?>,
r2_files: Set<Path?>,
se_files: Set<Path?>,
lr_files: Set<Path?>,
fna_files: Set<Path?>
)
Field       Type         Description
meta        Record       Groovy Record containing sample information
r1_files    Set<Path?>   Illumina R1 read files (Set, elements may be null)
r2_files    Set<Path?>   Illumina R2 read files (Set, elements may be null)
se_files    Set<Path?>   Single-end read files (Set, elements may be null)
lr_files    Set<Path?>   Long read files (ONT) or assembly for simulation (Set, elements may be null)
fna_files   Set<Path?>   Input or downloaded assembly file (Set, elements may be null)
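The input record can be mirrored in Python as a frozen dataclass to show how the slots are shaped. This is a hypothetical analogue for illustration only. `GatherInput` and `populated_slots` are invented names, a plain `dict` stands in for the Groovy `meta` Record, and `frozenset[Optional[Path]]` plays the role of `Set<Path?>`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

PathSet = frozenset[Optional[Path]]  # analogue of Set<Path?>


@dataclass(frozen=True)
class GatherInput:
    """Hypothetical Python mirror of the bactopia_gather input record."""
    meta: dict  # stand-in for the Groovy Record
    r1_files: PathSet = frozenset()
    r2_files: PathSet = frozenset()
    se_files: PathSet = frozenset()
    lr_files: PathSet = frozenset()
    fna_files: PathSet = frozenset()

    def populated_slots(self) -> list[str]:
        """Names of slots that hold at least one non-null path."""
        slots = ["r1_files", "r2_files", "se_files", "lr_files", "fna_files"]
        return [s for s in slots if any(p is not None for p in getattr(self, s))]
```

In this model, a sample sequenced across two lanes carries two paths in each of `r1_files` and `r2_files`. A slot that holds only None counts as empty.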

Outputs

record (
meta: Record,
r1: Path?,
r2: Path?,
se: Path?,
lr: Path?,
fna: Path?,
tsv: Path,
results: Set<Path>,
logs: Set<Path?>,
nf_logs: Set<Path>,
versions: Set<Path?>
)
Field       Type         Description
meta        Record       Sample information record
r1          Path?        Merged Illumina R1 read file
r2          Path?        Merged Illumina R2 read file
se          Path?        Merged single-end read file
lr          Path?        Merged long read file (ONT)
fna         Path?        Assembly file
tsv         Path         A tab-delimited metadata file describing the valid samples
results     Set<Path>    All output files to be published
logs        Set<Path?>   Optional program-specific log files
nf_logs     Set<Path>    Nextflow-specific log files (e.g. .command.{begin
versions    Set<Path?>   A YAML-formatted file with program versions

Parameters

Used By

Subworkflows

  • bactopia_gather - Search, validate, gather, and standardize input samples.

Workflows

  • bactopia - Comprehensive bacterial analysis pipeline for complete genomic characterization.
  • cleanyerreads - Quality control and optional host read removal from raw sequencing reads.
  • staphopia - Comprehensive analysis pipeline for Staphylococcus aureus isolates.
  • teton - Taxonomic classification and abundance profiling of metagenomic reads.

Citations

If you use this in your analysis, please cite the following.

Source

View source on GitHub

Version

BACTOPIA_GATHER:
- bactopia-gather: 1.0.5