Enhancements to Open-Source Software
Sustaining open source software is a difficult challenge that often demands substantial time and effort, usually without the benefit of recognition or support. The field of bioinformatics is no exception, as it heavily depends on tools maintained by individuals with little to no support. Bactopia is no different.
Recognizing these challenges, I designed Bactopia with an explicit goal of giving back to the community. To fulfill this aim, I incorporated several key design requirements:
- Tools must be open source and free to use
- Tools must be available from Conda
- Bactopia Tools must be available on nf-core/modules

Meeting these requirements has resulted in the following contributions:

- 11 stand-alone tools, each available from Bioconda
- 30 new Conda recipes, 46 updated recipes, and 2,000+ Bioconda pull requests reviewed.
- 68 contributions to nf-core/modules
- 26 contributions to other tools
These contributions benefit the wider community, and you do not need to use Bactopia to take advantage of them.
Stand-Alone Tools
Sometimes tools are developed to extend Bactopia's capabilities, such as Dragonflye, which was developed to add Nanopore support. These tools are designed to function on their own. Below are 11 such tools, originally built for Bactopia, that you can also use independently of Bactopia.
| Tool | Description |
|---|---|
| assembly-scan | Generate basic stats for an assembly |
| dragonflye | Assemble bacterial isolate genomes from Nanopore reads |
| fastq-dl | Download FASTQ files from SRA or ENA repositories. |
| fastq-scan | Output FASTQ summary statistics in JSON format |
| GOBLIN | Generate trusted prOteins to supplement BacteriaL annotatIoN |
| pasty | A tool for in silico serogrouping of Pseudomonas aeruginosa isolates |
| pbptyper | In silico Penicillin Binding Protein typer for Streptococcus pneumoniae |
| pmga | A fork of PMGA for all Neisseria species and Haemophilus influenzae |
| shovill-se | A fork of Shovill that includes support for single end reads |
| staphopia-sccmec | A standalone version of Staphopia’s SCCmec typing method |
| vcf-annotator | Add biological annotations to variants in a given VCF file |
Bioconda Contributions
Bactopia requires that tools be installable with Conda to simplify installation for users. This requirement led to an unintended, but welcome, deeper involvement with the Bioconda community. Bioconda is more than conda install; it is a valuable resource that makes bioinformatics tools more accessible to the community. Every time a tool is added to Bioconda, a Docker container is created by Biocontainers and a Singularity image is created by the Galaxy Project. In essence, a single recipe contributes significantly to the broader community.
Bactopia has led to 30 new recipes, 46 updated recipes, and more than 2,000 reviewed pull requests.
New Recipes
Bactopia has led to the addition of 30 new recipes to Bioconda and conda-forge. These new recipes allow users to rapidly begin using these tools for their own analyses, and include:
| Tool | Description | Pull Request |
|---|---|---|
| Aspera Connect | high-performance transfer client | anaconda/rpetit3 |
| assembly-scan | Generate basic stats for an assembly | bioconda/bioconda-recipes#11425 |
| bactopia | A flexible pipeline for complete analysis of bacterial genomes | bioconda/bioconda-recipes#17434 |
| Dragonflye | Assemble bacterial isolate genomes from Nanopore reads | bioconda/bioconda-recipes#29696 |
| ena-dl | Download FASTQ files from ENA | bioconda/bioconda-recipes#17354 |
| EToKi | all methods related to Enterobase | bioconda/bioconda-recipes#37069 |
| executor | programmer friendly Python subprocess wrapper | conda-forge/staged-recipes#9457 |
| fastq-dl | Download FASTQ files from SRA or ENA repositories. | bioconda/bioconda-recipes#18252 |
| fastq-scan | Output FASTQ summary statistics in JSON format | bioconda/bioconda-recipes#11415 |
| GenoTyphi | assign genotypes to Salmonella Typhi genomes | bioconda/bioconda-recipes#25674 |
| GOBLIN | Generate trusted prOteins to supplement BacteriaL annotatIoN | bioconda/bioconda-recipes#38922 |
| illumina-cleanup | A simple pipeline for pre-processing Illumina FASTQ files | bioconda/bioconda-recipes#11481 |
| ISMapper | insertion sequence mapping software | bioconda/bioconda-recipes#14180 |
| mashpit | Sketch-based surveillance platform | bioconda/bioconda-recipes#35199 |
| NextPolish | Fast and accurately polish the genome generated by long reads | bioconda/bioconda-recipes#36582 |
| ParallelTask | A simple and lightweight parallel task engine | conda-forge/staged-recipes#19616 |
| pasty | A tool for in silico serogrouping of Pseudomonas aeruginosa isolates | bioconda/bioconda-recipes#35930 |
| pbptyper | In silico Penicillin Binding Protein typer for Streptococcus pneumoniae | bioconda/bioconda-recipes#36222 |
| pHierCC | Hierarchical clustering of cgMLST | bioconda/bioconda-recipes#37070 |
| pmga | Command-line version of PMGA (PubMLST Genome Annotator) | bioconda/bioconda-recipes#32801 |
| property-manager | useful property variants for Python programming | conda-forge/staged-recipes#9442 |
| RFPlasmid | predicting plasmid contigs from assemblies | bioconda/bioconda-recipes#25849 |
| SerotypeFinder | Identifies the serotype in total or partial sequenced isolates of E. coli | bioconda/bioconda-recipes#29718 |
| shovill-se | A fork of Shovill that includes support for single end reads | bioconda/bioconda-recipes#26040 |
| spaTyper | computational method for finding spa types | bioconda/bioconda-recipes#26044 |
| sra-human-scrubber | Identify and remove human reads from FASTQ files | bioconda/bioconda-recipes#29926 |
| staphopia-sccmec | A standalone version of Staphopia's SCCmec typing method | bioconda/bioconda-recipes#28214 |
| tbl2asn-forever | use tbl2asn forever by pretending that it's still 2019 | bioconda/bioconda-recipes#20073 |
| vcf-annotator | Add biological annotations to variants in a given VCF file | bioconda/bioconda-recipes#13417 |
It is sometimes overlooked, so it is worth reiterating: every recipe added to Bioconda has a Docker container created by Biocontainers and a Singularity container created by the Galaxy Project. These containers allow for version-controlled, reproducible analyses.
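To make the link between a recipe and its containers concrete, here is a minimal Python sketch that derives the three references for one pinned package build. It assumes the commonly seen naming pattern of `name:version--build` under `quay.io/biocontainers` for Docker and `depot.galaxyproject.org/singularity` for Singularity. The function name `container_references`, the version `1.0.0`, and the build string `placeholder_0` are hypothetical and used only for illustration, so check the real tags for a tool before relying on them.

```python
def container_references(name: str, version: str, build: str) -> dict:
    """Return Conda, Docker, and Singularity references for one package build.

    The URL patterns below are assumed conventions, not guaranteed by this
    document; confirm the published tags for the tool you need.
    """
    tag = f"{name}:{version}--{build}"
    return {
        "conda": f"bioconda::{name}={version}",
        "docker": f"quay.io/biocontainers/{tag}",
        "singularity": f"https://depot.galaxyproject.org/singularity/{tag}",
    }


# Hypothetical version and build string, for illustration only.
refs = container_references("fastq-scan", "1.0.0", "placeholder_0")
for kind, ref in refs.items():
    print(f"{kind}: {ref}")
```

Pinning the version and build in all three references is what keeps an analysis reproducible, whichever of the three a user chooses to run.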
Enhancements and Fixes
A common issue with Bioconda recipes is that a tool works well in a Conda environment but fails for various reasons once containerized. When this happens with a tool used by Bactopia, an effort is made to improve or fix the Bioconda recipe. Below is a list of fixes and improvements to some Bioconda recipes:
nf-core/modules Contributions
When Bactopia transitioned to Nextflow DSL2, it opened the door to adopting modules from nf-core/modules. Users can seamlessly integrate these modules into their own Nextflow DSL2 pipelines. To support this integration, I decided to require that each Bactopia Tool have a corresponding module available from nf-core/modules. If such a module is not already available, it will be added.
Adopting this practice has led to 68 contributions to nf-core/modules in the form of new modules, module updates, and testing adjustments.
Other Contributions
In addition to Bioconda and nf-core/modules, Bactopia has made 26 contributions to other tools including:
| Tool | Description | Pull Request |
|---|---|---|
| MOB-suite | fix hostrange() missing 1 required positional argument: 'database_directory' | phac-nml/mob-suite#149 |
| bioconda-utils | chore: update change visibility action | bioconda/bioconda-utils#873 |
| Prokka | Convert Travis CI to Github Actions | tseemann/prokka#662 |
| bioconda-utils | chore: add CI to changevisibility of private containers | bioconda/bioconda-utils#835 |
| bioconda-containers | Patch - small fix on merge command and quay toggle visibility | bioconda/bioconda-containers#54 |
| ShigaTyper | Incorporate patches from Bioconda | CFSAN-Biostatistics/shigatyper#14 |
| EToKi | let tempfile determine where to put temp files | lskatz/EToKi#2 |
| EToKi | Allow multiple path parameters on the configure step | lskatz/EToKi#1 |
| Seroba | let tempfile determine temp dir location | sanger-pathogens/seroba#68 |
| pymummer | allow the user to specify temp dir or use the system default | sanger-pathogens/pymummer#36 |
| ShigaTyper | Fix install process | CFSAN-Biostatistics/shigatyper#10 |
| legsta | use grep -q to play nice with bioconda docker build | tseemann/legsta#17 |
| ShigaTyper | Add single-end and ONT support, add GitHub Actions, update readme | CFSAN-Biostatistics/shigatyper#9 |
| Ariba | Ignore comments column and drop Bio.Alphabet | sanger-pathogens/ariba#319 |
| BioContainers | Add ClonalFrameML and maskrc-svg multipackage | BioContainers/multi-package-containers#1923 |
| Kleborate | Add --kaptive_path to specify path to kaptive data | katholt/Kleborate#59 |
| Ariba | fix SPAdes version capture | sanger-pathogens/ariba#315 |
| AgrVATE | Fix for dots in sample names | VishnuRaghuram94/AgrVATE#9 |
| PIRATE | Add minimum feature length option | SionBayliss/PIRATE#53 |
| Ariba | Fix for changes in PubMLST url | sanger-pathogens/ariba#305 |
| Ariba | Solution 1: for fixing CARD download | sanger-pathogens/ariba#302 |
| bowtie2 | Rename VERSION to BOWTIE2_VERSION | BenLangmead/bowtie2#302 |
| phyloFlash | Improved single end support | HRGV/phyloFlash#102 |
| ISMapper | set min_range and max_range args to be a float | jhawkey/IS_mapper#38 |
| maskrc-svg | Add requirements.txt for python modules | kwongj/maskrc-svg#2 |
| Shovill | Added shovill-se for processing single-end reads | tseemann/shovill#105 |