IVA

Iterative Virus Assembler

Overview

IVA is a de novo assembler designed to assemble virus genomes that have no repeat sequences, using Illumina read pairs sequenced from mixed populations at extremely high and variable depth.

For more information, please read the IVA publication (open access).

Installation instructions are below. For usage help and examples, see the IVA wiki page.


Installation

IVA was developed for, and is intended to be run on, Linux. It has also been run successfully on OS X. If you have a Windows machine (or would prefer not to install IVA directly on Linux or a Mac), you can run IVA in a virtual machine using VirtualBox. IVA is installed on the Sanger pathogens virtual machine.

General installation instructions for Linux/OS X are below. You may also find it useful to read detailed instructions for Ubuntu.

Required dependencies

The following are required to install IVA and use it to run an assembly.

The recommended versions are: kmc version 2.1.1, MUMmer version 3.23, samtools version 0.1.19 or greater, and SMALT version 0.7.6.
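Before installing IVA, it can help to confirm that the dependencies are visible on your PATH. The following is a minimal sketch, not part of IVA itself. The executable names (`kmc`, `nucmer` for MUMmer, `samtools`, `smalt`) are assumptions about a typical installation, and the helper functions are hypothetical names chosen for illustration.

```python
import shutil

# Assumed executable names for the tools listed above; adjust to your install.
REQUIRED_TOOLS = ["kmc", "nucmer", "samtools", "smalt"]


def find_missing(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def version_tuple(version):
    """Turn a dotted version string such as '0.1.19' into (0, 1, 19)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(found, minimum):
    """Return True if version `found` is the same as or newer than `minimum`."""
    return version_tuple(found) >= version_tuple(minimum)


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH")
```

`meets_minimum` only handles purely numeric dotted versions. It is meant for checks such as the "0.1.19 or greater" requirement for samtools, once you have read the version string from the tool yourself.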

Install IVA

Once you have installed the dependencies, install IVA with:

pip3 install iva

The installation can be tested by running an assembly using test data included with IVA.
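As a rough illustration, the test assembly could be launched from Python as sketched below. This assumes that the `iva` script accepts a `--test` option followed by an output directory; check `iva --help` on your installation to confirm. The function names and the output directory name `iva_test_output` are hypothetical.

```python
import shutil
import subprocess


def build_test_command(outdir):
    """Build the command line for the built-in test (assumes `iva --test`)."""
    return ["iva", "--test", outdir]


def run_iva_test(outdir):
    """Run the test assembly; return its exit code, or None if iva is not on PATH."""
    if shutil.which("iva") is None:
        return None
    return subprocess.run(build_test_command(outdir)).returncode


if __name__ == "__main__":
    exit_code = run_iva_test("iva_test_output")
    if exit_code is None:
        print("iva was not found on PATH; is it installed?")
    elif exit_code == 0:
        print("Test assembly finished without errors")
    else:
        print("Test assembly failed with exit code", exit_code)
```

Running the same command directly in a shell works equally well. The wrapper is only useful if you want to script the check alongside other installation steps.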

Optional dependencies

The following dependencies only apply to the QC scripts of IVA and are not needed to run an assembly. Only install them if you want to run the scripts iva_qc or iva_qc_make_db.

The QC code is also bundled with the following (they do not need to be installed).


References

Adapter sequences: Quail, M. A. et al. Optimal enzymes for amplifying sequencing libraries. Nat. Methods 9, 10-1 (2012).

GAGE: Salzberg, S. L. et al. GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res. 22, 557-67 (2012).

KMC: Deorowicz, S., Debudaj-Grabysz, A. & Grabowski, S. Disk-based k-mer counting on a PC. BMC Bioinformatics 14, 160 (2013).

Kraken: Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15, R46 (2014).

MUMmer: Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).

R: R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

RATT: Otto, T. D., Dillon, G. P., Degrave, W. S. & Berriman, M. RATT: Rapid Annotation Transfer Tool. Nucleic Acids Res. 39, e57 (2011).

SAMtools: Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-9 (2009).

Trimmomatic: Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics 1-7 (2014).