Overview
IVA is a de novo assembler designed to assemble virus genomes that have no repeat sequences, using Illumina read pairs sequenced from mixed populations at extremely high and variable depth.
For more information, please read the IVA publication (open access).
Installation instructions are below. For usage help and examples, see the IVA wiki page.
Installation
IVA was developed for, and is intended to be run on, Linux. It has also been run successfully on OS X. On a Windows machine (or, if you prefer, on Linux or a Mac) you can run IVA in a virtual machine using VirtualBox.
General installation instructions for Linux/OS X are below. You may also find it useful to read detailed instructions for Ubuntu.
Required dependencies
The following are required to install IVA and use it to run an assembly.
- Python 3 version 3.3 or higher (IVA is written in Python 3)
- KMC installed, so that kmc and kmc_dump are in your path.
- MUMmer installed with its executables (i.e. nucmer etc.) in your path.
- Samtools installed, so that samtools is in your path.
- SMALT installed, so that smalt is in your path.
- Optional: Trimmomatic - although this is optional, it is highly recommended. It is used to trim adapter sequences from reads before assembling and significantly improves the results. You don't need to add anything to your path, but will need to tell IVA where the Java jar file is to use Trimmomatic (see examples).
The recommended versions are: kmc version 2.1.1, MUMmer version 3.23, samtools version 0.1.19 or greater, and SMALT version 0.7.6.
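Before installing IVA, it can help to confirm that the required executables are actually visible in your path. The following is a minimal sketch, not part of IVA itself: the function name missing_executables is hypothetical, and the check only looks for the executable names listed above (it does not verify versions).

```python
import shutil

# Executable names taken from the required dependencies listed above.
REQUIRED = ["kmc", "kmc_dump", "nucmer", "samtools", "smalt"]


def missing_executables(names, path=None):
    """Return the names that cannot be found on the search path.

    This is an illustrative helper, not an IVA function. When path is
    None, shutil.which searches the PATH environment variable.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    missing = missing_executables(REQUIRED)
    if missing:
        print("Not found in path:", ", ".join(missing))
    else:
        print("All required executables were found in path.")
```

shutil.which is available from Python 3.3, which matches the minimum Python version IVA requires.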
Install IVA
Once you have installed the dependencies, install IVA with
pip3 install iva
The installation can be tested by running an assembly using test data included with IVA.
Docker
IVA can be run in a Docker container. First install Docker, then pull the IVA image:
docker pull sangerpathogens/iva
To run it, use a command such as the following (substituting your own directories), where your files are assumed to be stored in /home/ubuntu/data:
docker run --rm -it -v /home/ubuntu/data:/data sangerpathogens/iva iva -f /data/reads_fwd.fastq -r /data/reads_rev.fastq /data/Output_directory
The image includes all required and optional dependencies for IVA. To run IVA with Trimmomatic, you also need to specify the path to the Trimmomatic jar file, which is available inside the image at /Trimmomatic-0.38/trimmomatic-0.38.jar:
docker run --rm -it -v /home/ubuntu/data:/data sangerpathogens/iva iva --trimmomatic /Trimmomatic-0.38/trimmomatic-0.38.jar -f /data/reads_fwd.fastq -r /data/reads_rev.fastq /data/Output_directory
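If you run many assemblies, the two Docker commands above can be generated from a small script instead of being typed by hand. This is only a sketch: the function iva_docker_command and its parameter names are hypothetical, and it uses nothing beyond the image name, options and paths already shown in the commands above. It assumes the read files sit directly inside the mounted host directory.

```python
import shlex

# Values copied from the Docker commands shown above.
IMAGE = "sangerpathogens/iva"
TRIMMOMATIC_JAR = "/Trimmomatic-0.38/trimmomatic-0.38.jar"


def iva_docker_command(host_dir, fwd, rev, outdir, trimmomatic=False):
    """Build the docker argument list for one IVA run (illustrative helper).

    host_dir is mounted at /data in the container; fwd, rev and outdir
    are names relative to that directory.
    """
    cmd = ["docker", "run", "--rm", "-it", "-v", host_dir + ":/data", IMAGE, "iva"]
    if trimmomatic:
        # Point IVA at the Trimmomatic jar bundled in the image.
        cmd += ["--trimmomatic", TRIMMOMATIC_JAR]
    cmd += ["-f", "/data/" + fwd, "-r", "/data/" + rev, "/data/" + outdir]
    return cmd


if __name__ == "__main__":
    cmd = iva_docker_command(
        "/home/ubuntu/data", "reads_fwd.fastq", "reads_rev.fastq",
        "Output_directory", trimmomatic=True,
    )
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning a list rather than a single string means the result can be passed straight to subprocess.run without shell quoting problems if a directory name contains spaces.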
Optional dependencies
The following dependencies only apply to the QC scripts of IVA and are not needed to run an assembly. Only install them if you want to run the scripts iva_qc or iva_qc_make_db.
- R installed and in your path.
- BioPerl
- Optional: kraken installed, so that kraken and kraken-build are in your path. These are needed if you want to make your own reference database, or if you use a database to automatically choose the reference genome.
- Optional: NCBI-blast+. If this is installed then the QC script will make input files for ACT, to compare the assembly against a reference.
The QC code is also bundled with the following (they do not need to be installed).
- Analysis code from the GAGE assembly evaluation project. We are grateful to the GAGE authors for permission to modify and redistribute this code.
- RATT is used to transfer annotation from a reference onto the assembly.
References
Adapter sequences: Quail, M. A. et al. Optimal enzymes for amplifying sequencing libraries. Nat. Methods 9, 10-1 (2012).
GAGE: Salzberg, S. L. et al. GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res. 22, 557-67 (2012).
KMC: Deorowicz, S., Debudaj-Grabysz, A. & Grabowski, S. Disk-based k-mer counting on a PC. BMC Bioinformatics 14, 160 (2013).
Kraken: Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15, R46 (2014).
MUMmer: Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).
R: R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
RATT: Otto, T. D., Dillon, G. P., Degrave, W. S. & Berriman, M. RATT: Rapid Annotation Transfer Tool. Nucleic Acids Res. 39, e57 (2011).
SAMtools: Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-9 (2009).
Trimmomatic: Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics 1-7 (2014).