This page describes some of the development background and administration involved in setting up and using an Artemis or ACT connection to a Chado database.

Opening the Database Manager

The Artemis Database Manager is cached between sessions in the directory '.artemis/cache' in the user's home directory. An option under the File menu clears this cache.

To open the Artemis Database Manager panel (from which the browser is launched), Artemis first checks for a cvterm with cvterm.name = 'top_level_seq' belonging to the cv with cv.name = 'genedb_misc'. If this term exists, it follows method A:

  A. Call 'getTopLevelOrganisms' (in the Organism.xml mapping). This relies on the source features (e.g. chromosome) having a featureprop with a type_id corresponding to 'top_level_seq'.

    If 'top_level_seq' is not implemented in the database, Artemis follows method B instead:

  B. Call 'getOrganismsContainingSrcFeatures' (in the Organism.xml mapping). This searches for organisms that contain sequences with residues and whose type_id corresponds to a cvterm name matching one of:

    *chromosome*, *sequence*, supercontig, ultra_scaffold, golden_path_region, or contig
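The choice between the two methods, and the type-name filter used by method B, can be sketched in Java. This is a hypothetical illustration rather than Artemis code: the class and method names are invented, and it assumes that '*chromosome*' and '*sequence*' mean "name contains" while the other names must match exactly.

```java
import java.util.List;

// Hypothetical sketch (not Artemis code) of how the Database Manager might pick
// its organism query and which source-feature type names method B accepts.
class OrganismQuerySketch {

    // Method A when the 'top_level_seq' term (cv 'genedb_misc') exists, else method B.
    static String organismStatement(boolean topLevelSeqTermExists) {
        return topLevelSeqTermExists
                ? "getTopLevelOrganisms"
                : "getOrganismsContainingSrcFeatures";
    }

    // Assumption: '*chromosome*' and '*sequence*' are substring patterns.
    private static final List<String> SUBSTRINGS = List.of("chromosome", "sequence");
    // Assumption: the remaining names must match exactly.
    private static final List<String> EXACT = List.of(
            "supercontig", "ultra_scaffold", "golden_path_region", "contig");

    static boolean isSourceFeatureType(String cvtermName) {
        for (String s : SUBSTRINGS) {
            if (cvtermName.contains(s)) {
                return true;
            }
        }
        return EXACT.contains(cvtermName);
    }
}
```

With these assumptions, isSourceFeatureType("chromosome") and isSourceFeatureType("contig") are accepted, while a type such as "gene" is not.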

Once the organisms with source features have been identified, they are displayed. When a user clicks on an organism, that node is opened and Artemis finds the types of source feature (e.g. chromosome, contig) and the underlying features that have residues (getResidueFeatures in Feature.xml).
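Opening an organism node amounts to grouping its residue-bearing features under their source feature types. The sketch below shows one way to do this. ResidueFeature is a hypothetical stand-in for a getResidueFeatures row with assumed fields, and alphabetical ordering of the type nodes is an illustrative choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: group an organism's residue-bearing features by source
// feature type, as when its node is opened in the Database Manager tree.
class OrganismNodeSketch {

    // Minimal stand-in for a getResidueFeatures row (fields assumed).
    static final class ResidueFeature {
        final String uniquename;
        final String type;

        ResidueFeature(String uniquename, String type) {
            this.uniquename = uniquename;
            this.type = type;
        }
    }

    // Type name -> uniquenames; the TreeMap keeps type nodes in alphabetical order.
    static Map<String, List<String>> groupByType(List<ResidueFeature> features) {
        Map<String, List<String>> byType = new TreeMap<>();
        for (ResidueFeature f : features) {
            byType.computeIfAbsent(f.type, t -> new ArrayList<>()).add(f.uniquename);
        }
        return byType;
    }
}
```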

Opening the main Artemis/ACT window

The organismprops are loaded lazily when a sequence is opened. If an organismprop is of type 'translationTable', its value is used as the translation table when Artemis opens a sequence from that organism.
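A minimal sketch of the translation table lookup follows. It assumes the organismprops are available as a map from type name to value. The class name is hypothetical, and falling back to table 1 (the standard genetic code) when no usable value is present is an assumption, not documented Artemis behaviour.

```java
import java.util.Map;

// Hypothetical sketch: choose the translation table for an organism from its
// organismprops (type name -> value). The fallback of 1 is an assumption.
class TranslationTableSketch {
    static final int DEFAULT_TABLE = 1;

    static int translationTable(Map<String, String> organismProps) {
        String value = organismProps.get("translationTable");
        if (value == null) {
            return DEFAULT_TABLE;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return DEFAULT_TABLE; // malformed values are ignored in this sketch
        }
    }
}
```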

When a sequence is double-clicked to open it in Artemis, most of the data for that sequence is read from the database. The iBatis statements called when reading an entry are summarised below.

Statement ID | SQL mapping file | Description
getFeature | Feature.xml | Retrieves all the features and their featurelocs, featureprops, feature_relationships and primary dbxref
getFeatureDbXRefsBySrcFeature | FeatureDbXRef.xml | Retrieves all secondary dbxrefs
getFeatureSynonymsBySrcFeature | FeatureSynonym.xml | Retrieves feature synonyms
getFeatureCvTermsBySrcFeature | FeatureCvTerm.xml | Retrieves feature_cvterms and feature_cvtermprops (evidence code, extra qualifiers, date)
getFeatureCvTermDbXRefBySrcFeature | FeatureCvTermDbXRef.xml | Retrieves feature_cvterm_dbxrefs (WITH/FROM column)
getFeatureCvTermPubBySrcFeature | FeatureCvTermPub.xml | Retrieves feature_cvterm_pubs

From the results of these calls, Artemis constructs an internal GFF3 stream for the selected sequence. This stream is then read in the same way as a GFF3 file, into an Artemis DatabaseDocumentEntry (which extends GFFDocumentEntry), creating GFFStreamFeatures.
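The sketch below shows how a single feature with its featureloc might be written as one line of such a GFF3 stream. Chado featurelocs use 0-based interbase coordinates (fmin, fmax), whereas GFF3 uses 1-based inclusive coordinates, so start = fmin + 1 and end = fmax. The class, the method signature and the "chado" source column are hypothetical.

```java
// Hypothetical sketch: one feature + featureloc row as a GFF3 line.
// Chado interbase (fmin, fmax) -> GFF3 1-based inclusive (fmin + 1, fmax).
class Gff3LineSketch {

    static String toGff3(String seqId, String type, int fmin, int fmax,
                         Integer strand, Integer phase, String id, String parent) {
        String strandCol;
        if (strand == null || strand == 0) {
            strandCol = ".";                      // unknown or unstranded
        } else {
            strandCol = strand > 0 ? "+" : "-";
        }
        String phaseCol = phase == null ? "." : phase.toString();
        StringBuilder attrs = new StringBuilder("ID=").append(id);
        if (parent != null) {
            attrs.append(";Parent=").append(parent);
        }
        // Columns: seqid, source (assumed "chado"), type, start, end, score, strand, phase, attributes
        return String.join("\t", seqId, "chado", type,
                Integer.toString(fmin + 1), Integer.toString(fmax),
                ".", strandCol, phaseCol, attrs.toString());
    }
}
```

For example, an exon with fmin = 99, fmax = 200 on the reverse strand becomes a GFF3 line with start 100 and end 200.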

If the lazy load option is selected from the Database Manager's File menu, only getFeature is called. The resulting GFFStreamFeature objects are marked as lazy loading, and a feature's FeatureDbXRefs, FeatureSynonyms, FeatureCvTerms, FeatureCvTermDbXRefs and FeatureCvTermPubs are read from the database only when the Gene Builder is opened for it.
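The difference between a full and a lazy load can be summarised by the statement IDs run when an entry is opened. This is a hypothetical sketch that only lists the statements from the table above. It does not model the per-feature queries made later by the Gene Builder.

```java
import java.util.List;

// Hypothetical sketch: statement IDs run up front when an entry is opened.
class EntryLoadSketch {
    static final List<String> FULL_LOAD = List.of(
            "getFeature",
            "getFeatureDbXRefsBySrcFeature",
            "getFeatureSynonymsBySrcFeature",
            "getFeatureCvTermsBySrcFeature",
            "getFeatureCvTermDbXRefBySrcFeature",
            "getFeatureCvTermPubBySrcFeature");

    static List<String> statementsFor(boolean lazyLoad) {
        // With lazy loading only getFeature runs now; the remaining data is read
        // per feature when the Gene Builder is opened.
        return lazyLoad ? FULL_LOAD.subList(0, 1) : FULL_LOAD;
    }
}
```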

The feature_relationships (from getFeature) are used to create the gene hierarchy: 'part_of' and 'derives_from' relationships become Parent and Derives_from in GFF3 terms. If a feature_relationship type_id does not correspond to one of these terms (derives_from, part_of, proper_part_of, partof, producedby), the object_id is recorded as a qualifier value. This is used to read orthologous_to and paralogous_to relations. The values of these qualifiers are stored lazily (as ClusterLazyQualifierValue.java), so the database is queried further to list the related genes only when Artemis displays these qualifiers in the Gene Builder.
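The mapping from relationship type to GFF3 attribute might look like the sketch below. Assigning proper_part_of and partof to Parent, and producedby to Derives_from, is an assumption inferred from the list of terms above. Naming the fallback qualifier after the relationship type is also hypothetical.

```java
// Hypothetical sketch: map a feature_relationship type name to a GFF3 attribute.
// Synonym assignments and the fallback naming are assumptions.
class RelationshipSketch {
    static String gff3Attribute(String relType) {
        switch (relType) {
            case "part_of":
            case "proper_part_of":
            case "partof":
                return "Parent";
            case "derives_from":
            case "producedby":
                return "Derives_from";
            default:
                // e.g. orthologous_to / paralogous_to: the object_id becomes the
                // value of a qualifier, assumed here to be named after the type
                return relType;
        }
    }
}
```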

Other properties that are associated with a feature through a featureloc are found by calling getLazySimilarityMatches (Feature.xml). From these, Artemis constructs lazy-loading qualifiers (QualifierLazyLoading.java) that query the database further only when the qualifier is needed. This is used for BLAST/FASTA similarity matches and polypeptide_domains.
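The lazy-loading idea can be illustrated with a small generic holder. The expensive database query is represented by a Supplier that runs on first access, and its result is cached. LazyValue is an illustrative name and does not reproduce the API of QualifierLazyLoading. This sketch is also not thread-safe.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a lazily loaded qualifier value: the loader (standing in
// for a database query) runs once, on first access, and its result is cached.
final class LazyValue<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    LazyValue(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }

    boolean isLoaded() {
        return loaded;
    }
}
```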

The gene hierarchy is held internally in a ChadoCanonicalGene object (ChadoCanonicalGene.java), based on the Parent/Derives_from relationships; it stores the children related to the gene. Spliced features (exon, pseudogenic_exon) are combined into a single Artemis feature: the joined exons become an Artemis CDS feature (a GFFStreamFeature), which records the database uniquenames of the original exons.
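Joining exons into one feature can be sketched as follows. The exons are sorted by position and written as an EMBL-style location such as join(10..20,30..40), wrapped in complement(...) on the reverse strand, while each exon's uniquename is kept. ExonJoinSketch and its Exon class are hypothetical and do not show how GFFStreamFeature stores this internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: merge a transcript's exons into one CDS-style location
// while remembering each exon's database uniquename.
class ExonJoinSketch {

    static final class Exon {
        final String uniquename;
        final int start; // 1-based inclusive
        final int end;

        Exon(String uniquename, int start, int end) {
            this.uniquename = uniquename;
            this.start = start;
            this.end = end;
        }
    }

    // EMBL-style location, e.g. join(10..20,30..40) or complement(join(...)).
    static String joinedLocation(List<Exon> exons, boolean reverseStrand) {
        List<Exon> sorted = new ArrayList<>(exons);
        sorted.sort(Comparator.comparingInt((Exon e) -> e.start));
        String ranges = sorted.stream()
                .map(e -> e.start + ".." + e.end)
                .collect(Collectors.joining(","));
        String loc = sorted.size() > 1 ? "join(" + ranges + ")" : ranges;
        return reverseStrand ? "complement(" + loc + ")" : loc;
    }

    // Uniquenames of the original exons, in coordinate order.
    static List<String> exonUniquenames(List<Exon> exons) {
        return exons.stream()
                .sorted(Comparator.comparingInt((Exon e) -> e.start))
                .map(e -> e.uniquename)
                .collect(Collectors.toList());
    }
}
```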

Artemis Chado Configuration

The following is an example extract of the Chado-related settings in the Artemis options file:

#
# CHADO DATABASE OPTIONS
#
# chado gene model features default types
chado_exon_model=CDS
#chado_transcript=transcript

# infer CDS and UTR features from gene model
chado_infer_CDS_UTR=no

# provide a list of available servers
chado_servers = \
  workshop localhost:10101/workshop?user \
  GeneDB db.genedb.org:5432/snapshot?genedb_ro

# define how product qualifiers are stored (as a cv or as a featureprop)
product_cv=yes
product_cvname = genedb_products
# cv containing synonym names
synonym_cvname = genedb_synonym_type

# set default delete behaviour to make things obsolete, if
# this is not provided the default is to permanently delete
set_obsolete_on_delete=yes

# list of features to record residues for in the database
# - these are included when inserting or updating their featurelocs
sequence_update_features = polypeptide mRNA rRNA tRNA snRNA snoRNA
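The extract above uses Java properties syntax: key = value pairs, '#' comments and a trailing backslash to continue a line. Assuming the file can be read with java.util.Properties, the sketch below shows how these options might be looked up. ChadoOptionsSketch and its methods are hypothetical. The fallbacks 'CDS' and 'mRNA' are the defaults described in the following paragraphs. Treating a missing chado_infer_CDS_UTR as "no" is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: read Chado options with java.util.Properties.
class ChadoOptionsSketch {

    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text)); // handles '#' comments and '\' continuations
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    static String exonModel(Properties p) {
        return p.getProperty("chado_exon_model", "CDS");
    }

    static String transcriptType(Properties p) {
        return p.getProperty("chado_transcript", "mRNA"); // commented out above -> mRNA
    }

    static boolean inferCdsUtr(Properties p) {
        return "yes".equalsIgnoreCase(p.getProperty("chado_infer_CDS_UTR", "no").trim());
    }
}
```

Because continuation lines are joined, chado_servers reads back as a single value: "workshop localhost:10101/workshop?user GeneDB db.genedb.org:5432/snapshot?genedb_ro".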

Artemis combines the exons stored in Chado and represents them as a 'CDS' feature by default. The chado_exon_model option in the options file allows this to be changed.

When a gene model is created in Artemis, the transcript is created as an 'mRNA' feature by default. The chado_transcript option in the options file allows this to be changed.

The default gene model representation used by Artemis is described in the overview; in it, the UTRs are created explicitly in the database. The GMOD loader (gmod_bulk_load_gff3.pl), however, does not create UTRs; instead they can be inferred from the exon and protein (polypeptide) features. If the GMOD loader has been used, Artemis can infer the CDS and UTR features when chado_infer_CDS_UTR=yes is set in the options file. Adjusting the polypeptide boundaries in the Gene Builder results in UTRs being generated or deleted.
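Inferring UTRs comes down to taking the parts of the exons that lie outside the polypeptide's span. On the forward strand, the segments before the polypeptide start form the 5' UTR and those after its end form the 3' UTR. On the reverse strand the two are swapped. The sketch below assumes 1-based inclusive coordinates and int[] {start, end} pairs. It is a hypothetical illustration, not the Artemis implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: UTR segments are the exon parts outside the polypeptide
// span. Coordinates are 1-based inclusive, as int[] {start, end}.
class UtrInferenceSketch {

    // Exon segments lying at positions lower than 'from'.
    static List<int[]> before(List<int[]> exons, int from) {
        List<int[]> out = new ArrayList<>();
        for (int[] ex : exons) {
            if (ex[0] < from) {
                out.add(new int[] {ex[0], Math.min(ex[1], from - 1)});
            }
        }
        return out;
    }

    // Exon segments lying at positions higher than 'to'.
    static List<int[]> after(List<int[]> exons, int to) {
        List<int[]> out = new ArrayList<>();
        for (int[] ex : exons) {
            if (ex[1] > to) {
                out.add(new int[] {Math.max(ex[0], to + 1), ex[1]});
            }
        }
        return out;
    }

    static List<int[]> fivePrimeUtr(List<int[]> exons, int polyStart, int polyEnd, boolean forward) {
        return forward ? before(exons, polyStart) : after(exons, polyEnd);
    }

    static List<int[]> threePrimeUtr(List<int[]> exons, int polyStart, int polyEnd, boolean forward) {
        return forward ? after(exons, polyEnd) : before(exons, polyStart);
    }
}
```

For exons 100..200 and 300..400 with a forward-strand polypeptide spanning 150..350, this yields a 5' UTR of 100..149 and a 3' UTR of 351..400.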

A list of available databases can be configured in the options file with the chado_servers option. For each database, an alias is given followed by its location (host:port/database?user); each alias is displayed in a drop-down menu in the login box.
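A chado_servers value can be split into alias and location pairs, and each location can then be broken into its host, port, database and user parts. The sketch below is hypothetical and assumes well-formed input: aliases without spaces and locations that always contain ':', '/' and '?'.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: parse a chado_servers value ("alias location alias location ...")
// and split a location of the form host:port/database?user. Assumes well-formed input.
class ChadoServersSketch {

    static Map<String, String> aliases(String value) {
        String[] tokens = value.trim().split("\\s+");
        Map<String, String> byAlias = new LinkedHashMap<>(); // keeps drop-down order
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            byAlias.put(tokens[i], tokens[i + 1]);
        }
        return byAlias;
    }

    // Returns {host, port, database, user}.
    static String[] parseLocation(String location) {
        int colon = location.indexOf(':');
        int slash = location.indexOf('/', colon);
        int query = location.indexOf('?', slash);
        return new String[] {
            location.substring(0, colon),
            location.substring(colon + 1, slash),
            location.substring(slash + 1, query),
            location.substring(query + 1)
        };
    }
}
```

With the example value from the options file, the alias "GeneDB" maps to "db.genedb.org:5432/snapshot?genedb_ro", which splits into host db.genedb.org, port 5432, database snapshot and user genedb_ro.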

If product qualifiers are stored as an ontology (in cvterm), set product_cv=yes and set product_cvname to the name of the controlled vocabulary (cv) used in Chado.
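The product_cv and product_cvname settings can be read as in the sketch below. The method name and its return strings are hypothetical. Treating a missing product_cv as "no" and linking cvterm products to features through feature_cvterm are assumptions. The featureprop alternative follows the comment in the options file.

```java
import java.util.Properties;

// Hypothetical sketch: decide where product qualifiers are stored.
class ProductStorageSketch {
    static String productStorage(Properties options) {
        // Assumption: a missing product_cv is treated as "no".
        boolean asCv = "yes".equalsIgnoreCase(options.getProperty("product_cv", "no").trim());
        if (asCv) {
            String cv = options.getProperty("product_cvname");
            // Assumed: product cvterms are attached to features via feature_cvterm.
            return "cvterm in cv '" + cv + "' (via feature_cvterm)";
        }
        return "featureprop";
    }
}
```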

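When features are deleted in Artemis, the default behaviour can be set (with set_obsolete_on_delete=yes) to make these features obsolete rather than permanently deleting them from the database. If the option is not provided, features are permanently deleted.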
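The two delete behaviours correspond roughly to the SQL shown in this sketch. Chado's feature table has a boolean is_obsolete column. The statements are illustrative and not necessarily the ones Artemis issues, and the class and method names are hypothetical. As the options-file comment states, a missing set_obsolete_on_delete means permanent deletion.

```java
import java.util.Properties;

// Hypothetical sketch: the SQL a delete amounts to under set_obsolete_on_delete.
class DeleteBehaviourSketch {

    static boolean obsoleteOnDelete(Properties options) {
        // Absent option -> permanent deletion (per the options-file comment).
        return "yes".equalsIgnoreCase(options.getProperty("set_obsolete_on_delete", "no").trim());
    }

    static String deleteSql(Properties options) {
        return obsoleteOnDelete(options)
                ? "UPDATE feature SET is_obsolete = true WHERE feature_id = ?"
                : "DELETE FROM feature WHERE feature_id = ?";
    }
}
```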