InterMine Object/Relational Mapping Tool

The InterMine Object/Relational Mapping Tool is contained in the Java package org.intermine.objectstore. The main interface it provides is the ObjectStore which represents a connection to an object-oriented database. It provides methods to access the data in the database, using a powerful query system. Write access is provided in a separate interface - the ObjectStoreWriter.

The ObjectStore interface is common to many different implementations of the database system. They share a consistent behaviour, which allows implementations to be layered in a scalable architecture.

How to obtain an ObjectStore or ObjectStoreWriter

The way to obtain an ObjectStore is to use the ObjectStoreFactory. This class will return an ObjectStore named by an alias, which has to be configured using properties. At least one property is required, which specifies the class name of the implementation of ObjectStore required. This will usually be org.intermine.objectstore.intermine.ObjectStoreInterMineImpl. For example, if the alias of the ObjectStore is "os.main":

os.main.class=org.intermine.objectstore.intermine.ObjectStoreInterMineImpl

Similarly, an ObjectStoreWriter can be obtained from the ObjectStoreWriterFactory. Two properties are required: class (as for ObjectStoreFactory), and os, which specifies the alias of the ObjectStore to attach to. For example:

osw.main.class=org.intermine.objectstore.intermine.ObjectStoreWriterInterMineImpl
osw.main.os=os.main
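The configuration above is a set of alias-prefixed key/value pairs. As a minimal sketch of how such a configuration can be resolved, the hypothetical AliasConfig class below (not InterMine's ObjectStoreFactory) reads <alias>.class and <alias>.os from a java.util.Properties object. Assuming the implementation has a public no-argument constructor, it also instantiates the class by reflection; real implementations may well be constructed differently.

```java
import java.util.Properties;

/**
 * Hypothetical sketch of alias-based configuration lookup, following the
 * property layout above. This is not InterMine's ObjectStoreFactory.
 */
final class AliasConfig {
    private final Properties props;

    AliasConfig(Properties props) {
        this.props = props;
    }

    /** Reads a required "<alias>.<key>" property. */
    private String require(String alias, String key) {
        String value = props.getProperty(alias + "." + key);
        if (value == null) {
            throw new IllegalArgumentException("Missing property " + alias + "." + key);
        }
        return value;
    }

    /** The implementation class named by "<alias>.class". */
    String implementationClass(String alias) {
        return require(alias, "class");
    }

    /** The ObjectStore alias a writer attaches to, from "<writerAlias>.os". */
    String attachedStore(String writerAlias) {
        return require(writerAlias, "os");
    }

    /** Instantiates the configured class; assumes a public no-argument constructor. */
    <T> T instantiate(String alias, Class<T> type) {
        try {
            Class<?> clazz = Class.forName(implementationClass(alias));
            return type.cast(clazz.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create " + alias, e);
        }
    }
}
```

For example, resolving osw.main against the properties above yields os.main, whose class property names ObjectStoreInterMineImpl.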

The individual implementations will usually require extra properties to configure their behaviour.

Using an ObjectStore

The ObjectStore interface provides several different methods to access data, which can be categorised into two main groups.

Query methods

The InterMine query system provides powerful SQL-like, object-oriented queries, and can be accessed in two ways. A programmatic interface is available in the org.intermine.objectstore.query package, and queries built with it can be translated to and from a text-based query language using the org.intermine.objectstore.query.iql package. The text-based language is IQL, which is very similar to SQL in syntax. Use of the programmatic interface is described in QueryPackage.

  • ObjectStore.execute(Query q) - returns a List containing the results of running the query. The List is a special InterMine implementation (Results) that loads the results lazily from the database as they are accessed.
  • ObjectStore.estimate(Query q) - returns an object describing how long the query might take to run, along with a guess at how many rows will be returned.
  • ObjectStore.count(Query q) - returns an exact count of the number of rows in the results of a query. This is faster than iterating over the List returned by ObjectStore.execute() and counting the entries manually; calling size() on that List uses this method internally.
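To illustrate the lazy loading described for Results, and why size() is cheap, here is a minimal sketch of a lazily loaded list. LazyResults, its batch size and its two callbacks (a hypothetical batch loader and a hypothetical count query) are stand-ins for database access; this is not the Results class itself.

```java
import java.util.AbstractList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.IntSupplier;

/** Hypothetical sketch of a lazily loaded result list (not InterMine's Results). */
final class LazyResults<T> extends AbstractList<T> {
    private final int batchSize;
    // Loads rows [offset, offset + limit) from the database.
    private final BiFunction<Integer, Integer, List<T>> batchLoader;
    // Runs a count query instead of fetching every row.
    private final IntSupplier counter;
    private final Map<Integer, List<T>> batches = new HashMap<>();
    private Integer cachedSize;

    LazyResults(int batchSize, BiFunction<Integer, Integer, List<T>> batchLoader, IntSupplier counter) {
        this.batchSize = batchSize;
        this.batchLoader = batchLoader;
        this.counter = counter;
    }

    @Override
    public T get(int index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("Index: " + index);
        }
        int batchNo = index / batchSize;
        // Fetch only the batch containing this row, and only the first time.
        List<T> batch = batches.computeIfAbsent(batchNo,
                b -> batchLoader.apply(b * batchSize, batchSize));
        return batch.get(index % batchSize);
    }

    @Override
    public int size() {
        // size() uses the count query rather than loading all rows.
        if (cachedSize == null) {
            cachedSize = counter.getAsInt();
        }
        return cachedSize;
    }

    /** Number of batches fetched so far (for illustration). */
    int loadedBatches() {
        return batches.size();
    }
}
```

Batching trades a little memory for far fewer round trips than fetching one row at a time, while rows that are never accessed are never loaded.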

Object ID methods

Every object in the database has an Integer ID, which is used internally in many ways. The ID also provides a way to fetch a particular object without running an actual query. Because the key used to fetch objects is so simple, a cache can be used to speed up these operations.

  • ObjectStore.getObjectById(Integer id) - returns an object from the database with that id.
  • ObjectStore.getObjectById(Integer id, Class clazz) - returns an object from the database with that id, looking in the table associated with the given Class. The previous method looks in the InterMineObject table (InterMineObject is the superclass of all objects in the database), so this method is only needed in unusual circumstances.
  • ObjectStore.getObjectsByIds(Collection ids) - returns a List of the objects in the database that have the given IDs.
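Because the key is just an Integer, an ID cache is straightforward. The sketch below assumes a hypothetical batchLoader callback that fetches several objects by ID in one round trip, and uses an access-ordered LinkedHashMap as an LRU cache. The names and the eviction policy are illustrative assumptions, not InterMine's own cache.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical ID-keyed cache in front of a batch loader (not InterMine's cache). */
final class IdCache<T> {
    // Fetches several objects by ID in one round trip; returns id -> object.
    private final Function<Collection<Integer>, Map<Integer, T>> batchLoader;
    private final Map<Integer, T> cache;

    IdCache(int capacity, Function<Collection<Integer>, Map<Integer, T>> batchLoader) {
        this.batchLoader = batchLoader;
        // An access-ordered LinkedHashMap that evicts its least recently used entry.
        this.cache = new LinkedHashMap<Integer, T>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, T> eldest) {
                return size() > capacity;
            }
        };
    }

    T getObjectById(Integer id) {
        return getObjectsByIds(List.of(id)).get(0);
    }

    List<T> getObjectsByIds(Collection<Integer> ids) {
        // Only IDs that are not already cached go to the database, in one batch.
        Set<Integer> missing = new LinkedHashSet<>();
        for (Integer id : ids) {
            if (!cache.containsKey(id)) {
                missing.add(id);
            }
        }
        Map<Integer, T> loaded = new HashMap<>();
        if (!missing.isEmpty()) {
            loaded.putAll(batchLoader.apply(missing));
        }
        List<T> result = new ArrayList<>();
        for (Integer id : ids) {
            result.add(loaded.containsKey(id) ? loaded.get(id) : cache.get(id));
        }
        cache.putAll(loaded);
        return result;
    }
}
```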

Using an ObjectStoreWriter

The ObjectStoreWriter interface is a sub-interface of ObjectStore, adding write access methods. An instance of an ObjectStoreWriter will be associated with a certain ObjectStore. The methods of the ObjectStoreWriter access the same database as the ObjectStore, although the writer may be in a different transaction context and therefore see different data. If the ObjectStore permits multiple parallel accesses, the associated ObjectStoreWriter probably will not, in order to maintain transaction integrity with the underlying database.

  • ObjectStoreWriter.beginTransaction() - begins a transaction on the database. All data being written will be invisible to the ObjectStore and other associated ObjectStoreWriters until the transaction is committed.
  • ObjectStoreWriter.commitTransaction() - commits the data written in the transaction so that it becomes visible to the ObjectStore, and ends the transaction.
  • ObjectStoreWriter.abortTransaction() - throws away the transaction and all data written in it.
  • ObjectStoreWriter.isInTransaction() - returns true if the ObjectStoreWriter is currently in a transaction.
  • ObjectStoreWriter.store(InterMineObject o) - stores an object in the database. If the object does not have an ID, then one will be auto-generated for it. If an object with that ID already exists in the database then it will be overwritten, but the new object MUST be an instance of every class that the old object is an instance of, or the behaviour will be undefined.
  • ObjectStoreWriter.delete(InterMineObject o) - deletes the object from the database. A restriction similar to that of the store method applies.
  • ObjectStoreWriter.addToCollection(Integer hasId, Class clazz, String fieldName, Integer hadId) - if clazz.fieldName describes a many-to-many collection, this method places the object with ID hadId into that collection of the object with ID hasId. Neither object needs to be loaded into memory. This method exists to improve performance for some operations.
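The transaction contract above can be modelled in a few lines. In this hypothetical sketch, SharedStore stands in for the ObjectStore's committed data and SketchWriter for an ObjectStoreWriter; objects are plain strings, and, as an assumption of the sketch only, writes are allowed only inside a transaction. It shows that uncommitted writes are visible to the writer but not to readers, and that aborting discards them.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an ObjectStore: holds committed data only. */
final class SharedStore {
    final Map<Integer, String> committed = new HashMap<>();

    String getObjectById(Integer id) {
        return committed.get(id);
    }
}

/** Hypothetical writer modelling the begin/commit/abort contract (not InterMine code). */
final class SketchWriter {
    private final SharedStore store;
    private Map<Integer, String> pending; // null when not in a transaction
    private int nextId = 1;

    SketchWriter(SharedStore store) {
        this.store = store;
    }

    void beginTransaction() {
        if (pending != null) {
            throw new IllegalStateException("Already in a transaction");
        }
        pending = new HashMap<>();
    }

    boolean isInTransaction() {
        return pending != null;
    }

    /** Writes a value; a null id is replaced by an auto-generated one, which is returned. */
    Integer store(Integer id, String value) {
        requireTransaction();
        if (id == null) {
            while (store.committed.containsKey(nextId) || pending.containsKey(nextId)) {
                nextId++;
            }
            id = nextId++;
        }
        pending.put(id, value); // an existing id is overwritten
        return id;
    }

    /** The writer sees its own uncommitted data before falling back to committed data. */
    String getObjectById(Integer id) {
        if (pending != null && pending.containsKey(id)) {
            return pending.get(id);
        }
        return store.getObjectById(id);
    }

    void commitTransaction() {
        requireTransaction();
        store.committed.putAll(pending); // now visible to every reader
        pending = null;
    }

    void abortTransaction() {
        requireTransaction();
        pending = null; // everything written in the transaction is thrown away
    }

    private void requireTransaction() {
        if (pending == null) {
            throw new IllegalStateException("Not in a transaction");
        }
    }
}
```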

What data can be stored in the database?

Generally, the only requirements for an object to be stored in the database are that it implements the InterMineObject interface (instances of defined Simple Objects can also be stored) and that it is a pure bean. All the bean-accessible data in the object will be stored in the database, and the object will be recreated with that data when it is retrieved. However, the database can be put into different modes which reduce this freedom, generally in order to increase performance.

Having said this, it is usually only useful to store data that matches the registered model for the database, which defines a class hierarchy with fields that can be queried. The model is defined in a Model Description file. If you store data with fields not in the database model, then you cannot retrieve it by running a query on those fields - you can only retrieve the data by searching on a field in the model, which includes fetching the object by ID (as ID is a field in InterMineObject).
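Here a pure bean is taken to mean an object whose state is exposed through matching getter/setter pairs. The sketch below uses plain reflection to list those pairs and to rebuild a copy of an object from them, mimicking a store-then-retrieve round trip. SketchInterMineObject is a hypothetical stand-in for the real InterMineObject interface, SketchGene is an invented example class, and the pairing rule is a simplification of the JavaBeans convention (boolean is-getters are ignored).

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for the InterMineObject interface. */
interface SketchInterMineObject {
    Integer getId();
    void setId(Integer id);
}

/** An invented example of a pure bean: state only via getter/setter pairs. */
class SketchGene implements SketchInterMineObject {
    private Integer id;
    private String symbol;
    private Integer length;

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }
    public String getSymbol() { return symbol; }
    public void setSymbol(String symbol) { this.symbol = symbol; }
    public Integer getLength() { return length; }
    public void setLength(Integer length) { this.length = length; }
}

final class BeanSketch {
    /** Names of properties that have both a getX() and a matching setX(...). */
    static Set<String> beanProperties(Class<?> beanClass) {
        Set<String> names = new TreeSet<>();
        for (Method getter : beanClass.getMethods()) {
            String name = getter.getName();
            if (!name.startsWith("get") || name.length() == 3 || getter.getParameterCount() != 0) {
                continue;
            }
            String suffix = name.substring(3);
            try {
                beanClass.getMethod("set" + suffix, getter.getReturnType());
                names.add(Character.toLowerCase(suffix.charAt(0)) + suffix.substring(1));
            } catch (NoSuchMethodException e) {
                // Read-only (e.g. getClass()): it could not be restored, so skip it.
            }
        }
        return names;
    }

    /** Rebuilds an object from its bean properties, like a store/retrieve round trip. */
    static <T> T copyBean(T source) {
        try {
            @SuppressWarnings("unchecked")
            Class<T> type = (Class<T>) source.getClass();
            T copy = type.getDeclaredConstructor().newInstance();
            for (String prop : beanProperties(type)) {
                String suffix = Character.toUpperCase(prop.charAt(0)) + prop.substring(1);
                Method getter = type.getMethod("get" + suffix);
                Method setter = type.getMethod("set" + suffix, getter.getReturnType());
                setter.invoke(copy, getter.invoke(source));
            }
            return copy;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Only id, symbol and length survive the round trip; state held in fields without accessors would be lost, which is why the pure-bean requirement matters.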

ObjectStore implementations

At the last count there are eight or so different implementations of the ObjectStore interface, serving different purposes. Only one is an actual concrete database implementation; the rest require another ObjectStore to provide the database functionality, and add extra features or different performance characteristics.

ObjectStoreInterMineImpl

This is our main implementation of the ObjectStore interface. It is performance-optimised: it uses batching, prefetching, query optimisation and logging, and allows multi-threaded parallel access and operation cancellation. It also allows a query to be sped up by allocating extra resources to it.

It stores data in a PostgreSQL database using a schema derived from the model class hierarchy. For each class in the model there is a table in the SQL database, with a column for each of the fields in that class and all its superclasses. This allows the data to be searched by class and by any field in the class. In addition, an extra column in each table contains a serialised copy of the entire object being stored. Each object is stored as a single row in every table corresponding to a class that the object is an instance of, which sometimes makes writing to the database slow but greatly simplifies querying. A few settings, which can be set in the parameters, alter this overall layout:

  • Truncated tables - If a table is truncated, then all of its subclass tables are merged into this one table. The table will contain all the fields of all the subclasses that are in the model (but not fields of subclasses which are not mentioned in the model). If an object written to the database would have been written to more than one of the original tables, it appears as more than one row in the truncated table, and an additional column indicates which original table each row would have appeared in. This setting reduces the number of tables in the SQL database, but it does not reduce the capabilities of the database or the quantity of data stored in it.
  • Missing tables - A table can be declared to be missing. If that is the case, then the table will not exist in the SQL database. No data will be written to it, and queries cannot be run that mention the class that the table represents. The ObjectStore will complain if it is asked to write data to the database that will not be written to any tables. This is useful to reduce the size of the SQL database, and speed up writes if you know that you will never query for objects of a particular class. As such, the most common table to be missing is the InterMineObject table.
  • Missing NotXML - The extra column in each table that contains a serialised version of the object is referred to as the NotXML column. This setting alters the schema so that the extra column is only added to the InterMineObject table, which means that queries have to be run in a different manner. A query that returns objects normally runs an SQL query that returns the NotXML column, which is then deserialised to produce the objects. When this setting is switched on, the query must instead return the object IDs, and a second query is then run against the InterMineObject table to convert the IDs into NotXML for deserialisation. As a result, most queries may run slightly slower, but the database will be smaller and writes will be faster. This setting has become the default in many of our own databases, because it more gracefully handles the case where a query returns many copies of a very large object.
  • Flat mode - This is a combination of the InterMineObject table being declared missing and the "missing NotXML" setting being enabled. The database does not store a serialised version of the objects at all - the objects are reconstructed from the values of the fields instead. This reduces the flexibility of the database significantly, but it is much faster, and is suitable for many object models. We use this mode for our Items databases.
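The effect of the missing-table and truncated-table settings on where an object's rows go can be sketched with Java types standing in for model classes. Everything here is hypothetical (IMObject plays the role of InterMineObject, and BioEntity, Gene and Protein are invented model classes). Each row is reported as physicalTable(originalTable), where the name in brackets corresponds to the extra column a truncated table uses to record the original table. The sketch assumes each class lies under at most one truncated class.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: IMObject plays the role of InterMineObject.
interface IMObject {}
interface BioEntity extends IMObject {}
class Gene implements BioEntity {}
class Protein implements BioEntity {}

/** Hypothetical sketch of which table rows an object is written to. */
final class TableLayout {
    private final List<Class<?>> modelClasses = new ArrayList<>();
    private final Set<Class<?>> missing = new HashSet<>();
    // Maps each model class to the table its rows physically live in.
    private final Map<Class<?>, Class<?>> physicalTable = new LinkedHashMap<>();

    TableLayout(Collection<? extends Class<?>> model,
                Collection<? extends Class<?>> missingTables,
                Collection<? extends Class<?>> truncatedClasses) {
        modelClasses.addAll(model);
        missing.addAll(missingTables);
        for (Class<?> table : modelClasses) {
            Class<?> target = table;
            // A class at or below a truncated class is merged into that class's table.
            for (Class<?> truncated : truncatedClasses) {
                if (truncated.isAssignableFrom(table)) {
                    target = truncated;
                }
            }
            physicalTable.put(table, target);
        }
    }

    /** Returns one "physicalTable(originalTable)" entry per row written for obj. */
    List<String> rowsFor(Object obj) {
        List<String> rows = new ArrayList<>();
        for (Class<?> table : modelClasses) {
            if (table.isInstance(obj) && !missing.contains(table)) {
                rows.add(physicalTable.get(table).getSimpleName() + "(" + table.getSimpleName() + ")");
            }
        }
        if (rows.isEmpty()) {
            // Mirrors the rule that data written to no table at all is rejected.
            throw new IllegalArgumentException("Object would not be written to any table");
        }
        return rows;
    }
}
```

With no settings, a Gene produces three rows (IMObject, BioEntity, Gene). With IMObject missing and BioEntity truncated, it produces two rows, both in the BioEntity table, tagged BioEntity and Gene.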

The properties that configure this ObjectStore are:

  • class=org.intermine.objectstore.intermine.ObjectStoreInterMineImpl - tells the ObjectStoreFactory to select this implementation.
  • db=database alias - points to an SQL database, which needs to be configured separately.
  • model=model name - the name of a model, available as an xml description file in the classpath.
  • truncatedClasses=class name list - a comma-separated list (with no spaces) of fully-qualified class names which should be truncated.
  • missingTables=table name list - a comma-separated list (with no spaces) of table names which should be missing. These are usually the class name without the preceding package name.
  • noNotXml=true or false (default true) - whether NotXML should be missing.
  • minBagTableSize=integer - bags larger than this size will be written to a temporary table in the SQL database to speed up queries.
  • logfile=file name - the name of a file to write query logs to.
  • logTable=table name - the name of a table in the SQL database to write query logs to.
  • logEverything=true or false (default false) - whether to log every query, or just the ones which are explained.
  • verboseQueryLog=true or false (default false) - switches on an extremely verbose log.
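Putting these properties together, a configuration might look like the following fragment. The database aliases db.main and db.items, the model names genomic and items, the log file name and the minBagTableSize value are placeholders chosen for illustration only. The second alias combines a missing InterMineObject table with noNotXml to give flat mode, as described above.

```properties
# Hypothetical example: aliases, model names and values are placeholders.
os.main.class=org.intermine.objectstore.intermine.ObjectStoreInterMineImpl
os.main.db=db.main
os.main.model=genomic
os.main.noNotXml=true
os.main.minBagTableSize=1000
os.main.logfile=os-main-queries.log
os.main.logEverything=false

# Flat mode: InterMineObject table missing and no NotXML column.
os.items.class=org.intermine.objectstore.intermine.ObjectStoreInterMineImpl
os.items.db=db.items
os.items.model=items
os.items.missingTables=InterMineObject
os.items.noNotXml=true
```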

ObjectStoreClient

This is a webservice client that allows a Java program to access a remote ObjectStore through an ObjectStoreServer as if it were a local database. It uses SOAP to communicate with the server.

ObjectStorePassThruImpl

This implementation of the ObjectStore interface simply passes every method call through to an underlying ObjectStore. It is intended as a superclass for more useful ObjectStore implementations, which actually provide new features. These are:

  • ObjectStoreSafeImpl - This ObjectStore clones the query every time it passes a query through to the underlying ObjectStore. This makes it safe to execute a query, and then alter the query in memory.
  • ObjectStoreFastCollectionsImpl - This ObjectStore explicitly fetches the contents of the collections of all the objects returned by a query, in order to avoid the "n + 1 reads problem". This gives much better performance when iterating through the objects and reading their collections. See also FetchingRelatedObjects.
  • ObjectStoreFastCollectionsForTranslatorImpl - A very specific implementation designed to improve performance for our data loading process.
  • ObjectStoreItemPathFollowingImpl - A very specific implementation designed to improve performance for our data translation process.
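The pass-through pattern, and one plausible reason for ObjectStoreSafeImpl to clone queries, can be sketched as follows. Everything here (SketchQuery, SketchStore, LazyStore, PassThruStore, SafeStore) is hypothetical: results are modelled as a Supplier rather than a Results list, and the sketch assumes the underlying store reads the query lazily, so that altering the query after execution changes what is later fetched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical mutable query: a single minimum-value filter. */
final class SketchQuery {
    int minValue;

    SketchQuery(int minValue) { this.minValue = minValue; }

    SketchQuery copy() { return new SketchQuery(minValue); }
}

/** Hypothetical minimal store interface; results are fetched lazily. */
interface SketchStore {
    Supplier<List<Integer>> execute(SketchQuery q);
}

/** A lazy backing store that keeps a reference to the query it was given. */
final class LazyStore implements SketchStore {
    private final List<Integer> data;

    LazyStore(List<Integer> data) { this.data = data; }

    public Supplier<List<Integer>> execute(SketchQuery q) {
        // The query is only read when the results are actually fetched.
        return () -> {
            List<Integer> out = new ArrayList<>();
            for (int v : data) {
                if (v >= q.minValue) {
                    out.add(v);
                }
            }
            return out;
        };
    }
}

/** Forwards every call unchanged; subclasses override only what they need. */
class PassThruStore implements SketchStore {
    protected final SketchStore delegate;

    PassThruStore(SketchStore delegate) { this.delegate = delegate; }

    public Supplier<List<Integer>> execute(SketchQuery q) { return delegate.execute(q); }
}

/** Clones the query before passing it on, so callers may alter theirs afterwards. */
final class SafeStore extends PassThruStore {
    SafeStore(SketchStore delegate) { super(delegate); }

    @Override
    public Supplier<List<Integer>> execute(SketchQuery q) { return super.execute(q.copy()); }
}
```

Because SafeStore only overrides the one method it changes, every other call would still flow through PassThruStore untouched, which is the point of using a pass-through superclass.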

ObjectStoreTranslatingImpl

This implementation provides an ObjectStore whose data is in one model, by translating the data held in another ObjectStore that uses a different model. It is used in our data loading process.