Simple Objects in the InterMine System

The InterMine system has a rich feature set that gives great flexibility when loading data that originates from multiple data sources. However, this flexibility comes at a cost to performance. Simple objects are an extension of the InterMine system that allows simpler data to be loaded without these performance bottlenecks. This wiki page discusses two types of simplified data, each with a different trade-off between performance gain and loss of functionality.

Switching off the DataTracker

To manage data conflicts, the system needs to keep track of where each piece of data came from. Conflicting values are resolved by a priority system, in which one data source is regarded as more reliable than another for a particular field value. However, storing this tracking data takes significant time while the DataLoader is running, so it can now be switched off on a per-class basis for the whole data loading run. This is useful if you know that there will never be any data conflicts for a particular class. The setting lives in the properties file for the project: add a "datatrackerMissingClasses" line to the IntegrationWriter entry, like this:

integration.production.class=org.intermine.dataloader.IntegrationWriterDataTrackingImpl
integration.production.osw=osw.production
integration.production.datatrackerMaxSize=100000
integration.production.datatrackerCommitSize=10000
integration.production.datatrackerMissingClasses=OneAttribute

The parameter is a comma-separated list of class names for which no tracking data should be stored. All objects that are instances of these classes, including instances of their subclasses, will be omitted from the DataTracker.
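
The subclass rule can be illustrated with a small stand-alone Java sketch. This is not InterMine's implementation: the class MissingClassesSketch, its two methods, and the nested example classes are all hypothetical, and the sketch assumes that names in the list are matched against unqualified class names by walking up the superclass chain (interfaces are ignored for brevity).

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class MissingClassesSketch {
    // Hypothetical model classes, used only for this illustration.
    static class OneAttribute { }
    static class SubOneAttribute extends OneAttribute { }
    static class Employee { }

    /** Splits a comma-separated property value into trimmed, non-empty class names. */
    static Set<String> parseClassNames(String propertyValue) {
        Set<String> names = new LinkedHashSet<>();
        for (String part : propertyValue.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Returns true if the class, or any of its superclasses, is named in the list. */
    static boolean isUntracked(Class<?> cls, Set<String> missingClasses) {
        for (Class<?> c = cls; c != null; c = c.getSuperclass()) {
            if (missingClasses.contains(c.getSimpleName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed property value; "SomeOtherClass" is a made-up second entry.
        Set<String> missing = parseClassNames("OneAttribute, SomeOtherClass");
        System.out.println(isUntracked(OneAttribute.class, missing));
        System.out.println(isUntracked(SubOneAttribute.class, missing));
        System.out.println(isUntracked(Employee.class, missing));
    }
}
```

In this sketch, listing OneAttribute alone is enough to exclude SubOneAttribute as well, while the unrelated Employee class remains tracked.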

Non-InterMineObjects

For the greatest performance gain, objects that are not instances of InterMineObject can be stored in the database. Such objects are stored in "flat mode" in an SQL table, and take much less space in the database than instances of InterMineObject.

Because they do not have an ID, they cannot be referenced by other objects, fetched by ID, or deleted by ID. They cannot have a collection of their own, and cannot be in a many-to-many collection. They are not stored in the main InterMineObject table, are not stored in the DataTracker, and are never merged with other objects by the DataLoader. No class hierarchy may exist among these classes, and no dynamic objects may make use of them.

The objects can, however, contain attributes and references to other objects, and can be in one-to-many collections of other objects. The full Query interface will work correctly with these simple objects.

Simple objects are configured in the Model by declaring the superclass of a class to be "java.lang.Object" in the model description, like this:

    <class name="org.intermine.model.testmodel.SimpleObject" is-interface="false" extends="java.lang.Object">
        <attribute name="name" type="java.lang.String"/>
        <reference name="employee" referenced-type="org.intermine.model.testmodel.Employee" reverse-reference="simpleObjects"/>
    </class>

We recommend that you set "is-interface" to "false" for these objects. There is no need to list these classes in the "datatrackerMissingClasses" property described above, because these classes are never tracked.
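
As a rough picture of the shape that model description implies, the following stand-alone Java sketch shows a class with no ID field, one attribute, and one reference whose reverse side is a one-to-many collection on the referenced object. It is a hand-written approximation under stated assumptions, not the code that InterMine produces for the model: the nested classes are hypothetical stand-ins, and keeping the reverse collection up to date inside the setter is a choice made only for this illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class SimpleObjectSketch {
    /** Stand-in for an ordinary model class that owns the one-to-many collection. */
    static class Employee {
        private final Set<SimpleObject> simpleObjects = new LinkedHashSet<>();

        Set<SimpleObject> getSimpleObjects() {
            return simpleObjects;
        }
    }

    /** Stand-in for a simple object: no id field, one attribute, one reference. */
    static class SimpleObject {
        private String name;
        private Employee employee;

        String getName() {
            return name;
        }

        void setName(String name) {
            this.name = name;
        }

        Employee getEmployee() {
            return employee;
        }

        /** Sets the reference and keeps the reverse collection consistent (illustration only). */
        void setEmployee(Employee employee) {
            if (this.employee != null) {
                this.employee.getSimpleObjects().remove(this);
            }
            this.employee = employee;
            if (employee != null) {
                employee.getSimpleObjects().add(this);
            }
        }
    }

    public static void main(String[] args) {
        Employee employee = new Employee();
        SimpleObject simple = new SimpleObject();
        simple.setName("example");
        simple.setEmployee(employee);
        // The simple object is reachable through the employee's collection,
        // but has no identifier of its own to fetch or delete it by.
        System.out.println(employee.getSimpleObjects().contains(simple));
    }
}
```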