Hints for improving data loading performance

Java options

Loading data can be memory intensive, so some Java options should be tuned to improve performance. See the note about setting ANT_OPTS.

Storing Items in order

When loading objects into the production ObjectStore, the order of loading can have a big impact on performance. Store each object before any object that references it. For example, if a Gene has a Publication in its evidence collection and a Synonym references the Gene, store the objects in the order Publication, Gene, Synonym. If the Gene is instead stored after the Synonym, a placeholder object is stored in the Gene's place and later replaced by the real Gene, which takes extra time.
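The ordering rule above amounts to a topological sort of the objects by their references. The following is a minimal sketch in Python, not InterMine code: the `items` dictionary, its identifiers and the `storage_order` function are hypothetical names chosen for illustration, and the sketch assumes the references contain no cycles.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical items: identifier -> (class name, identifiers it references)
items = {
    "synonym1": ("Synonym", ["gene1"]),
    "gene1": ("Gene", ["pub1"]),
    "pub1": ("Publication", []),
}


def storage_order(items):
    """Return identifiers so that every referenced item precedes its referrers."""
    # TopologicalSorter expects node -> set of nodes that must come first.
    graph = {ident: set(refs) for ident, (_, refs) in items.items()}
    # static_order() raises graphlib.CycleError if the references form a cycle.
    return list(TopologicalSorter(graph).static_order())


order = storage_order(items)
print([items[ident][0] for ident in order])
# prints ['Publication', 'Gene', 'Synonym']
```

If the references do form a cycle, no ordering avoids every placeholder, so a sketch like this can only reduce their number, not remove them.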

Objects are loaded in the order that Items are stored by converter code, or in the order they appear in an Items XML file. Once Items have been stored in the common-tgt-items database (during the build or using ant -Dsource=sourcename -Daction=retrieve), you can check whether improvements are possible with this SQL query:

   SELECT classnamea, name, classnameb, count(*)
   FROM (SELECT DISTINCT itema.classname AS classnamea, reference.name,
                itemb.classname AS classnameb, itemb.identifier
         FROM item AS itema, reference, item AS itemb
         WHERE itema.id = reference.itemid
           AND reference.refid = itemb.identifier
           AND itema.id < itemb.id) AS a
   GROUP BY classnamea, name, classnameb;

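If there are no results then no improvement can be made. Each result row counts the distinct referenced Items (classnameb) that were stored after an Item (classnamea) that references them. The example below shows that 27836 Gene Items were stored after Synonyms that reference them; subject is the name of the reference in Synonym.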

                  classnamea                  |  name   |                classnameb                 | count 
----------------------------------------------+---------+-------------------------------------------+-------
 http://www.flymine.org/model/genomic#Synonym | subject | http://www.flymine.org/model/genomic#Gene | 27836
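To make explicit what the query counts, here is a small in-memory sketch of the same check in Python. It is an illustration only: the `stored` list, its identifiers and the function name are hypothetical, and list position stands in for the item id that the SQL compares.

```python
from collections import Counter


def count_forward_references(stored):
    """Count distinct targets stored after an item that references them.

    stored: list of (identifier, classname, {reference name: target identifier})
    in the order the items were stored.
    """
    position = {ident: pos for pos, (ident, _, _) in enumerate(stored)}
    classname = {ident: cls for ident, cls, _ in stored}
    seen = set()  # plays the role of SELECT DISTINCT in the query
    for ident, cls, refs in stored:
        for refname, target in refs.items():
            # Referrer stored before its target: the case the query reports.
            if target in position and position[ident] < position[target]:
                seen.add((cls, refname, classname[target], target))
    return Counter((a, name, b) for a, name, b, _ in seen)


# Hypothetical store order: two Synonyms precede the Gene they reference.
stored = [
    ("s1", "Synonym", {"subject": "g1"}),
    ("s2", "Synonym", {"subject": "g1"}),
    ("g1", "Gene", {}),
    ("g2", "Gene", {}),
    ("s3", "Synonym", {"subject": "g2"}),
]

result = count_forward_references(stored)
print(dict(result))
# prints {('Synonym', 'subject', 'Gene'): 1}
```

Gene g1 is counted once even though two Synonyms reference it, matching the DISTINCT on the referenced identifier in the query; g2 is not counted because it was stored before s3.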


Still to add: recommended hardware, postgres config

See Also: RunningABuild