Fetching related objects in the Results object - solving the N + 1 reads problem
Introduction
This document describes a particular performance problem inherent in object-oriented database systems, and how it is tackled in the InterMine system
The problem is that following references from objects will result in a request to the database to fetch the data. Each of these database requests is small, and returns a tiny amount of data for a large amount of effort spent. If many of these operations are performed, the efficiency of the database system is greatly reduced.
Why is this needed?
Consider the situation where a database has two object classes - A and B. Objects of type A each have a reference to an object of type B. The user writes a program that generates a query for all objects of type A, and iterates through the entire Results object, calling toString on each object. Unfortunately, the toString method of class A uses the reference to the object of type B. Therefore, the program performs lots of single object fetches of objects of type B. In fact, it performs N + 1 queries in total, where N is the number of objects of type A - hence it is called the "N + 1 reads problem".
First coping strategy - caches
The first coping strategy that the InterMine system uses is a cache in the ObjectStore that indexes by the ID of the object. When a proxy reference is accessed, the ObjectStore first looks in the cache. This way, if the user can arrange for the B objects to be in the cache already, the problem no longer occurs. The user could arrange this by including the B objects in the query, but they should be careful that this does not alter the results - InterMine does not have a true LEFT OUTER JOIN yet, which would be required for this to be possible for every query.
ObjectStoreFastCollectionsImpl
The second mechanism that the InterMine system provides is a specific ObjectStore that behaves differently to the main ObjectStore. This is a subclass of ObjectStorePassThruImpl, and it explicitly materialises every reference and collection present in the objects returned by the underlying ObjectStore. This is a fairly crude method, but it works well for the InterMine data translation pipeline. It could be retrofitted with a mechanism for specifying which collections and references are to be materialised.
A similar function is provided by the ObjectStoreFastCollectionsForTranslatorImpl and the ObjectStoreItemPathFollowingImpl, although they have more knowledge of the actual data requirements of the user than ObjectStoreFastCollectionsImpl.
Automatic N + 1 reads problem correction
There are currently no plans to implement an automatic system to detect and solve generic N + 1 reads problems, although it would not be too difficult.
