InterMine Query Language

This document is a specification for the text-based version of the InterMine Query, named IQL (InterMine Query Language). The language has similar syntax to SQL and OQL, with a few subtle differences. The language maps directly onto the InterMine Java Query structure. It may be useful to refer to some IQLExamples.

General syntax

An IQL statement follows the following form:

 SELECT <''list of QuerySelectable objects''>
    FROM <''list of FromElements''>
    [ WHERE <''a constraint''> ]
    [ GROUP BY <''list of QueryNode objects''> ]
    [ ORDER BY <''list of QueryNode objects''> ]

An IQL statement is very similar to an SQL statement, but simpler. There are no UNION, LIMIT, or OFFSET clauses. The HAVING clause is also absent, as its members are merged into the WHERE clause.

The FROM clause

The FROM clause contains a list of FromElement objects. Each element of this list must be associated with an alias, by appending " AS <alias>" to the element, with one exception. There must be no more than one instance of each alias in the FROM clause. Each element in the FROM clause provides a set of data in the form of a table with some number of rows and some number of columns. The different types of FromElement are:

  • QueryClass objects. The textual representation of a QueryClass is merely the name of the Java class, without the package name, which should be supplied along with the query. For instance, if one wishes to create a QueryClass for a Class of org.intermine.model.testmodel.Manager, and one had specified a package name of "org.intermine.model.testmodel", one would write "Manager" as the text for the required QueryClass. In addition, one can specify a dynamic class by putting a comma-separated list of class names inside parentheses - for example "(Manager, Broke)". A QueryClass need not have an alias if it is not representing a dynamic class - the name of the class stands in for the alias. A QueryClass provides one row for each object of that Class in the database, and one column per field in the class plus one column containing the objects themselves.
  • Subqueries. The textual representation of a subquery is (like SQL) the entire text of the subquery as an IQL query, enclosed inside parentheses. The subquery provides the number of rows and columns that the query's results contain.
  • QueryClassBag objects. This allows a collection to stand in for a QueryClass, which can improve performance when executing some queries, as the class' table does not appear in the SQL. Unlike the QueryClass, this FromElement does not provide a column for each field in the class. Instead, only the "id" attribute and all the collections are provided. The collection can take two forms:
    • A Java Collection - The Collection is not included in the IQL text - rather, it is passed in as a query parameter when converting the IQL into the Java Query. Parameters are represented in IQL as question marks, like in a prepared statement in SQL. The syntax is "?::<class name>". Like QueryClass, a dynamic class is specified by putting a comma separated list of class names in parentheses.
    • An ObjectStoreBag - The syntax is "BAG(<bag ID>)::<class name>". Again, the class name can be replaced with a list of class names separated by commas in parentheses.

QueryNode objects

A QueryNode is a Java interface for all objects that may be present in the SELECT, ORDER BY, or GROUP BY clauses. The different types of QueryNode are:

  • QueryClass - This represents actual objects from the database. The textual representation is the alias that was specified in the FROM clause.
  • QueryEvaluable - This is an sub-interface of QueryNode for elements that represent primitive values (String, Number, Boolean, or Date).
    • QueryField - This is a QueryEvaluable that represents a primitive field in an object or subquery. There are several textual representations, depending on where the field is from:
      • Field in a QueryClass - The textual representation is the alias of the QueryClass (in the FROM clause), followed by a dot, followed by the name of the field - for example "SELECT c.name as field FROM Company AS c".
      • Field from inside a subquery - The textual representation is the alias of the subquery, followed by a dot, followed by the name of the field, as aliased in the subquery - for example "SELECT subquery.thing as field FROM (SELECT Company.name as thing FROM Company) as subquery".
      • Field from a QueryClass, in a subquery - The textual representation is the alias of the subquery, followed by a dot, followed by the name of the QueryClass, as aliased in the subquery, followed by another dot, followed by the name of the field - for example "SELECT subquery.Company.name as field FROM (SELECT Company from Company) as subquery".
    • QueryValue - This represents a constant value. The textual representation depends on the type of the constant to be represented:
      • Number - The textual representation is as a normal number with no embellishment. The exact type of the number is deduced from how it is used in the Query. If you really need to specify the exact type of number, then make use of the QueryCast as described below. Valid number types are Short, Integer, Long, Float, Double, and BigDecimal.
      • String - The textual representation is with single quotes enclosing the required String.
      • Boolean - The textual representation is "true" or "false", without the double quotes.
      • Date - The textual representation is "'yyyy-MM-dd HH:mm:ss.SSS'", including the single quotes enclosing it. Note: This definition overlaps with that of a String. A Date is distinguished from a String by how it is used in the Query.

If, while converting an IQL query to a Java Query, the type of a QueryValue cannot be distinguished, then an UnknownTypeValue is used, which merely holds the textual representation of the value until the exact type can be determined.

  • QueryFunction - This represents an aggregate function. Most of these functions require a single argument, which should be a QueryField or a QueryExpression. The presence of an aggregate function in a query alters the number of rows in the results. If there is no GROUP BY clause, then the entire set of rows of the query results are rolled together into one row, otherwise the result rows of the query are rolled into multiple groups of rows, dictated by the contents of the GROUP BY clause. The textual representation of these functions is:
    • COUNT(*) - this evaluates to the number of rows in the group it represents, and takes no argument.
    • MIN(<QueryEvaluable>) - this evaluates to the minimum value of the QueryEvaluable mentioned in the group of rows it represents.
    • MAX(<QueryEvaluable>) - this evaluates to the maximum value of the QueryEvaluable mentioned in the group of rows it represents.
    • SUM(<QueryEvaluable>) - this evaluates to the sum of all the values of the QueryEvaluable mentioned in the group of rows it represents.
    • AVG(<QueryEvaluable>) - this evaluates to the average of all the values of the QueryEvaluable mentioned in the group of rows it represents.
    • STDDEV(<QueryEvaluable>) - This evaluates to the standard deviation of all the values of the QueryEvaluable mentioned in the group of rows it represents.
  • QueryExpression - This represents an arithmetic expression, comprising other QueryEvaluables and an operation. The various types are:
    • <QueryEvaluable> {+, -, *, /} <QueryEvaluable> - Normal arithmetic expressions.
    • INDEXOF(<QueryEvaluable>, <QueryEvaluable>) - the result of the expression is the index of the second argument (a String) in the first argument (also a String), where the character positions are numbered from one, or a result of zero if the second argument does not occur in the first argument.
    • SUBSTR(<QueryEvaluable>, <QueryEvaluable>, <QueryEvaluable>) - The result of the expression is the substring of the first argument, from the character numbered from 1 from the left determined by the second argument, and as many characters to the right as determined by the third argument or the end of the first argument being reached.
    • LOWER(<QueryEvaluable>) - Converts the argument text to lowercase.
    • UPPER(<QueryEvaluable>) - Converts the argument text to uppercase.
  • QueryCast - This represents a typecast from one type of value to another. One cannot cast to a primitive type (int, long, etc.), but to Java Classes (Integer, Long, etc). Some typecasts may not make sense, so an error may occur when the query is run. Due to the storage format of Date objects, typecasts involving them may not act as expected. The textual representation is <QueryEvaluable>::<class name (Short, Integer, Long, Float, Double, BigDecimal, Boolean, Date, String)>
  • QueryForeignKey - This represents the ID of an object in a reference from a QueryClass, without requiring the QueryClass for that object to be put on the FROM list too. The syntax is <QueryClass>.<reference name>.id

These QueryNodes can be used in the SELECT, GROUP BY, and ORDER BY clauses, as well as inside different constraints.

The SELECT clause

The results of an IQL query are provided in the form of a table, like in SQL. The SELECT clause lists the columns in that table. All elements in the SELECT clause must have an alias, which is specified by putting " AS <alias>" after each element. The exception is QueryClasses, which already have aliases. The elements are separated by commas. Elements that can appear on the SELECT clause (QuerySelectables) are:

  • All QueryNode objects.
  • ObjectStoreBag - This represents a collection of integers (which can represent objects, by ID), stored in the database. Putting an ObjectStoreBag? on the SELECT list means that you want to fetch the contents of the bag, and nothing else should be added to the SELECT list or the FROM list. The syntax is SELECT BAG(<bag ID>);
  • ObjectStoreBagCombination - This represents a combination of several ObjectStoreBag? objects. Putting this on the SELECT list means that you want to fetch the contents of that combination, and nothing else should be added to the SELECT list or the FROM list. The syntax is SELECT BAG(<bag ID>) {UNION, INTERSECT, EXCEPT} BAG(<bag ID>) ...
  • ObjectStoreBagsForObject - This represents a search for bags that contain a certain integer (possibly representing an object). The set of bags that will be returned can be constrained to be within a certain set of bags which must be provided. The syntax in IQL is SELECT BAG FOR <integer> [ IN BAGS ? ]. The question mark refers to a Collection holding ObjectStoreBag? objects, which is placed in the parameter list of the query.

The ORDER BY clause

The results of the query are sorted by the columns listed in the ORDER BY clause. The elements are separated by commas, and may be any QueryOrderable, which are:

  • All QueryNode objects.
  • QueryObjectReference objects. These represent object references, and are specified in exactly the same way as a QueryField. The rows will be ordered by the IDs of the referenced objects.

The GROUP BY clause

The presence of a GROUP BY clause in the query changes the entire query. All the rows are sorted by the elements in the GROUP BY clause, then grouped into sets of rows that have the equivalent values for those elements. Aggregate functions are then run on each group of rows, resulting in one row per group being placed into the final results. If aggregate functions are used without a GROUP BY clause, then the entire set of results is viewed as one group. The SELECT and ORDER BY clauses become constricted - they can only be constructed out of values that refer to an entire group of rows, which includes all QueryValues and QueryFunctions, plus anything present in the GROUP BY clause. QueryCasts and QueryExpressions can also be used, but only if constructed out of those elements. QueryFunctions may not be constructed from anything that can be placed in the SELECT or ORDER BY clauses.

Constraints and the WHERE clause

The WHERE clause contains a single constraint of some kind, where that constraint may contain other constraints. These are the different kinds of constraint:

  • SimpleConstraint - This type of constraint performs the comparison between different QueryEvaluables. It has two forms - a comparison between a QueryEvaluable and NULL, and a comparison between two QueryEvaluables.
    • <QueryEvaluable> {=, !=, <, <=, >, >=, LIKE, NOT LIKE} <QueryEvaluable> - Returns true if the condition is satisfied. Note that "=" and "!=" are available to all QueryEvaluable types, but "<", "<=", ">", ">=" are only available to numeric QueryEvaluable types, and "LIKE" and "NOT LIKE" are only available to String QueryEvaluable types. The two QueryEvaluables must have matching types.
    • <QueryEvaluable> IS [NOT] NULL - Returns true if the condition is satisfied.
  • ClassConstraint - This is a comparison involving a QueryClass, and there are two forms depending on what it is being compared to:
    • <QueryClass> {=, !=} <QueryClass> - Compare to another QueryClass.
    • <QueryClass> {=, !=} ? - Compare to a database object passed in as a parameter.
  • ContainsConstraint - This is a comparison involving a QueryReference, which can be constructed several ways:
    • <QueryClass>.<reference name> - produces a QueryObjectReference.
    • <QueryClass>.<collection name> - produces a QueryCollectionReference.
    • <QueryClassBag>.<collection name> - produces a QueryCollectionReference referring to the collections in the objects in a QueryClassBag.
    • ?.<collection name> - produces a QueryCollectionReference referring to the collection in the object passed in as a parameter, like having a QueryClassBag with only one object.
    There are three forms, for comparing the reference with different things. There are two types of reference which can be present in a Class - an object reference (which references a single object), and a collection reference (which is represented in the object as a Collection containing multiple objects). Nevertheless, both types of reference are treated exactly the same in IQL - the only noticeable difference between them being that collections are often named with a plural-like name (for instance "departments"). The three forms are:
    • <QueryReference> {CONTAINS, DOES NOT CONTAIN} <QueryClass> - This compares the reference to a QueryClass.
    • <QueryReference> {CONTAINS, DOES NOT CONTAIN} ? - This compares the reference to a database object passed in as a parameter.
    • <QueryObjectReference> IS [NOT] NULL - This asserts that the reference is or is not null, and cannot be applied to a collection.
    So, even though you may wish to write "Department.company = Company", the correct syntax is "Department.company CONTAINS Company".
  • SubqueryConstraint - This is a comparison between the results of a subquery, and a QueryClass or a QueryEvaluable. The constraint will return true if the value of the QueryClass or QueryEvaluable is present (or not present) in one of the rows of the results of the subquery. The textual representation is the QueryClass or QueryEvaluable, followed by "IN" or "NOT IN", followed by an IQL query statement with only one element in the SELECT list, enclosed in parentheses. For example "Company.name IN (SELECT Company.name AS name FROM Company)". Fields in the scope of the main query are currently not available in the scope of the subquery in IQL, although this is not the case in the Java query. This may change in the near future.
  • NotConstraint - This isn't actually a InterMine Query Java object - but it is a textual representation of negating the result of any Constraint. The textual representation is "NOT", followed by the Constraint to reverse. If In Doubt, Use Brackets. For example "NOT Company.name = Department.name" (which is equivalent to "Company.name != Department.name").
  • ConstraintSet - This represents a set of multiple constraints that have been combined with the AND or OR operators. The textual representation is a string of any number of Constraints, separated by "AND" or "OR" - again If In Doubt, Use Brackets. For example "Company.name = Department.name AND (Company.vatNumber = 3 OR Company.name = 'flobble') AND Department.name != 'wotsit'".
  • SubqueryExistsConstraint - This constraint is currently not very useful, as fields in the scope of the main query are not available in the scope of the subquery.
  • BagConstraint - This is a comparison between a QueryNode and a collection of objects. Only objects from the collection that are of a compatible type to the type of the QueryNode are used. Two types of collection are possible:
    • A Java Collection - The textual representation is "<QueryNode> [NOT] IN ?" and the Collection is passed in separately as a parameter.
    • An ObjectStoreBag? - The textual representation is "<QueryNode> [NOT] IN BAG(<bag id>)".