Last modified on 14/10/10 10:08:05

Coming in InterMine 0.94 - new search

One of the new features in the upcoming InterMine 0.94 release is a new keyword search. This provides a really fast search across all text fields in the database. Results are faceted just like an Amazon product search - currently by type (the class of object) and organism. Clicking on a facet restricts results to just that category. Boolean operations, wildcards and phrases are all supported.

An example search in FlyMine

The search is based on Lucene and uses Bobo for the faceting. This is the same technology used by LinkedIn to power their profile search.
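To make the idea of faceting concrete, here is a minimal sketch in plain Python. It is not how InterMine, Lucene or Bobo implement it; the hit records, field names and helper functions are all hypothetical and only illustrate counting results per category and narrowing them when a facet is clicked.

```python
from collections import Counter

# Hypothetical search hits; in a real mine these would come from the index.
hits = [
    {"id": 1, "type": "Gene", "organism": "D. melanogaster"},
    {"id": 2, "type": "Gene", "organism": "D. melanogaster"},
    {"id": 3, "type": "Protein", "organism": "D. melanogaster"},
    {"id": 4, "type": "Gene", "organism": "A. gambiae"},
]


def facet_counts(results, field):
    """Count how many results fall under each value of a facet field."""
    return Counter(r[field] for r in results)


def restrict(results, field, value):
    """Keep only the results in one facet category (the effect of a facet click)."""
    return [r for r in results if r[field] == value]


# Facet the full result set by type, then narrow to genes and facet by organism.
type_counts = facet_counts(hits, "type")
genes = restrict(hits, "type", "Gene")
organism_counts = facet_counts(genes, "organism")
```

Restricting first and then recounting is what lets the remaining facets always reflect the current, narrowed result set.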

Indexing

Indexing the database runs as a post-process step which creates the index in a directory. The index is then zipped and stored in the database. When you deploy a webapp pointing at the database, it will extract the index again. For FlyMine, indexing takes less than an hour and covers a large proportion of the database.
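The zip-and-extract round trip can be sketched as follows. This is an illustration using only the Python standard library, not InterMine's own code: the function names and the placeholder file are made up, and the step of writing the bytes to the database is left out.

```python
import io
import os
import tempfile
import zipfile


def zip_directory(index_dir):
    """Pack every file under index_dir into an in-memory zip and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(index_dir):
            for name in files:
                path = os.path.join(root, name)
                # Store paths relative to the index directory.
                archive.write(path, os.path.relpath(path, index_dir))
    return buffer.getvalue()


def extract_index(blob, target_dir):
    """Unpack zipped index bytes into target_dir (what the webapp would do on deploy)."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        archive.extractall(target_dir)


# Round trip with a placeholder file standing in for real index files.
with tempfile.TemporaryDirectory() as build_dir, tempfile.TemporaryDirectory() as webapp_dir:
    with open(os.path.join(build_dir, "segments.txt"), "w") as fh:
        fh.write("placeholder index data")
    blob = zip_directory(build_dir)  # these bytes would be stored in the database
    extract_index(blob, webapp_dir)
    restored = sorted(os.listdir(webapp_dir))
```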

By default the index will include the text fields of all objects in the database. Each object in the database becomes a document in the index with text attributes attached. You can configure classes to ignore, such as locations and scores that don't provide text information. You can also add related information to an object. For example, you can configure the index so that synonyms, pathways and GO terms are included in a Gene's entry.
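As a rough sketch of the rules described above, the following Python shows one object becoming one document, ignored classes being skipped, and related information being folded into a Gene's entry. The configuration names, object layout and sample values are hypothetical and do not reflect InterMine's actual configuration format.

```python
# Hypothetical configuration: classes to skip and related fields to pull in.
IGNORED_CLASSES = {"Location", "Score"}
RELATED_FIELDS = {"Gene": ["synonyms", "pathways", "goTerms"]}


def build_document(obj):
    """Return an index document for one object, or None if its class is ignored."""
    cls = obj["class"]
    if cls in IGNORED_CLASSES:
        return None
    # Only string-valued attributes count as text fields.
    text = [v for k, v in obj.items() if k != "class" and isinstance(v, str)]
    # Append text from any related collections configured for this class.
    for field in RELATED_FIELDS.get(cls, []):
        text.extend(obj.get(field, []))
    return {"class": cls, "text": text}


# Made-up sample objects for illustration only.
gene = {
    "class": "Gene",
    "symbol": "zen",
    "length": 1000,
    "synonyms": ["zerknullt"],
    "goTerms": ["DNA binding"],
}
location = {"class": "Location", "start": 100, "end": 200}

gene_doc = build_document(gene)
location_doc = build_document(location)
```

Here the numeric `length` is left out of the document because it carries no text, and the `Location` object produces no document at all.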

More Details

The faceted search system was implemented by Nils Kölling, a summer intern with InterMine. See the talk he gave for more technical details.

The new search system is one of the new features in the upcoming InterMine 0.94 release.
