Strong relevancy is critical to a well-adopted search solution. Mindbreeze provides several ways to fine-tune relevancy, which it refers to as boosting. This post explores how boosting works and four ways to apply boosting rules within Mindbreeze.
About Boosting & Rank Scores
Before we start altering relevancy, it’s important to examine how boosting works within Mindbreeze. Mindbreeze provides a baseline algorithm with factors such as term frequency, term proximity, and freshness (while there are ways to alter these core signals, we’ll save that topic for another time). Boostings address many common relevancy-adjustment use cases and are the easiest way to alter rankings. They are applied to the baseline rankings by a configured amount. Although the term “boost” generally implies an increase, boosting can be used to increase or decrease rank scores relative to the baseline.
Mindbreeze boostings are factor-based (i.e. multiplicative). For example, a boost factor of 2.0 would make something twice as relevant as the baseline, while a boost factor of 0.5 would make it half as relevant. For this reason, it’s helpful to monitor rankings (called rank scores) before and after boosting in order to determine an appropriate boosting factor. Mindbreeze provides two options for viewing rank scores, as described below.
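To make the multiplicative behavior concrete, here is a minimal sketch of how a boost factor changes the order of results. The scores, document names, and the applyBoost helper are made up for illustration. This is not Mindbreeze's internal ranking code.

```javascript
// Illustrative only: hypothetical scores and a hypothetical helper,
// not Mindbreeze's internal ranking implementation.
function applyBoost(results, factor, matches) {
  return results
    .map(function (r) {
      // Multiply the baseline score by the factor only for matching results.
      var boosted = matches(r) ? r.score * factor : r.score;
      return { id: r.id, type: r.type, score: boosted };
    })
    .sort(function (a, b) { return b.score - a.score; });
}

var baseline = [
  { id: "handbook", type: "pdf", score: 0.8 },
  { id: "news-post", type: "newsarticle", score: 0.5 },
  { id: "old-memo", type: "memo", score: 0.3 }
];

function isNewsArticle(r) {
  return r.type === "newsarticle";
}

// A factor of 2.0 lifts the news article above the handbook.
var boosted = applyBoost(baseline, 2.0, isNewsArticle);
console.log(boosted.map(function (r) { return r.id + ": " + r.score; }));
```

With these made-up numbers, a factor of 1.5 would leave the news article in second place (0.75 versus 0.8), which is why it helps to check real rank scores before choosing a factor.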
Viewing Rank using the Export
The easiest way to see the rank of multiple search results is to use the export feature. From the default Mindbreeze search client, perform your search and select Export. In the Export results window, select the plus sign (+) and add “mes:hitinfo:rank” to the visible columns. You can simply start typing “rank” in the input box and this column name will appear.
Viewing Rank within JSON Results
If using the Export feature is not available, or you need to test rankings that are specific to a custom search application, you can also view the rank within the Mindbreeze JSON response for a search request. Follow these instructions to do so:
- Open the developer-tools dock in your browser (F12).
- Navigate to the Network tab.
- Perform your search.
- Expand the search request.
- View the response body and drill down into the search request response data to find the desired rank. For example, to see the rank of the first result, you would select result set > results > 0 > rank_score.
For more information on the data contained in the search response, see the Mindbreeze documentation on api.v2.search.
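If you find yourself checking many results this way, a small helper can pull the rank scores out of a copied response body. The sketch below assumes the parsed response follows the path described above and that the top-level key is spelled resultset. The listRankScores helper and the sample response are hypothetical, so adjust the property names to match what you see in your own response.

```javascript
// Assumption: the response follows the path described above
// (result set > results > N > rank_score), with the top-level key "resultset".
function listRankScores(response) {
  var results = (response.resultset && response.resultset.results) || [];
  return results.map(function (result, index) {
    return { position: index, rank_score: result.rank_score };
  });
}

// Hypothetical, heavily trimmed response body used only for illustration.
var sampleResponse = {
  resultset: {
    results: [{ rank_score: 0.91 }, { rank_score: 0.47 }]
  }
};

console.log(listRankScores(sampleResponse));
```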
Term2Document Boosting
Term2DocumentBoost is a Mindbreeze query transformation plugin which allows you to apply relevance tuning to specific search queries (or all queries) based on defined rules. It is the primary way to adjust relevancy within Mindbreeze. The plugin is configured either for a specific index or globally. If you configure an index-specific boosting file, those rules will be applied instead of (not in addition to) the global rules. If you’d like to apply both sets of rules, the global rules should be copied into the index-specific boosting file, as each index can only reference one file at a time.
Term2Document boosting rules will always be applied to searches against the indices for which they are configured, so they’re best used for static boosting as opposed to the more dynamic boosting options described in later sections. Term2Document boosting can be used to increase the relevance of certain documents for specific search queries. For example, a search for “news” can be tuned so that documents with the “post-type” metadata value “newsarticle” will have higher relevance. Term2Document boosting can also be used to generally increase the relevance of certain documents based on any combination of metadata-value pairs, such as all documents from a “Featured Results” data source or all documents from the “products” section of a website. Rules can use regular expressions to accommodate more complex patterns. In the second-to-last example below, we show how pages one level off the root of a website can be boosted using a regular expression.
The Term2Document Boost file uses five columns to apply boosting rules.
- Term: This is the search term you want to trigger the rule. Leave this blank to trigger the rule for all searches.
- Key: The name of the metadata field used to identify the content to be boosted. If you want to boost documents matching a pattern in the full text of the document contents, the “Key” column should contain the word “content”.
- Pattern: A term or pattern that the metadata value must match for the content to be boosted. This column supports regular expressions. Please note, any fields you wish to boost in this way should be added to the Aggregated Metadata Keys configuration for the respective indices in order to enable regex matching.
- Boost: The boost factor. Values less than one (e.g. 0.5) should be preceded by a zero (i.e. 0.5 not .5).
- Query: Optional column for advanced configuration. Instead of specifying a Term, Key, and Pattern, you can use this column to create more flexible boosting rules via the Mindbreeze query language. This is helpful when you want to change the boosting rule for each user’s query. For example, if someone searches for a person (e.g. “John Doe”), documents with this person as the Author (i.e. stored in the Author metadata) can be boosted. This is shown in the last example below.
Term2Document Boosting Examples
| Term | Key | Pattern | Boost | Query |
| --- | --- | --- | --- | --- |
| news | post-type | newsarticle | 5 | |
| | datasource/fqcategory | DataIntegration:FeaturedResults | 100 | |
| | key | .*\/products.* | 1.5 | |
| | key | ^http:\/\/[^\/]*\/[^\/]*$ | 2.5 | |
| | | | 2.0 | Author:{{query}} |
Additional information can be found in the Mindbreeze documentation on the Term2DocumentBoost Transformer Plugin.
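Before adding a regular-expression rule to the boosting file, it can help to try the pattern against a few sample URLs. The sketch below does this with the two patterns from the examples above, using JavaScript's RegExp. The example.com URLs are hypothetical, and the regex engine Mindbreeze uses may differ from JavaScript in some details, so treat this as a sanity check only.

```javascript
// Patterns copied from the Term2Document examples above.
var productsPattern = /.*\/products.*/;
var oneLevelPattern = /^http:\/\/[^\/]*\/[^\/]*$/;

// Hypothetical URLs used only to exercise the patterns.
var sampleUrls = [
  "http://www.example.com/about",
  "http://www.example.com/products/widget",
  "http://www.example.com/",
  "https://www.example.com/about"
];

sampleUrls.forEach(function (url) {
  console.log(
    url,
    "| products:", productsPattern.test(url),
    "| one level:", oneLevelPattern.test(url)
  );
});
```

Two details stand out when running this. The one-level pattern also matches the root page itself, and because it starts with http: it does not match https URLs. Either may or may not be what you want for your site.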
Category Descriptor Boosting
Mindbreeze uses an XML file called the CategoryDescriptor to control various aspects of the search experience for each data source category (e.g. Web, DataIntegration, Microsoft File, etc.). Each category plugin includes a default CategoryDescriptor which can be extended or modified to meet your needs.
You may modify the CategoryDescriptor if you wish to add localized display labels for metadata field names or alter the default metadata visible from the search results page. In this case, we’re focused on how you can use it to boost the overall impact of a metadata field on relevancy. This is common if you wish to change the impact of a term’s presence in certain fields over others. Common candidates for up-boosting include title, keywords, or summary. Candidates for down-boosting may include ID numbers, GUIDs, or other values which could yield confusing or irrelevant results in certain cases.
The default CategoryDescriptor is located in the respective plugin’s zip file. You can extract this file and modify it as needed.
Category Descriptor Example
The example below shows the modification of two metadatum entries. The first, for Keywords, boosts the importance of this field by a factor of 5.0. The second, for Topic, adds a boost factor of 2.0 and localized display labels for four languages.
<?xml version="1.0" encoding="UTF-8"?>
<category id="MyExample" supportsPublic="true">
  <name>My Example Category</name>
  <metadata>
    <metadatum aggregatable="true" boost="5.0" id="Keywords" selectable="true" visible="true">
    </metadatum>
    <metadatum aggregatable="true" boost="2.0" id="Topic" selectable="true" visible="true">
      <name xml:lang="de">Thema</name>
      <name xml:lang="en">Topic</name>
      <name xml:lang="fr">Sujet</name>
      <name xml:lang="es">Tema</name>
    </metadatum>
    <!-- … additional metadatum omitted for brevity … -->
  </metadata>
</category>
Applying a Custom Category Descriptor
The easiest way to apply a custom category descriptor is to download a copy of the respective plugin from the Mindbreeze updates page. For example, if you wanted to change relevancy for crawled content, you would download Mindbreeze Web Connector.zip. Unzip the file and look for the categoryDescriptor.xml file which is located in Mindbreeze Web Connector\Web Connector\Plugin\WebConnector-18.1.4.203.zip (version number may vary).
Please note, if you update a plugin on the Mindbreeze appliance, your custom CategoryDescriptor will be overwritten. Keep a copy saved in case you need to reapply it after updating. Additional information can be found in the Mindbreeze documentation on Customizing the Category Descriptor.
Inline Boosting
Boosting can be set at query time using the Mindbreeze query language. This can be either done directly in the query box or as part of the search application’s design by applying it to the constraint property. This functionality can be leveraged to create contextual search by dynamically boosting results based on any number of factors and custom business logic.
For example:
- ALL (company:fishbowl)^3.5 OR NOT (company:fishbowl)
Returns all results and ranks items with fishbowl in the company metadata field 3.5 times higher than other results.
- (InSpire^2.0 OR "InSite") AND "efficient"
Results must contain InSpire or InSite and efficient. Occurrences of InSpire are twice as relevant as other terms.
- holiday AND ((DocType:Policy)^5 OR (DocType:Memo))
Returns results which contain holiday and a DocType value of Policy or Memo. Ranks items with Policy as their DocType 5 times higher than those with the DocType of Memo.
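When the boost is applied by the search application rather than typed by the user, the query string is usually assembled in code. Below is a minimal sketch that builds the expression from the first example above. The boostAllByField helper is hypothetical, and it does not quote or escape values, so values containing spaces or special characters would need additional handling according to the Mindbreeze query language.

```javascript
// Hypothetical helper: builds the "boost matching results, keep everything else"
// expression shown in the first inline boosting example.
function boostAllByField(field, value, factor) {
  var term = "(" + field + ":" + value + ")";
  return "ALL " + term + "^" + factor + " OR NOT " + term;
}

console.log(boostAllByField("company", "fishbowl", 3.5));
// ALL (company:fishbowl)^3.5 OR NOT (company:fishbowl)
```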
JavaScript Boosting
JavaScript boosting is similar to inline boosting in that it is also set at query time. It serves many of the same use cases as inline boosting, but can provide a cleaner implementation for clients already working within the Mindbreeze client.js framework. The examples below show how to apply boosting to three different scenarios. Please note, any fields you wish to boost in this way should be added to the Aggregated Metadata Keys configuration for the respective indices in order to enable regex matching.
Examples
This example ranks items with fishbowl in the company metadata field 3.5 times higher than other results. For comparison, this rule would have the same effect on rankings as the inline boosting shown in the first example within the previous section.
var application = new Application({});
application.execute(function (application) {
  application.models.search.set("boostings.fishbowl", {
    factor: 3.5,
    query_expr: {
      label: "company", regex: "^fishbowl$"
    }
  }, { silent: true });
});
This example ranks results with a department value of accounting 1.5 times higher than other results. This can be modified to dynamically set the department to a user’s given department. For example, accounting may wish to see accounting documents boosted, whereas the engineering team would want engineering documents boosted. Please note, setting this dynamically requires access to the user’s department data, which is outside the scope of the Mindbreeze search API but is often accessible within the context of an intranet or other business application.
var application = new Application({});
application.execute(function (application) {
  application.models.search.set("boostings.accounting", {
    factor: 1.5,
    query_expr: {
      label: "department", regex: "^accounting$"
    }
  }, { silent: true });
});
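To set the department dynamically, the boosting object can be built from a variable instead of a hard-coded value. The sketch below shows one way to do that. The departmentBoosting and escapeRegex helpers and the userDepartment variable are hypothetical, and the escaping follows JavaScript regex rules, which we assume are close enough to what Mindbreeze expects for simple department names.

```javascript
// Escape characters that have special meaning in a regular expression,
// so a department such as "R&D (EMEA)" is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds a boosting object in the same shape
// as the examples above.
function departmentBoosting(department, factor) {
  return {
    factor: factor,
    query_expr: {
      label: "department",
      regex: "^" + escapeRegex(department) + "$"
    }
  };
}

console.log(departmentBoosting("accounting", 1.5));

// Inside application.execute(...), the object could then be applied as in
// the examples above (userDepartment is a placeholder for your own lookup):
// application.models.search.set("boostings.department",
//   departmentBoosting(userDepartment, 1.5), { silent: true });
```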
This example shows how dynamic, non-indexed information (in this case, the current time) can be used to alter relevancy. Results with a featuredmeals value of dinner are boosted by a factor of 3 when the local time is between 5 PM and 10 PM. This could be extended to boost breakfast, brunch, and lunch, for their respective date-time windows which would be helpful if users were searching for restaurants or other meal-time points of interest.
var application = new Application({});
application.execute(function (application) {
  var hour = new Date().getHours();
  // Boost dinner results from 5 PM up to, but not including, 10 PM local time.
  if (hour >= 17 && hour < 22) {
    application.models.search.set("boostings.dinner", {
      factor: 3.0,
      query_expr: {
        label: "featuredmeals", regex: ".*dinner.*"
      }
    }, { silent: true });
  }
});
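Extending this idea to other meals mostly comes down to mapping the current hour to a meal name. Here is a minimal sketch of that mapping. The time windows, the MEAL_WINDOWS table, and the mealForHour and mealBoosting helpers are all hypothetical and would need to be adjusted to your own business rules.

```javascript
// Hypothetical time windows (start inclusive, end exclusive, local time).
var MEAL_WINDOWS = [
  { meal: "breakfast", startHour: 6, endHour: 10 },
  { meal: "lunch", startHour: 11, endHour: 14 },
  { meal: "dinner", startHour: 17, endHour: 22 }
];

// Returns the meal whose window contains the given hour,
// or null when no window applies.
function mealForHour(hour) {
  for (var i = 0; i < MEAL_WINDOWS.length; i++) {
    var w = MEAL_WINDOWS[i];
    if (hour >= w.startHour && hour < w.endHour) {
      return w.meal;
    }
  }
  return null;
}

// Builds a boosting object in the same shape as the example above,
// or returns null when no meal should be boosted.
function mealBoosting(hour, factor) {
  var meal = mealForHour(hour);
  if (meal === null) {
    return null;
  }
  return {
    factor: factor,
    query_expr: { label: "featuredmeals", regex: ".*" + meal + ".*" }
  };
}

console.log(mealBoosting(new Date().getHours(), 3.0));
```

When mealBoosting returns an object, it could be passed to application.models.search.set in the same way as the dinner example. When it returns null, no boosting would be set.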
As you can see, Mindbreeze offers a variety of options for relevancy adjustments. If you have any questions about our experience working with Mindbreeze or would like to know more, please contact us.