All of these URLs refer to the same homepage of your website, but each one is slightly different. This is a problem for most enterprise search engines, and for Mindbreeze in particular, because the URL is used as the unique key that identifies a piece of content. As a result, the same page shows up multiple times in the search results.
To combat this problem, Mindbreeze allows you to override what is used as the unique key for a specific index. The best solution is to declare canonical URLs within your website and to tell Mindbreeze to use these canonical URLs as the unique key.
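To make the effect of the key choice concrete, here is a minimal Python sketch. It is not Mindbreeze code: the example URLs, the `build_index` helper, and the dictionary standing in for an index are all assumptions made for illustration. It only shows why keying on the fetched URL produces duplicates while keying on the canonical URL does not.

```python
# Hypothetical crawl results: (fetched URL, canonical URL declared by the page).
# All URLs below are made-up examples for a single homepage.
CANONICAL = "https://www.example.com/"
crawled = [
    ("http://example.com", CANONICAL),
    ("https://example.com/", CANONICAL),
    ("https://www.example.com/index.html", CANONICAL),
    ("https://www.example.com/?utm_source=newsletter", CANONICAL),
]


def build_index(pages, key_of):
    """Store each page under the key returned by key_of (a stand-in for an index)."""
    index = {}
    for fetched_url, canonical_url in pages:
        index[key_of(fetched_url, canonical_url)] = fetched_url
    return index


# Keyed on the fetched URL: every variant becomes its own entry.
by_url = build_index(crawled, lambda fetched, canonical: fetched)

# Keyed on the canonical URL: all variants collapse into one entry.
by_canonical = build_index(crawled, lambda fetched, canonical: canonical)

print(len(by_url))        # 4
print(len(by_canonical))  # 1
```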
To do this, Mindbreeze provides a feature it calls Extract Metadata. Using XPath expressions, you can tell the Mindbreeze crawler where to find structured information within your pages.
Mindbreeze uses “header/mes:key” as the unique identifier for an item within the Mindbreeze index, so we simply need to overwrite that metadata field with the canonical URL from our page.
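Before touching the crawler, it can help to check that an XPath expression really finds the canonical URL in your pages. The Python sketch below is only an illustration under stated assumptions: the sample page, the `canonical_url` and `item_key` helper names, and the fallback to the fetched URL are hypothetical, and none of this is Mindbreeze syntax. It uses the standard library's ElementTree, which supports only a subset of XPath and needs well-formed markup, so the `href` attribute is read with `.get()` rather than with a full expression such as `//link[@rel='canonical']/@href`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page with a canonical link in the head.
PAGE = """<html>
  <head>
    <title>Home</title>
    <link rel="stylesheet" href="/site.css" />
    <link rel="canonical" href="https://www.example.com/" />
  </head>
  <body><p>Welcome</p></body>
</html>"""


def canonical_url(xhtml):
    """Return the href of the first <link rel="canonical">, or None if absent."""
    root = ET.fromstring(xhtml)
    link = root.find(".//link[@rel='canonical']")
    return link.get("href") if link is not None else None


def item_key(fetched_url, xhtml):
    """Prefer the canonical URL as the key; fall back to the fetched URL."""
    return canonical_url(xhtml) or fetched_url


print(item_key("http://example.com/index.html", PAGE))
```

If a page declares no canonical link, `item_key` in this sketch simply keeps the fetched URL, so such pages are still indexed under a usable key.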
The following configuration needs to be added to the crawler configuration in which canonical URLs should be used: