In Mindbreeze, the Semantic Pipeline is the processing a document goes through between being picked up by a crawler and being indexed. During this process, content and metadata can be extracted and manipulated to make the enterprise search solution more intelligent. The Semantic Pipeline consists of the following steps.

Filter Service

The Filter Service extracts the content from a document and determines both what is indexed and what is used for the content preview. Different filters can be selected based on content type. Two examples are creating PDF previews for Microsoft Office documents and extracting and indexing the content inside zip files.
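To make the idea of selecting a filter by content type concrete, here is a minimal sketch in Python. It is not the Mindbreeze Filter Service or its API; the names `FILTERS`, `run_filter`, `extract_plain_text`, and `extract_zip` are hypothetical, and the sketch assumes every entry inside a zip file is plain text.

```python
import io
import zipfile


def extract_plain_text(data: bytes) -> str:
    """Decode raw bytes as UTF-8 text, replacing undecodable bytes."""
    return data.decode("utf-8", errors="replace")


def extract_zip(data: bytes) -> str:
    """Extract and join the text of every entry in a zip archive.

    Assumption for this sketch: every entry is a plain-text file.
    """
    parts = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in sorted(archive.namelist()):
            parts.append(extract_plain_text(archive.read(name)))
    return "\n".join(parts)


# Hypothetical registry: content type -> extraction function.
FILTERS = {
    "text/plain": extract_plain_text,
    "application/zip": extract_zip,
}


def run_filter(data: bytes, content_type: str) -> str:
    """Pick the filter registered for the content type and run it."""
    if content_type not in FILTERS:
        raise ValueError(f"no filter configured for {content_type}")
    return FILTERS[content_type](data)
```

The registry keeps the choice of filter separate from the extraction logic, which mirrors the idea that filters are chosen per content type.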

Post-Filter Transformation

After the content has been extracted by the Filter Service, the Post-Filter Transformation step can be used to manipulate the content with custom code. Mindbreeze offers a Java SDK to build these plugins.
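As a rough illustration of the kind of manipulation a Post-Filter Transformation might perform, the sketch below cleans up extracted text. It is written in Python for brevity and does not use the Mindbreeze Java SDK; the function name `post_filter_transform` and the cleanup rules (dropping "Page N of M" lines, collapsing whitespace) are assumptions chosen for the example.

```python
import re

# Hypothetical rule: lines that consist only of a page marker are noise.
PAGE_MARKER = re.compile(r"\s*Page \d+ of \d+\s*")


def post_filter_transform(content: str) -> str:
    """Clean up extracted text before it moves further down the pipeline."""
    # Drop lines that are nothing but a page marker.
    lines = [
        line for line in content.splitlines()
        if not PAGE_MARKER.fullmatch(line)
    ]
    text = "\n".join(lines)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

A real plugin would be built against the SDK's own interfaces; only the shape of the step (extracted content in, modified content out) is taken from the description above.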

Precomputed Synthesized Metadata

Precomputed Synthesized Metadata lets us create new fields or edit existing ones using functions from the Mindbreeze Property Expression Language. The point in the pipeline at which this metadata is created or edited can be configured with the “Transformation Pipeline Slot”.
To see an example of Precomputed Synthesized Metadata, please see our previous blog post on the topic.
For full details of the expression language, please see the Mindbreeze Property Expression Documentation.
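The concept of deriving a new field from existing ones can be sketched as follows. This is plain Python, not the Property Expression Language, whose syntax is covered in the Mindbreeze documentation; the field names `path`, `extension`, and `title` and the function `synthesize_metadata` are hypothetical.

```python
from pathlib import PurePosixPath


def synthesize_metadata(metadata: dict) -> dict:
    """Return a copy of the metadata with derived fields added.

    Assumed fields for this sketch: 'path' (input), 'extension' and
    'title' (derived).
    """
    result = dict(metadata)
    path = PurePosixPath(metadata.get("path", ""))
    # Create a new field from an existing one.
    result["extension"] = path.suffix.lstrip(".").lower()
    # Fill in a missing field, but leave an existing value alone.
    if not result.get("title"):
        result["title"] = path.stem
    return result
```

For example, a document with only `path` set to `/docs/Report.PDF` would gain an `extension` of `pdf` and a `title` of `Report`.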

Entity Recognition

Entity recognition is used to derive metadata from content or other metadata based on rules and patterns. It is applied at the index level.
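A simple way to picture rule-based entity recognition is a set of patterns, each of which fills a metadata field with whatever it matches in the content. The sketch below uses Python regular expressions; the rules, the field names `part_number` and `email`, and the `PN-` part number format are assumptions made up for the example, not Mindbreeze configuration.

```python
import re

# Hypothetical rules: metadata field name -> pattern to look for.
ENTITY_RULES = {
    "part_number": re.compile(r"\bPN-\d{5}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def recognize_entities(content: str) -> dict:
    """Derive metadata fields from content using the rules above."""
    found = {}
    for field, pattern in ENTITY_RULES.items():
        # Deduplicate and sort so the result is stable.
        matches = sorted(set(pattern.findall(content)))
        if matches:
            found[field] = matches
    return found
```

Fields appear in the result only when a rule matches, so documents without any recognized entities are left without the extra metadata.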

CSV Transformation

CSV Transformation is another way to manipulate metadata: values are mapped to new values according to a lookup table stored in a CSV file.
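The mapping idea can be sketched with Python's standard `csv` module. The column names (`code`, `department`), the metadata field names, and the helper functions `load_mapping` and `apply_csv_mapping` are hypothetical; the actual CSV layout Mindbreeze expects is defined by its own configuration.

```python
import csv
import io


def load_mapping(csv_text: str, key_column: str, value_column: str) -> dict:
    """Build a lookup table from two columns of a CSV with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_column]: row[value_column] for row in reader}


def apply_csv_mapping(metadata: dict, mapping: dict,
                      source_field: str, target_field: str) -> dict:
    """Write the mapped value of source_field into target_field.

    Metadata is returned unchanged when the source value has no mapping.
    """
    result = dict(metadata)
    value = metadata.get(source_field)
    if value in mapping:
        result[target_field] = mapping[value]
    return result


# Example usage with a small in-memory CSV.
CSV_TEXT = "code,department\nENG,Engineering\nFIN,Finance\n"
mapping = load_mapping(CSV_TEXT, "code", "department")
enriched = apply_csv_mapping({"dept_code": "ENG"}, mapping,
                             "dept_code", "department")
```

Here a terse department code stored on the document is translated into a readable name that is easier to display or filter on in search results.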

Item Transformation

The final way to create metadata is with Item Transformation Plugins. Item Transformations are similar to Post-Filter Transformations, but can be applied per index and after all other manipulation has been performed. Mindbreeze offers a Java SDK to build these plugins.


How Mindbreeze Ingests Content with the Semantic Pipeline