January 3, 2018

Hijacking Episerver Find's Unified Search Text

I'm going to build on @kennygutierrez's recent quick-tip post about limiting what gets indexed in your developer Find index with a Find tip of my own.

In the past few weeks, I've had a need to inject my own logic into how Find aggregates text for its UnifiedSearch. I won't get into the specifics of what I was doing (it would raise some questions as well as some eyebrows, to be honest), but I'd like to share the solution I found in the Episerver forums (sorry that I didn't save which post and therefore can't credit the poster, though I certainly owe them a drink).

There are only a couple of short code snippets here, because the solution is elegantly simple yet quite powerful.

Using this approach, you effectively replace the code Episerver includes to aggregate all string or other indexable fields on a content object with your own logic. This means you can set up your own rules for when to index content and when not to, on the fly, as it's being indexed.

I'm not talking about including or excluding specific properties from the index at a global level - there are some fine attributes and other solutions for doing that. Instead, this can be used to make those include/exclude decisions dynamically, case by case, rather than always including or always excluding.

In addition, you could use these snippets to modify the search text before it's put into the index - adding data, omitting or replacing text, or whatever you want. If you can code it into a string, you can apply it here.

For what I was doing, I needed custom logic around indexing the items within a ContentArea. They can be indexed by default, at a controllable recursion depth, but I needed more than the standard functionality.

You can also apply your logic to specific content types only, though in my case it was against PageData and all inheriting types.

SearchClient.Instance.Conventions.ForInstancesOf<PageData>()
    .ExcludeField(x => x.SearchText())
    .IncludeField(x => x.SearchText(true));

The trick, seen above, is to first exclude the default behavior. SearchText() is an IContentData extension method in the EPiServer.Find.Cms namespace and is included as a "Field" in the index. It's this field that the UnifiedSearch queries against.

Once excluded, the next step is to include your own extension method of the same name - but with a different signature so it can be differentiated from the default method. Remember, it's an extension method against IContentData, so your first parameter should be this IContentData contentData (or an inheriting type - mine is PageData because that's the type for which I'm overriding). You don't have to use contentData as the first parameter name, but you get the idea.
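For context, here's a rough sketch of where that registration could live and how it might be narrowed to a single content type. Treat it as an illustration under assumptions, not as my actual code: ArticlePage, ArticleIndexing, SearchConventionsInitialization and the MySite.Search namespace are all made up, the extension is just a placeholder, and the using directives are the namespaces I'd expect these types to live in, so check them against your installed Find version.

```csharp
using EPiServer.Core;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;
using EPiServer.Find.ClientConventions;
using EPiServer.Find.Cms;
using EPiServer.Find.Framework;

namespace MySite.Search // hypothetical namespace
{
    // Hypothetical page type, used only for illustration.
    public class ArticlePage : PageData { }

    public static class ArticleIndexing
    {
        // Placeholder overload; the bool only distinguishes it from Find's SearchText().
        public static string SearchText(this ArticlePage page, bool extended)
        {
            return page.SearchText(); // defer to Find's default text for now
        }
    }

    [InitializableModule]
    [ModuleDependency(typeof(EPiServer.Web.InitializationModule))]
    public class SearchConventionsInitialization : IInitializableModule
    {
        public void Initialize(InitializationEngine context)
        {
            // Only ArticlePage (and types inheriting from it) get the custom search text.
            SearchClient.Instance.Conventions.ForInstancesOf<ArticlePage>()
                .ExcludeField(x => x.SearchText())
                .IncludeField(x => x.SearchText(true));
        }

        public void Uninitialize(InitializationEngine context) { }
    }
}
```

Registering against a narrower type than PageData leaves every other content type on Find's default SearchText() behavior.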

Then you implement your extension. Here's (sort of) what mine looks like:


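using EPiServer.Core;
using EPiServer.Find.Cms;

public static class MyIndexingCode
{
    // "extended" isn't used for anything except to make sure the signature
    // is unique from Find's own SearchText() extension.
    public static string SearchText(this PageData page, bool extended)
    {
        if (ShouldReplaceSearchText(page)) // placeholder for your "completely replace" rule
        {
            return CompileMyOwnSearchText(page);
        }

        if (ShouldAppendCustomData(page)) // placeholder for your "add custom data" rule
        {
            // page.SearchText() with no extra argument is the Find method
            return page.SearchText() + CompileAdditionalData();
        }

        // fall back to default behavior
        return page.SearchText();
    }

    // Placeholder helpers so the sketch compiles; swap in your own rules and text building.
    private static bool ShouldReplaceSearchText(PageData page) => false;
    private static bool ShouldAppendCustomData(PageData page) => false;
    private static string CompileMyOwnSearchText(PageData page) => string.Empty;
    private static string CompileAdditionalData() => string.Empty;
}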
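
If you want to convince yourself that the fallback call doesn't loop back into your own method, here's a small stand-alone sketch that needs no Episerver references at all. Everything in it is a stand-in I made up for the demo (FakePage, VendorExtensions, MyExtensions and the Vendor/Site namespaces); it only shows how C# picks between two extension methods of the same name based on the arguments passed.

```csharp
using System;

namespace Vendor
{
    // Stand-in for a content type; not an Episerver class.
    public class FakePage
    {
        public string Body { get; set; } = "";
        public bool UseCustomText { get; set; }
    }

    // Stand-in for the vendor's one-argument extension method.
    public static class VendorExtensions
    {
        public static string SearchText(this FakePage page) => page.Body;
    }
}

namespace Site
{
    using Vendor;

    public static class MyExtensions
    {
        // The unused bool only makes this signature distinct from the vendor's.
        public static string SearchText(this FakePage page, bool extended)
        {
            if (page.UseCustomText)
            {
                return "custom: " + page.Body;
            }

            // No bool argument, so this binds to VendorExtensions.SearchText,
            // not back to this method.
            return page.SearchText();
        }
    }

    public static class Program
    {
        public static void Main()
        {
            var plain = new FakePage { Body = "hello" };
            var special = new FakePage { Body = "hello", UseCustomText = true };

            Console.WriteLine(plain.SearchText(true));   // hello
            Console.WriteLine(special.SearchText(true)); // custom: hello
        }
    }
}
```

The two-argument overload is never a candidate for a one-argument call, so the fallback always lands on the original method.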

Have another or better solution? Spot something I did wrong? Feel free to post in the comments!

Happy coding

2 comments:

Luc Gosso said...

Hi James,

Just put your extension with the same name (SearchText) in the same namespace as your initialization code and it will use that one. Then there's no need for any exclude, nor for a new SearchText(bool) overload.

Example:

public static string SearchText(this IContentData contentData)
{
    return "hacked: " + EPiServer.Find.Cms.ContentExtensions.SearchText(contentData);
}

Works in Sweden

eGandalf said...

Thanks Luc. I wonder how that would impact some of the rest of my code, though. I'm doing checks inside my own extension and, if they fail, falling back to the default. If I don't differentiate my own, I suspect that would throw it into an infinite loop.