I'm willing to bet that most organizations are still a bit behind on the content strategy curve at this point and aren't adequately tagging their content and images. So I decided to build out a demo-ware / proof of concept for auto-tagging images, which will soon grow into processing content text, possibly as an add-on for the community. We'll see.
It's worth noting that the results that come back are suspect at best. Machine-based tagging should be looked at much like we look at machine-translation. It'll get close, but if you use it then there should be some human moderation put into play. Either pre-publish or post-publish doesn't matter, just know that you're not going to get 100% accuracy. You should expect some tags you'll want to remove and that you'll want to step in and add your own as well.
Still, I find that the tagger inserts values that are likely helpful as well as likely un-thought-of by some authors.
All of the code below goes into an Episerver IInitializableModule with the following includes (because I dislike when people leave these out of their instructions/samples...):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using BrightfindAlloyDemo.Models.Media;
using EPiServer;
using EPiServer.Core;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;
using EPiServer.ServiceLocation;
using Microsoft.ProjectOxford.Vision;
using Microsoft.ProjectOxford.Vision.Contract;
Microsoft's package for working with these services is pretty good, and it's very easy to set up for making quick calls out. Setting up a client is a one-line task, which I put inside a Lazy initializer to prevent it from being initialized until needed. Probably not necessary, but I've gotten into that habit...
private static readonly Lazy<VisionServiceClient> _visionService = new Lazy<VisionServiceClient>(() => new VisionServiceClient(Global.MicrosoftDemoValues.CognitiveServicesApiKey));
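For anyone unfamiliar with Lazy<T>, here is a minimal sketch of the deferred-initialization behavior. ExpensiveClient and LazyDemo are hypothetical stand-ins (not part of the Microsoft package), and the sketch assumes the Lazy<T> is held in a static field so the same wrapper is reused on every access.

```csharp
using System;

// Hypothetical stand-in for a client that is costly to construct.
public sealed class ExpensiveClient
{
    // Counts constructor calls so the deferred creation can be observed.
    public static int InstancesCreated;

    public ExpensiveClient()
    {
        InstancesCreated++;
    }
}

public static class LazyDemo
{
    // Held in a field so the same Lazy<T> instance is reused on every access.
    // The factory delegate does not run until Value is first read.
    private static readonly Lazy<ExpensiveClient> Client =
        new Lazy<ExpensiveClient>(() => new ExpensiveClient());

    // True once the factory has run.
    public static bool IsCreated => Client.IsValueCreated;

    // First call constructs the client; later calls return the same instance.
    public static ExpensiveClient GetClient() => Client.Value;
}
```

Reading LazyDemo.IsCreated before the first GetClient() call reports that nothing has been constructed yet, and repeated GetClient() calls hand back the same object.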
The method I wanted to use to auto-tag content, however, only existed in an Async form. I quickly found out that this didn't work well when used somewhat directly with Episerver's synchronous content event handlers.
Effectively, you can't add an async / await method as part of the handler. I started getting errors in the event log when I tried this. However, if I took those out, then the method would close out before the asynchronous request to Microsoft's services would return any data, which meant it did nothing but waste cycles.
Thanks to a little guidance from a fellow developer and a bit of Googling, I came upon something that works - using Tasks to run an Async method within one that isn't.
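To make the difference concrete, here is a minimal sketch outside of Episerver. SlowCallAsync is a hypothetical stand-in for the remote service call, and the two handler methods are illustrative names only. The first handler starts the work and returns immediately; the second blocks until the work has finished.

```csharp
using System;
using System.Threading.Tasks;

public static class SyncOverAsyncDemo
{
    // Set by the simulated remote call once it completes.
    public static string Result;

    // Hypothetical stand-in for a slow call to an external service.
    private static async Task SlowCallAsync()
    {
        await Task.Delay(500);
        Result = "tagged";
    }

    // Starts the async work but does not wait for it, so the caller
    // regains control before Result has been set.
    // The task is returned only so this demo can observe it later.
    public static Task HandlerWithoutWaiting()
    {
        Result = null;
        return SlowCallAsync();
    }

    // Runs the async work on the thread pool and blocks until it is done,
    // so Result is populated by the time the handler returns.
    public static void HandlerThatWaits()
    {
        Result = null;
        Task.Run(() => SlowCallAsync()).Wait();
    }
}
```

Blocking like this holds the calling thread for the duration of the remote call, which is the trade-off for getting the data back inside a synchronous handler.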
Without further ado, here's the code with comments to explain things.
public void Initialize(InitializationEngine context)
{
    var contentEvents = ServiceLocator.Current.GetInstance<IContentEvents>();
    contentEvents.SavingContent += AddImageAnalysis;
}

public void Uninitialize(InitializationEngine context)
{
    var contentEvents = ServiceLocator.Current.GetInstance<IContentEvents>();
    contentEvents.SavingContent -= AddImageAnalysis;
}
// This is the non-async method I want to add to the event handler so it doesn't go fubar on me.
private static void AddImageAnalysis(object content, ContentEventArgs e)
{
    // Exit if the content is not an image.
    var imageContent = e.Content as ImageFile;
    if (imageContent == null) return;

    // Exit if the image already has data in both fields (only want to add if empty, don't want to override user-entered data).
    // If either is empty, we want to call the MS service.
    if (!string.IsNullOrEmpty(imageContent.Tags) && !string.IsNullOrEmpty(imageContent.Description)) return;

    // Task code to run an async method. Provides a token so we could cancel the task if we wanted to. You also could establish a hard timeout in ms in this area.
    var tokenSource = new CancellationTokenSource();
    var token = tokenSource.Token;

    // sets up the task to run via delegate
    var t = Task.Run(async () =>
    {
        await DoAnalysisAsync(imageContent);
    }, token);

    try
    {
        // wait for the task to complete
        t.Wait(token);
    }
    catch (AggregateException)
    {
        // TODO Log exceptions
    }
    finally
    {
        // probably could use a "using" statement instead of try/catch/finally, but you'd still want the try/catch portion anyway... so little difference.
        tokenSource.Dispose();
    }
}
// The async method being run by the task above. This is where I call the MS service.
private static async Task DoAnalysisAsync(ImageFile image)
{
    // MS return type
    AnalysisResult analysis = null;

    // passing the image as binary data - alternatively could pass a publicly accessible URL, but this is pre-publish so that won't work.
    using (var stream = image.BinaryData.OpenRead())
    {
        // calling MS services, second argument specifies the results in which I'm interested - not doing anything with categories yet
        analysis = await _visionService.Value.AnalyzeImageAsync(stream, new List<VisualFeature>() { VisualFeature.Description, VisualFeature.Categories, VisualFeature.Tags });
    }

    // return if we got nothing back
    if (analysis == null) return;

    // only add/update if the field is empty
    if (string.IsNullOrEmpty(image.Tags))
    {
        // only get the tags that are above a confidence threshold
        var confidentTags = analysis.Tags?.Where(t => t.Confidence >= 0.25).Select(t => t.Name) ?? new string[] { };
        // of note - this format is used by Geta.Tags add-on, which will push these into the tags data store on publish :)
        image.Tags = string.Join(",", confidentTags);
    }

    if (string.IsNullOrEmpty(image.Description))
    {
        // only get descriptions/captions above a confidence threshold (guarding against a missing description in the response)
        var confidentDescriptions = analysis.Description?.Captions?.Where(c => c.Confidence > 0.25).Select(c => c.Text) ?? Enumerable.Empty<string>();
        // usually just one, but in case there are multiple, set them up in sentence format for screen-readers to add appropriate pauses
        image.Description = string.Join(". ", confidentDescriptions);
    }
}
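The comments in the handler mention that a hard timeout could be established around the wait. Here is one way that might look, as a rough sketch rather than what the code above does. RunWithTimeout and TimeoutDemo are hypothetical names, and the sketch assumes the async work accepts a CancellationToken; work that ignores the token keeps running in the background after the timeout.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutDemo
{
    // Runs the async work on the thread pool and blocks for at most `timeout`.
    // Returns true if the work finished in time; false if it timed out,
    // was cancelled, or threw.
    public static bool RunWithTimeout(Func<CancellationToken, Task> work, TimeSpan timeout)
    {
        using (var tokenSource = new CancellationTokenSource())
        {
            // Capture the token up front so the delegate never touches
            // the token source after it has been disposed.
            var token = tokenSource.Token;
            var task = Task.Run(() => work(token), token);

            try
            {
                // Wait(TimeSpan) returns false when the timeout elapses first.
                if (task.Wait(timeout)) return true;

                // Ask the work to stop; it only stops if it observes the token.
                tokenSource.Cancel();
                return false;
            }
            catch (AggregateException)
            {
                // The work faulted or was cancelled; a real handler would log here.
                return false;
            }
        }
    }
}
```

In the handler, the call might look like RunWithTimeout(ct => DoAnalysisAsync(imageContent), TimeSpan.FromSeconds(10)), where the ten seconds is an arbitrary illustrative value. Since DoAnalysisAsync as written does not take a token, a timed-out analysis could still finish later and write to the image object, so that case would need handling.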
Sample image:
Sample output (using Geta Tags):


2 comments:
I see you had the same idea as me :)
The problem I ran into with awaiting the execution is that Episerver gives an error message that the upload failed, although it didn't. So I chose to not await the analysis.
I was getting the same error for all prior attempts that were even close to successful. In this case, though, it takes a couple of seconds longer for the upload to complete, but it doesn't show a failure. It does the tagging at the time of the upload, too, giving authors the option to change the values after uploading.