I'm willing to bet that most organizations are still a bit behind on the content strategy curve at this point and aren't adequately tagging their content and images. So I decided to build out a demo-ware / proof of concept for auto-tagging images, which will soon grow into processing content text, possibly as an add-on for the community. We'll see.
It's worth noting that the results that come back are suspect at best. Machine-based tagging should be looked at much like we look at machine translation. It'll get close, but if you use it, there should be some human moderation in play. Whether that happens pre-publish or post-publish doesn't matter; just know that you're not going to get 100% accuracy. Expect some tags you'll want to remove, and expect to step in and add your own as well.
Still, I find that the tagger inserts values that are likely helpful as well as likely un-thought-of by some authors.
All of the code below goes into an Episerver IInitializableModule with the following includes (because I dislike when people leave these out of their instructions/samples...):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using BrightfindAlloyDemo.Models.Media;
using EPiServer;
using EPiServer.Core;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;
using EPiServer.ServiceLocation;
using Microsoft.ProjectOxford.Vision;
using Microsoft.ProjectOxford.Vision.Contract;
Microsoft's package for working with these services is pretty good. It makes quick calls out to the service very easy to implement. Setting up a client is a one-line task, which I put inside a Lazy initializer to prevent it from being initialized until needed. Probably not necessary, but I've gotten into that habit...
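// Assigned with "=" so a single Lazy instance (and a single client) is reused; "=>" would build a new one on every access.
private static readonly Lazy<VisionServiceClient> _visionService =
    new Lazy<VisionServiceClient>(() => new VisionServiceClient(Global.MicrosoftDemoValues.CognitiveServicesApiKey));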
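To show what the Lazy wrapper buys you, here's a minimal sketch. FakeVisionClient and LazyClientDemo are hypothetical stand-ins I made up so the sample runs without the Microsoft SDK or an API key; only Lazy<T> itself is the real .NET type. The point is that the factory lambda runs once, on the first read of .Value, and every later read returns the same instance.

```csharp
using System;

// Hypothetical stand-in for VisionServiceClient so the sketch runs without the SDK.
public sealed class FakeVisionClient
{
    // Counts constructor calls so we can see how often the client is built.
    public static int InstancesCreated;

    public FakeVisionClient(string apiKey)
    {
        InstancesCreated++;
    }
}

public static class LazyClientDemo
{
    // A field assigned once with "=". The factory lambda does not run
    // until .Value is read for the first time.
    private static readonly Lazy<FakeVisionClient> Client =
        new Lazy<FakeVisionClient>(() => new FakeVisionClient("placeholder-key"));

    // True only after the first call to Get().
    public static bool IsCreated
    {
        get { return Client.IsValueCreated; }
    }

    // Returns the single shared client, creating it on first use.
    public static FakeVisionClient Get()
    {
        return Client.Value;
    }
}
```

Calling LazyClientDemo.Get() twice hands back the same object, and FakeVisionClient.InstancesCreated stays at 1. If the member were declared with "=>" instead of "=", each access would build a fresh Lazy and a fresh client, which defeats the purpose.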
The method I wanted to use to auto-tag content, however, only existed in an Async form. I quickly found out that this didn't work well when used somewhat directly with Episerver's synchronous content event handlers.
Effectively, you can't add an async / await method as part of the handler. I started getting errors in the event log when I tried this. However, if I took those out, then the method would close out before the asynchronous request to Microsoft's services would return any data, which meant it did nothing but waste cycles.
Thanks to a little guidance from a fellow developer and a bit of Googling, I came upon something that works - using Tasks to run an Async method within one that isn't.
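Stripped of the Episerver and Microsoft specifics, the idea looks roughly like the sketch below. FetchTagsAsync is a hypothetical stand-in for the remote call (it just delays and returns a hard-coded string), so treat this as an illustration of the pattern rather than the real handler. Task.Run pushes the async work onto the thread pool, and Wait() blocks the synchronous caller until it finishes.

```csharp
using System;
using System.Threading.Tasks;

public static class SyncOverAsyncDemo
{
    // Hypothetical stand-in for the remote service call; the delay simulates latency.
    private static async Task<string> FetchTagsAsync()
    {
        await Task.Delay(200);
        return "dog,grass,outdoor";
    }

    // A synchronous method (like an event handler) that needs the async result
    // before it returns.
    public static string HandlerThatWaits()
    {
        string result = null;

        // Run the async method on the thread pool...
        var task = Task.Run(async () => { result = await FetchTagsAsync(); });

        // ...and block here until it has completed.
        task.Wait();

        return result;
    }
}
```

Without the Wait() call, HandlerThatWaits would return while the request was still in flight and result would still be null. Going through Task.Run, rather than blocking directly on the async method, also keeps the awaited continuation off the calling thread's synchronization context, which is a common source of deadlocks in ASP.NET.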
Without further ado, here's the code with comments to explain things.
public void Initialize(InitializationEngine context)
{
    var contentEvents = ServiceLocator.Current.GetInstance<IContentEvents>();
    contentEvents.SavingContent += AddImageAnalysis;
}

public void Uninitialize(InitializationEngine context)
{
    var contentEvents = ServiceLocator.Current.GetInstance<IContentEvents>();
    contentEvents.SavingContent -= AddImageAnalysis;
}

// This is the non-async method I want to add to the event handler so it doesn't go fubar on me.
private static void AddImageAnalysis(object content, ContentEventArgs e)
{
    // Exit if the content is not an image.
    var imageContent = e.Content as ImageFile;
    if (imageContent == null)
        return;

    // Exit if the image already has data in both fields (only want to add if empty, don't want to override user-entered data).
    // If either is empty, we want to call the MS service.
    if (!string.IsNullOrEmpty(imageContent.Tags) && !string.IsNullOrEmpty(imageContent.Description))
        return;

    // Task code to run an async method. Provides a token so we could cancel the task if we wanted to.
    // You also could establish a hard timeout in ms in this area.
    var tokenSource = new CancellationTokenSource();
    var token = tokenSource.Token;

    // Sets up the task to run via delegate.
    var t = Task.Run(async () => { await DoAnalysisAsync(imageContent); }, token);

    try
    {
        // Wait for the task to complete.
        t.Wait(token);
    }
    catch (AggregateException ex)
    {
        // TODO Log exceptions
    }
    finally
    {
        // Probably could use a "using" statement instead of try/catch/finally, but you'd still want the try/catch portion anyway... so little difference.
        tokenSource.Dispose();
    }
}

// The async method being run by the task above. This is where I call the MS service.
private static async Task DoAnalysisAsync(ImageFile image)
{
    // MS return type
    AnalysisResult analysis = null;

    // Passing the image as binary data - alternatively could pass a publicly accessible URL, but this is pre-publish so that won't work.
    using (var stream = image.BinaryData.OpenRead())
    {
        // Calling MS services; the second argument specifies the results in which I'm interested - not doing anything with categories yet.
        analysis = await _visionService.Value.AnalyzeImageAsync(stream,
            new List<VisualFeature>() { VisualFeature.Description, VisualFeature.Categories, VisualFeature.Tags });
    }

    // Return if we got nothing back.
    if (analysis == null)
        return;

    // Only add/update if the field is empty.
    if (string.IsNullOrEmpty(image.Tags))
    {
        // Only get the tags that are above a confidence threshold.
        var confidentTags = analysis.Tags?.Where(t => t.Confidence >= 0.25).Select(t => t.Name) ?? new string[] { };

        // Of note - this format is used by the Geta.Tags add-on, which will push these into the tags data store on publish :)
        image.Tags = string.Join(",", confidentTags);
    }

    if (string.IsNullOrEmpty(image.Description))
    {
        // Only get descriptions/captions above a confidence threshold (null-safe in case no description comes back).
        var confidentDescriptions = analysis.Description?.Captions?.Where(c => c.Confidence > 0.25).Select(c => c.Text) ?? new string[] { };

        // Usually just one, but in case there are multiple, set them up in sentence format for screen-readers to add appropriate pauses.
        image.Description = string.Join(". ", confidentDescriptions);
    }
}
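The comment in the handler mentions that a hard timeout could be set up around the wait. Here's one way that might look, as a sketch only: SlowAnalysisAsync and RunWithTimeout are hypothetical names, and the delay stands in for the call to the vision service. It relies on Task.Wait(int), which returns false when the timeout elapses before the task finishes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutDemo
{
    // Hypothetical stand-in for the remote analysis call; it honours cancellation.
    private static async Task<string> SlowAnalysisAsync(int delayMs, CancellationToken token)
    {
        await Task.Delay(delayMs, token);
        return "analysis complete";
    }

    // Runs the async work from synchronous code and gives up after timeoutMs.
    // Returns null when the work did not finish in time or threw.
    public static string RunWithTimeout(int delayMs, int timeoutMs)
    {
        using (var tokenSource = new CancellationTokenSource())
        {
            var task = Task.Run(
                async () => await SlowAnalysisAsync(delayMs, tokenSource.Token),
                tokenSource.Token);

            try
            {
                // Wait(int) returns true if the task completed within the timeout.
                if (task.Wait(timeoutMs))
                    return task.Result;

                // Timed out: ask the work to stop and carry on without a result.
                tokenSource.Cancel();
                return null;
            }
            catch (AggregateException)
            {
                // The work failed; a real handler would log this.
                return null;
            }
        }
    }
}
```

In the image handler, a timed-out call would simply leave the Tags and Description fields empty, so the save completes and an author can fill them in by hand.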
Sample image:
Sample output (using Geta Tags):
2 comments:
I see you had the same idea as me :)
The problem I ran into with awaiting the execution is that Episerver gives an error message that the upload failed, although it didn't. So I chose to not await the analysis.
I was getting the same error for all prior attempts that were even close to successful. In this case, though, it takes a couple of seconds longer for the upload to complete, but it doesn't show a failure. It does the tagging at the time of the upload, too, giving authors the option to change the values after uploading.