April 13, 2017

200 Lines or Less: Combining Twilio, AWS Lambda, and MS Cognitive Services into an SMS Image Analyzer

(Cool story. Show me the code!)

I've been doing a bit in the way of personal projects, and a major source of inspiration for them is my Dad, who has been slowly but surely losing his eyesight for the past decade or so. I really want to help him navigate a world that is, by nature, intensely visual as his affliction progresses. Through hours... and hours... of searching Google, I've found very few resources that are practical enough for everyday use.

That's not to say that there's nothing out there. Apple has done a fairly great job of making iOS accessible for him: he uses zoom and has text read aloud to him all the time on his phone. He snaps pictures with his phone camera of things he wants to zoom in on and read. Not to mention Siri, and the help she (it?) has provided in keeping him in touch with family and friends. Siri also helps him find his way home, which is pretty critical for a man who walks nearly everywhere and can't read road signs. If he takes a wrong turn, he can simply ask "Where am I?" to regain his bearings.

I've introduced my Dad, an avid reader all his life, to the wonder of audiobooks. He now has multiple Alexa-enabled devices around his condo that he uses to control lights, set timers, manage lists, and get tide schedules (he lives by the beach).

So it's not as if there's nothing for him to use, but a lot of it isn't as good as it could be for him. And some of what I've found online, such as desktop magnifiers, is absurdly priced as medical equipment rather than as the convenient household electronics it could or should be. The cheap ones I've seen hover around the $1,500 mark. For a camera and a screen. I'm considering building one out of an old monitor and a Raspberry Pi with a camera, maybe even a Pi Zero. I figure it'll set me back about $200.

The point is that the resources are few, expensive, or only moderately effective.

What's a developer to do? Why, build something of course!

So I spent about three days researching and building a service that takes the "take a picture and zoom in" process a step further. At least, that's my hope, and it's what he's testing out right now.

It's an SMS/MMS-based service. You send it a message with an image; it analyzes the image, runs a bit of OCR (optical character recognition) on it, and sends back a text or two: the first with a brief description of the image, the second with any text found in the photograph.
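Here is a minimal sketch of how those one or two replies might be assembled from the Computer Vision results. It assumes the JSON shapes of the v1.0 Describe and OCR responses (captions under `description.captions`, recognized words under `regions[].lines[].words[]`); the function names and reply wording are made up for illustration.

```javascript
// Join the words of an OCR response into lines of plain text.
// Assumed shape: { regions: [{ lines: [{ words: [{ text }] }] }] }
function ocrToText(ocrResult) {
  const lines = [];
  for (const region of (ocrResult && ocrResult.regions) || []) {
    for (const line of region.lines || []) {
      const words = (line.words || []).map((w) => w.text);
      if (words.length > 0) lines.push(words.join(' '));
    }
  }
  return lines.join('\n');
}

// Return the highest-confidence caption from a Describe response, or null.
// Assumed shape: { description: { captions: [{ text, confidence }] } }
function topCaption(describeResult) {
  const captions =
    (describeResult && describeResult.description && describeResult.description.captions) || [];
  if (captions.length === 0) return null;
  const best = captions.reduce((a, b) => (b.confidence > a.confidence ? b : a));
  return best.text;
}

// Build the SMS replies: always a description, plus the OCR text only if any was found.
function buildReplies(describeResult, ocrResult) {
  const caption = topCaption(describeResult);
  const replies = [caption ? `I think this is ${caption}.` : "I couldn't describe this image."];
  const text = ocrToText(ocrResult);
  if (text) replies.push(`Text found:\n${text}`);
  return replies;
}
```

Keeping the OCR text in its own message means a photo with no text in it produces just one reply.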

To build this, I used Twilio to handle SMS/MMS and pass each received message to a webhook. The webhook is written in Node.js and hosted as an AWS Lambda function, with an AWS API Gateway endpoint in front of it that passes requests through to the function. All image processing is handled by Microsoft Cognitive Services.
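A rough sketch of the Lambda side, assuming API Gateway's Lambda proxy integration, so Twilio's form-encoded POST arrives as a string in `event.body`. Twilio includes `NumMedia` and `MediaUrl0` for an incoming MMS, and the reply goes back as TwiML with one `<Message>` element per text. `createHandler` and the injected `analyzeImage` function are hypothetical names; `analyzeImage` stands in for the Cognitive Services calls.

```javascript
// Escape text so it is safe inside TwiML (XML).
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Wrap reply strings in a TwiML <Response>, one <Message> (one SMS) each.
function toTwiml(messages) {
  const inner = messages.map((m) => `<Message>${escapeXml(m)}</Message>`).join('');
  return `<?xml version="1.0" encoding="UTF-8"?><Response>${inner}</Response>`;
}

// analyzeImage (hypothetical): async (imageUrl) => array of reply strings.
function createHandler(analyzeImage) {
  return async (event) => {
    // Proxy integration may base64-encode the body if binary media types are configured.
    const rawBody = event.isBase64Encoded
      ? Buffer.from(event.body || '', 'base64').toString('utf8')
      : event.body || '';
    // Twilio posts application/x-www-form-urlencoded parameters.
    const params = Object.fromEntries(new URLSearchParams(rawBody));
    const numMedia = parseInt(params.NumMedia || '0', 10);

    const replies =
      numMedia > 0 && params.MediaUrl0
        ? await analyzeImage(params.MediaUrl0)
        : ['Send me a photo and I will describe it and read any text in it.'];

    return {
      statusCode: 200,
      headers: { 'Content-Type': 'text/xml' },
      body: toTwiml(replies),
    };
  };
}

// In Lambda, something like: exports.handler = createHandler(yourAnalyzeImage);
```

Injecting `analyzeImage` keeps the Twilio plumbing testable without making any network calls.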

As near as I can tell, the only charges I'll incur are for the messages themselves, ranging from 1 cent for an MMS message to a fraction of that for each SMS reply.

AWS Lambda has a free tier that, unlike many AWS free-tier offers, doesn't expire after the first 12 months. To max out that tier, I'd have to be processing some 400,000 requests per month, and my Twilio costs would skyrocket long before I came close to that.
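For reference, the always-free Lambda tier covers 1 million requests and 400,000 GB-seconds of compute per month, so the practical ceiling depends on the function's memory size and run time. The sketch below estimates it; the memory and duration inputs are assumptions about the function's configuration, and it ignores how Lambda rounds billed duration. With 1 GB of memory and about one second per call, the compute allowance runs out at roughly 400,000 calls, which is one way to arrive at the figure above.

```javascript
// AWS Lambda always-free tier, per month.
const FREE_REQUESTS = 1000000;
const FREE_GB_SECONDS = 400000;

// Estimate how many invocations per month stay within both free-tier limits.
// memoryMb and avgDurationMs are assumed values for the function's configuration.
function freeInvocationsPerMonth(memoryMb, avgDurationMs) {
  const gbSecondsPerCall = (memoryMb / 1024) * (avgDurationMs / 1000);
  const computeBound = Math.floor(FREE_GB_SECONDS / gbSecondsPerCall);
  return Math.min(FREE_REQUESTS, computeBound);
}
```

For example, `freeInvocationsPerMonth(1024, 1000)` returns 400000, while `freeInvocationsPerMonth(128, 1000)` hits the 1,000,000-request cap first.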

Microsoft Cognitive Services, specifically the Computer Vision API, allows 5,000 free requests per month. I'm making two requests per image, so that works out to 2,500 picture messages I can handle per month for free. If my dad is sending that many... he's got some sort of addiction.
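The two calls per image might look something like this: one to the Describe endpoint and one to the OCR endpoint, each POSTing the image URL as JSON with the subscription key in the `Ocp-Apim-Subscription-Key` header. The `westus` region and the v1.0 paths are assumptions based on the 2017-era API (newer versions use different paths), and `buildVisionRequests` is a hypothetical helper that only builds the requests rather than sending them.

```javascript
// Assumption: the region your Computer Vision key was issued in.
const DEFAULT_REGION = 'westus';
// Two Computer Vision calls per image: Describe and OCR.
const CALLS_PER_IMAGE = 2;

// Build (but do not send) the Describe and OCR requests for one image URL.
function buildVisionRequests(imageUrl, subscriptionKey, region = DEFAULT_REGION) {
  const base = `https://${region}.api.cognitive.microsoft.com/vision/v1.0`;
  const headers = {
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': subscriptionKey,
  };
  // Both endpoints accept a JSON body pointing at a publicly reachable image URL.
  const body = JSON.stringify({ url: imageUrl });
  return [
    { method: 'POST', url: `${base}/describe?maxCandidates=1`, headers, body },
    { method: 'POST', url: `${base}/ocr?language=unk&detectOrientation=true`, headers, body },
  ];
}

// How many images fit in a monthly quota of free API calls.
function freeImagesPerMonth(freeCallsPerMonth = 5000) {
  return Math.floor(freeCallsPerMonth / CALLS_PER_IMAGE);
}
```

With Node 18 or later, each request could be sent with the built-in `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, and `freeImagesPerMonth()` gives the 2,500-image figure.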

Really, it's amazing what these companies give away for free or cheap. They're fueling innovation and exploration of technology by smaller companies and solo developers, and that's just plain awesome of them.

In the interest of brevity, I opted to post all of the code, hopefully well-commented, to GitHub.
