Keeping up with the latest skills: AWS IoT, Polly, and Rekognition

Post • 5 min read
Recently, I secured a number of AWS IoT Buttons for our office to play with, and I wanted to see how easy they would be to set up and use in various mock-up applications. In the spirit of playing around with the buttons and keeping up my technical skills on the AWS platform, I decided to build a small proof-of-concept project around them, using some old Android devices I had lying around and various bits and pieces of AWS services such as image recognition.

The concept I finally settled on is a remote surveillance camera solution which can be triggered remotely with the AWS IoT Button, and which performs simple image recognition, labelling the image content with gender, rough age, mood, and other parameters. The solution updates a “monitoring” website where the latest surveillance image is shown and the recognised characteristics are spoken aloud for the viewer, removing the need to read the monitor in detail. For building the actual solution I selected the following tools and technologies together with the AWS platform:
  • Android tablet - I like to repurpose and recycle old and unused items, so I decided to use a decommissioned tablet as the IoT device acting as the camera module for the system. Android devices are, in my opinion, some of the best toys to have lying around for building solutions requiring mobile, IoT, or embedded components. The platform is quite easy to use and easy to write applications for.
  • NodeRed - Since I didn’t want to spend too much time configuring and setting up IoT libraries and frameworks on the Android device, I decided to use NodeRed to provide the MQTT protocol support, as it offers easy-to-use programming tools for quick PoCs around IoT. Running NodeRed requires SSH access to the device, which I established using Termux and its associated modules, also used for things like controlling the camera.
  • The AWS IoT Button - This was an obvious choice, as it was one of the technology components I wanted to test and the one that got me started on the project in the first place.
As the main idea was to build something around the AWS IoT Button and see how easy it is to set up and use, the AWS platform was the natural choice for the IoT “backend”. For the rest of the solution, as I didn’t want to start setting up or maintaining servers myself, I decided to use as many AWS platform services as possible. I ended up working with the following AWS services:

AWS IoT

The AWS IoT platform handles message brokering, connectivity, and the overall management of the IoT solution.

AWS IAM

The requirement here was to configure the various access roles and rights for all the architectural components in a secure way.

AWS S3

Two distinct S3 buckets: one for uploading the images taken by the camera, and one for hosting the “monitoring” website.

AWS Lambda

Lambda functions perform the required calculations and actions in a “serverless” fashion, removing the need to maintain infrastructure components.

AWS Polly

Text-to-speech service used for creating the audio streams required by the solution.
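
As an illustration, a Polly call from Python/Boto3 can be quite small. This is only a minimal sketch under my own assumptions (voice, output format, and file path are illustrative choices, not the project's actual values):

```python
import boto3

# Minimal sketch: turn recognised labels into speech with Polly.
# VoiceId and output path are illustrative assumptions.
polly = boto3.client("polly")

def labels_to_speech(labels, out_path="/tmp/labels.mp3"):
    text = "I can see: " + ", ".join(labels)
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId="Joanna",
    )
    # AudioStream is a streaming body; read it and write the MP3 to disk.
    with open(out_path, "wb") as f:
        f.write(response["AudioStream"].read())
    return out_path
```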

AWS Rekognition

Image recognition service used for analysing and labelling the images.
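
The gender, rough age, and mood mentioned earlier come from Rekognition's label and face analysis, which a Lambda function can call roughly as sketched below (the bucket and key parameters are assumed to come from the triggering S3 event; this is not the project's exact code):

```python
import boto3

# Minimal sketch: analyse an uploaded image with Rekognition.
rekognition = boto3.client("rekognition")

def describe_image(bucket, key):
    image = {"S3Object": {"Bucket": bucket, "Name": key}}

    # General object/scene labels.
    labels = rekognition.detect_labels(Image=image, MaxLabels=10, MinConfidence=75)
    names = [label["Name"] for label in labels["Labels"]]

    # Face attributes: gender, age range, and emotions ("mood").
    faces = rekognition.detect_faces(Image=image, Attributes=["ALL"])
    for face in faces["FaceDetails"]:
        names.append(face["Gender"]["Value"])
        names.append(f"age {face['AgeRange']['Low']} to {face['AgeRange']['High']}")
        if face["Emotions"]:
            names.append(max(face["Emotions"], key=lambda e: e["Confidence"])["Type"])
    return names
```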

AWS CloudWatch and CloudWatch Logs

Used for monitoring and debugging the solution during the project.

AWS CloudFormation

Used for creating the resources, functions, roles etc. in the solution.

Python/Boto3

I chose Python as the programming language, as the Boto3 library provides easy APIs for using the AWS services. Python was used to write all the Lambda functions that perform the processing required by the overall solution.

How everything was brought together

After registering the AWS IoT Button (easily done with the AWS Android app) and the Android devices to the AWS IoT framework, and provisioning security credentials for them, they were ready to be used as part of the solution. The architectural idea was that a button press triggers a Lambda function, which performs a few checks on the “upload” S3 bucket and creates a temporary signed URL for it. It then uses an AWS IoT topic to notify the Android devices of the image-capture trigger. The Android device takes a picture of whatever is standing in front of the camera and uploads it securely to the “upload” S3 bucket using the temporary upload URL it received in the MQTT message.

Whenever a new image is uploaded to the S3 bucket, it triggers another serverless action in the background. This Lambda function takes the image and runs it through AWS Rekognition for image recognition. The recognised labels and objects are then run through AWS Polly to create the required audio stream. Once the new content is ready, the Lambda function uploads it to the other S3 bucket, where the website is hosted, to show and play the content for whoever is watching the “monitoring” website.

The separation of the S3 buckets provides an added security measure (a DMZ of sorts) to safeguard the website from potentially harmful content which could, in theory, be uploaded to the upload bucket if the temporary upload URL were somehow acquired by an attacker. The whole solution is secured with AWS IAM by granting each component only the least privileges it needs, on exactly the resources it uses. Enabling CloudWatch monitoring and logging is a good choice for debugging the solution, at least during the development phase; it helped me catch typos in the granular IAM policies of the Lambda functions’ IAM roles during set-up.
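
To make the button-press hop of that flow concrete, here is a minimal sketch of what the button-triggered Lambda function could look like. The bucket name, topic name, and payload shape are my own illustrative assumptions, not the exact code used in the project:

```python
import json
import os
import uuid

import boto3

s3 = boto3.client("s3")
iot = boto3.client("iot-data")

# Illustrative names; in the project these would come from configuration.
UPLOAD_BUCKET = os.environ.get("UPLOAD_BUCKET", "surveillance-upload-bucket")
CAPTURE_TOPIC = os.environ.get("CAPTURE_TOPIC", "camera/capture")

def handler(event, context):
    """Triggered by the AWS IoT Button: hand the camera a one-off upload URL."""
    key = f"captures/{uuid.uuid4()}.jpg"

    # Temporary, signed PUT URL so the tablet can upload without S3 credentials.
    upload_url = s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": UPLOAD_BUCKET, "Key": key},
        ExpiresIn=300,
    )

    # Notify the Android/NodeRed device over MQTT to take and upload a picture.
    iot.publish(
        topic=CAPTURE_TOPIC,
        qos=1,
        payload=json.dumps({"upload_url": upload_url, "key": key}),
    )
    return {"status": "capture requested", "key": key}
```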

My findings

This was a rather quick and fun project to work on and provided some insight into using the AWS IoT Button and Android devices as part of the AWS IoT ecosystem. The devices themselves were rather easy to register and get functioning in the set-up. Of course, in a large-scale real-world environment the set-up, certificate creation, and installation of the IoT devices would need to be automated as well to make it feasible. Incorporating small Lambda functions with image recognition and text-to-speech was quite straightforward and made for a good learning platform for the technologies. If I were applying the project to a customer situation, I would definitely improve it by adding image transcoding for different screen sizes, creating a proper web service with a searchable UI and a proper picture database/index, and so on.

All in all, I can highly recommend playing around with the AWS IoT framework, the IoT Button, and NodeRed on Android. Creating these kinds of small side projects is the perfect way for people in our business to keep improving our skills and know-how around the ever-expanding technology selection in modern IT environments. Nordcloud offers deep-dive workshops which will help you identify the opportunities that impact your business and shape data-driven solutions that take your business to the next level; contact us for more information.

Pasi Katajainen
CTO / Finland