Getting Started with Image Analysis using Azure AI Vision and Azure Functions

Tech Community • 8 min read

Image Analysis is a powerful tool that transforms images into valuable insights. It uses pre-trained AI models to extract detailed information, categorize visual content, and help you make data-driven decisions. From understanding image content to automating tasks, this service offers a versatile solution for businesses across industries.

Image analysis has practical applications in many industries. In healthcare, it can analyze X-rays and MRI (Magnetic Resonance Imaging) scans, helping doctors spot abnormalities. Self-driving cars use image analysis to navigate roads, identify pedestrians, and avoid obstacles. Banks use it to automate tasks like check processing and fraud detection. Security systems use it to identify suspicious activity and recognize faces.

Azure AI Vision

By leveraging cloud platforms like Microsoft Azure, you can access powerful AI tools without the heavy lifting of managing your own infrastructure. This makes image analysis scalable, efficient, and cost-effective for businesses of all sizes.

Now, let's talk about Azure AI Vision, a powerful suite of tools within Azure for image analysis. Azure AI Vision offers a wide range of capabilities, including:

  • Object detection: Identify and locate specific objects within an image, like cars in a traffic scene or medical instruments in an X-ray.
  • Image classification: Categorize an entire image, like classifying a picture as a landscape, portrait, or containing a specific product.
  • Image analysis: Extract detailed information from images, like identifying dominant colors, detecting adult content, or understanding the overall layout of a scene.
  • Face recognition: Detect, recognize, and analyze human faces in images.
  • OCR (Optical Character Recognition): Extract text from images, which is incredibly useful for tasks like reading receipts or processing handwritten documents.

Azure Functions

Azure Functions provides a serverless approach to building applications. This means you can focus on the code that analyzes the images, without worrying about managing servers or scaling your infrastructure. You simply write functions that react to specific events, like a new image being uploaded.

By combining Azure AI Vision with Azure Functions, you can create powerful image analysis solutions that are:

  • Scalable: Easily handle large volumes of images without worrying about infrastructure limitations.
  • Cost-effective: Only pay for the resources you use, making it ideal for projects with fluctuating workloads.
  • Efficient: Quickly build and deploy image analysis solutions without managing complex infrastructure.

Azure AI Vision Image Analysis with Azure Functions

To get you started, let's walk through a practical example and build an image analysis implementation with Azure Functions.

Prerequisites

  • An Azure subscription
  • Once you have your Azure subscription, create a Computer Vision resource in the Azure portal. After it deploys, select Go to resource.
    • You need the key and endpoint from the resource you create to connect your application to the Azure AI Vision service.
    • You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • Visual Studio with the Azure development workload installed.

Create a function app project

The Azure Functions project template in Visual Studio creates a C# class library project that you can publish to a function app in Azure. You can use a function app to group functions as a logical unit for easier management, deployment, scaling, and sharing of resources.

1. From the Visual Studio menu, select File > New > Project.

2. In Create a new project, enter functions in the search box, choose the Azure Functions template, and then select Next.

3. In Configure your new project, enter a Project name for your project, and then select Next. The function app name must be valid as a C# namespace, so don't use underscores, hyphens, or any other non-alphanumeric characters.

4. Use the following values for the Additional information settings:

  • a. Functions worker: .NET 8.0 Isolated (Long Term Support)
  • b. Function: HTTP trigger
  • c. Use Azurite for runtime storage account (AzureWebJobsStorage): Enable
  • d. Authorization level: Anonymous


5. Select Create to create the function project and HTTP trigger function.

Rename the function

The Function method attribute sets the name of the function, which by default is generated as Function1. Since the tooling doesn't let you override the default function name when you create your project, take a minute to create a better name for the function class, file, and metadata.

1. In File Explorer, right-click the Function1.cs file and rename it to ImageAnalysis.cs.

2. In the code, rename the Function1 class to ImageAnalysis.

3. In the method named Run, rename the Function method attribute to ImageAnalysis.

Your function definition should now look like the following code:

C#
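using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.Functions.Worker;

public class ImageAnalysis
{
    // The Function attribute sets the name the runtime exposes for this function.
    [Function("ImageAnalysis")]
    public IActionResult Run([HttpTrigger(AuthorizationLevel.Anonymous, "get", "post")] HttpRequest req)
    {
        return new OkObjectResult("Welcome to Azure Functions!");
    }
}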

Run the function locally

Visual Studio integrates with Azure Functions Core Tools so that you can test your functions locally using the full Azure Functions runtime.

1. To run your function, press F5 in Visual Studio. You might need to enable a firewall exception so that the tools can handle HTTP requests. 

2. Copy the URL of your function from the Azure Functions runtime output.

3. Paste the URL for the HTTP request into your browser's address bar and run the request. The browser shows the response to the local GET request returned by the function, which at this point is the text "Welcome to Azure Functions!".

4. To stop debugging, press Shift+F5 in Visual Studio.
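If you would rather check the endpoint from code than from the browser, a small console program can send the same GET request. This is a minimal sketch and not part of the function project: the class and method names are hypothetical, and the base address http://localhost:7071/ is an assumption (it is the usual Core Tools default), so use whatever URL the runtime output actually prints.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class LocalFunctionCheck
{
    // Combines the host base address with the default "api/<function name>" route.
    public static Uri BuildFunctionUrl(string baseAddress, string functionName) =>
        new Uri(new Uri(baseAddress), $"api/{functionName}");

    public static async Task Main(string[] args)
    {
        // Assumption: the local host listens on port 7071 unless another base address is passed in.
        var baseAddress = args.Length > 0 ? args[0] : "http://localhost:7071/";
        var url = BuildFunctionUrl(baseAddress, "ImageAnalysis");

        using var http = new HttpClient();
        HttpResponseMessage response = await http.GetAsync(url);

        // Print the status code and body returned by the function.
        Console.WriteLine($"{(int)response.StatusCode}: {await response.Content.ReadAsStringAsync()}");
    }
}
```

The function must be running (F5) while this program executes, otherwise the request fails with a connection error.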

Install the ImageAnalysis client SDK

Install the client SDK by right-clicking on the project solution in the Solution Explorer and selecting Manage NuGet Packages. In the package manager that opens select Browse, check Include prerelease, and search for Azure.AI.Vision.ImageAnalysis. Select Install.

Create and authenticate the client

To authenticate against the Image Analysis service, you need a Computer Vision key and endpoint URL. This guide assumes that you've defined the environment variables VISION_KEY and VISION_ENDPOINT with your key and endpoint.

C#
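using Azure;
using Azure.AI.Vision.ImageAnalysis;

// Read the endpoint and key from the environment variables instead of hard-coding them.
var endpoint = Environment.GetEnvironmentVariable("VISION_ENDPOINT");
var key = Environment.GetEnvironmentVariable("VISION_KEY");

var imageAnalysisClient = new ImageAnalysisClient(
    new Uri(endpoint),
    new AzureKeyCredential(key));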

Image URL

Create a Uri object for the image you want to analyse.

C#
var imageURL = new Uri("https://cdn.pixabay.com/photo/2018/04/26/12/14/travel-3351825_1280.jpg");

Select visual features

The Analysis 4.0 API gives you access to all of the service's image analysis features. Choose which operations to do based on your own use case.

C#
var visualFeatures =
    VisualFeatures.Caption |
    VisualFeatures.DenseCaptions |
    VisualFeatures.Objects |
    VisualFeatures.Read |
    VisualFeatures.Tags;
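Because the features are combined with the bitwise OR operator, the selection can also be built up conditionally. The sketch below illustrates this with a hypothetical helper (FeatureSelection.Build and its two switches are not part of the SDK); it always requests captions and tags and adds the more specialised features only when asked.

```csharp
using System;
using Azure.AI.Vision.ImageAnalysis;

public static class FeatureSelection
{
    // Hypothetical helper: builds a feature set from two switches.
    public static VisualFeatures Build(bool includeText, bool includeObjects)
    {
        // Start with captions and tags, then OR in the optional features.
        var features = VisualFeatures.Caption | VisualFeatures.Tags;
        if (includeText) features |= VisualFeatures.Read;
        if (includeObjects) features |= VisualFeatures.Objects;
        return features;
    }
}
```

For example, FeatureSelection.Build(includeText: true, includeObjects: false) yields a value for which HasFlag(VisualFeatures.Read) is true and HasFlag(VisualFeatures.Objects) is false. Requesting only what you need keeps the response smaller.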

Select analysis options

Use an ImageAnalysisOptions object to specify various options for the Analyze Image API call.

  • Language: You can specify the language of the returned data. 
  • Gender neutral captions: If you're extracting captions or dense captions (using VisualFeatures.Caption or VisualFeatures.DenseCaptions), you can ask for gender neutral captions. For example, in English, when you select gender neutral captions, terms like woman or man are replaced with person, and boy or girl are replaced with child.
  • Crop aspect ratio: An aspect ratio is calculated by dividing the target crop width by the height. Supported values are from 0.75 to 1.8 (inclusive). Setting this property is only relevant when VisualFeatures.SmartCrops was selected as part of the visual feature list.

C#
var imageAnalysisOptions = new ImageAnalysisOptions
{
    GenderNeutralCaption = true,
    Language = "en"
};
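The options object above does not set a crop aspect ratio. As a rough sketch of how that could look, the helper below checks each ratio against the 0.75 to 1.8 range described earlier before assigning it to SmartCropsAspectRatios. The class name SmartCropOptions and its methods are hypothetical; only ImageAnalysisOptions and its property come from the SDK.

```csharp
using System;
using Azure.AI.Vision.ImageAnalysis;

public static class SmartCropOptions
{
    // A ratio is the target crop width divided by its height.
    public static bool IsSupported(float ratio) => ratio >= 0.75f && ratio <= 1.8f;

    // Hypothetical helper: validates the ratios locally, then builds the options object.
    public static ImageAnalysisOptions Create(params float[] ratios)
    {
        foreach (var ratio in ratios)
        {
            if (!IsSupported(ratio))
            {
                throw new ArgumentOutOfRangeException(
                    nameof(ratios), $"Aspect ratio {ratio} is outside the supported range 0.75 to 1.8.");
            }
        }

        return new ImageAnalysisOptions { SmartCropsAspectRatios = ratios };
    }
}
```

The ratios only take effect when VisualFeatures.SmartCrops is part of the visual feature list passed to the Analyze call.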

Call the Analyze API

Call the Analyze method on the ImageAnalysisClient object, as shown here. The call is synchronous, and blocks execution until the service returns the results or an error occurs. 

C#
var imageAnalysisResult = imageAnalysisClient.Analyze(
    imageURL,
    visualFeatures,
    imageAnalysisOptions);
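The snippets so far are shown in isolation. One way to assemble them inside the HTTP-triggered function is sketched below. It is an illustration under a few assumptions rather than the definitive implementation: the image URL is read from a hypothetical "url" query parameter, the asynchronous AnalyzeAsync method is used in place of Analyze so the function does not block a thread, and service errors are mapped to a 502 response.

```csharp
using System;
using System.Threading.Tasks;
using Azure;
using Azure.AI.Vision.ImageAnalysis;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.Functions.Worker;

public class ImageAnalysis
{
    [Function("ImageAnalysis")]
    public async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post")] HttpRequest req)
    {
        // Fail early if the key or endpoint is not configured.
        var endpoint = Environment.GetEnvironmentVariable("VISION_ENDPOINT");
        var key = Environment.GetEnvironmentVariable("VISION_KEY");
        if (string.IsNullOrEmpty(endpoint) || string.IsNullOrEmpty(key))
        {
            return new ObjectResult("VISION_ENDPOINT and VISION_KEY must be set.") { StatusCode = 500 };
        }

        // Assumption: the caller passes the image location as ?url=<absolute URL>.
        if (!Uri.TryCreate(req.Query["url"].ToString(), UriKind.Absolute, out var imageUrl))
        {
            return new BadRequestObjectResult("Pass an absolute image URL in the 'url' query parameter.");
        }

        var client = new ImageAnalysisClient(new Uri(endpoint), new AzureKeyCredential(key));
        var features = VisualFeatures.Caption | VisualFeatures.Tags;
        var options = new ImageAnalysisOptions { GenderNeutralCaption = true, Language = "en" };

        try
        {
            Response<ImageAnalysisResult> response =
                await client.AnalyzeAsync(imageUrl, features, options);
            return new OkObjectResult(response.Value);
        }
        catch (RequestFailedException ex)
        {
            // The service rejected the request or could not be reached.
            return new ObjectResult($"Image analysis failed ({ex.Status}): {ex.Message}") { StatusCode = 502 };
        }
    }
}
```

In a real project you would probably register the client once through dependency injection instead of creating it on every request; it is created inline here only to keep the sketch short.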

Clean up resources

If you want to clean up and remove an Azure AI services subscription, you can delete the resource or resource group. Deleting the resource group also deletes any other resources associated with it.

API Response

The Analyze operation returns an ImageAnalysisResult object, which encapsulates a comprehensive set of visual data extracted from the input image. This rich dataset includes information pertaining to various visual attributes, such as captions, object detection, text and image tags. 

Let's delve into the specific components of this response object to understand the insights it provides.

Caption:

A generated phrase that describes the content of the analyzed image, along with a confidence score.

JSON
"caption": {
    "confidence": 0.80155146,
    "text": "a pink car on a beach with kites flying in the air"
},

Dense Captions:

Up to 10 generated phrases. The first describes the content of the whole image, and the others describe different regions of it. Each phrase comes with a confidence score and a bounding box giving the size and coordinates of the region.

JSON
"denseCaptions": {
            "values": [
                {
                    "confidence": 0.7487749,
                    "text": "a pink car parked on the beach",
                    "boundingBox": {
                        "x": 543,
                        "y": 349,
                        "width": 619,
                        "height": 291
                    }
                },
                {
                    "confidence": 0.7620856,
                    "text": "a white sign with black text",
                    "boundingBox": {
                        "x": 1040,
                        "y": 493,
                        "width": 63,
                        "height": 54
                    }
                }
            ]
        },

Objects:

A list of physical objects detected in the analyzed image, each with a confidence score and a bounding box giving its location.

JSON
"objects": {
            "values": [
                {
                    "boundingBox": {
                        "x": 1079,
                        "y": 138,
                        "width": 48,
                        "height": 70
                    },
                    "tags": [
                        {
                            "confidence": 0.628,
                            "name": "Kite"
                        }
                    ]
                },
                {
                    "boundingBox": {
                        "x": 555,
                        "y": 360,
                        "width": 623,
                        "height": 286
                    },
                    "tags": [
                        {
                            "confidence": 0.645,
                            "name": "car"
                        }
                    ]
                }
            ]
        },
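In C#, the same information is available through the Objects property of the result. The following sketch walks the detected objects and formats one line per tag; ObjectSummary and Describe are hypothetical names, and the code assumes VisualFeatures.Objects was requested (it returns nothing otherwise).

```csharp
using System.Collections.Generic;
using Azure.AI.Vision.ImageAnalysis;

public static class ObjectSummary
{
    // Hypothetical helper: one description per detected object tag.
    public static IEnumerable<string> Describe(ImageAnalysisResult result)
    {
        // Objects is only populated when the feature was requested.
        if (result.Objects is null) yield break;

        foreach (DetectedObject detected in result.Objects.Values)
        {
            ImageBoundingBox box = detected.BoundingBox;
            foreach (DetectedTag tag in detected.Tags)
            {
                // Coordinates are in pixels, measured from the top-left corner of the image.
                yield return $"{tag.Name} ({tag.Confidence:F3}) at x={box.X}, y={box.Y}, size {box.Width}x{box.Height}";
            }
        }
    }
}
```

You could log these lines from the function, or use the bounding boxes to draw overlays on the original image.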

Read:

The printed or handwritten text extracted from the analyzed image, also known as OCR. Each line and word includes its location as a bounding polygon, and each word includes a confidence score.

JSON
"read": {
            "blocks": [
                {
                    "lines": [
                        {
                            "text": "35 EAF",
                            "boundingPolygon": [
                                {
                                    "x": 1049,
                                    "y": 501
                                },
                                {
                                    "x": 1094,
                                    "y": 501
                                },
                                {
                                    "x": 1093,
                                    "y": 518
                                },
                                {
                                    "x": 1049,
                                    "y": 519
                                }
                            ],
                            "words": [
                                {
                                    "text": "35",
                                    "boundingPolygon": [
                                        {
                                            "x": 1050,
                                            "y": 501
                                        },
                                        {
                                            "x": 1065,
                                            "y": 501
                                        },
                                        {
                                            "x": 1065,
                                            "y": 519
                                        },
                                        {
                                            "x": 1050,
                                            "y": 519
                                        }
                                    ],
                                    "confidence": 0.999
                                },
                                {
                                    "text": "EAF",
                                    "boundingPolygon": [
                                        {
                                            "x": 1070,
                                            "y": 501
                                        },
                                        {
                                            "x": 1093,
                                            "y": 501
                                        },
                                        {
                                            "x": 1093,
                                            "y": 518
                                        },
                                        {
                                            "x": 1071,
                                            "y": 519
                                        }
                                    ],
                                    "confidence": 0.995
                                }
                            ]
                        }
                    ]
                }
            ]
        },
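The nesting of blocks, lines, and words can be flattened with LINQ when only the text is needed. This is a sketch under the assumption that VisualFeatures.Read was requested; the ReadText class and its method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using Azure.AI.Vision.ImageAnalysis;

public static class ReadText
{
    // Returns every recognised line of text, in the order the service reports them.
    public static IReadOnlyList<string> Lines(ImageAnalysisResult result) =>
        result.Read is null
            ? new List<string>()
            : result.Read.Blocks
                .SelectMany(block => block.Lines)
                .Select(line => line.Text)
                .ToList();

    // Returns the words whose confidence falls below the given threshold,
    // which can be useful for flagging text that needs manual review.
    public static IEnumerable<string> UncertainWords(ImageAnalysisResult result, float minConfidence) =>
        result.Read is null
            ? Enumerable.Empty<string>()
            : result.Read.Blocks
                .SelectMany(block => block.Lines)
                .SelectMany(line => line.Words)
                .Where(word => word.Confidence < minConfidence)
                .Select(word => word.Text);
}
```

For the response shown above, Lines would contain the single entry "35 EAF".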

Metadata:

Metadata associated with the analyzed image.

JSON
"metadata": {
    "height": 853,
    "width": 1280
},

Tags:

A list of content tags in the analyzed image along with a confidence score.

JSON
"tags": {
            "values": [
                {
                    "confidence": 0.99748945,
                    "name": "outdoor"
                },
                {
                    "confidence": 0.9937184,
                    "name": "vehicle"
                },
                {
                    "confidence": 0.9691309,
                    "name": "land vehicle"
                },
                {
                    "confidence": 0.9618436,
                    "name": "sky"
                },
                {
                    "confidence": 0.9477532,
                    "name": "beach"
                },
                {
                    "confidence": 0.94232225,
                    "name": "ground"
                },
                {
                    "confidence": 0.9136018,
                    "name": "wheel"
                },
                {
                    "confidence": 0.90356946,
                    "name": "transport"
                },
                {
                    "confidence": 0.89537966,
                    "name": "car"
                },
                {
                    "confidence": 0.7229159,
                    "name": "mountain"
                },
                {
                    "confidence": 0.64437807,
                    "name": "people"
                },
                {
                    "confidence": 0.54529554,
                    "name": "sand"
                },
                {
                    "confidence": 0.52938634,
                    "name": "parked"
                }
            ]
        }
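Low-confidence tags such as "sand" or "parked" may not be reliable enough to act on, so a consumer of this response will often filter by confidence. The sketch below does that with System.Text.Json. It assumes the caller receives a complete JSON object containing a "tags" property in the shape shown above; TagFilter and NamesAbove are hypothetical names, and the 0.9 threshold in the usage example is arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class TagFilter
{
    // Returns the names of all tags whose confidence is at least minConfidence.
    public static List<string> NamesAbove(string json, double minConfidence)
    {
        using JsonDocument doc = JsonDocument.Parse(json);

        // ToList() materialises the names before the document is disposed.
        return doc.RootElement
            .GetProperty("tags")
            .GetProperty("values")
            .EnumerateArray()
            .Where(tag => tag.GetProperty("confidence").GetDouble() >= minConfidence)
            .Select(tag => tag.GetProperty("name").GetString() ?? string.Empty)
            .ToList();
    }
}

public static class TagFilterExample
{
    public static void Main()
    {
        // Three of the tags from the response above, wrapped in a complete JSON object.
        var json = "{\"tags\":{\"values\":["
            + "{\"confidence\":0.99748945,\"name\":\"outdoor\"},"
            + "{\"confidence\":0.89537966,\"name\":\"car\"},"
            + "{\"confidence\":0.54529554,\"name\":\"sand\"}]}}";

        // Only "outdoor" reaches the 0.9 threshold.
        Console.WriteLine(string.Join(", ", TagFilter.NamesAbove(json, 0.9)));
    }
}
```

The right threshold depends on the use case: a search index can tolerate lower-confidence tags than an automated moderation rule.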

Recap

This post demonstrated how to extract valuable metadata, including captions, objects, text, and tags, from images using Azure AI Vision. By leveraging a serverless function, we showcased the ease of integrating powerful image analysis capabilities into your applications. 

The potential for innovative solutions built upon this technology is vast and exciting.

Ilja Summala
Ilja’s passion and tech knowledge help customers transform how they manage infrastructure and develop apps in cloud.
Group CTO