Azure Video Indexer Face Detection transparency note – Microsoft Docs

Face Detection is an Azure Video Indexer AI feature that automatically detects faces in a media file and aggregates instances of similar faces into the same group. The Celebrities Recognition module then runs to recognize celebrities. This module covers approximately one million faces and is based on commonly requested data sources. Faces that Azure Video Indexer does not recognize are still detected but are left unnamed. Customers can build their own custom Person modules so that Azure Video Indexer recognizes faces that it does not recognize by default.

The resulting insights are generated as a categorized list in a JSON file that includes a thumbnail and either the name or the ID of each face. Clicking a face’s thumbnail displays information such as the name of the person (if they were recognized), the percentage of appearances in the video, and their biography if they are a celebrity. It also enables scrolling between the instances in the video.
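
As a rough illustration of the appearance percentage mentioned above, the following Python sketch sums the duration of a face's instances and divides by the video length. This note does not document how the service computes the figure, so the arithmetic is an assumption. The function names `parse_timestamp` and `appearance_percentage` are hypothetical, and the instance shape is modelled on the `start` and `end` fields in the JSON sample later in this note.

```python
from datetime import timedelta


def parse_timestamp(text):
    """Convert 'H:MM:SS' or 'H:MM:SS.fff' into a timedelta."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def appearance_percentage(instances, video_duration):
    """Share of the video (0-100) covered by a face's instances.

    Assumes the instances do not overlap; overlapping instances would be
    counted twice by this simple sum.
    """
    total = sum(
        (parse_timestamp(i["end"]) - parse_timestamp(i["start"]) for i in instances),
        timedelta(),
    )
    return 100.0 * total.total_seconds() / video_duration.total_seconds()


# Hypothetical data: two 15-second appearances in a two-minute video.
example_instances = [
    {"start": "0:00:00", "end": "0:00:15"},
    {"start": "0:01:00", "end": "0:01:15"},
]
share = appearance_percentage(example_instances, timedelta(minutes=2))
print(f"{share:.1f}% of the video")
```

The sample timestamps in this note use fractional seconds (for example `0:00:05.033`), which is why the seconds field is parsed as a float.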


Review Transparency Note overview

General principles

This Transparency Note discusses Face Detection and the key considerations for using this technology responsibly. There are a number of things you need to consider when deciding how to use and implement an AI-powered feature:

Will this feature perform well in my scenario? Before deploying Face Detection into your scenario, test how it performs using real-life data and make sure it can deliver the accuracy you need.

Are we equipped to identify and respond to errors? AI-powered products and features will not be 100% accurate, so consider how you will identify and respond to any errors that may occur.

Key terms

Term  Definition
Insight  Information derived from processing and analyzing video and audio files. Insights can include detected objects, people, faces, animated characters, keyframes, and translations or transcriptions.
Face recognition  The analysis of images to identify the faces that appear in the images. This process is implemented via the Azure Cognitive Services Face API.
Template  Enrolled images of people are converted to templates, which are then used for facial recognition. Machine-interpretable features are extracted from one or more images of an individual to create that individual’s template. The enrolment or probe images are not stored by Face API and the original images cannot be reconstructed based on a template. Template quality is a key determinant of the accuracy of your results.
Enrolment  The process of enrolling images of individuals for template creation so they can be recognized. When a person is enrolled to a verification system used for authentication, their template is also associated with a primary identifier that is used to determine which template to compare with the probe template. High-quality images and images representing natural variations in how a person looks (for instance wearing glasses, not wearing glasses) generate high-quality enrolment templates.
Deep search  The ability to retrieve only relevant video and audio files from a video library by searching for specific terms within the extracted insights.
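
To make the idea of deep search concrete, here is a minimal local sketch in Python. It does not call the Azure Video Indexer search service. It only shows the principle of filtering a library by a term found in the extracted face insights. The `library` structure, the video IDs, and the function name `find_videos_with_face` are hypothetical.

```python
def find_videos_with_face(library, name):
    """Return the IDs of videos whose face insights contain the given name.

    `library` maps a video ID to the list of face entries extracted from
    that video. Matching is case-insensitive; unnamed faces never match.
    """
    wanted = name.casefold()
    return sorted(
        video_id
        for video_id, faces in library.items()
        if any((face.get("name") or "").casefold() == wanted for face in faces)
    )


# Hypothetical library of already-downloaded face insights.
library = {
    "video-a": [{"id": 1785, "name": "Emily Tran", "confidence": 0.7855}],
    "video-b": [{"id": 12, "name": None, "confidence": None}],
    "video-c": [{"id": 7, "name": "emily tran", "confidence": 0.91}],
}
matches = find_videos_with_face(library, "Emily Tran")
print(matches)
```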

View the insight

To see the instances on the website, do the following:

  1. When uploading the media file, go to Video + Audio Indexing, or go to Audio Only or Video + Audio and select Advanced.
  2. After the file is uploaded and indexed, go to Insights and scroll to People.

To see the Face Detection insights in the JSON file, do the following:

  1. Click Download and then Insights (JSON).

  2. Copy the faces element from under Insights and paste it into your JSON Viewer.

    "faces": [
        {
            "id": 1785,
            "name": "Emily Tran",
            "confidence": 0.7855,
            "description": null,
            "thumbnailId": "fd2720f7-b029-4e01-af44-3baf4720c531",
            "knownPersonId": "92b25b4c-944f-4063-8ad4-f73492e42e6f",
            "title": null,
            "imageUrl": null,
            "thumbnails": [
                {
                    "id": "4d182b8c-2adf-48a2-a352-785e9fcd1fcf",
                    "fileName": "FaceInstanceThumbnail_4d182b8c-2adf-48a2-a352-785e9fcd1fcf.jpg",
                    "instances": [
                        {
                            "adjustedStart": "0:00:00",
                            "adjustedEnd": "0:00:00.033",
                            "start": "0:00:00",
                            "end": "0:00:00.033"
                        }
                    ]
                },
                {
                    "id": "feff177b-dabf-4f03-acaf-3e5052c8be57",
                    "fileName": "FaceInstanceThumbnail_feff177b-dabf-4f03-acaf-3e5052c8be57.jpg",
                    "instances": [
                        {
                            "adjustedStart": "0:00:05",
                            "adjustedEnd": "0:00:05.033",
                            "start": "0:00:05",
                            "end": "0:00:05.033"
                        }
                    ]
                }
            ]
        }
    ]

To download the JSON file via the API, use the Azure Video Indexer developer portal.
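
Once the faces element has been copied out of the downloaded file, it can be read with a few lines of Python. The sketch below is illustrative only. It wraps a trimmed copy of the sample in braces so that it parses as a standalone JSON object, and the helper name `summarize_faces` is hypothetical. It assumes each thumbnail carries its own `instances` list, as in the sample.

```python
import json

# Trimmed copy of the sample "faces" element, wrapped in braces so that it
# parses as a standalone JSON object.
SAMPLE = """
{
  "faces": [
    {
      "id": 1785,
      "name": "Emily Tran",
      "confidence": 0.7855,
      "knownPersonId": "92b25b4c-944f-4063-8ad4-f73492e42e6f",
      "thumbnails": [
        {
          "id": "4d182b8c-2adf-48a2-a352-785e9fcd1fcf",
          "instances": [{"start": "0:00:00", "end": "0:00:00.033"}]
        },
        {
          "id": "feff177b-dabf-4f03-acaf-3e5052c8be57",
          "instances": [{"start": "0:00:05", "end": "0:00:05.033"}]
        }
      ]
    }
  ]
}
"""


def summarize_faces(document):
    """Return one summary dict per face in a parsed 'faces' element."""
    summaries = []
    for face in document.get("faces", []):
        # Flatten the instances of every thumbnail into (start, end) pairs.
        time_ranges = [
            (instance["start"], instance["end"])
            for thumbnail in face.get("thumbnails", [])
            for instance in thumbnail.get("instances", [])
        ]
        summaries.append(
            {
                "id": face["id"],
                "name": face.get("name"),  # may be missing or null for unnamed faces
                "confidence": face.get("confidence"),
                "time_ranges": time_ranges,
            }
        )
    return summaries


summaries = summarize_faces(json.loads(SAMPLE))
for summary in summaries:
    print(summary["id"], summary["name"], summary["confidence"], summary["time_ranges"])
```

Using `.get()` for the optional fields keeps the sketch from failing on faces that were detected but not recognized.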

Face Detection Components

During the Face Detection procedure, people in a media file are processed by Azure APIs as follows:

Component  Definition
Source file  The user uploads the source file for indexing by Azure APIs.
Detection and aggregation  The Face Detector identifies the faces in each frame. The faces are then aggregated and grouped.
Recognition  The Celebrities module runs over the aggregated groups to recognize celebrities. If the customer has created their own Person module, it is also run to recognize people. When people are not recognized, they are labelled Unknown1, Unknown2 and so on.
Confidence value  Where applicable for well-known faces or faces identified in the customizable list, the estimated confidence level of each label is calculated on a scale of 0 to 1. The confidence score represents the certainty in the accuracy of the result. For example, an 82% certainty is represented as a score of 0.82.
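
One way to act on the confidence value is to separate higher-confidence labels from those that should be checked by a person first. The Python sketch below is a minimal example under stated assumptions: the threshold of 0.80 and the names `REVIEW_THRESHOLD`, `triage_faces`, and `as_percentage` are hypothetical and not part of Azure Video Indexer, and a suitable threshold depends on testing with your own data.

```python
REVIEW_THRESHOLD = 0.80  # hypothetical cut-off; tune it against your own data


def as_percentage(confidence):
    """Render a 0-1 confidence score as a whole-number percentage string."""
    return f"{confidence:.0%}"


def triage_faces(faces, threshold=REVIEW_THRESHOLD):
    """Split faces into (accepted, needs_review) by confidence score.

    Faces without a confidence value (for example, unnamed faces) always
    go to the review list.
    """
    accepted, needs_review = [], []
    for face in faces:
        confidence = face.get("confidence")
        if confidence is not None and confidence >= threshold:
            accepted.append(face)
        else:
            needs_review.append(face)
    return accepted, needs_review


# Hypothetical labels; the first entry reuses the values from the JSON sample.
faces = [
    {"name": "Emily Tran", "confidence": 0.7855},
    {"name": "Example Person", "confidence": 0.82},
    {"name": None, "confidence": None},
]
accepted, needs_review = triage_faces(faces)
print([f["name"] for f in accepted], [f["name"] for f in needs_review])
print(as_percentage(0.82))
```

A threshold like this only prioritizes what a reviewer looks at first. It is not a substitute for human oversight of the results.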

Example use cases

  • Summarizing where an actor appears in a movie or reusing footage by deep searching for specific faces in organizational archives for insight on a specific celebrity.
  • Improved efficiency when creating feature stories at a news or sports agency, for example deep searching for a celebrity or football player in organizational archives.
  • Using faces appearing in the video to create promos, trailers or highlights. Azure Video Indexer can assist by adding keyframes, scene markers, timestamps and labelling so that content editors invest less time reviewing numerous files.  

Considerations when choosing a use case

  • Carefully consider the accuracy of the results. To promote more accurate detections, check the quality of the video; low-quality video might impact the detected insights.

  • Carefully consider any use for law enforcement. People might not be detected if they are small, sitting, crouching, or obstructed by objects or other people. To ensure fair and high-quality decisions, combine Face Detection-based automation with human oversight.

  • Do not use Face Detection for decisions that may have serious adverse impacts on individuals, because decisions based on incorrect output could cause such harm. Additionally, it is advisable to include human review of any decision that has the potential for serious impacts on individuals.

When used responsibly and carefully, Face Detection is a valuable tool for many industries. To respect the privacy and safety of others, and to comply with local and global regulations, we recommend the following:

  • Always respect an individual’s right to privacy, and only ingest videos for lawful and justifiable purposes.  

  • Do not purposely disclose inappropriate content about young children or family members of celebrities or other content that may be detrimental or pose a threat to an individual’s personal freedom.  

  • Commit to respecting and promoting human rights in the design and deployment of your analyzed media.  

  • When using third-party materials, be aware of any existing copyrights or permissions required before distributing content derived from them.

  • Always seek legal advice when using content from unknown sources. 

  • Always obtain appropriate legal and professional advice to ensure that your uploaded videos are secured and have adequate controls to preserve the integrity of your content and to prevent unauthorized access.    

  • Provide a feedback channel that allows users and individuals to report issues with the service.  

  • Be aware of any applicable laws or regulations that exist in your area regarding processing, analyzing, and sharing media containing people. 

  • Keep a human in the loop. Do not use any solution as a replacement for human oversight and decision-making.  

  • Fully examine and review the potential of any AI model you are using to understand its capabilities and limitations. 

Next steps

Learn more about Responsible AI:

Contact us:

See the following Azure Video Indexer transparency notes:
