2023

Developing a closed captioning quality assessment system

Closed Captioning (CC) is a telecommunications service that displays a text equivalent of a program's audio. Although the primary consumers are Deaf (D) and Hard of Hearing (HOH) viewers, they are typically excluded from the quality assessment process. Including D and HOH viewers in every assessment would require enormous effort and is nearly impossible in practice. To address this problem, an automated system called the Caption Quality Assessment Intelligent System (CAIS) was developed, using machine learning algorithms to replicate human subjective evaluation.

1. System Overview of CAIS

CAIS uses a multi-label classifier trained with an active learning algorithm. Its Multilayer Perceptron (MLP) structure processes four main caption error types: synchronization delay, presentation speed, number of missing words, and caption paraphrasing. An active learning strategy based on Query by Committee (QBC) was used to fine-tune the system and to reduce the number of training data points required from human assessors.
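As an illustration of the Query by Committee step, the sketch below scores each clip by how much a committee of models disagrees on its ratings and selects the most contested clip to send to a human assessor. Vote entropy is used as the disagreement measure; that choice, the names `vote_entropy` and `select_query`, the integer rating scale, and the example votes are assumptions made for illustration and are not details taken from CAIS.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's votes: 0.0 when every member agrees."""
    total = len(votes)
    return sum((c / total) * math.log(total / c) for c in Counter(votes).values())


def select_query(committee_votes):
    """Return (index of the most disputed clip, per-clip disagreement scores).

    committee_votes[i][factor] lists the rating each committee member
    predicted for clip i on that caption error factor.
    """
    scores = [sum(vote_entropy(v) for v in clip.values()) for clip in committee_votes]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores


# Hypothetical votes from a three-member committee on two of the four factors.
votes = [
    {"delay": [3, 3, 3], "speed": [2, 2, 2]},  # full agreement
    {"delay": [1, 2, 3], "speed": [2, 2, 2]},  # split on delay only
    {"delay": [1, 1, 2], "speed": [2, 3, 3]},  # split on both factors
]
query_index, disagreement = select_query(votes)
print(query_index, [round(s, 3) for s in disagreement])
```

Summing the entropy over factors treats every error type as equally important when choosing a query; a weighted sum would be an equally plausible design.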

Figure 1. The system diagram of CAIS outlining the prediction flow and the active learning inquiry flow.

2. User Study and Methodology

An online user study was conducted with 15 Deaf and 15 Hard of Hearing participants, who watched 20 video clips encoded with various caption errors. Participants indicated whether they agreed or disagreed with the quality ratings predicted by CAIS. When a participant agreed with a predicted rating, CAIS treated the rating as confirmed and learned from it. When a participant disagreed, they were invited to give CAIS a new rating by clicking on a different quality label.
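The agree/disagree procedure can be sketched as a small labeling rule: an agreement confirms the predicted rating, while a disagreement contributes the participant's new rating, if one was given. The `Feedback` record, its field names, the integer rating scale, and the example responses are hypothetical and stand in for whatever the study's web application actually stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    clip_id: str
    factor: str                      # e.g. "delay" or "speed"
    predicted: int                   # rating predicted by the model
    agreed: bool                     # did the participant agree?
    corrected: Optional[int] = None  # new rating chosen after a disagreement


def confirmed_label(fb):
    """Return the rating to learn from, or None if none is available."""
    if fb.agreed:
        return fb.predicted
    return fb.corrected  # None when the participant gave no new rating


def collect_training_labels(feedback):
    """Map (clip, factor) to the rating confirmed or supplied by a participant."""
    labels = {}
    for fb in feedback:
        label = confirmed_label(fb)
        if label is not None:
            labels[(fb.clip_id, fb.factor)] = label
    return labels


# Hypothetical responses from one participant.
responses = [
    Feedback("clip01", "delay", predicted=4, agreed=True),
    Feedback("clip01", "speed", predicted=2, agreed=False, corrected=3),
    Feedback("clip02", "delay", predicted=1, agreed=False),  # no new rating
]
training_labels = collect_training_labels(responses)
print(training_labels)
```

Skipping disagreements without a new rating is one possible policy; it keeps unconfirmed predictions out of the training set.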

Figure 2. A screenshot of the user study web application displaying the video and the predicted ratings.

3. Results and Findings

The results showed that the percent agreement between CAIS and participants increased over time, indicating that CAIS successfully learned from the viewers. Overall, participants had positive attitudes toward the machine predictions, particularly for the delay and speed factors.
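A rate of change in percent agreement can be estimated by fitting a least-squares line to agreement percentages ordered in time. The sketch below shows the arithmetic; the function names and the agree/disagree responses are invented for illustration and are not data from the study.

```python
def percent_agreement(responses):
    """Percentage of responses in which the participant agreed."""
    return 100.0 * sum(responses) / len(responses)


def linear_trend(xs, ys):
    """Ordinary least-squares line through (xs, ys): returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up agree (True) / disagree (False) responses for four successive rounds.
rounds = [
    [True, False, False, True],
    [True, True, False, False],
    [True, True, True, False],
    [True, True, True, True],
]
steps = list(range(1, len(rounds) + 1))
agreement = [percent_agreement(r) for r in rounds]
slope, intercept = linear_trend(steps, agreement)
print(agreement, slope, intercept)
```

A positive slope means agreement rose from one round to the next on average; the slope's unit here is percentage points per round.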

Figure 3. A kernel density estimation graph between CAIS and human participants.

Figure 4. A linear trendline and scatter plots for the percentage of agreement between CAIS and human participants.