Designing an artificial intelligence system for dental occlusion classification using intraoral photographs: A comparative analysis between artificial intelligence-based and clinical diagnoses





Introduction


This study aimed to design an artificial intelligence (AI) system for dental occlusion classification using intraoral photographs. Moreover, the performance of this system was compared with that of an expert clinician.


Methods


This study included 948 adult patients with permanent dentition who presented to the Department of Orthodontics, School of Dentistry, Mashhad University of Medical Sciences, during 2022-2023. The intraoral photographs taken from the patients in left, right, and frontal views (3 photographs for each patient) were collected and underwent augmentation, and about 7500 final photographs were obtained. Moreover, the patients were clinically examined by an expert orthodontist for malocclusion, overjet, and overbite and were classified into 6 groups: Class I, Class II, half-cusp Class II, Super Class I, Class III, and unclassifiable. In addition, a multistage neural network system was created and trained using the photographs of 700 patients. Then, it was used to classify the remaining 248 patients using their intraoral photographs. Finally, its performance was compared with that of the expert clinician. All statistical analyses were performed using the Stata software (version 17; StataCorp, College Station, Tex).


Results


The accuracy, precision, recall, and F1 score of the AI system in the malocclusion classification of molars were calculated to be 93.1%, 88.6%, 91.2%, and 89.7%, respectively, whereas the AI system had an accuracy, precision, recall, and F1 score of 89.1%, 88.8%, 91.4%, and 89.8%, respectively, for the malocclusion classification of canines. Moreover, the mean absolute error of the AI system was 1.98 ± 2.11 for overjet and 1.28 ± 1.60 for overbite measurements.


Conclusions


AI exhibited remarkable performance in detecting all classes of malocclusion, exceeding that of orthodontists, especially in predicting Angle classification. However, its performance in overjet and overbite measurement was not acceptable compared with that of expert orthodontists.


Highlights





  • The AI system showed 90%-96% accuracy and precision in predicting various dental occlusion types.



  • The AI outperformed expert orthodontists in diagnosing canine occlusions consistently.



  • The AI excelled in identifying cases with infeasible occlusion classification, such as missing teeth.



  • The AI was accurate in diagnosing occlusion but had moderate accuracy with overjet and overbite.



The integration of artificial intelligence (AI) into the ever-evolving field of health care has ushered in a new era of precision and efficiency. Considering its vast capabilities, AI has found several applications in multiple medical disciplines, significantly enhancing diagnosis and treatment. Moreover, AI has led to a remarkable developmental surge in the disciplines of dentistry, especially orthodontics.


As a powerful associate, AI offers orthodontic solutions extending beyond diagnostic support. For example, it can streamline the time-consuming process of identifying cephalometric landmarks, reducing the treatment planning time significantly. Using deep learning techniques, AI can decrease the need for human subjectivity, achieving consistent and accurate decision-making. In addition, AI may enhance social equity by simplifying complicated medical decisions, such as tooth extraction, thus increasing access to specialized health care.


Defined as the alignment and positioning of teeth within the oral cavity, dental occlusion is a fundamental aspect of dental health and orthodontic treatment. Accurate diagnosis of the occlusion class is essential for proper treatment planning and optimal outcomes. Traditionally, such a diagnosis is made by skilled dentists using clinical examination and manual evaluation of intraoral photographs. However, this manual process is inherently time-consuming and susceptible to human error. Moreover, various factors can affect its accuracy, such as image quality, visual fatigue, and clinician expertise.


To advance dental diagnostics, our team aimed to design an AI system for malocclusion, overjet, and overbite classification using intraoral photographs. This study describes the development, training, and validation of this system using deep learning and machine vision techniques. In addition, its performance was compared with that of an expert clinician.


Material and methods


This study was conducted at the Orthodontics Department of the School of Dentistry, Mashhad University of Medical Sciences, Mashhad, Iran. The study participants included adult patients with permanent dentition who had presented to the Orthodontics Department during 2022-2023. Patients who were undergoing orthodontic treatment or had a history of such treatment, as well as those with incomplete data, cleft lip and palate, or >1 missing tooth in each quadrant, were excluded from the study. Finally, the data for 948 patients, including 2844 photographs showing occlusal status, were included in the study.


The extracted data consisted of original intraoral occlusal images, as well as the occlusal status of molars and canines and measurements of overjet and overbite obtained through clinical examination by an orthodontic specialist. Four weeks later, 25 patients who were randomly selected from the previously examined patients underwent a reevaluation by the same clinician to assess the intraobserver reliability.


This study used the following classification for malocclusion: Class I malocclusion was defined as the alignment of the mesiobuccal cusp of the maxillary first molar with the buccal groove of the mandibular first molar, allowing a mesial or distal deviation of less than one-fourth of a cusp. A mesial deviation of one-fourth to three-fourths of a cusp was considered half-cusp Class II malocclusion, whereas a mesial deviation of more than three-fourths of a cusp was classified as Class II malocclusion. In addition, a distal deviation of one-fourth to one-half of a cusp was considered Super Class I malocclusion, whereas a distal deviation of more than one-half of a cusp was classified as Class III.
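These boundaries amount to a simple decision rule over the measured cusp deviation. The sketch below is illustrative only: the signed-deviation convention (positive for mesial, negative for distal) and the function name are our assumptions, not part of the study protocol.

```python
def classify_molar_occlusion(deviation: float) -> str:
    """Map a signed molar deviation (in cusp units) to an occlusion class.

    Assumed convention (for illustration only): positive values denote
    mesial deviation and negative values denote distal deviation of the
    maxillary first molar's mesiobuccal cusp relative to the buccal
    groove of the mandibular first molar.
    """
    if abs(deviation) < 0.25:
        return "Class I"
    if deviation > 0:  # mesial deviation
        return "half-cusp Class II" if deviation <= 0.75 else "Class II"
    # distal deviation
    return "Super Class I" if abs(deviation) <= 0.5 else "Class III"
```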


Each patient had a set of 3 intraoral photographs showing the occlusal status in left, right, and frontal views. The photographs were taken using different cameras under various conditions, producing extensive variability in image quality, lighting, and camera distortion. This variability was crucial for reducing the risk of overfitting in our training data, thereby increasing the generalizability and applicability of our results. However, we ensured that the photographs taken from the left and right sides were captured using mirrors and included the dental arch from the distal of the first molars to at least the midline, and that frontal images included the left and right canines. In addition, we used the original, unaltered raw photographs to simplify augmentation.


Finally, all patients were randomly divided into 2 groups using a Python script. The images for the first group, comprising 700 patients, were used for system training, whereas those for the second group, comprising 248 patients, were used for evaluating the AI system. Moreover, the images of the test patients were evaluated by an orthodontic specialist for malocclusion, overbite, and overjet. This specialist was different from the one who had examined the patients at the beginning, had no role in the treatment of the patients, and made the diagnosis without clinical examination. In addition, the intraobserver reliability of this second clinician was assessed by reevaluating the images for 25 randomly chosen patients.
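The splitting script itself is not reproduced in the paper; a minimal sketch of such a patient-level random split (the seed and function name are our assumptions) could look like this:

```python
import random

def split_patients(patient_ids, n_train=700, seed=42):
    """Randomly split patients into training and test groups.

    Illustrative sketch only: the study states that a Python script
    performed the random split, but not how; the seed is assumed.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:]

train_ids, test_ids = split_patients(range(948))  # 700 train, 248 test
```

Splitting at the patient level (rather than the image level) keeps all 3 views of a patient in the same group, which prevents leakage between the training and test sets.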


Our team used 3 distinct convolutional neural networks to develop an efficient AI system for classifying malocclusion. In the first phase, we designed a fully convolutional neural network (FCNN) for semantic segmentation, using the YOLOv8-Seg AI model as the foundation for segmentation and object detection. Developed by Ultralytics, the YOLOv8-Seg model is a specialized version of the YOLOv8 framework designed for object detection and semantic segmentation. This model includes the CSPDarknet53 backbone for efficient extraction of image features. Moreover, with a novel C2f module that improves detection and segmentation across several image sizes, the YOLOv8-Seg model differs from standard YOLO designs and has 2 specialized segmentation heads focused on the effective prediction of semantic segmentation masks in images. The model comes pretrained on the same dataset as YOLOv8, and we fine-tuned it on our intraoral images ( Fig 1 ).
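For context, a minimal fine-tuning setup with the Ultralytics YOLOv8 API is sketched below; the checkpoint variant, file paths, and epoch count are our assumptions, as the paper does not report these details.

```python
from ultralytics import YOLO

# Load a pretrained YOLOv8 segmentation checkpoint. The "n" (nano)
# variant and all paths below are illustrative assumptions.
model = YOLO("yolov8n-seg.pt")

# Fine-tune on a custom dataset described by a YOLO-format data.yaml
# (class names plus train/validation image directories).
model.train(data="intraoral_seg/data.yaml", epochs=50, imgsz=640)

# Predict segmentation masks for a new intraoral photograph.
results = model("frontal_view.jpg")
masks = results[0].masks  # predicted segmentation masks
```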




Fig 1


The entire pipeline of the AI model.


Using a random set of 300 intraoral occlusal images taken from right, left, and frontal views, we annotated the teeth and surrounding tissues separately. We used image augmentation to change the brightness, contrast, and saturation by up to 25%, obtaining about 1200 annotated images per view (3600 images in total), which were used for training the model. Of these, 70% (2520 images) were used for model training, whereas 20% and 10% were used for intra-training validation and posttraining evaluation, respectively. Then, the system was used to mask all images of the 700 patients in the training dataset ( Figs 1 and 2 ).
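A photometric augmentation of this kind can be expressed in a few lines; the use of torchvision below is our assumption, as the paper does not name an augmentation library.

```python
from PIL import Image
from torchvision import transforms

# Randomly vary brightness, contrast, and saturation by up to 25%,
# matching the ranges described above (library choice is assumed).
augment = transforms.ColorJitter(brightness=0.25, contrast=0.25, saturation=0.25)

image = Image.open("occlusal_right.jpg")
variants = [augment(image) for _ in range(4)]  # e.g., 4 variants per image
```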




Fig 2


The first phase of the AI system for dental semantic segmentation.


In the second phase, masked dental images were processed by a second FCNN AI system, which was based on the same YOLOv8-Seg model for semantic segmentation. Using 900 random images from left, right, and frontal views (300 images from each view), the teeth were annotated separately. Then, image augmentation altered the image dimensions by up to 25% to accommodate various perspectives, angles, and lenses, resulting in 1800 images for model training. The ratios of images used for training, validation, and evaluation were the same as in the previous phase ( Fig 1 ).
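Such geometric augmentation might be sketched as follows; again, the library and exact transforms are our assumptions standing in for the unspecified pipeline.

```python
from PIL import Image
from torchvision import transforms

# Scale and perspective jitter of up to 25% to mimic different camera
# distances, angles, and lenses (parameter mapping is assumed).
geometric = transforms.Compose([
    transforms.RandomAffine(degrees=0, scale=(0.75, 1.25)),
    transforms.RandomPerspective(distortion_scale=0.25, p=1.0),
])

image = Image.open("occlusal_left_masked.png")
warped = geometric(image)
```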


In the final phase, the images with separately annotated dental components entered an FCNN system for object detection. Bounding boxes were defined for occlusal detection regions of molars and canines, as well as overjet (in left and right views) and overbite (in frontal views) measurements. Moreover, the occlusal classification included the following categories: Class I, Class II, half-cusp Class II, Super Class I, Class III, and unclassifiable (for missing molars or high buccal canines).


The training was performed using the Detectron2 AI system developed by Meta (Menlo Park, Calif). Similar to the previous phases, 70% of the 2100 images were used for training, whereas 20% and 10% were used for intra-training validation and posttraining evaluation, respectively ( Fig 3 ). As an advanced object detection model, Detectron2 is an implementation of Faster R-CNN and includes a backbone network, a region proposal network, and region of interest heads. The backbone of Detectron2 is based on feature pyramid networks with ResNet/ResNeXt, which are essential for extracting features from input images. Moreover, the region proposal network head is crucial for generating regional proposals from the features extracted by the backbone. These proposals are then processed by the region of interest heads to detect and classify objects within the proposed regions ( Fig 1 ).
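A minimal Detectron2 fine-tuning configuration consistent with this description is sketched below; the base config, dataset names, and class count mapping are our assumptions, since the paper names only the framework and architecture.

```python
import os

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
# Faster R-CNN with a ResNet-50 FPN backbone (assumed base config).
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("occlusion_train",)  # hypothetical, registered beforehand
cfg.DATASETS.TEST = ("occlusion_val",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 6  # the 6 occlusion categories above

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```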




Fig 3


The third phase of the AI system for occlusion, overjet, and overbite object detection.


In this step, the segmented images prepared for malocclusion detection in the first and second phases were classified on the basis of their overjet and overbite status. This study used 23 classes for overjet and overbite status, ranging from −6 mm to 6 mm, with each class divided at 0.5-mm intervals. For instance, an image could be identified as having a 3.5-mm overjet and a 2.0-mm overbite or a 1-mm overjet and a −4.5-mm overbite. Notably, the overjet and overbite measurements recorded in the clinical setting were subsequently used to annotate the segmented images according to their respective classes. Moreover, the AI system measured overjet and overbite with the same object detection approach used for the malocclusion classification. The training was performed using an FCNN system on the basis of the Detectron2 AI system developed by Meta, and the ratios of images used for training, intra-training validation, and posttraining evaluation were the same as those used for malocclusion classification.
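Mapping a raw clinical measurement to its 0.5-mm class can be done by snapping to the nearest half millimeter; the sketch below is illustrative, and the clipping behavior at the ±6-mm range limits is our assumption.

```python
def to_half_mm_class(measurement_mm: float) -> float:
    """Snap an overjet/overbite measurement (mm) to its 0.5-mm class.

    Illustrative only: assumes classes are the 0.5-mm bins spanning
    -6 mm to 6 mm described above, with out-of-range values clipped.
    """
    snapped = round(measurement_mm * 2) / 2  # nearest 0.5 mm
    return max(-6.0, min(6.0, snapped))      # clip to the class range

assert to_half_mm_class(3.4) == 3.5
assert to_half_mm_class(-4.63) == -4.5
```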


The training was conducted using a Linux Ubuntu 18 (Canonical, London, United Kingdom) server with an Nvidia A100 GPU (Nvidia, Santa Clara, Calif) with 48 GB of VRAM and 256 GB of RAM. Moreover, we used the Keras 2.3.1 framework (Alphabet, Mountain View, Calif) with Python 3.7.4 (Python Software Foundation, Wilmington, Del) and the TensorFlow-GPU 2.5.0 (Google, Mountain View, Calif) backend. Image dimensions were set to 640 horizontal pixels, with the vertical dimension adjusted to the aspect ratio of the original images to ensure efficient use of the RAM. In addition, we used 5-fold cross-validation to address overfitting concerns and optimized the network parameters using the Adam optimizer with a batch size of 256 and 50 training epochs. The initial learning rate was set at 0.003 and was multiplied by 2.0 if the validation loss stagnated over 3 consecutive epochs. Furthermore, early stopping was employed if the validation loss did not improve over 8 consecutive epochs. Throughout the training process, we monitored and recorded the training accuracy and loss, as well as the validation accuracy and loss.
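A Keras training configuration consistent with this schedule is sketched below. The model definition and data pipeline are omitted, and the learning-rate factor is an assumption: Keras' ReduceLROnPlateau callback only accepts factors below 1.0, so the multiplier reported above cannot be passed to it directly.

```python
from tensorflow import keras

# Adam optimizer with the initial learning rate reported above.
optimizer = keras.optimizers.Adam(learning_rate=0.003)

callbacks = [
    # Adjust the learning rate when validation loss stagnates for
    # 3 consecutive epochs (the factor of 0.5 is our assumption).
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", patience=3, factor=0.5),
    # Stop training if validation loss does not improve for 8 epochs.
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=8),
]

# Hypothetical usage with a compiled model and prepared datasets:
# model.compile(optimizer=optimizer, loss="categorical_crossentropy",
#               metrics=["accuracy"])
# model.fit(train_data, validation_data=val_data,
#           batch_size=256, epochs=50, callbacks=callbacks)
```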


The performance of the AI system was evaluated using images from 248 patients whose data were not a part of the training dataset. The system was required to predict the occlusal status of each patient using the photographs taken from the left, right, and frontal views, and the obtained results were integrated into the patient’s data file. Then, the mean overjet was calculated using the left and right views, while overbite was calculated using the frontal views, and the data files were updated accordingly. Finally, the results obtained from the AI system were compared with those obtained using clinical examinations by an orthodontist to assess the performance of our AI system.


Statistical analysis


At first, we performed a pilot study that included data from 25 patients to obtain the mean error in overjet measurement between the clinician and the AI system, which was used as the effect size. Then, the sample size was calculated to be 88 using the G∗Power software (version 3.1; Heinrich Heine University, Düsseldorf, Germany), considering an effect size of 0.5, an α error of 0.05, and a statistical power of 0.95. However, we included the data for 700 patients for training the AI system and 248 patients for posttraining evaluation to increase the stability of our results.


The indexes of accuracy, precision, and mean Intersection over Union (mIoU) were used to evaluate the performance of the AI system in semantic segmentation. Accuracy was defined as the overlap of the ground truth and the predicted segmentation divided by the area of the ground truth, indicating how often the AI model was right in its predictions. Precision was defined as the overlap of the ground truth and the predicted segmentation divided by the area of the predicted segmentation, that is, the number of true positive results divided by all positive results identified by the AI model. In addition, mIoU was calculated as the overlap of the ground truth and the predicted segmentation divided by the union of their areas, showing the ability of the AI model to accurately outline and locate different classes ( Fig 4 ).
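On binary masks, these definitions reduce to a few array operations; the sketch below is a per-mask illustration (averaging across classes and images for mIoU is omitted).

```python
import numpy as np

def segmentation_metrics(ground_truth: np.ndarray, prediction: np.ndarray):
    """Compute accuracy, precision, and IoU for one boolean mask pair,
    following the definitions given in the text."""
    overlap = np.logical_and(ground_truth, prediction).sum()
    union = np.logical_or(ground_truth, prediction).sum()
    accuracy = overlap / ground_truth.sum()  # overlap / ground-truth area
    precision = overlap / prediction.sum()   # overlap / predicted area
    iou = overlap / union                    # intersection over union
    return accuracy, precision, iou
```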




Fig 4


Outcomes from training the initial-phase segmentation system.


For statistical analysis, the overjet and overbite measurements were first normalized. Then, the discrepancy between the overjet and overbite results reported by the AI system and the clinician was calculated using the mean absolute error and standard deviation. Moreover, the occlusion results were described using confusion matrices, accuracy, precision, recall, F1 score, and Cohen's kappa test. Notably, the statistical analysis was performed using the Stata software (version 17; StataCorp, College Station, Tex).
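These agreement metrics are standard and could equally be computed in Python, as in the hedged sketch below (the labels and measurements shown are placeholders, not study data).

```python
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score,
                             mean_absolute_error, precision_score,
                             recall_score)

# Placeholder occlusion labels standing in for clinician vs AI outputs.
clinician = ["Class I", "Class II", "Class III", "Class I"]
ai_system = ["Class I", "Class II", "Class I", "Class I"]

print(confusion_matrix(clinician, ai_system))
print(accuracy_score(clinician, ai_system))
print(precision_score(clinician, ai_system, average="macro", zero_division=0))
print(recall_score(clinician, ai_system, average="macro", zero_division=0))
print(f1_score(clinician, ai_system, average="macro", zero_division=0))
print(cohen_kappa_score(clinician, ai_system))

# Agreement on continuous measurements (placeholder values, in mm).
overjet_clinician = np.array([3.0, 2.5, 4.0])
overjet_ai = np.array([3.5, 2.0, 4.5])
print(mean_absolute_error(overjet_clinician, overjet_ai))
```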


Results


This study included the orthodontic records (malocclusion, overjet, and overbite data) of 948 patients presenting to the Orthodontic Department of the School of Dentistry, Mashhad University of Medical Sciences, Mashhad, Iran. The photographs taken from 700 patients were used for training an AI system for malocclusion, overjet, and overbite classification, whereas the data for 248 patients were used for posttraining evaluation of the AI model. Table I presents the frequency of different malocclusion classes in the training and test groups, showing no significant intergroup differences. Moreover, the intraobserver reliability of the findings obtained using clinical examination (the reference classification) was calculated to be 94.3% for malocclusion classification and 96.7% for overjet and overbite measurements.



Table I
Prevalence of occlusion types in the included records

Occlusion type        Test          Train
Class I
  Right molar         105 (42.2)    272 (38.9)
  Left molar           96 (38.6)    250 (35.7)
  Right canine         80 (32.3)    239 (34.1)
  Left canine          68 (27.6)    219 (31.3)
Class II
  Right molar          63 (25.3)    173 (24.7)
  Left molar           71 (28.6)    204 (29.2)
  Right canine         76 (30.7)    268 (38.3)
  Left canine          97 (39.2)    285 (40.7)
Class III
  Right molar          29 (11.8)     99 (14.2)
  Left molar           35 (14.3)    105 (15.0)
  Right canine         31 (12.4)     80 (11.4)
  Left canine          34 (13.8)     81 (11.6)
Half-cusp Class II
  Right molar          16 (6.6)      54 (7.7)
  Left molar           14 (5.5)      32 (4.5)
Super Class I
  Right molar           5 (2.1)      18 (2.6)
  Left molar            6 (2.5)      20 (2.9)
Unclassifiable
  Right molar          31 (12.4)     83 (11.9)
  Left molar           26 (10.5)     89 (12.7)
  Right canine         39 (15.8)    109 (15.6)
  Left canine          45 (18.1)    116 (16.5)

Note. Values are shown as n (%).


In the first phase of the malocclusion classification pipeline, the AI model achieved an accuracy of 98.3%, a precision of 99.5%, and a mIoU of 97% ( Fig 5 ), whereas the second phase had an accuracy of 94.3%, a precision of 98.7%, and a mIoU of 94%. Moreover, the weighted mean accuracy of the AI performance was 91.8%. Table II presents detailed information regarding the accuracy, macro-average precision, and recall of the AI system.

