Purpose
To systematically review and evaluate the diagnostic efficacy and predictive power of artificial intelligence (AI) models in detecting patellofemoral (PF) compartment pathology and to compare their performance against ground-truth human clinical experts when applicable.
Methods
In accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses guidelines, the PubMed, Ovid/MEDLINE, and Cochrane Library databases were searched from inception through May 2024 for studies on AI methods for diagnosing trochlear dysplasia, PF osteoarthritis, or PF instability and tracking abnormalities on cross-sectional imaging. AI model choice, knee pathology, input/output data, performance metrics (accuracy, area under the curve [AUC], precision-recall curve average precision, sensitivity, specificity, positive predictive value, and negative predictive value), sample sizes of datasets, image modalities, and limitations were recorded.
Results
Of 68 studies screened, 17 met the inclusion criteria. Ten studies investigated AI diagnostics for PF osteoarthritis; four, PF tracking and/or instability; and three, trochlear dysplasia. Various deep learning architectures and machine learning algorithms were used. Input data included computed tomography scans, magnetic resonance imaging scans, and radiographs. Output data included anatomic landmark identification and diagnostic predictions. AUC values ranged from 0.664 to 0.990, and accuracy ranged from 74% to 99%. Model performance was moderate to excellent, with AI models consistently surpassing traditional methods in processing times. Common limitations included small sample size, single-center datasets, limited generalizability, and bias due to imbalanced datasets.
Conclusions
AI models showed variable diagnostic performance in identifying PF pathologies and predicting disease progression, with reported AUCs ranging from 0.664 to 0.990 and accuracies between 74% and 99%. Although some studies suggested that AI outperformed traditional diagnostic methods such as interpretation by musculoskeletal radiologists, manual segmentation, or arthroscopy, the degree of superiority was inconsistent and influenced by significant heterogeneity in model architectures, imaging modalities, and reference standards. Given the broad scope of this review and variability across studies, caution is warranted in interpreting these findings, and specific clinical recommendations cannot be made at this time.
Clinical Relevance
AI-based diagnostic tools show promise in supporting the evaluation of PF joint pathologies by potentially improving efficiency and consistency in image interpretation. However, because of the heterogeneity in current models and study designs, the clinical applicability of these tools remains limited. Further refinement and external validation of AI algorithms are needed before their integration into routine clinical decision making can be fully endorsed.
Artificial intelligence (AI), particularly through the advancements of deep learning (DL) and machine learning (ML), has impacted numerous sectors, including orthopaedic surgery. ML, a subset of AI, involves the development of algorithms that allow computers to learn from and make decisions based on data. DL, a more advanced subset of ML, uses neural networks with many layers to analyze complex patterns and features in large datasets. The evolution of these technologies has enabled DL algorithms to interpret intricate data patterns, whereas ML has enhanced predictive modeling capabilities. In orthopaedic surgery, AI has been proposed across various stages, from preoperative planning and intraoperative guidance to postoperative rehabilitation. Specifically, there has been an increased emphasis on the use of AI for automated image processing and analysis, which has significantly improved the efficiency of diagnostic processes. Within the field of orthopaedics, AI algorithms are being evaluated to aid clinicians in real-time fracture recognition, prognostication of tumor survivorship, postoperative assessment of implant positioning, and, more recently, the detection of soft-tissue knee injuries. To this point, the diagnostic potential of AI is particularly promising because it allows for detection and classification of musculoskeletal (MSK) abnormalities in imaging studies with superior speed compared with traditional ground-truth methods.
The impact of AI is especially relevant in the management of patellofemoral (PF) pathology, in which accurate assessment, diagnostic capability, and predictive modeling of outcomes can be crucial for effective and efficient treatment. Currently, the primary applications of AI involve using magnetic resonance imaging (MRI) and computed tomography (CT) images to detect subtle changes in cartilage, bone, and soft tissues that are indicative of disorders such as PF pain syndrome, chondromalacia patellae, osteoarthritis (OA), and patellar instability. ML models can predict the status of these conditions and the outcomes of various treatment modalities, aiding in the development of personalized treatment plans. For example, AI can assist in identifying patients who are likely to benefit from nonsurgical treatments versus those who may require surgical intervention, thereby optimizing clinical decision making. Furthermore, predictive models based on ML can assess image-based and clinically based patient-specific risk factors to forecast surgical outcomes.
Despite AI’s demonstrated benefits, its use with radiographic and cross-sectional imaging to diagnose knee conditions such as PF OA, trochlear dysplasia, chondromalacia patellae, and PF tracking abnormalities remains poorly understood. Therefore, the purpose of this study was to systematically review and evaluate the diagnostic efficacy and predictive power of AI models in detecting PF compartment pathology and to compare their performance against ground-truth human clinical experts when applicable. The hypothesis was that AI models would exhibit excellent performance characteristics in the identification and evaluation of PF pathology.
Methods
Study Selection
Two independent authors (J.T-K., M.A.B.) completed a query of the literature in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines and reviewed the search results, with each author blinded to the other’s results; a third author (E.H.) was available for arbitration on potential disagreements or discrepancies. Studies were deemed eligible for full-text review based on an initial approval screening of article titles and abstracts.
Search Criteria and Strategy
A systematic review was performed in accordance with the PRISMA guidelines using the PubMed, Ovid/MEDLINE, and Cochrane Library databases from inception through May 2024. A Boolean search syntax was used to capture the maximum number of articles for screening in the initial search: ((“Trochlea” OR “Patellofemoral” OR “Patellofemoral Instability” OR “Trochlear Dysplasia” OR “Knee Disorders” OR “Knee Abnormalities”) AND (artificial intelligence OR neural network∗ OR deep learning OR machine learning OR machine intelligence) AND (diagnostic performance OR diagnostic accuracy OR sensitivity OR specificity OR ROC curve OR area under the curve OR AUC OR predictive value of test OR score OR scores OR scoring system OR scoring systems OR observ∗ OR observer variation OR detect∗ OR evaluat∗ OR analy∗ OR assess∗ OR measure∗)).
Eligibility Criteria
Rigorous inclusion criteria were established to ensure the integrity and relevance of the selected literature. Articles were deemed eligible if they met 3 key criteria: the study investigated the development or application of AI specifically for detecting trochlear dysplasia or abnormalities in PF tracking using cross-sectional imaging techniques; the study was published in a peer-reviewed journal in the English language; and the full text of the study was available. Exclusion criteria included articles consisting solely of abstracts, technical papers, cadaveric or animal experiments, or letters to the editor (Fig 1). Finally, the bibliographies of all included studies were cross-referenced to ensure no relevant studies were overlooked.
Fig 1. Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) study selection flow diagram.
Data Extraction
Full-text examination of articles passing the screening process was only undertaken after application of our strict inclusion and exclusion criteria. Furthermore, to ensure completeness, all references cited in the included studies were exhaustively reviewed. Two independent authors (J.T-K., M.A.B.) systematically compiled all pertinent data using a predefined Microsoft Excel data sheet (Microsoft, Redmond, WA) with a modified information extraction table. The columns for these extraction tables included the following: publication data; study title and design; study methodology; knee pathology and anatomic region; sample size (patients); dataset size; AI model; image and model input and output; ground truth; training set, validation set, and test set sizes; performance grading; accuracy grading; area under the curve (AUC) for the receiver operating characteristic curve; conclusions; and limitations.
Outcomes Analyzed and Statistical Methods
All data were qualitatively synthesized and reported in both narrative fashion and individual table formats. Data extracted were presented as means, medians, ranges, and confidence intervals as appropriate and as provided in respective studies. Outcome measures of interest included accuracy, AUC, average precision (AP), dispersion of data (mean absolute error [MAE], mean absolute deviation [MAD], and/or root-mean-square [RMS]), inter-rater reliability (κ value or intraclass correlation coefficient [ICC]), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and Dice coefficient. No regression modeling or predictive analytics were performed because the analysis was descriptive in nature and did not require inferential modeling. This absence is disclosed in accordance with reporting considerations for AI-related methodologies. All statistical analyses were performed using R (version 4.0.2, The R Foundation for Statistical Computing, Vienna, Austria), with pooled analysis for quantitative statistical analysis. P <.05 was considered statistically significant.
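As an illustration of how several of the outcome measures listed above are defined, the following minimal Python sketch computes accuracy, sensitivity, specificity, PPV, and NPV from confusion-matrix counts, along with the Dice coefficient for two binary segmentation masks. The function names and the example counts are hypothetical and chosen only for demonstration; they are not drawn from any included study, and this is not the analysis code used in the review.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts.

    tp/fp/tn/fn are true-positive, false-positive, true-negative,
    and false-negative counts (hypothetical inputs for illustration).
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


def dice_coefficient(mask_a, mask_b):
    """Dice overlap of two equal-length binary masks (flat 0/1 sequences)."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size = sum(mask_a) + sum(mask_b)
    # Two empty masks are treated as perfect agreement by convention here.
    return 2 * intersection / size if size else 1.0


# Hypothetical counts, for demonstration only.
metrics = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Hypothetical model mask versus reference mask.
print(f"dice: {dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]):.3f}")
```

Note that PPV and NPV depend on the prevalence of disease in the test set, which is one reason imbalanced datasets can make these values difficult to compare across studies.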
Results
A total of 69 studies were initially identified through the electronic database search. After the removal of duplicate records, the remaining articles were assessed according to predefined inclusion and exclusion criteria. After a thorough evaluation of full-text eligibility, 17 studies were ultimately selected for inclusion in this review for both quantitative and qualitative data analyses (Fig 1). All studies were classified as Level III and IV evidence, with an average Methodological Index for Non-randomized Studies (MINORS) score of 5.35 ± 0.68. Most of the studies (14 of 17, 82.4%) were retrospective in nature, and 64.7% (11 of 17) used predictive designs. Details of study characteristics and knee pathology of interest are provided in Table 1.
Table 1
Study Characteristics and Methodologic Quality Strength Assessment
| Study | Knees (Patients), n | Pathology | Prospective or Retrospective | Predictive or Diagnostic | LOE | MINORS Score |
|---|---|---|---|---|---|---|
| Liu et al. (2023) | 14,652 (483) | PF OA | Retrospective | Predictive | IV | 5 |
| Bayramoglu et al. (2022) | 5,507 | PF OA | Retrospective | Diagnostic | IV | 5 |
| Hu et al. (2022) | 104 | PF OA or cartilage injury | Prospective | Predictive | III | 4 |
| Xu et al. (2023) | 464 | Trochlear dysplasia | Retrospective | Predictive | IV | 6 |
| Shi et al. (2021) | 41 | PF pain syndrome | Retrospective | Diagnostic | IV | 6 |
| Yurova et al. (2024) | 15 | PF OA | Retrospective | Diagnostic | IV | 5 |
| Tuya et al. (2023) | 1,280 | PF OA | Retrospective | Diagnostic | IV | 6 |
| Tuya et al. (2023) | 1,230 | PF maltracking | Retrospective | Predictive | IV | 6 |
| Bayramoglu et al. (2021) | 18,436 (2,803) | PF OA | Retrospective | Predictive | IV | 5 |
| Hu et al. (2023) | 364 (182) | PF OA | Prospective | Predictive | III | 5 |
| Bayramoglu et al. (2024) | 3,276 (1,832) | PF OA | Prospective | Predictive | III | 5 |
| Nagawa et al. (2024) | 51 (49) | PF instability | Retrospective | Predictive | IV | 5 |
| Barbosa et al. (2024) | 140 (95) | Trochlear dysplasia | Retrospective | Predictive | IV | 6 |
| Cerveri et al. (2018) | | Trochlear dysplasia | Retrospective | Predictive | IV | 4 |
| Pedoia et al. (2019) | 1,481 (302) | PF OA or cartilage injury | Retrospective | Predictive | IV | 6 |
| Cheng et al. (2020) | 176 (93) | PF pain syndrome or OA | Retrospective | Diagnostic | IV | 6 |
| Hu et al. (2024) | 600 | PF OA | Retrospective | Diagnostic | IV | 5 |
LOE, level of evidence; MINORS, Methodological Index for Non-randomized Studies; OA, osteoarthritis; PF, patellofemoral.
AI Model Choice
A comprehensive overview of the various AI models used in the included studies, detailing their respective image input types, image planes, and ground truth/reference standards, is presented in Table 2. For a clear explanation of DL concepts, it is important to note that deep neural networks can suffer from the vanishing gradient problem, in which gradients become increasingly small as they propagate backward through many layers. This impairs the training of early network layers because the learning signal reaching them becomes too weak to update their weights effectively. Skip connections, introduced in certain architectures such as residual networks (ResNets), help mitigate this issue by creating direct pathways or shortcuts between nonadjacent layers, allowing gradients to flow more effectively and enabling the training of much deeper networks.
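The effect of skip connections on gradient flow can be illustrated with a toy scalar example. The sketch below is a simplified illustration under stated assumptions, not the architecture of any included study. It applies the chain rule through a stack of layers, each with an assumed local derivative `d`. A plain layer y = f(x) passes the gradient back scaled by d, whereas a residual layer y = x + f(x) scales it by 1 + d, so the identity path keeps the gradient from shrinking toward zero. The function names and the values 0.1 and 20 are hypothetical.

```python
def plain_gradient(local_derivs):
    """Gradient reaching the first layer of a plain stack y = f(x).

    By the chain rule, it is the product of each layer's local derivative.
    """
    grad = 1.0
    for d in local_derivs:
        grad *= d
    return grad


def residual_gradient(local_derivs):
    """Gradient reaching the first layer of a residual stack y = x + f(x).

    Each layer's derivative is 1 + f'(x); the "1" comes from the skip path.
    """
    grad = 1.0
    for d in local_derivs:
        grad *= 1.0 + d
    return grad


# Assumed toy setting: 20 layers, each with a small local derivative of 0.1.
derivs = [0.1] * 20
print(f"plain stack:    {plain_gradient(derivs):.3e}")
print(f"residual stack: {residual_gradient(derivs):.3e}")
```

In this toy setting the gradient through the plain stack collapses to a negligible value, whereas the gradient through the residual stack stays above 1. Real networks involve matrices and nonlinearities rather than scalars, so this should be read as intuition only.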
Table 2
Overview of AI Model Parameters and Methodology for PF Pathology Studies
| Study | AI Model | Image Input | Image Plane | Ground Truth/Reference Standard | Training Set | Validation Set | Testing Set | Model Output |
|---|---|---|---|---|---|---|---|---|
| Liu et al. (2023) | ResNet | CT | Axial | MSK radiologist | Not specified | Not specified | Not specified | Landmark prediction coordinates |
| Bayramoglu et al. (2022) | GBM-CNN | Radiography | Sagittal | Comparison of multiple algorithms (GBM) | Not specified | Not specified | Not specified | OARSI and KL grades |
| Hu et al. (2022) | MWRN | MRI | Sagittal, coronal, and axial | Arthroscopy | Not specified | Not specified | Not specified | Prediction of image reconstruction |
| Xu et al. (2023) | U-Net CNN | MRI | Axial | Radiologist and senior surgeons with >10 yr of experience | 370 | Not specified | 94 | Pixel-level regression prediction |
| Shi et al. (2021) | MI-CNN | Radiography | Dynamic | Single-input CNN | 70% | Not specified | 30% | Classification of PFPS |
| Yurova et al. (2024) | U-Net CNN | MRI and CT | Sagittal, coronal, and axial | Previous algorithm segmentations | 75% | Not specified | 25% | Creation of biomechanical model for patellar motion |
| Tuya et al. (2023) | HRNet | Radiography | Axial | 2 MSK radiologists | 1,280 | 187 | 129 | KL classification of PF OA |
| Tuya et al. (2023) | U-Net CNN | Radiography | Sunrise | 3 MSK radiologists | 1,230 | Not specified | 201 | Prediction of landmarks |
| Bayramoglu et al. (2021) | R-CNN | Radiography | Sagittal | 2 Independent expert OARSI graders | 596 | 5-Fold cross | | Prediction of PF OA status |
| Hu et al. (2023) | D-CNN | MRI | Sagittal, coronal, axial, and 3D reconstruction | Biomarker Consortium Database | Not specified | 5-Fold cross | Not specified | Prediction of PF OA |
| Bayramoglu et al. (2024) | D-CNN | Radiography | Sagittal | 2 Independent radiologists | Not specified | 5-Fold cross | Not specified | Prediction of PF OA |
| Nagawa et al. (2024) | SVM | MRI | Sagittal, coronal, and axial | 2 Radiologists with >5 yr of experience | Not specified | 5-Fold cross | Not specified | Prediction of PFI |
| Barbosa et al. (2024) | U-Net CNN | MRI | Sagittal, coronal, axial, and 3D reconstruction | Expert MSK radiologist | 80% | 20% | Not specified | Landmark prediction (6, 3, and 7 output channels) |
| Cerveri et al. (2018) | SSPA-NN–SSM | CT | Sagittal, coronal, axial, and 3D reconstruction | Previous algorithm segmentations | 66 | 15 | Not specified | Prediction of clinical conditions (3 outputs) |
| Pedoia et al. (2019) | U-Net CNN | MRI | Coronal | 5 Radiologists with >5 yr of experience | 65% | 20% | 15% | Prediction of cartilage lesions (2-class output) |
| Cheng et al. (2020) | HNN | MRI | Sagittal and 3D reconstruction | Manual segmentation, >15 yr of experience | 80 | 9-Fold cross | 10 | Probability maps for clinical conditions |
| Hu et al. (2024) | TRGCN | MRI | 2D and 3D | MSK radiologist segmentations | 155 OA, 325 control | Not specified | 39 OA, 81 control | Simulated PF tracking |
AI, artificial intelligence; CNN, convolutional neural network; CT, computed tomography; D-CNN, dilated convolutional neural network; GBM, gradient boosting machine; GBM-CNN, gradient boosting machine and convolutional neural network; HNN, hypercomplex neural network; HRNet, high-resolution network; KL, Kellgren-Lawrence; MI-CNN, multi-instance convolutional neural network; MRI, magnetic resonance imaging; MSK, musculoskeletal; MWRN, multi-wavelet residual network; OA, osteoarthritis; OARSI, Osteoarthritis Research Society International; PF, patellofemoral; PFI, patellofemoral instability; PFPS, patellofemoral pain syndrome; R-CNN, region-based convolutional neural network; ResNet, residual network; SSPA-NN–SSM, supervised spatiotemporal aggregation neural network and spatial structure mining; SVM, support vector machine; TRGCN, temporal relational graph convolutional network; 2D, 2-dimensional; 3D, 3-dimensional.