TY  - JOUR
T1  - YOLOv11-MobileNetV2 Two-Stage AI Framework for Lesion Localization and Differentiation of Oral Cancer and Precancerous Lesions
A1  - Thomas K. Nguyen
A1  - Sarah J. Bennett
A1  - William J. Carter
JF  - Journal of Current Research in Oral Surgery
JO  - J Curr Res Oral Surg
SN  - 3062-3480
Y1  - 2021
VL  - 1
IS  - 1
DO  - 10.51847/kMIYO81bAR
SP  - 98
EP  - 109
N2  - This work sought to construct and assess an artificial intelligence workflow that merges object-detection and image-classification models to support early recognition and distinction of oral lesions. A retrospective cross-sectional design was applied, using clinical photographs of oral potentially malignant disorders (OPMD) and oral squamous cell carcinoma (OSCC). The primary dataset consisted of 773 images from the Faculdade de Odontologia de Piracicaba, Universidade Estadual de Campinas (FOP-UNICAMP), and an independent validation set included 132 images from the Federal University of Paraíba (UFPB). All images were captured before biopsy, each paired with histopathological confirmation. For lesion localization, ten YOLOv11 variants employing different augmentation schemes were trained for 200 epochs with pretrained COCO weights. For classification, three MobileNetV2 networks were trained on crops generated according to expert bounding boxes, each adopting distinct learning-rate and augmentation configurations. After identifying the top-performing detection–classification pair, both components were linked in a two-stage pipeline in which the detector-generated crops were forwarded into the classifier. The optimal YOLOv11 model achieved an mAP50 of 0.820, precision of 0.897, recall of 0.744, and an F1-score of 0.813. The strongest MobileNetV2 model reached an accuracy of 0.846, precision of 0.871, recall of 0.846, F1-score of 0.844, and an AUC-ROC of 0.852. On the external set, the same classifier obtained an accuracy of 0.850, precision of 0.866, recall of 0.850, F1-score of 0.851, and an AUC-ROC of 0.935. The integrated two-stage framework, tested on the baseline dataset, achieved an accuracy of 0.784, precision of 0.793, recall of 0.784, F1-score of 0.784, and an AUC-ROC of 0.811. When applied to the independent dataset, it produced an accuracy of 0.863, precision of 0.879, recall of 0.863, F1-score of 0.866, and an AUC-ROC of 0.934. Visual review of the YOLO outputs showed consistent lesion localization across varied oral images, though 17.4% of lesions were not detected. The t-SNE map revealed partial clustering of OPMD and OSCC embeddings, suggesting the model captured relevant discriminative signals despite some overlap. This proof-of-concept investigation indicates that a coupled detection–classification AI framework can feasibly support early screening of oral lesions. Nonetheless, caution is necessary when interpreting two-stage results, since images not detected by YOLO do not advance to classification, potentially influencing the final metrics.
UR  - https://tsdp.net/article/yolov11-mobilenetv2-two-stage-ai-framework-for-lesion-localization-and-differentiation-of-oral-cance-2owoqposztwl4fu
ER  - 