ECEESPE2025 ePoster Presentations Thyroid (198 abstracts)
1Faculty of Medicine, School of Health Sciences, University of Thessaly, Department of Endocrinology and Metabolic Diseases, University General Hospital of Larissa, Larissa, Greece; 2Department of Endocrinology and Metabolic Diseases, University General Hospital of Larissa, Larissa, Greece; 3Faculty of Medicine, School of Health Science, University of Patras, Patras, Greece; 4Endocrine Department, NIMTS Veterans Hospital, Athens, Greece; 5Endocrine Unit, Department of Clinical Therapeutics, National and Kapodistrian University, Athens, Greece.
JOINT1502
Introduction: The management of thyroid nodules and papillary thyroid carcinoma (PTC) varies widely, influenced by evolving guidelines and clinical judgment. With advancements in artificial intelligence (AI), chatbots such as ChatGPT, Copilot and Gemini have emerged as potential decision-support tools.
Aim: This study aims to compare the clinical decision-making patterns of endocrinologists with the responses of AI chatbots.
Methods: A web-based survey was distributed to the members of the Hellenic Endocrine Society (HES), presenting 12 clinical scenarios addressing management strategies for solitary thyroid nodules and PTC across various risk profiles (EU-TIRADS 4 and 5 nodules and very-low, low, and low-to-intermediate risk PTCs). Participants selected one of four management strategies for each scenario. ChatGPT, Copilot and Gemini answered the same 12 scenarios at two time points: April 2024 (T1, the end of the survey) and January 2025 (T2). The AI chatbots' responses were compared with the recent American and European Thyroid Association guideline recommendations, as well as with the endocrinologists' responses.
Results: A total of 201 endocrinologists (25% of HES members) participated in the survey. Between T1 and T2, ChatGPT, Copilot and Gemini altered their responses for 8/12, 6/12 and 7/12 scenarios, respectively. At T1, ChatGPT and Copilot aligned with 33% and 58% of guideline recommendations, respectively, increasing to 66% for both at T2. Conversely, Gemini's alignment remained at 17% at both time points. Agreement between the AI chatbots' responses and the predominant choices among endocrinologists was as follows: at T1, ChatGPT matched the endocrinologists' prevailing choice in 2 (17%), Copilot in 1 (8%) and Gemini in 3 (25%) of the 12 scenarios. By T2, agreement rates had improved to 5/12 (42%) for ChatGPT, 4/12 (33%) for Copilot, and 6/12 (50%) for Gemini. Finally, the mean percentage of survey respondents whose answers matched those provided by each AI chatbot was calculated. At T1, this agreement rate was 24% for ChatGPT, 24% for Copilot and 37% for Gemini (1 and 2 questions were excluded for Copilot and Gemini, respectively, because the chatbots could not provide any recommendation). At T2, these rates increased to 34%, 30% and 43% for ChatGPT, Copilot and Gemini, respectively.
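The two agreement measures reported above could be computed along the following lines. This is only an illustrative sketch, not the authors' analysis code: the function names, the toy data and the handling of unanswered scenarios (counted as non-matches for the prevailing-choice measure, excluded from the mean-respondent measure) are assumptions made for the example.

```python
from collections import Counter

def modal_choice(responses):
    """Return the option selected most often by respondents for one scenario."""
    return Counter(responses).most_common(1)[0][0]

def scenario_agreement(chatbot_answers, survey):
    """Count scenarios where the chatbot chose the respondents' prevailing option.

    A scenario with no chatbot recommendation (None) is counted as a non-match
    (assumption for this sketch).
    """
    matches = sum(
        1
        for answer, responses in zip(chatbot_answers, survey)
        if answer is not None and answer == modal_choice(responses)
    )
    return matches, len(survey)

def mean_respondent_agreement(chatbot_answers, survey):
    """Mean share of respondents who gave the same answer as the chatbot.

    Scenarios with no chatbot recommendation (None) are excluded from the mean.
    """
    shares = [
        responses.count(answer) / len(responses)
        for answer, responses in zip(chatbot_answers, survey)
        if answer is not None
    ]
    return sum(shares) / len(shares)

# Hypothetical toy data: 3 scenarios, 4 respondents each, options A-D.
toy_survey = [
    ["A", "A", "B", "C"],
    ["B", "B", "B", "D"],
    ["C", "D", "D", "D"],
]
toy_chatbot = ["A", None, "C"]  # None: the chatbot gave no recommendation

matched, total = scenario_agreement(toy_chatbot, toy_survey)
mean_share = mean_respondent_agreement(toy_chatbot, toy_survey)
print(f"Prevailing-choice agreement: {matched}/{total}")
print(f"Mean respondent agreement: {mean_share:.1%}")
```

The first measure asks whether the chatbot picked the single most popular option, whereas the second credits the chatbot for every respondent who shared its answer, which is why the two can diverge for the same set of responses.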
Conclusion: This study highlights the evolving role of AI chatbots in clinical decision-making for thyroid nodules and PTC. Over time, the alignment of AI responses with clinical guidelines and endocrinologists' choices improved, suggesting that these tools may be useful in supporting clinical decisions. Further research is needed to optimize AI integration into clinical practice and ensure consistency with evolving medical guidelines.