Endocrine Abstracts (2025) 110 EP1489 | DOI: 10.1530/endoabs.110.EP1489

ECEESPE2025 ePoster Presentations Thyroid (198 abstracts)

AI chatbots vs endocrinologists in clinical decision-making in thyroid nodule and papillary thyroid carcinoma management

Grigoris Effraimidis 1, Athanasios Kasotas 2, Eleni Sazakli 3, Olga Karapanou 4, Katerina Saltiki 5 & Marina Michalaki 3


1Faculty of Medicine, School of Health Sciences, University of Thessaly, Department of Endocrinology and Metabolic Diseases, University General Hospital of Larissa, Larissa, Greece; 2Department of Endocrinology and Metabolic Diseases, University General Hospital of Larissa, Larissa, Greece; 3Faculty of Medicine, School of Health Science, University of Patras, Patras, Greece; 4Endocrine Department, NIMTS Veteran’s Hospital, Athens, Greece; 5Endocrine Unit, Department of Clinical Therapeutics, National and Kapodistrian University, Athens, Greece.


JOINT1502

Introduction: The management of thyroid nodules and papillary thyroid carcinoma (PTC) varies widely, influenced by evolving guidelines and clinical judgment. With advancements in artificial intelligence (AI), chatbots like ChatGPT, Copilot and Gemini have emerged as potential decision-support tools.

Aim: This study compares the clinical decision-making patterns of endocrinologists with the responses of AI chatbots.

Methods: A web-based survey was distributed to the members of the Hellenic Endocrine Society (HES), presenting 12 clinical scenarios addressing management strategies for solitary thyroid nodules and PTC across various risk profiles (EU-TIRADS 4 and 5 nodules and very-low, low, and low-to-intermediate risk PTCs). Participants selected one of four management strategies for each scenario. ChatGPT, Copilot and Gemini answered the 12 scenarios at two time points: April 2024 (T1 – survey end) and January 2025 (T2). The AI chatbots' responses were compared with the recent American and European Thyroid Association guideline recommendations, as well as with the endocrinologists’ responses.

Results: A total of 201 endocrinologists (25% of HES members) participated in the survey. Between T1 and T2, ChatGPT, Copilot and Gemini altered their responses for 8/12, 6/12 and 7/12 scenarios, respectively. At T1, ChatGPT and Copilot aligned with 33% and 58% of guideline recommendations, respectively, increasing to 66% for both at T2. Conversely, Gemini’s alignment remained at 17% at both time points. Agreement between the AI chatbots' responses and the predominant choices among endocrinologists was as follows: at T1, ChatGPT matched the endocrinologists’ prevailing choice in 2 (17%), Copilot in 1 (8%) and Gemini in 3 (25%) of the 12 scenarios. By T2, agreement rates improved to 5/12 (42%) for ChatGPT, 4/12 (33%) for Copilot, and 6/12 (50%) for Gemini. Finally, the mean percentage of survey respondents whose answers corresponded to the answers provided by the AI chatbots was calculated. At T1, this agreement rate was 24% for ChatGPT, 24% for Copilot and 37% for Gemini (note that 1 and 2 questions were excluded for Copilot and Gemini, respectively, because the chatbots could not provide any recommendation). At T2, these rates increased to 34%, 30% and 43% for ChatGPT, Copilot and Gemini, respectively.
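The two agreement measures with endocrinologists reported above can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names, the data layout and the toy data (3 scenarios, 4 respondents) are all hypothetical, and it assumes that an unanswered scenario counts as a non-match for the prevailing-choice measure but is excluded from the mean respondent share, as the Results describe.

```python
from collections import Counter

def prevailing_choice_agreement(survey, bot):
    """Fraction of all scenarios where the chatbot's answer equals the
    respondents' most common answer (unanswered scenarios count as misses)."""
    hits = 0
    for scenario, answer in bot.items():
        if answer is None:
            continue  # chatbot gave no recommendation: not a match
        prevailing = Counter(survey[scenario]).most_common(1)[0][0]
        if prevailing == answer:
            hits += 1
    return hits / len(bot)

def mean_respondent_share(survey, bot):
    """Mean, over answered scenarios, of the share of respondents who
    picked the same option as the chatbot."""
    shares = []
    for scenario, answer in bot.items():
        if answer is None:
            continue  # excluded, as in the abstract
        responses = survey[scenario]
        shares.append(responses.count(answer) / len(responses))
    return sum(shares) / len(shares)

# Hypothetical toy data: options A-D, not taken from the study.
survey = {
    "S1": ["A", "A", "B", "C"],
    "S2": ["B", "B", "B", "D"],
    "S3": ["C", "D", "D", "D"],
}
bot = {"S1": "A", "S2": "D", "S3": None}

print(prevailing_choice_agreement(survey, bot))
print(mean_respondent_share(survey, bot))
```

In this toy example the chatbot matches the prevailing choice in 1 of 3 scenarios, while on average 37.5% of respondents chose the same option as the chatbot in the two scenarios it answered. The two measures can therefore diverge, which is why the abstract reports them separately.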

Conclusion: This study highlights the evolving role of AI chatbots in clinical decision-making for thyroid nodules and PTC. Between T1 and T2, agreement with endocrinologists’ choices improved for all three chatbots, and alignment with clinical guidelines improved for ChatGPT and Copilot, suggesting their potential utility in supporting clinical decisions. Further research is needed to optimize AI integration into clinical practice and ensure consistency with evolving medical guidelines.

Volume 110

Joint Congress of the European Society for Paediatric Endocrinology (ESPE) and the European Society of Endocrinology (ESE) 2025: Connecting Endocrinology Across the Life Course
