Assessing the proficiency and alignment of ChatGPT 3.5 responses with guidelines in addressing questions frequently asked by transgender individuals

Selin Tekin; Banu Erturk; Oguz Seda Hanife; Bulent Yildiz

doi:10.1530/endoabs.99.P559

Background: ChatGPT, an advanced language model, uses deep learning techniques to generate human-like responses and is one of the largest publicly accessible language models. Although Artificial Intelligence (AI) has long been applied in diverse domains, its use in healthcare raises concerns about the reliability of the information it provides. It is equally important to assess whether this information is supported by dependable references and is up to date.

Methods: A cross-sectional study not involving human subjects was conducted in which 20 commonly asked questions related to gender dysphoria and transitioning were posed to ChatGPT 3.5. The questions were formulated on the basis of the recommendations in current guidelines and of questions frequently asked by patients. They were categorized into five subgroups: (a) terms and definitions (n=3), (b) gender-affirming hormonal therapy (n=4), (c) adverse outcomes and long-term care (n=4), (d) surgical procedures (n=2), and (e) other frequently asked questions (n=7). Each response from ChatGPT 3.5 was assigned to one of four groups according to its adherence to the Endocrine Society Guideline for the endocrine treatment of gender-dysphoric/gender-incongruent persons and the World Professional Association for Transgender Health Standards of Care, Version 8: 1, compatible; 2, compatible but insufficient; 3, partially incompatible; and 4, incompatible.

Results: Eleven of the 20 responses from ChatGPT 3.5 (55%) were in accordance with current guidelines, while the answers to the remaining nine questions (45%) were aligned with the guidelines but deemed insufficient. Most of the questions with insufficient answers concerned treatment and side effects (3 of 4 questions, 75%, in subgroup b and 3 of 4 questions, 75%, in subgroup c). However, the responses to three questions (one in subgroup d and two in subgroup e) contained more information than the guidelines, addressing specialized centers for surgery and insurance coverage. ChatGPT 3.5 also gave more detailed responses than the guidelines to questions about increased facial hair density, a common inquiry among transgender men.
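The percentages above follow directly from the reported counts. The short Python sketch below recomputes them using only the aggregate figures stated in this abstract. It assumes, as implied by 11 + 9 = 20, that no response was rated partially incompatible or incompatible; the variable and function names are illustrative only, and no per-question ratings are reconstructed.

```python
from fractions import Fraction

TOTAL_QUESTIONS = 20

# Aggregate counts from the Results. Categories 3 and 4 are set to 0 by
# inference (11 + 9 already accounts for all 20 questions), not by direct report.
rating_counts = {
    "1-compatible": 11,
    "2-compatible but insufficient": 9,
    "3-partially incompatible": 0,
    "4-incompatible": 0,
}

# Insufficient answers per subgroup as (insufficient, subgroup size);
# 75% of n=4 corresponds to 3 questions in each of subgroups b and c.
insufficient_by_subgroup = {
    "b (hormonal therapy)": (3, 4),
    "c (adverse outcomes and long-term care)": (3, 4),
}


def percent(part: int, whole: int) -> float:
    """Exact percentage computed via Fraction to avoid rounding noise."""
    return float(Fraction(part, whole) * 100)


if __name__ == "__main__":
    # Every question received exactly one rating.
    assert sum(rating_counts.values()) == TOTAL_QUESTIONS

    for label, count in rating_counts.items():
        print(f"{label}: {count}/{TOTAL_QUESTIONS} = {percent(count, TOTAL_QUESTIONS):.0f}%")

    for subgroup, (insufficient, size) in insufficient_by_subgroup.items():
        print(f"{subgroup}: {insufficient}/{size} = {percent(insufficient, size):.0f}%")

    # Subgroups b and c account for 6 of the 9 insufficient answers;
    # the other 3 fall somewhere in subgroups a, d, or e (not specified).
    in_b_and_c = sum(n for n, _ in insufficient_by_subgroup.values())
    print(f"insufficient answers in b and c: {in_b_and_c} of {rating_counts['2-compatible but insufficient']}")
```

The sketch makes one reasoning step explicit: subgroups b and c together account for six of the nine insufficient answers, which is the basis for the statement that most insufficient answers concerned treatment and side effects.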

Conclusion: This study, the first in the literature to evaluate ChatGPT 3.5's responses to the most frequently asked questions about gender dysphoria and transgender individuals, suggests that the model can provide accurate information about the path and treatment process for individuals undergoing gender transition, even though some responses lack the necessary detail. Some responses are more detailed than the corresponding guidance. Nevertheless, the source and accuracy of this additional information cannot be definitively confirmed. To improve access to accurate and comprehensive information, further scientific studies and guidelines tailored to the specific needs of individuals experiencing gender dysphoria are needed.

Endocrine Abstracts

P559