Assessing ChatGPT's performance in national nuclear medicine specialty examination: An evaluative analysis

Jakub Kufel; Michał Bielówka; Marcin Rojek; Adam Mitręga; Łukasz Czogalik; Dominika Kaczyńska; Dominika Kondoł; Kacper Palkij; Sylwia Mielcarska

Published in: Iranian Journal of Nuclear Medicine, Vol. 32, Issue 1

Publication year: 1403 (Solar Hijri calendar)

Document type: Journal article

Language: English

This paper is 6 pages long and is available for download in PDF format.


Permanent link to this paper:

https://civilica.com/doc/1897347

National scientific document ID:

JR_IRJNM-32-1_010

Indexing date: 8 Bahman 1402

Abstract:

Introduction: The rapid development of artificial intelligence (AI) has prompted interest in its potential applications in medicine. This article evaluates the ChatGPT language model against the pass threshold of the Polish National Specialty Examination (PES) in nuclear medicine. It also aims to identify the model's strengths and limitations through an in-depth analysis of the topics covered by the exam questions.

Methods: The study used a PES exam provided by the Centre for Medical Examinations in Łódź, consisting of 120 questions. The questions were submitted via the openai.com platform, which provides free access to the GPT-3.5 model. All questions were classified according to Bloom's taxonomy to determine their complexity and difficulty, and according to two authors' subcategories. To assess the model's confidence in the validity of its answers, each question was asked five times in independent sessions.

Results: ChatGPT achieved 56%, which means it did not pass the exam; the pass threshold is 60%. Of the 117 questions asked, 66 were answered correctly. There were no statistically significant differences in the percentage of correct answers across question types and subtypes.

Conclusion: Further testing with nuclear medicine specialty exam questions provided by the Centre for Medical Examinations is needed to evaluate the utility of the ChatGPT model. This opens the door for further research on upcoming improved versions of ChatGPT.
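As a quick check of the figures reported in the Results, the arithmetic can be sketched as below. This is only an illustration: the function names are hypothetical, and the assumption that the score is computed over the 117 questions asked (rather than the full 120) is inferred from the abstract, since 66 of 117 rounds to the reported 56%.

```python
def exam_score(correct: int, asked: int) -> float:
    """Return the percentage of asked questions answered correctly."""
    if asked <= 0:
        raise ValueError("asked must be positive")
    return 100.0 * correct / asked


def passed(score_percent: float, threshold_percent: float = 60.0) -> bool:
    """Return True when the score meets the pass threshold (60% per the abstract)."""
    return score_percent >= threshold_percent


# Figures taken from the abstract: 66 correct answers out of 117 questions asked.
score = exam_score(correct=66, asked=117)
print(f"Score: {score:.1f}%")     # Score: 56.4%
print(f"Passed: {passed(score)}")  # Passed: False
```

Reaching the 60% threshold on 117 questions would require at least 71 correct answers, so on these figures the model fell short by 5 questions.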

Keywords:

Artificial intelligence, Computer science, Language model, Nuclear medicine exam

Authors

Jakub Kufel

Department of Biophysics, Faculty of Medical Sciences, Medical University of Silesia, Zabrze, Poland

Michał Bielówka