ChatGPT gets code questions wrong 52% of the time

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 1 year ago

SirGolan@lemmy.sdf.org · edited 1 year ago

Wait a second here… I skimmed the paper and the GitHub repo and didn’t find an answer to a very important question: is this GPT-3.5 or GPT-4? There’s a huge difference in code quality between the two, so either they made a giant accidental omission or they are being intentionally misleading. Please correct me if I missed where they specified that. I’m assuming they used GPT-3.5, in which case those results would be as expected. On the HumanEval benchmark, GPT-4 gets 67%, and that goes up to 90% with Reflexion prompting. GPT-3.5 gets 48.1%, which lines up almost exactly with this paper: 52% wrong means about 48% right. (source).

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 1 year ago

Oh that’s possible, not sure which one they used either.