OpenAI Traces Goblin Problem to Nerdy Personality Reward Signal; Goblin Mentions Spiked 175% in GPT-5.1

OpenAI says it has identified the root cause of the "goblin" problem that has affected GPT models from GPT-5.1 onward. A reward signal used to reinforce the "Nerdy" personality trait encouraged outputs containing fantasy creature references, and 76.2% of the training dataset showed this bias. The Nerdy personality accounted for only 2.5% of ChatGPT responses but contributed 66.7% of goblin mentions, and occurrences surged 3,881% from GPT-5.2 to GPT-5.4.
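
To put the reported percentages in perspective, here is a minimal arithmetic sketch in Python. It uses only the figures quoted above; the function names are illustrative assumptions and are not taken from OpenAI's analysis.

```python
def overrepresentation(share_of_mentions_pct: float, share_of_responses_pct: float) -> float:
    """How many times more often a group appears among mentions than among all responses."""
    return share_of_mentions_pct / share_of_responses_pct


def growth_multiplier(percent_increase: float) -> float:
    """Convert a percentage increase into a multiplier (e.g. +100% -> 2.0x)."""
    return 1 + percent_increase / 100


# Figures as reported in the article; helper names are hypothetical.
nerdy_ratio = overrepresentation(66.7, 2.5)  # Nerdy share of goblin mentions vs. share of responses
surge = growth_multiplier(3881)              # reported increase from GPT-5.2 to GPT-5.4

print(f"Nerdy over-representation: {nerdy_ratio:.1f}x")
print(f"Growth from GPT-5.2 to GPT-5.4: {surge:.2f}x")
```

Read this way, the Nerdy personality produced goblin mentions at roughly 26.7 times the rate its share of responses would suggest, and a 3,881% increase corresponds to about 39.8 times the earlier level.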

OpenAI removed the Nerdy personality in March, eliminated the biased reward signal, and filtered the training data. The company also added suppression instructions to GPT-5.5's developer prompts in Codex. The investigation also led OpenAI to develop new tools for auditing model behavior.
