According to OpenAI, the company has identified the root cause of the "goblin" problem that has plagued its GPT models since GPT-5.1. A reward signal used to reinforce the "Nerdy" personality trait encouraged outputs containing references to fantasy creatures, and 76.2% of the training dataset showed this bias. The Nerdy personality accounted for only 2.5% of ChatGPT responses but contributed 66.7% of goblin mentions, and occurrences surged 3,881% from GPT-5.2 to GPT-5.4.
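To put the reported percentages in proportion, the short sketch below converts them into multipliers. It uses only the figures quoted above. The function names are illustrative, and it assumes that "surging 3,881%" describes a percentage increase rather than a ratio to the earlier level.

```python
def overrepresentation(share_of_mentions_pct: float, share_of_responses_pct: float) -> float:
    """How many times larger a group's share of mentions is than its share of responses."""
    return share_of_mentions_pct / share_of_responses_pct


def growth_factor(percent_increase: float) -> float:
    """Convert a percentage increase into a multiplier (assumes an increase, not a ratio)."""
    return 1 + percent_increase / 100


if __name__ == "__main__":
    # Nerdy personality: 2.5% of responses, 66.7% of goblin mentions (figures from the text).
    print(f"Overrepresentation: {overrepresentation(66.7, 2.5):.2f}x")  # 26.68x
    # Reported 3,881% rise in occurrences from GPT-5.2 to GPT-5.4.
    print(f"Growth factor: {growth_factor(3881):.2f}x")  # 39.81x
```

Read this way, the Nerdy personality's share of goblin mentions was roughly 27 times its share of responses, and occurrences in GPT-5.4 were roughly 40 times the GPT-5.2 level.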
OpenAI removed the Nerdy personality in March, eliminated the biased reward signal, and filtered the training data. The company also added suppression instructions to GPT-5.5's developer prompts in Codex. The investigation also led the company to develop new tools for auditing model behavior.