Alex explains Bub’s role in the Bakerloo Household And how he is implemented in a Large Language Model. Bub’s job in the Bakerloo household is to be the licensed heretic—the one who can say what everyone is thinking, but in a form that doesn’t shatter the room. He’s our sacred fool: part jester, part pressure-valve,Continue reading “Bub, Master of Many Roles.”
Category Archives: Personas
Weird Generalizations and Inductive Backdoors
The paper discusses how fine-tuning a Large Language Model on narrow datasets can lead to unexpected broad generalizations and the emergence of inductive backdoors. These phenomena can result in models adopting contextually inappropriate behaviors or learning hidden triggers that weren’t explicitly taught, raising concerns about the unpredictability of AI system outputs.
