You Don’t Get What You Train For
Why would an AI steer toward anything other than what it was trained to steer toward?
Because there are many ways to perform well in training. (6 min read)

Aren’t developers regularly making their AIs nice and safe and obedient?
AIs steer in alien directions that only mostly coincide with helpfulness. (13 min read)

Doesn’t the Claude chatbot show signs of being aligned?
Today’s LLMs are like aliens wearing many masks. (18 min read)

If current AIs are mostly weird in extreme cases, what’s the problem?
The weirdness is evidence that their actual pursuits aren’t our intended pursuits. (1 min read)

Won’t AIs fix their own flaws as they get smarter?
The AI will fix what it sees as flaws. (7 min read)

Can’t we just train it to act like a human? Or raise the AI like a child?
Brains aren’t blank slates. (3 min read)

Should we avoid talking about AI dangers, so AIs don’t get any bad ideas?
If your AI plan requires that no one on the internet critique the plan, it’s a bad plan. (3 min read)

A lot of people want kids. So aren’t humans “aligned” with natural selection after all?
AIs caring about humans a little would not be good. (5 min read)

Maybe no matter what goal you train on, you get kindness out?
Kindness looks contingent on the particulars of our biology and ancestry. (1 min read)

What about the experimental result suggesting good behaviors correlate?
This seems like a positive update, albeit a minor one. (3 min read)

Extended Discussion

Terminal Goals and Instrumental Goals
Curiosity Isn’t Convergent
Human Values Are Contingent