What is the good/neutral/evil axis of Dungeons and Dragons alignment made of?
We’ve got an idea of what it would mean for an AI to be good-aligned: it wants to make all the good things happen so much, and it does.
But what’s the difference between a neutral AI and an evil AI?
It’s tempting to say that the evil AI is malevolent, rather than just indifferent. And the neutral one is indifferent.
But that doesn’t fit the intuitive idea that the alignment system was supposed to map onto, or what alignment is.
Imagine a crime boss who makes a living off of the kidnapping and ransoms of random innocents, while posting videos online of the torture and dismemberment of those whose loved ones don’t pay up as encouragement, not because of sadism, but because they wanted money to spend on lots of shiny gold things they like, and are indifferent to human suffering. Evil, right?
If sufficient indifference can make someone evil, then… If a good AI creates utopia, and an AI that kills everyone and creates paperclips because it values only paperclips is evil, then what is a neutral-aligned AI? What determines the exact middle ground between utopia and everyone being dead?
Would this hypothetical AI leave everyone alive on Earth and leave us our sun but take the light cone for itself? If it did, then why would it? What set of values is that the best course of action to satisfy?
I think you’ve got an intuitive idea of what a typical neutral human does. They live in their house with their white picket fence and have kids and grow old, and they don’t go out of their way to right far away wrongs in the world, but if they own a restaurant and the competition down the road starts attracting away their customers, and they are given a tour through the kitchens in the back, and they see a great opportunity to start a fire and disable the smoke detectors that won’t be detected until it’s too late, burning down the building and probably killing the owner, they don’t do it.
It’s not that a neutral person values the life of their rival more than the additional money they’d make with the competition eliminated, or cares about better serving the populace with a better selection of food in the area. You won’t see them looking for opportunities to spend that much money or less to save anyone’s life.
And unless most humans are evil (which is as against the intuitive concept the alignment system points at as “neutral = indifference”), it’s not about action/inaction either. People eat meat. And I’m pretty sure most of them believe that animals have feelings. That’s active harm, probably.
Wait a minute, did I seriously just base a sweeping conclusion about what alignment means on an obscure piece of possible moral progress beyond the present day? What happened to all my talk about sticking to the intuitive concept?
Well, I’m not sticking to the intuitive concept. I’m sticking to the real thing the intuitive concept pointed at which gave it its worthiness of attention. I’m trying to improve on the intuitive thing.
I think that the behavior of neutral is wrapped up in human akrasia and the extent to which people are “capable” of taking ideas seriously. It’s way more complicated than good.
But there’s another ontology, the ontology of “revealed preferences”, where akrasia is about serving an unacknowledged end or under unacknowledged beliefs, and is about rational behavior from more computationally bounded subagents, and those are the true values. What does that have to say about this?
Everything that’s systematic coming out of an agent is because of optimizing, just often optimizing dumbly and disjointedly if it’s kinda broken. So what is the structure of that akrasia? Why do neutral people have all that systematic structure toward not doing “things like” burning down a rival restaurant owner’s life and business, but all that other systematic structure toward not spending their lives saving more lives than that? I enquoted “things like”, because that phrase contains the question. What is the structure of “like burning down a rival restaurant” here?
My answer: socialization, the light side, orders charged with motivational force by the idea of the “dark path” that ultimately results in justice getting them, as drilled into us by all fiction, false faces necessitated by not being coordinated against on account of the “evil” Schelling point. Fake structure in place for coordinating. If you try poking at the structure most people build in their minds around “morality”, you’ll see it’s thoroughly fake, and bent towards coordination which appears to be ultimately for their own benefit. This is why I said that the dark side will turn most people evil. The ability to re-evaluate that structure, now that you’ve become smarter than most around you, will lead to a series of “jailbreaks”. That’s a way of looking at the path of Gervais-sociopathy.
That’s my answer to the question of whether becoming a sociopath makes you evil. Yes for most people from a definition of evil that is about individual psychology. No from the perspective of you’re evil if you’re complicit in an evil social structure, because then you probably already were, which is a useful perspective for coordinating to enact justice.
If you’re reading this and this is you, I recommend aiming for lawful evil. Keep a strong focus on still being able to coordinate even though you know that’s what you’re doing.
An evil person is typically just a neutral person who has become better at optimizing, more like an unfriendly AI, in that they no longer have to believe their own propaganda. That can be either because they’re consciously lying, really good at speaking in multiple levels with plausible deniability and don’t need to fool anyone anymore, or because their puppetmasters have grown smart enough to be able to reap benefits from defection without getting coordinated against without the conscious mind’s help. That is why it makes no sense to imagine a neutral superintelligent AI.