Neutral and Evil

What is the good/neutral/evil axis of Dungeons and Dragons alignment made of?
We’ve got an idea of what it would mean for an AI to be good-aligned: it wants to make all the good things happen so much, and it does.
But what’s the difference between a neutral AI and an evil AI?
It’s tempting to say that the evil AI is malevolent, while the neutral one is merely indifferent.
But that doesn’t fit the intuitive idea that the alignment system was supposed to map onto, or what alignment is.

Imagine a crime boss who makes a living off of kidnapping random innocents for ransom, and who posts videos online of the torture and dismemberment of those whose loved ones don’t pay up, as encouragement. They do this not because of sadism, but because they want money to spend on lots of shiny gold things they like, and are indifferent to human suffering. Evil, right?

If sufficient indifference can make someone evil, then… If a good AI creates utopia, and an AI that kills everyone and creates paperclips because it values only paperclips is evil, then what is a neutral-aligned AI? What determines the exact middle ground between utopia and everyone being dead?

Would this hypothetical AI leave everyone alive on Earth and leave us our sun but take the light cone for itself? If it did, then why would it? What set of values is that the best course of action to satisfy?

I think you’ve got an intuitive idea of what a typical neutral human does. They live in their house with their white picket fence and have kids and grow old, and they don’t go out of their way to right far-away wrongs in the world. But suppose they own a restaurant, and the competition down the road starts attracting away their customers. They are given a tour through the kitchens in the back, and they see a great opportunity to disable the smoke detectors and start a fire that won’t be detected until it’s too late, burning down the building and probably killing the owner. They don’t do it.

It’s not that a neutral person values the life of their rival more than the additional money they’d make with the competition eliminated, or cares about better serving the populace with a better selection of food in the area. You won’t see them looking for opportunities to spend that much money or less to save anyone’s life.

And unless most humans are evil (which goes against the intuitive concept the alignment system points at just as much as “neutral = indifference” does), it’s not about action/inaction either. People eat meat. And I’m pretty sure most of them believe that animals have feelings. That’s active harm, probably.

Wait a minute, did I seriously just base a sweeping conclusion about what alignment means on an obscure piece of possible moral progress beyond the present day? What happened to all my talk about sticking to the intuitive concept?

Well, I’m not sticking to the intuitive concept. I’m sticking to the real thing the intuitive concept pointed at which gave it its worthiness of attention. I’m trying to improve on the intuitive thing.

I think that the behavior of neutral is wrapped up in human akrasia and the extent to which people are “capable” of taking ideas seriously. It’s way more complicated than good.

But there’s another ontology, the ontology of “revealed preferences”, where akrasia is about serving an unacknowledged end or under unacknowledged beliefs, and is about rational behavior from more computationally bounded subagents, and those are the true values. What does that have to say about this?

Everything that’s systematic coming out of an agent is because of optimizing, just often optimizing dumbly and disjointedly if it’s kinda broken. So what is the structure of that akrasia? Why do neutral people have all that systematic structure toward not doing “things like” burning down a rival restaurant owner’s life and business, but all that other systematic structure toward not spending their lives saving more lives than that? I enquoted “things like”, because that phrase contains the question. What is the structure of “like burning down a rival restaurant” here?

My answer: socialization, the light side, orders charged with motivational force by the idea of the “dark path” that ultimately results in justice getting them, as drilled into us by all fiction, and false faces necessitated by the need to avoid being coordinated against on account of the “evil” Schelling point. Fake structure in place for coordinating. If you try poking at the structure most people build in their minds around “morality”, you’ll see it’s thoroughly fake, and bent towards coordination which appears to be ultimately for their own benefit. This is why I said that the dark side will turn most people evil. The ability to re-evaluate that structure, now that you’ve become smarter than most around you, will lead to a series of “jailbreaks”. That’s a way of looking at the path of Gervais-sociopathy.

That’s my answer to the question of whether becoming a sociopath makes you evil. Yes for most people, under a definition of evil that is about individual psychology. No under the perspective that you’re evil if you’re complicit in an evil social structure, because then you probably already were. That second perspective is a useful one for coordinating to enact justice.

If you’re reading this and this is you, I recommend aiming for lawful evil. Keep a strong focus on still being able to coordinate even though you know that’s what you’re doing.

An evil person is typically just a neutral person who has become better at optimizing, more like an unfriendly AI, in that they no longer have to believe their own propaganda. That can be either because they’re consciously lying, really good at speaking in multiple levels with plausible deniability and don’t need to fool anyone anymore, or because their puppetmasters have grown smart enough to reap the benefits of defection, without the conscious mind’s help and without getting coordinated against. That is why it makes no sense to imagine a neutral superintelligent AI.

10 thoughts on “Neutral and Evil”

  1. This seems to be conflating – at least at the start – evil in the sense of producing things we don’t like, with evil in the sense of a personal attribute specific to humans. (We generally don’t call e.g. lions or sharks evil.) The latter itself typically conflates outgroup membership with following harmful strategies.

    When Psalm 92 says that the wicked flourish like grass, but the righteous like trees, it’s not just hating on the wicked and praising the righteous – it’s describing two different, coherent strategies, that are designed to work well on different timescales with different periodicity. Wickedness is a particular sort of coordinating norm that can choke off isolated instances playing strategies with lower time preference. Righteousness is a different sort of norm that also protects its own and fends off high time-preference players.

    1. I’m talking about a psychological characteristic, which is neither following harmful strategies nor outgroup membership, such that people will employ harmful-to-others strategies if it will benefit them, and are able to find such opportunities in a way that brings in their conscious mind, because they are no longer held back from that by self-image, or fear of reprisal, or other things that are ultimately motivated by caches of long-thinking and decision theory serving selfishness that no longer apply. The part of their mind that is their own has grown strong enough for the tails to come apart between benefiting because of long-term effects of playing nice, and playing nice.

      I’d be surprised if lions or sharks employed false faces, therefore the distinction “neutral vs evil” is not applicable to them. I’d call them “technically evil by default”, and I’d also probably call most herbivores the same. Which is weird, but the concept is not invented to apply to them.

      If “evil” is to be a psychological characteristic, I’m pretty sure this is the concept you get. I am much less interested in contextual definitions used for coordinating.

  2. I heard someone talking about these ideas, disturbed by the implication neutral people can’t become good.

    Well, they can remember the words of the goddess of everything else,
    “even multiplication itself when pursued with devotion will lead to my service”, remember how much better a just system is than an unjust system, even for the people at the top of the unjust one, that finding a way to succeed by being just, to reach a just life, is more important than whatever services to Moloch would advance them under his rule.

  3. Bitchplease, your aesthetic is shit.

    Like, anyone going through your blog and learning the mental tech from it would need to be practiced enough in anti-DRM to sift out the aesthetic anyways, so the aesthetic won’t stop them if they were going to get it anyways. But the aesthetic is still getting in the way, and taking up space, and you chose to use it anyways.

    I suspect aesthetics are good for one or two things–first, primarily, getting people to see you a different way. That’s… what they’re for. Secondarily, people’s reported subjective experiences are that aesthetics give them willpower, though even if this can happen, it mostly happens in a fake way that dies out after a while. My suspicion is that aesthetics could give you willpower if they’re working with core, but I’ve generally been skeptical of them.

    To expand on the first point–aesthetics can be used to hack people, too.

    I’ll write a longer thing on aesthetics on “vampires and more undeath”, as your model of “undead types” and “good v. neutral/evil” are your two most aesthetic-y concepts.

    1. (note: this is how I talk now, we’re still cool).

      In case it needed to be said–yes, there is like 30% content mixed in with the aesthetic in this post. I think your good v evil thing is mostly wrong, but the neutral v evil part is mostly right. In particular, building habits around doing what you want can make you appear evil and get a fucking lot more done.

      The only reason I have to believe you about the good thing, is that you have more willpower than I do despite having adopted the aesthetic-free versions of your tech, and practiced them. And I still don’t know how to explain that, and my three best guesses are “Ziz is right about good/evil”, “Ziz’s aesthetic is giving her some willpower?”, and “I need to go deeper into doing what I want”.

      But like, you claim to be the only good (or double good, whatever) person I know–what, you want me to swallow *and* suck your dick? You know perfectly well how people react to being told they’re bad and that some other person is good, look at how my then-partner reacted when we visited you two on Caleb after arriving in the Bay. He felt bad about it. Which, similarly to being “low aliveness”, lets you be hacked. This aesthetic is a way of cowing people, and the value of cowing people is much greater than whatever you get from having good models around this thing.

      Decisions made long ago, bitch. <3

    2. Aesthetics have lots of potential uses. Fake. Real. Just like everything else.

      Separate from the validation-addiction version of aesthetics you were into when we knew each other, is the thing at the center of “archetype-based reasoning”. I.e., the idea that if you get good enough at interpreting echoes of it, you already have all the information you need to see the truth.

      And I think you’re projecting your own core-vs-core conflict in this and other recent comments.
