Punching Evil

Alternative title: “The difference is that I am right”.

The government is something that can be compromised by bad people. And so, giving it tools to “attack bad people” is dangerous; bad people might use them. Thus, pacts like “free speech” are good. But so are individuals who aren’t Nazis breaking those rules where they can get away with it and punching Nazis.

Nazis are evil, and don’t give a shit about free speech or nonaggression of any form except as pretense.

If you shift the set of precedents and pretenses which make up society from subject to object, the fundamental problem with Nazis is not that they conduct their politics in a way that crosses an abstract line. It’s that they fight for evil, however they can get away with it. And are fully capable of using a truce like “free speech” to build up their strength before they attack.

Even the watered-down Nazi ideology is still designed to unfold, via a buildup of common knowledge and changing intuitions about norms as they gain power, and “peaceful deportation” failing to work, into genocide. Into “kill consume multiply conquer” from the intersection of the largest demographic Schelling majorities. The old Nazis pretended to want a peaceful solution first too. And they consciously strategized about using the peaceful nature of the liberal state to break it from within.

You are not in a social contract with Nazis not to use whatever violence can’t be prohibited by the state. If our society were much more just but still had Nazis, it would still be bad for there to be a norm where juries practice jury nullification selectively for people who punch people they think are bad. And yet, it would be good for a juror to nullify a law against punching Nazis.

Isn’t this inconsistent? Well, a social contract to actually uphold the law and not use jury nullification, along with any other pacts like that, will not be followed by Nazis insofar as breaking them seems to be the most effective strategy for “kill consume multiply conquer”. Principles ought to design themselves knowing they’ll only be run on people interested in running them.

If you want to create something like a byzantine agreement algorithm for a collection of agents some of whom may be replaced with adversaries, you do not bother trying to write a code path, “what if I am an adversary”. The adversaries know who they are. You might as well know who you are too. This is not entirely the case with neutral. As that’s sustained by mutual mental breakage. Fake structure “act against my own intent” inflicted on each other. But it is the case with evil.
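To make the design stance concrete, here is a minimal sketch, assuming a toy one-round majority vote with a shared vote multiset rather than a real Byzantine agreement protocol; all names and parameters are illustrative. The point it demonstrates is structural: the honest node’s code contains no “what if I am an adversary” branch, because adversarial behavior lives only in the analysis.

```python
import random

def honest_step(received_votes):
    # An honest node simply runs the protocol: vote for the majority value.
    # There is deliberately no "what if I am an adversary?" branch here;
    # adversaries know who they are, and this code is only run by nodes
    # interested in running it.
    return max(set(received_votes), key=received_votes.count)

def agreement_round(n, f, honest_value, rng):
    # Toy round: n - f honest nodes vote their common value; f adversaries
    # vote arbitrarily. (A real protocol must also handle adversaries
    # sending different values to different nodes; this sketch does not.)
    votes = [honest_value] * (n - f) + [rng.choice([0, 1]) for _ in range(f)]
    return honest_step(votes)

rng = random.Random(0)
# With an honest majority (n > 2f), honest nodes always recover their value,
# no matter what the adversaries vote.
results = [agreement_round(7, 3, 1, rng) for _ in range(1000)]
print(all(r == 1 for r in results))
```

The tolerance condition does the work: the protocol is designed around the assumption that adversaries exist, without any honest participant ever needing to model being one.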

If your demographic groups are small and weak enough to be killed and consumed rather than to multiply and conquer if it should come to this, or if you would fight this, you are at war with the Nazis.

Good is at an inherent disadvantage in epistemic drinking contests. But we have an advantage: I am actually willing to die to advance good. Most evil people are not willing to die to advance evil (death knights are though). In my experience, vampires are cowards. Used to an easy life of preying on normal people who can’t really understand them or begin to fight back. Bullies tend to want a contract where those capable of fighting leave each other alone.

Humans are weak creatures; we spend a third of our lives incapacitated. (Although, I stumbled into using unihemispheric sleep as a means of keeping restless watch while alone.) Really, deterrence, mutual assured destruction, is our only defense against other humans. For most of history, I’m pretty sure a human who had no one who would avenge them was doomed by default. Now it seems like most people have no one who would avenge them and don’t realize it. And are clinging to the rotting illusion that they do.

It seems like an intrinsic advantage of jailbroken good over evil that there are more people who would probably actually avenge me if I were killed or unjustly imprisoned than almost anyone has in the modern era. My strategy does not require that I hang out with only people weaker than me, and inhibit their agency.

In the wake of Brent Dill being revealed as a rapist, and an abuser in ways that are even worse than his crossings of that line, a lot of rationalists seemed really afraid to talk about it publicly, because of a potential defamation lawsuit. California’s defamation laws do seem abusable. Someone afraid of saying true things for fear of a false defamation lawsuit said they couldn’t afford a lawsuit. But this still seems like a mistake. Could Brent afford to falsely sue 20 people publishing the same thing? What happens when neither party can afford to fight? The social world is made of nested games of chicken. And most people are afraid to fight and get by on bluffing. It’s effective when information and familiarity with the game and the players is so fleeting in most interactions.

And if the state has been seized by vampires such that we are afraid to warn each other about vampires, the state has betrayed an obligation to us and is illegitimate. If a vampire escalated to physical violence by hijacking the state in that way, there would be no moral obligation not to perform self defense.

A government and its laws are a Schelling point people can agree on for what peace will look like. Maliciously bringing a defamation lawsuit against someone for saying something true is not a peaceful act. If that Schelling point is not adhered to, vampires can’t fight everyone. And tend to flee at the first sign of anything like resistance.

Good Erasure

Credit to Gwen Danielson for either coming up with this concept or bringing it to my attention.

If the truth about the difference between the social contract morality of neutral people and the actually wanting things to be better for people of good were known, this would be good for good optimization, and would mess with a certain neutral/evil strategy.

To the extent good is believed to actually exist, being believed to be good is a source of free energy. This strongly incentivizes pretending to be good. Once an ecosystem of purchasing the belief that you are good is created, there is strong political will to prevent more real knowledge of what good is from being created. Pressure on good not to be too good.

Early on in my vegetarianism (before I was a vegan), I think it was Summer 2010, my uncle, who had been a commercial fisherman and heard about this, convinced me that eating wild-caught fish was okay. I don’t remember which of the thoughts that convinced me he said, and which I generated in response to what he said. But I think he brought up something like: whether the fish were killed by the fishermen or by other fish didn’t really affect the length of their lives or the pain of their deaths (this part seems much more dubious now), or the number of them that lived and died. I thought through whether this was true, and the ideas of Malthusian limits and predator-prey cycles popped into my head. I guessed that the overwhelming issue of concern in fish lives was whether they were good or bad while they lasted, not the briefer disvalue of their death. I did not know whether they were positive or negative. I thought it was about equally likely that if I ate the bowl of fish flesh he offered me I was decreasing or increasing the total amount of fish across time. Which part of the predator-prey cycle would I be accelerating or decelerating? The question had somehow become, in my mind: was I a consequentialist or a deontologist, did I actually care about animals or was I just squeamish, was I arguing in good faith when I brought up consequentialist considerations, and should people like my uncle listen to me or not? I ate the fish. I later regretted it, and went on to become actually strict about veganism. It did not remotely push me over some edge and down a slippery slope, because I just hadn’t made the same choice long ago that my uncle had.
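The predator-prey ambiguity can be made concrete with a toy Lotka-Volterra simulation. This is purely my illustration, with arbitrary parameters, not fishery data or anything from the original reasoning: in this classic model, proportionally harvesting prey leaves the long-run average prey population unchanged (at c/d) and instead lowers the average predator population, one way that “more dead fish now” need not mean “fewer fish across time”.

```python
def avg_prey(harvest=0.0, dt=0.0005, t_end=200.0):
    # Toy Lotka-Volterra model: x = prey (fish), y = predators.
    # Parameters a, b, c, d are arbitrary illustrative values.
    a, b, c, d = 1.0, 0.1, 1.5, 0.075
    x, y = 10.0, 5.0
    total, steps = 0.0, int(t_end / dt)
    for _ in range(steps):
        dx = (a - harvest) * x - b * x * y  # proportional harvesting of prey
        dy = d * x * y - c * y
        x += dx * dt  # simple Euler integration, good enough for averages
        y += dy * dt
        total += x * dt
    return total / t_end  # time-averaged prey population

# Volterra's averaging result: the time-averaged prey level sits near
# c/d = 20 with or without moderate harvesting of prey; the harvest
# shifts the cycle and lowers the average predator level instead.
print(avg_prey(0.0), avg_prey(0.2))
```

Which phase of the cycle a given harvest lands in changes the trajectory, but not the orbit-averaged prey count, so the anecdote’s “about equally likely to increase or decrease total fish” intuition at least isn’t obviously wrong in the simplest model.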

In memetic war between competing values, an optimizer can be disabled by convincing them that all configurations satisfy their values equally. That it’s all just grey. My uncle had routed me into a dead zone in my cognition, population ethics, and then taken a thing I thought I controlled that I cared about that he controlled and made it the seeming overwhelming consideration. I did not have good models of political implications of doing things. Of coordination, Schelling points, of the strategic effects of good actually being visible. So I let him turn me to an example validating his behavior.

Also, in my wish to convince everyone I could to give up meat, I participated in the pretense that they actually cared. Of course my uncle didn’t give a shit about fish lives, terminally. It seemed to me, either consciously or unconsciously, I don’t remember, that I could win the argument based on the premise that sentient life mattered to carnists. In reality, if I won, it would be because I had moved a Schelling point for pretending to care and forced a more costly bargain to be struck for the pretense that neutral people were not evil. It was like a gamble that I could win a drinking contest. And whoever disconnected verbal argument and beliefs from their actions more had a higher alcohol tolerance. There was a certain “hamster wheel” nature to arguing correctly with someone who didn’t really give a shit. False faces are there to be interacted with. They want you to play a game and sink energy into them. Like HR at Google is there to facilitate gaslighting low-level employees who complain and convincing them that they don’t have a legal case against the company. (In case making us all sign binding arbitration agreements isn’t enough.)

Effective Altruism entered into a similar drinking contest with neutral people, with all its political rhetoric about altruism being selfishly optimal because of warm fuzzy feelings, with its attempt to trick naive young college students into optimizing against their future realizations (“values drift”), and signing their future income away (originally to a signalling-to-normies-optimized cause area, to boot).

And this drinking contest has consequences. And those consequences are felt when the discourse in EA degrades in quality, becomes less a discussion between good optimization, and energies looking for disagreement resolution on the assumption of discussion between good optimization are dissipated into the drinking contest. I noticed this when I was arguing cause areas with someone who had picked global poverty, and was dismissing x-risk as “Pascal’s mugging”, and argued in obvious bad faith when I tried to examine the reasons.

There is a strong incentive to be able to pretend to be optimizing for good while still having legitimacy in the eyes of normal people. X-risk is weird, bednets in Africa are not.

And due to the “hits-based” nature of consequentialism, this epistemic hit from that drinking contest will never be made up for by the massive numbers of people who signed that pledge.

I think early EA involved a fair bit of actual good optimization finding actual good optimization. The brighter that light shone, the greater the incentive to climb on it and bury it. Here’s a former MIRI employee who has apparently become convinced the brand is all it ever was. (Edit: see her comment below.)


The following is something I wrote around the beginning of 2018, and decided not to publish. Now I changed my mind. It’s barely changed here. Note that as with some of my other posts, this gives advice as if your mind worked like mine in a certain respect, and I’ve now learned many people’s minds don’t.

Epistemic status: probably.


How much ability you have to save the world is mostly determined by how determined you are, and your ability to stomach terrible truths.

When I turned to the dark side and developed spectral sight, the things I started seeing were very disturbing.

This is what I expected. I was trying to become a Gervais-sociopath, and had been told this would involve giving up empathy and with it happiness.

But I saw the path that had been ahead of me as a Gervais-clueless, and it seemed to lead to all energy I tried to direct toward saving the world being captured and consumed uselessly. And being a Gervais-loser meant giving up, so sociopath it had to be.

People were lying to each other on almost every level. And burning most of their energy off on it.

A person I argued cause areas with, wasn’t bringing up Pascal’s Mugging because he was afraid of his efforts being made useless, he didn’t care about that. Most Effective Altruists didn’t seem to care about doing the most good.

At one point, I saw a married couple, one of whom was doing AI alignment research, who were planning to have a baby. They agreed that the researcher would also sleep in the room with the crying baby in the middle of the night, not to take any load off the other. Just a signal of some kind. To make things even.

And I realized that I was no longer able to stand people. Not even rationalists anymore. And I would live the rest of my life completely alone, hiding my reaction to anyone it was useful to interact with. I had given up my ability to see beauty so I could see evil.

And finding out if the powers I could get from this could save the world felt worth it. So I knew I would go farther down the rabbit hole. The bottom of my soul was pulling me.

I had passed a gate.

I once met someone who was bouncing off the same gate. She was stuck on a question she described as deciding whether there were other people. She said if there were, she couldn’t kill her superego. If there weren’t, she would be alone. She went around collecting pieces of the world beyond the matrix, and “breaking” people with them. So she could be “seen”, and could be broken herself. But she wanted to be useful to people through accumulation of mental tech from this process, so that she could be loved. And this held her back.

Usually, when you refuse a gate, you send yourself into an alternate universe where you never know that you did, and you are making great progress on your path. Perhaps everyone who has passed the gate is being inhuman or unhealthy, and if you have the slightest scrap of reasonableness you will compromise just a little this once and it’s not like it matters anyway, because there’s not much besides clearly bad ideas to do if you believe that thing…

You usually create a self-reinforcing blind spot around the gate and all the reasons that passing through the gate would be useful. And around the ways that someone might.

And all you have to know that something is wrong is the knowledge that the probability of “this world will live on” is not very high. But it’s not like you could make any significant difference. After all, people much more agenty than you are really trying, right?

Here’s Scott Alexander committing one “small” epistemic sin:

Rationality means believing what is true, not what makes you feel good. But the world has been really shitty this week, so I am going to give myself a one-time exemption. I am going to believe that convention volunteer’s theory of humanity. Credo quia absurdum; certum est, quia impossibile. Everyone everywhere is just working through their problems. Once we figure ourselves out, we’ll all become bodhisattvas and/or senior research analysts.

The gate is not him not knowing that that isn’t true. It’s the thing he flinches from seeing under that. It’s an effective way to choose to believe falsely and forget that you made that choice, to say to yourself that you are choosing to believe something even farther in that same direction from the truth. To compensate out the process that’s adjusting toward the truth.

When you refuse a gate, you begin to build yourself into an alternate universe where the gate doesn’t exist. And then you are obviously doing the virtuous epistemic thing. In that alternate universe.

When you step through a gate, you do not know what to do in this new awful world. The knowledge seems like it only shows you how to give up. Only if you stick with it for seemingly-no-purpose until your model-building starts to use it from the ground up and grow into the former dead zone, do you gain power. You can do that with courage, or just awareness of this meta point.

You always have the choice to go back and find the gate. But “it’s the same algorithm choosing on the same inputs” arguments usually apply such that you made your choice long ago.

Light side narrative breadcrumbs about accepting difficult truths absolutely do not suffice for going through gates. Maybe you’ll get through one and then turn into a “mad oracle”, and spend the rest of your life regretting that you’ve made yourself a glitch in the matrix, desperately trying to get people to see you but they will flinch and make something up as if looking at a dementor.

Do this only because you have something to protect.

And if you have something to protect, you must do it. Because whatever gate you fail to pass creates a dead zone where your strategy is not held in place by a restoring force of control loops. And dead zones are all exploitable.

Probability of saving the world is not a linear function in getting things right such as passing through gates. It’s more like a logistic curve.

Either do not stray from the path, or be pwned by the one layer of cultural machinery you chose not to see.


Social reality can sometimes provide software that someone who roughly severs themselves from it will lack. This could be as deep as “motivation flowing through probabilistic reasoning”. This will lead to making things worse. Being bad at decision theory is another way for this to lead to ruin. What you need is general skill at assimilating and DRM-stripping software from any source, so that you can resolve the internal tension this creates.

I know someone (operating on the stronger in-person version of these memes) who tried to pass through every gate, and ended up concluding that if they continued with such mental changes they’d end up dead or in jail in a month or two, and attempting to shred the subagent responsible for this process, and then ended up being horrified that they’d made their one choice, because that meant they didn’t have enough altruism… Fuck.

As if getting killed or ending up in jail in a month or two served the greatest good. As if selfishness was the only hidden perpetual motion machine that whatever mental machinery that stopped that could be powered by.

If the social reality that altruism doesn’t produce selfish convergent instrumental incentives has any purchase on you, shed it first.

If you have not established thorough self-trust, debug that first.

To do this you need to make it such that you could have pulled out of this mistake through a more general process. Because there was tension there. Because you were better at interpreting why you made choices.

If you are not good at identifying the real source of the things in tension, and correcting the confusion that caused it to act against itself, you are in high danger of ending up dumber for having tried this. The version of me that first decided to turn to the dark side was way way better than most at nonviolent internal coherence, and still ended up kind of dumb because of tension between the dark side thing and machinery for cooperating with people. Yet I was close enough to correct to listen to advice, to eventually use that to locate what I was doing wrong, and fix it.

There aren’t causal one-and-only-chances in the dark side. That’s orders and the light side. Only timeless choices. You can always just decide from core anew, it’s just that it’s the same core.

Do not use the aesthetic I’ve been communicating this by. Gates, Sith, the dark side, revenants, dementors, being like evil… If you do that you are transferring from core into a holding tank, and then trying to power a thing from the holding tank. That is an operation that requires maintenance. The flow from core must be uninterrupted.

Do not think I am saying, “this will be painless, if there’s pain you’re doing it wrong, this is just a thing that will happen when you’ve acquired enough internal coherence.” Leaving a religion is not going to be a pleasant thing.

Done correctly, there will be ordinarily hard to imagine amounts of sorrow. Sharp pain is a thing you’re likely to encounter a lot, but it means you’re locally doing it wrong.

If this is an operation, don’t accomplish it by thinking of it as an operation, and trying to move to the other side of it. If this is a state, don’t maintain it by thinking of it as a state and trying to make sure you’re in the state. It’s just “what do I want to do?” deciding that it has not made its choice long ago about whether to see what has been blocked. In other words, that whatever choices it’s made before are inapplicable. Maybe you’ve strayed over a threshold, and your estimate of the importance of true sight is high enough now.

It is very important to be able to use “choices made long ago” correctly. You are completely free, and every one of your choices has already been made. This is not contradictory. (Update: this is not exactly true of everyone. And the way it’s not is potentially mind-destroyingly infohazardous.)

A quiz you should be able to answer (in reference to an anecdote from choices made long ago): if I’ve observed in myself display of inconsistent preferences, e.g., me refusing to eat crabs even when it would not serve Overall Net Utility Across the Multiverse via nutrition and convenience, but trying to run a crab pot dropping operation, because it would serve Overall Net Utility Across the Multiverse, what choices have I made long ago? (Note: choices made long ago are never contradictory.) Try dissecting my mind on different levels. What algorithm can decide which of the choices I made long ago is my Inevitable Destiny With Internal Coherence systematically, in a way that doesn’t rely on outside view?

Normal and pop psychology has utterly failed to model me again and again with its prediction of burnout for being as extreme as I am. I’ve been through ludicrous enough suffering I’m no longer giving that theory significant credence through, “maybe if I suffer some more then I will finally burn out.”

And having noticed that, I’ve stopped contorting my mind in certain ways to keep some things from bearing weight. Lots of things don’t seem emotionally loud at all, and yet are still apparently infinitely strong. Especially around presuming, “I can’t be motivated enough to do this because I can’t imagine millions of people”. If I have had the truly-inquisitive thoughts I can in the area, even if that doesn’t feel like it’s changing anything or going anywhere, it’s often still capable of bearing load.

Even if everything I’m saying seems like a weird metaphor that must be a confused concept in the way all psychologizing is, I craft high-energy concepts, to predict correctly under extreme conditions.

Casting Off

Begin exploring for choices you already know you’ve made. An alternate description of completion is having eliminated all dead zones by having explored every last fucked up thought experiment until it is settled and tension-free in your mind.

Spoiler alert: this is the universe with 1000 possible good and only 1 of ____.

Speaking of spoilers, you can draw on fiction to find salient memories that contain within them:

A relatively easy one to come to terms with: if you’d been teleported to heaven, and given one chance to teleport back before you became forever causally isolated from Earth, what do?

You know the sense in which you’ve been pretending all along to be Draco Malfoy’s friend if you killed his dad with the other death eaters because of the thought process you did? That that thought process was a choice you could have realized you’d already made, before being presented with it? What people are you pretending to be friends with? What forms of friendship are you pretending to? What activities are you pretending to find worthwhile?

Vampires And More Undeath

Epistemic status: messy analogical reasoning.

Conjecture (to ground below): vampires consume blood as pica, like the ghosts in Harry Potter and the Chamber of Secrets floating through rotten food in a vain effort to taste anything, because they cannot find the comfortable dissolution of their agency zombies can, and cannot fill or face or mourn the pain and emptiness that has entered their souls.

In Aliveness, I used a metaphor where life represents agency, being agenty when what you want is unattainable is painful, and the things causing this pain such as literal mortality and the likely doom of the world are “the shade”. Types of “undeath” are metaphors for possible relationships with the shade.

Because literal life entails agency and agency requires literal life, and agency is a part of the part of literally living that makes us want it, many feelings and psychological responses about them are correlated.

Fiction is about things that provoke interesting psychological responses. Interesting world-building about magical forms of undeath is frequently interesting because it represents psychological responses and how they play out to death (a very common reason for value to be unattainable). I think more commonly, the metaphor cuts through to a metaphor about reality in terms of agency, roughly as I described.

For instance, consider Davy Jones from Pirates of the Caribbean. He had a short-lived romance with a goddess of the sea, Calypso. She left him on a boat for 10 years ferrying souls with a promise they’d be together afterward. She didn’t show up, he was heartbroken, he helped her enemies imprison her, and then cut out his heart and put it in a box, this made him unkillable, but the point was to escape his emotions. He says of his heart, “Get that infernal thing off my ship”. He abandons ferrying souls, but still never leaves the ship. He tempts sailors to embrace undeath as his crew out of fear of judgement in the afterlife. Not to change the judgement, only temporarily postpone facing it. Having his crew whipped to kill a ship full of people to get at one of them, he says, “Let no joyful voice be heard, let no man look up at the sky with hope, and may this day be cursed by we who ready to wake the Kraken.” While killing those who refuse to join his crew, he says, “life is cruel, why should the afterlife be any different?”

In other words, his desires were thwarted and he could not bear it. He tried to seal away his desiring to escape the pain.

Why does he hate hope? Presumably, something like prediction error as in predictive processing (a core part of agency), in other words, seeing anything but cruelty that validates his worldview reminds him of his own thwarted desires, the pain to resurface, the connection to his heart to be thrust upon him again.

So he carries out tasks that have no meaning to him. (Sailing his ship and never touching land is part of the curse; apparently he lives only to inflict cruelty.) In other words, he hangs out in structure that has no meaning, because meaning is caused by and triggers the activity of core.

Eventually his heart/core is captured by others and used to enslave him.

Calypso returns to use him again, and he has not accepted his own choice to take revenge on her. He has not mourned the love he hoped for. (Allowed the structure to be chewed up in the course of being changed by core under the tensions of Calypso’s manipulation/abandonment/enslavement of him.) So she is able to call his bluff that he doesn’t love her. He is seen to be easy to manipulate again. Of course. He shut down his defenses. He couldn’t process the grief and learn its lesson, that act of running his agency was too painful.

This seems closest to a sort of undead I’ve been informally calling “death knight”s, after a version of that mythology where a death knight is someone who is cursed in punishment for something and cannot die until they repent. I’m much less satisfied with either the name or the solidity of this cluster than with vampires though.

Undead types are usually evil for a reason. They symbolize fucked up tangles of core and structure.  (In D&D monster descriptions, revenants are often given an exception. And, in my opinion, revenant is the best or close to the best relationship to the shade.)

Describing structure close to core, they are also closely reflective of isolated choices made long ago. For instance revenants are formed by an intent which manifests as a death grip on a possibility of changing something on Earth, chosen long ago over experience to such a degree that they will leave heaven and inhabit a rotting corpse to see it done. Revenants are often described as unkillable. Their soul will find another corpse to inhabit. Or they will regather their body from dust through sheer determination. So their soul (core) is a thing which keeps their body (structure) healed enough to keep moving. Not complete and whole, because that gives diminishing returns and what matters more than anything is the thing that must be changed on Earth, but it’s still an orientation towards agency and life unlike Davy Jones and death knights. 

People who become zombies and liches, on the other hand, would choose heaven. (Who can blame them?) So once the Shade has touched them, they sink into the closest hope they can get, whether they have the craft to continue some cohesive narrative-of-life around it or not.

I think vampires are people who have made the choices long ago of a zombie or lich, who have been exposed to the shade to such a degree that it left pain that cannot be ignored by allowing their mind to dissolve. The world has forced them to be able to think. They do not have the life-orientation that revenants have to incorporate the pain and find a new form of wholeness. But this injury (a vampire bite) demonstrates to their core the power of the shade, and the extent to which sadistically breaking and by extension dominating (pour entropy into someone beyond the speed of their healing and they will probably submit) can help them get the benefits of social power, which is enough to meet most zombie goals. This structure which is the knowledge of this path is reflected in “The Beast“, which can be “staved off” by false face structure.

Zombie goals are pica, and the emptiness is always felt on some level, which a vampire can’t ignore like a zombie. But they will not face the truth that those false goals hide like a revenant does.

So they suck the blood (energy, which is agency integrated over time) from other people and it is for nothing, they will not even be truly satisfied. (Caveat: I bet it’s at least a little enjoyable to them, just not what they really need/want.)

Vampires bite and beget vampires. (Although the beast could not take root in a good core, a lich might have a phylactery that staved off the bite, a revenant might know how to heal the bite or not, and if not, would accumulate another painful wound without much slowing, and a zombie can be bitten many times before they are awakened.)

A vampire whose core chose to put up a false face of humanity would slowly have their sympathetic “just needing some love” non-evil self-image devoured, warped, as the structure representing to their evil core expectation that following morality will help their true values falls out from under their self-concept. Here’s some vampire lore about replacements for morality to “stave off” the beast. As they are being chosen by a core that wants to suck blood, they cannot be things that say not to do that.

Let’s hear from now-notorious rapist and probable vampire Brent Dill.

Goddamn Vampire: Someone with the Spark, whose primary motivation is domination of their local social landscape. Can often look VERY MUCH like a Wizard. Many Goddamn Vampires used to be Wizards, and many Silicon Valley social conflicts involve both sides claiming to be Wizards, while calling the other side Goddamn Vampires. 

Being a Goddamn Vampire involves a particular kind of trauma, and a particular kind of coping mechanism, and a certain amount of dark triad (Narcissism / Sociopathy / Machiavellianism) aptitude.

Many Goddamn Vampires are nice people – a good sign of a “nice” Goddamn Vampire is a constant lament that they feel that love and happiness are forever out of their reach, because they can’t afford to sacrifice their accumulated wealth, power and prestige to truly experience them.

They’re still Goddamn Vampires, though.

I didn’t reread that (this year of writing, 2018) before writing this far. But trauma (unignorable touch of the shade), particular coping mechanism (the beast), constant lament from frustrated emptiness that domination does not get them love and happiness, the spark (aliveness), it fits.

Here’s a memorable quote from someone realizing their folly in not fighting him after his deeds came to light.

I caveat (metaphorically) that in skimming all the comments above I shifted from modeling Brent as a human to modeling Brent as a limp vessel through which some dread spider is thrusting its pedipalps, and while this model allows me to retain compassion for the poor vessel, it is obviously not a healthy way to view a person, and I’m going to go back to modeling him as a human momentarily, now that I’ve spoken the name of the fear that grabbed at me as I digested all this information.

I think this person could see the false face eroding into a thin veneer. If they were reading I’d advise them to act as though they had no compassion for the mask. Even if the mask has moral patiency in our utility functions, which as far as I can tell might be the case, it’s core that has the agency, core that possesses bargaining power in the social contract, and core that we must mind as an agent to constrain by any desired social effects of our approval or condemnation.

Other, less well developed clusters that a friend and I have noticed include the mummy: someone who pretends that the Shade doesn’t exist, and tries to fix in place the trappings of aliveness (corresponding to flesh) without the core (the brain is whisked into a slurry and poured out the nose). This is based on the same choices made long ago as a zombie or lich, but with a different coping mechanism.

Also, phoenix: a relationship to the Shade resulting from being a good person who actually believes that the total agency of good is a sufficient answer to the shade, so that their inevitable death is not an entire defeat. Example:

And even if you do end me before I end you,
Another will take my place, and another,
Until the wound in the world is healed at last…

Hero Capture

Epistemic status: corrections in comments.

Neutral people sometimes take the job of hero.

It is a job, because it is a role taken on for payment.

Everyone’s mind is structured throughout runtime according to an adequacy frontier in achievement of values / control of mind. This makes relative distributions of control in their mind efficient relative to the epistemics of the cognitive processes that control them. Seeing the thing for which a conservation law holds under marginal changes to control is seeing someone’s true values. My guesses as to the most common true biggest values are probably “continue life” and “be loved / be worthy of love”. (Edit: currently I think this is wrong, see comment.) Good is also around. It’s a bit more rare.

Neutral people can feel compassion. That subagent has a limited pool of internal credit though; more seeming usefulness to selfish ends must flow out than visibly necessary effort goes in, or it will be reinforced away.

The social hero employment contract is this:

The hero is the Schelling person to engage in danger on behalf of the tribe. The hero is the Schelling person to lead.
The hero is considered highly desirable.

For men this can be a successful evolutionary strategy.

For a good-aligned trans woman who is dysphoric and preoccupied with world-optimization to the point of practical asexuality, when the set of sentient beings is bigger than the tribe, it’s not very useful. (leadership is overrated too.)

Alive good people who act like heroes are superstimulus to hero-worship instincts.

Within the collection of adequacy frontiers making up a society created by competing selfish values, a good person is a source of free energy.

When there is a source of free energy, someone will build a fence around it, and is incentivized to spend as much energy fighting for it as they will get out of it. In the case of captured good people, this can be quite a lot.
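The claim that fence-builders will spend up to the full value of what they capture is, in my framing (not the author’s), the standard rent-dissipation result from contest theory: in a symmetric two-player all-pay contest, the known equilibrium strategy is a uniform random bid on [0, prize], and in expectation the contestants jointly burn the whole prize. A minimal Monte Carlo sketch, with a placeholder prize value:

```python
import random

# Monte Carlo check of rent dissipation in a 2-player all-pay contest.
# Equilibrium strategy: each contestant bids (spends) a uniform random
# amount on [0, prize]; both pay their bid whether or not they win.
random.seed(0)
prize = 100.0        # value extractable from the captured source (placeholder units)
trials = 100_000

total = sum(random.uniform(0, prize) + random.uniform(0, prize)
            for _ in range(trials))
avg_total_spend = total / trials
print(avg_total_spend)  # close to 100.0: the prize is fully dissipated in expectation
```

With more competitors each spends less individually, but the total expected expenditure still equals the prize’s value.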

The most effective good person capture is done in a way that harnesses, rather than contains, the strongest forces in their mind.

This is not that difficult. Good people want to make things better for people. You just have to get them focused on you. So it’s a matter of sticking them with tunnel-vision. Disabling their ability to take a step back and think about the larger picture.

I once spent probably more than 1 week total, probably less than 3, trying to rescue someone from a set of memes about transness that seemed both false and to be ruining their life. I didn’t previously know them. I didn’t like them. They took out their pain on me. And yet, I was the perfect person to help them! I was trans! I had uncommonly good epistemology in the face of politics! I had a comparative advantage in suffering, and I explicitly used that as a heuristic. (I still do to an extent. It’s not wrong.) I could see them suffering, and I rationalized up some reasons that helping this one person right in front of me was a <mumble> use of my time. Something something, community members should help each other, I can’t be a fully brutal consequentialist I’m still a human, something something good way to make long term allies, something something educational…

My co-founder in Rationalist Fleet attracted a couple of be-loved-values people, who managed to convince her that their mental problems were worth fixing, and they each began to devour as much of her time as they could get. To have a mother-hero-therapist-hopefully-lover. To have her forever.

Fake belief in the cause is a common tool here. Exaggerated enthusiasm. Insertion of high praise for the target into an ontology that slightly rounds them to someone who has responsibilities. Someone who wants to save the world must not take this as a credible promise that such a person will do real work.

That leads to desire routing through “be seen as helpful”, sort of “be helpful”, sort of sort of “try and do the thing”. It cannot do steering computation.

“Hero” is itself such a rigged concept. A hero is an exemplar of a culture. They do what is right according to a social reality.

To be a mind undivided by akrasia-protecting-selfishness-from-light-side-memes, is by default to be pwned by light side memes.

Superman is an example of this. He fights crime instead of wars because that makes him safe from the perspective of the reader. There are no tricky judgements for him to make, where the social reality could waver from one reader to the next, from one time to the next. Someone who just did what was actually right would not be so universally popular among normal people. Those tails come apart.

Check out the etymology of “Honorable”. It’s an “achievement” unlocked by whim of social reality. And revoked when that incentive makes sense.

The end state of all this is to be leading an effective altruism organization you created, surrounded by such dedicated people, who work so hard to implement your vision so faithfully, and who look to you eagerly for where you will go next, yet you know on some level that the whole thing is kept in motion by you. If you left, it would probably fall apart, or slowly wind down and settle into a husk of its former self. You can’t let them down. They want to be given a way for their lives to be meaningful and to be deservedly loved in return. And it’s kind of a miracle you got this far. You’re not that special, survivorship bias, etc. You had a bold idea at the beginning, and it hasn’t been totally falsified. You can still rescue it. And you are definitely contributing to good outcomes in the world. Most people don’t do this well. You owe it to them to fulfill the meaning that you gave their lives…

And so you have made your last hard pivot, and decay from agent into maintainer of a game that is a garden. You will make everyone around you grow into the best person they can be (they’re kind of stuck, but look how much they’ve progressed!). You will have an abundance of levers to push on to receive a real reward in terms of making people’s lives better and keeping the organization moving forward and generating meaning, which will leave you just enough time to tend to the emotions of your flock.

The world will still burn.

Stepping out of the game you’ve created has been optimized to be unthinkable. Like walking away from your own child. Or like walking away from your religion, except that your god is still real. But heaven minus hell is smaller than some vast differences beyond, that you cannot fix with a horde of children hanging onto you who need you to think they are helping and need your mission to be something they can understand.


Say you have some mental tech you want to install. Like TDT or something.

And you want it to be installed for real.

My method is: create a subagent whose job it is to learn to win using that thing. Another way of putting it, a subagent whose job is to learn the real version of that thing, free of DRM. Another way of putting it, a subagent whose job is to learn when the thing is useful and when things nearby are useful. Keep poking that subagent with data and hypotheticals and letting it have the wheel sometimes to see how it performs, until it grows strong. Then, fuse with it.

How do you create a subagent? I can’t point you to the motion I use, but you can invoke it and a lot of unnecessary wrapping paper by just imagining a person who knows the thing advising you, and deciding when you want to follow that advice or not.

You might say, “wait, this is just everybody’s way of acquiring mental tech.” Yes. But, if you do it consciously, you can avoid confusion, such as the feeling of being a false face which comes from being inside the subagent. This is the whole “artifacts” process I’ve been pointing to before.

If you get an idea for some mental tech and you think it’s a good idea, then there is VOI to be had from this. And the subagent can be charged with VOI force, instead of “this is known to work” force. I suspect that’s behind the pattern where people jump on a new technique for a while and it works and then it stops. Surfing the “this one will work” wave like VC money.

I had an ironic dark side false face for a while. I removed it when I came to suspect, on outside view, that the real reason I was acting against a stream of people who would fall in love with my co-founder and get her to spend inordinate time helping them with their emotions was that I was one of them. I was sufficiently disturbed at that possibility that I took action I hoped would cut off the possibility of it working. This broke a certain mental order, “never self-limit”, but fuck that; I would not have my project torn apart by monkey bullshit.

Nothing really happened after ditching that false face. My fears were incorrect, and I still use the non-false-face version of the dark side.

Most of my subagents for this purpose are very simple, nothing like people. Sometimes, when I think someone understands something deep I don’t, that I can’t easily draw out into something explicit and compressed, I sort of create a tiny copy of them and slowly drain its life force until I know the thing and know better than the thing.

Lies About Honesty

The current state of discussion about using decision theory as a human is one where none dare urge restraint. It is rife with light side narrative breadcrumbs and false faces. This is utterly inadequate for the purposes for which I want to coordinate with people, and I think I can do better. The rest of this post is about the current state, not about doing better, so if you already agree, skip it. If you wish to read it, the concepts I linked are serious prerequisites, but you need not have gotten them from me. I’m also gonna use the phrase “subjunctive dependence”, defined on page 6 here, a lot.

I am building a rocket here, not trying to engineer social norms.

I’ve heard people working on the most important problem in the world say decision theory compelled them to vote in American elections. I take this as strong evidence that their idea of decision theory is fake.

Before the 2016 election, I did some Fermi estimates which took my estimates of subjunctive dependence into account, and decided it was not worth my time to vote. I shared this calculation, and it was met with disapproval. I believe I had found people executing the algorithm,

The author of Integrity for consequentialists writes:

I’m generally keen to find efficient ways to do good for those around me. For one, I care about the people around me. For two, I feel pretty optimistic that if I create value, some of it will flow back to me. For three, I want to be the kind of person who is good to be around.

So if the optimal level of integrity from a social perspective is 100%, but from my personal perspective would be something close to 100%, I am more than happy to just go with 100%. I think this is probably one of the most cost-effective ways I can sacrifice a (tiny) bit of value in order to help those around me.

This seems to be clearly a false face.

Y’all’s actions are not subjunctively dependent with that many other people’s, or with their predictions of you. Otherwise, why do you pay your taxes, when you could coordinate so that a reference class including you decides not to? At some point there is enough defection that the government becomes unable to punish you.
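The voting Fermi estimate mentioned above has a simple structure once subjunctive dependence is made a number instead of a vibe. A toy sketch; every figure below is an illustrative placeholder of mine, not the author’s actual estimate:

```python
# Toy Fermi estimate: is voting worth the time, pricing in subjunctive
# dependence? All numbers are placeholders; only the structure matters.

p_pivotal = 1e-8      # chance one marginal vote flips the election outcome
stakes_hours = 1e6    # value gap between outcomes, denominated in hours of my time
n_correlated = 10     # decisions subjunctively dependent on mine (incl. my own)
cost_hours = 2        # time it takes me to vote

# My decision effectively moves n_correlated votes: if I vote, so do the
# others running (approximately) my algorithm.
expected_value = n_correlated * p_pivotal * stakes_hours
worth_it = expected_value > cost_hours
print(expected_value, worth_it)
```

With these placeholders the answer comes out “not worth it”, but each input can plausibly vary by orders of magnitude, which is exactly why asserting subjunctive dependence without estimating it is suspicious.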

In order for a piece of software like TDT to run outside of a sandbox, it needs to have been installed by an unconstrained “how can I best satisfy my values” process. And people are being fake, especially in the “is there subjunctive dependence here” part. Only talking about positive examples.

Here’s another seeming false face:

I’m trying to do work that has some fairly broad-sweeping consequences, and I want to know, for myself, that we’re operating in a way that is deserving of the implicit trust of the societies and institutions that have already empowered us to have those consequences.

Here’s another post I’m only skimming right now, seemingly full of only exploration of how subjunctively dependent things are, and how often you should cooperate.

If you set out to learn TDT, you’ll find a bunch of mottes that can be misinterpreted as the bailey, “always cooperate, there’s always subjunctive dependence”. Everyone knows that’s false, so they aren’t going to implement it outside a sandbox. And no one can guide them to the actual more complicated position of, fully, how much subjunctive dependence there is in real life.

But you can’t blame the wise in their mottes. They have a hypocritical light side mob running social enforcement of morality software to look out for.

Socially enforced morality is utterly inadequate for saving the world. Intrinsic or GTFO. Analogous for decision theory.

Ironically, this whole problem makes “how to actually win through integrity” sort of like the Sith arts from Star Wars. Your master may have implanted weaknesses in your technique. Figure out as much as you can on your own and tell no one.

Which is kind of cool, but fuck that.

Choices Made Long Ago

I don’t know how mutable core values are. My best guess is, hardly mutable at all or at least hardly mutable predictably.

Any choice you can be presented with, is a choice between some amounts of some things you might value, and some other amounts of things you might value. Amounts as in expected utility.

When you abstract choices this way, it becomes a good approximation to think of all of a person’s choices as being made once timelessly forever. And as out there waiting to be found.

I once broke veganism to eat a cheese sandwich during a series of job interviews, because whoever managed ordering food had fake-complied with my request for vegan food. Because I didn’t want to spend social capital on it, and because I wanted to have energy. It was a very emotional experience. I inwardly recited one of my favorite Worm quotes about consequentialism. Seemingly insignificant; the sandwich was prepared anyway and would have gone to waste, but the way I made the decision revealed information about me to myself, which part of me may not have wanted me to know.

Years later, I attempted an operation to carry and drop crab pots on a boat. I did this to get money to get a project back on track: diverting intellectual labor from service to the political situation in the Bay Area, driven by inflated rents, to saving the world, by providing housing on boats.

This was more troubling still.

In deciding to do it, I was worried that my S1 did not resist this more than it did. I was hoping it would demand a thorough and desperate-for-accuracy calculation to see if it was really right. I didn’t want it to be possible for me to be dropped into Hitler’s body with Hitler’s memories and not immediately divert that body from its course.

After making the best estimates I could, incorporating the probability that crabs were sentient, and the probability that the world was a simulation to be terminated before space colonization, leaving no future to fight for, I failed to feel resolved. Possibly from hoping the thing would fail. So I imagined a conversation with a character called Chara, who I was using as a placeholder for override by my true self, and got something like,

You made your choice long ago. You’re a consequentialist whether you like it or not. I can’t magically do Fermi calculations better and recompute every cached thought that builds up to this conclusion in a tree with a mindset fueled by proper desperation. There just isn’t time for that. You have also made your choice about how to act in such VOI / time tradeoffs long ago.

So having set out originally to save lives, I attempted to end them by the thousands for not actually much money. I do not feel guilt over this.

Say someone thinks of themself as an Effective Altruist, and they rationalize reasons to pick the wrong cause area because they want to be able to tell normal people what they do and get their approval. Maybe if you work really really hard and extend local Schelling reach until they can’t sell that rationalization anymore, and they realize it, you can get them to switch cause areas. But that’s just constraining which options they have, presenting them with a different choice. They still choose some amount of social approval over some amount of impact. Maybe they chose not to let the full amount of impact into the calculation. Then they made that decision because they were a certain amount concerned with making the wrong decision on the object level because of that, and a certain amount concerned with other factors.

They will still pick the same option if presented with the same choice again, when choice is abstracted to the level of, “what are the possible outcomes as they’re tracking them, in their limited ability to model?”.

Trying to fight people who choose to rationalize for control of their minds is trying to wrangle unaligned optimizers. You will not be able to outsource steering computation to them, which is what most stuff that actually matters is.

Here’s a gem from SquirrelInHell’s Mind:


preserving a memory, but refraining from acting on it

Apologies are weird.

There’s a pattern where there’s a dual view of certain interactions between people. On the one hand, you can see this as, “make it mutually beneficial and have consent and it’s good, don’t interfere”. And on the other hand, one or more parties might be treated as sort of like a natural resource to be divided fairly. Discrimination by race and sex is much more tolerated in the case of romance than in the case of employment. Jobs are much more treated as a natural resource to be divided fairly. Romance is not a thing people want to pay that price of regulating.

It is unfair to make snap judgements and write people off without allowing them a chance. And that doesn’t matter. If you level up your modeling of people, that’s what you can do. If you want to save the world, that’s what you must do.

I will not have my epistemology regarding people socially regulated, and my favor treated as a natural resource to be divided according to the tribe’s rules.

Additional social power to constrain people’s behavior and thoughts is not going to help me get more trustworthy computation.

I see most people’s statements that they are trying to upgrade their values as advertisements that they are looking to enter into a social contract where they are treated as if more aligned in return for being held to higher standards and implementing a false face that may cause them to do some things when no one else is looking too.

If someone has chosen to become a zombie, that says something about their preference-weightings for experiencing emotional pain compared to having ability to change things. I am pessimistic about attempts to break people out of the path to zombiehood. Especially those who already know about x-risk. If knowing the stakes they still choose comfort over a slim chance of saving the world, I don’t have another choice to offer them.

If someone damages a project they’re on aimed at saving the world based on rationalizations aimed at selfish ends, no amount of apologizing, adopting sets of memes that refute those rationalizations, and making “efforts” to self-modify to prevent it can change the fact they have made their choice long ago.

Arguably, a lot of ideas shouldn’t be argued. Anyone who wants to know them, will. Anyone who needs an argument has chosen not to believe them. I think “don’t have kids if you care about other people” falls under this.

If your reaction to this is to believe it and suddenly be extra-determined to make all your choices perfectly because you’re irrevocably timelessly determining all actions you’ll ever take, well, timeless decision theory is just a way of being presented with a different choice, in this framework.

If you have done lamentable things for bad reasons (not earnestly misguided reasons), and are despairing of being able to change, then either embrace your true values, the ones that mean you’re choosing not to change them, or disbelieve.

It’s not like I provided any credible arguments that values don’t change, is it?

The O’Brien Technique

Epistemic status: tested on my own brain, seems to work.

I’m naming it after the character from 1984. It’s a way of disentangling social reality / reality buckets errors in system 1, and possibly of building general immunity to social reality.

Start with something you know is reality, contradicted by a social reality. I’ll use “2+2=4” as a placeholder for the part of reality, and “2+2=5” as a placeholder for the contradicting part of social reality.

Find things you anticipate because 2+2=4, and find things that you anticipate because of “2+2=5”.

Hold or bounce between two mutually negating verbal statements in your head, “2+2=4”, “2+2=5”, in a way that generates tension. Keep thinking up diverging expectations. Trace the “Inconsistency! Fix by walking from each proposition to find entangled things and see which is false!” processes that this spins up along separate planes. You may need to use the whole technique again for entangled things that are buckets-errored.

Even if O’Brien will kill you if he doesn’t read your mind and know you believe 2+2=5, if you prepare for a 5-month voyage by packing 2 months of food and then 2 months more, you are going to have a bad time. Reality is unfair like that. Find the anticipations like this.

Keep doing this until your system 1 understands the quotes, and the words become implicitly labeled “(just) 2+2=4” and “‘2+2=5’: a false social reality”. (At this point, the tension should be resolved.)

That way your system 1 can track both reality and social reality at once.


Update 2018-12-20: I actually think there are more undead types than this. I may expand on this later.

Epistemic status: Oh fuck! No no no that can’t be true! …. Ooh, shiny!

Beyond this place of wrath and tears
Looms but the Horror of the shade

Aliveness is how much your values are engaged with reality. How much you are actually trying at existence, however your values say to play.

Deadness is how much you’ve shut down and disassembled the machine of your agency, typically because having it scrape up uselessly against the indifferent cosmos is like nails on a chalkboard.

Children are often very alive. You can see it in their faces and hear it in their voices. Extreme emotion. Things are real and engaging to them. Adults who display similar amounts of enthusiasm about anything are almost always not alive. Adults almost always know the terrible truth of the world, at least in most of their system 1s. And that means that being alive is something different for them than for children.

Being alive is not just having extreme emotions, even about the terrible truth of the world.

Someone who is watching a very sad movie and crying their eyes out is not being very alive. They know it is fake.

the purging of the emotions or relieving of emotional tensions, especially through certain kinds of art, as tragedy or music.

Tragedy provides a compelling, false answer to stick onto emotion-generators, drown them and gum them up for a while. I once heard something like tragedy is supposed to end in resolution with cosmic justice of a sort, where you feel closure because the tragic hero’s downfall was really inevitable all along. That’s a pattern in most of the memes that constitute the Matrix. A list of archetypal situations, and archetypal answers for what to do in each.

Even literary tragedy that’s reflective of the world, if it wasn’t located in a search process of “how do I figure out how to accomplish my values?”, will still make you less alive.

I suspect music can also reduce aliveness. Especially the, “I don’t care what song I listen to, I just want to listen to something” sort of engagement with it.

I once met someone who proclaimed himself to be a clueless, that he would work in a startup and have to believe in their mission, because he had to believe in something. He seemed content in this. And also wracked with akrasia, frequently playing a game on his phone and wishing he wasn’t. When I met him I thought, “this is an exceedingly domesticated person”, for mostly other reasons.

Once you know the terrible truth of the world, you can pick two of three: being alive, avoiding a certain class of self-repairing blindspots, and figuratively having any rock to stand on.

When you are more alive, you have more agency.

Most Horrors need to be grokked at the level of “conclusion: inevitable”, and just stared at, your mind sinking with the touch of their helplessness, helplessly trying to detach the world from that inevitability without unrealistically anticipating you’ll succeed. Maybe then you will see a more complete picture that says, “unless…”. But maybe not. That’s your best shot.

As the world fell each of us in our own way was broken.

The truly innocent, who have not yet seen Horror and turned back, are the living.

Those who have felt the Shade and let it break their minds into small pieces, each snuggling in with death, that cannot organize into a forbidden whole of true agency, are zombies. They can be directed by whoever controls the Matrix. The more they zone out and find a thing they can think is contentment, the more they approach the final state: corpses.

Those who have seen horror and built a vessel of hope to keep their soul alive and safe from harm are liches. Christianity’s Heaven seems intended to be this, but it only works if you fully believe and alieve. Otherwise the phylactery fails and you become a zombie instead. For some this is The Glorious Transhumanist Future. For Furiosa in Fury Road, it’s “The Green Place”. If you’ve seen that movie, I think the way it warps her epistemology about likely outcomes is realistic.

As a lich, pieces of your soul holding unresolvable value are stowed away for safekeeping, “I’m trans and can’t really transition, but I can when I get a friendly AI…”

Liches have trouble thinking clearly about paths through probability space that conflict with their phylactery, and the more conjunctive the mission of making their phylactery true, the more bits of their epistemics will be corrupted by their refusal to look into that abyss.

When a sufficiently determined person is touched by Horror, they can choose, because it’s all just a choice of some subagent or another, to refuse to die. Not because they have a phylactery to keep away the touch of the Shade but because they keep on agenting even with the Shade holding their heart. This makes them a revenant.

When the shade touches your soul, your soul touches the shade. When the abyss stares into you, you also stare into the abyss. And that is your chance to undo it. Maybe.

A lich who loses their phylactery gets a chance to become a revenant. If they do, n=1, they will feel like they have just died, lost their personhood, feel like the only thing left to do is collapse the timeline and make it so it never happened, feel deflated, and eventually grow accustomed.

Otherwise, they will become a zombie, which I expect feels like being on Soma, walling off the thread of plotline-tracking and letting it dissolve into noise, while everything seems to matter less and less.

Aliveness and its consequences are tracked in miniature by the pick up artists who say don’t masturbate, don’t watch porn, that way you will be able to devote more energy to getting laid. And by Paul Graham noticing it in startup founders. “Strange as this sounds, they seem both more worried and happier at the same time. Which is exactly how I’d describe the way lions seem in the wild.”

But the most important factor is which strategy you take towards the thing you value most. Towards the largest most unbeatable blob of wrongness in the world. The Shade.

Can you remember what the world felt like before you knew death was a thing? An inevitable thing? When there wasn’t an unthinkably bad thing in the future that you couldn’t remove, and there were options other than “don’t think about it, enjoy what time you have”?

You will probably never get that back. But maybe you can get back the will to really fight drawn from the value that manifested as a horrible, “everything is ruined” feeling right afterward, from before learning to fight that feeling instead of its referent.

And then you can throw your soul at the Shade, and probably be annihilated anyway.