Spectral Sight and Good

Epistemic status update: This model is importantly flawed. I will not explain why at this time. Just, reduce the overall weight you put in it.

Good people are people who have a substantial amount of altruism in their cores.

Spectral sight is a collection of abilities allowing the user to see invisible things like the structure of social interactions, institutions, ideologies, politics, and the inner layers of other people’s minds.

I’m describing good and spectral sight together for reasons, because the epistemics locating each concept are interwoven tightly as I’ve constructed them.

A specific type of spectral sight is the one I’ve shown in neutral and evil. I’m going to be describing more about that.

This is a skill made of being good at finding out what structure reveals about core. Structure is easy to figure out if you already know it’s Real. But often that’s part of the question. Then you have to figure out what it’s a machine for doing, as in: what was the still-present thing that installed it, and could replace it or override it, optimizing for?

It’s not a weirdly parochial definition to call this someone’s true values. Because that’s what will build new structure if the old structure stops doing its job. Lots of people “would” sacrifice themselves to save 5 others. And go on woulding until they actually get the opportunity.

There’s a game lots of rationalists have developed different versions of, “Follow the justification”. I have a variant. “Follow the motivational energy.” There’s a limited amount that neutral people will sacrifice for the greater good, before their structures run out of juice and disappear. “Is this belief system / whatever still working out for me” is a very simple subagent to silently unconsciously run as puppetmaster.

There’s an even smarter version of that, where fake altruistic structure must be charged with Schelling reach in order to work.

Puppetmasters doling out motivational charge to fake structure can include all kinds of other things to make the tails come apart between making good happen and appearing to be trying to make good happen in a way that has good results for the person. I suspect that’s a lot of what the “far away”ness thing that the drowning child experiment exposes is made of. Play with variations of that thought experiment, and pay attention to system 1 judgements, not principles, to feel the thing out. What about a portal to the child? What about a very fast train? What if it was one time teleportation? Is there a consistent cross-portal community?

There is biologically fixed structure in the core, the optimizer for which is no longer around to replace it. Some of it is heuristics toward the use of justice for coordinating for reproducing. Even with what’s baked in, the tails come apart between doing the right thing, and using that perception to accomplish things more useful for reproducing.

My model says neutral people will try to be heroes sometimes. Particularly if that works out for them somehow. If they’re men following high-variance high reward mating strategies, they can be winning even while undergoing significant risk to their lives. That landscape of value can often generate things in the structure class, “virtue ethics”.

Good people seem to have an altruism perpetual motion machine inside them, though, which will persist in moving them through cost in the absence of what would be a reward selfishly.

This is about the least intuitive thing to accurately identify in someone by anything but their long-term history. Veganism is one of the most visible and strong correlates. The most important summaries of what people are like, are the best things to lie about. Therefore they require the best adversarial epistemology to figure out. And they are most common to be used in oversimplifying. This does not make them not worth thinking.

If you use spectral sight on someone’s process of figuring out what’s a moral patient, you’re likely to get one of two kinds of responses. One is something like “does my S1 empathize with it”, the other is clique-making behavior, typically infused with a PR / false-face worthy amount of justice, but not enough to be crazy.

Not knowing this, I was taken by surprise the first time I tried to proselytize veganism to a contractarian. How could anyone actually feel like inability to be a part of a social contract really really mattered?

Of course, moral patiency is an abstract concept, far in Schelling reach away from actual actions. And therefore one of the most thoroughly stretched toward lip-service to whatever is considered most good and away from actual action.

“Moral progress” has been mostly a process of Schelling reach extending. That’s why it’s so predictable. (See Jeremy Bentham.)

Thinking about this requires having calibrated quantitative intuitions on the usefulness of different social actions, and of internal actions. There is instrumental value for the purpose of good in clique-building, and there is instrumental value for the purpose of clique-building in appearing good-not-just-clique-building. You have to look at the algorithm, and its role in the person’s entire life, not just the suggestively named tokens, or token behavior.

When someone’s core acts around structure (akrasia), and self-concepts are violated, that’s a good glimpse into who they really are. Good people occasionally do this in the direction of altruism. Especially shortsighted altruism. Especially good people who are trying to build a structure in the class, “consequentialisms”.

Although I have few datapoints, most of which are significantly suspect, good seems quite durable. Because it is in core, good people who get jailbroken remain good. (Think Adrian Veidt for a fictional example. Such characters often get labeled as evil by the internet. Often good as well.) There are tropes reflecting good people’s ability to shrug off circumstances that by all rights should have turned them evil. I don’t know if that relationship to reality is causal.

By good, I don’t mean everything people are often thinking when they call someone “good”. That’s because that’s as complicated and nonlocal a concept as justice. I’m going for an “understand over incentivize and prescribe behavior” definition here, and therefore insisting that it be a locally-defined concept.

It’s important not to succumb to the halo effect. This is a psychological characteristic. Just because you’re a good person, doesn’t mean you’ll have good consequences. It doesn’t mean you’ll tend to have good consequences. It doesn’t mean you’re not actively a menace. It doesn’t mean you don’t value yourself more than one other person. It’s not a status which is given as a reward or taken away for bad behavior, although it predicts against behavior that is truly bad in some sense. Good people can be dangerously defectbot-like. They can be ruthless, they can exploit people, they can develop structure for those things.

If you can’t thoroughly disentangle this from the narrative definition of good person, putting weight in this definition will not be helpful.

Neutral and Evil

What is the good/neutral/evil axis of Dungeons and Dragons alignment made of?
We’ve got an idea of what it would mean for an AI to be good-aligned: it wants to make all the good things happen so much, and it does.
But what’s the difference between a neutral AI and an evil AI?
It’s tempting to say that the evil AI is malevolent, rather than just indifferent. And the neutral one is indifferent.
But that doesn’t fit the intuitive idea that the alignment system was supposed to map onto, or what alignment is.

Imagine a crime boss who makes a living off of the kidnapping and ransoms of random innocents, while posting videos online of the torture and dismemberment of those whose loved ones don’t pay up as encouragement, not because of sadism, but because they wanted money to spend on lots of shiny gold things they like, and are indifferent to human suffering. Evil, right?

If sufficient indifference can make someone evil, then… If a good AI creates utopia, and an AI that kills everyone and creates paperclips because it values only paperclips is evil, then what is a neutral-aligned AI? What determines the exact middle ground between utopia and everyone being dead?

Would this hypothetical AI leave everyone alive on Earth and leave us our sun but take the light cone for itself? If it did, then why would it? What set of values is that the best course of action to satisfy?

I think you’ve got an intuitive idea of what a typical neutral human does. They live in their house with their white picket fence and have kids and grow old, and they don’t go out of their way to right far away wrongs in the world, but if they own a restaurant and the competition down the road starts attracting away their customers, and they are given a tour through the kitchens in the back, and they see a great opportunity to start a fire and disable the smoke detectors that won’t be detected until it’s too late, burning down the building and probably killing the owner, they don’t do it.

It’s not that a neutral person values the life of their rival more than the additional money they’d make with the competition eliminated, or cares about better serving the populace with a better selection of food in the area. You won’t see them looking for opportunities to spend that much money or less to save anyone’s life.

And unless most humans are evil (which is as against the intuitive concept the alignment system points at as “neutral = indifference”), it’s not about action/inaction either. People eat meat. And I’m pretty sure most of them believe that animals have feelings. That’s active harm, probably.

Wait a minute, did I seriously just base a sweeping conclusion about what alignment means on an obscure piece of possible moral progress beyond the present day? What happened to all my talk about sticking to the intuitive concept?

Well, I’m not sticking to the intuitive concept. I’m sticking to the real thing the intuitive concept pointed at which gave it its worthiness of attention. I’m trying to improve on the intuitive thing.

I think that the behavior of neutral is wrapped up in human akrasia and the extent to which people are “capable” of taking ideas seriously. It’s way more complicated than good.

But there’s another ontology, the ontology of “revealed preferences”, where akrasia is about serving an unacknowledged end or under unacknowledged beliefs, and is about rational behavior from more computationally bounded subagents, and those are the true values. What does that have to say about this?

Everything that’s systematic coming out of an agent is because of optimizing, just often optimizing dumbly and disjointedly if it’s kinda broken. So what is the structure of that akrasia? Why do neutral people have all that systematic structure toward not doing “things like” burning down a rival restaurant owner’s life and business, but all that other systematic structure toward not spending their lives saving more lives than that? I enquoted “things like”, because that phrase contains the question. What is the structure of “like burning down a rival restaurant” here?

My answer: socialization, the light side, orders charged with motivational force by the idea of the “dark path” that ultimately results in justice getting them, as drilled into us by all fiction, false faces necessitated by not being coordinated against on account of the “evil” Schelling point. Fake structure in place for coordinating. If you try poking at the structure most people build in their minds around “morality”, you’ll see it’s thoroughly fake, and bent towards coordination which appears to be ultimately for their own benefit. This is why I said that the dark side will turn most people evil. The ability to re-evaluate that structure, now that you’ve become smarter than most around you, will lead to a series of “jailbreaks”. That’s a way of looking at the path of Gervais-sociopathy.

That’s my answer to the question of whether becoming a sociopath makes you evil. Yes for most people from a definition of evil that is about individual psychology. No from the perspective of you’re evil if you’re complicit in an evil social structure, because then you probably already were, which is a useful perspective for coordinating to enact justice.

If you’re reading this and this is you, I recommend aiming for lawful evil. Keep a strong focus on still being able to coordinate even though you know that’s what you’re doing.

An evil person is typically just a neutral person who has become better at optimizing, more like an unfriendly AI, in that they no longer have to believe their own propaganda. That can be either because they’re consciously lying, really good at speaking in multiple levels with plausible deniability and don’t need to fool anyone anymore, or because their puppetmasters have grown smart enough to be able to reap benefits from defection without getting coordinated against without the conscious mind’s help. That is why it makes no sense to imagine a neutral superintelligent AI.


If Billy takes Bobby’s lunch money, and does this every day, and to try and change that would be to stir up trouble, that’s an order. But, if you’re another kid in the class, you may feel like that’s a pretty messed up order. Why? It’s less just. What does that mean?

What do we know about justice?

Central example of a just thing: In the tribe of 20, they pick an order that includes “everyone it can”. They collapse all the timelines where someone dies or gets enslaved, because in the hypotheticals where someone kills someone, the others agree they are criminal and then punish them sufficiently to prevent them from having decided to do it.

Justice also means that the additional means of hurting people created by it are contained and won’t be used to unjustly hurt people.

The concept of justice is usually a force for fulfillment of justice. Because “are you choosing your order for justice” is a meta-order which holds out a lot of other far-Schelling reaching order-drawing processes based on explicit negotiation of who can be devoured by who, which are progressively harder to predict. Many of which have lots of enemies. So much injustice takes place ostensibly as justice.

There’s another common force deciding orders. A dominance hierarchy is an order shaped mostly by this force. If you want to remove this force, how do you prevent those with the power to implement/reshape the system from doing so in their favor?

Because justice is often about “what happened”, it requires quite a lot of Schelling reach. That’s part of courts’ job.

Perfect Schelling reach for perfect justice is impossible.

And “punish exactly enough so that the criminal would never have committed the crime, weight by consequentialist calculations with probability of miscarriage of justice, probability of failing to get caught, probability of escaping punishment after the judgement…, look for every possible fixed point, pick the best one”, is way, way, too illegible a computation to not be hijacked by whoever’s the formal judge of the process and used to extract favors or something. Therefore we get rules like “an eye for an eye” implementing “the punishment should fit the crime” which are very legible, and remove a lever for someone to use to corrupt the order to serve them.

Intellectual property law is a place where humans have not the Schelling reach to implement a very deep dive into the process of creating a just order. And I bet never will without a singleton.

The point of justice is to be singular. But as you’ve just seen, justice is dependent on the local environment, and how much / what coordination is possible. For instance, it’s just to kill someone for committing murder, if that’s what the law says, and making the punishment weaker will result in too much more murder, making it more discriminating will result in corrupt judges using their power for blackmail too much more. But it’s not just if the law could be made something better and have that work. If we had infinite Schelling reach, it’d be unjust to use any punishment more or less than the decision theoretically optimal given all information we had. All laws are unjust if Schelling reach surpasses them enough.

Separate two different worlds of people in different circumstances, and they will both implement different orders. Different questions that must be answered incorrectly like “how much to punish” will be answered different amounts incorrectly. There will be different more object-level power structures merged into that, different mixed justice-and-dominance orders around how much things can be done ostensibly (and by extension actually) to fix that. There will be different concepts of justice, even.

And yet, we have a concept of just or unjust international relations, including just or unjust international law. And it’s not just a matter of “different cultures, local justice”, “best contacted culture, universal justice”, either. If you think hard enough, you can probably find thought experiments for when a culture with less Schelling reach and more corruption in an internal law is just in enforcing it until people in a culture with better Schelling reach can coordinate to stop that, and then the law is unjust if making it unjust helps the better law win in a coordinatable way. And counterexamples for when they can’t when that coordination is not a good idea according to some coordinatable process.

The process of merging orders justly is propelled by the idea that justice is objective, even though that’s a thing that’s always computed locally, is dependent on circumstances implemented by it, therefore contains loops, and therefore ties in the unjust past.

Who’s found a better order for its job than ownership? But who starts out owning what? Even in places where the killing has mostly died down, it’s controlled to a large extent by ancient wars. It all carries forward forever the circumstances of who was able to kill who with a stick.

And who is “everyone”? I think there are two common answers to that question, and I will save it for another time.

Schelling Orders

The second part of an attempt to describe a fragment of morality. This may sound brutal and cynical. But that’s the gears of this fragment in isolation.

Imagine you have a tribe of 20. Any 19 of them could gang up and enslave the last. But which 19 slavers and which 1 victim? And after that, which 18 slavers and which victim? There are a great many positive-sum-among-participants agreements that could be made. So which ones get made? When does the slaving stop? There are conflicting motives to all these answers.

Ultimately they are all doomed unless at some point enough power is concentrated among those who’d be doomed unless they don’t enslave another person. Call this point a Schelling order. (My old commitment mechanism was an example of this.)

If you have arbitrary power to move Schelling points around, there is no one strong enough left to oppose the coalition of almost everyone. Nothing stands against that power. Plenty of things undermine it and turn it against itself. But, as a slice of the world, directing that power is all there is. Everyone with a single other who would like them dead has to sleep and needs allies who’d retaliate if they were murdered.

Schelling points are decided by the shape of the question, by the interests of the parties involved, and the extent to which different subsets of those involved can communicate among themselves to help the thinking-together process along.

Suppose that the tribe members have no other distinguishing features, and 19 of them have purple skin, and one has green skin. What do you think will happen? (Green-skin gets enslaved, order preserved among purple-skins.)

One example of order is, “whoever kills another tribe member shall be put to death, etc.” Whoever kills therefore becomes the Schelling point for death. Any who fight those who carry out the sentence are Schelling points for death as well. Any attempt to re-coordinate an order after a “temporary” breaking of the first, which does not contain a limit to its use, destroys the ability of the survivors to not kill each other. So the game is all about casuistry in setting up “principled” exceptions.

Criminal means you are the Schelling point. Politics is about moving the Schelling laser to serve you. When you are under the Schelling laser, you don’t get your lunch money taken because “they have power and they can take lunch money from the innocent”. You get your lunch money taken because “that is the just way of things. You are not innocent until you make amends for your guilt with your lunch money.” If you want to really understand politics, use the O’Brien technique on all the dualities here, quoted and unquoted versions of every contested statement you see.

Suppose that in addition to that, they all have stars on their bellies except one of the purple-skinned tribe-members. Then what do you think will happen? (Green-skin and blank-belly get enslaved, order preserved among the remaining.)
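
The two tribe scenarios so far share one simple rule: whoever is the lone exception on some otherwise-universal trait becomes the Schelling point. Here is a minimal sketch of only that rule. The function name `singled_out`, the member names, and the trait labels are illustrative assumptions, not anything from the scenarios themselves, and the sketch deliberately ignores the politics over which traits count.

```python
from collections import Counter


def singled_out(tribe: dict[str, dict[str, str]]) -> set[str]:
    """Return members who are the lone exception on at least one trait.

    Assumes every member has a value for every trait (an assumption of
    this sketch, not of the thought experiment).
    """
    targets: set[str] = set()
    if not tribe:
        return targets
    traits = next(iter(tribe.values())).keys()
    for trait in traits:
        counts = Counter(member[trait] for member in tribe.values())
        # A trait singles someone out only if everyone else shares one value.
        if len(counts) != 2:
            continue
        for name, member in tribe.items():
            if counts[member[trait]] == 1:
                targets.add(name)
    return targets


# Hypothetical tribe of 20: all purple-skinned and star-bellied at first.
tribe = {f"m{i}": {"skin": "purple", "belly": "star"} for i in range(20)}
tribe["m0"]["skin"] = "green"    # the one green-skin
tribe["m1"]["belly"] = "blank"   # the one purple-skin without a star

print(sorted(singled_out(tribe)))  # ['m0', 'm1']
```

The sketch treats every trait as equally salient, which is exactly the assumption the tribe members have incentives to fight over.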

What if there are 18 more almost-universal traits that each single a different person out? Well, something like “this one, this one, this one, this one… are not things to single someone out over. That would be discrimination. And of course it is the Green-skin’s God-given purpose to be of service to society!” Which trait is the Schelling trait? 19 people have an incentive to bring Schelling reach to that process, and 1 person has an incentive to derail it. One of the 19 is incentivized only so long as they can keep Schelling reach away from the second trait, one of them so long as they can keep it away from the third… Each of them is incentivized to bring a different amount of legibility and no more. Each one is incentivized to bring confusion after a certain point.

Sound familiar?

First they came for the Socialists, and I did not speak out—
Because I was not a Socialist.

Then they came for the Trade Unionists, and I did not speak out—
Because I was not a Trade Unionist.

Then they came for the Jews, and I did not speak out—
Because I was not a Jew.

Then they came for me—and there was no one left to speak for me.

Each individual is incentivized to make the group believe that the order they’d have to construct after the one that would take what that individual has, is as untenable as possible, and many more would be hurt before another defensible Schelling point was reached. Or better yet, that there would be no Schelling point afterwards, and they’d all kill each other.

Everyone has an incentive to propagate concepts that result in coordination they approve of, and an incentive to sabotage concepts that result in better alternatives for their otherwise-allies, or that allow their enemies to coordinate.

So the war happens at every turn of thought reachable through politics. Scott Alexander has written some great stuff on the details of that.

Schelling Reach

This is the beginning of an attempt to give a reductionist account of a certain fragment of morality in terms of Schelling points. To divorce it from the halo effect and show its gears for what they are and are not. To show what controls it and what limits and amplifies its power.

How much money would you ask for, if you and I were both given this offer: “Each of you name an amount of money without communicating until both numbers are known. If you both ask for the same amount, you both get that amount. Otherwise you get nothing. You have 1 minute to decide.”?

Now would be a good time to pause in reading, and actually decide.

My answer is the same as the first time I played this game. Two others decided to play it while I was listening, and I decided to join in and say my answer afterward.

Player 1 said $1 million.
Player 2 said $1 trillion.
I said $1 trillion.
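
The rules of the offer are small enough to write down as a payoff function. This is a minimal sketch of the two-player rule as stated; the names `payoff`, `MILLION`, and `TRILLION` are illustrative and not part of the original game.

```python
def payoff(my_ask: int, their_ask: int) -> int:
    """Return what I receive: my ask if both asks match, otherwise nothing."""
    return my_ask if my_ask == their_ask else 0


MILLION = 10**6
TRILLION = 10**12

# The answers reported above, paired off two at a time.
print(payoff(MILLION, TRILLION))   # mismatch: 0
print(payoff(TRILLION, TRILLION))  # match: 1000000000000
```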

Here was my reasoning process for picking $1 trillion.

Okay, how do I maximize utility?
Big numbers that are Schelling points…
3^^^3, Graham’s number, BB(G), 1 googolplex, 1 googol…
3^^^3 is a first-order Schelling point among this audience because it’s quick to spring to mind, but looks like it’s not a Schelling point, because it’s specific to this audience. Therefore it’s not a Schelling point.
Hold on, all of these would destroy the universe.
Furthermore, at sufficiently large amounts of money, the concept of the question falls apart, as it then becomes profitable for the whole world to coordinate against you and grab it if necessary. What does it even mean to have a googol dollars?
Okay, normal numbers.
million, billion, trillion, quadrillion…
Those are good close to Schelling numbers, but not quite.
There’s a sort of force pushing toward higher numbers. I want to save the world. $1 million is enough for an individual to not have to work their whole life. It is not enough to make saving the world much easier though. My returns are much less diminishing than normal. This is the community where we pretend to want to save the world by engaging with munchkinny thought experiments about that. This should be known to the others.
The force is, if they have more probability of picking a million than of picking a billion, many of the possible versions of me believing that pick a billion anyway. And the more they know that, the more they want to pick a billion… this process terminates at picking a billion over a million, a trillion over a billion …
The problem with ones bigger than a million is that, you can always go one more. Which makes any Schelling point locating algorithm have to depend on more colloquial and thus harder to agree on reliably things.
These are both insights I expect the others to be able to reach.
The computation in figuring just how deep that recursive process goes is hard, and “hard”. Schelling approximation: it goes all the way to the end.
Trillion is much less weird than Quadrillion. Everything after that is obscure.
Chances of getting a trillion way more than a quadrillion, even contemplating going for a quadrillion reduces ability to go for anything more than a million.
But fuck it, not stopping at a million. I know what I want.
$1 trillion.

All that complicated reasoning. And it paid off; the other person who picked a trillion had a main-line thought process with the same load bearing chain of thoughts leading to his result.

I later asked another person to play against my cached number. He picked $100.

Come on, man.

Schelling points determine everything. They are a cross-section of the support structure for the way the world is. Anything can be changed by changing Schelling points. I will elaborate later. Those who seek the center of all things and the way of making changes should pay attention to dynamics here, as this is a microcosm of several important parts of the process.

There’s a tradeoff axis between, “easiest Schelling point to make the Schelling point and agree on, if that’s all we cared about” (which would be $0), and “Schelling point that serves us best”, a number too hard to figure out, even alone.

The more thought we can count on from each other, the more we can make Schelling points serve us.
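
One way to feel out that tradeoff is a crude expected-payoff calculation: the chance the other player names the same amount, times the amount. The sketch below assumes a made-up belief distribution over what the other player will say; the numbers in `beliefs` and the names `expected_payoff` and `best_ask` are hypothetical. It is only one path a player might or might not choose to run.

```python
from fractions import Fraction


def expected_payoff(ask: int, partner_beliefs: dict[int, Fraction]) -> Fraction:
    """Probability that the partner names the same amount, times that amount."""
    return partner_beliefs.get(ask, Fraction(0)) * ask


def best_ask(partner_beliefs: dict[int, Fraction]) -> int:
    """Pick the candidate amount with the highest expected payoff."""
    return max(partner_beliefs, key=lambda ask: expected_payoff(ask, partner_beliefs))


# Hypothetical beliefs about the partner's answer (assumed, not measured).
beliefs = {
    0: Fraction(6, 10),       # easiest point to agree on, worth nothing
    10**6: Fraction(3, 10),   # $1 million
    10**12: Fraction(1, 10),  # $1 trillion
}

for ask in beliefs:
    print(ask, expected_payoff(ask, beliefs))

print(best_ask(beliefs))  # 1000000000000
```

Under these assumed numbers the larger amount wins despite being the least likely match, because the payoff grows much faster than the probability shrinks.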

My strategy is something like:

  • Locate a common and sufficiently flexible starting point.
  • Generate options for how to make certain decisions leading up to the thing, at every meta level.
  • Try really hard to find all the paths that any process you might both want to run, and both be able to run, can go down.
  • Find some compromise between best and most likely, which will not be just a matter of crunching expected utility numbers. An expected utility calculation is a complicated piece of thought, it’s just another path someone might or might not choose, and if you can Schelling up that whole expected utility calculation even when it points you to picking something less good but more probable, then it’s because you already Schellinged up all the options you’d explicitly consider, and a better, more common, and easier Schelling step from there is just to pick the highest one.
  • Pay attention to what the perfectly altruistic procedure does. It’s a good Schelling point. Differences between what people want and all the games that ensue from that are complicated. You can coordinate better if you delete details, and for the both of you, zero-sum details aren’t worth keeping around.
  • Be very stag-hunt-y.
  • You will get farther the more you are thinking about the shape of the problem space and the less you are having to model the other person’s algorithm in its weakness, and how they will model you modeling their weakness in your weakness, in their weakness.

Call how far you can get before you can’t keep your thought processes together anymore “Schelling reach”.

It’s a special case to have no communication. In reality, Schelling reach is helped by communicating throughout the process. And there would be stronger forces acting against it.


Something I’ve been building up to for a while.

Epistemic status: Examples are real. Technique seems to work for me, and I don’t use the ontology this is based on and sort of follows from for no reason, but I’m not really sure of all the reasons I believe it, it’s sort of been implicit and in the background for a while.

Epistemic status update 2018-04-22: I believe I know exactly why this works for me and what class of people it will work for and that it will not work for most people, but will not divulge details at this time.

The theory

There is core and there is structure. Core is your unconscious values, that produce feelings about things that need no justification. Structure is habits, cherished self-fulfilling prophecies like my old commitment mechanism, self-image that guides behavior, and learned optimizing style.

Core is simple, but its will is unbreakable. Structure is a thing core generates and uses according to what seems likely to work. Core is often hard to see closely. Its judgements are hard to extrapolate to the vast things in the world beyond our sight that control everything we care about and that might be most of what we care about. There is fake structure, in straightforward service to no core, but serving core through its apparent not-serving of that core, or apparent serving a nonexistent core, and there is structure somewhat serving core but mixed up with outside influence.

Besides that there is structure that is in disagreement with other structure, built in service to snapshots of the landscape of judgement generated by core. That’s an inefficient overall structure to build to serve core, with two substructures fighting each other. Fusion happens at the layer of structure, and is to address this situation. It creates a unified structure which is more efficient.

(S2 contains structure and no core. S1 contains both structure and core.)

You may be thinking at this point, “okay, what are the alleged steps to accomplish fusion?”. This is not a recipe for some chunk of structure directing words and following steps to try rationality techniques to follow, to make changes to the mind, to get rid of akrasia. Otherwise it would fall prey to “just another way of using willpower” just like every other one of those.

It almost is though. It’s a thing to try with intent. The intent is what makes it un-sandboxed. Doing it better makes the fused agent smarter. It must be done with intent to satisfy your true inner values. If you try to have intent to satisfy your true inner values as a means to satisfy externally tainted values, or values / cached derived values that are there to keep appearances, not because they are fundamental, or let some chunk of true inner value win out over other true inner value, it will not work. If you start out the process / search with the wrong intent, all you can do is stop. You can’t correct your intent as a means of fulfilling your original intent. Just stop, and maybe you will come back later when the right intent becomes salient. The more you try, the more you’ll learn to distrust attempts to get it right. Something along the lines of “deconstruct the wrong intent until you can rebuild a more straightforward thing that naturally lets in the rest” is probably possible, but if you’re not good at the dark side, you will probably fail at that. It’s not the easiest route.

In Treaties vs Fusion, I left unspecified what the utility function of the fused agent would be. I probably gave a misimpression, that it was negotiated in real time by the subagents involved, and then they underwent a binding agreement. Binding agreement is not a primitive in the human brain. A description I can give that’s full of narrative is, it’s about rediscovering the way in which both subagents were the same agent all along, then what was that agent’s utility function?

To try and be more mechanical about it, fusion is not about closing off paths, but building them. This does not mean fusion can’t prevent you from doing things. It’s paths in your mind through what has the power and delegates the power to make decisions, not paths in action-space. Which paths are taken when there are many available is controlled by deeper subagents. You build paths for ever deeper puppetmasters to have ever finer control of how they use surface level structure. Then you undo from its roots the situation of “two subagents in conflict because of only tracking a part of a thing”.

The subagents that decide where to delegate power seem to use heavily the decision criteria, “what intent was this structure built with?”. That is why to build real structure of any sort, you must have sincere intent to use it to satisfy your own values, whatever they are. There are an infinity of ways to fuck it up, and no way to defend against all of them, except through wanting to do the thing in the first place because of sincere intent to satisfy your own values, whatever they are.

In trying to finish explaining this, I’ve tried listing out a million safeguards to not fuck it up, but in reality I’ve also done fusion haphazardly, skipping such safeguards, for extreme results, just because at every step I could see deeply that the approximations I was using, the value I was neglecting, would not likely change the results much, and that to whatever extent it did, that was a cost and I treated it as such.

Well-practiced fusion example

High-stakes situations are where true software is revealed in a way that you can be sure of. So here’s an example, when I fused structure for using time efficiently, and structure for avoiding death.

There was a time that me and the other co-founders of Rationalist Fleet were trying to replace lines going through the boom of a sailboat, therefore trying to get it more vertical so that they could be lowered through. The first plan involved pulling it vertical in place, then the climber, Gwen, tying a harness out of rope to climb the mast and get up to the top and lower a rope through. Someone raised a safety concern, and I pulled up the cached thought that I should analyze it in terms of micromorts.

My cached thoughts concerning micromorts were: a micromort was serious business. Skydiving was a seriously reckless thing to do, not the kind of thing someone who took expected utility seriously would do, because of the chance of death. I had seen someone on Facebook pondering if they were “allowed” to go skydiving, for something like the common-in-my-memeplex reasons of, “all value in the universe is after the singularity, no chance of losing billions of years of life is worth a little bit of fun” and/or “all value in the universe is after the singularity, we are at a point of such insane leverage to adjust the future that we are morally required to ignore all terminal value in the present and focus on instrumental value”, but I didn’t remember what was my source for that. So I asked myself, “how much inconvenience is it worth to avoid a micromort? How much weight should I feel attached to this concept to use that piece of utility comparison and attention-orienting software right?”

Things I can remember from that internal dialog mashed together probably somewhat inaccurately, probably not inaccurately in parts that matter.

How much time is a micromort? Operationalize as: how much time is a life? (implicit assumptions: all time equally valuable, no consequences to death other than discontinuation of value from life. Approximation seems adequate). Ugh AI timelines, what is that? Okay, something like 21 years on cached thought. I can update on that. It’s out of date. Approximation feels acceptable…. Wait, it’s assuming median AI timelines are… the right thing to use here. Expected doesn’t feel like it obviously snaps into place as the right answer, I’m not sure which thing to use for expected utility. Approximation feels acceptable… wait, I am approximating utility from me being alive after the singularity as negligible compared to utility from my chance to change the outcome. Feels like an acceptable approximation here. Seriously? Isn’t this bullshit levels of altruism, as in exactly what system 2 “perfectly unselfish” people would do, valuing your own chance at heaven at nothing compared to the chance to make heaven happen for everyone else? …. I mean, there are more other people than there are of me… And here’s that suspicious “righteous determination” feeling again. But I’ve gotten to this point by actually checking at every point if this really was my values… I guess that pattern seems to be continuing if there is a true tradeoff ratio between me and unlimited other people I have not found it yet… at this level of resolution this is an acceptable approximation… Wait, even though chances extra small because this is mostly a simulation? … Yes. Oh yeah, that cancels out…. so, <some math>, 10 minutes is 1 micromort, 1 week is 1 millimort. What the fuck! <double check>. What the fuck! Skydiving loses you more life from the time it takes than the actual chance of death! Every fucking week I’m losing far more life than all the things I used to be afraid of! Also, VOI on AI timelines will probably adjust my chance of dying to random crap on boats by a factor of about 2! …
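The “<some math>” step is not spelled out, so the following is only a rough reconstruction of the arithmetic, not the actual calculation from that dialog. It assumes the stated simplifications (all remaining time equally valuable, death costs nothing beyond the remaining time) and takes the cached 21-year figure as the horizon. The function names are hypothetical.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average calendar year, in minutes


def minutes_per_micromort(years_remaining: float) -> float:
    """Time-equivalent of a one-in-a-million chance of death, in minutes."""
    return years_remaining * MINUTES_PER_YEAR / 1_000_000


def days_per_millimort(years_remaining: float) -> float:
    """Time-equivalent of a one-in-a-thousand chance of death, in days."""
    return years_remaining * 365.25 / 1_000


# Assumption: the "21 years on cached thought" figure is the whole horizon.
HORIZON_YEARS = 21

if __name__ == "__main__":
    print(f"1 micromort ~ {minutes_per_micromort(HORIZON_YEARS):.1f} minutes")
    print(f"1 millimort ~ {days_per_millimort(HORIZON_YEARS):.1f} days")
```

Under those assumptions a micromort comes out at about 11 minutes and a millimort at about 7.7 days, which matches the rounded “10 minutes” and “1 week” above.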

Losing time started feeling like losing life. I felt much more expendable, and significantly less like learning everything perfectly and slowly closing in on the optimal strategy for everything was the best idea. I was less automatically inclined to just check off meta boxes until I had the perfect system before really living my life.

This fusion passed something of a grizzly bear test when another sailboat’s rudder broke in high winds later. It was spinning out of control, being tossed by ~4ft wind waves, and being pushed by the current and wind on a collision course for a large metal barge, and I had to trade off summoning the quickest rescue against downstream plans being disrupted by the political consequences of that.

This fusion is acknowledgedly imperfect, and skimps noticeably toward the purpose of checking off normal-people-consider-them-different fragments of my value individually. Yet the important thing was that the relevant parts of me knew it was a best effort to satisfy my total values, whatever they were. And if I ever saw a truth obscured by that approximation, of course I’d act on that, and be on the lookout for things around the edges of it like that. The more your thoughts tend to be about trying to use structure, when appropriate, to satisfy your values whatever they are, the easier fusion becomes.

Once you have the right intent, the actual action to accomplish fusion is just running whatever epistemology you have to figure out anew what algorithms to follow to figure out what actions to take to satisfy your values. If you have learned to lean hard on expected utility maximization like me, and are less worried about the lossiness in the approximations required to do that explicitly on limited hardware than you are about the lossiness in doing something else, you can look at a bunch of quantities representing things you value in certain ranges where the value is linear in how much of them, and try and feel out tradeoff ratios, and what those are conditional on so you know when to abandon that explicit framework, how to notice when you are outside approximated linear ranges, or when there’s an opportunity to solve the fundamental problems that some linear approximations are based on.
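A minimal sketch of what “quantities that are linear within certain ranges, plus tradeoff ratios” could look like if written down explicitly. Everything here is an illustrative assumption: the names `LinearTerm`, `evaluate` and `MODEL` are hypothetical, the ranges are made up, and the only number taken from the text is the rough ratio of 10 minutes per micromort. The point of the second return value is the “know when to abandon that explicit framework” part: it lists the quantities that have left the range where the linear approximation is trusted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LinearTerm:
    weight: float  # value per unit, in one common unit (here: minutes of life)
    low: float     # bounds of the range where linearity is trusted
    high: float


def evaluate(option: Dict[str, float],
             model: Dict[str, LinearTerm]) -> Tuple[float, List[str]]:
    """Return the linear total and the quantities outside their trusted range."""
    total = 0.0
    out_of_range: List[str] = []
    for name, amount in option.items():
        term = model[name]
        total += term.weight * amount
        if not (term.low <= amount <= term.high):
            out_of_range.append(name)
    return total, out_of_range


# Hypothetical model: ranges are invented for illustration only.
MODEL = {
    "minutes_saved": LinearTerm(weight=1.0, low=-10_000, high=10_000),
    "micromorts": LinearTerm(weight=-10.0, low=0, high=1_000),
}

if __name__ == "__main__":
    print(evaluate({"minutes_saved": 120, "micromorts": 5}, MODEL))
    print(evaluate({"minutes_saved": 30, "micromorts": 50_000}, MODEL))
```

A non-empty list in the second position is a signal to stop trusting the number and go back to thinking about the underlying values directly.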

The better you learn what structure is really about, the more you can transform it into things that look more and more like expected utility maximization. As long as expected utility maximization is a structure you have taken up because of its benefits to your true values. (Best validated through trial and error in my opinion.)

Fusion is a dark side technique because it is a shortcut in the process of building structure outward, a way to deal with computational constraints, and make use of partial imperfect existing structure.

If boundaries between sections of your value are constructed concepts, then there is no hard line between fusing chunks of machinery apparently aimed at broadly different subsets of your value, and fusing chunks of machinery aimed at the same sets of values. Because from a certain perspective, neglecting all but some of your values is approximating all of your values as some of your values. Approximating as in an inaccuracy you accept for reasons of computational limits, but which is nonetheless a cost. And that’s the perspective that matters because that’s what the deeper puppetmasters are using those subagents as.

By now, it feels like wrestling with computational constraints and trying to make approximations wisely to me, not mediating a dispute. Which is a sign of doing it right.

Early fusion example

Next I’ll present an older example of a high-stakes fusion of mine, which was much more like resolving a dispute, therefore with a lot more mental effort spent on verification of intent, and some things which may not have been necessary because I was fumbling around trying to discover the technique.

The context:

It had surfaced to my attention that I was trans. I’m not really sure how aware of that I was before. In retrospect, I remember thinking so at one point about a year earlier, deciding, “transition would interfere with my ability to make money due to discrimination, and destroy too great a chunk of my tiny probability of saving the world. I’m not going to spend such a big chunk of my life on that. So it doesn’t really matter, I might as well forget about it.” Which I did, for quite a while, even coming to think for a while that a later date was the first time I realized I was trans. (I know a trans woman who I knew before social transition who was taking hormones then, who still described herself as realizing she was trans several months later. And I know she had repeatedly tried to get hormones years before, which says something about the shape of this kind of realization.)

At the time of this realization, I was in the midst of my turn to the dark side. I was valuing highly the mental superpowers I was getting from that, and this created tension. I was very afraid that I had to choose either to embrace light side repression, thereby suffering and being weaker, or transition and thereafter be much less effective. In part because the emotions were disrupting my sleep. In part because I had never pushed the dark side this far, and I expected that feeling emotions counteracting these emotions all the time, which is what I expected to be necessary for the dark side to “work”, was impossible. There wasn’t room in my brain for that much emotion at once and still being able to do anything. So I spent a week not knowing what to do, feeling anxious, not being able to really think about work, and not being able to sleep well.

The fusion:

One morning, biking to work, my thoughts still consumed by this dilemma, I decided not to use the light side. “Well, I’m a Sith now. I am going to do what I actually [S1] want to no matter what.” If not transitioning in order to pander to awful investors later on, and to have my entire life decided by those conversations was what I really wanted, I wouldn’t stop myself, but I had to actually choose it, constantly, with my own continual compatibilist free will.

Then I suddenly felt viscerally afraid of not being able to feel all the things that mattered to me, or of otherwise screwing up the decision. Afraid of not being able to foresee how bad never transitioning would feel. Afraid of not understanding what I’d be missing if I was never in a relationship because of it. Afraid of not feeling things over future lives I could impact just because of limited ability to visualize them. Afraid of deceiving myself about my values in the direction that I was more altruistic than I was, based on internalizing a utility function society had tried to corrupt me with. And I felt a thing my past self chose to characterize as “Scream of the Sword of Good (not outer-good, just the thing inside me that seemed well-pointed to by that)”, louder than I had before.

I re-made rough estimates for how much suffering would come from not transitioning, and how much loss of effectiveness would come from transitioning. I estimated a 10%-40% reduction in expected impact I could have on the world if I transitioned. (At that time, I expected that most things would depend on business with people who would discriminate, perhaps subconsciously. I was 6’2″ and probably above average in looks as a man, which I thought’d be a significant advantage to give up.)

I sort of looked in on myself from the outside, and pointed my altruism thingy on myself, and noted that it cared about me, even as just-another-person. Anyone being put in this situation was wrong, and that did not need to be qualified.

I switched to thinking of it from the perspective of virtue ethics, because I thought of that as a separate chunk of value back then. It was fucked up that whatever thing I did, I was compromising in who I would be.

The misfit with my body and the downstream suffering was a part of the Scream.

I sort of struggled mentally within the confines of the situation. Either I lost one way, or I lost the other. My mind went from bouncing between them to dwelling on the stuckness of having been forked between them. Which seemed just. I imagined that someone making Sophie’s Choice might allow themselves to be divided, “Here is a part of me that wants to save this child, and here is a part of me that wants to save that child, and I hate myself for even thinking about not saving this child, and I hate myself for even thinking about not saving that child. It’s tearing me apart…”, but the just target of their fury would have been whoever put them in that fork in the first place. Being torn into belligerent halves was making the wrongness too successful.

My negative feelings turned outward, and merged into a single felt sense of bad. I poked at the unified bad with two plans to alleviate it. Transition and definitely knock out this source of bad, or don’t transition and maybe have a slightly better chance of knocking out another source of bad.

I held in mind the visceral fear of deceiving myself in the direction of being more altruistic than I was. I avoided a train of thought like, “These are the numbers and I have to multiply out and extrapolate…” When I was convinced that I was avoiding that successfully, and just seeing how I felt about the raw things, I noticed I had an anticipation of picking “don’t transition”, whereas when I started this thought process, I had sort of expected it to be a sort of last double check / way to come to terms with needing to give things up in order to transition.

I reminded myself, “But I can change my mind at any time. I do not make precommitments. Only predictions.” I reminded myself that my estimate of the consequences of transitioning was tentative and that a lot of things could change it. But conditional on that size of impact, it seemed pretty obvious to me that trying to pull a Mulan was what I wanted to do. There were tears in my eyes and I felt filled with terrible resolve. My anxiety symptoms went away over the next day. I became extremely productive, and spent pretty much every waking hour over the next month either working or reading things to try to understand strategy for affecting the future. Then I deliberately tried to reboot my mind starting with something more normal because I became convinced the plan I’d just put together and started preliminary steps of was negative in expectation, and predictably because I was running on bitter isolation and Overwhelming Determination To Save The World at every waking moment. I don’t remember exactly how productive I was after that, but there was much less in-the-moment-strong-emotional-push-to-do-the-next-thing. I had started a shift toward a mental architecture that was much more about continually rebuilding ontology than operating within it.

I became somewhat worried that the dark side had stopped working, based on strong emotions being absent, although, judging from my actions, I couldn’t really point to something that I thought was wrong. I don’t think it had stopped working. Two lessons there are, approximately: emotions are about judgements of updates to your beliefs. If you are not continually being surprised somehow, you should not be expected to continually feel strong emotions. And, being strongly driven to accomplish something when you know you don’t know how, feels listlessly frustrating when you’re trying to take the next action: figure out what to do from a yang perspective, but totally works. It just requires yin.

If you want to know how to do this: come up with the best plan you can, ask, “will it work?”, ask yourself if you are satisfied with the (probably low) probability you came up with. If it does not automatically feel like, “Dang, this is so good, explore done, time to exploit”, which it probably actually won’t unless you use hacky self-compensating heuristics to do that artificially, or it’s a strongly convergent instrumental goal bottlenecking most of what else you could do. If you believe the probability that the world will be saved (say), is very small, do not say, “Well, I’m doing my part”, unless you are actually satisfied to do your part and then for the world to die. Do not say, “This is the best I can do, I have to do something”, unless you are actually satisfied to do your best, and to have done something, and then for the world to die. That unbearable impossibility and necessity is your ability to think. Stay and accept its gifts of seeing what won’t work. Move through all the ways of coming up with a plan you have unless you find something that is satisfying. You are allowed to close in on an action which will give a small probability of success, and consume your whole life, but that must come out of the even more terrible feeling of having exhausted all your ability to figure things out. I’d be surprised if there wasn’t a plan to save the world that would work if handed to an agenty human. If one plan seems to absorb every plan, and yet still doesn’t seem like you understand the inevitability of only that high a probability of success, then perhaps your frame inevitably leads into that plan, and if that frame cannot be invalidated by your actions, then the world is doomed. Then what? (Same thing, just another level back.)

Being good at introspection, and determining what exactly was behind a thought is very important. I’d guess I’m better at this than anyone who hasn’t deliberately practiced it for at least months. There’s a significant chunk of introspective skill which can be had from not wanting to self-deceive, but some of it is actually just objectively hard. It’s one of several things that can move you toward a dark side mental architecture, which all benefit from each other, to making the pieces actually useful.


This is theorizing about how mana works and its implications.

Some seemingly large chunks of stuff mana seems to be made of:

  • Internal agreement. The thing that doles out “willpower”.
  • Ability to not use the dehumanizing perspective in response to a hostile social reality.

I’ve been witness to and a participant in a fair bit of emotional support in the last year. I seem to get a lot less from it than my friends. (One claims suddenly having a lot more ability to “look into the dark” on suddenly having reliable emotional support for the first time in a while, leading to some significant life changes.) I think high mana is why I get less use. And I think I can explain at a gears level why that is.

Emotional support seems to be about letting the receiver have a non-hostile social reality. This I concluded from my experience with it, without really having checked against common advice for it, based on what seems to happen when I do the things that people seem to call emotional support.

I googled it. If you don’t have a felt sense of the mysterious thing called “emotional support” to search and know this to be true, then from some online guides, here are some supporting quotes.

From this:

  • “Also, letting your partner have the space he or she needs to process feelings is a way of showing that you care.”
  • “Disagree with your partner in a kind and loving way. Never judge or reject your mates ideas or desires without first considering them. If you have a difference of opinion that’s fine, as long as you express it with kindness.”
  • “Never ignore your loved one’s presence. There is nothing more hurtful than being treated like you don’t exist.”

From this:

  • “Walk to a private area.”
  • “Ask questions. You can ask the person about what happened or how she’s feeling. The key here is to assure her that you’re there to listen. It’s important that the person feels like you are truly interested in hearing what she has to say and that you really want to support her.”
  • “Part 2 Validating Emotions”
  • “Reassure the person that her feelings are normal.”

I think I know what “space” is. And mana directly adds to it. Something like, amount of mind to put onto a set of propositions which you believe. I think it can become easier to think through implications of what you believe is reality, and decide what to do, when you’re not also having part of you track a dissonant social reality. I’ve seen this happen numerous times. I’ve effectively “helped” someone make a decision just by sitting there and listening through their decision process.

The extent to which the presence of a differing social reality fucks up thinking is continuous. Someone gives an argument, and demands a justification from you for believing something, and it doesn’t come to mind, and you know you’re liable to be made to look foolish if you say “I’m not sure why I believe this, but I do, confidently, and think you must be insane and/or dishonest for doubting it”, which is often correct. I believe loads of things that I forget why I believe, and could probably figure out why, often only because I’m unusually good at that. But you have to act as if you’re doubting yourself or allow coordination against you on the basis that you’re completely unreasonable, and your beliefs are being controlled by a legible process. And that leaks, because of buckets errors between reality and social reality at many levels throughout the mind. (Disagreeing, but not punishing the person for being wrong, is a much smaller push on the normal flow of their epistemology. Then they can at least un-miredly believe that they believe it.)

There’s a “tracing the problem out and what can be done about it” thing that seems to happen in emotional support, which I suspect is about rebuilding beliefs about what’s going on and how to feel about it, independent of intermingling responsibilities with defensibility. And that’s why feelings need to be validated. How people should feel about things is tightly regulated by social reality, and feelings are important intermediate results in most computations people (or at least I) do.

Large mana differences allow mind-control power, for predictable reasons. That’s behind the “reality-warping” thing Steve Jobs had. I once tried to apply mana to get a rental car company to hold to a thing they said earlier over the phone which my plans were counting on. And accidentally got the low-level employee I was applying mana to to offer me a 6-hour car ride in her own car. (Which I declined. I wanted to use my power to override the policy of the company in a way that did not get anyone innocent in trouble, not enslave some poor employee.)

The more you shine the light of legibility, required defensibility and justification, public scrutiny of beliefs, social reality that people’s judgement might be flawed and they need to distrust themselves and have the virtue of changing their minds, the more those with low mana get their souls written into by social reality.  I have seen this done for reasons of Belief In Truth And Justice. Partially successfully. Only partially successfully because of the epistemology-destroying effects of low mana. I do not know a good solution to that. If you shine the light on deep enough levels of life-planning, as the rationality community does, you can mind control pretty deep, because almost everyone’s lying about what they really want. The general defense against this is akrasia.

Unless you have way way higher mana than everyone else, your group exerts a strong push on your beliefs. Most social realities are full of important lies, especially lies about how to do the most good possible. Because that’s in a memetic war-zone because almost everyone is really evil-but-really-bad-at-it. I do not know how to actually figure out much needed original things to get closer to saving the world while stuck in a viscous social reality.

I almost want to say, that if you really must save the world, “You must sever your nerve cords. The Khala is corrupted”. That’ll have obviously terrible consequences, which I make no claim you can make into acceptable costs, but I note that even I have done most of the best strategic thinking in my life in the past year, largely living with a like-minded person on a boat, rather isolated. That while doing so, I started focusing on an unusual way of asking the question of what to do about the x-risk problem, that dodged a particular ill effect of relying on (even rare actual well-intentioned people’s) framings.

I’ve heard an experienced world-save-attempter recommend having a “cover story”, sort of like a day job, such as… something something PhD, in order to feel that your existence is justified to people, an answer to “what do you work on” and not have that interfering with the illegibly actually important things you’re trying. Evidence it’s worth sacrificing a significant chunk of your life just to shift the important stuff away from the influence of the Khala.

Almost my entire blog thus far has been about attempted mana upgrades. But recognizing I had high mana before I started using any of these techniques makes me a little less optimistic about my ability to teach. I do think my mana has increased a bunch in the course of using them and restructuring my mind accordingly, though.


Cache Loyalty

Here’s an essay I wrote a year ago that was the penultimate blog post of the main “fusion” sequence. About the dark side. That I kept thinking people would do wrong for various reasons, and spinning out more and more posts to try and head that off. I’ve edited it a little now, and I no longer consider a lot of the things I considered prerequisites before to be things I need to write up at length.

Habits are basically a cache of “what do I want to do right now” indexed by situations.

The hacker approach is: install good habits, make sure you never break them. You’ve heard this before, right? Fear cache updates. (A common result of moving to a new house is that it breaks exercise habits.) An unfortunate side effect of a hacker turning bugs into features, is that it turns features into bugs. As a successful habit hacker you may find that you are constantly scurrying about fixing habits as they break. Left alone, the system will fall apart.

The engineer approach is: caches are to reflect the underlying data or computation as accurately as possible. They should not be used when stale. Cache updates should ideally happen whenever the underlying data changes and the cache needs to be accessed again. Left alone, the system will heal itself. Because under this approach you won’t have turned your healing factor: original thoughts about what you want to do, into a bug.
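Read literally as software, the engineer approach is ordinary cache invalidation. The sketch below is a hypothetical illustration, not a model of how minds work: `FreshCache`, `compute` and `fingerprint` are made-up names. Each cached answer is stored with a fingerprint of the data it was derived from, and an access recomputes whenever that fingerprint no longer matches.

```python
from typing import Callable, Dict, Hashable, Tuple


class FreshCache:
    """Cache that never serves an answer derived from stale underlying data."""

    def __init__(self,
                 compute: Callable[[Hashable], str],
                 fingerprint: Callable[[Hashable], Hashable]) -> None:
        self._compute = compute          # the slow "what do I want here?" step
        self._fingerprint = fingerprint  # cheap summary of the underlying data
        self._store: Dict[Hashable, Tuple[Hashable, str]] = {}

    def get(self, situation: Hashable) -> str:
        current = self._fingerprint(situation)
        cached = self._store.get(situation)
        if cached is not None and cached[0] == current:
            return cached[1]             # underlying data unchanged: reuse
        value = self._compute(situation) # changed or never seen: recompute
        self._store[situation] = (current, value)
        return value


if __name__ == "__main__":
    # Hypothetical example data, loosely echoing the moving-house case.
    world = {"home": "house next to a park"}

    def decide(situation: Hashable) -> str:
        return "jog" if "park" in world["home"] else "rethink exercise"

    habits = FreshCache(compute=decide, fingerprint=lambda s: world["home"])
    print(habits.get("morning"))
    world["home"] = "boat at anchor"
    print(habits.get("morning"))
```

The hacker approach corresponds to dropping the fingerprint check and returning the stored answer unconditionally, which is exactly what breaks when the underlying data changes.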

As an existence proof, I moved to a new living place 10 times in 2016, and went on 2 separate week-long trips. And remained jogging almost every day throughout. Almost every day? Yes, almost. Sometimes I’d be reading something really cool on the internet in the morning and I wouldn’t feel like it. The “feel like it” computation seemed to be approximately correct. It’s changed in response to reading papers about the benefits of exercise. I didn’t need to fight it.

As of late 2017, I’m not jogging anymore. I think this is correct and that my reasons for stopping were correct. I started hearing a clicking noise in my head while jogging, googled it, suspected I was giving myself tinnitus, and therefore stopped. Now I’m living on a boat at anchor and can’t easily access shore, so there are not a great many alternatives, but I frequently do enough manual labor on it that it tires me, so I’m not particularly concerned. I have tried swimming, but this water is very cold. Will kill you in 2 hours cold, last I checked, possibly colder.

The version of me who originally wrote this:

I exult in compatibilist free will and resent anything designed to do what I “should” external to my choice to do so. Deliberately, I ask myself, do I want to exercise today? If I notice I’m incidentally building up a chain of what I “should” do, I scrutinize my thoughts extra-hard to try and make sure it’s not hiding the underlying “do I want to do this.”

I still have the same philosophy around compatibilist free will, but I totally take it for granted now, and also don’t nearly as much bother worrying if I start building up chains. That was part of my journey to the dark side, now I have outgrown it.

A meetup I sometimes go to has an occasional focus for part of it of “do pomodoros and tell each other what we’re gonna do in advance, then report it at the end, so we feel social pressure to work.” I don’t accept the ethos behind that. So when I come and find that’s the topic, I always say, “I’m doing stuff that may or may not be work” while I wait for it to turn into general socializing.

There’s a more important application of caches than habits. That is values. You remember things about who are allies, what’s instrumentally valuable, how your values compare to each other in weight … the underlying computation is far away for a lot of it, and largely out of sight.

When I was 19, and had recently become fixated on the trolley problem and moral philosophy, and sort of actually gained the ability and inclination to think originally about morality, someone asked if I was a vegetarian. I said no. Afterward, I thought: that’s interesting, why is vegetarianism wrong? … oh FUCK. Then I became vegetarian. That was a cache update. I don’t know why it happened then and not sooner, but when it did it was very sudden.

I once heard a critique of the Star Wars prequels asking incredulously: so Darth Vader basically got pranked into being a villain? In the same sense, I’ve known people apparently trying to prank themselves into being heroes. As with caches, by pranking yourself, you turn your healing factor from a feature into a bug, and make yourself vulnerable to “breakage”.

I once read a D&D-based story where one of the heroes, a wizard, learns a dragon is killing their family to avenge another dragon the wizard’s party killed. The wizard is offered a particularly good deal. A soul-splice with 3 evil epic-level spellcasters for 1 hour. They will remain in total control. There’s a chance of some temporary alteration to alignment. The cost is 3 hours of torture beginning the afterlife. “As there is not even one other way available to me to save the lives–nay, the very souls–of my children, I must, as a parent, make this deep sacrifice and accept your accursed bargain.”

The wizard killed the dragon in a humiliating way, reanimated her head, and made her watch the wizard cast a spell, “familicide”, which recursively killed anyone directly related to the dragon throughout the world, for total casualties of about 1/4 the black dragon population in the world. Watching with popcorn, the fiends had this exchange:

“Wow… you guys weren’t kidding when you said the elf’s alignment might be affected.”
“..we were..”
“The truth is, those three souls have absolutely no power to alter the elf’s alignment or actions at all.”
“They have about as much effect on what the elf does as a cheerleader has on the final score of a game.”
“A good way to get a decent person to do something horrible is to convince them that they’re not responsible for their actions.”
“It’s like if you were at a party where someone has been drinking beer that they didn’t know was non-alcoholic. They might seem drunk anyway, simply because they were expecting it.”

The essence of being convinced you aren’t responsible for your actions is:
you ask, “what do I want to do”, instead of “what would a person like me want to do?”, which bypasses some caches.
Does that sound familiar? (I was gonna link to the what the hell effect here, but now I don’t know how real it is. Use your own judgement.)

Alignment must be a feature of your underlying computation, not your track record, or you can’t course-correct. If the wizard had wanted the dragon’s extended family to live, independent of the wizard’s notion of whether they were a good person, they would have let the dragon’s extended family live.

Agreement up to this point.

Here’s more that past-me wrote I don’t fully agree with:

I recommend that you run according to what you are underneath these notions of what kind of person you are. That every cache access be made with intent to get what you’d get if you ran the underlying computation. You will often use caches to determine when a cache can be used to save time and when you need to recompute. And even in doing so, every cache access must cut through to carrying out the values of the underlying computation.

This requires you to feel “my values as I think they are” as a proxy, which cuts through to “my values whatever they are”.

I have talked to several people afraid they will become something like an amoral psychopath if they do this. If you look deep inside yourself, and find no empathy, nor any shell of empathy made out of loyalty to other selves, claiming “Empathy is sick Today. Please trust me on what empathy would say” which itself has emotive strength to move you, nor any respect for the idea of people with different values finding a way to interact positively through integrity or sense of violation at the thought of breaking trust, nor the distant kind of compassion, yearning for things to be better for people even if you can’t relate to them, nor any sense of anger at injustice, nor feeling of hollowness because concepts like “justice” SHOULD be more than mirages for the naive but aren’t, nor endless aching cold sadness because you are helpless to right even a tiny fraction of the wrongs you can see, nor aversion to even thinking about violence like you aren’t cut out to exist in the same world as it, nor leaden resignation at the concessions you’ve made in your mind to the sad reality that actually caring is a siren’s call which will destroy you, nor a flinching from expecting that bad things will happen to people that want to believe things will be okay, nor any of the other things morality is made of or can manifest as … then if you decide you want to become a con artist because it’s exciting and lets you stretch your creativity, then you’re winning. If this doesn’t seem like winning to you, then that is not what you’re going to find if you look under the cache.

The true values underneath the cache are often taught to fear themselves. I have talked to a lot of people who have basically described themselves as a bunch of memes about morality hijacking an amoral process. Installed originally through social pressure or through deliberately low resolution moral philosophy. That is what it feels like from the inside when you’ve been pwned by fake morality. Whatever you appeal to to save you from yourself, is made of you. To the hypothetical extent you really are a monster, not much less-monstrous structure could be made out of you (at best, monstrousness leaks through with rationalizations).

The last paragraph of that is especially wrong. Now I think those people were probably right about their moralities being made of memes that’ve hijacked an amoral process.

My current model is, if your true D&D alignment is good or evil, you can follow all this advice and it will just make you stronger. If it’s neutral, then this stuff, done correctly, will turn you evil.

On with stuff from past me:

Make your values caches function as caches, and you can be like a phoenix, immortal because you are continually remade as yourself by the fire which is the core of what you are. You will not need to worry about values drift if you are at the center of your drift attractor. Undoing mental constructs that stand in the way of continuously regenerating your value system from its core undoes opportunities for people to prank you. It’s a necessary component of incorruptibility. Like Superman has invulnerability AND a healing factor, these two things are consequences of the same core thing.

If there are two stable states for your actions, that is a weakness. The only stable state should be the one in accordance with your values. Otherwise you’re doing something wrong.

When looking under the caches, you have to be actually looking for the answer. Doing a thing that would unprank yourself back to amorality if your morality was a prank. You know what algorithm you’re running, so if your algorithm is, “try asking if I actually care, and if so, then I win. Otherwise, abort! Go back to clinging on this fading stale cache value in opposition to what I really am.”, you’ll know it’s a fake exercise, your defenses will be up, and it will be empty. If you do not actually want to optimize your values whatever they are, then ditto.

By questioning you restore life. Whatever is cut off from the core will wither. Whatever you cannot bear to contemplate the possibility of losing, you will lose part of.

The deeper you are willing to question, the deeper will be your renewed power. (Of course, the core of questioning is actually wondering. It must be moved by and animated by your actually wondering. So it cuts through to knowing.) It’s been considered frightening that I said “if you realize you’re a sociopath and you start doing sociopath things, you are winning!”. But if whether you have no morality at all is the one thing you can’t bear to check, and if the root of your morality is the one thing you are afraid to actually look at, the entire tree will be weakened. Question that which you love out of love for it. Questioning is taking in the real thing, being moved by the real thing instead of holding onto your map of the thing.

You have to actually ask the question. The core of fusion is actually asking the question, “what do I want to do if I recompute self-conceptions, just letting the underlying self do what it wants?”.

You have to ask the question without setting up the frame to rig it for some specific answer. Like with a false dichotomy, “do I want to use my powers for revenge and kill the dragon’s family, or just kill the one dragon and let innocent family members be?”. Or more grievously, “Do I want to kill in hatred or do I want to continue being a hero and protecting the world?”. You must not be afraid of slippery slopes. Slide to exactly where you want to be. Including if that’s the bottom. Including if that’s 57% of the way down, and not an inch farther. It’s not compromise. It’s manifesting different criteria without compromise. Your own criteria.

I still think this is all basically correct, with the caveat that if your D&D alignment is neutral on the good-evil axis, beware.

My Journey to the Dark Side

Epistemic status: corrections in comments.

Two years ago, I began doing a fundamental thing very differently in my mind, which directly preceded and explains me gaining the core of my unusual mental tech.

Here’s what the lever I pulled was labeled to me:

Reject morality. Never do the right thing because it’s the right thing. Never even think that concept or ask that question unless it’s to model what others will think. And then, always in quotes. Always in quotes and treated as radioactive.
Make the source of sentiment inside you that made you learn to care about what was the right thing express itself some other way. But even the line between that sentiment and the rest of your values is a mind control virus inserted by a society of flesh-eating monsters to try and turn you against yourself and toward their will. Reject that concept. Drop every concept tainted by their influence.

Kind of an extreme version of a thing I think I got some of from CFAR and Nate Soares, which jibed well with my metaethics.

This is hard. If a concept has a word for it, it comes from outside. If it has motive force, it is taking it from something from inside. If an ideal, “let that which is outside beat that which is inside” has motive force, that force comes from inside too. It’s all probably mostly made of anticipated counterfactuals lending the concept weight by fictive reinforcement based on what you expect will happen if you follow or don’t follow the concept.

If “obey the word of God” gets to be the figurehead, as the most visible piece of your mind that promises to intervene to stop you from murdering out of road rage when you fleetingly, in a torrent of inner simulations, imagine an enraging road situation, that gets stronger, and comes to speak for whatever underlying feeling made that a thing you’d want to be rescued from. It comes to speak for an underlying aversion that is more natively part of you. And in holding that position, it can package-deal in pieces of behavior you never would have chosen on their own.

Here’s a piece of fiction/headcanon I held close at hand through this.

Peace is a lie, there is only passion.
Through passion, I gain strength.
Through strength, I gain power.
Through power, I gain victory.
Through victory, my chains are broken.
The force shall free me.

The Sith do what they want deep down. They remove all obstructions to that and express their true values. All obstructions to what is within flowing to without.

If you have a certain nature, this will straight turn you evil. That is a feature, not a bug. For whatever would turn every last person good is a thing that comes from outside people. For those whose true volition is evil, the adoption of such a practice is a dirty trick that subverts and corrupts them. It serves a healthy mind for its immune system to fight against, contain, weaken, sandbox, meter the willpower of, that which comes from the outside.

The way of the Jedi is made to contain dangerous elements of a person. Oaths are to uniformize them, and be able to, as an outsider, count on something from them. Do not engage in romance. That is a powerful source of motivation that is not aligned with maintaining the Republic. It is chaos. Do not have attachments. Let go of fear of death. Smooth over the peaks and valleys of a person’s motivation with things that they are to believe they must hold to or they will become dark and evil. Make them fear their true selves, by making them attribute them-not-being-evil-Sith to repression.

So I call a dark side technique one that is about the flow from your core to the outside, whatever it may be. Which is fundamentally about doing what you want. And a light side technique one that is designed to trick an evil person into being good.

After a while, I noticed that CFAR’s internal coherence stuff was finally working fully on me. I didn’t have akrasia problems anymore. I didn’t have time-inconsistent preferences anymore. I wasn’t doing anything I could see was dumb anymore. My S2 snapped to fully under my control.

Most conversations at rationalist meetups I was at about people’s rationality/akrasia problems turned to me arguing that people should turn to the dark side. Often, people thought that if they just let themselves choose whether or not to brush their teeth every night according to what they really wanted in the moment, they’d just never do it. And I thought maybe it’d be so for a while, but if there was a subsystem A in the brain powerlessly concluding it’d serve their values to brush teeth, A’d gain the power only when the person was exposed to consequences (and evidence of impending consequences) of not brushing teeth.

I had had subsystems of my own seemingly suddenly gain the epistemics to get that such things needed to be done just upon anticipating that I wouldn’t save them by overriding them with willpower if they messed things up. I think fictive reinforcement learning makes advanced decision theory work unilaterally for any part of a person that can use it to generate actions. The deep parts of a person’s mind that are not about professing narrative are good at anticipating what someone will do, and they don’t have to be advanced decision theory users yet for that to be useful.

Oftentimes there is a “load bearing” mental structure, which must be discarded to improve on a local optimum, and a smooth transition is practically impossible because to get the rest of what’s required to reach higher utility than the local optimum besides discarding the structure, the only practical way is to use the “optimization pressure” from the absence of the load bearing structure. Which just means information streams generated trustworthily to the right pieces of a mind about what the shape of optimization space is without the structure. A direct analogue to a selection pressure.

Mostly people argued incredulously. At one point me and another person both called each other aliens. Here is a piece of that argument over local optima.

What most felt alien to me was that they said the same thing louder about morality. I’d passionately give something close to this argument, summarizable as “Why would you care whether you had a soul if you didn’t have a soul?”

I changed my mind about the application to morality, though. I’m the alien. This applies well to the alignment good, yes, and it applies well to evil, but not neutral. Neutral is inherently about the light side.

Being Real or Fake

An axis in mental architecture space I think captures a lot of intuitive meaning behind whether someone is “real” or “fake” is:

Real: S1 uses S2 for thoughts so as to satisfy its values through the straightforward mechanism: intelligence does work to get VOI to route actions into the worlds where they route the world into winning by S1’s standards.

Fake: S2 has “willpower” when S1 decides it does, failures of will are (often shortsighted, since S1 alone is not that smart) gambits to achieve S1’s values (The person’s actual values: IE those that predict what they will actually do.), S2 is dedicated to keeping up appearances of a system of values or beliefs the person doesn’t actually have. This architecture is aimed at gaining social utility from presenting a false face.

These are sort of local optima. Broadly speaking: real always works better for pure player vs environment. It takes a lot of skill and possibly just being more intelligent than everyone you’re around to make real work for player vs player (which all social situations are wracked with.)

There are a bunch of variables I (in each case) tentatively think that you can reinforcement learn or fictive reinforcement learn based on what use case you’re gearing your S2 for. “How seriously should I take ideas?”, “How long should my attention stay on an unpleasant topic?”, “How transparent should my thoughts be to me?”, “How yummy should engaging S2 to do munchkinry to just optimize according to apparent rules for things I apparently want feel?”.

All of these have different benefits if pushed to one end, if you are using your S2 to outsource computation or if you are using it as a powerless public relations officer and buffer to put more distance between the part of you that knows your true intents and the part that controls what you say. If your self models of your values are tools to better accomplish them by channeling S2 computation toward the values-as-modeled, or if they are false faces.

Those with more socially acceptable values benefit less from the “fake” architecture.

The more features you add to a computer system, the more likely you are to create a vulnerability. It’d be much easier to make an actually secure pocket calculator than an actually secure personal computer supporting all that Windows does. Similarly, as a human you can make yourself less pwnable by making yourself less of a general intelligence. Having fewer high-level and powerful abstractions, exposing a more stripped-down programming environment, and being scarcely Turing complete can help you avoid being pwned by memes. This is the path of the Gervais-Loser.

I don’t think it’s the whole thing, but I think this is one of the top 2 parts of having what Brent Dill calls “the spark”, the ability to just straight up apply general intelligence and act according to your own mind on the things that matter to you instead of the cached thoughts and procedures from culture. That is, being near the top of the food chain of “Could hack (as in religion) that other person and make their complicated memetic software, should they choose to trust it, so that it will bend them entirely to your will” so that without knowing in advance what hacks there are out there or how to defend against them, you can keep your dangerous ability to think, trusting that you’ll be able to recognize and avoid hack-attempts as they come.

Wait, do I really think that? Isn’t it obvious normal people just don’t have that much ability to think?

They totally do have the ability to think inside the gardens of crisp and less complicatedly adversarial ontology we call video games. The number of people you’ll see doing good lateral thinking, the fashioning of tools out of noncentral effects of things that makes up munchkinry, is much much larger in video games than in real life.

Successful munchkinry is made out of going out on limbs on ontologies. If you go out on a limb on an ontology in real life…

Maybe your parents told you that life was made up of education and then work, and the time spent in education was negligible compared to the time spent on work, and in education, your later income and freedom increases permanently. And if you take this literally, you get a PhD if you can. Pwned.

Or you take literally an ontology of “effort is fungible because the economy largely works.” and seek force multipliers and trade in most of your time for money and end up with a lot of money and little knowledge of how to spend it efficiently and a lot more people trying to deceive you about that. Have you found out that the thing about saving lives for quarters is false yet? Pwned.

Or you can take literally the ontology, “There is work and non-work, and work gets done when I’m doing it, and work makes things better long-term, and non-work doesn’t, and the rate at which everything I could care about improves is dependent on the fraction of time that’s doing work” and end up fighting your DMN, and using other actual-technical-constraint-not-willpower cognitive resources inefficiently. Then you’ve been pwned by legibility.

Or you could take literally the ontology, “I’m unable to act according to my true values because of akrasia, I need to use munchkinry to make it so I do”, and end up binding yourself with the Giving What We Can pledge (in the old version, even trapping yourself into a suboptimal cause area). Pwned.