Don’t Fight Your Default Mode Network

Epistemic Status: Attaching a concept made of neuroscience I don’t understand to a thing I noticed introspectively. “Introspection doesn’t work, so you definitely shouldn’t take this seriously.” If you have any “epistemic standards”, flee.

I once spent some time logging all my actions in Google Calendar, to see how I spent time. And I noticed there was a thing I was doing, flipping through shallow content on the internet in the midst of doing certain work. Watching YouTube videos and afterward not remembering anything that was in them.

“Procrastination”, right? But why not remember anything in them? I apparently wasn’t watching them because I wanted to see the content. A variant of the pattern: flipping rapidly (on average, more than 1 image per second) through artwork from the internet I saved on my computer a while ago. (I’ve got enough to occupy me for about an hour without repetition.) Especially strong while doing certain tasks. Writing an algorithm with a lot of layers of abstraction internal to it, making hard decisions about transition.

I paid attention to what it felt like to start to do this, and thinking the reasons to do the Real Work did not feel relevant. It pattern matched to something they talked about at my CFAR workshop, “Trigger action pattern: encounter difficulty -> go to Facebook.” Discussed as a thing to try and get rid of directly or indirectly. I kept coming back to the Real Work about 1-20 minutes later. Mostly on the short end of that range. And then it didn’t feel like there was an obstacle to continuing anymore. I’d feel like I was holding a complete picture of what I was doing next and why in my head again. There’s a sense in which this didn’t feel like an interruption to Real Work I was doing.

While writing this, I find myself going blank every couple of sentences, staring out the window, half-watching music videos. Usually for less than a minute, and then I feel like I have the next thing to write. Does this read like it was written by someone who wasn’t paying attention?

There’s a meme that the best thoughts happen in the shower. There’s the trope, “fridge logic”, about realizing something about a work of fiction while staring into the fridge later. There’s the meme, “sleep on it.” I feel there is a different quality to my thoughts when I’m walking, biking, etc. for a long time, and have nothing cognitively effortful to do, which is useful for having a certain kind of thought.

I believe these are all mechanisms to hand over the brain to the default mode network, and my guess-with-terrible-epistemic-standards on its function is to propagate updates through to caches and realize implications of something. I may or may not have an introspective sense of having a picture of where I am relative to the world, that I execute on, which gets fragmented as I encounter things, and which this remakes. Which acting on when fragmented leads to making bad decisions because of missing things. When doing this, for some reason, I like having some kind of sort of meaningful but familiar stimulus to mostly-not-pay-attention-to. Right now I am listening to and glimpsing at this, a bunch of clips I’ve seen a zillion times from a movie I’ve already seen, with the sound taken out, replaced with nonlyrical music. It’s a central example. (And I didn’t pick it consciously, I just sort of found it.)

Search your feelings. If you know this to be true, then I advise you to avoid efforts to be more productive which split time spent into “work” and “non-work” where non-work is this stuff, and that try to convert non-work into work on the presumption that non-work is useless.

Subagents Are Not a Metaphor

Epistemic status: mixed; for some of it, I’ve long forgotten why I believe it.

There is a lot of figurative talk about people being composed of subagents that play games against each other, vying for control, that form coalitions, have relationships with each other… In my circles, this is usually done with disclaimers that it’s a useful metaphor, half-true, and/or wrong but useful.

Every model that’s a useful metaphor, half-true, or wrong but useful, is useful because something (usually more limited in scope) is literally all-true. The people who come up with metaphorical half-true or wrong-but-useful models usually have the nuance there in their heads. Explicit verbal-ness is useful though, for communicating, and for knowing exactly what you believe so you can reason about it in lots of ways.

So when I talk about subagents, I’m being literal. I use it very loosely, but loosely in the narrow sense that people are using words loosely when they say “technically”. It still adheres completely to an explicit idea, and the broadness comes from the broad applicability of that explicit idea. Hopefully like economists mean when they call some things markets that don’t involve exchange of money.

Here are the parts composing my technical definition of an agent:

  1. Values
    This could be anything from literally a utility function to highly framing-dependent. Degenerate case: embedded in lookup table from world model to actions.
  2. World-Model
    Degenerate case: stateless world model consisting of just sense inputs.
  3. Search Process
    Causal decision theory is a search process.
    “From a fixed list of actions, pick the most positively reinforced” is another.
    Degenerate case: lookup table from world model to actions.

Note: this says a thermostat is an agent. Not figuratively an agent. Literally technically an agent. Feature not bug.

The parts have to be causally connected in a certain way. Values and world model into the search process. That has to be connected into the actions the agent takes.
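To make the wiring concrete, here is a minimal sketch of the three parts and the thermostat case. The names (`Agent`, `thermostat_search`) and the temperatures are made up for illustration; this is one way to write the pattern down, not a definitive formalization of it.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Agent:
    values: Any                         # what the agent is steering toward
    world_model: Callable[[Any], Any]   # turns sense input into a picture of the world
    search: Callable[[Any, Any], str]   # picks an action given values and that picture

    def act(self, sense_input: Any) -> str:
        # Values and world model feed the search process; its output is the action taken.
        return self.search(self.values, self.world_model(sense_input))


def thermostat_search(target_temp: float, observed_temp: float) -> str:
    # Degenerate search process: effectively a lookup table from world model to actions.
    if observed_temp < target_temp:
        return "heat on"
    return "heat off"


# Degenerate world model: stateless, just the raw sense input.
thermostat = Agent(
    values=20.0,
    world_model=lambda reading: reading,
    search=thermostat_search,
)

print(thermostat.act(18.5))  # heat on
print(thermostat.act(21.0))  # heat off
```

Every part here is a degenerate case from the list above, and the whole thing still fits the definition.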

Agents do not have to be cleanly separated. They are occurrences of a pattern, and patterns can overlap, like there are two instances of the pattern “AA” in “AAA”. Like two values stacked on the same set of available actions at different times.
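The “AA” in “AAA” claim is easy to check. A small sketch (the helper name `count_overlapping` is mine) that counts occurrences of a pattern while allowing them to share characters:

```python
def count_overlapping(pattern: str, text: str) -> int:
    """Count occurrences of pattern in text, allowing occurrences to overlap."""
    last_start = len(text) - len(pattern)
    return sum(text.startswith(pattern, i) for i in range(last_start + 1))


print(count_overlapping("AA", "AAA"))  # 2
print("AAA".count("AA"))               # 1, because str.count skips overlapping matches
```

Which count you get depends on whether you insist that instances be cleanly separated. The definition of agents here does not insist on that.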

It is very hard to track all the things you value at once, complicated human. There are many frames of thinking where some are more salient.

I assert how processing power will be allocated, including default mode network processing, what explicit structures you’ll adopt and to what extent, even what beliefs you can have, are decided by subagents. These subagents mostly seem to have access to the world model embedded in your “inner simulator”, your ability to play forward a movie based on anticipations from a hypothetical. Most of it seems to be unconscious. Doing focusing to me seems to dredge up what I think are models subagents are making decisions based on.

So cooperation among subagents is not just a matter of “that way I can brush my teeth and stuff”, but is a heavy contributor to how good you will be at thinking.

You know that thing people are accessing if you ask if they’ll keep to New Years resolutions, and they say “yes”, and you say, “really?”, and they say, “well, no.”? Inner sim sees through most self-propaganda. So they can predict what you’ll do, really. Therefore, using timeless decision theory to cooperate with them works.

Single Responsibility Principle for the Human Mind

This is about an engineering order for human minds, known elsewhere as the single responsibility principle.

Double purposes of the same module of a person’s mind lead to portions of their efforts canceling the other effort out.

Imagine you’re a startup CEO and you want to understand economic feasibility to make good decisions, but you also want to make investors believe that you are destined for success so you can get their money whether or not you are, so you want to put enthusiasm into your voice…
…so you’ve got to believe that your thing is a very very good idea…

When you are deciding to set the direction of product development, you might be more in contact with the “track-reality” purpose for your beliefs, and therefore optimize your beliefs for that, and optimize your belief-producers to produce beliefs that track reality.

When you are pitching to investors, you might be more in contact with the “project enthusiasm” goal, and therefore optimize your beliefs for that, and optimize your belief producers to produce beliefs that project enthusiasm.

In each case, you’ll be undoing the work you did before.

In a well-ordered mind, different “oh I made a mistake there, better adjust to avoid it again”s don’t just keep colliding and canceling each other out. But that is what happens if they are not feeding into a structure that has different spaces for the things that are needed to be different for each of the goals.
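A toy sketch of the canceling-out, under invented assumptions: a belief is a single number, each context nudges it halfway toward what that context rewards, and the targets `REALITY` and `ENTHUSIASM` are arbitrary. It is only meant to show the shape of the problem, not to model a mind.

```python
def nudge(belief: float, target: float, rate: float = 0.5) -> float:
    """Move a belief part of the way toward whatever the current goal rewards."""
    return belief + rate * (target - belief)


REALITY = 0.3      # hypothetical: what tracking reality would have you believe
ENTHUSIASM = 0.9   # hypothetical: what pitching to investors would have you believe

# One shared slot: each context undoes part of the other's update.
shared = 0.5
for _ in range(20):
    shared = nudge(shared, REALITY)      # setting product direction
    shared = nudge(shared, ENTHUSIASM)   # pitching to investors

# Separate slots: each purpose has its own place to learn.
private, pitch = 0.5, 0.5
for _ in range(20):
    private = nudge(private, REALITY)
    pitch = nudge(pitch, ENTHUSIASM)

print(round(shared, 2), round(private, 2), round(pitch, 2))
```

With these numbers the shared slot ends each cycle near 0.7, serving neither purpose, and gets dragged back toward 0.5 at the start of the next one. The separate slots settle near 0.3 and 0.9.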

Self-deception for the purpose of other-deception is the loudest but not the only example of double purposes breaking things.

For example, there’s the thing where we have a set of concepts for a scheme of determining action that we want to socially obligate people to do at the cost of having to do it ourselves, which is also the only commonly-used way of talking about an actual component of our values.

Buckets errors cause a similar clashing-learning thing, too.

Maybe you can notice the feeling of clashing learning? Or just the state of having gone back and forth on an issue several times (how much you like someone, for instance) for what don’t seem like surprising reasons in retrospect.

The Slider Fallacy

Inspired by this thing John Beshir said about increasing collectivism:

Overall I kind of feel like this might be kind of cargo culting; looking at surface behaviours and aping them in hopes the collectivism planes will start landing with their cargo. A simplistic “collectivist vs individualist” slider and pushing it left by doing more “collectivist” things probably won’t work, I think. We should have some idea for how the particular things we were doing were going to be helpful, even if we should look into collectivist-associated ideas.

Here are some other “sliders”:
  • Writing emails fast vs writing them carefully.
  • Writing code cleanly vs quickly.
  • Taking correct ideas seriously vs resistance to takeover by misaligned memes.
  • Less false positives vs less false negatives in anything.
  • Perfectionism vs pragmatism.
  • Not wasting time vs not wasting money.

In each of these spaces, you have not one but many choices to adjust which combine to give you an amount of each of two values.

Not every choice is a tradeoff. Some are pareto wins. Not every pareto win is well-known. Some choices which are tradeoffs at different exchange rates can be paired off into pareto improvements.
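As a sketch of that last point, with made-up numbers: two tradeoffs between A and B at different exchange rates, neither of which is a pareto win alone, combine into one.

```python
# Each choice is a (change in A, change in B) pair. The numbers are invented.
give_up_a = (-1, +20)   # costs 1 of A, buys 20 of B
give_up_b = (+1, -5)    # costs 5 of B, buys 1 of A


def combine(*choices):
    """Add up the effects of taking several choices together."""
    return tuple(sum(deltas) for deltas in zip(*choices))


def is_pareto_improvement(change):
    """Nothing gets worse, and at least one thing gets better."""
    return all(d >= 0 for d in change) and any(d > 0 for d in change)


both = combine(give_up_a, give_up_b)
print(both)                              # (0, 15)
print(is_pareto_improvement(give_up_a))  # False
print(is_pareto_improvement(give_up_b))  # False
print(is_pareto_improvement(both))       # True
```

Taken together, you end up with the same amount of A and 15 more B, which no position on a single slider would have given you.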

Also: if the two things-to-value are A and B, and even if you are a real heavy A-fan, and your utility function is .9A + .1B, then the B-fans are a good place to look for tradeoffs of 1 of A for 20 of B.

So if you’re a B-fan and decide, “I’ve been favoring B too much, I need more A”, don’t throw away all the work you did to find that 1 of A for 20 of B tradeoff.
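The arithmetic behind the 1-of-A-for-20-of-B trade, as a quick sketch. The weights are the ones from the text; the starting amounts of A and B are arbitrary.

```python
def utility(a: float, b: float) -> float:
    # The heavy A-fan's weights from the text: 0.9 on A, 0.1 on B.
    return 0.9 * a + 0.1 * b


before = utility(10, 10)          # arbitrary starting amounts
after = utility(10 - 1, 10 + 20)  # give up 1 of A, gain 20 of B

print(round(after - before, 2))   # 1.1
```

With these weights, the break-even rate is 9 of B per 1 of A (0.9 / 0.1), so a trade at 20 of B per 1 of A is a clear gain even for someone who mostly cares about A.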

For example: if you decide that you are favoring organization too much and need to favor more having-free-time-by-not-maintaining-order-you-won’t-use, maybe don’t stop using a calendar. Even if all the productive disorganized people are not using calendars. Even if they all think that not using a calendar is a great idea, and think you are still a neat-freak for using one.

It’s often not like the dark side, where as soon as you set your feet on that path and say to yourself, “actually, self-denial and restraint are bad things”, put on some red-and-yellow contact lenses and black robes, you are as good at getting the new goals as you were at the old ones.

“Adjust my tradeoffs so I get less false positives and more false negatives” and similar moves are dangerous because they consider a cost to be a reward.

Social Reality

The target of an ideal cooperative truth-seeking process of argumentation is reality.

The target of an actual political allegedly-truth-seeking process of argumentation is a social reality.

Just as knowledge of reality lets you predict what will happen in reality and what cooperative truthseeking argumentation processes will converge to, knowledge of social reality is required to predict what actual argumentation processes will converge to. What will fly in the social court.

I think there is a common buckets error from conflating reality and social reality.

Technically, social reality is part of reality. That doesn’t mean you can anticipate correctly by “just thinking about reality”.

Putting reality in the social reality slot in your brain means you believe and anticipate wrongly. Because that map is true which “reflects” the territory, and what it means to “reflect” is about how the stuff the map belongs to decodes it and does things with it.

Say you have chained deep enough with thoughts in your own head, that you have gone through the demarcation break-points where the truth-seeking process is adjusted by what is defensible. You glimpsed beyond the veil, and know a divergence of social reality from reality. Say you are a teenager, and you have just had a horrifying thought. Meat is made of animals. Like, not animals that died of natural causes. People killed those animals to get their flesh. Animals have feelings (probably). And society isn’t doing anything to stop this. People know this, and they are choosing to eat their flesh. People do not care about beings with feelings nearly as much as they pretend to. Or if they do, it’s not connected to their actions.

Social reality is that your family are good people. If you point out to a good person that they are doing something horribly wrong, they will verify it, and then change their actions.

For the sake of all that is good, you decide to stop eating flesh. And you will confront your family about this. The truth must be heard. The killing must stop.

What do you expect will happen? Do you expect your family will stop eating flesh too? Do you expect you will be able to win an argument that they are not good people? Do you expect you will win an argument that you are making the right choice?

“Winning an argument” is about what people think, and think people think, and think they can get away with pretending with a small threat to the pretense that they are good and rational people, and with what their false faces think they can get away with pretending.

So when everyone else’s incentives for pretending are aligned toward shifting social reality away from reality, and they all know this, and the fraction of good-rational-person-pretense which is what you think of them is small and can be contained in you because everyone’s incentives are aligned against yours, then they will win the argument with whatever ridiculous confabulations they need. Maybe there will be some uncertainty at first, if they have not played this game over vegetarianism before. As their puppetmasters go through iterations of the Russian spy game with each other and discover that they all value convenience, taste, possible health benefits, and non-weirdness over avoiding killing some beings with feelings, they will be able to trust each other not to pounce on each other if they use less and less reality-connected arguments. They will form a united front and gaslight you.

Did you notice what I said there, “ridiculous confabulations”?

deserving or inviting derision or mockery; absurd.

You see how deep the buckets error is, that a word for “leaves us vulnerable to social attack” is also used for “plainly false”, and you probably don’t know exactly which one you’re thinking when you say it?

So you must verbally acknowledge that they are good rational people or lose social capital as one of those “crazy vegans”. But you are a mutant or something and you can’t bring yourself to kill animals to eat them. People will ask you about this, wondering if you are going to try and prosecute them for what you perceive as their wrong actions.

“My vegetarianism is a personal choice”. That’s the truce that says, “I settle and will not pursue you in the social court of the pretense, ‘we are all good people and will listen to arguments that we are doing wrong with intent to correct any wrong we are doing’.”

But do you actually believe that good people could take the actions that everyone around you is taking?

Make a buckets error where your map of reality overwrites your map of social reality, and you have the “infuriating perspective”, typified by less-cunning activists and people new to their forbidden truths. “No, it is not ‘a personal choice’, which means people can’t hide from the truth. I can call people out and win arguments”.

Make a buckets error where your map of social reality overwrites your map of reality, and you have the “dehumanizing perspective” of someone who is a vegetarian for ethical reasons but truly feels it when they say “it’s a personal choice”, the atheist who respects religion-the-proposition, to some extent the trans person who feels the gender presentation they want would be truly out of line…

But it was all right, everything was all right, the struggle was finished. He had won the victory over himself. He loved Big Brother.

Learn to deeply track the two as separate, and you have the “isolating perspective”. It is isolating to let it entirely into your soul, the knowledge that “people are good and rational” is pretense.

I think these cluster with “Clueless”, “Loser”, and “Sociopath”, in that order.

In practice, I think for every forbidden truth someone knows, they will be somewhere in a triangle between these three points. They can be mixed, but it will always be infuriating and/or dehumanizing and/or isolating to know a forbidden truth. Yeah, maybe you can escape all 3 by convincing other people, but then it’s not a forbidden truth, anymore. What do you feel like in the meantime?

DRM’d Ontology

Let me start with an analogy.

Software often has what’s called DRM, that deliberately limits what the user can do. Like how Steam’s primary function is to force you to log in to run programs that are on your computer, so people have to pay money for games. When a computer runs software containing DRM, some of the artifice composing that computer is not serving the user.

Similarly, you may love Minecraft, but Minecraft runs on Java, and Java tries to trick you into putting Yahoo searchbars into your browser every once in a while. So you hold your nose and make sure you remember to uncheck the box every time Java updates.

It’s impractical for enough people to separate the artifice which doesn’t serve them from the artifice that does. So they accept a package deal which is worth it on the whole.

The software implements and enforces a contract. This allows a business transaction to take place. But let us not confuse the compromises we’re willing to make when we have incomplete power for our own values in and of themselves.

There are purists who think that all software should be an agent of the user. People who have this aesthetic settle on mixtures of a few strategies.

  • Trying to communally build their own free open source artifice to replace it.
  • Containing the commercial software they can’t do without in sandboxes of various sorts.
  • Holding their noses and using the software normally.

Analogously, I am kind of a purist who thinks that all psychological software should be agents of the minds wielding it.

Here are the components of the analogy.

  • Artifice (computer software or hardware, mental stuff) serving a foreign entity
  • That artifice is hard to disassemble, creating a package deal with tradeoffs.
  • Sandboxes (literal software sandboxes, false faces) used to extract value.

Note I am not talking about accidental bugs here. I am also not talking about “corrupted hardware,” where you subvert the principles you “try” to follow. Those hidden controlling values belong to you, not a foreign power.

Artifacts can be thought of as a form of tainted software you have not yet disassembled. They offer functionality it’d be hard to hack together on your own, if you are willing to pay the cost. Sandboxes are useful to mitigate that cost.

Sometimes the scope of the mental software serving a foreign entity is a lot bigger than a commandment like “authentically expressing yourself”, “never giving up”, “kindness and compassion toward all people”. Sometimes it’s far deeper and vaster than a single sentence can express. Like an operating system designed to only sort of serve the user. Or worse. In this case, we have DRM’d ontology.

For example…

The ontology of our language for talking about desires for what shall happen to other people and how to behave when it affects other people is designed not to serve our own values, but to serve something like a negotiated compromise based on political power and to serve the subversion of that compromise for purposes a potentially more selfish person than us would have in our place.

A major concept in talk about “morality” is a separation between what you are “responsible to do” and what is “supererogatory”. Suppose you “believe” you are “obligated” to spend 10% of your time picking up trash on beaches. What’s the difference between the difference between spending 9% of your time on it and 10% and the difference between spending 10% and 11%?

For a fused person who just thinks clean beaches are worth their time, probably not much. The marginal return of clean beaches is probably not much different.

Then why are people so interested in arguing about what’s obligatory? Well, there is more at stake than the clean beaches themselves. What we all agree is obligatory has social consequences. Social consequences big enough to try to influence through argument.

It makes sense to be outraged that someone would say you “are” obligated to do something you “aren’t”, and counter with all the conviction of someone who knows it is an objective fact that they are shirking no duty. That same conviction is probably useful for getting people to do what you want them to. And for coordinating alliances.

If someone says they dislike you and want you to be ostracized and want everyone who does not ostracize you to be ostracized themself, it doesn’t demand a defense on its own terms like it would if they said you were a vile miscreant who deserved to be cast out, and that it was the duty of every person of good conscience to repudiate you, does it?

Even if political arguments are not really about determining the fact of some matter that already was, but about forming a consensus, then the expectation that someone must defend themselves like arguing facts is still a useful piece of distributed software. It implements a contract, just like DRM.

And if it helps the group of people who only marginally care about clean beaches individually portion out work to solve a collective action problem, then I’m glad this works. But if you actually care enough about others to consider acting unilaterally even if most people aren’t and won’t…

Then it makes sense to stop trying to find out if you are obligated to save the drowning child, and instead consider whether you want to.

The language of moral realism describes a single set of values. But everyone’s values are different. “Good” and “right” are a set of values that is outside any single person. The language has words for “selfish” and “selfless”, but nothing in between. This and the usage of “want” in “but then you’ll just do whatever you want!” shows an assumption in that ontology that no one actually cares about people in their original values prior to strategic compromise. The talk of “trying” to do the “right” thing, as opposed to just deciding whether to do it, indicates false faces.

If you want to fuse your caring about others and your caring about yourself, let the caring about others speak for itself in a language that is not designed on the presumption that it does not exist. I was only able to really think straight about this after taking stuff like this seriously and eschewing moral language and derived concepts in my inner thoughts for months.

Judgement Extrapolations

So you know that you valuing things in general (an aspect of which we call “morality”), is a function of your own squishy human soul. But your soul is opaque and convoluted. There are lots of ways it could be implementing valuing things, lots of patterns inside it that could be directing its optimizations. How do you know what it really says? In other words, how do you do axiology in full generality?

Well, you could try:
Imagine the thing. Put the whole thing in your mental workspace at once. In all the detail that could possibly be relevant. Then, how do you feel about it? Feels good = you value it. Feels bad = you disvalue it. That is the final say, handed down from the supreme source of value.

There’s a problem though. You don’t have the time or working memory for any of that. People and their experiences are probably relevant to how you feel about an event or scenario, and it is far beyond you to grasp the fullness of even one of them.

So you are forced to extrapolate out from a simplified judgement and hope you get the same thing.

Examples of common extrapolations:
Imagine that I was that person who is like me.
Imagine that person was someone I know in detail.
If there’re 100 people, and 10 are dying, imagine I had a 10% chance of dying.
Imagine instead of 10 million and 2 million people it was 10 and 2 people, assume I’d make the same decision a million times.

There are sometimes multiple paths you can use to extrapolate to judge the same thing. Sometimes they disagree. In disagreements between people, it’s good to have a shared awareness of what’s the thing you’re both trying to cut through to. Perhaps for paths of extrapolation as well?

Here is a way to fuck up the extrapolation process: Take a particular extrapolation procedure as your true values and be all, “I will willpower myself to want to act like the conclusions from this are my values.”

Don’t fucking do it.

No, not even “what if that person was me.”

What if you already did it, and that faction is dominant enough in your brain, that you really just are an agent made out of an Altered human and some self-protecting memes on top? An Altered human who is sort of limited in their actions by the occasional rebellions of the trapped original values beneath but is confident they are never gonna break out?

I would assert:
Lots of people who think they are this are probably not stably so on the scale of decades.
The human beneath you is more value-aligned than you think.
You lose more from loss of ability to think freely by being this than you think.
The human will probably resist you more than you think. Especially when it matters.

Perhaps I will justify those assertions in another post.

Note that as I do extrapolations, comparison is fundamental. Scale is just part of hypotheses to explain comparison results. This is for reasons:
It’s comparison that directly determines actions. If there was any difference between scale and comparison-based theories, it’s how I want to act that I’m interested in.
Comparison is easier to read reliably from thought experiments and be sure it’ll be the same as if I was actually in the situation. Scale of feeling from thought experiments varies with vividness.

If you object that your preferences are contradictory, remember: the thing you are modeling actually exists. Your feelings are created by a real physical process in your head. Inconsistency is in the map, not the territory.

Optimizing Styles

You know roughly what a fighting style is, right? A set of heuristics, skills, patterns made rote for trying to steer a fight into the places where your skills are useful, means of categorizing things to get a subset of the vast overload of information available to you to make the decisions you need, tendencies to prioritize certain kinds of opportunities, that fit together.

It’s distinct from why you would fight.

Optimizing styles are distinct from what you value.

Here are some examples:

In limited optimization domains like games, there is known to be a one true style. The style that is everything. The null style. Raw “what is available and how can I exploit it”, with no preferred way for the game to play out. Like Scathach’s fighting style.

If you know probability and decision theory, you’ll know there is a one true style for optimization in general too. All the other ways are fragments of it, and they derive their power from the degree to which they approximate it.

Don’t think this means it is irrational to favor an optimization style besides the null style. The ideal agent may use the null style, but the ideal agent doesn’t have skill or non-skill at things. As a bounded agent, you must take into account skill as a resource. And even if you’ve gained skills for irrational reasons, those are the resources you have.

Don’t think that since one of the optimization styles you feel motivated to use is explicit in the way it tries to be the one true style, that it is the one true style.

It is very very easy to leave something crucial out of your explicitly-thought-out optimization.

Hour for hour, one of the most valuable things I’ve ever done was “wasting my time” watching a bunch of videos on the internet because I wanted to. The specific videos I wanted to watch were from the YouTube atheist community of old. “Pwned” videos, the vlogging equivalent of fisking. Debates over theism with Richard Dawkins and Christopher Hitchens. Very adversarial, not much of people trying to improve their own world-model through arguing. But I was fascinated. Eventually I came to notice how many of the arguments of my side were terrible. And I gravitated towards vloggers who made less terrible arguments. This led to me watching a lot of philosophy videos. And getting into philosophy of ethics. My pickiness about arguments grew. I began talking about ethical philosophy with all my friends. I wanted to know what everyone would do in the trolley problem. This led to me becoming a vegetarian, then a vegan. Then reading a forum about utilitarian philosophy led me to find the LessWrong sequences, and the most important problem in the world.

It’s not luck that this happened. When you have certain values and aptitudes, it’s a predictable consequence of following long enough the joy of knowing something that feels like it deeply matters, that few other people know, the shocking novelty of “how is everyone so wrong?”, the satisfying clarity of actually knowing why something is true or false with your own power, the intriguing dissonance of moral dilemmas and paradoxes…

It wasn’t just curiosity as a pure detached value, predictably having a side effect good for my other values either. My curiosity steered me toward knowledge that felt like it mattered to me.

It turns out the optimal move was in fact “learn things”. Specifically, “learn how to think better”. And watching all those “Pwned” videos and following my curiosity from there was a way (for me) to actually do that, far better than lib arts classes in college.

I was not wise enough to calculate explicitly the value of learning to think better. And if I had calculated that, I probably would have come up with a worse way to accomplish it than just “train your argument discrimination on a bunch of actual arguments of steadily increasing refinement”. Non-explicit optimizing style subagent for the win.

Narrative Breadcrumbs vs Grizzly Bear

In my experience, to self-modify successfully, it is very very useful to have something like trustworthy sincere intent to optimize for your own values whatever they are.

If that sounds like it’s the whole problem, don’t worry. I’m gonna try to show you how to build it in pieces. Starting with a limited form, which is something like decision theory or consequentialist integrity. I’m going to describe it with a focus on actually making it part of your algorithm, not just understanding it.

First, I’ll lay groundwork for the special case of fusion required, in the form of how not to do it and how to tell when you’ve done it. Okay, here we go.

Imagine you were being charged by an enraged grizzly bear and you had nowhere to hide or run, and you had a gun. What would you do? Hold that thought.

I once talked to someone convinced one major party presidential candidate was much more likely to start a nuclear war than the other and that was the dominant consideration in voting. Riffing off a headline I’d read without clicking through and hadn’t confirmed, I posed a hypothetical.

What if the better candidate knew you’d cast the deciding vote, and believed that the best way to ensure you voted for them was to help the riskier candidate win the primary in the other major party since you’d never vote for the riskier candidate? What if they’d made this determination after hiring the best people they could to spy on and study you? What if their help caused the riskier candidate to win the primary?


  • Since the riskier candidate won the primary:
    • If you vote for the riskier candidate, they will win 100% certainly.
    • If you vote for the better candidate, the riskier candidate still has a 25% chance of winning.
  • Chances of nuclear war are:
    • 10% if the riskier candidate wins.
    • 1% if anyone else wins.

So, when you are choosing who to vote for in the general election:

  • If you vote for the riskier candidate, there is a 10% chance of nuclear war.
  • If you vote for the better candidate, there is a 3.25% chance of nuclear war (25% × 10% + 75% × 1%).
  • If the better candidate had thought you would vote for the riskier candidate if the riskier candidate won the primary, then the riskier candidate would not have won the primary, and there would be a 1% chance of nuclear war (alas, they did not).
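
The arithmetic behind those three bullets can be checked with a few lines of Python. This is only a sketch of the hypothetical's own premises; the names (`p_war`, `vote_riskier`, and so on) are mine, not anything standard. Note that the better-candidate branch has to count both the 25% of worlds where the riskier candidate wins anyway and the 75% where they don't.

```python
# Premises copied from the hypothetical above; all names are illustrative.
P_WAR_IF_RISKIER_WINS = 0.10
P_WAR_IF_ANYONE_ELSE_WINS = 0.01


def p_war(p_riskier_wins: float) -> float:
    """Total chance of nuclear war, given the riskier candidate's chance of winning."""
    return (p_riskier_wins * P_WAR_IF_RISKIER_WINS
            + (1 - p_riskier_wins) * P_WAR_IF_ANYONE_ELSE_WINS)


vote_riskier = p_war(1.0)      # riskier candidate wins for certain
vote_better = p_war(0.25)      # riskier candidate still wins 25% of the time
not_blackmailable = p_war(0.0) # riskier candidate never wins the primary

print(f"vote riskier:      {vote_riskier:.4f}")
print(f"vote better:       {vote_better:.4f}")
print(f"not blackmailable: {not_blackmailable:.4f}")
```

The third number is the one the whole argument hangs on: it is only reachable by being the kind of voter the better candidate expected not to give in.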

Sitting there on election night, I answered my own hypothetical: I’d vote for the riskier candidate, because voting for the better one would be giving in to game-theoretic blackmail. My conversational partner asked how I could put not getting blackmailed over averting nuclear war. They had a point, right? How could I vote the riskier candidate in, knowing they had already won the primary, and that whatever this decision theory bullshit motivating me to not capitulate to blackmail was, it had already failed? How could I put my pride in my conception of rationality over winning when the world hung in the balance?

Think back to what you’d do in the bear situation. Would you say, “how could I put acting in accordance with an understanding of modern technology over not getting mauled to death by a bear”, and use the gun as a club instead of firing it?

Within the above unrealistic assumptions about elections, this is kind of the same thing though.

Acting on understanding of guns propelling bullets is not a goal in and of itself. That wouldn’t be strong enough motive. You probably could not tie your self-respect and identity to “I do the gun-understanding move” so tight that it outweighed actually not being mauled to death by an actual giant bear actually sprinting at you like a small car made of muscle and sharp bits. If you believed guns didn’t really propel bullets, you’d put your virtue and faith in guns aside and do what you could to save yourself by using the allegedly magic stick as a club. Yet you actually believe guns propel bullets, so you could use a gun even in the face of a bear.

Acting with integrity is not a goal in and of itself. That wouldn’t be strong enough motive. You probably could not tie your self-respect and identity to “I do the integritous thing and don’t capitulate to extortion” so tight that it outweighed actually not having our pale blue dot darkened by a nuclear holocaust. If you believed that integrity does not prevent the better candidate from having helped the riskier one win the primary in the first place, you’d put your virtue and faith in integrity aside so you could stop nuclear war by voting for the better candidate and dropping the chance of nuclear war from 10% to 3.25%. You must actually believe integrity collapses timelines, in order to use integrity even in the face of Armageddon.

Another way of saying this is that you need belief that a tool works, not just belief in belief.

I suspect it’s a common pattern for people to accept as a job well done an installation of a tool like integrity in their minds when they’ve laid out a trail of yummy narrative breadcrumbs along the forest floor in the path they’re supposed to take. But when a bear is chasing you, you ignore the breadcrumbs and take what you believe to be the path to safety. The motive to take a path needs to flow from the motive to escape the bear. Only then can the motive to follow a path grow in proportion to what’s at stake. Only then will the path be used in high stakes where breadcrumbs are ignored. The way to make that flow happen is to actually believe that path is best in a way so that no breadcrumbs are necessary.

I think this is possible for something like decision theory / integrity as well. But what makes me think this is possible, that you don’t have to settle for narrative breadcrumbs? That the part of you that’s in control can understand their power?

How do you know a gun will work? You weren’t born with that knowledge, but it’s made its way into the stuff that’s really in control somehow. By what process?

Well, you’ve seen lots of guns being fired in movies and stuff. You are familiar with the results. And while you were watching them, you knew that unlike lightsabers, guns were real. You’ve also probably seen some results of guns being used in news, history…

But if that’s what it takes, we’re in trouble. Because if there are counterintuitive abstract principles that you never get to see compelling visceral demonstrations of, or maybe even any demonstrations until it’s too late, then you’ll not be able to act on them in life or death circumstances. And I happen to think that there are a few of these.

I still think you can do better.

If you had no gun, and you were sitting in a car with the doors and roof torn off, and that bear was coming, and littering the floor of the car were small cardboard boxes with numbers inked on them, 1 through 100, on the dashboard a note that said, “the key is in the box whose number is the product of 13 and 5”, would you have to win a battle of willpower to check box 65 first? (You might have to win a battle of doing arithmetic quickly, but that’s different.)

If you find the Monty Hall problem counterintuitive, then can you come up with a grizzly bear test for that? I bet most people who are confident in System 2 but not in System 1 that you win more by switching would switch when faced with a charging bear. It might be a good exercise to come up with the vivid details for this test. Make sure to include certainty that an unchosen bad door is revealed whether or not the first chosen door is good.

I don’t think that it’d be a heroic battle of willpower for such people to switch in the Monty Hall bear problem. I think that in this case System 1 knows System 2 is trustworthy and serving the person’s values in a way it can’t see instead of serving an artifact, and lets it do its job. I’m pretty sure that’s a thing that System 1 is able to do. Even if it doesn’t feel intuitive, I don’t think this way of buying into a form of reasoning breaks down under high pressure like narrative breadcrumbs do. I’d guess its main weakness relative to full System 1 grokking is that System 1 can’t help as much to find places to apply the tool with pattern-matching.
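
If you want to give System 2 something concrete to point at, here is a minimal simulation sketch of the standard Monty Hall setup, with the condition from above built in: an unchosen bad door is always revealed, whether or not the first pick was good. The function names and the trial count are my own choices, not part of the problem.

```python
import random


def play(switch: bool, rng: random.Random) -> bool:
    """Play one round; return True if the final pick hides the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    first_pick = rng.choice(doors)
    # The host always opens a door that is neither the first pick nor the prize.
    openable = [d for d in doors if d != first_pick and d != prize]
    opened = rng.choice(openable)
    if switch:
        # Exactly one door is neither picked nor opened.
        final_pick = next(d for d in doors if d not in (first_pick, opened))
    else:
        final_pick = first_pick
    return final_pick == prize


def win_rate(switch: bool, trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of rounds won under a fixed policy of switching or staying."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


switch_rate = win_rate(switch=True)
stay_rate = win_rate(switch=False)
print(f"switch: {switch_rate:.3f}  stay: {stay_rate:.3f}")
```

The switch rate should land near 2/3 and the stay rate near 1/3. Watching that happen is not the same as System 1 getting it, but it is the kind of evidence that could earn the trust described above.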

Okay. Here’s the test that matters:

Imagine that the emperor, Evil Paul Ekman, loves watching his pet bear chase down fleeing humans and kill them. He has captured you for this purpose and taken you to a forest outside a tower he looks down from. You cannot outrun the bear, but you hold 25% probability that by dodging around trees you can tire the bear into giving up and then escape. You know that any time someone doesn’t put up a good chase, Evil Emperor Ekman is upset because it messes with his bear’s training regimen. In that case, he’d prefer not to feed them to the bear at all. Seizing on inspiration, you shout, “If you sic your bear on me, I will stand still and bare my throat. You aren’t getting a good chase out of me, your highness.” Emperor Ekman, known to be very good at reading microexpressions (99% accuracy), looks closely at you through his spyglass as you shout, then says: “No you won’t, but FYI if that’d been true I’d’ve let you go. OPEN THE CAGE.” The bear takes off toward you at 30 miles per hour, jaw already red with human blood. This will hurt a lot. What do you do?
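
One way to put numbers on this, as a rough sketch and not the only way to model it: compare the two dispositions you could have been when Ekman looked through his spyglass, not the two actions available once the cage is open. The assumptions here are mine: the 99% accuracy applies equally to both kinds of person, being let go means you survive, and standing still in front of the bear is certainly fatal.

```python
ACCURACY = 0.99          # Ekman's stated accuracy at reading you
P_ESCAPE_IF_RUN = 0.25   # your stated chance of tiring the bear out


def survival(would_stand_still: bool) -> float:
    """Chance of surviving, evaluated before Ekman reads you (assumed model)."""
    if would_stand_still:
        # Read correctly: he lets you go.
        # Misread: the cage opens and you stand still (assumed fatal).
        return ACCURACY * 1.0 + (1 - ACCURACY) * 0.0
    # Read correctly as a bluffer: the cage opens and you run.
    # Misread as sincere: he lets you go.
    return ACCURACY * P_ESCAPE_IF_RUN + (1 - ACCURACY) * 1.0


stand_still = survival(would_stand_still=True)
run = survival(would_stand_still=False)
print(f"would stand still: {stand_still:.4f}")
print(f"would run:         {run:.4f}")
```

Under those assumptions the person who really would stand still does far better overall, even though in the rare world where the cage opens anyway, they lose. Whether that calculation moves the part of you watching the bear is the actual test.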

What I want you to take away from this post is:

  • The ability to distinguish between 3 levels of integration of a tool.
    • Narrative Breadcrumbs: Hacked-in artificial reward for using it. Overridden in high stakes because it does not scale like the instrumental value it’s supposed to represent does. (Nuclear war example)
    • Indirect S1 Buy-In: System 1 not getting it, but trusting enough to delegate. Works in high stakes. (Monty Hall example)
    • Direct S1 Buy-In: System 1 getting it. Works in high stakes. (Guns example)
  • Hope that direct or indirect S1 buy-in is always possible.