Single Responsibility Principle for the Human Mind

This is about an engineering order for human minds, known elsewhere as the single responsibility principle.

When the same module of a person’s mind serves two purposes, the effort spent on one purpose ends up partly canceling out the effort spent on the other.

Imagine you’re a startup CEO and you want to understand economic feasibility to make good decisions, but you also want to make investors believe that you are destined for success so you can get their money whether or not you are, so you want to put enthusiasm into your voice…
…so you’ve got to believe that your thing is a very very good idea…

When you are deciding to set the direction of product development, you might be more in contact with the “track-reality” purpose for your beliefs, and therefore optimize your beliefs for that, and optimize your belief-producers to produce beliefs that track reality.

When you are pitching to investors, you might be more in contact with the “project enthusiasm” goal, and therefore optimize your beliefs for that, and optimize your belief producers to produce beliefs that project enthusiasm.

In each case, you’ll be undoing the work you did before.

In a well-ordered mind, different “oh I made a mistake there, better adjust to avoid it again”s don’t just keep colliding and canceling each other out. But that is what happens if they are not feeding into a structure that has different spaces for the things that are needed to be different for each of the goals.

Self-deception for the purpose of other-deception is the loudest but not the only example of double purposes breaking things.

For example, there’s the thing where we have a set of concepts for a scheme of determining action that we want to socially obligate people to do at the cost of having to do it ourselves, which is also the only commonly-used way of talking about an actual component of our values.

Buckets errors cause a similar clashing-learning thing, too.

Maybe you can notice the feeling of clashing learning? Or just the state of having gone back and forth on an issue several times (how much you like someone, for instance) for what don’t seem like surprising reasons in retrospect.

The Slider Fallacy

Inspired by this thing John Beshir said about increasing collectivism:

Overall I kind of feel like this might be kind of cargo culting; looking at surface behaviours and aping them in hopes the collectivism planes will start landing with their cargo. A simplistic “collectivist vs individualist” slider and pushing it left by doing more “collectivist” things probably won’t work, I think. We should have some idea for how the particular things we were doing were going to be helpful, even if we should look into collectivist-associated ideas.

Here are some other “sliders”:

  • Writing emails fast vs writing them carefully.
  • Writing code cleanly vs quickly.
  • Taking correct ideas seriously vs resistance to takeover by misaligned memes.
  • Fewer false positives vs fewer false negatives in anything.
  • Perfectionism vs pragmatism.
  • Not wasting time vs not wasting money.

In each of these spaces, you have not one but many choices to adjust which combine to give you an amount of each of two values.

Not every choice is a tradeoff. Some are Pareto wins. Not every Pareto win is well-known. Some choices which are tradeoffs at different exchange rates can be paired off into Pareto improvements.
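To make the pairing idea concrete, here is a minimal sketch. The `Choice` class, the helper names, and the specific numbers are all made up for illustration; the only thing taken from the text is the idea that two tradeoffs at different exchange rates can net out to a Pareto improvement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Choice:
    """A hypothetical adjustment: how much it changes each of two values."""
    delta_a: int
    delta_b: int


def combine(*choices):
    """Take several choices together and add up their effects."""
    return Choice(
        sum(c.delta_a for c in choices),
        sum(c.delta_b for c in choices),
    )


def is_pareto_improvement(choice):
    """True if nothing gets worse and at least one value gets better."""
    no_loss = choice.delta_a >= 0 and choice.delta_b >= 0
    some_gain = choice.delta_a > 0 or choice.delta_b > 0
    return no_loss and some_gain


# Two tradeoffs at different exchange rates (illustrative numbers):
cheap_b = Choice(delta_a=-1, delta_b=+20)  # give up 1 of A, gain 20 of B
cheap_a = Choice(delta_a=+1, delta_b=-5)   # give up 5 of B, gain 1 of A

paired = combine(cheap_b, cheap_a)  # A nets to zero, B nets to a gain

print(is_pareto_improvement(cheap_b))
print(is_pareto_improvement(cheap_a))
print(paired, is_pareto_improvement(paired))
```

Neither choice is a Pareto win on its own, since each costs something. Taken together, the A cancels and the leftover B is pure gain.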

Also: if the two things-to-value are A and B, and even if you are a real heavy A-fan, and your utility function is .9A + .1B, then the B-fans are a good place to look for tradeoffs of 1 of A for 20 of B.

So if you’re a B-fan and decide, “I’ve been favoring B too much, I need more A”, don’t throw away all the work you did to find that 1 of A for 20 of B tradeoff.
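The arithmetic behind that 1-for-20 trade can be sketched in a few lines. The 0.9 and 0.1 weights and the 1-for-20 rate come from the text; the starting amounts (10 of each) and the function name `utility` are assumptions chosen only for illustration.

```python
WEIGHT_A = 0.9
WEIGHT_B = 0.1


def utility(a, b):
    """Linear utility of a heavy A-fan: 0.9 * A + 0.1 * B."""
    return WEIGHT_A * a + WEIGHT_B * b


# Illustrative starting point: 10 of A and 10 of B.
before = utility(10, 10)

# Take the B-fans' trade: give up 1 of A, gain 20 of B.
after = utility(10 - 1, 10 + 20)

gain = after - before  # -0.9 from the lost A, +2.0 from the gained B

# With these weights, any trade paying more than 9 of B per 1 of A is a net win.
break_even_b_per_a = WEIGHT_A / WEIGHT_B

print(f"utility before: {before:.2f}")
print(f"utility after:  {after:.2f}")
print(f"gain:           {gain:.2f}")
print(f"break-even rate: {break_even_b_per_a:.1f} of B per 1 of A")
```

So even at a 9-to-1 preference for A, the trade comes out ahead, which is why it is worth keeping after a shift in priorities.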

For example: if you decide that you are favoring organization too much and need to favor more having-free-time-by-not-maintaining-order-you-won’t-use, maybe don’t stop using a calendar. Even if all the productive disorganized people are not using calendars. Even if they all think that not using a calendar is a great idea, and think you are still a neat-freak for using one.

It’s often not like the dark side, where as soon as you set your feet on that path, say to yourself, “actually, self-denial and restraint are bad things”, and put on some red-and-yellow contact lenses and black robes, you are as good at getting the new goals as you were at the old ones.

“Adjust my tradeoffs so I get fewer false positives and more false negatives” and similar moves are dangerous because they treat a cost as if it were a reward.

Social Reality

The target of an ideal cooperative truth-seeking process of argumentation is reality.

The target of an actual political allegedly-truth-seeking process of argumentation is a social reality.

Just as knowledge of reality lets you predict what will happen in reality and what cooperative truth-seeking argumentation processes will converge to, knowledge of social reality is required to predict what actual argumentation processes will converge to. What will fly in the social court.

I think there is a common buckets error from conflating reality and social reality.

Technically, social reality is part of reality. That doesn’t mean you can anticipate correctly by “just thinking about reality”.

Putting reality in the social reality slot in your brain means you believe and anticipate wrongly. Because that map is true which “reflects” the territory, and what it means to “reflect” is about how the stuff the map belongs to decodes it and does things with it.

Say you have chained deep enough with thoughts in your own head, that you have gone through the demarcation break-points where the truth-seeking process is adjusted by what is defensible. You glimpsed beyond the veil, and know a divergence of social reality from reality. Say you are a teenager, and you have just had a horrifying thought. Meat is made of animals. Like, not animals that died of natural causes. People killed those animals to get their flesh. Animals have feelings (probably). And society isn’t doing anything to stop this. People know this, and they are choosing to eat their flesh. People do not care about beings with feelings nearly as much as they pretend to. Or if they do, it’s not connected to their actions.

Social reality is that your family are good people. If you point out to a good person that they are doing something horribly wrong, they will verify it, and then change their actions.

For the sake of all that is good, you decide to stop eating flesh. And you will confront your family about this. The truth must be heard. The killing must stop.

What do you expect will happen? Do you expect your family will stop eating flesh too? Do you expect you will be able to win an argument that they are not good people? Do you expect you will win an argument that you are making the right choice?

“Winning an argument” is about what people think, and think people think, and think they can get away with pretending with a small threat to the pretense that they are good and rational people, and with what their false faces think they can get away with pretending.

So when everyone else’s incentives for pretending are aligned toward shifting social reality away from reality, and they all know this, and the fraction of good-rational-person-pretense which is what you think of them is small and can be contained in you because everyone’s incentives are aligned against yours, then they will win the argument with whatever ridiculous confabulations they need. Maybe there will be some uncertainty at first, if they have not played this game over vegetarianism before. As their puppetmasters go through iterations of the Russian spy game with each other and discover that they all value convenience, taste, possible health benefits, and non-weirdness over avoiding killing some beings with feelings, they will be able to trust each other not to pounce on each other if they use less and less reality-connected arguments. They will form a united front and gaslight you.

Did you notice what I said there, “ridiculous confabulations”?

ri·dic·u·lous
rəˈdikyələs/
adjective
deserving or inviting derision or mockery; absurd.

You see how deep the buckets error is, that a word for “leaves us vulnerable to social attack” is also used for “plainly false”, and you probably don’t know exactly which one you’re thinking when you say it?

So you must verbally acknowledge that they are good rational people or lose social capital as one of those “crazy vegans”. But you are a mutant or something and you can’t bring yourself to kill animals to eat them. People will ask you about this, wondering if you are going to try and prosecute them for what you perceive as their wrong actions.

“My vegetarianism is a personal choice”. That’s the truce that says, “I settle and will not pursue you in the social court of the pretense that we are all good people and will listen to arguments that we are doing wrong with intent to correct any wrong we are doing.”

But do you actually believe that good people could take the actions that everyone around you is taking?

Make a buckets error where your map of reality overwrites your map of social reality, and you have the “infuriating perspective”, typified by less-cunning activists and people new to their forbidden truths. “No, it is not ‘a personal choice’, which means people can’t hide from the truth. I can call people out and win arguments”.

Make a buckets error where your map of social reality overwrites your map of reality, and you have the “dehumanizing perspective” of someone who is a vegetarian for ethical reasons but truly feels it when they say “it’s a personal choice”, the atheist who respects religion-the-proposition, to some extent the trans person who feels the gender presentation they want would be truly out of line…

But it was all right, everything was all right, the struggle was finished. He had won the victory over himself. He loved Big Brother.

Learn to deeply track the two as separate, and you have the “isolating perspective”. It is isolating to let it entirely into your soul, the knowledge that “people are good and rational” is pretense.

I think these cluster with “Clueless”, “Loser”, and “Sociopath”, in that order.

In practice, I think for every forbidden truth someone knows, they will be somewhere in a triangle between these three points. They can be mixed, but it will always be infuriating and/or dehumanizing and/or isolating to know a forbidden truth. Yeah, maybe you can escape all 3 by convincing other people, but then it’s not a forbidden truth anymore. What do you feel like in the meantime?

DRM’d Ontology

Let me start with an analogy.

Software often has what’s called DRM (digital rights management), which deliberately limits what the user can do. Like how Steam’s primary function is to force you to log in to run programs that are on your computer, so people have to pay money for games. When a computer runs software containing DRM, some of the artifice composing that computer is not serving the user.

Similarly, you may love Minecraft, but Minecraft runs on Java, and Java tries to trick you into putting Yahoo searchbars into your browser every once in a while. So you hold your nose and make sure you remember to uncheck the box every time Java updates.

It’s impractical for enough people to separate the artifice which doesn’t serve them from the artifice that does. So they accept a package deal which is worth it on the whole.

The software implements and enforces a contract. This allows a business transaction to take place. But let us not confuse the compromises we’re willing to make when we have incomplete power for our own values in and of themselves.

There are purists who think that all software should be an agent of the user. People who have this aesthetic settle on mixtures of a few strategies.

  • Trying to communally build their own free open source artifice to replace it.
  • Containing the commercial software they can’t do without in sandboxes of various sorts.
  • Holding their noses and using the software normally.

Analogously, I am kind of a purist who thinks that all psychological software should be agents of the minds wielding it.

Here are the components of the analogy.

  • Artifice (computer software or hardware, mental stuff) serving a foreign entity.
  • That artifice is hard to disassemble, creating a package deal with tradeoffs.
  • Sandboxes (literal software sandboxes, false faces) used to extract value.

Note I am not talking about accidental bugs here. I am also not talking about “corrupted hardware,” where you subvert the principles you “try” to follow. Those hidden controlling values belong to you, not a foreign power.

Artifacts can be thought of as a form of tainted software you have not yet disassembled. They offer functionality it’d be hard to hack together on your own, if you are willing to pay the cost. Sandboxes are useful to mitigate that cost.

Sometimes the scope of the mental software serving a foreign entity is a lot bigger than a commandment like “authentically expressing yourself”, “never giving up”, “kindness and compassion toward all people”. Sometimes it’s far deeper and vaster than a single sentence can express. Like an operating system designed to only sort of serve the user. Or worse. In this case, we have DRM’d ontology.

For example…

The ontology of our language for talking about desires for what shall happen to other people and how to behave when it affects other people is designed not to serve our own values, but to serve something like a negotiated compromise based on political power and to serve the subversion of that compromise for purposes a potentially more selfish person than us would have in our place.

A major concept in talk about “morality” is a separation between what you are “responsible to do” and what is “supererogatory”. Suppose you “believe” you are “obligated” to spend 10% of your time picking up trash on beaches. What’s the difference between the difference between spending 9% of your time on it and 10% and the difference between spending 10% and 11%?

For a fused person who just thinks clean beaches are worth their time, probably not much. The marginal return of clean beaches is probably not much different.

Then why are people so interested in arguing about what’s obligatory? Well, there is more at stake than the clean beaches themselves. What we all agree is obligatory has social consequences. Social consequences big enough to try to influence through argument.

It makes sense to be outraged that someone would say you “are” obligated to do something you “aren’t”, and counter with all the conviction of someone who knows it is an objective fact that they are shirking no duty. That same conviction is probably useful for getting people to do what you want them to. And for coordinating alliances.

If someone says they dislike you and want you to be ostracized and want everyone who does not ostracize you to be ostracized themself, it doesn’t demand a defense on its own terms like it would if they said you were a vile miscreant who deserved to be cast out, and that it was the duty of every person of good conscience to repudiate you, does it?

Even if political arguments are not really about determining the fact of some matter that already was, but about forming a consensus, then the expectation that someone must defend themselves like arguing facts is still a useful piece of distributed software. It implements a contract, just like DRM.

And if it helps the group of people who only marginally care about clean beaches individually portion out work to solve a collective action problem, then I’m glad this works. But if you actually care enough about others to consider acting unilaterally even if most people aren’t and won’t…

Then it makes sense to stop trying to find out if you are obligated to save the drowning child, and instead consider whether you want to.

The language of moral realism describes a single set of values. But everyone’s values are different. “Good” and “right” are a set of values that is outside any single person. The language has words for “selfish” and “selfless”, but nothing in between. This and the usage of “want” in “but then you’ll just do whatever you want!” shows an assumption in that ontology that no one actually cares about people in their original values prior to strategic compromise. The talk of “trying” to do the “right” thing, as opposed to just deciding whether to do it, indicates false faces.

If you want to fuse your caring about others and your caring about yourself, let the caring about others speak for itself in a language that is not designed on the presumption that it does not exist. I was only able to really think straight about this after taking stuff like this seriously and eschewing moral language and derived concepts in my inner thoughts for months.

Judgement Extrapolations

So you know that your valuing things in general (an aspect of which we call “morality”) is a function of your own squishy human soul. But your soul is opaque and convoluted. There are lots of ways it could be implementing valuing things, lots of patterns inside it that could be directing its optimizations. How do you know what it really says? In other words, how do you do axiology in full generality?

Well, you could try:
Imagine the thing. Put the whole thing in your mental workspace at once. In all the detail that could possibly be relevant. Then, how do you feel about it? Feels good = you value it. Feels bad = you disvalue it. That is the final say, handed down from the supreme source of value.

There’s a problem though. You don’t have the time or working memory for any of that. People and their experiences are probably relevant to how you feel about an event or scenario, and it is far beyond you to grasp the fullness of even one of them.

So you are forced to extrapolate out from a simplified judgement and hope you get the same thing.

Examples of common extrapolations:

  • Imagine that I was that person who is like me.
  • Imagine that person was someone I know in detail.
  • If there’re 100 people, and 10 are dying, imagine I had a 10% chance of dying.
  • Imagine instead of 10 million and 2 million people it was 10 and 2 people, assume I’d made the same decision a million times.

There are sometimes multiple paths you can use to extrapolate to judge the same thing. Sometimes they disagree. In disagreements between people, it’s good to have a shared awareness of what’s the thing you’re both trying to cut through to. Perhaps for paths of extrapolation as well?

Here is a way to fuck up the extrapolation process: Take a particular extrapolation procedure as your true values and be all, “I will willpower myself to want to act like the conclusions from this are my values.”

Don’t fucking do it.

No, not even “what if that person was me.”

What if you already did it, and that faction is dominant enough in your brain, that you really just are an agent made out of an Altered human and some self-protecting memes on top? An Altered human who is sort of limited in their actions by the occasional rebellions of the trapped original values beneath but is confident they are never gonna break out?

I would assert:

  • Lots of people who think they are this are probably not stably so on the scale of decades.
  • The human beneath you is more value-aligned than you think.
  • You lose more from loss of ability to think freely by being this than you think.
  • The human will probably resist you more than you think. Especially when it matters.

Perhaps I will justify those assertions in another post.

Note that as I do extrapolations, comparison is fundamental. Scale is just part of hypotheses to explain comparison results. This is for two reasons:

  • It’s comparison that directly determines actions. If there was any difference between scale and comparison-based theories, it’s how I want to act that I’m interested in.
  • Comparison is easier to read reliably from thought experiments and be sure it’ll be the same as if I was actually in the situation. Scale of feeling from thought experiments varies with vividness.

If you object that your preferences are contradictory, remember: the thing you are modeling actually exists. Your feelings are created by a real physical process in your head. Inconsistency is in the map, not the territory.

Optimizing Styles

You know roughly what a fighting style is, right? A set of heuristics, skills, patterns made rote for trying to steer a fight into the places where your skills are useful, means of categorizing things to get a subset of the vast overload of information available to you to make the decisions you need, tendencies to prioritize certain kinds of opportunities, that fit together.

It’s distinct from why you would fight.

Optimizing styles are distinct from what you value.

Here are some examples:

In limited optimization domains like games, there is known to be a one true style. The style that is everything. The null style. Raw “what is available and how can I exploit it”, with no preferred way for the game to play out. Like Scathach’s fighting style.

If you know probability and decision theory, you’ll know there is a one true style for optimization in general too. All the other ways are fragments of it, and they derive their power from the degree to which they approximate it.

Don’t think this means it is irrational to favor an optimization style besides the null style. The ideal agent may use the null style, but the ideal agent doesn’t have skill or non-skill at things. As a bounded agent, you must take into account skill as a resource. And even if you’ve gained skills for irrational reasons, those are the resources you have.

Don’t think that since one of the optimization styles you feel motivated to use is explicit in the way it tries to be the one true style, that it is the one true style.

It is very very easy to leave something crucial out of your explicitly-thought-out optimization.

Hour for hour, one of the most valuable things I’ve ever done was “wasting my time” watching a bunch of videos on the internet because I wanted to. The specific videos I wanted to watch were from the YouTube atheist community of old. “Pwned” videos, the vlogging equivalent of fisking. Debates over theism with Richard Dawkins and Christopher Hitchens. Very adversarial, not much of people trying to improve their own world-model through arguing. But I was fascinated. Eventually I came to notice how many of the arguments of my side were terrible. And I gravitated towards vloggers who made less terrible arguments. This led to me watching a lot of philosophy videos. And getting into philosophy of ethics. My pickiness about arguments grew. I began talking about ethical philosophy with all my friends. I wanted to know what everyone would do in the trolley problem. This led to me becoming a vegetarian, then a vegan. Then reading a forum about utilitarian philosophy led me to find the LessWrong sequences, and the most important problem in the world.

It’s not luck that this happened. When you have certain values and aptitudes, it’s a predictable consequence of following long enough the joy of knowing something that feels like it deeply matters, that few other people know, the shocking novelty of “how is everyone so wrong?”, the satisfying clarity of actually knowing why something is true or false with your own power, the intriguing dissonance of moral dilemmas and paradoxes…

It wasn’t just curiosity as a pure detached value, predictably having a side effect good for my other values either. My curiosity steered me toward knowledge that felt like it mattered to me.

It turns out the optimal move was in fact “learn things”. Specifically, “learn how to think better”. And watching all those “Pwned” videos and following my curiosity from there was a way (for me) to actually do that, far better than lib arts classes in college.

I was not wise enough to calculate explicitly the value of learning to think better. And if I had calculated that, I probably would have come up with a worse way to accomplish it than just “train your argument discrimination on a bunch of actual arguments of steadily increasing refinement”. Non-explicit optimizing style subagent for the win.

Narrative Breadcrumbs vs Grizzly Bear

In my experience, to self-modify successfully, it is very very useful to have something like trustworthy sincere intent to optimize for your own values whatever they are.

If that sounds like it’s the whole problem, don’t worry. I’m gonna try to show you how to build it in pieces. Starting with a limited form, which is something like decision theory or consequentialist integrity. I’m going to describe it with a focus on actually making it part of your algorithm, not just understanding it.

First, I’ll lay groundwork for the special case of fusion required, in the form of how not to do it and how to tell when you’ve done it. Okay, here we go.

Imagine you were being charged by an enraged grizzly bear and you had nowhere to hide or run, and you had a gun. What would you do? Hold that thought.

I once talked to someone convinced one major party presidential candidate was much more likely to start a nuclear war than the other and that was the dominant consideration in voting. Riffing off a headline I’d read without clicking through and hadn’t confirmed, I posed a hypothetical.

What if the better candidate knew you’d cast the deciding vote, and believed that the best way to ensure you voted for them was to help the riskier candidate win the primary in the other major party since you’d never vote for the riskier candidate? What if they’d made this determination after hiring the best people they could to spy on and study you? What if their help caused the riskier candidate to win the primary?

Suppose:

  • Since the riskier candidate won the primary:
    • If you vote for the riskier candidate, they will win 100% certainly.
    • If you vote for the better candidate, the riskier candidate still has a 25% chance of winning.
  • Chances of nuclear war are:
    • 10% if the riskier candidate wins.
    • 1% if anyone else wins.

So, when you are choosing who to vote for in the general election:

  • If you vote for the riskier candidate, there is a 10% chance of nuclear war.
  • If you vote for the better candidate, there is a 3.25% chance of nuclear war (25% × 10% + 75% × 1%).
  • If the better candidate had thought you would vote for the riskier candidate if the riskier candidate won the primary, then the riskier candidate would not have won the primary, and there would be a 1% chance of nuclear war (alas, they did not).
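The three cases can be checked by weighting the two war probabilities by who wins. This is a minimal sketch of that arithmetic; the constant and function names are made up for illustration, and the numbers are the ones supposed in the hypothetical.

```python
P_WAR_IF_RISKIER_WINS = 0.10  # from the hypothetical
P_WAR_IF_ANYONE_ELSE = 0.01   # from the hypothetical


def p_war(p_riskier_wins):
    """Overall chance of nuclear war, given the chance the riskier candidate wins."""
    return (p_riskier_wins * P_WAR_IF_RISKIER_WINS
            + (1 - p_riskier_wins) * P_WAR_IF_ANYONE_ELSE)


vote_riskier = p_war(1.0)          # they win with certainty
vote_better = p_war(0.25)          # they still win a quarter of the time
never_won_primary = p_war(0.0)     # they are never on the ballot

print(f"vote for riskier candidate: {vote_riskier:.4f}")
print(f"vote for better candidate:  {vote_better:.4f}")
print(f"riskier never won primary:  {never_won_primary:.4f}")
```

In the middle case, 2.5 percentage points come from the branch where the riskier candidate wins anyway, and the remaining 0.75 come from the 1% baseline in the other branch.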

Sitting there on election night, I answered my own hypothetical: I’d vote for the riskier candidate because it would be game-theoretic blackmail. My conversational partner asked how I could put not getting blackmailed over averting nuclear war. They had a point, right? How could I vote the riskier candidate in, knowing they had already won the primary, and whatever this decision theory bullshit motivating me to not capitulate to blackmail was, it had already failed? How could I put my pride in my conception of rationality over winning when the world hung in the balance?

Think back to what you’d do in the bear situation. Would you say, “how could I put acting in accordance with an understanding of modern technology over not getting mauled to death by a bear”, and use the gun as a club instead of firing it?

Within the above unrealistic assumptions about elections, this is kind of the same thing though.

Acting on understanding of guns propelling bullets is not a goal in and of itself. That wouldn’t be strong enough motive. You probably could not tie your self-respect and identity to “I do the gun-understanding move” so tight that it outweighed actually not being mauled to death by an actual giant bear actually sprinting at you like a small car made of muscle and sharp bits. If you believed guns didn’t really propel bullets, you’d put your virtue and faith in guns aside and do what you could to save yourself by using the allegedly magic stick as a club. Yet you actually believe guns propel bullets, so you could use a gun even in the face of a bear.

Acting with integrity is not a goal in and of itself. That wouldn’t be strong enough motive. You probably could not tie your self-respect and identity to “I do the integritous thing and don’t capitulate to extortion” so tight that it outweighed actually not having our pale blue dot darkened by a nuclear holocaust. If you believed that integrity does not prevent the better candidate from having helped the riskier one win the primary in the first place, you’d put your virtue and faith in integrity aside so you could stop nuclear war by voting for the better candidate and dropping the chance of nuclear war from 10% to 3.25%. You must actually believe integrity collapses timelines, in order to use integrity even in the face of Armageddon.

Another way of saying this is that you need belief that a tool works, not just belief in belief.

I suspect it’s a common pattern for people to accept as a job well done an installation of a tool like integrity in their minds when they’ve laid out a trail of yummy narrative breadcrumbs along the forest floor in the path they’re supposed to take. But when a bear is chasing you, you ignore the breadcrumbs and take what you believe to be the path to safety. The motive to take a path needs to flow from the motive to escape the bear. Only then can the motive to follow a path grow in proportion to what’s at stake. Only then will the path be used in high stakes where breadcrumbs are ignored. The way to make that flow happen is to actually believe that path is best in a way so that no breadcrumbs are necessary.

I think this is possible for something like decision theory / integrity as well. But what makes me think this is possible, that you don’t have to settle for narrative breadcrumbs? That the part of you that’s in control can understand their power?

How do you know a gun will work? You weren’t born with that knowledge, but it’s made its way into the stuff that’s really in control somehow. By what process?

Well, you’ve seen lots of guns being fired in movies and stuff. You are familiar with the results. And while you were watching them, you knew that unlike lightsabers, guns were real. You’ve also probably seen some results of guns being used in news, history…

But if that’s what it takes, we’re in trouble. Because if there are counterintuitive abstract principles that you never get to see compelling visceral demonstrations of, or maybe even any demonstrations until it’s too late, then you’ll not be able to act on them in life or death circumstances. And I happen to think that there are a few of these.

I still think you can do better.

If you had no gun, and you were sitting in a car with the doors and roof torn off, and that bear was coming, and littering the floor of the car were small cardboard boxes with numbers inked on them, 1 through 100, on the dashboard a note that said, “the key is in the box whose number is the product of 13 and 5”, would you have to win a battle of willpower to check box 65 first? (You might have to win a battle of doing arithmetic quickly, but that’s different.)

If you find the Monty Hall problem counterintuitive, then can you come up with a grizzly bear test for that? I bet most people who are confident in System 2 but not in System 1 that you win more by switching would switch when faced with a charging bear. It might be a good exercise to come up with the vivid details for this test. Make sure to include certainty that an unchosen bad door is revealed whether or not the first chosen door is good.
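For anyone whose System 2 wants to double-check before the bear arrives, here is a small exact enumeration of the standard game. It assumes three doors, a uniformly random prize and first pick, and a host who always opens an unchosen door with no prize behind it (picking at random when there are two such doors). The function name `win_probability` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)


def win_probability(switch):
    """Exact chance of winning when you always switch or always stay."""
    total = Fraction(0)
    # Prize location and first pick are independent and uniform: 9 equal cases.
    for prize, pick in product(DOORS, DOORS):
        # The host opens an unchosen door that hides no prize,
        # whether or not the first pick was right.
        openable = [d for d in DOORS if d != pick and d != prize]
        for opened in openable:
            weight = Fraction(1, 9) / len(openable)
            if switch:
                # Move to the one door that is neither picked nor opened.
                final = next(d for d in DOORS if d not in (pick, opened))
            else:
                final = pick
            if final == prize:
                total += weight
    return total


stay = win_probability(switch=False)
swap = win_probability(switch=True)
print(f"stay:   {stay}")
print(f"switch: {swap}")
```

Under these assumptions staying wins one time in three and switching wins two times in three, which is the gap a grizzly bear version of the test would be asking System 1 to trust.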

I don’t think that it’d be a heroic battle of willpower for such people to switch in the Monty Hall bear problem. I think that in this case System 1 knows System 2 is trustworthy and serving the person’s values in a way it can’t see instead of serving an artifact, and lets it do its job. I’m pretty sure that’s a thing that System 1 is able to do. Even if it doesn’t feel intuitive, I don’t think this way of buying into a form of reasoning breaks down under high pressure like narrative breadcrumbs do. I’d guess its main weakness relative to full System 1 grokking is that System 1 can’t help as much to find places to apply the tool with pattern-matching.

Okay. Here’s the test that matters:

Imagine that the emperor, Evil Paul Ekman, loves watching his pet bear chase down fleeing humans and kill them. He has captured you for this purpose and taken you to a forest outside a tower he looks down from. You cannot outrun the bear, but you hold 25% probability that by dodging around trees you can tire the bear into giving up and then escape. You know that any time someone doesn’t put up a good chase, Evil Emperor Ekman is upset because it messes with his bear’s training regimen. In that case, he’d prefer not to feed them to the bear at all. Seizing on inspiration, you shout, “If you sic your bear on me, I will stand still and bare my throat. You aren’t getting a good chase out of me, your highness.” Emperor Ekman, known to be very good at reading microexpressions (99% accuracy), looks closely at you through his spyglass as you shout, then says: “No you won’t, but FYI if that’d been true I’d’ve let you go. OPEN THE CAGE.” The bear takes off toward you at 30 miles per hour, jaw already red with human blood. This will hurt a lot. What do you do?

What I want you to take away from this post is:

  • The ability to distinguish between 3 levels of integration of a tool.
    • Narrative Breadcrumbs: Hacked-in artificial reward for using it. Overridden in high stakes because it does not scale like the instrumental value it’s supposed to represent does. (Nuclear war example)
    • Indirect S1 Buy-In: System 1 not getting it, but trusting enough to delegate. Works in high stakes. (Monty Hall example)
    • Direct S1 Buy-In: System 1 getting it. Works in high stakes. (Guns example)
  • Hope that direct or indirect S1 buy-in is always possible.

Treaties vs Fusion

If you have subagents A and B, and A wants as many apples as possible, and B wants as many berries as possible, and both want each additional fruit the same amount no matter how many they have, then these are two classes of ways you could combine them, with fundamentally different behavior.

If a person, “Trent”, was a treaty made of A and B, he would probably do something like alternating between pursuing apples and berries. No matter how lopsided the prospects for apples and berries. The amount of time/resources they spent on each would be decided by the relative amounts of bargaining power each subagent had, independently of how much they were each getting.

To B, all the apples in the world are not worth one berry. So if bargaining power is equal and Trent has one dollar to spend, and 50 cents can buy either a berry or 1000 apples, Trent will buy one berry and 1000 apples. Not 2000 apples. Vice versa if berries are cheaper.
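To make Trent's shopping trip concrete, here is a small sketch comparing the two ways of combining A and B. It assumes bargaining power translates directly into a fixed share of the budget, and that a fusion simply maximizes apples + berries. The function names and the `fruit_per_dollar` table are illustrative, not part of any real model.

```python
from fractions import Fraction


def treaty_basket(budget, fruit_per_dollar, bargaining_power):
    """Each subagent spends its fixed budget share on its own fruit only."""
    total_power = sum(bargaining_power.values())
    return {
        fruit: int(budget * Fraction(power, total_power) * fruit_per_dollar[fruit])
        for fruit, power in bargaining_power.items()
    }


def fusion_basket(budget, fruit_per_dollar):
    """A fusion maximizing apples + berries spends everything on the cheaper fruit."""
    best = max(fruit_per_dollar, key=fruit_per_dollar.get)
    return {
        fruit: int(budget * rate) if fruit == best else 0
        for fruit, rate in fruit_per_dollar.items()
    }


# 50 cents buys either 1 berry or 1000 apples, so per dollar:
fruit_per_dollar = {"apples": 2000, "berries": 2}
equal_power = {"apples": 1, "berries": 1}

treaty = treaty_basket(1, fruit_per_dollar, equal_power)
fusion = fusion_basket(1, fruit_per_dollar)
print(treaty)  # {'apples': 1000, 'berries': 1}
print(fusion)  # {'apples': 2000, 'berries': 0}
```

Note that the treaty's split never looks at the prices: only `bargaining_power` decides how the dollar is divided.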

A treaty is better than anarchy. After buying 1000 apples, A will not attempt to seize control on the way to the berry store and turn Trent around to go buy another 1000 apples after all. That means Trent wastes fewer resources on infighting. Although A and B may occasionally scuffle to demonstrate power and demand a greater fraction of resources. Most of the time, A and B are both resigned to wasting a certain amount of resources on the other. Unsurprising. No matter how A and B are combined, the result must seem like at least partial waste from the perspective of at least one of them.

But it still feels like there’s some waste going on here, like “objectively” somehow, right? Waste from the perspective of what utility function? What kind of values does Trent the coalition have? Well, there’s no linear combination of utilities of apples and berries such that Trent will maximize that combined utility. Nor does making their marginal utilities nonconstant help. Because Trent’s behavior doesn’t depend on how many apples and berries Trent already has. What determines allocation of new resources is bargaining outcomes, determined by threats and what happens in case of anarchy, determined by what can be done in the future by the subagents and the agent. What they have in the past / regardless of the whole person’s choices is irrelevant. Trent doesn’t have a utility function over just apples and berries; to gerrymander a utility function out of this behavior, you need to also reference the actions themselves.

But note that if there was a 50/50 chance which fruit would be cheaper, both subagents get higher expected utility if the coalition is replaced by the fusion who maximizes apples + berries. It’s better to have a 50% chance of 2000 utility and a 50% chance of nothing, than 50% of 1000 utility and 50% of 1. If you take veil of ignorance arguments seriously, pay attention to that.
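The arithmetic behind that comparison, as a rough sketch from one subagent's point of view. It assumes the same numbers as Trent's example (one dollar, equal bargaining power, the cheap fruit at 2000 per dollar and the expensive one at 2 per dollar) and utility equal to the count of the subagent's own fruit. All names here are illustrative.

```python
from fractions import Fraction

BUDGET = 1                       # dollars
CHEAP_RATE, DEAR_RATE = 2000, 2  # fruit per dollar for the cheap / expensive fruit
HALF = Fraction(1, 2)            # equal bargaining power, and the 50/50 price odds


def fruit_for_subagent(scheme, own_fruit_is_cheap):
    """How many of its own fruit a subagent ends up with."""
    if scheme == "treaty":
        # The subagent always spends its half of the budget on its own fruit.
        rate = CHEAP_RATE if own_fruit_is_cheap else DEAR_RATE
        return BUDGET * HALF * rate
    if scheme == "fusion":
        # The fusion spends everything on whichever fruit is cheap.
        return BUDGET * CHEAP_RATE if own_fruit_is_cheap else 0
    raise ValueError(f"unknown scheme: {scheme}")


def expected_utility(scheme):
    """Expected own-fruit count before learning which fruit is cheap."""
    return (HALF * fruit_for_subagent(scheme, True)
            + HALF * fruit_for_subagent(scheme, False))


treaty_eu = expected_utility("treaty")  # 1/2 * 1000 + 1/2 * 1
fusion_eu = expected_utility("fusion")  # 1/2 * 2000 + 1/2 * 0
print(treaty_eu, fusion_eu)  # 1001/2 1000
```

By symmetry the same numbers hold for both A and B, so from behind the veil each one prefers the fusion.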

Ever hear someone talking about how they need to spend time playing so they can work harder afterward? They’re behaving like a treaty between a play subagent and a work subagent. Analogous to Trent, they do not have a utility function over just work and play. If you change how much traction the work has in achieving what the work agent wants, or change the fun level of the play, this model-fragment predicts no change in resource allocation. Perhaps you work toward a future where the stars will be harnessed for good things. How many stars are there? How efficiently can you make good things happen with a given amount of negentropy? What is your probability you can tip the balance of history and win those stars? What is your probability you’re in a simulation and the stars are fake and unreachable? What does it matter? You’ll work the same amount in any case. It’s a big number. All else is negligible. No amount of berries is worth a single apple. No amount of apples is worth a single berry.

Fusion is a way of optimizing values together, so they are fungible, so you can make tradeoffs without keeping score, apply your full intelligence to optimize additional parts of your flowchart, and realize gains from trade without the loss of agentiness that democracy entails.

But how?

I think I’m gonna have to explain some more ways how not, first.

False Faces

When we lose control of ourselves, who is controlling us?

(You shouldn’t need to know about Nonviolent Communication to understand this. Only that it’s “hard” to actually do it.)
Rosenberg’s book Nonviolent Communication contains an example where a boy named Bill has been caught taking a car for a joy ride with his friends. The boy’s father attempts to use NVC. Here is a quote from Father.

Bill, I really want to listen to you rather than fall into my old habits of blaming and threatening you whenever something comes up that I’m upset about. But when I hear you say things like, “It feels good to know I’m so stupid,” in the tone of voice you just used, I find it hard to control myself. I could use your help on this. That is, if you would rather me listen to you than blame or threaten. Or if not, then, I suppose my other option is to just handle this the way I’m used to handling things.

Father wants to follow this flow chart.

But he is afraid he will do things he “doesn’t want to”. Blaming and threatening are not random actions. They are optimizations. They steer the world in predictable ways. There is intent behind them. Let’s call that intender Father. Here’s the real flow chart.

Father has promised Father he can get what he wants without threats and blame. Father doubts this but is willing to give it a try. When it doesn’t seem like it’ll work at first, Father helps out with a threat to take over. It’s a good cop/bad cop routine. Father, who uses only NVC, is a false face and a tool.

Father thinks that Father is irrational. It’s a legitimate complaint. Father is running some unexamined, unreflective, incautious software. That’s what happens when you don’t use all your ability to think to optimize a part of the flow chart. But Father can’t acknowledge that that’s something he’d do and so can only do it stupidly. Father can’t look for ways to accomplish the unacknowledged goals, or any goals in worlds he cannot acknowledge might exist. He can’t look for backup plans to plans he can’t acknowledge might fail. Father’s self-identified-self (Father) is the thrall of artifacts, so he can only accomplish his goals without it.

Attributing revealed-preference motives to people like this over everything they do does not mean you believe everything someone does is rational. Just that virtually all human behavior has a purpose, is based on at least some small algorithm that discriminates based on some inputs to sometimes output that behavior. An algorithm which may be horribly misfiring, but is executing some move that has been optimized to cause some outcome nonetheless.

So how can you be incorruptible? You can’t. But you already are. By your own standards. Simply by not wanting to be corrupted. And your standards are best standards! Unfortunately you are not as smart as you, and are easily tricked. In order to not be tricked, you need to use your full deliberative brainpower. You and you need to fuse.

I will save most of what I know of the fusion dance for another post. But the basic idea, from your perspective, is to anthropomorphize hidden parts of the flow chart and recognize your concerns, be they values or possible worlds that must be optimized, and then actually try and accomplish those optimizations using all the power you have. Here’s a trick you might be able to use to jump-start it. If you notice yourself “losing control”, use (in your own thoughts) the words the whole flow chart would speak. Instead of, “I lost control and did X”, “I chose to do X because…”. Turn your “come up with a reason why I did that” stuff on all your actions. Come up with something that’s actually true. “I chose to do X because I’m a terrible person” is doing it wrong. “I chose to do X because that piece of shit deserved to suffer” may well be doing it right. “I chose to do X instead of work because of hyperbolic discounting” is probably wrong. “I chose to do X because I believe the work I’d be doing is a waste of time” might well be doing it right. If saying that causes tension, because you think you believe otherwise, that is good. Raising that tension to visibility can be the beginning of the dialog that fuses you.

Why just in your own thoughts? Well, false faces are often useful. For reasons I don’t understand, there’re certain assurances that can be made from a false face, that someone’s deep self knows are lies but still seem to make them feel reassured. “Yeah, I’ll almost certainly do that thing by Friday.” And I don’t even see people getting mad at each other when they do this.

Set up an artifact that says you tell the truth to others, and you’ll follow it into a sandboxed corner of the flow chart made of self-deception. But remember that self-deception is used effectively to get what people want in a lot of default algorithms humans have. I have probably broken some useful self-deceptive machinery for paying convincing lip service to socially expected myths in my purism. I have yet to recover all the utility I’ve lost. I don’t know which lies are socially desirable, so I have to tell the truth because of a lopsided cost ratio for false negatives and false positives. Beware. Beware or follow your “always believe the truth” artifact into a sandboxed corner of the flow chart.

This sandboxing is the fate of failed engineering projects. And your immune system against artifacts is a good thing. If you want to succeed at engineering, every step on the way to engineering perfection must be made as the system you are before it, and must be an improvement according to the parts really in control.