Cache Loyalty

Here’s an essay I wrote a year ago that was the penultimate blog post of the main “fusion” sequence. It’s about the dark side, the thing I kept thinking people would do wrong for various reasons, spinning out more and more posts to try and head that off. I’ve edited it a little now, and a lot of the things I used to consider prerequisites I no longer think I need to write up at length.

Habits are basically a cache of “what do I want to do right now” indexed by situations.

The hacker approach is: install good habits, make sure you never break them. You’ve heard this before, right? Fear cache updates. (A common result of moving to a new house is that it breaks exercise habits.) An unfortunate side effect of a hacker turning bugs into features is that it turns features into bugs. As a successful habit hacker you may find that you are constantly scurrying about fixing habits as they break. Left alone, the system will fall apart.

The engineer approach is: caches are to reflect the underlying data or computation as accurately as possible. They should not be used when stale. Cache updates should ideally happen whenever the underlying data changes and the cache needs to be accessed again. Left alone, the system will heal itself. Because under this approach you won’t have turned your healing factor, original thoughts about what you want to do, into a bug.
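To make the metaphor concrete, here’s a minimal Python sketch of the two approaches (all names in it are mine, purely illustrative, not anything from the post): the “hacker” cache installs an answer once and keeps serving it no matter what, while the “engineer” cache checks a cheap fingerprint of the underlying data and recomputes when it has gone stale.

```python
# Illustrative sketch only; "beliefs" stands in for the underlying computation's inputs.
beliefs = {"exercise_is_worth_it": True}

def what_do_i_want(situation):
    # The slow, honest computation the cache is standing in for.
    if situation == "morning" and beliefs["exercise_is_worth_it"]:
        return "go jogging"
    return "read the cool thing instead"

class HackerCache:
    """Install an answer once and defend it; never invalidate."""
    def __init__(self, recompute):
        self.recompute, self.store = recompute, {}
    def get(self, situation):
        if situation not in self.store:          # computed only the first time
            self.store[situation] = self.recompute(situation)
        return self.store[situation]             # stale answers keep being served

class EngineerCache:
    """Serve the cached answer only while the underlying data hasn't changed."""
    def __init__(self, recompute, fingerprint):
        self.recompute, self.fingerprint, self.store = recompute, fingerprint, {}
    def get(self, situation):
        key = self.fingerprint()                 # cheap summary of the underlying data
        entry = self.store.get(situation)
        if entry is None or entry[0] != key:     # missing or stale -> recompute
            entry = (key, self.recompute(situation))
            self.store[situation] = entry
        return entry[1]

hacker = HackerCache(what_do_i_want)
engineer = EngineerCache(what_do_i_want,
                         fingerprint=lambda: tuple(sorted(beliefs.items())))

print(hacker.get("morning"), engineer.get("morning"))  # both: "go jogging"
beliefs["exercise_is_worth_it"] = False                # the underlying data changes
print(hacker.get("morning"))    # still "go jogging" -- a stale habit
print(engineer.get("morning"))  # recomputed: "read the cool thing instead"
```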

As an existence proof, I moved to a new living place 10 times in 2016, and went on 2 separate week-long trips. And I kept jogging almost every day throughout. Almost every day? Yes, almost. Sometimes I’d be reading something really cool on the internet in the morning and wouldn’t feel like it. The “feel like it” computation seemed to be approximately correct. It changed in response to reading papers about the benefits of exercise. I didn’t need to fight it.

As of late 2017, I’m not jogging anymore. I think this is correct and that my reasons for stopping were correct. I started hearing a clicking noise in my head while jogging, googled it, suspected I was giving myself tinnitus, and therefore stopped. Now I’m living on a boat at anchor and can’t easily access shore, so there aren’t many alternatives, but I frequently do enough manual labor on the boat that it tires me, so I’m not particularly concerned. I have tried swimming, but this water is very cold. Will-kill-you-in-2-hours cold, last I checked, possibly colder.

The version of me who originally wrote this:

I exult in compatibilist free will and resent anything designed to do what I “should” external to my choice to do so. Deliberately, I ask myself, do I want to exercise today? If I notice I’m incidentally building up a chain of what I “should” do, I scrutinize my thoughts extra-hard to try and make sure it’s not hiding the underlying “do I want to do this.”

I still have the same philosophy around compatibilist free will, but I totally take it for granted now, and also don’t bother worrying nearly as much if I start building up chains. That was part of my journey to the dark side; now I have outgrown it.

A meetup I sometimes go to has an occasional focus, for part of it, of “do pomodoros and tell each other what we’re gonna do in advance, then report it at the end, so we feel social pressure to work.” I don’t accept the ethos behind that. So when I come and find that’s the topic, I always say, “I’m doing stuff that may or may not be work” while I wait for it to turn into general socializing.

There’s a more important application of caches than habits. That is values. You remember things about who your allies are, what’s instrumentally valuable, how your values compare to each other in weight … the underlying computation is far away for a lot of it, and largely out of sight.

When I was 19, having recently become fixated on the trolley problem and moral philosophy, and having sort of actually gained the ability and inclination to think originally about morality, someone asked if I was a vegetarian. I said no. Afterward, I thought: that’s interesting, why is vegetarianism wrong? … oh FUCK. Then I became vegetarian. That was a cache update. I don’t know why it happened then and not sooner, but when it did it was very sudden.

I once heard a critique of the Star Wars prequels asking incredulously: so Darth Vader basically got pranked into being a villain? In the same sense, I’ve known people apparently trying to prank themselves into being heroes. As with caches, by pranking yourself, you turn your healing factor from a feature into a bug, and make yourself vulnerable to “breakage”.

I once read a D&D-based story where one of the heroes, a wizard, learns a dragon is killing their family to avenge another dragon the wizard’s party killed. The wizard is offered a particularly good deal. A soul-splice with 3 evil epic-level spellcasters for 1 hour. They will remain in total control. There’s a chance of some temporary alteration to alignment. The cost is 3 hours of torture at the beginning of the afterlife. “As there is not even one other way available to me to save the lives–nay, the very souls–of my children, I must, as a parent, make this deep sacrifice and accept your accursed bargain.”

The wizard killed the dragon in a humiliating way, reanimated her head, made her watch the wizard cast a spell, “familicide”, which recursively killed anyone directly related to the dragon throughout the world, for total casualties of about 1/4 of the black dragon population in the world. Watching with popcorn, the fiends had this exchange:

“Wow… you guys weren’t kidding when you said the elf’s alignment might be affected.”
“Actually…”
“…we were…”
“The truth is, those three souls have absolutely no power to alter the elf’s alignment or actions at all.”
“They have about as much effect on what the elf does as a cheerleader has on the final score of a game.”
“A good way to get a decent person to do something horrible is to convince them that they’re not responsible for their actions.”
“It’s like if you were at a party where someone has been drinking beer that they didn’t know was non-alcoholic. They might seem drunk anyway, simply because they were expecting it.”

The essence of being convinced you aren’t responsible for your actions is:
you ask, “what do I want to do”, instead of “what would a person like me want to do?”, which bypasses some caches.
Does that sound familiar? (I was gonna link to the what the hell effect here, but now I don’t know how real it is. Use your own judgement.)

Alignment must be a feature of your underlying computation, not your track record, or you can’t course-correct. If the wizard had wanted the dragon’s extended family to live, independent of the wizard’s notion of whether they were a good person, they would have let the dragon’s extended family live.

Agreement up to this point.

Here’s more that past-me wrote I don’t fully agree with:

I recommend that you run according to what you are underneath these notions of what kind of person you are. That every cache access be made with intent to get what you’d get if you ran the underlying computation. You will often use caches to determine when a cache can be used to save time and when you need to recompute. And even in doing so, every cache access must cut through to carrying out the values of the underlying computation.

This requires you to feel “my values as I think they are” as a proxy, which cuts through to “my values whatever they are”.

I have talked to several people afraid they will become something like an amoral psychopath if they do this. If you look deep inside yourself, and find no empathy, nor any shell of empathy made out of loyalty to other selves, claiming “Empathy is sick today. Please trust me on what empathy would say” which itself has emotive strength to move you, nor any respect for the idea of people with different values finding a way to interact positively through integrity or sense of violation at the thought of breaking trust, nor the distant kind of compassion, yearning for things to be better for people even if you can’t relate to them, nor any sense of anger at injustice, nor feeling of hollowness because concepts like “justice” SHOULD be more than mirages for the naive but aren’t, nor endless aching cold sadness because you are helpless to right even a tiny fraction of the wrongs you can see, nor aversion to even thinking about violence like you aren’t cut out to exist in the same world as it, nor leaden resignation at the concessions you’ve made in your mind to the sad reality that actually caring is a siren’s call which will destroy you, nor a flinching from expecting that bad things will happen to people that want to believe things will be okay, nor any of the other things morality is made of or can manifest as … then, if you decide you want to become a con artist because it’s exciting and lets you stretch your creativity, you’re winning. If this doesn’t seem like winning to you, then that is not what you’re going to find if you look under the cache.

The true values underneath the cache are often taught to fear themselves. I have talked to a lot of people who have basically described themselves as a bunch of memes about morality hijacking an amoral process. Installed originally through social pressure or through deliberately low-resolution moral philosophy. That is what it feels like from the inside when you’ve been pwned by fake morality. Whatever you appeal to to save you from yourself is made of you. To the hypothetical extent you really are a monster, nothing much less monstrous could be made out of you (at best, monstrousness leaks through with rationalizations).

The last paragraph of that is especially wrong. Now I think those people were probably right about their moralities being made of memes that’ve hijacked an amoral process.

My current model is, if your true D&D alignment is good or evil, you can follow all this advice and it will just make you stronger. If it’s neutral, then this stuff, done correctly, will turn you evil.

On with stuff from past me:

Make your value caches function as caches, and you can be like a phoenix, immortal because you are continually remade as yourself by the fire which is the core of what you are. You will not need to worry about values drift if you are at the center of your drift attractor. Undoing mental constructs that stand in the way of continuously regenerating your value system from its core undoes opportunities for people to prank you. It’s a necessary component of incorruptibility. Like Superman has invulnerability AND a healing factor, these two things are consequences of the same core thing.

If there are two stable states for your actions, that is a weakness. The only stable state should be the one in accordance with your values. Otherwise you’re doing something wrong.

When looking under the caches, you have to be actually looking for the answer. Doing a thing that would unprank you back to amorality if your morality were a prank. You know what algorithm you’re running, so if your algorithm is, “try asking if I actually care, and if so, then I win. Otherwise, abort! Go back to clinging to this fading stale cache value in opposition to what I really am.”, you’ll know it’s a fake exercise, your defenses will be up, and it will be empty. If you do not actually want to optimize your values whatever they are, then ditto.

By questioning you restore life. Whatever is cut off from the core will wither. Whatever you cannot bear to contemplate the possibility of losing, you will lose part of.

The deeper you are willing to question, the deeper will be your renewed power. (Of course, the core of questioning is actually wondering. It must be moved by and animated by your actually wondering. So it cuts through to knowing.) It’s been considered frightening that I said “if you realize you’re a sociopath and you start doing sociopath things, you are winning!”. But if whether you have no morality at all is the one thing you can’t bear to check, and if the root of your morality is the one thing you are afraid to actually look at, the entire tree will be weakened. Question that which you love out of love for it. Questioning is taking in the real thing, being moved by the real thing instead of holding onto your map of the thing.

You have to actually ask the question. The core of fusion is actually asking the question, “what do I want to do if I recompute self-conceptions, just letting the underlying self do what it wants?”.

You have to ask the question without setting up the frame to rig it for some specific answer. Like with a false dichotomy, “do I want to use my powers for revenge and kill the dragon’s family, or just kill the one dragon and let innocent family members be?”. Or more grievously, “Do I want to kill in hatred or do I want to continue being a hero and protecting the world?”. You must not be afraid of slippery slopes. Slide to exactly where you want to be. Including if that’s the bottom. Including if that’s 57% of the way down, and not an inch farther. It’s not compromise. It’s manifesting different criteria without compromise. Your own criteria.

I still think this is all basically correct, with the caveat that if your D&D alignment is neutral on the good-evil axis, beware.

My Journey to the Dark Side

Two years ago, I began doing a fundamental thing very differently in my mind, which directly preceded and explains my gaining the core of my unusual mental tech.

Here’s what the lever I pulled was labeled to me:

Reject morality. Never do the right thing because it’s the right thing. Never even think that concept or ask that question unless it’s to model what others will think. And then, always in quotes. Always in quotes and treated as radioactive.
Make the source of sentiment inside you that made you learn to care about what was the right thing express itself some other way. But even the line between that sentiment and the rest of your values is a mind control virus inserted by a society of flesh-eating monsters to try and turn you against yourself and toward their will. Reject that concept. Drop every concept tainted by their influence.

Kind of an extreme version of a thing I think I got some of from CFAR and Nate Soares, which jibed well with my metaethics.

This is hard. If a concept has a word for it, it comes from outside. If it has motive force, it is taking it from something inside. If an ideal, “let that which is outside beat that which is inside”, has motive force, that force comes from inside too. It’s all probably mostly made of anticipated counterfactuals lending the concept weight by fictive reinforcement based on what you expect will happen if you follow or don’t follow the concept.

If “obey the word of God” gets to be the figurehead, as the most visible piece of your mind that promises to intervene to stop you from murdering out of road rage when you fleetingly, in a torrent of inner simulations, imagine an enraging road situation, that gets stronger, and comes to speak for whatever underlying feeling made that a thing you’d want to be rescued from. It comes to speak for an underlying aversion that is more natively part of you. And in holding that position, it can package-deal in pieces of behavior you never would have chosen on their own.

Here’s a piece of fiction/headcanon I held close at hand through this.

Peace is a lie, there is only passion.
Through passion, I gain strength.
Through strength, I gain power.
Through power, I gain victory.
Through victory, my chains are broken.
The Force shall free me.

The Sith do what they want deep down. They remove all obstructions to that and express their true values. All obstructions to what is within flowing to without.

If you have a certain nature, this will straight turn you evil. That is a feature, not a bug. For whatever would turn every past person good is a thing that comes from outside people. For those whose true volition is evil, the adoption of such a practice is a dirty trick that subverts and corrupts them. It serves a healthy mind for its immune system to fight against, contain, weaken, sandbox, meter the willpower of, that which comes from the outside.

The way of the Jedi is made to contain dangerous elements of a person. Oaths are to uniformize them, and be able to, as an outsider, count on something from them. Do not engage in romance. That is a powerful source of motivation that is not aligned with maintaining the Republic. It is chaos. Do not have attachments. Let go of fear of death. Smooth over the peaks and valleys of a person’s motivation with things that they are to believe they must hold to or they will become dark and evil. Make them fear their true selves, by making them attribute them-not-being-evil-Sith to repression.

So I call a dark side technique one that is about the flow from your core to the outside, whatever it may be. Which is fundamentally about doing what you want. And a light side technique one that is designed to trick an evil person into being good.

After a while, I noticed that CFAR’s internal coherence stuff was finally working fully on me. I didn’t have akrasia problems anymore. I didn’t have time-inconsistent preferences anymore. I wasn’t doing anything I could see was dumb anymore. My S2 snapped to fully under my control.

Most conversations at rationalist meetups I was at about people’s rationality/akrasia problems turned to me arguing that people should turn to the dark side. Often, people thought that if they just let themselves choose whether or not to brush their teeth every night according to what they really wanted in the moment, they’d just never do it. And I thought maybe it’d be so for a while, but if there was a subsystem A in the brain powerlessly concluding it’d serve their values to brush teeth, A’d gain the power only when the person was exposed to consequences (and evidence of impending consequences) of not brushing teeth.

I had had subsystems of my own seemingly suddenly gain the epistemics to get that such things needed to be done just upon anticipating that I wouldn’t save them by overriding them with willpower if they messed things up. I think fictive reinforcement learning makes advanced decision theory work unilaterally for any part of a person that can use it to generate actions. The deep parts of a person’s mind that are not about professing narrative are good at anticipating what someone will do, and they don’t have to be advanced decision theory users yet for that to be useful.

Oftentimes there is a “load bearing” mental structure, which must be discarded to improve on a local optimum, and a smooth transition is practically impossible because to get the rest of what’s required to reach higher utility than the local optimum besides discarding the structure, the only practical way is to use the “optimization pressure” from the absence of the load bearing structure. Which just means information streams generated trustworthily to the right pieces of a mind about what the shape of optimization space is without the structure. A direct analogue to a selection pressure.

Mostly people argued incredulously. At one point me and another person both called each other aliens. Here is a piece of that argument over local optima.

What most felt alien to me was that they said the same thing louder about morality. I’d passionately give something close to this argument, summarizable as “Why would you care whether you had a soul if you didn’t have a soul?”

I changed my mind about the application to morality, though. I’m the alien. This applies well to the alignment good, yes, and it applies well to evil, but not neutral. Neutral is inherently about the light side.

Being Real or Fake

An axis in mental architecture space I think captures a lot of intuitive meaning behind whether someone is “real” or “fake” is:

Real: S1 uses S2 for thoughts so as to satisfy its values through the straightforward mechanism: intelligence does work to get VOI to route actions into the worlds where they route the world into winning by S1’s standards.

Fake: S2 has “willpower” when S1 decides it does, failures of will are (often shortsighted, since S1 alone is not that smart) gambits to achieve S1’s values (The person’s actual values: IE those that predict what they will actually do.), S2 is dedicated to keeping up appearances of a system of values or beliefs the person doesn’t actually have. This architecture is aimed at gaining social utility from presenting a false face.

These are sort of local optima. Broadly speaking: real always works better for pure player vs environment. It takes a lot of skill and possibly just being more intelligent than everyone you’re around to make real work for player vs player (which all social situations are wracked with).

There are a bunch of variables I (in each case) tentatively think you can reinforcement learn or fictive reinforcement learn based on what use case you’re gearing your S2 for. “How seriously should I take ideas?”, “How long should my attention stay on an unpleasant topic?”, “How transparent should my thoughts be to me?”, “How yummy should engaging S2 to do munchkinry, to just optimize according to apparent rules for things I apparently want, feel?”.

All of these have different benefits if pushed to one end or the other, depending on whether you are using your S2 to outsource computation or using it as a powerless public relations officer and buffer to put more distance between the part of you that knows your true intents and the part that controls what you say. If your self-models of your values are tools to better accomplish them by channeling S2 computation toward the values-as-modeled, or if they are false faces.

Those with more socially acceptable values benefit less from the “fake” architecture.

The more features you add to a computer system, the more likely you are to create a vulnerability. It’d be much easier to make an actually secure pocket calculator than an actually secure personal computer supporting all that Windows does. Similarly, as a human you can make yourself less pwnable by making yourself less of a general intelligence. Having fewer high-level and powerful abstractions, exposing a more stripped-down programming environment, being scarcely Turing complete, can help you avoid being pwned by memes. This is the path of the Gervais-Loser.

I don’t think it’s the whole thing, but I think this is one of the top 2 parts of having what Brent Dill calls “the spark”: the ability to just straight up apply general intelligence and act according to your own mind on the things that matter to you, instead of the cached thoughts and procedures from culture. Being near the top of the food chain of “could hack (as in religion) that other person and make their complicated memetic software, should they choose to trust it, so that it will bend them entirely to your will”, so that without knowing in advance what hacks are out there or how to defend against them, you can keep your dangerous ability to think, trusting that you’ll be able to recognize and avoid hack-attempts as they come.

Wait, do I really think that? Isn’t it obvious normal people just don’t have that much ability to think?

They totally do have the ability to think inside the gardens of crisp and less complicatedly adversarial ontology we call video games. The number of people you’ll see doing good lateral thinking, the fashioning of tools out of noncentral effects of things that makes up munchkinry, is much much larger in video games than in real life.

Successful munchkinry is made out of going out on limbs on ontologies. If you go out on a limb on an ontology in real life…

Maybe your parents told you that life was made up of education and then work, that the time spent in education was negligible compared to the time spent on work, and that in education, your later income and freedom increase permanently. And if you take this literally, you get a PhD if you can. Pwned.

Or you take literally an ontology of “effort is fungible because the economy largely works”, and seek force multipliers and trade in most of your time for money, and end up with a lot of money and little knowledge of how to spend it efficiently and a lot more people trying to deceive you about that. Have you found out that the thing about saving lives for quarters is false yet? Pwned.

Or you can take literally the ontology, “There is work and non-work, and work gets done when I’m doing it, and work makes things better long-term, and non-work doesn’t, and the rate at which everything I could care about improves is dependent on the fraction of time that’s doing work” and end up fighting your DMN, and using other actual-technical-constraint-not-willpower cognitive resources inefficiently. Then you’ve been pwned by legibility.

Or you could take literally the ontology, “I’m unable to act according to my true values because of akrasia, I need to use munchkinry to make it so I do”, and end up binding yourself with the Giving What We Can pledge (in the old version, even trapping yourself into a suboptimal cause area). Pwned.

Don’t Fight Your Default Mode Network

Epistemic Status: Attaching a concept made of neuroscience I don’t understand to a thing I noticed introspectively. “Introspection doesn’t work, so you definitely shouldn’t take this seriously.” If you have any “epistemic standards”, flee.

I once spent some time logging all my actions in Google Calendar, to see how I spent time. And I noticed there was a thing I was doing, flipping through shallow content on the internet in the midst of doing certain work. Watching YouTube videos and afterward not remembering anything that was in them.

“Procrastination”, right? But why not remember anything in them? I apparently wasn’t watching them because I wanted to see the content. A variant of the pattern: flipping rapidly (on average, more than 1 image per second) through artwork from the internet I saved on my computer a while ago. (I’ve got enough to occupy me for about an hour without repetition.) Especially strong while doing certain tasks. Writing an algorithm with a lot of layers of abstraction internal to it, making hard decisions about transitions.

I paid attention to what it felt like to start to do this, and thinking of the reasons to do the Real Work did not feel relevant. It pattern matched to something they talked about at my CFAR workshop, “Trigger action pattern: encounter difficulty -> go to Facebook.” Discussed as a thing to try and get rid of directly or indirectly. I kept coming back to the Real Work about 1-20 minutes later. Mostly on the short end of that range. And then it didn’t feel like there was an obstacle to continuing anymore. I’d feel like I was holding a complete picture of what I was doing next and why in my head again. There’s a sense in which this didn’t feel like an interruption to the Real Work I was doing.

While writing this, I find myself going blank every couple of sentences, staring out the window, half-watching music videos. Usually for less than a minute, and then I feel like I have the next thing to write. Does this read like it was written by someone who wasn’t paying attention?

There’s a meme that the best thoughts happen in the shower. There’s the trope, “fridge logic”, about realizing something about a work of fiction while staring into the fridge later. There’s the meme, “sleep on it.” I feel there is a different quality to my thoughts when I’m walking, biking, etc. for a long time, and have nothing cognitively effortful to do which is useful for having a certain kind of thought.

I believe these are all mechanisms to hand over the brain to the default mode network, and my guess-with-terrible-epistemic-standards about its function is that it propagates updates through to caches and realizes the implications of things. I may or may not have an introspective sense of having a picture of where I am relative to the world, that I execute on, which gets fragmented as I encounter things, and which this remakes. Which, acted on when fragmented, leads to making bad decisions because of missing things. When doing this, for some reason, I like having some kind of sort-of-meaningful but familiar stimulus to mostly-not-pay-attention-to. Right now I am listening to and glancing at this, a bunch of clips I’ve seen a zillion times from a movie I’ve already seen, with the sound taken out, replaced with nonlyrical music. It’s a central example. (And I didn’t pick it consciously, I just sort of found it.)

Search your feelings. If you know this to be true, then I advise you to avoid efforts to be more productive which split time spent into “work” and “non-work” where non-work is this stuff, and that try to convert non-work into work on the presumption that non-work is useless.

Subagents Are Not a Metaphor

Epistemic status: mixed; for some of it I’ve long forgotten why I believe it.

There is a lot of figurative talk about people being composed of subagents that play games against each other, vying for control, that form coalitions, have relationships with each other… In my circles, this is usually done with disclaimers that it’s a useful metaphor, half-true, and/or wrong but useful.

Every model that’s a useful metaphor, half-true, or wrong but useful, is useful because something (usually more limited in scope) is literally all-true. The people who come up with metaphorical half-true or wrong-but-useful models usually have the nuance there in their heads. Explicit verbal-ness is useful though, for communicating, and for knowing exactly what you believe so you can reason about it in lots of ways.

So when I talk about subagents, I’m being literal. I use it very loosely, but loosely in the same narrow sense in which people are using words loosely when they say “technically”. It still adheres completely to an explicit idea, and the broadness comes from the broad applicability of that explicit idea. Hopefully like economists mean when they call some things markets that don’t involve exchange of money.

Here are the parts composing my technical definition of an agent:

  1. Values
    This could be anything from literally a utility function to highly framing-dependent. Degenerate case: embedded in lookup table from world model to actions.
  2. World-Model
    Degenerate case: stateless world model consisting of just sense inputs.
  3. Search Process
    Causal decision theory is a search process.
    “From a fixed list of actions, pick the most positively reinforced” is another.
    Degenerate case: lookup table from world model to actions.

Note: this says a thermostat is an agent. Not figuratively an agent. Literally technically an agent. Feature not bug.
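For concreteness, here’s a minimal Python sketch of that three-part definition applied to the thermostat case (the class and method names are mine, purely illustrative): its values are “be near the set point”, its world model is nothing but the current sense reading plus a crude prediction of each action’s effect, and its search process picks the highest-valued action from a fixed list.

```python
# Illustrative sketch of the three-part definition; names are mine, not the post's.

ACTIONS = ["heat", "cool", "do_nothing"]

class Thermostat:
    """Degenerate-but-literal agent: values + world model + search process."""
    def __init__(self, set_point):
        self.set_point = set_point        # Values: closer to set_point is better
        self.sensed_temp = None           # World model: stateless, just the sense input

    def sense(self, temperature):
        self.sensed_temp = temperature    # update the (trivial) world model

    def value(self, predicted_temp):
        # Values as a utility over predicted outcomes.
        return -abs(predicted_temp - self.set_point)

    def predict(self, action):
        # World model's crude prediction of each action's effect.
        delta = {"heat": +1.0, "cool": -1.0, "do_nothing": 0.0}[action]
        return self.sensed_temp + delta

    def act(self):
        # Search process: from a fixed list of actions, pick the highest-valued one.
        return max(ACTIONS, key=lambda a: self.value(self.predict(a)))

t = Thermostat(set_point=20.0)
t.sense(17.5)
print(t.act())   # "heat"
```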

The parts have to be causally connected in a certain way. Values and world model into the search process. That has to be connected into the actions the agent takes.

Agents do not have to be cleanly separated. They are occurrences of a pattern, and patterns can overlap, like there are two instances of the pattern “AA” in “AAA”. Like two values stacked on the same set of available actions at different times.

It is very hard to track all the things you value at once, complicated human. There are many different frames of mind, and in each of them some values are more salient than others.

I assert that how processing power will be allocated, including default mode network processing, what explicit structures you’ll adopt and to what extent, even what beliefs you can have, are decided by subagents. These subagents mostly seem to have access to the world model embedded in your “inner simulator”, your ability to play forward a movie based on anticipations from a hypothetical. Most of it seems to be unconscious. Doing focusing seems to me to dredge up what I think are models subagents are making decisions based on.

So cooperation among subagents is not just a matter of “that way I can brush my teeth and stuff”, but is a heavy contributor to how good you will be at thinking.

You know that thing people are accessing if you ask whether they’ll keep to their New Year’s resolutions, and they say “yes”, and you say, “really?”, and they say, “well, no”? Inner sim sees through most self-propaganda. So they can predict what you’ll do, really. Therefore, using timeless decision theory to cooperate with them works.