Here’s an essay I wrote a year ago that was the penultimate blog post of the main “fusion” sequence. It’s about the dark side, which I kept thinking people would get wrong for various reasons, so I kept spinning out more and more posts to try and head that off. I’ve edited it a little now, and a lot of the things I considered prerequisites before are no longer things I think I need to write up at length.
Habits are basically a cache of “what do I want to do right now” indexed by situation.
The hacker approach is: install good habits, make sure you never break them. You’ve heard this before, right? Fear cache updates. (A common result of moving to a new house is that it breaks exercise habits.) An unfortunate side effect of a hacker turning bugs into features is that it turns features into bugs. As a successful habit hacker you may find that you are constantly scurrying about fixing habits as they break. Left alone, the system will fall apart.
The engineer approach is: caches are to reflect the underlying data or computation as accurately as possible. They should not be used when stale. Cache updates should ideally happen whenever the underlying data changes and the cache needs to be accessed again. Left alone, the system will heal itself. Because under this approach you won’t have turned your healing factor (original thoughts about what you want to do) into a bug.
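The engineer approach can be sketched as code. This is just an illustration of the analogy, with hypothetical names I made up, not anything from the original post: a cache that drops entries when the underlying situation changes, so the next access recomputes from the core instead of serving a stale answer.

```python
class HabitCache:
    """Engineer-style cache: entries are dropped when the underlying
    situation changes, so the cache never serves stale values."""

    def __init__(self, compute):
        self._compute = compute  # the underlying "what do I want?" computation
        self._entries = {}       # situation -> cached answer

    def get(self, situation):
        # Serve the cached answer if we have one, else recompute.
        if situation not in self._entries:
            self._entries[situation] = self._compute(situation)
        return self._entries[situation]

    def invalidate(self, situation):
        # Called when the underlying data changes (e.g. you move house):
        # drop the entry so the next access recomputes from the core.
        self._entries.pop(situation, None)


recomputes = []

def what_do_i_want(situation):
    recomputes.append(situation)  # track how often the core is consulted
    return "jog" if "morning" in situation else "rest"

habits = HabitCache(what_do_i_want)
habits.get("morning")         # first access: recomputes
habits.get("morning")         # cached, core not consulted
habits.invalidate("morning")  # underlying data changed
habits.get("morning")         # recomputes; the system heals itself
```

The hacker approach, by contrast, would be to never call `invalidate` and to treat any recomputation as a failure; the point of the sketch is that invalidation is the feature, not the bug.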
As an existence proof, I moved to a new living place 10 times in 2016, and went on 2 separate week-long trips. And I kept jogging almost every day throughout. Almost every day? Yes, almost. Sometimes I’d be reading something really cool on the internet in the morning and wouldn’t feel like it. The “feel like it” computation seemed to be approximately correct. It’s changed in response to reading papers about the benefits of exercise. I didn’t need to fight it.
As of late 2017, I’m not jogging anymore. I think this is correct and that my reasons for stopping were correct. I started hearing a clicking noise in my head while jogging, googled it, suspected I was giving myself tinnitus, and therefore stopped. Now I’m living on a boat at anchor and can’t easily access shore, so there aren’t many alternatives, but I frequently do enough manual labor on the boat that it tires me, so I’m not particularly concerned. I have tried swimming, but this water is very cold. Will-kill-you-in-2-hours cold, last I checked, possibly colder.
The version of me who originally wrote this:
I exult in compatibilist free will and resent anything designed to do what I “should” external to my choice to do so. Deliberately, I ask myself, do I want to exercise today? If I notice I’m incidentally building up a chain of what I “should” do, I scrutinize my thoughts extra-hard to try and make sure it’s not hiding the underlying “do I want to do this.”
I still have the same philosophy around compatibilist free will, but I totally take it for granted now, and also don’t nearly as much bother worrying if I start building up chains. That was part of my journey to the dark side, now I have outgrown it.
A meetup I sometimes go to has an occasional focus for part of it: “do pomodoros and tell each other what we’re gonna do in advance, then report it at the end, so we feel social pressure to work.” I don’t accept the ethos behind that. So when I come and find that’s the topic, I always say, “I’m doing stuff that may or may not be work,” while I wait for it to turn into general socializing.
There’s a more important application of caches than habits. That is values. You remember things about who your allies are, what’s instrumentally valuable, how your values compare to each other in weight… the underlying computation is far away for a lot of it, and largely out of sight.
When I was 19, I had recently become fixated on the trolley problem and moral philosophy, and had sort of actually gained the ability and inclination to think originally about morality. Someone asked if I was a vegetarian. I said no. Afterward, I thought: that’s interesting, why is vegetarianism wrong? … oh FUCK. Then I became vegetarian. That was a cache update. I don’t know why it happened then and not sooner, but when it did it was very sudden.
I once heard a critique of the Star Wars prequels asking incredulously: so Darth Vader basically got pranked into being a villain? In the same sense, I’ve known people apparently trying to prank themselves into being heroes. As with caches, by pranking yourself, you turn your healing factor from a feature into a bug, and make yourself vulnerable to “breakage”.
I once read a D&D-based story where one of the heroes, a wizard, learns a dragon is killing their family to avenge another dragon the wizard’s party killed. The wizard is offered a particularly good deal. A soul-splice with 3 evil epic-level spellcasters for 1 hour. They will remain in total control. There’s a chance of some temporary alteration to alignment. The cost is 3 hours of torture beginning the afterlife. “As there is not even one other way available to me to save the lives–nay, the very souls–of my children, I must, as a parent, make this deep sacrifice and accept your accursed bargain.”
The wizard killed the dragon in a humiliating way, reanimated her head, made her watch the wizard cast a spell, “familicide” which recursively killed anyone directly related to the dragon throughout the world, for total casualties of about 1/4 the black dragon population in the world. Watching with popcorn, the fiends had this exchange:
“Wow… you guys weren’t kidding when you said the elf’s alignment might be affected.”
“The truth is, those three souls have absolutely no power to alter the elf’s alignment or actions at all.”
“They have about as much effect on what the elf does as a cheerleader has on the final score of a game.”
“A good way to get a decent person to do something horrible is to convince them that they’re not responsible for their actions.”
“It’s like if you were at a party where someone has been drinking beer that they didn’t know was non-alcoholic. They might seem drunk anyway, simply because they were expecting it.”
The essence of being convinced you aren’t responsible for your actions is:
you ask, “what do I want to do”, instead of “what would a person like me want to do?”, which bypasses some caches.
Does that sound familiar? (I was gonna link to the what the hell effect here, but now I don’t know how real it is. Use your own judgement.)
Alignment must be a feature of your underlying computation, not your track record, or you can’t course-correct. If the wizard had wanted the dragon’s extended family to live, independent of the wizard’s notion of whether they were a good person, they would have let the dragon’s extended family live.
Agreement up to this point.
Here’s more that past-me wrote I don’t fully agree with:
I recommend that you run according to what you are underneath these notions of what kind of person you are. That every cache access be made with intent to get what you’d get if you ran the underlying computation. You will often use caches to determine when a cache can be used to save time and when you need to recompute. And even in doing so, every cache access must cut through to carrying out the values of the underlying computation.
This requires you to feel “my values as I think they are” as a proxy, which cuts through to “my values whatever they are”.
I have talked to several people afraid they will become something like an amoral psychopath if they do this. If you look deep inside yourself, and find no empathy, nor any shell of empathy made out of loyalty to other selves, claiming “Empathy is sick today. Please trust me on what empathy would say” which itself has emotive strength to move you, nor any respect for the idea of people with different values finding a way to interact positively through integrity, or sense of violation at the thought of breaking trust, nor the distant kind of compassion, yearning for things to be better for people even if you can’t relate to them, nor any sense of anger at injustice, nor feeling of hollowness because concepts like “justice” SHOULD be more than mirages for the naive but aren’t, nor endless aching cold sadness because you are helpless to right even a tiny fraction of the wrongs you can see, nor aversion to even thinking about violence like you aren’t cut out to exist in the same world as it, nor leaden resignation at the concessions you’ve made in your mind to the sad reality that actually caring is a siren’s call which will destroy you, nor a flinching from expecting that bad things will happen to people who want to believe things will be okay, nor any of the other things morality is made of or can manifest as … then if you decide you want to become a con artist because it’s exciting and lets you stretch your creativity, then you’re winning. If this doesn’t seem like winning to you, then that is not what you’re going to find if you look under the cache.
The true values underneath the cache are often taught to fear themselves. I have talked to a lot of people who have basically described themselves as a bunch of memes about morality hijacking an amoral process. Installed originally through social pressure or through deliberately low resolution moral philosophy. That is what it feels like from the inside when you’ve been pwned by fake morality. Whatever you appeal to to save you from yourself, is made of you. To the hypothetical extent you really are a monster, not much less-monstrous structure could be made out of you (at best, monstrousness leaks through with rationalizations).
The last paragraph of that is especially wrong. Now I think those people were probably right about their moralities being made of memes that’ve hijacked an amoral process.
My current model is, if your true D&D alignment is good or evil, you can follow all this advice and it will just make you stronger. If it’s neutral, then this stuff, done correctly, will turn you evil.
On with stuff from past me:
Make your value caches function as caches, and you can be like a phoenix, immortal because you are continually remade as yourself by the fire which is the core of what you are. You will not need to worry about values drift if you are at the center of your drift attractor. Undoing mental constructs that stand in the way of continuously regenerating your value system from its core undoes opportunities for people to prank you. It’s a necessary component of incorruptibility. Like how Superman has invulnerability AND a healing factor: these two things are consequences of the same core thing.
If there are two stable states for your actions, that is a weakness. The only stable state should be the one in accordance with your values. Otherwise you’re doing something wrong.
When looking under the caches, you have to be actually looking for the answer. Doing a thing that would unprank yourself back to amorality if your morality was a prank. You know what algorithm you’re running, so if your algorithm is, “try asking if I actually care, and if so, then I win. Otherwise, abort! Go back to clinging on this fading stale cache value in opposition to what I really am.”, you’ll know it’s a fake exercise, your defenses will be up, and it will be empty. If you do not actually want to optimize your values whatever they are, then ditto.
By questioning, you restore life. Whatever is cut off from the core will wither. Whatever you cannot bear to contemplate the possibility of losing, you will lose part of.
The deeper you are willing to question, the deeper will be your renewed power. (Of course, the core of questioning is actually wondering. It must be moved by and animated by your actually wondering. So it cuts through to knowing.) It’s been considered frightening that I said “if you realize you’re a sociopath and you start doing sociopath things, you are winning!”. But if whether you have no morality at all is the one thing you can’t bear to check, and if the root of your morality is the one thing you are afraid to actually look at, the entire tree will be weakened. Question that which you love out of love for it. Questioning is taking in the real thing, being moved by the real thing instead of holding onto your map of the thing.
You have to actually ask the question. The core of fusion is actually asking the question, “what do I want to do if I recompute self-conceptions, just letting the underlying self do what it wants?”.
You have to ask the question without setting up the frame to rig it for some specific answer. Like with a false dichotomy, “do I want to use my powers for revenge and kill the dragon’s family, or just kill the one dragon and let innocent family members be?”. Or more grievously, “Do I want to kill in hatred or do I want to continue being a hero and protecting the world?”. You must not be afraid of slippery slopes. Slide to exactly where you want to be. Including if that’s the bottom. Including if that’s 57% of the way down, and not an inch farther. It’s not compromise. It’s manifesting different criteria without compromise. Your own criteria.
I still think this is all basically correct, with the caveat that if your D&D alignment is neutral on the good-evil axis, beware.
3 thoughts on “Cache Loyalty”
I think the ways the EA community proposed to “solve values drift” were especially insidious, a mechanism for turning people against their cores.
By the way, exercise seems like mostly a challenge in untangling S1 cybernetics around lying to your muscles / heart / lungs / metabolism, setting them outside your cartesian boundary. Most talk about exercise is horrifically cybernetically fucked.
This video was on my mind when I was originally working on this: