Something I’ve been building up to for a while.
Epistemic status: Examples are real. Technique seems to work for me, and I don’t use the ontology this is based on and sort of follows from for no reason, but I’m not really sure of all the reasons I believe it, it’s sort of been implicit and in the background for a while.
There is core and there is structure. Core is your unconscious values, that produce feelings about things that need no justification. Structure is habits, cherished self-fulfilling prophecies like my old commitment mechanism, self-image that guides behavior, and learned optimizing style.
Core is simple, but its will is unbreakable. Structure is a thing core generates and uses according to what seems likely to work. Core is often hard to see closely. Its judgements are hard to extrapolate to the vast things in the world beyond our sight that control everything we care about and that might be most of what we care about. There is fake structure, in straightforward service to no core, but serving core through its apparent not-serving of that core, or apparent serving a nonexistent core, and there is structure somewhat serving core but mixed up with outside influence.
Besides that there is structure that is in disagreement with other structure, built in service to snapshots of the landscape of judgement generated by core. That’s an inefficient overall structure to build to serve core, with two substructures fighting each other. Fusion happens at the layer of structure, and is to address this situation. It creates a unified structure which is more efficient.
(S2 contains structure and no core. S1 contains both structure and core.)
You may be thinking at this point, “okay, what are the alleged steps to accomplish fusion?”. This is not a recipe for some chunk of structure directing words and following steps to try rationality techniques to follow, to make changes to the mind, to get rid of akrasia. Otherwise it would fall prey to “just another way of using willpower” just like every other one of those.
It almost is though. It’s a thing to try with intent. The intent is what makes it un-sandboxed. Doing it better makes the fused agent smarter. It must be done with intent to satisfy your true inner values. If you try to have intent to satisfy your true inner values as a means to satisfy externally tainted values, or values / cached derived values that are there to keep appearances, not because they are fundamental, or let some chunk of true inner value win out over other true inner value. If you start out the process / search with the wrong intent, all you can do is stop. You can’t correct your intent as a means of fulfilling your original intent. Just stop, and maybe you will come back later when the right intent becomes salient. The more you try, the more you’ll learn to distrust attempts to get it right. Something along the lines of “deconstruct the wrong intent until you can rebuild a more straightforward thing that naturally lets in the rest” is probably possible, but if you’re not good at the dark side, you will probably fail at that. It’s not the easiest route.
In Treaties vs Fusion, I left unspecified what the utility function of the fused agent would be. I probably gave a misimpression, that it was negotiated in real time by the subagents involved, and then they underwent a binding agreement. Binding agreement is not a primitive in the human brain. A description I can give that’s full of narrative is, it’s about rediscovering the way in which both subagents were the same agent all along, then what was that agent’s utility function?
To try and be more mechanical about it, fusion is not about closing off paths, but building them. This does not mean fusion can’t prevent you from doing things. It’s paths in your mind through what has the power and delegates the power to make decisions, not paths in action-space. Which paths are taken when there are many available is controlled by deeper subagents. You build paths for ever deeper puppetmasters to have ever finer control of how they use surface level structure. Then you undo from its roots the situation of “two subagents in conflict because of only tracking a part of a thing”.
The subagents that decide where to delegate power seem to use heavily the decision criteria, “what intent was this structure built with?”. That is why to build real structure of any sort, you must have sincere intent to use it to satisfy your own values, whatever they are. There are a infinity ways to fuck it up, and no way to defend against all of them, except through wanting to do the thing in the first place because of sincere intent to satisfy your own values, whatever they are.
In trying to finish explaining this, I’ve tried listing out a million safeguards to not fuck it up, but in reality I’ve also done fusion haphazardly, skipping such safeguards, for extreme results, just because at every step I could see deeply that the approximations I was using, the value I was neglecting, would not likely change the results much, and that to whatever extent it did, that was a cost and I treated it as such.
Well-practiced fusion example
High-stakes situations are where true software is revealed in a way that you can be sure of. So here’s an example, when I fused structure for using time efficiently, and structure for avoiding death.
There was a time that me and the other co-founders of Rationalist Fleet were trying to replace lines going through the boom of a sailboat, therefore trying to get it more vertical so that they could be lowered through. The first plan involved pulling it vertical in place, then the climber, Gwen, tying a harness out of rope to climb the mast and get up to the top and lower a rope through. Someone raised a safety concern, and I pulled up the cached thought that I should analyze it in terms of micromorts.
My cached thoughts concerning micromorts were: a micromort was serious business. Skydiving was a seriously reckless thing to do, not the kind of thing someone who took expected utility seriously would do, because of the chance of death. I had seen someone on Facebook pondering if they were “allowed” to go skydiving, for something like the common-in-my-memeplex reasons of, “all value in the universe is after the singularity, no chance of losing billions of years of life is worth a little bit of fun” and/or “all value in the universe is after the singularity, we are at a point of such insane leverage to adjust the future that we are morally required to ignore all terminal value in the present and focus on instrumental value”, but I didn’t remember what was my source for that. So I asked myself, “how much inconvenience is it worth to avoid a micromort? How much weight should I feel attached to this concept to use that piece of utility comparison and attention-orienting software right?”
Things I can remember from that internal dialog mashed together probably somewhat inaccurately, probably not inaccurately in parts that matter.
How much time is a micromort? Operationalize as: how much time is a life? (implicit assumptions: all time equally valuable, no consequences to death other than discontinuation of value from life. Approximation seems adequate). Ugh AI timelines, what is that? Okay, something like 21 years on cached thought. I can update on that. It’s out of date. Approximation feels acceptable…. Wait, it’s assuming median AI timelines are… the right thing to use here. Expected doesn’t feel like it obviously snaps into places as the right answer, I’m not sure which thing to use for expected utility. Approximation feels acceptable… wait, I am approximating utility from me being alive after the singularity as negligible compared to utility from my chance to change the outcome. Feels like an acceptable approximation here. Seriously? Isn’t this bullshit levels of altruism, as in exactly what system 2 “perfectly unselfish” people would do, valuing your own chance at heaven at nothing compared to the chance to make heaven happen for everyone else? …. I mean, there are more other people than there are of me… And here’s that suspicious “righteous determination” feeling again. But I’ve gotten to this point by actually checking at every point if this really was my values… I guess that pattern seems to be continuing if there is a true tradeoff ratio between me and unlimited other people I have not found it yet… at this level of resolution this is an acceptable approximation… Wait, even though chances extra small because this is mostly a simulation? … Yes. Oh yeah, that cancels out…. so, <some math>, 10 minutes is 1 micromort, 1 week is 1 millimort. What the fuck! <double check>. What the fuck! Skydiving loses you more life from the time it takes than the actual chance of death! Every fucking week I’m losing far more life than all the things I used to be afraid of! Also, VOI on AI timelines will probably adjust my chance of dying to random crap on boats by a factor of about 2! …
Losing time started feeling like losing life. I felt much more expendable, significantly less like learning everything perfectly, less automatically inclined to just check off meta boxes until I had the perfect system before really living my life, and slowly closing in on the optimal strategy for everything was the best idea.
This fusion passed something of a grizzly bear test when another sailboat’s rudder broke in high winds later, it was spinning out of control, being tossed by ~4ft wind waves, and being pushed by the current and wind on a collision course for a large metal barge, and had to trade off summoning the quickest rescue against downstream plans being disrupted by the political consequences of that.
This fusion is acknowledgedly imperfect, and skimps noticeably toward the purpose of checking off normal-people-consider-them-different fragments of my value individually. Yet the important thing was that the relevant parts of me knew it was a best effort to satisfy my total values, whatever they were. And if I ever saw a truth obscured by that approximation, of course I’d act on that, and be on the lookout for things around the edges of it like that. The more your thoughts tend to be about trying to use structure, when appropriate, to satisfy your values whatever they are, the easier fusion becomes.
Once you have the right intent, the actual action to accomplish fusion is just running whatever epistemology you have to figure out anew what algorithms to follow to figure out what actions to take to satisfy your values. If you have learned to lean hard on expected utility maximization like me, and are less worried about the lossiness in the approximations required to do that explicitly on limited hardware than you are about the lossiness in doing something else, you can look at a bunch of quantities representing things you value in certain ranges where the value is linear in how much of them, and try and feel out tradeoff ratios, and what those are conditional on so you know when to abandon that explicit framework, how to notice when you are outside approximated linear ranges, or when there’s an opportunity to solve the fundamental problems that some linear approximations are based on.
The better you learn what structure is really about, the more you can transform it into things that look more and more like expected utility maximization. As long as expected utility maximization is a structure you have taken up because of its benefits to your true values. (Best validated through trial and error in my opinion.)
Fusion is a dark side technique because it is a shortcut in the process of building structure outward, a way to deal with computational constraints, and make use of partial imperfect existing structure.
If boundaries between sections of your value are constructed concepts, then there is no hard line between fusing chunks of machinery apparently aimed at broadly different subsets of your value, and fusing chunks of machinery aimed at the same sets of values. Because from a certain perspective, neglecting all but some of your values is approximating all of your values as some of your values. Approximating as in an inaccuracy you accept for reasons of computational limits, but which is nonetheless a cost. And that’s the perspective that matters because that’s what the deeper puppetmasters are using those subagents as.
By now, it feels like wrestling with computational constraints and trying to make approximations wisely to me, not mediating a dispute. Which is a sign of doing it right.
Early fusion example
Next I’ll present an older example of a high-stakes fusion of mine, which was much more like resolving a dispute, therefore with a lot more mental effort spent on verification of intent, and some things which may not have been necessary because I was fumbling around trying to discover the technique.
It had surfaced to my attention that I was trans. I’m not really sure how aware of that I was before. In retrospect, I remember thinking so at one point about a year earlier, deciding, “transition would interfere with my ability to make money due to discrimination, and destroy too great a chunk of my tiny probability of saving the world. I’m not going to spend such a big chunk of my life on that. So it doesn’t really matter, I might as well forget about it.” Which I did, for quite a while, even coming to think for a while that a later date was the first time I realized I was trans. (I know a trans woman who I knew before social transition who was taking hormones then, who still described herself as realizing she was trans several months later. And I know she had repeatedly tried to get hormones years before, which says something about the shape of this kind of realization.)
At the time of this realization, I was in the midst of my turn to the dark side. I was valuing highly the mental superpowers I was getting from that, and this created tension. I was very afraid that I had to choose either to embrace light side repression, thereby suffering and being weaker, or transition and thereafter be much less effective. In part because the emotions were disrupting my sleep. In part because I had never pushed the dark side this far, and I expected that feeling emotions counteracting these emotions all the time, which is what I expected to be necessary for the dark side to “work”, was impossible. There wasn’t room in my brain for that much emotion at once and still being able to do anything. So I spent a week not knowing what to do, feeling anxious, not being able to really think about work, and not being able to sleep well.
One morning, biking to work, my thoughts still consumed by this dilemma, I decided not to use the light side. “Well, I’m a Sith now. I am going to do what I actually [S1] want to no matter what.” If not transitioning in order to pander to awful investors later on, and to have my entire life decided by those conversations was what I really wanted, I wouldn’t stop myself, but I had to actually choose it, constantly, with my own continual compatibilist free will.
Then I suddenly felt viscerally afraid of not being able to feel all the things that mattered to me, or of otherwise screwing up the decision. Afraid of not being able to foresee how bad never transitioning would feel. Afraid of not understanding what I’d be missing if I was never in a relationship because of it. Afraid of not feeling things over future lives I could impact just because of limited ability to visualize them. Afraid of deceiving myself about my values in the direction that I was more altruistic than I was, based on internalizing a utility function society had tried to corrupt me with. And I felt a thing my past self chose to characterize as “Scream of the Sword of Good (not outer-good, just the thing inside me that seemed well-pointed to by that)”, louder than I had before.
I re-made rough estimates for how much suffering would come from not transitioning, and how much loss of effectiveness would come from transitioning. I estimated a 10%-40% reduction in expected impact I could have on the world if I transitioned. (At that time, I expected that most things would depend on business with people who would discriminate, perhaps subconsciously. I was 6’2″ and probably above average in looks as a man, which I thought’d be a significant advantage to give up.)
I sort of looked in on myself from the outside, and pointed my altruism thingy on myself, and noted that it cared about me, even as just-another-person. Anyone being put in this situation was wrong, and that did not need to be qualified.
I switched to thinking of it from the perspective of virtue ethics, because I thought of that as a separate chunk of value back then. It was fucked up that whatever thing I did, I was compromising in who I would be.
The misfit with my body and the downstream suffering was a part of the Scream.
I sort of struggled mentally within the confines of the situation. Either I lost one way, or I lost the other. My mind went from from bouncing between them to dwelling on the stuckness of having been forked between them. Which seemed just. I imagined that someone making Sophies Choice might allow themselves to be divided, “Here is a part of me that wants to save this child, and here is a part of me that wants to save that child, and I hate myself for even thinking about not saving this child, and I hate myself for even thinking about not saving that child. It’s tearing me apart…”, but the just target of their fury would have been whoever put you in that fork in the first place. Being torn into belligerent halves was making the wrongness too successful.
My negative feelings turned outward, and merged into a single felt sense of bad. I poke at the unified bad with two plans to alleviate it. Transition and definitely knock out this source of bad, or don’t transition and maybe have a slightly better chance of knocking out another source of bad.
I held in mind the visceral fear of deceiving myself in the direction of being more altruistic than I was. I avoided a train of thought like, “These are the numbers and I have to multiply out and extrapolate…” When I was convinced that I was avoiding that successfully, and just seeing how I felt about the raw things, I noticed I had an anticipation of picking “don’t transition”, whereas when I started this thought process, I had sort of expected it to be a sort of last double check / way to come to terms with needing to give things up in order to transition.
I reminded myself, “But I can change my mind at any time. I do not make precommitments. Only predictions.”. I reminded myself that my estimate of the consequences of transitioning was tentative and that a lot of things could change it. But conditional on that size of impact, it seemed pretty obvious to me that trying to pull a Mulan was what I wanted to do. There were tears in my eyes and I felt filled with terrible resolve. My anxiety symptoms went away over the next day. I became extremely productive, and spent pretty much every waking hour over the next month either working or reading things to try to understand strategy for affecting the future. Then I deliberately tried to reboot my mind starting with something more normal because I became convinced the plan I’d just put together and started preliminary steps of negative in expectation, and predictably because I was running on bitter isolation and Overwhelming Determination To Save The World at every waking moment. I don’t remember exactly how productive I was after that, but there was much less in-the-moment-strong-emotional-push-to-do-the-next-thing. I had started a shift toward a mental architecture that was much more about continually rebuilding ontology than operating within it.
I became somewhat worried that the dark side had stopped working, based on strong emotions being absent, although, judging from my actions, I couldn’t really point to something that I thought was wrong. I don’t think it had stopped working. Two lessons there are, approximately: emotions are about judgements of updates to your beliefs. If you are not continually being surprised somehow, you should not be expected to continually feel strong emotions. And, being strongly driven to accomplish something when you know you don’t know how, feels listlessly frustrating when you’re trying to take the next action: figure out what to do from a yang perspective, but totally works. It just requires yin.
If you want to know how to do this: come up with the best plan you can, ask, “will it work?”, ask yourself if you are satisfied with the (probably low) probability you came up with. If it does not automatically feel like, “Dang, this is so good, explore done, time to exploit”, which it probably actually won’t unless you use hacky self-compensating heuristics to do that artificially, or it’s a strongly convergent instrumental goal bottlenecking most of what else you could do. If you believe the probability that the world will be saved (say), is very small, do not say, “Well, I’m doing my part”, unless you are actually satisfied to do your part and then for the world to die. Do not say, “This is the best I can do, I have to do something”, unless you are actually satisfied to do your best, and to have done something, and then for the world to die. That unbearable impossibility and necessity is your ability to think. Stay and accept its gifts of seeing what won’t work. Move through all the ways of coming up with a plan you have unless you find something that is satisfying. You are allowed to close in on an action which will give a small probability of success, and consume your whole life, but that must come out of the even more terrible feeling of exhausted all your ability to figure things out. I’d be surprised if there wasn’t a plan to save the world that would work if handed to an agenty human. If one plan seems like it seems to absorb every plan, and yet still doesn’t seem like you understand the inevitability of only that high a probability of success, then perhaps your frame inevitably leads into that plan, and if that frame cannot be invalidated by your actions, then the world is doomed. Then what? (Same thing, just another level back.)
Being good at introspection, and determining what exactly was behind a thought is very important. I’d guess I’m better at this than anyone who hasn’t deliberately practiced it for at least months. There’s a significant chunk of introspective skill which can be had from not wanting to self-deceive, but some of it is actually just objectively hard. It’s one of several things that can move you toward a dark side mental architecture, which all benefit from each other, to making the pieces actually useful.