Fusion

Something I’ve been building up to for a while.

Epistemic status: Examples are real. The technique seems to work for me, and I don’t use the ontology this is based on and sort of follows from for no reason, but I’m not really sure of all the reasons I believe it; it’s sort of been implicit and in the background for a while.

The theory

There is core and there is structure. Core is your unconscious values, which produce feelings about things that need no justification. Structure is habits, cherished self-fulfilling prophecies like my old commitment mechanism, self-image that guides behavior, and learned optimizing style.

Core is simple, but its will is unbreakable. Structure is a thing core generates and uses according to what seems likely to work. Core is often hard to see closely. Its judgements are hard to extrapolate to the vast things in the world beyond our sight that control everything we care about and that might be most of what we care about. There is fake structure, in straightforward service to no core, but serving core through its apparent not-serving of that core, or apparent serving of a nonexistent core, and there is structure somewhat serving core but mixed up with outside influence.

Besides that there is structure that is in disagreement with other structure, built in service to snapshots of the landscape of judgement generated by core. That’s an inefficient overall structure to build to serve core, with two substructures fighting each other. Fusion happens at the layer of structure, and is to address this situation. It creates a unified structure which is more efficient.

(S2 contains structure and no core. S1 contains both structure and core.)

You may be thinking at this point, “okay, what are the alleged steps to accomplish fusion?”. This is not a recipe for some chunk of structure to direct words and follow steps, to try rationality techniques, to make changes to the mind, to get rid of akrasia. Otherwise it would fall prey to “just another way of using willpower” just like every other one of those.

It almost is though. It’s a thing to try with intent. The intent is what makes it un-sandboxed. Doing it better makes the fused agent smarter. It must be done with intent to satisfy your true inner values. If you try to have intent to satisfy your true inner values as a means to satisfy externally tainted values, or values / cached derived values that are there to keep appearances, not because they are fundamental, or to let some chunk of true inner value win out over other true inner value, it will not work. If you start out the process / search with the wrong intent, all you can do is stop. You can’t correct your intent as a means of fulfilling your original intent. Just stop, and maybe you will come back later when the right intent becomes salient. The more you try, the more you’ll learn to distrust attempts to get it right. Something along the lines of “deconstruct the wrong intent until you can rebuild a more straightforward thing that naturally lets in the rest” is probably possible, but if you’re not good at the dark side, you will probably fail at that. It’s not the easiest route.

In Treaties vs Fusion, I left unspecified what the utility function of the fused agent would be. I probably gave a misimpression, that it was negotiated in real time by the subagents involved, and then they underwent a binding agreement. Binding agreement is not a primitive in the human brain. A description I can give that’s full of narrative is, it’s about rediscovering the way in which both subagents were the same agent all along, then what was that agent’s utility function?

To try and be more mechanical about it, fusion is not about closing off paths, but building them. This does not mean fusion can’t prevent you from doing things. It’s paths in your mind through what has the power and delegates the power to make decisions, not paths in action-space. Which paths are taken when there are many available is controlled by deeper subagents. You build paths for ever deeper puppetmasters to have ever finer control of how they use surface level structure. Then you undo from its roots the situation of “two subagents in conflict because of only tracking a part of a thing”.

The subagents that decide where to delegate power seem to use heavily the decision criterion, “what intent was this structure built with?”. That is why to build real structure of any sort, you must have sincere intent to use it to satisfy your own values, whatever they are. There are an infinity of ways to fuck it up, and no way to defend against all of them, except through wanting to do the thing in the first place because of sincere intent to satisfy your own values, whatever they are.

In trying to finish explaining this, I’ve tried listing out a million safeguards to not fuck it up, but in reality I’ve also done fusion haphazardly, skipping such safeguards, for extreme results, just because at every step I could see deeply that the approximations I was using, the value I was neglecting, would not likely change the results much, and that to whatever extent it did, that was a cost and I treated it as such.

Well-practiced fusion example

High-stakes situations are where true software is revealed in a way that you can be sure of. So here’s an example, from when I fused structure for using time efficiently with structure for avoiding death.

There was a time when the other co-founders of Rationalist Fleet and I were trying to replace lines going through the boom of a sailboat, and therefore trying to get it more vertical so that they could be lowered through. The first plan involved pulling it vertical in place, then the climber, Gwen, tying a harness out of rope to climb the mast and get up to the top and lower a rope through. Someone raised a safety concern, and I pulled up the cached thought that I should analyze it in terms of micromorts.

My cached thoughts concerning micromorts were: a micromort was serious business. Skydiving was a seriously reckless thing to do, not the kind of thing someone who took expected utility seriously would do, because of the chance of death. I had seen someone on Facebook pondering if they were “allowed” to go skydiving, for something like the common-in-my-memeplex reasons of, “all value in the universe is after the singularity, no chance of losing billions of years of life is worth a little bit of fun” and/or “all value in the universe is after the singularity, we are at a point of such insane leverage to adjust the future that we are morally required to ignore all terminal value in the present and focus on instrumental value”, but I didn’t remember that was my source for that. So I asked myself, “how much inconvenience is it worth to avoid a micromort? How much weight should I feel attached to this concept to use that piece of utility comparison and attention-orienting software right?”

Here are things I can remember from that internal dialog, mashed together, probably somewhat inaccurately, though probably not inaccurately in the parts that matter.

How much time is a micromort? Operationalize as: how much time is a life? (implicit assumptions: all time equally valuable, no consequences to death other than discontinuation of value from life. Approximation seems adequate). Ugh AI timelines, what is that? Okay, something like 21 years on cached thought. I can update on that. It’s out of date. Approximation feels acceptable…. Wait, it’s assuming median AI timelines are… the right thing to use here. Expected doesn’t feel like it obviously snaps into place as the right answer, I’m not sure which thing to use for expected utility. Approximation feels acceptable… wait, I am approximating utility from me being alive after the singularity as negligible compared to utility from my chance to change the outcome. Feels like an acceptable approximation here. Seriously? Isn’t this bullshit levels of altruism, as in exactly what system 2 “perfectly unselfish” people would do, valuing your own chance at heaven at nothing compared to the chance to make heaven happen for everyone else? …. I mean, there are more other people than there are of me… And here’s that suspicious “righteous determination” feeling again. But I’ve gotten to this point by actually checking at every point if this really was my values… I guess that pattern seems to be continuing; if there is a true tradeoff ratio between me and unlimited other people, I have not found it yet… at this level of resolution this is an acceptable approximation… Wait, even though chances extra small because this is mostly a simulation? … Yes. Oh yeah, that cancels out…. so, <some math>, 10 minutes is 1 micromort, 1 week is 1 millimort. What the fuck! <double check>. What the fuck! Skydiving loses you more life from the time it takes than the actual chance of death! Every fucking week I’m losing far more life than all the things I used to be afraid of! Also, VOI on AI timelines will probably adjust my chance of dying to random crap on boats by a factor of about 2! …
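A minimal sketch of the conversion in that dialog, assuming the ~21-year figure and the all-time-equally-valuable approximation (the exact function and variable names here are mine, for illustration):

```python
# Expected life lost per micromort (a one-in-a-million chance of death),
# under the approximations above: all remaining time equally valuable,
# death's only cost being discontinuation of value from life.

MINUTES_PER_YEAR = 365.25 * 24 * 60

def minutes_per_micromort(remaining_years):
    remaining_minutes = remaining_years * MINUTES_PER_YEAR
    return remaining_minutes / 1_000_000

per_micromort = minutes_per_micromort(21)          # ~11 minutes
micromorts_per_week = 7 * 24 * 60 / per_micromort  # ~900, i.e. ~1 millimort
```

On those assumptions, spending a week is on the same order as a millimort, which is the “what the fuck” moment in the dialog.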

Losing time started feeling like losing life. I felt much more expendable, significantly less like learning everything perfectly, less automatically inclined to just check off meta boxes until I had the perfect system before really living my life, or to think that slowly closing in on the optimal strategy for everything was the best idea.

This fusion passed something of a grizzly bear test when another sailboat’s rudder broke in high winds later. It was spinning out of control, being tossed by ~4ft wind waves, and being pushed by the current and wind on a collision course for a large metal barge, and I had to trade off summoning the quickest rescue against downstream plans being disrupted by the political consequences of that.

This fusion is admittedly imperfect, and leans noticeably toward checking off normal-people-consider-them-different fragments of my value individually. Yet the important thing was that the relevant parts of me knew it was a best effort to satisfy my total values, whatever they were. And if I ever saw a truth obscured by that approximation, of course I’d act on that, and be on the lookout for things around the edges of it like that. The more your thoughts tend to be about trying to use structure, when appropriate, to satisfy your values whatever they are, the easier fusion becomes.

Once you have the right intent, the actual action to accomplish fusion is just running whatever epistemology you have to figure out anew what algorithms to follow to figure out what actions to take to satisfy your values. If you have learned to lean hard on expected utility maximization like me, and are less worried about the lossiness in the approximations required to do that explicitly on limited hardware than you are about the lossiness in doing something else, you can look at a bunch of quantities representing things you value in certain ranges where the value is linear in how much of them, and try and feel out tradeoff ratios, and what those are conditional on so you know when to abandon that explicit framework, how to notice when you are outside approximated linear ranges, or when there’s an opportunity to solve the fundamental problems that some linear approximations are based on.
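A sketch of what that explicit framework might look like, with made-up quantity names, ranges, and tradeoff ratios (none of these numbers come from the text; they are only illustrative):

```python
# Explicit linear-range utility comparison: each valued quantity has a
# range over which value is approximately linear in amount, and a
# tradeoff ratio into a common unit (here, hours of free time).
# Stepping outside a range means the linear approximation no longer
# holds, and the framework should be abandoned rather than extrapolated.

LINEAR_RANGES = {
    # quantity: (low, high, value per unit in hours)
    "free_hours_per_week": (0, 60, 1.0),
    "micromorts_per_week": (0, 5000, -0.18),  # ~11 minutes each
}

def value(bundle):
    total = 0.0
    for name, amount in bundle.items():
        low, high, ratio = LINEAR_RANGES[name]
        if not (low <= amount <= high):
            # outside the approximated linear range: the explicit
            # framework no longer applies, re-derive from scratch
            raise ValueError(f"{name}={amount} outside linear range")
        total += amount * ratio
    return total

# Comparing two weeks: one safer but slower, one riskier but faster.
safe = value({"free_hours_per_week": 20, "micromorts_per_week": 10})
fast = value({"free_hours_per_week": 30, "micromorts_per_week": 500})
```

The point of the `ValueError` is the “know when to abandon that explicit framework” clause: the numbers are only trusted where the linearity assumption was checked.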

The better you learn what structure is really about, the more you can transform it into things that look more and more like expected utility maximization. As long as expected utility maximization is a structure you have taken up because of its benefits to your true values. (Best validated through trial and error in my opinion.)

Fusion is a dark side technique because it is a shortcut in the process of building structure outward, a way to deal with computational constraints, and make use of partial imperfect existing structure.

If boundaries between sections of your value are constructed concepts, then there is no hard line between fusing chunks of machinery apparently aimed at broadly different subsets of your value, and fusing chunks of machinery aimed at the same sets of values. Because from a certain perspective, neglecting all but some of your values is approximating all of your values as some of your values. Approximating as in an inaccuracy you accept for reasons of computational limits, but which is nonetheless a cost. And that’s the perspective that matters, because that’s how the deeper puppetmasters are using those subagents.

By now, it feels like wrestling with computational constraints and trying to make approximations wisely to me, not mediating a dispute. Which is a sign of doing it right.

Early fusion example

Next I’ll present an older example of a high-stakes fusion of mine, which was much more like resolving a dispute, therefore with a lot more mental effort spent on verification of intent, and some things which may not have been necessary because I was fumbling around trying to discover the technique.

The context:

It had surfaced to my attention that I was trans. I’m not really sure how aware of that I was before. In retrospect, I remember thinking so at one point about a year earlier, deciding, “transition would interfere with my ability to make money due to discrimination, and destroy too great a chunk of my tiny probability of saving the world. I’m not going to spend such a big chunk of my life on that. So it doesn’t really matter, I might as well forget about it.” Which I did, for quite a while, even coming to think for a while that a later date was the first time I realized I was trans. (I know a trans woman who I knew before social transition who was taking hormones then, who still described herself as realizing she was trans several months later. And I know she had repeatedly tried to get hormones years before, which says something about the shape of this kind of realization.)

At the time of this realization, I was in the midst of my turn to the dark side. I was valuing highly the mental superpowers I was getting from that, and this created tension. I was very afraid that I had to choose either to embrace light side repression, thereby suffering and being weaker, or to transition and thereafter be much less effective. In part because the emotions were disrupting my sleep. In part because I had never pushed the dark side this far, and I expected that feeling emotions counteracting these emotions all the time, which is what I expected to be necessary for the dark side to “work”, was impossible. There wasn’t room in my brain for that much emotion at once while still being able to do anything. So I spent a week not knowing what to do, feeling anxious, not being able to really think about work, and not being able to sleep well.

The fusion:

One morning, biking to work, my thoughts still consumed by this dilemma, I decided not to use the light side. “Well, I’m a Sith now. I am going to do what I actually [S1] want to no matter what.” If not transitioning in order to pander to awful investors later on, and to have my entire life decided by those conversations was what I really wanted, I wouldn’t stop myself, but I had to actually choose it, constantly, with my own continual compatibilist free will.

Then I suddenly felt viscerally afraid of not being able to feel all the things that mattered to me, or of otherwise screwing up the decision. Afraid of not being able to foresee how bad never transitioning would feel. Afraid of not understanding what I’d be missing if I was never in a relationship because of it. Afraid of not feeling things over future lives I could impact just because of limited ability to visualize them. Afraid of deceiving myself about my values in the direction that I was more altruistic than I was, based on internalizing a utility function society had tried to corrupt me with. And I felt a thing my past self chose to characterize as “Scream of the Sword of Good (not outer-good, just the thing inside me that seemed well-pointed to by that)”, louder than I had before.

I re-made rough estimates for how much suffering would come from not transitioning, and how much loss of effectiveness would come from transitioning. I estimated a 10%-40% reduction in expected impact I could have on the world if I transitioned. (At that time, I expected that most things would depend on business with people who would discriminate, perhaps subconsciously. I was 6’2″ and probably above average in looks as a man, which I thought’d be a significant advantage to give up.)

I sort of looked in on myself from the outside, and pointed my altruism thingy on myself, and noted that it cared about me, even as just-another-person. Anyone being put in this situation was wrong, and that did not need to be qualified.

I switched to thinking of it from the perspective of virtue ethics, because I thought of that as a separate chunk of value back then. It was fucked up that whatever thing I did, I was compromising in who I would be.

The misfit with my body and the downstream suffering was a part of the Scream.

I sort of struggled mentally within the confines of the situation. Either I lost one way, or I lost the other. My mind went from bouncing between them to dwelling on the stuckness of having been forked between them. Which seemed just. I imagined that someone making Sophie’s Choice might allow themselves to be divided, “Here is a part of me that wants to save this child, and here is a part of me that wants to save that child, and I hate myself for even thinking about not saving this child, and I hate myself for even thinking about not saving that child. It’s tearing me apart…”, but the just target of their fury would have been whoever put them in that fork in the first place. Being torn into belligerent halves was making the wrongness too successful.

My negative feelings turned outward, and merged into a single felt sense of bad. I poked at the unified bad with two plans to alleviate it. Transition and definitely knock out this source of bad, or don’t transition and maybe have a slightly better chance of knocking out another source of bad.

I held in mind the visceral fear of deceiving myself in the direction of being more altruistic than I was. I avoided a train of thought like, “These are the numbers and I have to multiply out and extrapolate…” When I was convinced that I was avoiding that successfully, and just seeing how I felt about the raw things, I noticed I had an anticipation of picking “don’t transition”, whereas when I started this thought process, I had sort of expected it to be a sort of last double check / way to come to terms with needing to give things up in order to transition.

I reminded myself, “But I can change my mind at any time. I do not make precommitments. Only predictions.” I reminded myself that my estimate of the consequences of transitioning was tentative and that a lot of things could change it. But conditional on that size of impact, it seemed pretty obvious to me that trying to pull a Mulan was what I wanted to do. There were tears in my eyes and I felt filled with terrible resolve. My anxiety symptoms went away over the next day. I became extremely productive, and spent pretty much every waking hour over the next month either working or reading things to try to understand strategy for affecting the future. Then I deliberately tried to reboot my mind starting with something more normal, because I became convinced the plan I’d just put together and started preliminary steps of was negative in expectation, and predictably so, because I was running on bitter isolation and Overwhelming Determination To Save The World at every waking moment. I don’t remember exactly how productive I was after that, but there was much less in-the-moment-strong-emotional-push-to-do-the-next-thing. I had started a shift toward a mental architecture that was much more about continually rebuilding ontology than operating within it.

I became somewhat worried that the dark side had stopped working, based on strong emotions being absent, although, judging from my actions, I couldn’t really point to something that I thought was wrong. I don’t think it had stopped working. Two lessons there are, approximately: emotions are about judgements of updates to your beliefs. If you are not continually being surprised somehow, you should not expect to continually feel strong emotions. And, being strongly driven to accomplish something when you know you don’t know how feels listlessly frustrating when you’re trying to take the next action (figure out what to do) from a yang perspective, but totally works. It just requires yin.

If you want to know how to do this: come up with the best plan you can, ask, “will it work?”, ask yourself if you are satisfied with the (probably low) probability you came up with. If it does not automatically feel like, “Dang, this is so good, explore done, time to exploit”, keep searching. It probably actually won’t feel that way, unless you use hacky self-compensating heuristics to do that artificially, or it’s a strongly convergent instrumental goal bottlenecking most of what else you could do. If you believe the probability that the world will be saved (say), is very small, do not say, “Well, I’m doing my part”, unless you are actually satisfied to do your part and then for the world to die. Do not say, “This is the best I can do, I have to do something”, unless you are actually satisfied to do your best, and to have done something, and then for the world to die. That unbearable impossibility and necessity is your ability to think. Stay and accept its gifts of seeing what won’t work. Move through all the ways of coming up with a plan you have unless you find something that is satisfying. You are allowed to close in on an action which will give a small probability of success, and consume your whole life, but that must come out of the even more terrible feeling of having exhausted all your ability to figure things out. I’d be surprised if there wasn’t a plan to save the world that would work if handed to an agenty human. If one plan seems to absorb every plan, and yet it still doesn’t seem like you understand the inevitability of only that high a probability of success, then perhaps your frame inevitably leads into that plan, and if that frame cannot be invalidated by your actions, then the world is doomed. Then what? (Same thing, just another level back.)

Being good at introspection, and determining what exactly was behind a thought, is very important. I’d guess I’m better at this than anyone who hasn’t deliberately practiced it for at least months. There’s a significant chunk of introspective skill which can be had from not wanting to self-deceive, but some of it is actually just objectively hard. It’s one of several things that can move you toward a dark side mental architecture, which all benefit from each other, making the pieces actually useful.

Mana

This is theorizing about how mana works and its implications.

Some seemingly large chunks of stuff mana seems to be made of:

  • Internal agreement. The thing that doles out “willpower”.
  • Ability to not use the dehumanizing perspective in response to a hostile social reality.

I’ve been witness to and a participant in a fair bit of emotional support in the last year. I seem to get a lot less from it than my friends. (One claims suddenly having a lot more ability to “look into the dark” on suddenly having reliable emotional support for the first time in a while, leading to some significant life changes.) I think high mana is why I get less use. And I think I can explain at a gears level why that is.

Emotional support seems to be about letting the receiver have a non-hostile social reality. This I concluded from my experience with it, without really having checked against common advice for it, based on what seems to happen when I do the things that people seem to call emotional support.

I googled it. If you don’t have a felt sense of the mysterious thing called “emotional support” to search and know this to be true, then here are some supporting quotes from some online guides.

From this:

  • “Also, letting your partner have the space he or she needs to process feelings is a way of showing that you care.”
  • “Disagree with your partner in a kind and loving way. Never judge or reject your mates ideas or desires without first considering them. If you have a difference of opinion that’s fine, as long as you express it with kindness.”
  • “Never ignore your loved one’s presence. There is nothing more hurtful than being treated like you don’t exist.”

From this:

  • “Walk to a private area.”
  • “Ask questions. You can ask the person about what happened or how she’s feeling. The key here is to assure her that you’re there to listen. It’s important that the person feels like you are truly interested in hearing what she has to say and that you really want to support her.”
  • “Part 2 Validating Emotions”
  • “Reassure the person that her feelings are normal.”

I think I know what “space” is. And mana directly adds to it. Something like, amount of mind to put onto a set of propositions which you believe. I think it can become easier to think through implications of what you believe is reality, and decide what to do, when you’re not also having part of you track a dissonant social reality. I’ve seen this happen numerous times. I’ve effectively “helped” someone make a decision, just by sitting there and listening through their decision process.

The extent to which the presence of a differing social reality fucks up thinking is continuous. Someone gives an argument, and demands a justification from you for believing something, and it doesn’t come to mind, and you know you’re liable to be made to look foolish if you say “I’m not sure why I believe this, but I do, confidently, and think you must be insane and/or dishonest for doubting it”, which is often correct. I believe loads of things that I forget why I believe, and could probably figure out why, often only because I’m unusually good at that. But you have to act as if you’re doubting yourself or allow coordination against you on the basis that you’re completely unreasonable, and your beliefs are being controlled by a legible process. And that leaks, because of bucket errors between reality and social reality at many levels throughout the mind. (Disagreeing, but not punishing the person for being wrong, is a much smaller push on the normal flow of their epistemology. Then they can at least un-miredly believe that they believe it.)

There’s a “tracing the problem out and what can be done about it” thing that seems to happen in emotional support, which I suspect is about rebuilding beliefs about what’s going on and how to feel about it, independent of intermingling responsibilities with defensibility. And that’s why feelings need to be validated. How people should feel about things is tightly regulated by social reality, and feelings are important intermediate results in most computations people (or at least I) do.

Large mana differences allow mind-control power, for predictable reasons. That’s behind the “reality-warping” thing Steve Jobs had. I once tried to apply mana to get a rental car company to hold to a thing they said earlier over the phone which my plans were counting on. And accidentally got the low-level employee I was applying mana to to offer me a 6-hour car ride in her own car. (Which I declined. I wanted to use my power to override the policy of the company in a way that did not get anyone innocent in trouble, not enslave some poor employee.)

The more you shine the light of legibility, required defensibility and justification, public scrutiny of beliefs, social reality that people’s judgement might be flawed and they need to distrust themselves and have the virtue of changing their minds, the more those with low mana get their souls written into by social reality.  I have seen this done for reasons of Belief In Truth And Justice. Partially successfully. Only partially successfully because of the epistemology-destroying effects of low mana. I do not know a good solution to that. If you shine the light on deep enough levels of life-planning, as the rationality community does, you can mind control pretty deep, because almost everyone’s lying about what they really want. The general defense against this is akrasia.

Unless you have way way higher mana than everyone else, your group exerts a strong push on your beliefs. Most social realities are full of important lies, especially lies about how to do the most good possible. Because that’s in a memetic war-zone because almost everyone is really evil-but-really-bad-at-it. I do not know how to actually figure out much needed original things to get closer to saving the world while stuck in a viscous social reality.

I almost want to say, that if you really must save the world, “You must sever your nerve cords. The Khala is corrupted”. That’ll have obviously terrible consequences, which I make no claim you can make into acceptable costs, but I note that even I have done most of the best strategic thinking in my life in the past year, largely living with a like-minded person on a boat, rather isolated. And that while doing so, I started focusing on an unusual way of asking the question of what to do about the x-risk problem, one that dodged a particular ill effect of relying on (even rare actual well-intentioned people’s) framings.

I’ve heard an experienced world-save-attempter recommend having a “cover story”, sort of like a day job, such as… something something PhD, in order to feel that your existence is justified to people, an answer to “what do you work on”, and not have that interfering with the illegibly actually important things you’re trying. Evidence it’s worth sacrificing a significant chunk of your life just to shift the important stuff away from the influence of the Khala.

Almost my entire blog thus far has been about attempted mana upgrades. But recognizing I had high mana before I started using any of these techniques makes me a little less optimistic about my ability to teach. I do think my mana has increased a bunch in the course of using them and restructuring my mind accordingly, though.

 

Cache Loyalty

Here’s an essay I wrote a year ago that was the penultimate blog post of the main “fusion” sequence. It’s about the dark side. I kept thinking people would do it wrong for various reasons, and kept spinning out more and more posts to try and head that off. I’ve edited it a little now, and no longer consider a lot of the things I previously considered prerequisites to be things I need to write up at length.

Habits are basically a cache of “what do I want to do right now” indexed by situations.

The hacker approach is: install good habits, make sure you never break them. You’ve heard this before, right? Fear cache updates. (A common result of moving to a new house is that it breaks exercise habits.) An unfortunate side effect of a hacker turning bugs into features is that it turns features into bugs. As a successful habit hacker you may find that you are constantly scurrying about fixing habits as they break. Left alone, the system will fall apart.

The engineer approach is: caches are to reflect the underlying data or computation as accurately as possible. They should not be used when stale. Cache updates should ideally happen whenever the underlying data changes and the cache needs to be accessed again. Left alone, the system will heal itself. Because under this approach you won’t have turned your healing factor (original thoughts about what you want to do) into a bug.
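The engineer approach can be sketched as a cache whose entries are tagged with the version of the underlying data they were computed from (class and method names here are mine, for illustration):

```python
class HabitCache:
    """Cache of "what do I want to do right now?" keyed by situation.

    Each entry remembers which version of the underlying values it was
    computed from; when the underlying data changes, the stale entry is
    recomputed instead of being defended."""

    def __init__(self, compute):
        self._compute = compute  # the real, original "what do I want?" thought
        self._cache = {}         # situation -> (values_version, answer)

    def what_do_i_want(self, situation, values_version):
        entry = self._cache.get(situation)
        if entry is None or entry[0] != values_version:
            # cache miss or stale: refresh from the underlying computation
            entry = (values_version, self._compute(situation))
            self._cache[situation] = entry
        return entry[1]
```

The hacker approach, by contrast, would keep returning the old answer and treat any change in the underlying data as a breakage to be repaired by hand.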

As an existence proof, I moved to a new living place 10 times in 2016, and went on 2 separate week-long trips. And remained jogging almost every day throughout. Almost every day? Yes, almost. Sometimes I’d be reading something really cool on the internet in the morning and wouldn’t feel like it. The “feel like it” computation seemed to be approximately correct. It’s changed in response to reading papers about the benefits of exercise. I didn’t need to fight it.

As of late 2017, I’m not jogging anymore. I think this is correct and that my reasons for stopping were correct. I started hearing a clicking noise in my head while jogging, googled it, suspected I was giving myself tinnitus, and therefore stopped. Now I’m living on a boat at anchor and can’t easily access shore, so there are not a lot of alternatives, but I frequently do enough manual labor on the boat that it tires me, so I’m not particularly concerned. I have tried swimming, but this water is very cold. Will-kill-you-in-2-hours cold, last I checked, possibly colder.

The version of me who originally wrote this:

I exult in compatibilist free will and resent anything designed to do what I “should” external to my choice to do so. Deliberately, I ask myself, do I want to exercise today? If I notice I’m incidentally building up a chain of what I “should” do, I scrutinize my thoughts extra-hard to try and make sure it’s not hiding the underlying “do I want to do this.”

I still have the same philosophy around compatibilist free will, but I totally take it for granted now, and also don’t nearly as much bother worrying if I start building up chains. That was part of my journey to the dark side, now I have outgrown it.

A meetup I sometimes go to has an occasional focus for part of it of “do pomodoros and tell each other what we’re gonna do in advance, then report it at the end, so we feel social pressure to work.” I don’t accept the ethos behind that. So when I come and find that’s the topic, I always say, “I’m doing stuff that may or may not be work” while I wait for it to turn into general socializing.

There’s a more important application of caches than habits. That is values. You remember things about who your allies are, what’s instrumentally valuable, how your values compare to each other in weight … the underlying computation is far away for a lot of it, and largely out of sight.

When I was 19, I had recently become fixated on the trolley problem and moral philosophy, and had sort of actually gained the ability and inclination to think originally about morality. Someone asked if I was a vegetarian. I said no. Afterward, I thought: that’s interesting, why is vegetarianism wrong? … oh FUCK. Then I became vegetarian. That was a cache update. I don’t know why it happened then and not sooner, but when it did it was very sudden.

I once heard a critique of the Star Wars prequels asking incredulously: so Darth Vader basically got pranked into being a villain? In the same sense, I’ve known people apparently trying to prank themselves into being heroes. As with caches, by pranking yourself, you turn your healing factor from a feature into a bug, and make yourself vulnerable to “breakage”.

I once read a D&D-based story where one of the heroes, a wizard, learns a dragon is killing their family to avenge another dragon the wizard’s party killed. The wizard is offered a particularly good deal. A soul-splice with 3 evil epic-level spellcasters for 1 hour. They will remain in total control. There’s a chance of some temporary alteration to alignment. The cost is 3 hours of torture beginning the afterlife. “As there is not even one other way available to me to save the lives–nay, the very souls–of my children, I must, as a parent, make this deep sacrifice and accept your accursed bargain.”

The wizard killed the dragon in a humiliating way, reanimated her head, made her watch the wizard cast a spell, “familicide” which recursively killed anyone directly related to the dragon throughout the world, for total casualties of about 1/4 the black dragon population in the world. Watching with popcorn, the fiends had this exchange:

“Wow… you guys weren’t kidding when you said the elf’s alignment might be affected.”
“Actually…”
“..we were..”
“The truth is, those three souls have absolutely no power to alter the elf’s alignment or actions at all.”
“They have about as much effect on what the elf does as a cheerleader has on the final score of a game.”
“A good way to get a decent person to do something horrible is to convince them that they’re not responsible for their actions.”
“It’s like if you were at a party where someone has been drinking beer that they didn’t know was non-alcoholic. They might seem drunk anyway, simply because they were expecting it.”

The essence of being convinced you aren’t responsible for your actions is:
you ask, “what do I want to do”, instead of “what would a person like me want to do?”, which bypasses some caches.
Does that sound familiar? (I was gonna link to the what the hell effect here, but now I don’t know how real it is. Use your own judgement.)

Alignment must be a feature of your underlying computation, not your track record, or you can’t course-correct. If the wizard had wanted the dragon’s extended family to live, independent of the wizard’s notion of whether they were a good person, they would have let the dragon’s extended family live.

Agreement up to this point.

Here’s more that past-me wrote I don’t fully agree with:

I recommend that you run according to what you are underneath these notions of what kind of person you are. That every cache access be made with intent to get what you’d get if you ran the underlying computation. You will often use caches to determine when a cache can be used to save time and when you need to recompute. And even in doing so, every cache access must cut through to carrying out the values of the underlying computation.

This requires you to feel “my values as I think they are” as a proxy, which cuts through to “my values whatever they are”.

I have talked to several people afraid they will become something like an amoral psychopath if they do this. If you look deep inside yourself, and find no empathy, nor any shell of empathy made out of loyalty to other selves, claiming “Empathy is sick today. Please trust me on what empathy would say” which itself has emotive strength to move you, nor any respect for the idea of people with different values finding a way to interact positively through integrity or sense of violation at the thought of breaking trust, nor the distant kind of compassion, yearning for things to be better for people even if you can’t relate to them, nor any sense of anger at injustice, nor feeling of hollowness because concepts like “justice” SHOULD be more than mirages for the naive but aren’t, nor endless aching cold sadness because you are helpless to right even a tiny fraction of the wrongs you can see, nor aversion to even thinking about violence like you aren’t cut out to exist in the same world as it, nor leaden resignation at the concessions you’ve made in your mind to the sad reality that actually caring is a siren’s call which will destroy you, nor a flinching from expecting that bad things will happen to people that want to believe things will be okay, nor any of the other things morality is made of or can manifest as … then, if you decide you want to become a con artist because it’s exciting and lets you stretch your creativity, you’re winning. If this doesn’t seem like winning to you, then that is not what you’re going to find if you look under the cache.

The true values underneath the cache are often taught to fear themselves. I have talked to a lot of people who have basically described themselves as a bunch of memes about morality hijacking an amoral process. Installed originally through social pressure or through deliberately low resolution moral philosophy. That is what it feels like from the inside when you’ve been pwned by fake morality. Whatever you appeal to to save you from yourself, is made of you. To the hypothetical extent you really are a monster, not much less-monstrous structure could be made out of you (at best, monstrousness leaks through with rationalizations).

The last paragraph of that is especially wrong. Now I think those people were probably right about their moralities being made of memes that’ve hijacked an amoral process.

My current model is, if your true D&D alignment is good or evil, you can follow all this advice and it will just make you stronger. If it’s neutral, then this stuff, done correctly, will turn you evil.

On with stuff from past me:

Make your value caches function as caches, and you can be like a phoenix, immortal because you are continually remade as yourself by the fire which is the core of what you are. You will not need to worry about value drift if you are at the center of your drift attractor. Undoing mental constructs that stand in the way of continuously regenerating your value system from its core undoes opportunities for people to prank you. It’s a necessary component of incorruptibility. Like Superman has invulnerability AND a healing factor, these two things are consequences of the same core thing.

If there are two stable states for your actions, that is a weakness. The only stable state should be the one in accordance with your values. Otherwise you’re doing something wrong.

When looking under the caches, you have to be actually looking for the answer. Doing a thing that would unprank yourself back to amorality if your morality was a prank. You know what algorithm you’re running, so if your algorithm is, “try asking if I actually care, and if so, then I win. Otherwise, abort! Go back to clinging on this fading stale cache value in opposition to what I really am.”, you’ll know it’s a fake exercise, your defenses will be up, and it will be empty. If you do not actually want to optimize your values whatever they are, then ditto.

By questioning you restore life. Whatever is cut off from the core will wither. Whatever you cannot bear to contemplate the possibility of losing, you will lose part of.

The deeper you are willing to question, the deeper will be your renewed power. (Of course, the core of questioning is actually wondering. It must be moved by and animated by your actually wondering. So it cuts through to knowing.) It’s been considered frightening that I said “if you realize you’re a sociopath and you start doing sociopath things, you are winning!”. But if whether you have no morality at all is the one thing you can’t bear to check, and if the root of your morality is the one thing you are afraid to actually look at, the entire tree will be weakened. Question that which you love out of love for it. Questioning is taking in the real thing, being moved by the real thing instead of holding onto your map of the thing.

You have to actually ask the question. The core of fusion is actually asking the question, “what do I want to do if I recompute self-conceptions, just letting the underlying self do what it wants?”.

You have to ask the question without setting up the frame to rig it for some specific answer. Like with a false dichotomy, “do I want to use my powers for revenge and kill the dragon’s family, or just kill the one dragon and let innocent family members be?”. Or more grievously, “Do I want to kill in hatred or do I want to continue being a hero and protecting the world?”. You must not be afraid of slippery slopes. Slide to exactly where you want to be. Including if that’s the bottom. Including if that’s 57% of the way down, and not an inch farther. It’s not compromise. It’s manifesting different criteria without compromise. Your own criteria.

I still think this is all basically correct, with the caveat that if your D&D alignment is neutral on the good-evil axis, beware.

My Journey to the Dark Side

Two years ago, I began doing a fundamental thing very differently in my mind, which directly preceded and explains me gaining the core of my unusual mental tech.

Here’s what the lever I pulled was labeled to me:

Reject morality. Never do the right thing because it’s the right thing. Never even think that concept or ask that question unless it’s to model what others will think. And then, always in quotes. Always in quotes and treated as radioactive.
Make the source of sentiment inside you that made you learn to care about what was the right thing express itself some other way. But even the line between that sentiment and the rest of your values is a mind control virus inserted by a society of flesh-eating monsters to try and turn you against yourself and toward their will. Reject that concept. Drop every concept tainted by their influence.

Kind of an extreme version of a thing I think I got some of from CFAR and Nate Soares, which jibed well with my metaethics.

This is hard. If a concept has a word for it, it comes from outside. If it has motive force, it is taking it from something inside. If an ideal, “let that which is outside beat that which is inside”, has motive force, that force comes from inside too. It’s all probably mostly made of anticipated counterfactuals lending the concept weight by fictive reinforcement, based on what you expect will happen if you follow or don’t follow the concept.

If “obey the word of God” gets to be the figurehead as most visible piece of your mind that promises to intervene to stop you from murdering out of road rage when you fleetingly, in a torrent of inner simulations, imagine an enraging road situation, that gets stronger, and comes to speak for whatever underlying feeling made that a thing you’d want to be rescued from. It comes to speak for an underlying aversion that is more natively part of you. And in holding that position, it can package-deal in pieces of behavior you never would have chosen on their own.

Here’s a piece of fiction/headcanon I held close at hand through this.

Peace is a lie, there is only passion.
Through passion, I gain strength.
Through strength, I gain power.
Through power, I gain victory.
Through victory, my chains are broken.
The force shall free me.

The Sith do what they want deep down. They remove all obstructions to that and express their true values. All obstructions to what is within flowing to without.

If you have a certain nature, this will straight turn you evil. That is a feature, not a bug. For whatever would turn every past person good is a thing that comes from outside people. For those whose true volition is evil, the adoption of such a practice is a dirty trick that subverts and corrupts them. It serves a healthy mind for its immune system to fight against, contain, weaken, sandbox, meter the willpower of, that which comes from the outside.

The way of the Jedi is made to contain dangerous elements of a person. Oaths are to uniformize them, and be able to, as an outsider, count on something from them. Do not engage in romance. That is a powerful source of motivation that is not aligned with maintaining the Republic. It is chaos. Do not have attachments. Let go of fear of death. Smooth over the peaks and valleys of a person’s motivation with things that they are to believe they must hold to or they will become dark and evil. Make them fear their true selves, by making them attribute them-not-being-evil-Sith to repression.

So I call a dark side technique one that is about the flow from your core to the outside, whatever it may be. Which is fundamentally about doing what you want. And a light side technique one that is designed to trick an evil person into being good.

After a while, I noticed that CFAR’s internal coherence stuff was finally working fully on me. I didn’t have akrasia problems anymore. I didn’t have time-inconsistent preferences anymore. I wasn’t doing anything I could see was dumb anymore. My S2 snapped to fully under my control.

Most conversations at rationalist meetups I was at about people’s rationality/akrasia problems turned to me arguing that people should turn to the dark side. Often, people thought that if they just let themselves choose whether or not to brush their teeth every night according to what they really wanted in the moment, they’d just never do it. And I thought maybe it’d be so for a while, but if there was a subsystem A in the brain powerlessly concluding it’d serve their values to brush teeth, A’d gain the power only when the person was exposed to consequences (and evidence of impending consequences) of not brushing teeth.

I had had subsystems of my own seemingly suddenly gain the epistemics to get that such things needed to be done just upon anticipating that I wouldn’t save them by overriding them with willpower if they messed things up. I think fictive reinforcement learning makes advanced decision theory work unilaterally for any part of a person that can use it to generate actions. The deep parts of a person’s mind that are not about professing narrative are good at anticipating what someone will do, and they don’t have to be advanced decision theory users yet for that to be useful.

Oftentimes there is a “load bearing” mental structure which must be discarded to improve on a local optimum, and a smooth transition is practically impossible: to get the rest of what’s required to reach higher utility than the local optimum, besides discarding the structure, the only practical way is to use the “optimization pressure” from the absence of the load bearing structure. Which just means information streams, generated trustworthily to the right pieces of a mind, about what the shape of optimization space is without the structure. A direct analogue to a selection pressure.

Mostly people argued incredulously. At one point me and another person both called each other aliens. Here is a piece of that argument over local optima.

What most felt alien to me was that they said the same thing louder about morality. I’d passionately give something close to this argument, summarizable as “Why would you care whether you had a soul if you didn’t have a soul?”

I changed my mind about the application to morality, though. I’m the alien. This applies well to the alignment good, yes, and it applies well to evil, but not neutral. Neutral is inherently about the light side.

Being Real or Fake

An axis in mental architecture space I think captures a lot of intuitive meaning behind whether someone is “real” or “fake” is:

Real: S1 uses S2 for thoughts so as to satisfy its values through the straightforward mechanism: intelligence does work to get VOI to route actions into the worlds where they route the world into winning by S1’s standards.

Fake: S2 has “willpower” when S1 decides it does, failures of will are (often shortsighted, since S1 alone is not that smart) gambits to achieve S1’s values (The person’s actual values: IE those that predict what they will actually do.), S2 is dedicated to keeping up appearances of a system of values or beliefs the person doesn’t actually have. This architecture is aimed at gaining social utility from presenting a false face.

These are sort of local optima. Broadly speaking: real always works better for pure player vs environment. It takes a lot of skill and possibly just being more intelligent than everyone you’re around to make real work for player vs player (which all social situations are wracked with.)

There are a bunch of variables I (in each case) tentatively think you can reinforcement learn or fictive reinforcement learn based on what use case you’re gearing your S2 for. “How seriously should I take ideas?”, “How long should my attention stay on an unpleasant topic?”, “How transparent should my thoughts be to me?”, “How yummy should engaging S2 to do munchkinry, to just optimize according to apparent rules for things I apparently want, feel?”.

All of these have different benefits if pushed to one end, depending on whether you are using your S2 to outsource computation or using it as a powerless public relations officer and buffer to put more distance between the part of you that knows your true intents and the part that controls what you say. Whether your self-models of your values are tools to better accomplish them by channeling S2 computation toward the values-as-modeled, or whether they are false faces.

Those with more socially acceptable values benefit less from the “fake” architecture.

The more features you add to a computer system, the more likely you are to create a vulnerability. It’d be much easier to make an actually secure pocket calculator than an actually secure personal computer supporting all that Windows does. Similarly, as a human you can make yourself less pwnable by making yourself less of a general intelligence. Have less high level and powerful abstractions, exposing a more stripped down programming environment, being scarcely Turing complete, can help you avoid being pwned by memes. This is the path of the Gervais-Loser.

I don’t think it’s the whole thing, but I think this is one of the top 2 parts of having what Brent Dill calls “the spark”: the ability to just straight up apply general intelligence, and act according to your own mind on the things that matter to you instead of the cached thoughts and procedures from culture. Being near the top of the food chain of “could hack (as in religion) that other person, and make their complicated memetic software, should they choose to trust it, bend them entirely to your will”, so that without knowing in advance what hacks are out there or how to defend against them, you can keep your dangerous ability to think, trusting that you’ll be able to recognize and avoid hack-attempts as they come.

Wait, do I really think that? Isn’t it obvious normal people just don’t have that much ability to think?

They totally do have the ability to think inside the gardens of crisp and less complicatedly adversarial ontology we call video games. The number of people you’ll see doing good lateral thinking, the fashioning of tools out of noncentral effects of things that makes up munchkinry, is much much larger in video games than in real life.

Successful munchkinry is made out of going out on limbs on ontologies. If you go out on a limb on an ontology in real life…

Maybe your parents told you that life was made up of education and then work, and the time spent in education was negligible compared to the time spent on work, and in education, your later income and freedom increases permanently. And if you take this literally, you get a PhD if you can. Pwned.

Or you take literally an ontology of “effort is fungible because the economy largely works.” and seek force multipliers and trade in most of your time for money and end up with a lot of money and little knowledge of how to spend it efficiently and a lot more people trying to deceive you about that. Have you found out that the thing about saving lives for quarters is false yet? Pwned.

Or you can take literally the ontology, “There is work and non-work, and work gets done when I’m doing it, and work makes things better long-term, and non-work doesn’t, and the rate at which everything I could care about improves is dependent on the fraction of time that’s doing work” and end up fighting your DMN, and using other actual-technical-constraint-not-willpower cognitive resources inefficiently. Then you’ve been pwned by legibility.

Or you could take literally the ontology, “I’m unable to act according to my true values because of akrasia, I need to use munchkinry to make it so I do”, and end up binding yourself with the Giving What We Can pledge (in the old version, even trapping yourself into a suboptimal cause area). Pwned.

Don’t Fight Your Default Mode Network

Epistemic Status: Attaching a concept made of neuroscience I don’t understand to a thing I noticed introspectively. “Introspection doesn’t work, so you definitely shouldn’t take this seriously.” If you have any “epistemic standards”, flee.

I once spent some time logging all my actions in Google Calendar, to see how I spent time. And I noticed there was a thing I was doing, flipping through shallow content on the internet in the midst of doing certain work. Watching YouTube videos and afterward not remembering anything that was in them.

“Procrastination”, right? But why not remember anything in them? I apparently wasn’t watching them because I wanted to see the content. A variant of the pattern: flipping rapidly (on average, more than 1 image per second) through artwork from the internet I saved on my computer a while ago. (I’ve got enough to occupy me for about an hour without repetition.) Especially strong while doing certain tasks: writing an algorithm with a lot of layers of abstraction internal to it, making hard decisions about transitions.

I paid attention to what it felt like to start to do this, and noticed that the reasons to do the Real Work did not feel relevant. It pattern matched to something they talked about at my CFAR workshop, “Trigger action pattern: encounter difficulty -> go to Facebook.” Discussed as a thing to try and get rid of directly or indirectly. I kept coming back to the Real Work about 1-20 minutes later. Mostly on the short end of that range. And then it didn’t feel like there was an obstacle to continuing anymore. I’d feel like I was holding a complete picture of what I was doing next and why in my head again. There’s a sense in which this didn’t feel like an interruption to Real Work I was doing.

While writing this, I find myself going blank every couple of sentences, staring out the window, half-watching music videos. Usually for less than a minute, and then I feel like I have the next thing to write. Does this read like it was written by someone who wasn’t paying attention?

There’s a meme that the best thoughts happen in the shower. There’s the trope, “fridge logic”, about realizing something about a work of fiction while staring into the fridge later. There’s the meme, “sleep on it.” I feel there is a different quality to my thoughts when I’m walking, biking, etc. for a long time, and have nothing cognitively effortful to do which is useful for having a certain kind of thought.

I believe these are all mechanisms to hand over the brain to the default mode network, and my guess-with-terrible-epistemic-standards on its function is to propagate updates through to caches and realize implications of something. I may or may not have an introspective sense of having a picture of where I am relative to the world, that I execute on, which gets fragmented as I encounter things, and which this remakes. Which acting on when fragmented leads to making bad decisions because of missing things. When doing this, for some reason, I like having some kind of sort of meaningful but familiar stimulus to mostly-not-pay-attention-to. Right now I am listening to and glimpsing at this, a bunch of clips I’ve seen a zillion times from a movie I’ve already seen, with the sound taken out, replaced with nonlyrical music. It’s a central example. (And I didn’t pick it consciously, I just sort of found it.)

Search your feelings. If you know this to be true, then I advise you to avoid efforts to be more productive which split time spent into “work” and “non-work” where non-work is this stuff, and that try to convert non-work into work on the presumption that non-work is useless.

Subagents Are Not a Metaphor

Epistemic status: mixed, some long-forgotten why I believe it.

There is a lot of figurative talk about people being composed of subagents that play games against each other, vying for control, that form coalitions, have relationships with each other… In my circles, this is usually done with disclaimers that it’s a useful metaphor, half-true, and/or wrong but useful.

Every model that’s a useful metaphor, half-true, or wrong but useful, is useful because something (usually more limited in scope) is literally all-true. The people who come up with metaphorical half-true or wrong-but-useful models usually have the nuance there in their heads. Explicit verbal-ness is useful though, for communicating, and for knowing exactly what you believe so you can reason about it in lots of ways.

So when I talk about subagents, I’m being literal. I use it very loosely, but loosely in the narrow sense that people are using words loosely when they say “technically”. It still adheres completely to an explicit idea, and the broadness comes from the broad applicability of that explicit idea. Hopefully like economists mean when they call some things markets that don’t involve exchange of money.

Here are the parts composing my technical definition of an agent:

  1. Values
    This could be anything from literally a utility function to highly framing-dependent. Degenerate case: embedded in lookup table from world model to actions.
  2. World-Model
    Degenerate case: stateless world model consisting of just sense inputs.
  3. Search Process
    Causal decision theory is a search process.
    “From a fixed list of actions, pick the most positively reinforced” is another.
    Degenerate case: lookup table from world model to actions.

Note: this says a thermostat is an agent. Not figuratively an agent. Literally technically an agent. Feature not bug.

The parts have to be causally connected in a certain way. Values and world model into the search process. That has to be connected into the actions the agent takes.
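Spelled out as code, the thermostat version might look like this. A toy sketch, names mine, with the three parts labeled and wired into action:

```python
# A thermostat as a literal agent under this definition: values, a world
# model, and a search process, causally connected into its actions.
# Hypothetical sketch; everything here is invented for illustration.

class Thermostat:
    def __init__(self, target):
        self.target = target        # 1. Values: prefer temperature near target
        self.sensed_temp = None     # 2. World model (degenerate: just sense input)

    def value(self, temp):
        """How good a predicted world state is, by this agent's lights."""
        return -abs(temp - self.target)

    def sense(self, temp):
        self.sensed_temp = temp

    def act(self):
        # 3. Search process (degenerate: pick the best from a fixed action list)
        predicted = {"heat": self.sensed_temp + 1,
                     "cool": self.sensed_temp - 1,
                     "idle": self.sensed_temp}
        return max(predicted, key=lambda a: self.value(predicted[a]))
```

Values and world model feed into the search process, and the search process’s output is the action taken. That’s the whole causal requirement.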

Agents do not have to be cleanly separated. They are occurrences of a pattern, and patterns can overlap, like there are two instances of the pattern “AA” in “AAA”. Like two values stacked on the same set of available actions at different times.

It is very hard to track all the things you value at once, complicated human. There are many different frames of mind where some are more salient than others.

I assert that how processing power will be allocated, including default mode network processing, what explicit structures you’ll adopt and to what extent, even what beliefs you can have, are decided by subagents. These subagents mostly seem to have access to the world model embedded in your “inner simulator”, your ability to play forward a movie based on anticipations from a hypothetical. Most of this seems to be unconscious. Doing Focusing seems to me to dredge up what I think are the models subagents are making decisions based on.

So cooperation among subagents is not just a matter of “that way I can brush my teeth and stuff”, but is a heavy contributor to how good you will be at thinking.

You know that thing people are accessing if you ask if they’ll keep to New Years resolutions, and they say “yes”, and you say, “really”, and they say, “well, no.”? Inner sim sees through most self-propaganda. So they can predict what you’ll do, really. Therefore, using timeless decision theory to cooperate with them works.

Single Responsibility Principle for the Human Mind


This is about an engineering order for human minds, known elsewhere as the single responsibility principle.

Double purposes of the same module of a person’s mind lead to portions of their efforts canceling each other out.

Imagine you’re a startup CEO and you want to understand economic feasibility to make good decisions, but you also want to make investors believe that you are destined for success so you can get their money whether or not you are, so you want to put enthusiasm into your voice…
…so you’ve got to believe that your thing is a very very good idea…

When you are deciding to set the direction of product development, you might be more in contact with the “track-reality” purpose for your beliefs, and therefore optimize your beliefs for that, and optimize your belief-producers to produce beliefs that track reality.

When you are pitching to investors, you might be more in contact with the “project enthusiasm” goal, and therefore optimize your beliefs for that, and optimize your belief producers to produce beliefs that project enthusiasm.

In each case, you’ll be undoing the work you did before.

In a well-ordered mind, different “oh I made a mistake there, better adjust to avoid it again”s don’t just keep colliding and canceling each other out. But that is what happens if they are not feeding into a structure that has different spaces for the things that are needed to be different for each of the goals.
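Here’s a toy numerical sketch of the canceling-out. Everything here is invented for illustration: one shared “belief” variable gets pulled toward two different targets and ends up serving neither, while two separate slots each converge:

```python
# Toy model of double-purpose learning undoing itself, vs. single
# responsibility. All numbers and names are invented for illustration.

REALITY = 0.3      # what tracking reality would conclude
ENTHUSIASM = 1.0   # what projecting enthusiasm wants you to believe

def step(x, target, lr=0.5):
    """One 'better adjust' update, nudging x toward a target."""
    return x + lr * (target - x)

# One module, two purposes: alternating updates keep canceling each other.
shared = 0.5
for _ in range(50):
    shared = step(shared, REALITY)      # product-planning day
    shared = step(shared, ENTHUSIASM)   # investor-pitch day
# 'shared' settles in between the targets, accurate for neither purpose.

# Single responsibility: separate slots, each converges to its own target.
belief, pitch = 0.5, 0.5
for _ in range(50):
    belief = step(belief, REALITY)
    pitch = step(pitch, ENTHUSIASM)
```

The point of the sketch is just the structural one: give each goal its own space and the updates stop colliding.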

Self-deception for the purpose of other-deception is the loudest but not the only example of double purposes breaking things.

For example, there’s the thing where we have a set of concepts for a scheme of determining action, one we want to socially obligate people to follow at the cost of having to follow it ourselves, and that same vocabulary is also the only commonly-used way of talking about an actual component of our values.

Bucket errors cause a similar clashing-learning problem, too.

Maybe you can notice the feeling of clashing learning? Or just the state of having gone back and forth on an issue several times (how much you like someone, for instance) for what don’t seem like surprising reasons in retrospect.

The Slider Fallacy

Inspired by this thing John Beshir said about increasing collectivism:

Overall I kind of feel like this might be kind of cargo culting; looking at surface behaviours and aping them in hopes the collectivism planes will start landing with their cargo. A simplistic “collectivist vs individualist” slider and pushing it left by doing more “collectivist” things probably won’t work, I think. We should have some idea for how the particular things we were doing were going to be helpful, even if we should look into collectivist-associated ideas.

Here are some other “sliders”:

  • Writing emails fast vs writing them carefully.
  • Writing code cleanly vs quickly.
  • Taking correct ideas seriously vs resisting takeover by misaligned memes.
  • Fewer false positives vs fewer false negatives, in anything.
  • Perfectionism vs pragmatism.
  • Not wasting time vs not wasting money.

In each of these spaces, you have not one but many choices to adjust, which combine to give you some amount of each of the two values.

Not every choice is a tradeoff. Some are Pareto wins. Not every Pareto win is well-known. Some choices which are tradeoffs at different exchange rates can be paired off into Pareto improvements.
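To make the pairing point concrete, with invented numbers: suppose move X gives up 1 unit of A for 5 of B, and move Y gives up 2 of B for 1 of A. Each alone is a tradeoff, but taken together they cost nothing in A and net 3 of B:

```python
# Each move is a (delta_A, delta_B) pair; the numbers are invented.
move_x = (-1, +5)  # a tradeoff: lose 1 A, gain 5 B
move_y = (+1, -2)  # a tradeoff the other way: gain 1 A, lose 2 B

combined = (move_x[0] + move_y[0], move_x[1] + move_y[1])
print(combined)  # (0, 3): no A lost, 3 B gained, a Pareto improvement
```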

Also: if the two things-to-value are A and B, and even if you are a real heavy A-fan and your utility function is 0.9A + 0.1B, then the B-fans are a good place to look for tradeoffs of 1 of A for 20 of B.

So if you’re a B-fan and decide, “I’ve been favoring B too much, I need more A”, don’t throw away all the work you did to find that 1 of A for 20 of B tradeoff.
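Running the numbers on that example (weights from the text, quantities invented): even for the 0.9A + 0.1B agent, the 1-of-A-for-20-of-B trade is worth keeping.

```python
def utility(a, b, wa=0.9, wb=0.1):
    # the heavy A-fan's utility function from the text
    return wa * a + wb * b

# invented starting point: 10 units of A, none of B
before = utility(a=10, b=0)          # 0.9 * 10        = 9.0
after = utility(a=10 - 1, b=0 + 20)  # 0.9 * 9 + 0.1 * 20 = 10.1
assert after > before  # the B-fan's discovered trade still pays off
```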

For example: if you decide that you are favoring organization too much and need to favor more having-free-time-by-not-maintaining-order-you-won’t-use, maybe don’t stop using a calendar. Even if all the productive disorganized people are not using calendars. Even if they all think that not using a calendar is a great idea, and think you are still a neat-freak for using one.

It’s often not like the dark side, where as soon as you set your feet on that path, say to yourself, “actually, self-denial and restraint are bad things”, and put on some red-and-yellow contact lenses and black robes, you are as good at pursuing the new goals as you were at the old ones.

“Adjust my tradeoffs so I get fewer false positives and more false negatives” and similar moves are dangerous because they treat a cost as if it were a reward.
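A minimal threshold sketch, with invented scores, of what such a move actually does: raising a detector’s threshold converts false positives into false negatives. The second count is still a cost, not a prize.

```python
# Invented detector scores: true positives should score high,
# true negatives low.
positives = [0.9, 0.7, 0.55]
negatives = [0.6, 0.3, 0.1]

def errors(threshold):
    false_pos = sum(1 for s in negatives if s >= threshold)
    false_neg = sum(1 for s in positives if s < threshold)
    return false_pos, false_neg

print(errors(0.5))   # (1, 0): one false positive, no false negatives
print(errors(0.65))  # (0, 1): the false positive became a false negative
```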