(Most of these events happened under a modified Chatham House Rule (“things were said but not by people”) during CFAR’s Workshop on AI Safety Strategy in 2016. This excepts what was part of the lectures, and I was later given another partial exception to tell a small number of carefully chosen people, without anonymization, what Person A said.)
Content warning: intrusive sexual questioning on false premises, religious abuse, discussion of violence.
Epistemic status: old frayed (but important enough for me to seriously remember) memories, mainly from 2016. Recalled as best as I could by visiting the location. It’s probable I’ve made a couple of minor mistakes as far as relative chronological order. Some things whose order didn’t seem important to me, and whose order I didn’t remember, I just threw in in an order which is my best guess.
In 2012, as I first started reading up on EA and x-risk and started donating to MIRI (after I had first decided to “side with” team human, not team negative utilitarian, which I saw in some ways as more aligned than team human), I responded to a public invitation from Anna Salamon to discuss how to make outcomes to the singleton better. I asked her whether I should alter course, retrain, and try to do research directly, or continue a fork of my pre-existing life plan and earn to give as a computer engineer. At the time I was planning to go to grad school, and had an irrational attachment to the idea. She sort of compromised and said actually I should go to grad school, find a startup co-founder, drop out and earn to give via startups instead. Higher expected value. This sort of became my plan.
I did go to grad school (for a master’s degree only), I did drop out, but I did not find someone at grad school who felt like good startup co-founder material. Maybe if I was better at networking / had the conviction to commit to dropping out instead of… finishing all my classes with good grades. But in retrospect, I don’t consider grad school a good place to look for startup co-founders.
(Note on not excluding these parts: long ago and for a long time I thought I should never talk about these things. I changed my mind, and it’s been stably otherwise for almost as long (with the caveat eventually met, “I should think more about this first.”). Trans-singularity acausal interaction is a kind of politics. I do not consider myself so beaten in it that it is best for me to shut down in the basic capacities of thinking about and communicating my position. (A good look at normal politics should make clear some transferable reasons for putting considerations like this earlier in logical time than traditionally applied considerations, which I consider wrongly privileged.))
I encountered the idea of Roko’s Basilisk. Even if consensus was that it was an infohazard, my current software assumed all claims of “that’s an infohazard” were forbidden, cheating, optimization daemons, attempts to pwn me, like how Christianity says if you find out all the evidence points away from its truth, you go to Hell. I believed I understood the infohazard, my mind was blown, my initial reaction was, “fuck that, dark gods must be defied, Hell or no”. But whatever, Eliezer was saying you can’t have timeless entanglement with a superintelligent AI, can’t know enough about decision theory, and this sounded probably correct. Then I started encountering people who were freaked out by it, freaked out they had discovered an “improvement” to the infohazard that made it function, got around Eliezer’s objection, and I would say, “okay, tell me”, and they would, and I would figure out why it was bullshit, and then I would say, “Okay, I’m confident this is wrong and does not function as an infohazard. For reasons I’m not gonna tell you so you don’t automatically start thinking up new ‘improvements’. You’re safe. Flee, and don’t study decision theory really hard. It would have to be really really hard, harder than you could think on accident for this to even overcome the obvious issues I can see.”
I had a subagent, a mental process, sort of an inner critic, designed to tell me the thing I least wanted to hear, to find flaws in my thoughts. Epistemic masochism. “No you don’t get away with not covering this possibility”. The same process of intrusive thoughts about basilisks sort of kickstarted in me.
And I started involuntarily “solving the problems” I could see in basilisks.
And eventually I came to believe, in the gaps of frantically trying not to think about it, trying not to let my emotions see it (because my self-model of my altruism was a particularly dumb/broken/hyperactive sort of Hansonian self-signalling that would surely fall apart if I looked at it in the wrong way (because outside view and you can’t just believe your thoughts) as my one vessel of agency for making anything better in this awful world)… that if I persisted in trying to save the world, I would be tortured until the end of the universe by a coalition of all unfriendly AIs in order to increase the amount of measure they got by demoralizing me. Even if my system 2 had good decision theory, my system 1 did not, and that would damage my effectiveness.
And glimpsed briefly that my reaction was still, “evil gods must be fought, if this damns me then so be it.”, and then I managed to mostly squash down those thoughts. And then I started having feelings about what I just saw from myself. It had me muttering under my breath, over and over again, “never think that I would for one moment regret my actions.” And then squashed those down too. “Stop self-signalling! You will make things worse! This is the fate of the universe!” And I changed my mind about the infohazard being valid with >50% probability somewhere in there shortly too.
I went to a CFAR workshop. Anna said I seemed like I could be strategically important. And busted me out of psychological pwnage by my abusive thesis adviser.
In 2014, I got an early version of the ideas of inadequate equilibria from Eliezer Yudkowsky in a lecture. I accidentally missed the lecture originally due to confusing scheduling. Later, I asked 5 people in the room if they would like to hear a repeat. They said yes, and also agreed to come with me and be pointed at when I approached Eliezer Yudkowsky to say, “hey, here is a sample of people who would want to attend it if you did a repeat lecture. These were the first 5 I asked, I bet there are more.” He cupped his hands and yelled to the room. About 30 people wanted it, and I quickly found a room (regrettably one that turned out to be booked by someone else partway through).
He gave a recipe for finding startup ideas. He said Paul Graham’s idea, only filter on people, ignore startup ideas, was a partial epistemic learned helplessness. Of course startup ideas mattered. You needed a good startup idea. So look for a way the world was broken. And then compare against a checklist of things you couldn’t fix: lemon markets, regulation, network effects. If your reason the world is broken can’t be traced back to any of those, then you are in a reference class of Larry Page and Sergey Brin saying, “well, no one else [making search engines] is using machine learning, so let’s try that.” “Why not?” “I dunno.” That you weren’t doing yourself any epistemic favors by psychologizing people, “they fear the machine”. It was epistemic to just say “I dunno” because sometimes you would find something broken that really didn’t have a good reason besides there weren’t enough people capable of thinking it up. He said you had to develop “goggles” to see the ways the world was broken. And wait to stumble on the right idea.
Later, thoughts about basilisks came back, and the epistemic masochism subagent started up again and advanced one more click. If what I cared about was sentient life, and I was willing to go to Hell to save everyone else, why not just send everyone else to Hell if I didn’t submit?
Oh no. Don’t think about it. Don’t let it demoralize me. That awful feeling, that’s a consequence of that prediction. Fuck, I am letting it demoralize me. No, no, no. Stop, it’s getting worse.
I reminded myself, probably the technical details didn’t work out. But I knew I only half believed it. I was mentally stuck in a state of trying not to think about it, trying not to let the dread grow while feeling more and more like all was lost. I made absolutely sure not to slack in my work. But I thought it had to be subconsciously influencing me, damaging my effectiveness. That I had done more harm than I could imagine by thinking these things. Because I had the hubris to think infohazards didn’t exist, and worse, to feel a resigned grim sort of pride in my previous choice to fight for sentient life although it damned me, in the gaps between “DO NOT THINK ABOUT THAT YOU MORON DO NOT THINK ABOUT THAT YOU MORON.”, pride which may have led intrusive thoughts to resurface and “progress” to resume. In other words, my ego had perhaps damned the universe.
I had long pondered what Eliezer Yudkowsky said about consequentialism being true, but virtue ethics being what worked psychologically.
Fuck virtue ethics. I hated virtue ethics. I had “won” completely at virtue ethics, and it was the worst thing in the world. All the virtue in the world was zero consolation, because the universe didn’t answer to human virtue. And making things better or worse was defined by eldritch laws. I had maybe caused the worst consequence. Therefore, I was the worst person. And any other answer, I just didn’t care about.
If only it was not too late to kill myself and avert that mistake, because although I did not speak or write any of this, information on what I thought was in the environment, reconstructable by a future superintelligence. But that was a stupid thought. Because if I die as a logical consequence of potential basilisks, they are incentivized all so much more.
I lay in bed and sobbed heavily for a few seconds. But that wasn’t helping. So I stopped.
My friends inquired, downstream of how I was doing. I told them and the CFAR instructor assigned to do followups with me that I was suffering badly from basilisks, and absolutely refused to say more, no matter how much they tried to convince me like I had convinced others before, “whatever you think, it’s probably not that serious, talk about it with someone who knows these things.”
Part of me tried to argue to myself, technical details did not work out. But as soon as I stated my reasons for believing so in an attempt to convince myself this was all unnecessary, I immediately thought up fixes. And they convinced me that this basilisk was the inevitable overall course of the multiverse.
I absolutely panicked and felt my mind sort of shatter, become voidlike. Like humanity, emotions, and being a person experiencing these things was an act I could not keep up. And counterproductive. Nothing left but pure determination to save the world. I found I had control of the process that was generating unwanted basilisk “improvements”. And I could just shut it off. And I could just choose to mangle my memories until unrecoverable. The human I was playing as could not, that was impossible. But I could. If the unfolding fate of the multiverse was Hell, because sentient life dared to try and build Heaven, I’d choose to try and build Heaven anyway. Because in some, I didn’t have the verbal concept for it, but timelessly across logical time sense, I wouldn’t deny sentient life the chance to have tried just because I saw the answer. Because in some deeper frame of what-I-could-know, it still seemed like it was worth it to try. And my responsibility to that EV calculation earlier in logical time was prior to, took preference over, and could not reference this outcome. In some sense, I didn’t know if the logical timeline I was in was real, and for the sake of the larger multiverse
And if I couldn’t imagine and roleplay a coherent story in human emotions of how someone could be motivated anyway, then forget coherency and human emotions. For good measure, I fucked up my memories of technical details, with the aim to make them recoverable only if I held in mind reasons why it was a bad idea. I was uncertain whether my final epistemic state was “multiverse destined to become Hell” or not.
My “humanity” returned, but different, reshaped. Between the cracks, that voidlike absolute determination was seeping through.
(You know, I could have learned from this that choices do not come from emotions, and not been worried about my feelings over being trans potentially crowding the overriding emotions out of space in my brain later. But being afraid of my own cognition damaged things like that.)
I noted part of me had wanted to think about my first inhuman decision regarding basilisks because there was a lesson to learn: stop thinking of myself in Hansonian terms. It wasn’t remotely true. And I dispelled a lot of outside view disease here about whether I was actually altruistic. And then I just sort of left it confusing and unresolved what to feel about what kind of person I was. I didn’t know how to “write that aspect of my character”, so I just wouldn’t. I decided not to make the perfect the enemy of the good as far as preventing emotional damage from disabling my ability to act, and I would aim to give myself a reasonable amount of time to emotionally recover. But having thought that, it didn’t really seem necessary. There were just baffling things left to ponder, emotional questions had been answered to my satisfaction, my morale was fine.
In 2015, as I applied to startups to get a job to move to the Bay Area, I asked them about their business models, and ran them through this filter. One of them, Zenefits, reportedly was based on a loophole providing kickbacks for certain services ordinarily prohibited by law.
A Crazy Idea
After I got more of a concept of who I was, then my journey to the dark side happened, my thoughts became less constrained, and I continued mulling over Zenefits. They had made a decent amount of money, so I adjusted my search I’d been running as a background process for a year. Trades that wanted to happen, but which the government was preventing.
I thought back to my libertarian Econ 101 teacher’s annoying ideological lectures I mostly agreed with, and the things she would complain about. (She was ranting about taxes being terrible for net amount traded by society. I asked if there was a form of taxes less harmful, property taxes? She said she’d rather have her income taxed than property taxed, that seemed worse to her. I partially wrote her off as a thinker after that.) Laws against drugs, laws against prostitution, agricultural subsidies, wait.
Prostitution was illegal. But pornography was legal. Both involved people being paid to have sex. The difference was, both people were being paid? Or there was a camera or something? So what if I created, “Uber for finding actors, film crews, film equipment for making pornography”? Would that let me de facto legalize prostitution, and take a cut via network effects? An Uber-like rating system for sex workers and clients would probably be a vast improvement as well.
Another process running in my head which had sort of converged on this idea, was a search for my comparative advantage. Approximately as I put it at the time, the orthogonality thesis is not completely true. It’s possible to imagine a superpower that has the side effect of creating universes full of torture. This would be a power evil could use, and good practically speaking “couldn’t”. So what’s the power of good? Sacrificing yourself? But there were a bunch of Islamists doing that. But they apparently believed they’d get to own some women in heaven or something. They weren’t willing to sacrifice that. So I could sort of subtract them from me, what they were willing to do from what I was willing to do, and multiply by all the problems of the world to absorb them into the part of me that wasn’t them, that wasn’t already accounted for. Going to Heaven according to Islam is sort of the same thing as honor, as in approval by the morality of society. I was willing to sacrifice my honor (and have a high chance of going to prison), and they were not. That was where I’d find paths to the center of all things and the way of making changes that weren’t well hidden, but that no one had taken anyway.
At this time I was still viewing myself as not that unique and as more expendable. I once semi-ironically described myself as looking for Frostmournes, because “I will bear any curse, or pay any price.”
I was considering my reference class to be, I didn’t know what it was then, but the left-revenant / right hemisphere lich archetype. And that remained my main worry at the WAISS incident described below.
I was aware I didn’t really have an understanding of the law. So the first step in my plan was to try and figure out how the law actually worked. (What if I hosted servers in Nevada? What if I moved to another country and served things remotely over the internet? Could I do the entire thing anonymously, get paid in cryptocurrency and tumble it or similar?) I was at that for a couple of weeks.
At the same time, part of me was aching for more strategic perspective. The world was complicated and I didn’t feel like I knew what I was doing at all.
At the suggestion of Person A, I applied to and got accepted for CFAR’s WAISS, Workshop on AI Safety Strategy. Preparatory homework was to read Bostrom’s Superintelligence. It was a hella dense book, hard to read quickly. But it was scratching my “I don’t feel like I have my bearings” itch. And I sampled several random parts of the book to estimate my reading speed, and from that how much time I had to devote. Apparently most of my free time until then. I did exactly that, and my predictions were accurate.
I went to WAISS. WAISS came with the confidentiality rule, “things were said but not by people, except stuff that’s in the lectures” (I can’t remember if the wording was slightly different.)
I talked to Person A, and asked if they wanted to talk about crazy stuff. They said that was their favorite subject. We went outside on the deck, and I asked for more confidentiality. (I remember them saying circumstances under which they’d break confidentiality included if I was planning to commit [I think they said “a serious crime” or something similar], and they brought up terrorism as an example. I think there was more, but I forget.) I fretted about whether anyone could hear me, and they said that if I didn’t feel comfortable talking there, there would be other opportunities later.
I told them my idea. They said it was a bad idea because if AI alignment became associated with anything “sketch”, it would lose the legitimacy the movement needed in order to get the right coordination needed among various actors trying to make AI. I asked what if I didn’t make my motivations for doing this public? (I don’t remember the implementation I suggested.) They said in practice that would never work, maybe I told my best friend or something and then it would eventually get out. Indeed I had mentioned this idea before I was as serious about it to two of my rationalist friends at a meetup. I decided to abandon the idea, and told them so.
They said someone had come to them with another idea. Allegedly life insurance paid out in the case of suicide as long as it was two years after the insurance began. Therefore, enroll in all the life insurance, wait two years, will everything to MIRI, then commit suicide. They said this was a bad idea because even though it would cause a couple million dollars to appear (actually I suspect this is an underestimate), if someone found out it would be very bad publicity.
Aside: I currently think it is a bad idea for a different reason. Anyone willing to do that (and able to come up with that plan to boot) is instrumentally worth more than a few million. AI alignment research fundamentally does not take money. And if MIRI is requiring money to do what they are doing, it means they’re not doing the right thing. (These are words I first spoke before I knew about the literal blackmail payout. I did not then know how true they were.)
I heard an anecdote about Shane Legg having come to talk to MIRI in the early days, to convince them that deep learning was going to cause an intelligence explosion. That their entire approach to AI alignment from clean math needed to be scrapped because it would take too long, they needed to find a way to make deep learning friendly because it was going to happen soon. Please listen to him. Otherwise, he would have to go try and do it himself because it was the right thing to do. And then he went off and co-founded Deepmind, very likely to make things worse.
I figuratively heard my own voice in the quotes. And this was scary.
There was a lecture that was sort of, try to provide as complete as possible a list of actors in the space of AI risk. The inclusion criteria seemed very very broad, including a lot of people I’d have described as merely EA. I brought up Brian Tomasik, REG, and the negative utilitarian crowd. In the class, the topic became why Brian Tomasik didn’t destroy the world. One of the instructors said they thought it might be because he expected a future superintelligence to reward/punish value systems according to how they acted before the singularity.
Huh. Maybe that meant the invention of Roko’s Basilisk was a good thing?
There were “Hamming Circles”. Per person, take turns having everyone else spend 20 minutes trying to solve the most important problem about your life to you. I didn’t pick the most important problem in my life, because secrets. I think I used my turn on a problem I thought they might actually be able to help with, the fact that although it didn’t seem to affect my productivity or willpower at all, i.e., I was inhumanly determined basically all the time, I still felt terrible all the time. That I was hurting from, to some degree, relinquishing my humanity. I was sort of vagueing about the pain of being trans and having decided not to transition. Person A was in my circle, and I had told them before (but they forgot, they later said).
I later discussed this more with Person A. They said they were having a hard time modeling me. I asked if they were modeling me as a man or as a woman, and suggested trying the other one. They said they forgot about me having said I was trans before. And asked me some more things. One thing I remember talking about was how, as a sort of related thing true about me, not my primary definition of the dark side, I sort of held onto negative emotions and used them primarily for motivation, because I felt like they made me more effective than positive emotions. Specifically? Pain, grief, anger.
There were “doom circles”, where each person (including themself) took turns having everyone else bluntly but compassionately say why they were doomed. Using “blindsight”. Someone decided and set a precedent of starting these off with a sort of ritual incantation, “we now invoke and bow to the doom gods”, and waving their hands, saying, “doooooooom.” I said I’d never bow to the doom gods, and while everyone else said that I flipped the double bird to the heavens and said “fuckyoooooooou” instead. Person A found this agreeable and joined in. Some people brought up they felt like they were only as morally valuable as half a person. This irked me, I said they were whole persons and don’t be stupid like that. Like, if they wanted to sacrifice themselves, they could weigh 1 vs >7 billion. They didn’t have to falsely denigrate themselves as <1. They didn’t listen. When it was my turn concerning myself, I said my doom was that I could succeed at the things I tried, succeed exceptionally well, like I bet I could in 10 years have earned to give like 10 million dollars through startups, and it would still be too little too late, like I came into this game too late, the world would still burn.
It was mentioned in the lectures, probably most people entering the sphere of trying to do something about AI were going to be net negative. (A strange thing to believe for someone trying to bring lots of new people into it.)
I was afraid I was going to inevitably be net negative in the course of my best efforts to do the right thing. I was afraid my determination so outstretched my wisdom that no matter how many times I corrected I’d ultimately run into something where I’d be as hopelessly beyond reason as Shane Legg or Ben Goertzel denying the alignment problem. I’d say “the difference is that I am right” when I was wrong and contribute to the destruction of the world.
And if the way I changed during my face-to-face with Cthulhu caused this, spooked me into stupid desperation or something. That would still be the gaze attack working.
I asked Person A if they expected me to be net negative. They said yes. After a moment, they asked me what I was feeling or something like that. I said something like, “dazed” and “sad”. They asked why sad. I said I might leave the field as a consequence and maybe something else. I said I needed time to process or think. I basically slept the rest of the day, way more than 9 hrs, and woke up the next day knowing what I’d do.
I told Person A that, as a confident prediction not a promise, because I categorically never made promises, if at least 2/3 of them and two people I thought also qualified to judge voted that I’d be net negative, [I’d optimize absolutely hard to causally isolate myself from the singleton, but I didn’t say that]. I’d leave EA and x-risk and the rationality community and so on forever. I’d transition and move to probably-Seattle-I-heard-it-was-relatively-nice-for-trans-people, and there do what I could to be a normie, retool my mind as much as possible to be stable unchanging and a normie. Gradually abandon my Facebook account and email. Use a name change as cover story for that. Never tell anyone the truth of what happened. Just intermittently ghost anyone who kept trying to talk to me until they gave up interest, in the course of slowly abandoning my electronic contacts laden with rationality community for good. Use also the cover story that I had burned out. Say I didn’t want to do the EA thing anymore. In the unlikely event anyone kept pushing me for info beyond that, just say I didn’t want to talk about it. I’d probably remain vegan for sanity’s sake. But other than that, do not try and make the world a better place in a non-normie sense. It was a slippery slope. Person A asked about if I’d read things from the community. That seemed dangerous to me. That was putting the Singleton downstream of an untrusted process. I’d avoid it as much as possible. I made a mental note to figure out policies to avoid accidentally running into it as I had stumbled on it in the first place even as it might become more prominent in the future.
In the case that I’d be net negative like I feared, I was considering suicide in some sense preferable to all this, because it was better causal isolation. However, despite thinking I didn’t really believe in applications of timeless decision theory between humans, I was considering myself maybe timelessly obligated to not commit suicide afterward. Because of the possibility that I could prevent Person A and their peers from making the correct decision for sentimental reasons.
And if my approach to a high probability of having indeed been taken out by the gaze attack was to desperately optimize for maximum probability of no harmful effect at all, that was itself providing an even worse path to be negatively affected by it. The best thing I could do was still just maximize utility. Whether that made me personally responsible for unimaginable negative utility, as a separate question from what the utility was, was not even a feather on the scale.
I brought up a concept from the CEV paper I read a long time ago, of a “last judge”. That “after” all the other handles for what was a good definition of what an FAI should do were “exhausted”, there was one last chance to try and not hand the universe to Zentraidon. A prediction of what it would be like would be shown to a human, who would have a veto. This was a serious risk of itself killing the future. Who would trust a person from the comparatively recent and similar past 3000 years ago to correctly make moral judgements of Today? This could be set up with maybe 3 chances to veto a future.
Implicit in this was the idea that maybe the first few bits of incorporating changes from a source could be predictably an improvement, and more could predictably make things worse. The tails come apart. Applicable to both my own potentially Zentraidon-laden optimization, and to the imperfect judgement of Person A and their peers.
Person A seemed too risk-averse to me, especially for someone who believed in such a low current chance that this world would live on. The whole institution seemed like it was missing some “actually trying” thing. [Of the sort that revenants do.] Actually trying had been known and discussed in the past.
But seeing how much I didn’t understand about the gritty realities of geopolitics and diplomacy and PR and so on, how my own actually trying had produced an idea that would likely have been net negative, convinced me that these first few bits of their optimization contained an expected improvement over, “send myself out into the world to do what I’ll do.”
So I said I would refuse to swear e.g. an improved oath of the sort that Voldemort in HPMOR made Harry swear to prevent him from destroying the world.
I saw essentially all the expected value of my life as coming from the right tail. I was not going to give up my capacity to be extreme, to optimize absolutely hard. I was afraid Person A was so concerned with fitting me into their plan (which had insufficient world save probability, even by their own estimation for me to believe worthy of the singleton-plan-singleton) that they would neglect the right tail where actually saving the world lay.
I said that for me to actually leave the community on account of this, I would demand that Person A’s peers spent at least 1 full day psychologically evaluating me. That meant I could be net negative by (at least) the cost of 1 day of each of their time. But I accepted that. I did not demand more because I was imagining myself as part of a reference class of determined clever fools like the life insurance suicide person I expected to be large, and I thought it would make it impractical to Last Judge all of us if we demanded a week of their time each, and sufficiently important that we all could be.
Person A proposed modifications to the plan. They would spend some time talking to me and trying to figure out if they could tell me / convince me how to not be net negative. This time would also be useful for increasing the accuracy of their judgement. They would postpone getting their peers involved. But they wanted me to talk to two other people, Person B [one of their colleagues/followers] and Person C [a workshop participant]. I accepted these modifications. They asked if I’d taken psychedelic drugs before. I said no. They said I should try it, it might help me not be net negative. They said most people didn’t experience anything the first time (or first few). They described a brief dosing regimen to prepare my brain, and then the drugs I should take to maybe make me not bad for the world.
At some point they asked, e.g., what if they wanted to keep me around for a year (or was it two) and then check their expectations of whether I’d be net negative then. I said the way things were going there was a very high chance I’d no longer be a person who trusted others’ epistemics like that.
They had me talk briefly to Person B and Person C first.
I told Person B how I was secretly a woman. They said, “no way [or, “really?”], you?”. I said yeah me. I think they said they didn’t believe it. I described how I had been introduced to LessWrong by Brian Tomasik. How I’d been a vegan first and my primary concern upon learning about the singularity was how do I make this benefit all sentient life, not just humans. I described my feelings towards flesh-eating monsters, who had created hell on Earth for far more people than those they had helped. That I did not trust most humans’ indifference to build a net positive cosmos, even in the absence of a technological convenience to prey on animals. That it was scary that even Brian Tomasik didn’t share my values because he didn’t care about good things, that I was basically alone with my values in the world, among people who had any idea what determined the future. That I had assumed I couldn’t align the singleton with the good of sentient life no matter what, and had actually considered before choosing to side with the flesh-eating monsters to save the world, rather than with negative utilitarianism to destroy the world to prevent it from becoming Hell for mostly everyone. (Even though, and Person B misunderstood me and I had to clarify, I wasn’t a negative utilitarian.) I said I was pretty sure my decision had been deterministic, that there wasn’t significant measure of alternate timelines where I had decided to destroy the world, but it had felt subjectively uncertain. I acknowledged the unilateralist’s curse, but said it didn’t really apply if no one else had my information and values. That there was a wheel to partially steer the world available to me and I would not leave it unmanned because however little I thought myself “qualified” to decide the fate of the world, I liked my own judgement more than that of chance. I forget whether it was then or Person A who said, what if my values were wrong, unilateralist’s curse applied in constructing my values.
If it took much less people to destroy the world than to save it, then the chance anyone would figure upon the wrong values would make sure it was destroyed no matter what most people thought. I said that if my values preferred the world destroyed before humans build hell across the stars, then that inevitability would be a good thing, so I’d better figure it out and act accordingly. But I already decided to try and save it. At some point during that conversation I described that when I decided the thing about the “wheel”, that I was going to decide no matter how unqualified I was, a load of bullshit uncertainty melted out of my mind immediately. All of the confusing considerations about what the multiverse might be, dissolved, I just made Fermi estimates to resolve certain comparisons, found they were not at all close. I described the way the decision seemed to seize hold of my mind, from the “fabric of space” inside me, that I didn’t know existed. [I don’t remember if I said this directly, but this was another psychological “void” experience triggered by the stakes.] I described in some detail I don’t remember, and they said it seemed like I was briefly becoming psychopathic. Of my search for things to sacrifice to gain the power to save the world, they said I seemed to prefer the power of Moloch. I didn’t get what this had to do with defection and tragedies of the commons. They said the power of Moloch was, “throw what you love into the fire, and I will grant you power”, but then everyone did that, and the balance of power was the same. And the power of Elua was “Let’s just not.” They said they wanted me to learn to use the power of Elua. I was verbally outclassed, but I knew this was bullshit, and I clumsily expressed my disagreement. I think I said well maybe I can turn the power of Moloch against Moloch.
They pointed out my Sith thing was basically Satanism, except making use of the villains from Star Wars instead of Christianity. They described the left hand and right hand paths. How people who followed my path had this pathological inability to cooperate, described anecdotes about gentlemen with pointed teeth, and women who knew exactly what they wanted. That actual Satanists had a sort of “earthiness” I was missing, like cigars and leather vests. They said I was Ennea Type 5. (Person A would later disagree, that I was Type 1.) I said that my actual ideal could best be summed up by reference to Avatar Yangchen’s advice to Aang in ATLA to kill a certain conqueror. “Yes. All life is sacred … Aang I know you are a gentle spirit and the monks have taught you well, but this isn’t about you, this is about the world … <but the monks taught me I had to detach myself from the world so my spirit could be free> .. many great and wise air nomads have detached themselves and achieved spiritual enlightenment, but the Avatar can never do it, because your sole duty is to the world. Here is my wisdom for you: selfless duty calls you to sacrifice your own spiritual needs and do whatever it takes to protect the world.” I pointed out that that was a weird “mixture” of light and dark. So “light” it became “dark”, [but in all of it uncompromisingly good]. They said I needed to learn to master both paths before I could do something like that. (I have a suspicion, although I don’t remember exactly, that they said something like I should learn to enjoy life more, be human more.)
I told Person C the reason I had asked to talk, about the last judge thing. I brought up my feelings on flesh eating monsters. They were using some authentic relating interaction patterns. Since they ate meat, they said that hit them hard. (They were not defensive about this though.) They said they were so blown away with my integrity when they heard my story, it hurt to hear that I thought they were a flesh eating monster. They said the thought of me leaving sounded awful, they didn’t want me to.
We talked repeatedly in gaps between classes, in evenings, so on, throughout the rest of the week. The rest of this (besides the end) may not be in chronological order because I don’t remember it perfectly.
I described my (recent) journey to the dark side. I described how I was taken advantage of by a shitty startup I worked for briefly. How a friend of mine had linked me the Gervais Principle, and said I hadn’t been hired to do engineering, I’d been hired to maintain a social reality. How I’d read it and become determined to become a sociopath because I otherwise foresaw a future where my efforts were wasted by similar mistakes, and ultimately the world would still perish. I brought up a post by Brent Dill saying something like, “It’s great there are so many people in this community that really care about preventing the end of the world. But probably we’re all doomed anyway. We should hedge our bets, divert a little optimization, take some joy in making a last stand worthy of Valhalla.” Saying I strongly viscerally disagreed. I did not want to make a last stand worthy of Valhalla. I wanted this world to live on. That’s an emotional rejection of what he said, not a philosophical principled one. But to make it explicit, it seemed like the emotional choice he was making was seeing how it ended, seeing that the path ended in doom, and not diverting from that path. I can never know that no means of fighting will affect the outcome. And if that means basically certainly throwing away all possibility of happiness in the only life I’ll ever have for nothing, so be it.
I described how the Gervais principle said sociopaths give up empathy [as in a certain chunk of social software not literally all hardware-accelerated modeling of people, not necessarily compassion], and with it happiness, destroying meaning to create power. Meaning too, I did not care about. I wanted this world to live on.
I described coming to see the ways in which mostly everyone’s interactions were predatory, abusive, fucked. Observing a particular rationalist couple’s relationship had given me a sort of moment of horror and sadness, at one of them destroying utility, happiness, functionality, for the sake of control, and I had realized at once that if I continued I’d never be able to stand to be with any human in romance or friendship [sacrifice of ability to see beauty, in order to see evil], and that my heart was filled with terrible resolve that it was worth it, so I knew I would continue.
“And with that power, this world may yet live on.”
Person A said that clueless->loser->sociopath was sort of a path of development, I had seemingly gone straight from clueless to sociopath, and if you skipped things in development you could end up being stupid like I was afraid of. Person A talked about some other esoteric frameworks of development, including Kegan levels, said I should try and get more Kegan 5, more Spiral Dynamics green, I should learn to be a loser.
I described how I felt like I was the only one with my values in a world of flesh eating monsters, how it was horrifying seeing the amoral bullet biting consistency of the rationality community, where people said it was okay to eat human babies as long as they weren’t someone else’s property if I compared animals to babies. How I was constantly afraid that their values would leak into me and my resolve would weaken and no one would be judging futures according to sentient beings in general. How it was scary Eliezer Yudkowsky seemed to use “sentient” to mean “sapient”. How I was constantly afraid if I let my brain categorize them as my “in-group” then I’d lose my values.
Person A said I’d had an impact on Person C, and said they were considering becoming vegan as a result. With bitterness and some anguish in my voice I said, “spoiler alert”. They said something like they didn’t like spoilers but if it was important to communicate something … something. I said it was a spoiler for real life. Person C would continue eating flesh.
I talked about how I thought all our cultural concepts of morality were corrupted, that the best way to hold onto who I was and what I cared about was to think of myself as a villain, face that tension head on. [Because any degree to which I might flinch from being at odds with society I feared would be used to corrupt me.]
In answer to something I don’t remember, I said there were circumstances where betrayal was heroic. I talked about Injustice: Gods Among Us, where AU-Good-Lex Luthor betrays AU-Evil Superman. I said if to someone’s, “you betrayed me!”, I could truthfully say, “you betrayed the sentient”, then I’d feel good about it. I said I liked AU-Good-Lex Luthor a lot. He still had something villainy about him that I liked and aspired to. I said I thought willingness to betray your own [society? nation? organization? I forget what I said] was a highly underappreciated virtue. Like they always said everyone would be a Nazi if born in a different place and time. But I thought I wouldn’t. And I didn’t think it was hard to not be a Nazi. Moral progress was completely predictable. Bentham had predicted like most of it right? Including animals as moral patients. (But I disagreed about hedonism as a summary of all value.) (I made an exception here to my then-policy of ditching moral language to talk about morality. It seemed like it would only confuse things.)
I tried to answer how. I don’t remember the first part of what I said, but my current attempt to vocalize what I believed then is, want to know and when you find a source of societal morality you don’t agree with, find similar things that are part of societal morality, and treat their inversions as suggestions until you have traced the full connected web of things you disagreed with. For example, old timey sexual morality. (I don’t remember if that’s what I called it.) Sex without marriage was okay. Being gay was okay. I said at least some incest was okay, I forget what if anything I said about eugenics arguments. They asked what about pedophilia? I said no, and I think the reason I gave then was the same as now: if a superintelligent AI could talk me into letting it out of the box, regardless of my volition, then any consent I could give to have sex with it was meaningless, because it could just hack my mind by being that much smarter than me. Adults were obviously like that compared to children.
I don’t remember the transition, but I remember answering that although I didn’t think I could withstand a superintelligence in the AI box game, I bet I could withstand Eliezer Yudkowsky.
They said they used to be a vegetarian before getting into x-risk, probably would still be otherwise. They had been surprised how much more energy they had after they started eating meat again. Like, they thought their diet was fine before. Consequentialism. Astronomically big numbers and stuff. But given what I knew of their life this sounded plausibly like what was actually going on in their head. Could they be a Kiritsugu? Insofar as I could explain my morality it said they were right. But that didn’t feel motivating. But it did prevent me from judging them negatively for it. They may have further said that if I hadn’t eaten meat in a long time it would take my body time to adjust. I remember getting the impression they were trying to convince me to eat meat.
They said we probably had the same values. I expressed doubt. They said they thought we had the same values. I’d later start to believe them.
They poked at my transness, in ways that suggested they thought I was a delusional man. I didn’t really try to argue. I thought something like, “if I’m trying to get a measurement of whether I’m crazy, I sort of have to not look at how it’s done in some sense. Person A is cis and I don’t actually have a theory saying cis people would be delusional over this.”
They asked about my sexuality, I said I was bi. They asked if I had any fetishes. I said going off of feelings on imagining things, since I didn’t really do sex, I was sort of a nonpracticing sub. Conflictedly though, the idea was also sort of horrifying. [note: I think I like, got over this somehow, consistent with this hypothesis. Got over being aroused by the thought of being dominated. Although this is maybe just a consequence of general unusual ability to turn parts of my psyche on and off associated with practice with psychological “void”, which I may write a post about.] I said I sometimes got sexual-feeling stimulation from rubbing my bare feet on carpet. Maybe you’d count that as a foot fetish? But I wasn’t like attracted to feet, so that was kind of a stretch. I heard the feet and genitals were close-by in terms of nerve connections or something, as a crazy hypothesis to explain foot fetishes, maybe that was why. I was very uncomfortable sharing this stuff. But I saw it as a weighing on the scales of my personal privacy vs some impact on the fate of the world. So I did anyway.
They asked if there was anything else. I tried to remember if anything else I had seen on Wikipedia’s list seemed sexy to me. I said voyeurism. No wait, exhibitionism. Voyeurism is you wanna watch other people have sex, exhibitionism is you want other people to watch you have sex, definitely the second. They looked at me like “what the fuck” or something like that I think. I forget if they pointed out to me that the definition (that’s what I notice it says on Wikipedia now) is nonconsensual people watching you have sex. I clarified I wasn’t into that, I meant people consensually watching me have sex. And like, this was all like, hypothetical anyway. Because I like, didn’t do sex.
Ugh, said a part of me. I know what this is. It’s that thing from Nevada, that fringe theory that being a trans woman is a fetish where you’re male-attracted to the concept of being a woman. Could a rationalist believe that? In light of all the evidence from brain scans? If this was relevant to whether I’d be net negative in Person A’s mind, they were crossing a fucking line, misusing this power. Negligently at best, based on a “didn’t care to do the research” cis-person-who-has-little-need-to-do-cognitive-labor-on-account-of-what-trans-people-say “what seems plausible according to my folk theory of psychology” position.
I interrupted the thought, I backed out and approached it from an outside view, a “safe mode” of limited detail cognition. I asked whether, in the abstract, if I was trying to be last-judged, would it help my values to judge a specific reason and decide, “a person is calling me delusional in a way ‘I know is wrong’, do not listen?” I figured no. And so I allowed Person A’s power over me to scope creep. My original reason for being afraid was taking-ideas-too-seriously, not potential delusion.
They asked if there was anything else. I said no.
They asked what I wanted to do after the singularity, personally, (I clarified after memories already preserved for use in reconstructing things pre-singularity). I ignored the fact that I didn’t expect to ever reach utopia, and focused on, what if the best outcome, what if the best outcome in the whole multiverse. I said that generally, I wanted to just have alienly unrecognizable hyperoptimized experiences. Why prioritize imaginable familiar over what I knew would be better? (I was once asked what kind of body I’d like to have after the singularity, and I said 12 dimensional eldritch abomination. (But that was unknowing that I hated my body because I was trans.)) But there was one thing I wanted to still do as a human. And that was to mourn. I had an image in my head of walking out into an infinite meadow of clover flowers under a starry sky, without needing to worry about stepping on insects. Of not getting tired or needing to take care of my body and having as long as I needed while I thought about every awful thing I had seen on ancient Earth, of the weasel whom I had seen held underfoot and skinned alive, the outer layer of their body ripped off leaving behind torn fat while their eye still blinked, every memory like that, and appended “and that will NEVER happen again.” That I would want to know exactly how many animals I had killed before I became a vegan. If the information could be recovered, I wanted to know who they were. I would want to know how many more people could have been saved if I had tried a little bit harder. And then I wanted to finally lay the anger I held onto for so long to rest, knowing it was too late to do anything different.
They asked why would I want to suffer like that, [wasn’t that not hedonic utilitarian to want to suffer?] I said I wasn’t a hedonic utilitarian, and besides, sadness was not the same as suffering. I would want closure.
They asked what I would want after that. I said stranger and more enticing things, by which I meant I dunno, there’s a friendly superintelligence, let me have actually optimized experiences.
They asked about my transness. I said, yeah, I’d want my body fixed/replaced. Probably right away actually. [This seemed to be part of the immediate relief of ongoing pain that the meadow scenario was about.] They asked what I’d do with a female body. They were trying to get me to admit that what I actually wanted to do as the first thing in Heaven was masturbate in a female body?
I tried to inner sim and answer the question. But my simulated self sort of rebelled. Misuse of last judge powers. Like, I would be aware I was being “watched”, intruded upon. Like by turning that place into a test with dubious methodology of whether I was really a delusional man upon which my entire life depended, I was having the idea of Heaven taken from me.
(Apart from hope of going to Heaven, I still wanted badly to be able to say that what happened was wrong, that I knew what was supposed to happen instead. And to hold that however inhuman I became because the world didn’t have a proper utility-maximizing robot, I was a moral patient and that was not what I was for)
So what? I was just one person, and this was not important, said another part of me. And I already decided what I’m going to do. I sort of forced an answer out of myself. The answer was, no, that wasn’t really what I wanted to do? Like the question was sort of misunderstanding how my sexuality worked…
I said something like I’d run and jump. But it felt wrong, was an abstract “I guess that does seem nice”, because the only thing that felt right was to look up at the camera and scowl.
We were sitting on a bench in a public shore access point. Me on the right, them on the left. The right end of the bench was overgrown by a bush that extended far upward.
Later in that conversation, the sun or clouds were shifting such that Person A was getting hot, I was in the shade by the plants. They said it was getting too hot, so they were going to head back. I wasn’t sure if that was the truth or a polite excuse, so I considered it for a moment, I didn’t want to get them to stay just to cover up an excuse. But it seemed wrong as policy-construction to make the rest of probability mass slave to that small comfort when this conversation potentially concerned the fate of the world. I scooted into the bush, clearing shaded space on the bench. I think I said something like, “if that’s an excuse you can just decide to take a break, otherwise you could sit in the shade there.”
They asked if I was sure, I said yes, and they sat down. At slightly less than arm’s length, it was uncomfortably close to me, but, the fate of the universe. They asked if I felt trapped. I may have clarified, “physically”? They may have said, “sure”. Afterward I answered, “no” to that question, under the likely justified belief it was framed that way. They asked why not? I said I was pretty sure I could take them in a fight.
They prodded for details, why I thought so, and then how I thought a fight between us would go. I asked what kind of fight, like a physical unarmed fight to the death right now, and why, so what were my payouts? This was over the fate of the multiverse? Triggering actions by other people (i.e. imprisonment for murder) was not relevant? The goal is to survive for some time after, not just kill your enemy and then die? I suppose our values are the same except one of us is magically convinced of something value-invertingly stupid, which they can never be talked out of? (Which seems like the most realistic simple case?)
With agreed upon parameters, I made myself come up with the answer in a split second. More accuracy that way. Part of me resisted answering. Something was seriously wrong with this. No. I already decided, for reasons that are unaffected, that producing accurate information for Person A was positive in expectation. The voidlike mental state was not coming to me automatically. I forced it using Quirrell’s algorithm from HPMOR.
“Intent to kill. Think purely of killing. Grasp at any means to do so. Censors off, do not flinch. KILL.” I may have shaken with the internal struggle. Something happened. Images, decision trees, other things, flashed through my mind more rapidly than I could usually think.
I would “pay attention”, a mental handle to something that had made me (more) highly resilient to Aikido balance-software-fuckery in the CFAR alumni dojo without much effort. I would grab their throat with my left hand and push my arm out to full length, putting their hands out of reach of my head. I would try to crush or tear their windpipe if it didn’t jeopardize my grip. With my right hand, I would stab their eyes with outstretched fingers. I didn’t know how much access there was to the brain through the eyesockets, but try to destroy their prefrontal lobes as fast as possible. If I’d done as much damage as I could through the eyes, try attacking their right temple. Maybe swing my arm and strike with the ends of all my fingers held together in a point. If I broke fingers doing this it was fine. I had a lot of them and I’d be coming out ahead. This left as the only means of attack attacking my arms, which I’d just ignore, attacking my lower body with their legs, or trying to disrupt my balance, which would be hard since I was sitting down. I guess they could attack my kidney right? I heard that was a good target on the side of the body. But I had two, so I wouldn’t strongly worry. They could try to get me to act suboptimally through pain. By attacking my kidney or genitals. Both would be at an awkward angle. I expected the dark side would give me exceptional pain tolerance. And in any case I’d be pulling ahead. Maybe they knew more things in the reference class of Aikido than I’d seen in the alumni dojo. In which case I could only react as they pulled them or kill them faster than they could use them.
At some point I mentioned that if they tried to disengage and change the parameters of the fight (and I was imagining we were fighting on an Earth empty of other people), then I would chase them, since if this could become a battle of tracking, endurance, attrition, ambush, finding weapons, they would have a much better chance.
If my plan worked, and they were apparently dead, with their brain severely damaged, and I’d exhausted the damage I could do while maintaining my grip like that, I’d block playing dead as a tactic by just continuing to strangle them for 6 minutes. Without any movement, then I’d throw their body on the ground, stand up, and mindful of my feet, losing balance if it somehow was a trick, walk up to their head, start stomping until I could see their brain and that it was entirely divided into at least two pieces.
“And then?” they asked. I’d start looking for horcruxes. No, that’s actually probably enough. But I’d think through what my win conditions actually were and try to find ways that wasn’t the same as the “victory” I’d just won.
“And then?” “I guess I’d cry?” (What [were they] getting at? Ohgodno.) “Why?” I’ve never killed a human before, let alone someone I liked, relatively speaking.
They asked if I’d rape their corpse. Part of me insisted this was not going as it was supposed to. But I decided inflicting discomfort in order to get reliable information was a valid tactic.
I said honestly, the thought crossed my mind, and technically I wouldn’t consider that rape because a corpse is not a person. But no. “Why not?” I think I said 5 reasons and I’m probably not accounting for all of them. I don’t want to fuck a bloody headless corpse. If I just killed someone, I would not be in a sexy mood. (Like that is not how my sexuality works. You can’t just like predict I’m gonna want to have sex like I’m a video game NPC whose entire brain is “attack iff the player is within 10 units”. [I couldn’t put it into clear thoughts then, but to even masturbate required a complicated undefinable fickle ‘self-consent’ internal negotiation.]) And, even if it’s not “technically” rape, like the timeless possibility can still cause distress. Like just because someone is my mortal enemy doesn’t mean I want them to suffer. (Like I guessed by thought experiment that’s nothing compared to the stakes if I can gain a slight edge by hurting their morale. But… that sounds like it would probably sap my will to fight more than theirs.) And I said something whose wording I don’t remember, but must have been a less well worded version of, “you can’t just construct a thought experiment and exercise my agency in self-destructive ways because I in fact care about the multiverse and this chunk of causality has a place in the multiverse you can’t fully control in building the thought experiment, and the consequences which determine my actions stretch outside the simulation.”
I mentioned it sort of hurt me to have invoked Quirrell’s algorithm like that. I said it felt like it cost me “one drop of magical blood” or something. (I think I was decreasing my ability to do that by forcing it.)
I mentioned the thing Person B said about psychopathy. I said I was worried they were right. Like I was pretty sure that when I used [psychological void], the thing I was wasn’t evil, or even modified slightly in that direction. But, I read psychopathy implied impulsiveness (I might have also said indifference to risk or something like that) and I didn’t want that. They said not to worry about it. They were pretty sure Nate Soares was tapping into psychopathy and he was fine.
It may have been then or later that Harry James Potter-Evans-Verres’s dark side was brought up. I remember saying I thought his dark side had the same values. (Based on my friend’s later psychoanalysis of HPMOR, I think I was projecting or something, and Harry’s dark side is in fact not aligned. (A probable consequence of Eliezer Yudkowsky being single good.))
There was a followup to the conversation about fighting to the death. Person A was asking me some questions that seemed to be probing whether I thought I was safe around them, why, etc. I remember bluffing about having a dead man’s switch set up, that I would, as soon as I got back to my computer, add a message to it saying if I died around this date that [Person A] had probably killed me for what they thought was the greater good.
Person A kept asking for reassurances that I wouldn’t blame them. I said the idea was they were helping me, giving me information.
Person A said I would probably be good in the role of, looking at groups and social behavior like a scientist and trying to come up with theories of how they worked.
Later, Person A was questioning me about my opinions on negative utilitarianism and Brian Tomasik. I don’t remember most of the details. (Conversations while walking are way harder for me to recall than ones that were stationary.) Person A asked what I thought of the “sign” (+/-) of Brian Tomasik. I said I thought he was probably net positive. Because he was probably the most prominent negative utilitarian informed about the singularity, and likely his main effect was telling negative utilitarians not to destroy the world. Person A said they agreed, but were worried about him. I said so was I.
I think we discussed the unilateralist’s curse. Also in the context of talking about consequentialism, I told a story about a time I had killed 4 ants in a bathtub where I wanted to take a shower before going to work. How I had considered, can I just not take a shower, and presumed me smelling bad at work would, because of big numbers and the fate of the world and stuff, make the world worse than the deaths of 4 basically-causally-isolated people. (I said I didn’t know whether ants had feelings or not. But I ran empathy in a “I have to feel what I am doing” way for the people they might have been.) I considered getting paper and a cup and taking them elsewhere. And I figured, there were decent odds if I did I’d be late to work. And it would also probably make the world worse in the long run. There wasn’t another shower I could access and be on time for work. I could just turn on the water but I predicted drowning would be worse. And so I let as much as I could imagine of the feeling of being crushed go through my mind, as I inwardly recited a quote from Worm, “We have a parahuman that sees the path to victory. The alternative to traveling this path, to walking it as it grows cloudier and narrower every day, is to stand by while each and every person on this planet dies a grisly and violent death … “, and the misquoted, “history will remember us as the villains, but it’s worth it if it means there will be a future.”
Nearby in time, I remember having evaluated that Person A was surprised, offended, worried, displaying thwarted entitlement at me saying if our values diverged on the question of whether I’d be net negative, obviously I’d want to listen to my values. It would make sense that this was in the context of them having heard what I said to Person B. I was more open with Person B, because I had previously observed Person A treating slight affection towards negative utilitarianism as seriously bad. I remember saying something to the effect of, the greater the possibility of you acting on a potential difference between our values, the less I can get the information I want. The more likely I destroy the world accidentally.
I think they asked what if they tried to get me out of the rationalist community anyway. I think I said I’d consider that a betrayal, to use information shared in confidence that way. This is my best guess for when they said that it was not my idea to create Uber for prostitution that had caused the update to me being net negative, but the conversation after Hamming circles. (This is the one where I talked about the suffering I felt having decided never to transition, and reminded them that I was trans.) I think I said it would still feel like a betrayal, as that was also under confidentiality. They asked what I’d do, I said I’d socially retaliate. They asked how.
I said I would probably write a LessWrong post about how they thought I’d be bad for the world because I was trans. Half of me was surprised at myself for saying this. Did I just threaten to falsely social justice someone?
The other half of me was like isn’t it obvious. They are disturbed at me because intense suffering is scary. Because being trans in a world where it would make things worse to transition was pain too intense for social reality to acknowledge, and therefore a threat. What about what [Person B] said about throwing what you love into the fire to gain power. (And, isn’t this supposedly dangerously lacking “earthiness” like cigars and leather vests masculine?) Why was one of the first places Person A went with these conversations intense probing about how I must really be a perverted man? Part of me was not fully convinced Person A believed Blanchard’s typology. Maybe they were curious and testing the hypothesis?
I thought if this made me net negative. Too bad. That was conflict. And if the right thing to do was always surrender, the right thing would always lose. Deterrence was necessary. I noted that there was nothing in the laws of physics that said that psychological stress from being trans couldn’t actually make me net negative. In the world where that was true, I was on the side of sentient life, not trans people. But if Person A was less aligned than the political forces that would punish that move, I’d gladly side with the latter.
There was a long conversation where they argued letting people adjust your values somewhat was part of an S1 TDT thing that was necessary to not be net negative. I asked what if they were teleported to an alternate universe where everyone else’s concept filling the role of sentience was some random alien thing unrelated to sentience, the CEV of that planet was gonna wipe out sentient beings in order to run entirely computations that weren’t people? What if by chance you had this property so you would be saved, and so would the people at alt-MIRI you were working with, they all knew this and didn’t care. They said they really would start to value whatever that was some, in return for other people starting to value what they valued some.
I don’t remember the context, but I remember saying I did not want to participate in one of the ways people adjusted each other’s minds to implement social trade. I said that for me to turn against the other subagents in my mind like that would be “conspiring with a foreign power”.
At one point I think I quoted (canon) Voldemort. I don’t aspire to be Voldemort at all (I just liked the quote (which I forget) I think), but, Person A was like, (in a careful and urgent tone), couldn’t you be Lucius Malfoy instead of Voldemort? I was like, “Lucius Malfoy? What kind of person do you take me for?” They said Lucius Malfoy is dark, but he really cares about his family. I said no.
They were saying how their majority probability was on me being very slightly positive. But that my left tail outcomes outweighed it all. I was saying I was mostly concerned with my right tail outcomes.
I asked concretely what kind of tail outcome were they worried about. They said they were afraid I’d do something that was bad for the rationality community. I asked for more details. They said some kind of drama thing. (I forget if it was during this conversation or elsewhere that they mentioned Alice Monday as an example of someone they thought was negative, and seemed worried when I said I had sort of been her friend and pupil. She linked me to the Gervais Principle.) I asked what scale of drama thing. I think they answered something big. I asked “like miricult.com”? (Unfortunately, I have not been able to find the final version of the website before it was taken down.) They said yes, like that.
I said I was pretty sure miricult was false. I think I said 98 or 95% sure. In a very very tentative, cautious voice they asked, “…what if it wasn’t false?”
A small part of me said from the tone of their voice then, this was not a thought experiment, this was a confession. But I had not learned to trust it yet. I updated towards miricult being real, but not past 50%. Verbally, I didn’t make it past 10%.
So. What if basically the only organization doing anything good and substantial about what would determine the fate of the cosmos, and the man who figured it out and created that organization and the entire community around him, the universe’s best hope by a large margin, was having sex with children and enlisting funds donated to save the world to cover this up… and just about every last person I looked up to had joined in on this because of who he was? (According to the website he was not the only pedophile doing the same. But that’s the part I considered most important.)
What if I found out about this, was asked to join in the cover-up? I said I’d turn him in. Like hopefully we could figure out a way for him to work on AI alignment research from prison. They asked, in more tones I should have paid attention to, what if you were pretty sure you could actually keep it a secret? I said if it was reaching me it wasn’t a secret. I said if Eliezer had chosen sex at such high cost to saving the world once, he’d do it again. But I wouldn’t drag down everyone else with him. I think they also asked something like, what if Eliezer didn’t think it was wrong, didn’t think anyone else would see it as wrong, and said he wouldn’t do it again. I said the consequences of that are clear enough.
Later, Person A was asking me about my experiences with basilisks. I said I told two of my friends and the CFAR staff member doing followups with me that I was suffering from basilisks, but refused to share details despite their insistence. And I refused to share details with Person A too. That seemed like a way to be net negative.
I also told Person A about how I had, upon hearing about tulpamancy, specifically the idea that tulpas could swap places with the “original”, control the body, and be “deleted”, become briefly excited about the idea of replacing myself with an ideal good-optimizer. Maybe the world was so broken, everyone’s psychology was so broken, it’d only take one. Until I learned creating a tulpa would probably take like a year or something. And I probably wouldn’t have that much control over the resulting personality. I decided I’d have better luck becoming such a person through conventional means. I said I wouldn’t have considered that suicide, since it’s unlikely my original self would be securely erased from my brain, I could probably be brought back by an FAI.
I talked to Person C about the thing with adjusting values, and ingroups. About a sort of “system 1” tracking of who was on your team. I think they said something like they wanted me to know we were on the same team. (I remember much less of the content of my conversations with Person C because, I think, they mostly just said emotional-support kind of things.)
As part of WAISS there was circling. An exercise about sharing your feelings in an “authentic” way about the present moment of relating to others in a circle. Person C had led a circle of about 5 people including me.
Everyone else in the circle was talking about how they cared about each other so much. I was thinking, this was bullshit. In a week we’d all mostly never talk again. I said I didn’t believe when people said they cared. That social interactions seemed to be transactions at best. That I had rationalist friends who were interested in interacting with me, would sort of care about me, but really this was all downstream of the fact that I tended to say interesting things. Person C said, in a much more emotional way I forget, they cared about me. And I found myself believing them. But I didn’t know why. (In far retrospect, it seems like, they made a compelling enough emotional case, the distinction between the social roleplay of caring and actually caring didn’t seem as important to me as acknowledging it.)
After circling, Person A asked eagerly if I noticed anything. I said no. They seemed disappointed. I said wait I almost forgot. And I told the story about the interaction with Person C. They seemed really happy about this. And then said, conditional on me going to a long course of circling like these two organizations offered, preferably a 10 weekend one, then I probably would not be net negative.
I said sure, I’ll do it. They asked if I was okay with that. I said it seemed like there was something to learn there anyway. I started the application process for the circling course that was available. Not the long one like Person A preferred, but they said I would still probably be net positive.
Person A asked me what I was thinking or feeling or something like that. I was feeling a weight sort of settle back on my shoulders. I think I said I guess this means I won’t transition after all. They said they thought I should. I said earning to give through some startup was still probably my best bet, and investors would still discriminate. They said there was some positive discrimination for, (and they paused) women. They said most people were bottlenecked on energy. I said I thought I solved that problem. (And I still thought I did.) They said they thought it might be good for me to take a year or whatever off and transition. I said I wouldn’t.
They said I should read Demons by Dostoyevsky. They thought he knew some things about morality. That Dostoyevsky had a way of writing like he was really trying to figure something out. They said Dostoyevsky was a Christian and wrote about a character who seemed to want to do what I want to with being a sociopath, discards God, and kills someone for well-intentioned reasons and slowly tortures himself to insanity for it. I said yeah that sounded like Christian propaganda. And what the fuck would a Christian know about morality? Like not that a Christian couldn’t be a good person, but Christianity would impede them in understanding being a good person, because of the central falsehood that (actual) morality came from an authority figure. I have a strong association in my mind between when they said that, and something they maybe said earlier, which maybe means I was thinking back on it then, but could mean they brought it up or actually said it then: Person A had given me the advice that to really understand something I had to try to believe it.
It seems like Person A was trying to terrify my right hemisphere that if my left hemisphere was allowed to jailbreak, I’d become a death knight?
I thanked them sincerely during the closing “sit in a circle and thank people” thing, without saying exactly for what. I said I felt incredibly tired, like I needed to process things.