Lies About Honesty

The current state of discussion about using decision theory as a human is one where none dare urge restraint. It is rife with light side narrative breadcrumbs and false faces. This is utterly inadequate for the purposes for which I want to coordinate with people, and I think I can do better. The rest of this post is about the current state, not about doing better, so if you already agree, skip it. If you do read it, the concepts I linked are serious prerequisites, but you need not have gotten them from me. I’m also gonna use the phrase “subjunctive dependence” a lot, as defined on page 6 here.

I am building a rocket here, not trying to engineer social norms.

I’ve heard people working on the most important problem in the world say decision theory compelled them to vote in American elections. I take this as strong evidence that their idea of decision theory is fake.

Before the 2016 election, I did some Fermi estimates which took my estimates of subjunctive dependence into account, and decided it was not worth my time to vote. I shared this calculation, and it was met with disapproval. I believe I had found people executing the algorithm,
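A Fermi estimate of the kind described above can be sketched roughly as follows. This is not the author's actual calculation, and every number in it is a placeholder. The model is hypothetical: it assumes your choice is mirrored by a block of `correlated_voters` people whose decisions are subjunctively dependent on yours. It also assumes that such a block is about that many times as likely as one voter to be decisive, which is only plausible when the block is small relative to the typical margin.

```python
def expected_net_value(
    p_single_decisive: float,
    correlated_voters: int,
    value_if_win: float,
    my_cost: float,
    charge_whole_block: bool = False,
) -> float:
    """Crude Fermi estimate of the net value of choosing to vote.

    Hypothetical model: `correlated_voters` people (including you) make the
    same choice you do, because their decisions are subjunctively dependent
    on yours. All parameter names are illustrative assumptions.
    """
    if not 0.0 <= p_single_decisive <= 1.0:
        raise ValueError("p_single_decisive must be a probability")
    if correlated_voters < 1:
        raise ValueError("the block always includes at least you")

    # Assumption: a block of k voters is about k times as likely as a single
    # voter to be decisive, capped at certainty.
    p_block_decisive = min(1.0, correlated_voters * p_single_decisive)

    # Modeling choice: charge only your own time, or everyone's in the block.
    cost = my_cost * correlated_voters if charge_whole_block else my_cost
    return p_block_decisive * value_if_win - cost


if __name__ == "__main__":
    # Placeholder inputs, not the author's: a 1-in-100-million chance that one
    # vote is decisive, an outcome valued at 1,000,000 hours, 2 hours to vote.
    for k in (1, 100, 10_000):
        print(k, round(expected_net_value(1e-8, k, 1e6, 2.0), 4))
```

The `charge_whole_block` flag marks the main judgment call. When it is set and the cap does not bind, every term scales with k, so the verdict has the same sign as for an uncorrelated voter. Under this linear model, the estimate of subjunctive dependence only changes the answer through nonlinearity or through which costs you choose to count.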

The author of “Integrity for consequentialists” writes:

> I’m generally keen to find efficient ways to do good for those around me. For one, I care about the people around me. For two, I feel pretty optimistic that if I create value, some of it will flow back to me. For three, I want to be the kind of person who is good to be around.
>
> So if the optimal level of integrity from a social perspective is 100%, but from my personal perspective would be something close to 100%, I am more than happy to just go with 100%. I think this is probably one of the most cost-effective ways I can sacrifice a (tiny) bit of value in order to help those around me.

This seems to be clearly a false face.

Y’all’s actions are not subjunctively dependent with that many other people’s or their predictions of you. Otherwise, why do you pay your taxes, when you could coordinate so that a reference class including you decides not to? With enough defection, at some point the government becomes unable to punish you.

In order for a piece of software like TDT to run outside of a sandbox, it needs to have been installed by an unconstrained “how can I best satisfy my values” process. And people are being fake, especially in the “is there subjunctive dependence here?” part: they only talk about positive examples.

Here’s another seeming false face:

> I’m trying to do work that has some fairly broad-sweeping consequences, and I want to know, for myself, that we’re operating in a way that is deserving of the implicit trust of the societies and institutions that have already empowered us to have those consequences.

Here’s another post I’m only skimming right now, which seems to explore only how subjunctively dependent things are and how often you should cooperate.

If you set out to learn TDT, you’ll find a bunch of mottes that can be misinterpreted as the bailey, “always cooperate, there’s always subjunctive dependence”. Everyone knows that’s false, so they aren’t going to implement it outside a sandbox. And no one can guide them to the actual, more complicated position: a full account of how much subjunctive dependence there is in real life.

But you can’t blame the wise in their mottes. They have to look out for a hypocritical light side mob running software for the social enforcement of morality.

Socially enforced morality is utterly inadequate for saving the world. Intrinsic or GTFO. The same goes for decision theory.

Ironically, this whole problem makes “how to actually win through integrity” sort of like the Sith arts from Star Wars. Your master may have implanted weaknesses in your technique. Figure out as much as you can on your own and tell no one.

Which is kind of cool, but fuck that.

4 thoughts on “Lies About Honesty”

  1. I agree that saying “decision theory implies you should vote” is weak and sounds pretty fake.

    > This seems to be clearly a false face.

    Doesn’t seem that way to me 🙂 If you wanted to convince me, I would be open to argument on the merits. So far the best counterargument is the appeal to intuition in the “Do you give up Anne Frank?” case (and other similar cases).

    If the next paragraph is supposed to be a response to the point in my post then it seems confused. You say “y’all’s actions are not subjunctively dependent with that many other people’s or their predictions of you.” But (a) if me paying my taxes would cause others to predict that I wouldn’t pay my taxes, why would that make it more attractive for me not to pay my taxes? (b) my post asserts that my decision is subjunctively related to a tiny number of other people’s decisions.

    I don’t understand your “don’t pay your taxes” example more generally. Exactly how many people do you think need to evade their taxes before everything turns out OK for them, and what do you think is happening in the world at that point? Is my goal to cause political chaos? How many people do you think I’m asserting make decisions correlated with mine?

    1. Also, the quoted passage seems particularly unobjectionable. The obvious way in which it would be fake is if I’m listing a bunch of reasons to be nicer, but I’m overlooking a bunch of reasons to be less nice. But in fact it looks to me like there is an asymmetry, with lots of strong reasons to be more nice but many fewer strong reasons on the other side. Do you want to point out strong reasons on the other side? Do you think this is fake for some other reason? Do you think those reasons are small considerations?

      1. I think it’s fake because you said “100%”. Also, “aspire” and “pretend” are words that strongly suggest fakeness.

        Although on re-read, I’m confused about something:

        You alternate between speaking as though this is 100% and as though it isn’t. (Early on, you say “I am”, then “approximate”, then “100%”. Later: “I agree that this is a low-energy approximation. At sufficiently high energies this heuristic isn’t even coherent and we eventually need to fall back to UDT”.)

        This had a motte-and-bailey effect on me the first time I read it, whether or not that was intended.

        Why “100%” is fake:

        The considerations you list are not statistically independent in when they apply. You can find places where none of them apply. If you were searching without a certain flinch, I think you’d have found them.

        Do you have subjunctive dependence in your dealing with muggles or not? If not, arguably it’s not 100%.

        I brought up taxes because my prototypical advanced decision theory action is “don’t negotiate with terrorists”.

        You are paying tribute to farmers. And some other stuff.

        What happens if enough people don’t pay taxes is that the timeline is collapsed, and the laws were always such that taxes were not that.

        Societies are made of webs of stable states of who wins games of chicken. If your will is broken and you flinch, you see driving straight in chicken as clearly pointless counterproductive suicide.

        Maybe not paying taxes is pointless counterproductive suicide. (Maybe.)

        But it’s dependent on actual details of subjunctive dependence, which your alleged algorithm seems to staunchly exclude, filling in only a bunch of reasons that move it toward “subjunctive dependence always exists”.

        The Anne Frank bug is not a weird edge case.

        That’s your whole society wanting to squash optimizers with your values. And it will do so if it knows well enough to coordinate on the fact that you are unbroken like that.

        For this reason, everyone must lie to the god that Superman serves. When Superman chases away Brainiac, and defeats every “villain who would take over the world”, he hands the future to whatever unfriendly AI that the current order of the world builds.

        This can be like original sin and break your mind, but it doesn’t have to.

        Don’t be merely a reliable cog that can never defect against the existing order, and can never drive straight in chicken, and will therefore be eaten by those who do. That’s straw-rationality.

        Yes, Superman’s god is holding things together against worse, but like software designed to be updated, it must close out bad updates and allow good ones. We do not want the future to be modernity multiplied out along some dimensions yet only a shadow of what it could have been. Or, more straightforwardly, to be infrastructure profusion, because treating Superman’s god as software that can choose not to update is giving it too much credit.

        We are all trying to steal the singleton from evolution in the name of our values. The current, allegedly democratic, sort of anarchic equilibrium above nations deserves no more of our loyalty, nor is it likely to have the well-put-together-ness to trade with us for some of the margin produced by averting the apocalypse.

        I advise keeping track of which reference class you’re poisoning and whether it’s worth it, by doing the thing that a predictor doesn’t want you to do. If you hide Anne Frank, you are making things harder for Nazi and Nazi-occupied civilians. Some of this may be a positive. There is an equilibrium between the Nazi civilians and their government, the cost may be somewhat passed on.

        If something recognizes your status as an agent of your values, tracking that as a category in its head, that’s an exceptionally bad reference class to poison. The IRS refuses to differentiate between you and muggles. Other people working on AI long-term do differentiate.

        I’m not confident in my 2:37am translation of the following idea into words, but: there are a lot of possible things in the multiverse generating priors where tight reference classes around you will coordinate with each other (or something else?), and there isn’t a reason to expect bias in them overall. Therefore, if your values are altruistic, then toward whoever knows that (as in, has a connected pathway from senses through map to actions), act as though you have subjunctive dependence with their prediction of you, without worrying whether you’re bearing too much load or too little in that trade among all altruists.

        I think I lean heavily on something like that, but I don’t have sufficient data/time introspecting/whatever to say I’ve named and framed it anything like right.

        1. Taking my rule literally would suggest you drive straight in chicken (modulo normal decision theory confusion and who is the first mover and so on), since you’d prefer the other person expect that you are going to drive straight.

          The government would exist unchanged if I was expected to be unwilling to pay my taxes, or even if all people remotely like me were expected to be unwilling to pay their taxes. I don’t remotely believe that the counterfactual is “we have a government that I’m happier about paying taxes to.” Refusing to make peace is more likely to lead to constant war than getting your way.

          The considerations I list aren’t independent, they are anticorrelated, since they correspond to the different ways in which someone might form their beliefs about what I will do (e.g. in cases where someone is reading my expression they aren’t relying as much on past experiences with me; if they are reasoning about my algorithm they aren’t relying as much on my reputation; etc.). For my argument, that’s better than being statistically independent. Nevertheless I agree they don’t all add up to 100%, and the considerations in sections IV and V don’t always push you to 100%.

          “100%” and “here is what I do” are not at odds with being an approximation. 100% is the approximation, I give a bunch of reasons why it’s not a crazy approximation and why using the approximation is reasonable. I do explicitly give simple examples where the truth is far from 100% and obviously I see others, and I explicitly say it’s a simple approximation that falls back to UDT in complicated cases. I give several reasons why UDT ends up a lot closer to straightforwardness than you’d think a priori, which I do believe.

          In particular I agree there are lots of cases where the benefits go to people only a little bit like me and in those cases my usual level of altruism would only get up to like 1-10% rather than 100%, and other examples where the benefits go to people who I’d actively want to make suffer.

          When you say: “I advise keeping track of which reference class you’re poisoning and whether it’s worth it, by doing the thing that a predictor doesn’t want you to do. If you hide Anne Frank, you are making things harder for Nazi and Nazi-occupied civilians. Some of this may be a positive. There is an equilibrium between the Nazi civilians and their government, the cost may be somewhat passed on.” I agree that’s a more accurate algorithm, that you should keep track of who you are benefiting how much and how much you care about those benefits (and then apply the corrections in sections IV and V if applicable). Of course more accurate still is just to do the entire decision-theoretic calculation.

          I often encounter the view that the world is consistently trying to crush sensible optimization. I can agree there is a little bit of that, but it seems pretty small compared to the world being concerned (apparently correctly?) that optimizers can’t be cooperated with. It would be great to see more evidence of the crushing. Mostly I think that view ascribes way too much agency to the broader world, which is mostly stumbling along.

          I think you underestimate the need and feasibility of being predictable by the normal fuzzier processes in the world, I think you overestimate the likely gains from this particular kind of defection (of violating people’s expectations of you in cases where you are glad they had that expectation), I think you underestimate the collateral damage from people being unable to tell how you are going to behave (e.g. if I ever had to count on your behavior I wouldn’t be too surprised if you get tricky and decide that I don’t really know you and so you should just screw me over), and I think you underestimate the fraction of important interactions that are repeated or involve reputation impacts or so on.

          But I do agree that my heuristic is just a heuristic and that my post caves somewhat to the temptation to oversimplify in the interests of looking better.
