I just ran across this neat little thought experiment via Scott Alexander’s blog Slate Star Codex. The article starts, and ends, with the sort of self-proclaimed “rationalist” silliness associated with unironic use of terms such as “basilisk” and “superintelligence”; but in the middle, I quite liked this part.
Counterfactual mugging is a decision theory problem that goes like this: God comes to you and says, “Yesterday I decided that I would flip a coin today. I decided that if it came up heads, I would ask you for $5. And I decided that if it came up tails, then I would give you $1,000,000 if and only if I predict that you would say yes and give Me $5 in the world where it came up heads. (My predictions are always right.) Well, turns out, it came up heads. Would you like to give Me $5?”
Most people who hear the problem aren’t tempted to give God the $5. Although being the sort of person who would give God the money would have helped them in the counterfactual world where the coin came up tails, that world didn’t happen and they will never see God’s money, so they’re just out five dollars.
But if you were designing an AI, you would probably want to program it to give God the money in this situation — after all, that determines whether it will get $1 million in the other branch of the hypothetical. And the same argument suggests that you should self-modify, right now, to become the kind of person who would give God the money. And a version of that argument where making the decision is kind of like deciding “what kind of person you are” or “how you’re programmed” suggests that you should also give up the money in the original hypothetical.
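The argument for programming the AI to pay is really just an expected-value calculation made *before* the coin is flipped. Here is a minimal sketch of that calculation (the constant and function names are mine, chosen for illustration; the payoffs and the perfect-predictor assumption come from the problem statement):

```python
COIN_P = 0.5          # fair coin
COST = 5              # what God asks for on heads
PRIZE = 1_000_000     # what God pays on tails, but only to predicted payers

def expected_value(would_pay_on_heads: bool) -> float:
    """Expected winnings, evaluated before the coin is flipped."""
    heads_payoff = -COST if would_pay_on_heads else 0
    # God's prediction is assumed perfect, so the tails prize tracks
    # exactly the disposition to pay on heads.
    tails_payoff = PRIZE if would_pay_on_heads else 0
    return COIN_P * heads_payoff + COIN_P * tails_payoff

print(expected_value(True))   # 0.5*(-5) + 0.5*1_000_000 = 499997.5
print(expected_value(False))  # 0.0
```

From behind the coin flip, the paying disposition is worth about half a million dollars; the tension in the problem is entirely that, once heads has come up, that calculation no longer seems to apply.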
This is interesting because it gets us most of the way to Rawls’ veil of ignorance. We imagine a poor person coming up to a rich person and saying, “God decided which of us should be rich and which of us should be poor. Before that happened, I resolved that if I were rich and you were poor, I would give you charity if and only if I predicted that, in the opposite situation, you would give me charity. Well, turns out you’re rich and I’m poor and the other situation is counterfactual, but will you give me money anyway?” The same sort of people who agree to the counterfactual mugging might (if they sweep under the rug some complications like “can the poor person really predict your thoughts?” and “did they really make this decision before they knew they were poor?”) agree to this also. And then you’re most of the way to morality.
Scott Alexander is also the author of UNSONG, which I highly recommend.