# Nash equilibria in Ballmer’s binary-search interview game

Yesterday John Graham-Cumming posted about “Steve Ballmer’s incorrect binary search interview question,” and in the Hacker News discussion, among the predictable “interview culture is flawed!!” complaints, there was some discussion of whether binary search was even the right way to solve Ballmer’s puzzle. Here’s a stab at an answer.

Ballmer’s puzzle boils down to:

> I’m thinking of a number between 1 and 100. You make a series of guesses. After each guess I’ll tell you whether my number is higher or lower. If you hit it on the first guess, I’ll give you $1000. On the second guess, $990. On the third guess, $980. And so on, until, if you hit it on your 100th guess, I’ll give you only $10.
>
> What is the expected value, to you, of playing this game with me?

Here I’ve fiddled with the numbers to keep all the payouts positive and to help separate your possible guesses (in teletype font) from payouts (preceded by a dollar sign). In Ballmer’s actual interview, he was asking basically whether the expected value of this game is at least $950.

HN rightly points out that the intangible value of getting to play such a game against Steve Ballmer is very high regardless of how many dollars you expect to win or lose at it. But let’s focus on the math.

Binary search seems like a decent way to go about tackling this game. If Ballmer is choosing his secret number uniformly at random, then the expected value of the game is $952. But, as Ballmer points out in the linked video, if he knows you’re going to do a plain old binary search, then he certainly won’t ever choose 50 as his secret number. In fact, he has no reason to choose 25 or 75 either. Or 12, 13, 37, 38, 62, 63, 87, or 88. If Ballmer avoids those eleven numbers, and chooses uniformly at random from among the rest, that alone is enough to lower the expected value of the game from $952 to about $949.55. But of course if you know that Ballmer is going to do that, then you’ll start your binary search at 49 instead of 50. And so on.

It turns out that this question is really about computing the game’s Nash equilibrium.

## The three-number version

Let’s tackle a stripped-down version of the problem first. Interviewer Alice will choose a number between 1 and 3. Interviewee Bob will make a series of guesses; after each guess, Alice will say “Higher” or “Lower” or “Hit.” The payout is $30 if Bob hits on the first guess; $20 on the second; or $10 on the third.

There are three pure strategies for Alice, and five pure strategies for Bob. The payoff matrix is:

                                p   q   r
                                1   2   3
    ------------------------------------
    a  "Guess 1, then 3"    30  10  20
    b  "Guess 1, then 2"    30  20  10
    c  "Guess 2, then win"  20  30  20
    d  "Guess 3, then 2"    10  20  30
    e  "Guess 3, then 1"    20  10  30
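These entries can be verified mechanically. Here is a sketch in Python; the encoding of each pure strategy as a preferred guess order, with guesses skipped once Alice’s feedback has ruled them out, is my own device:

```python
def payoff(order, secret, lo=1, hi=3):
    """Payout when Bob guesses in the given order, skipping any guess
    already ruled out by Alice's higher/lower answers."""
    n = 0
    for g in order:
        if not lo <= g <= hi:
            continue            # ruled out by earlier feedback
        n += 1
        if g == secret:
            return 40 - 10 * n  # hit on guess 1 -> $30, guess 2 -> $20, ...
        if g < secret:
            lo = g + 1          # "Higher"
        else:
            hi = g - 1          # "Lower"

strategies = {                  # each pure strategy as a guess order
    "a": [1, 3, 2],             # "Guess 1, then 3"
    "b": [1, 2, 3],             # "Guess 1, then 2"
    "c": [2, 1, 3],             # "Guess 2, then win"
    "d": [3, 2, 1],             # "Guess 3, then 2"
    "e": [3, 1, 2],             # "Guess 3, then 1"
}
matrix = {name: [payoff(order, s) for s in (1, 2, 3)]
          for name, order in strategies.items()}
```

Running this reproduces the five rows of the table above.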


Alice is looking for a mixed strategy $$(p,q,r)$$ (with all three values between 0 and 1, and $$r = 1-p-q$$) that minimizes the value of Bob’s best response. Vice versa, Bob is looking for a mixed strategy $$(a,b,c,d,e)$$ (with $$e=1-a-b-c-d$$) that maximizes his expected value given Alice’s best strategy.

Without loss of generality, we can assume by symmetry that $$a=e, b=d, p=r$$; thus $$q=1-2p$$ and $$c=1-2a-2b$$. Bob’s expected payoff, for each of his three distinct strategies, is:

\begin{align} E_a &= 30p+10q+20r = 50p+10(1-2p) = 10+30p \\ E_b &= 30p+20q+10r = 40p+20(1-2p) = 20 \\ E_c &= 20p+30q+20r = 40p+30(1-2p) = 30-20p \end{align}

We can plug in some values for $$p$$ (remember, that’s Alice’s mixed strategy’s likelihood of picking 1, which is the same as its likelihood of picking 3), to see how Bob’s pure strategies would fare in response.

• If Alice’s $$p=0.5$$, then Bob’s $$(E_a,E_b,E_c) = (25, 20, 20)$$. That is, if Alice commits to a mixed strategy that chooses 1 or 3 with equal probability and never chooses 2, then Bob’s best strategy is to guess 1, then 3 (or vice versa) with equal probability. That’ll pay $25 per game.

• If Alice’s $$p=0$$, then Bob’s $$(E_a,E_b,E_c) = (10, 20, 30)$$. That is, if Alice always picks good old 2, then Bob’s best strategy by far is binary search, because it’ll always hit on the first guess.

• If Alice’s $$p=1/3$$, then Bob’s $$(E_a,E_b,E_c) = (20, 20, 23.33)$$. That is, if Alice picks each number equally often, then Bob’s best strategy is to use binary search.

This is significant: Somewhere between $$p=0.5$$ and $$p=1/3$$, Bob’s best strategy must change from “guess the endpoints first” to “binary search.” There will be some value of $$p$$ for which it doesn’t matter whether Bob does binary search or not; and that will be our Nash equilibrium. Iteratively search for that $$p$$, and you’ll find:

• If Alice’s $$p=0.4$$, then Bob’s $$(E_a,E_b,E_c) = (22, 20, 22)$$. That is, if Alice picks 2 with probability 0.2 and 1 or 3 with probability 0.4 each, then it doesn’t matter what Bob does. Well, except for strategy b (linear search), which is dominated by the other two options.

What does this mean for our interview? If Alice chooses her optimal strategy $$p=0.4$$, then it doesn’t matter what Bob does, so he might as well always choose binary search, right? Surely “always binary search” is the smart strategy? Well, no; because as we’ve seen, if Alice knows that Bob is playing a pure binary-search strategy, she’ll switch to $$p=0.5$$ and reduce Bob’s expected payout from $22 per game to only $20 per game (because in that case Bob’s first guess will always be 2 and Alice’s secret number will never be 2).
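Rather than searching iteratively, we can find Alice’s equilibrium $$p$$ exactly by setting $$E_a = E_c$$. A quick check in Python, using exact arithmetic (the variable names are mine):

```python
from fractions import Fraction

def bob_values(p):
    """Bob's expected payoffs (E_a, E_b, E_c) against Alice's mixture
    (p, 1-2p, p), per the formulas derived above."""
    return (10 + 30 * p, Fraction(20), 30 - 20 * p)

# E_a == E_c  means  10 + 30p == 30 - 20p,  i.e.  p = (30-10)/(30+20).
p_star = Fraction(30 - 10, 30 + 20)   # = 2/5 = 0.4
e_a, e_b, e_c = bob_values(p_star)    # 22, 20, 22
```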
So Bob is looking for a mixed strategy $$(a,b,c)$$ that maximizes the smaller of Alice’s expected values:

\begin{align} E_p &= 30a+30b+20c+10d+20e = 50a+40b+20(1-2a-2b) = 20+10a \\ E_q &= 10a+20b+30c+20d+10e = 20a+40b+30(1-2a-2b) = 30-40a-20b \end{align}

• If Bob’s $$(a,b)=(0,0)$$, then Alice’s $$(E_p,E_q)=(20,30)$$. That is, if Bob always does binary search, then Alice’s best strategy is to avoid 2.

• If Bob’s $$(a,b)=(.5,0)$$, then Alice’s $$(E_p,E_q)=(25,10)$$. That is, if Bob always guesses the endpoints first, then Alice’s best strategy by far is to pick 2, because it’ll always pay out a mere $10.

• If Bob’s $$(a,b)=(0,.5)$$, then Alice’s $$(E_p,E_q)=(20,20)$$. That is, if Bob always guesses in ascending or descending order, then it doesn’t matter what Alice picks.

Somewhere along the line between $$(a,b)=(0,0)$$ and $$(a,b)=(.5,0)$$ in Bob’s two-dimensional strategy space, we expect that Alice’s best strategy will change from “avoid 2” to “pick 2”; and that should be our Nash equilibrium. That equilibrium is at:

• If Bob’s $$(a,b)=(.2,0)$$, then Alice’s $$(E_p,E_q)=(22,22)$$. That is, if Bob guesses 2 with probability 0.6, and splits his other guesses evenly between 1 and 3, then it doesn’t matter what Alice does.
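The same crossover computation works on Bob’s side, setting $$E_p = E_q$$ with $$b=0$$ (the variable names are mine):

```python
from fractions import Fraction

def alice_values(a, b):
    """Alice's expected payouts (E_p, E_q) against Bob's symmetric
    mixture (a, b, 1-2a-2b, b, a), per the formulas derived above."""
    return (20 + 10 * a, 30 - 40 * a - 20 * b)

# With b = 0, E_p == E_q  means  20 + 10a == 30 - 40a,  i.e.  a = 1/5.
a_star = Fraction(30 - 20, 10 + 40)   # = 1/5 = 0.2
e_p, e_q = alice_values(a_star, 0)    # 22, 22
```

Both sides of the game thus settle at a value of $22 per game.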

## Update, 2024-09-06

Konstantin Gukov, in “Steve Ballmer was wrong” (2024-09-05), shows that we can use `scipy.optimize.linprog` to do the icky tedious algebra above. All we have to provide is the payoff matrix (a column for each of Alice’s possible choices and a row for each of Bob’s possible guessing strategies) and it’ll spit out an equilibrium position for that particular set of strategies. But we can increase the value of the game to Bob by increasing the range of Bob’s imagination. Konstantin started out with a range of strategies that let Bob hit in 5.93 guesses on average; then I improved that to 5.8535 guesses; then again to 5.8333 guesses, for a profit of 16.67 cents per game. That best-so-far strategy is a mixture of 83 different pure strategies.
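To illustrate the idea on something small, here is what that linear program looks like for the three-number game; this is a sketch of my own (Konstantin’s post handles the full 100-number game), maximizing the game value $$v$$ subject to Bob’s mixture beating $$v$$ against every column:

```python
import numpy as np
from scipy.optimize import linprog

# Rows: Bob's five pure strategies a..e; columns: Alice's choices 1..3.
M = np.array([
    [30, 10, 20],   # a "Guess 1, then 3"
    [30, 20, 10],   # b "Guess 1, then 2"
    [20, 30, 20],   # c "Guess 2, then win"
    [10, 20, 30],   # d "Guess 3, then 2"
    [20, 10, 30],   # e "Guess 3, then 1"
])
nrows, ncols = M.shape

# Variables: Bob's mixture x (5 entries) plus the game value v.
c = np.zeros(nrows + 1)
c[-1] = -1.0                                   # linprog minimizes, so minimize -v
A_ub = np.hstack([-M.T, np.ones((ncols, 1))])  # v - x @ M[:, j] <= 0 for each j
b_ub = np.zeros(ncols)
A_eq = np.ones((1, nrows + 1))
A_eq[0, -1] = 0.0                              # mixture probabilities sum to 1
b_eq = [1.0]
bounds = [(0, None)] * nrows + [(None, None)]  # x >= 0, v unbounded

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x[:-1], res.x[-1])   # an optimal mixture for Bob, and the game value
```

The solver confirms the hand-derived game value of $22 (the optimal mixture it reports may be any vertex of the optimal set, not necessarily the symmetric one above).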

Meanwhile, Bo Waggoner claims a proof (although I don’t understand the math involved) that the true equilibrium is somewhere between 5.80371 guesses and 5.81087 guesses.

Meanwhile, Eric Farmer also computes some expected values — some exactly, some approximately.

Posted 2024-09-04