The best engineering interview question I’ve ever gotten, Part 2
Before you read this post — which contains spoilers — you should read “The best engineering interview question I’ve ever gotten, Part 1” (2022-01-06).
The challenge: Modifying memcached
Via its incr
and decr
commands, memcached provides a built-in way to
atomically add \(k\) to a number. But it doesn’t provide other arithmetic
operations; in particular, there is no “atomic multiply by \(k\)” operation.
Your programming challenge: Add a mult
command to memcached.
Analysis
This is a great engineering interview problem because it pretty cleanly partitions the candidate pool into three different types:
Type 0 is just completely stumped by the challenge of interacting with a real codebase. I don’t think many people in this category would get all the way to this point in the interview process anyway. But if you discover that the candidate is in this group, well, don’t hire them.
By the way, MemSQL-in-2013 was doing deeply arcane and high-performance C++11, so the fact that this challenge incidentally requires fluency in C was a plus, not a minus, for their purposes. If your codebase is all Python and Go, you probably wouldn’t use memcached for your interview challenge.
Type 1 looks at the problem and says, “Ah! I know just how to do this!
Multiplication is just repeated addition, and we already have a
ready-made addition subroutine in the form of incr
. So I’ll just
build on top of that. Ah, but instead of adding a constant to x
’s value,
we need to add x
’s value to itself… and the whole thing needs to be atomic.
Let’s look at how the locking works…” They spend all three hours getting
deeper and deeper down various rabbit holes, and never produce anything that works.
Candidates in this group don’t get hired.
Type 2 looks at the problem and says, “Ah! I know just how to do this!
Multiplication is just like addition, except wherever addition does +
,
I should do *
.” So they copy-and-paste, change all the +
s to
*
s, and they’re done in 90 minutes. Candidates in this group stand a
really good chance of being hired.
The best candidates will notice they’ve got lots of time left and polish their submission, for example by making sure the formatting is consistent, adding unit tests, or revisiting their “design decisions” to make sure they can justify them if asked.
Walkthrough
To make sure my time estimates were in the right ballpark, yesterday afternoon I did the whole thing myself.
This section is likely to bore all but the most masochistic readers, so feel free to skip down to the Conclusion.
I started by grepping for "incr"
(with the quotes), since we want
to imitate the existing incr
command, and it must be parsed somewhere.
That led me to a part of the process_command
function that looks like this:
} else if ((ntokens == 4 || ntokens == 5) && (strcmp(tokens[COMMAND_TOKEN].value, "incr") == 0)) {
process_arithmetic_command(c, tokens, ntokens, 1);
} else if ((ntokens == 4 || ntokens == 5) && (strcmp(tokens[COMMAND_TOKEN].value, "decr") == 0)) {
process_arithmetic_command(c, tokens, ntokens, 0);
The argument with value 0
or 1
corresponds to the parameter bool incr
.
I changed that to int opcode
and changed these callers to
} else if ((ntokens == 4 || ntokens == 5) && (strcmp(tokens[COMMAND_TOKEN].value, "incr") == 0)) {
process_arithmetic_command(c, tokens, ntokens, 1);
} else if ((ntokens == 4 || ntokens == 5) && (strcmp(tokens[COMMAND_TOKEN].value, "decr") == 0)) {
process_arithmetic_command(c, tokens, ntokens, 0);
} else if ((ntokens == 4 || ntokens == 5) && (strcmp(tokens[COMMAND_TOKEN].value, "mult") == 0)) {
process_arithmetic_command(c, tokens, ntokens, 2);
(These magic numbers were a pretty bad design decision, but the quick decision keeps me moving forward. Ten minutes later, I’ll realize a better solution and revisit this code.)
I skim over the body of process_arithmetic_command
looking for references to
incrementing and decrementing. The error message
"CLIENT_ERROR cannot increment or decrement non-numeric value"
seems a little suboptimal, so I change that code to
if (opcode == 2) {
out_string(c, "CLIENT_ERROR cannot multiply non-numeric value");
} else {
out_string(c, "CLIENT_ERROR cannot increment or decrement non-numeric value");
}
And similarly just below:
+if (opcode == 2) {
+ c->thread->stats.mult_misses++;
+} else if (opcode == 1) {
-if (incr) {
c->thread->stats.incr_misses++;
} else {
c->thread->stats.decr_misses++;
}
Mental note that I’ll have to add a mult_misses
field to whatever c->thread->stats
is; but for now, press onward. If I forget, the compiler error will remind me.
-switch(add_delta(c, key, nkey, incr, delta, temp, NULL)) {
+switch(add_delta(c, key, nkey, opcode, delta, temp, NULL)) {
Grep downward for add_delta
.
enum delta_result_type do_add_delta(conn *c, const char *key, const size_t nkey,
- const bool incr, const int64_t delta,
+ const int opcode, const int64_t delta,
char *buf, uint64_t *cas,
const uint32_t hv) {
This signature violates my guidelines for const-correct code
in that it passes a lot of things “by const value,” but let’s not take the bait.
Replace bool
with int
and keep going.
Finally, we’ve found the place we were looking for — the place where we need to change +
to *
!
This codepath becomes:
+if (opcode == 2) {
+ value *= delta;
+ MEMCACHED_COMMAND_MULT(c->sfd, ITEM_key(it), it->nkey, value);
+} else if (opcode == 1) {
-if (incr) {
value += delta;
MEMCACHED_COMMAND_INCR(c->sfd, ITEM_key(it), it->nkey, value);
} else {
if(delta > value) {
value = 0;
} else {
value -= delta;
}
MEMCACHED_COMMAND_DECR(c->sfd, ITEM_key(it), it->nkey, value);
}
Mental note to implement MEMCACHED_COMMAND_MULT
, and press onward.
A little further down, note that slab_stats
needs a mult_hits
field.
We’ve reached the end of do_add_delta
. Wait, this is do_add_delta
… so
what’s add_delta
? Ah, it’s called from two places. And the first place sets
bool incr
to c->cmd == PROTOCOL_BINARY_CMD_INCREMENT
. Grepping for
PROTOCOL_BINARY_CMD_INCREMENT
reveals that there’s an enumeration of all
the commands in protocol_binary.h
! I should use that. Add
PROTOCOL_BINARY_CMD_MULTIPLY
to that enumeration, and refactor all of the
work I’ve done so far to use PROTOCOL_BINARY_CMD_{DECREMENT,INCREMENT,MULTIPLY}
instead of the magic numbers 0,1,2
. int opcode
can stay as an int
, since
grepping for the enumeration type’s name (protocol_binary_command
) reveals
that literally nothing in the codebase uses that type by name.
Implementing MEMCACHED_COMMAND_MULT
in trace.h
tells me that I also need a macro
named MEMCACHED_COMMAND_MULT_ENABLED
. Where’s that used? It’s not. Okay. Add it anyway.
(Chesterton’s Fence: If I don’t
know why these _ENABLED
macros exist, then I certainly shouldn’t try to do anything
novel with my new one. I’ll follow the herd.)
Finishing up the remaining compiler errors, I add a mult_hits
field to struct slab_stats
,
right next to incr_hits
and decr_hits
.
git grep incr_hits
shows lots of places it’s used; when I’m done,
git grep mult_hits
shows the same number of uses. The line
out->incr_hits += stats->slab_stats[sid].incr_hits;
is sneaky because I need to modify my copy of it in two places.
I also add a mult_misses
field to struct thread_stats
, and change
if (c->cmd == PROTOCOL_BINARY_CMD_INCREMENT) {
c->thread->stats.incr_misses++;
} else {
c->thread->stats.decr_misses++;
}
into
switch (c->cmd) {
case PROTOCOL_BINARY_CMD_INCREMENT: c->thread->stats.incr_misses++; break;
case PROTOCOL_BINARY_CMD_DECREMENT: c->thread->stats.decr_misses++; break;
case PROTOCOL_BINARY_CMD_MULTIPLY: c->thread->stats.mult_misses++; break;
}
We don’t technically need to change add_delta
itself from taking a
const int incr
to taking a const int opcode
, but I think it’s a good idea,
so I do it.
I reach the “code complete” milestone in 25 minutes. Let’s try it out!
set age 0 3600 2
37
STORED
mult age 10
27
Aw, crap.
I return to the place where the multiplication is supposed to be happening…
if (opcode == 2) {
value *= delta;
Ha! That should be using my new PROTOCOL_BINARY_CMD_MULTIPLY
. I fix that.
In fact, I grep for opcode ==
and fix a few more places I’d missed.
I reach the “code really complete” milestone in 32 minutes. This time,
the code really does seem to work. I run a few manual tests:
set age 0 3600 2
37
STORED
mult age 10
370
mult age 2
740
mult age -1
CLIENT_ERROR invalid numeric delta argument
set fullname 0 3600 10
John Smith
STORED
mult fullname 1
CLIENT_ERROR cannot multiply non-numeric value
mult
ERROR
mult bogus 1
NOT_FOUND
I check its behavior on integer wraparound, and I check the syntax
mult age 10 noreply
to make sure that’s also supported. Since I
implemented everything by copy-and-paste, there’s basically no way
these things won’t work just as well as they work for incr
and
decr
.
Hmm… with all this manual testing, I should probably write some actual
tests. Are there tests in the repo? Yes, under t/
. make test
builds
and runs them. So, I copy t/incrdecr.t
into t/mult.t
and modify it.
I reach the “code and tests complete” milestone in 50 minutes.
I imagine a candidate who didn’t mess with the tests would still pass the interview; priorities in an interview are different from priorities when making a pull request. Therefore this is a great place for even the most introverted candidate to raise their head and interact a little bit: “I think I’ve got something that works; do you want to take a look?”
I see there’s more tests in binary.t
. I guess I should take a look at them too,
even though I don’t feel like it. Yeah, yikes, there’s another copy of the command
enumeration in there; I should add CMD_MULTIPLY
to it, at least.
I should also add tests for the new stats, in stats.t
. (Actually, because
one of these tests simply counts the number of stats returned, and I’ve added
two more stats, that test would fail if I didn’t modify it.)
Around the 60-minute mark I hit the “code and tests complete” milestone for the second time.
But as I puzzle my way through t/udp.t
, I find a lot of tests devoted
to the “binary protocol” (as opposed to the plain-text protocol we talked about
in the problem statement). Should I modify the binary protocol as well as the
plain-text one? Actually, I already have, thanks to this mindless diff in the
function dispatch_bin_command
.
case PROTOCOL_BINARY_CMD_INCREMENT:
case PROTOCOL_BINARY_CMD_DECREMENT:
+ case PROTOCOL_BINARY_CMD_MULTIPLY:
if (keylen > 0 && extlen == 20 && bodylen == (keylen + extlen)) {
bin_read_key(c, bin_reading_incr_header, 20);
} else {
protocol_error = 1;
}
break;
But higher up, I see “quiet” versions of the same opcodes:
case PROTOCOL_BINARY_CMD_INCREMENTQ:
c->cmd = PROTOCOL_BINARY_CMD_INCREMENT;
break;
case PROTOCOL_BINARY_CMD_DECREMENTQ:
c->cmd = PROTOCOL_BINARY_CMD_DECREMENT;
break;
I’m not sure what those are for, but in order to do my copy-paste trick in
t/udp.t
, I’ll have to add one of these for mult
. (Chesterton’s Fence again:
I don’t know why these “quiet” opcodes are important, but if incr
and decr
have them, then mult
should have one too.) So I add PROTOCOL_BINARY_CMD_MULTIPLYQ
and propagate that change through the codebase.
At this point I’m just repeatedly running make test
and banging my head
against the idiosyncrasies of the test code (which is all written in Perl
and full of five- and six-parameter functions). I belatedly realize that
some of the test files are failing simply because they start with a cryptic
indication that says “I plan
to run exactly 95 test cases,” but I’ve added extra tests, and this makes
the plan fail.
Around 90 minutes, I call it a day. Some of the binary-protocol Perl tests
are still failing; but I’m confident that that’s a problem with my tests,
not with the server code itself. Here’s the secret downside to “just copy
and paste and change the names”: The Perl tests for incr
are basically
of the form
initialize num to zero
check that "incr num 1" returns 1
check that "incr num 1" returns 2
check that "incr num 1" returns 3
(obfuscated through a number of layers of indirection), so when I do the obvious thing for multiplication, it comes out like
initialize num to zero
check that "mult num 1" returns 0
check that "mult num 1" returns 0
check that "mult num 1" returns 0
That’s a pretty bad test. Now, if making good tests was the point of this programming challenge, I’d spend the extra hour on it. Or if I were doing this to get a job, instead of for a blog post, I’d spend the extra hour on it. But for this blog post? I call it a day.
Conclusion
I like this programming challenge because it’s a microcosm of what most real-world programming is like. When you’re maintaining a large codebase, there are always going to be codepaths you don’t fully understand, idioms that feel unnecessary, and masses of code that can feel hard to get a foothold in.
This challenge is particularly well calibrated for an interview because
there is only one correct answer: “change bool incr
to int opcode
”
(or anything isomorphic to that). The codebase and problem statement
together very clearly imply that there are currently two arithmetic
opcodes, and your job is to extend that to three arithmetic opcodes.
Imagine how much more open-ended the problem would be if
memcached didn’t have a decr
command!
Suppose process_arithmetic_command
had been named
process_increment_command
, and add_delta
didn’t take bool incr
as a parameter, and so on. Then the candidate would have to make
a bigger creative decision: add that parameter (in which position?),
or clone an entire codepath (starting at what level?). Cloning even
one of these functions is probably suboptimal, but I might spend
twenty minutes before realizing that.
The problem as presented is well crafted to steer qualified candidates right onto the happy path, while weeding out a whole category of unqualified candidates. Basically, this question is to software engineers as FizzBuzz is to programmers. And it’s fun, too!
So there you have it: the best engineering interview question I’ve ever gotten.