c_str
-correctness
c_str
-correctnessI’ve already blogged about one difference
between std::string
and C++17’s
std::string_view
:
string
is owning and string_view
is non-owning (being what I call a “parameter-only type”).
Today I’d like to talk about a second difference: string
is null-terminated and string_view
is not.
const char buffer[] = "abcdefghij";
auto sv = std::string_view(buffer+2, buffer+7);
assert(sv == "cdefg");
By design, string_view
is not a null-terminated type. There’s no null byte after that 'g'
.
Now, C library functions like strcpy
, fopen
, atoi
, and strtol
all expect null-terminated C strings;
therefore string_view
doesn’t play well with these C library functions.
And some C++ library features are built on top of the C ones.
Since you can’t use string_view
with atoi
, you also can’t use it with std::stoi
.
Since you can’t use string_view
with fopen
, you also can’t construct a std::fstream
with one.
When a function transitively depends on null-termination, you actually don’t want
to refactor its interface from const string&
to string_view
, because eventually
you’re just going to have to copy the string back into a null-terminated buffer in order to
make use of it.
// ACTUALLY FINE
int parseInt(const std::string& digits) {
return std::stoi(digits);
}
// STRICTLY WORSE
int parseInt(std::string_view digits) {
auto ntdigits = std::string(digits);
return std::stoi(ntdigits);
}
.data()
versus .c_str()
std::string
has two member functions that do basically the same thing: s.data()
and s.c_str()
.
The difference between s.data()
and s.c_str()
is that s.c_str()
connotes null-termination.
FILE *fp = fopen(s.c_str(), "w");
// fopen requires a null-terminated string
fwrite(s.data(), 1, s.size(), fp);
// fwrite takes a buffer and a length
std::string_view
, which is not null-terminated, deliberately provides .data()
but does not provide .c_str()
.
Even if you don’t use C++17 string_view
in your codebase yet, you can prepare for C++17-ification
by following good c_str/data
hygiene. Here’s an example of bad hygiene:
void one(const std::string& fname) {
FILE *fp = fopen(fname.data(), "w"); // BAD!
}
The above function uses .data()
in a context that requires null-termination. It’s dangerous
because a careless maintainer might “C++17-ify” the code by changing const std::string&
to
std::string_view
—
void one(std::string_view fname) {
FILE *fp = fopen(fname.data(), "w"); // BUGGY!
}
— and boom, now you have a bug! If the original, pre-C++17 author had used fname.c_str()
instead
of fname.data()
, then not only would the code have been more self-documenting, but the buggy
C++17-ification would have failed to compile, because string_view
does not provide .c_str()
.
Here’s another example of bad hygiene:
void two(int fd, const std::string& packet) {
write(fd, packet.c_str(), packet.length()); // BAD!
}
This hygiene is bad because the gratuitous use of .c_str()
for a buffer that does not need to be
null-terminated needlessly thwarts the maintainer’s attempt at C++17-ification:
void two(int fd, std::string_view packet) {
write(fd, packet.c_str(), packet.length()); // ERROR: no .c_str()
}
The original author should have used .data()
instead, to indicate that null-termination was
not required for correctness. (Plus, consistently using size
instead of length
makes it
easier to drop in std::vector<char>
or C++20’s std::span<const char>
.)
void two(int fd, const std::string& packet) {
write(fd, packet.data(), packet.size()); // GOOD!
}
void two(int fd, std::string_view packet) {
write(fd, packet.data(), packet.size()); // EQUALLY GOOD!
}
So there you go. Use .c_str()
when you mean to rely on a string
’s null-termination, and use
.data() + .size()
when you don’t. Following this rule helps to bring your code’s hidden assumptions
to light, and makes it easier to introduce std::string_view
later.