std::string_view
is a parameter-only type
std::string_view
is a parameter-only typeToday I’d like to talk about std::string_view
.
string_view
arrived in C++17, amid quite a bit of confusion about what exactly it’s for and how to
use it safely.
The basic idea of string_view
is that it lets you hit a sweet middle ground between C++03-style concreteness:
class Widget {
std::string name_;
public:
void setName(const char *new_name);
void setName(const std::string& new_name);
};
and full-on generic programming, which in a pre-Concepts, C++17 world essentially means choosing between inappropriately underconstrained templates:
class Widget {
std::string name_;
public:
template<class T>
void setName(T&& new_name);
// defined elsewhere in this header
};
and properly constrained but comically verbose templates:
class Widget {
std::string name_;
public:
template<class T, class = decltype(std::declval<std::string&>() = std::declval<T&&>())>>
void setName(T&& new_name);
// defined elsewhere in this header
};
With std::string_view
, we can write simply this:
class Widget {
std::string name_;
public:
void setName(std::string_view new_name);
};
I believe that string_view
succeeds admirably in the goal of “drop-in replacement for const string&
parameters.” The problem I have observed, at CppCon 2017 and at Albuquerque 2017, is that some people
(you know who you are) persist in trying to use string_view
as a drop-in replacement for
const string&
in all circumstances! And then they get bitten:
const std::string& s1 = "hello world"; // OK, lifetime-extended
const std::string& s2 = std::string("hello world"); // OK, lifetime-extended
std::string_view sv1 = "hello world"; // OK, points to static array
std::string_view sv2 = std::string("hello world"); // BUG! Dangling pointer!
Or again:
auto identity(std::string_view sv) { return sv; }
int main() {
std::string s = "hello";
auto sv1 = identity(s); // OK
auto sv2 = identity(s + " world"); // BUG! Dangling pointer!
}
This example is more subtle, because the sin (using string_view
as the return type of
the identity
function, and then later using string_view
as the type of local variables
sv1
and sv2
) is hidden behind the auto
keyword. auto
may hide your sins, but
it does not expiate them! Using string_view
as anything other than a parameter type
is always wrong.
This suggests to me that we have a relatively new entrant into the field of “well-understood kinds of types” in C++. The two relatively old kinds of types are object types and value types. The new kid on the block is the parameter-only type.
-
Object types are generally immobile (deleted copy constructor, for example) and lack an assignment operator; they are identified by memory address; they rely on mutation; they may be classically polymorphic.
-
Value types are identified by “value.” They have strong ownership semantics. You can work with them in a functional, mutation-free style; they lend themselves well to being moved around in memory, stored in
tuple
s, returned by value from functions, and so on. -
Parameter-only types are essentially “borrowed” references to existing objects. They lack ownership; they are short-lived; they generally can do without an assignment operator. They generally appear only in function parameter lists; because they lack ownership semantics, they generally cannot be stored in data structures or returned safely from functions.
UPDATE, 2023-07-19: The original of this post used the term “borrow type,” borrowed from Rust; but soon I decided that “parameter-only type” was a much better name, both because it’s more explanatory and because it avoids confusion with the Rust idea, which is after all fairly different. I added a note to that effect in May 2019, but it took me until July 2023 to actually re-edit this post.
C++20 Ranges introduced the term “borrowed range” for a view which can safely be “deboned” into iterators. Views are generally parameter-only types, even when they don’t satisfy the C++20 definition of a “borrowed range.”
string_view
is the first “mainstream” parameter-only type. But C++ has other
types that arguably match the criteria above:
-
std::reference_wrapper<T>
. We see this type used as a function parameter to the constructor ofstd::thread
and to functions likestd::make_pair
andstd::invoke
. We do not see the STL using it as anything other than a parameter type. -
std::tuple<Ts&...>
, the result type ofstd::forward_as_tuple(Ts...)
. This is an interesting one to me personally, because if I were designing the STL, I would have made this a completely separate class template, rather than recyclingstd::tuple
. Notice thatstd::tuple<Ts...>
is a value type according to our classification above; butstd::tuple<Ts&...>
is a parameter-only view type. To me, this change in semantics signals a mistake on par withvector<bool>
. -
T&&
, either in the sense of “rvalue reference” (whenT
is some known object type) or in the sense of “forwarding reference” (whenT
is deduced). We see this type used as a function parameter all over the place. We rarely see it in any other context, unless we are doing metaprogramming specifically on reference-type value categories. (I mean, the functionstd::move
itself does return an rvalue reference. But in normal everyday code, returning an rvalue reference is a sure sign that you’re doing something wrong.) -
And now,
std::string_view
.
Notice that because all of the above types were designed by Not Me, they all break some of the general rules for parameter-only types:
-
reference_wrapper<T>
is assignable:rw1 = rw2
. Assignment has shallow semantics, even though comparisonrw1 == rw2
has deep semantics. -
tuple<Ts&...>
is assignable:t1 = t2
. Assignment has deep semantics, as does comparison. -
T&&
is actually pretty great. It automatically has reference semantics for everything, because, well, it’s a reference. -
string_view
is assignable:sv1 = sv2
. Assignment has shallow semantics (of course — the viewed strings are immutable), even though comparisonsv1 == sv2
has deep semantics.
If I were designing reference_wrapper
and string_view
, I would have given them no assignment
operator at all; this would prevent any confusion about the meaning of assignment, and additionally
removed a perennial complaint: that these types provide both copy/assignment and comparison, but
in an inconsistent manner that makes them not Regular types.
I originally called these types borrow types (despite the confusion that would generate), because borrowing is the fundamental property that allows us to reason about their semantics. You should use a parameter-only type if and only if it is short-lived enough that it can safely (that is, unsafely) borrow ownership of another object and then go away again before the other object notices. In practice, this means it’s got to be either a function parameter or a ranged-for-loop control variable; and you can’t do anything with the value of the parameter that requires it to outlive the function or loop.
struct StringSet {
vector<string> elements_;
template<class F>
void for_each_element(const F& f) const {
for (string_view elt : elements_) {
f(elt);
}
}
};
In the above code sample, callback f
is “borrowed” only for the lifetime of for_each_element
;
therefore it is correct to take it as the parameter-only type const F&
instead of as the
(potentially more expensive) value type F
.
Likewise, elt
is “borrowed” only for the lifetime of the for-loop; therefore it is
correct to take it as string_view
instead of as the (potentially
more expensive) value type string
. Notice that we could equally well take it as
the parameter-only type auto&&
(and perhaps we should),
but I wanted to demonstrate a valid use of string_view
that proves it might be useful
outside a function parameter list.
I would hope that static analysis tools such as the C++ Core Guidelines would explicitly
call out this pattern of “borrowing,” but it appears that they have not caught up
just yet. The Core Guidelines discussion around string_view
in particular seems to
have gotten bogged down in trying to avoid diagnosing sinful code,
which I’d say is the exact opposite of the Core Guidelines’ original mandate.
Returning parameter-only types
One must be very careful when returning a parameter-only type from a getter method:
class Widget {
string name_;
public:
string_view getName() const { return name_; }
};
Widget w;
std::cout << w.getName() << std::endl;
It’s safe to pass the parameter-only object returned from w.getName()
as a parameter to
operator<<
; but it would be absolutely incorrect to capture it
into a local variable which is not a function parameter:
auto sv1 = w.getName(); // SIN! Parameter-only type used as non-parameter!
w.setName("hello world"); // "Borrowed" object's lifetime ends
std::cout << sv1 << std::endl; // BUG! Dangling pointer!
Therefore I think we should frown on returning parameter-only object types such as string_view
from functions (without a very good reason). They’re just too easy to misuse. Note that
native reference types like const string&
don’t have this particular problem, because
for them, auto
will make a copy. (But see
“On function_ref
and string_view
” (2019-05-10)
for an even subtler take on “making copies.”)
Simple rules for parameter-only types
The rule of thumb is simple and statically checkable:
- Parameter-only types must appear only as function parameters and for-loop control variables.
We can make an exception for return types:
-
A function may have a parameter-only type as its return type, but if so, the function must be explicitly annotated as returning a potentially dangling reference. That is, the programmer must explicitly acknowledge responsibility for the annotated function’s correctness.
-
Regardless, if
f
is such a function, the result off
must not be stored into any named variable except a function parameter or for-loop control variable. For example,auto x = f()
must still be diagnosed as a violation.
Follow these rules and you shouldn’t get into any trouble with std::string_view
.
It’s perfectly usable and will even simplify your code,
if you follow the rules for parameter-only types in C++.