Precedence of a proposed |>
operator
|>
operatorColby Pike and Barry Revzin’s P2011R0 “A pipeline-rewrite operator” (January 2020) proposes for C++ a “pizza operator” similar to the one proposed for JavaScript. This came out of a blog post by Colby Pike: “Eliminating the Static Overhead of Ranges” (October 2019).
The idea of the “pizza operator” |>
is that it provides syntactic sugar, sort of related to
UFCS.
Suppose you want to write a chain of method calls like
// Ideal-for-this-purpose class-based API
auto x = a.convolve(gauss5x5).clamp(0, 1).sharpen();
auto y = b.each_as<fs::path>().transform(get_basename)
.to<std::vector<std::string>>();
except that your API doesn’t do methods, it does free functions; so you are currently forced to write these two lines as more like
// Unfortunate free-function-based API
auto x = sharpen(clamp(convolve(a, gauss5x5), 0, 1));
auto y = to<std::vector<std::string>>(transform(each_as<fs::path>(b), get_basename));
In a hypothetical future C++ with a pizza operator, you’d simply write
// Still using the free-function-based API!
auto x = a |> convolve(gauss5x5) |> clamp(0, 1) |> sharpen();
auto y = b |> each_as<fs::path>() |> transform(get_basename)
|> to<std::vector<std::string>>();
The proposed |>
isn’t an operator like operator|
; it doesn’t follow the traditional model of
“evaluate LHS, evaluate RHS, combine the resulting values.” Instead, it does “evaluate LHS, parse
RHS, insert evaluated LHS into RHS’s parse tree, now evaluate RHS.” x |> y(z)
is syntactic sugar
for y(x, z)
.
The Ranges library overloads operator|
, and provides lots of machinery including multiple overloaded
call operators in all of its range adaptors, basically to simulate what’s happening above. If C++ were
to provide |>
out of the box, a huge swath of Ranges machinery could go away (except for backward
compatibility).
No placeholders in C++
In the JavaScript proposal, you’re allowed to write things like
let x = a |> (# - 1);
let y = a |> (1 - #);
where the former means the same as (a - 1)
and the latter means the same as (1 - a)
.
The scripting language Hack does the same thing,
using $$
as the placeholder.
P2011 doesn’t propose any kind of placeholder. It says, when you pipeline x
into f(1,2,3)
, x
always gets
inserted as the first argument, because that’s the most common idiom in C++. If you want it to get inserted
as the second argument, you’ll have to do something like
x |> (std::bind(f,_2,_1,_3,_4))(1,2,3)
Why those extra parentheses? Read on.
P2011R0’s precedence for |>
leads to weirdness
The current proposal, P2011R0, proposes that |>
should be a new postfix operator on the same level
as .
and ->
. This makes sense if you’re just looking at the example above: In the member-function
API we write x.y(z)
, in the free-function API we write x|>y(z)
, so |>
replaces .
one-for-one,
so it should have the same precedence.
However, P2011R0’s rule leads to surprising behavior. For example, the .
operator binds tighter
than the unary -
operator, so if we bind |>
as tightly as .
, then -x|>f()
means -f(x)
:
assert(-1 |> std::abs() == -1); // wat?
Or again,
++i |> v.at(); // means the same as ++v.at(i)
The essential issue here is that P2011R0’s chosen precedence didn’t match the visual symbolism of |>
.
The |>
operator looks just like the |
operator; it’s used with pipelines in the same way; and
|
has one of the lowest precedences in C++. In fact we’ve been subtly indicating “Low precedence, low precedence!”
via our use of whitespace. You’ve seen C++ code that indicates precedence by whitespace, even if
you didn’t consciously realize it.
if (x+1 == y) // + binds tighter than ==
y->z + 1 // -> binds tighter than +
So if |>
is to have a super high precedence, we should really be writing
- 1|>std::abs()
++ i|>v.at()
But anecdotal evidence (from P2011’s own sample code!) is that people don’t want to write x|>y
; the
association of |>
with |
pipelines is just too strong. We should go with the flow and lower the
precedence of |>
until it matches our visual intuition.
P2011R1 is coming, and it seems likely that this time it will propose a suitably low precedence.
My proposal
For the record, here’s how I would do it. (This is not a description of P2011R0 anymore!)
Step one, the precedence of |>
should be exactly the same as the precedence of |
, and of course
it should be left-associative.
Step two, consider an operator-precedence parser — which admittedly is not how most compilers work in practice, but I’ll blithely assume that they can figure out how to translate this algorithm into their own terms.
If you are currently parsing the right-hand side of x |>
— that is, if there’s a |>
on the top
of the operator stack — and you have just finished parsing
a function-call expression f(y)
, then do not do overload resolution on f(y)
yet. First, look at the next
operator in the expression. If it is something with tighter (higher) precedence than |>
, then
do overload resolution on f(y)
, and proceed. But if
it’s something with lower precedence (or another |>
or the end of the expression),
then completely resolve the x |> f(y)
to your left before proceeding
(which means doing overload resolution on f(x,y)
, not on f(y)
).
Finally, if you are currently parsing the right-hand side of x |>
, and you get to an operator
with lower precedence than |>
(or another |>
or the end of the expression), that’s an error!
You should have been able to resolve that first |>
operator by now. If it’s still on the stack,
it means you’ve got some bogus expression like x |> y
where there’s no function call at all.
That’s a syntax error.
Now, there are some gray areas.
-
If
A
is a class type, should we treatx |> A(y)
the same asx |> f(y)
? (I think yes. And this also applies in the CTAD case whereA
is a template-id.) -
What about
x |> A{y}
? (I think no.) -
What about parenthesized call-expressions like
x |> (f(y))
? (My algorithm says no, which I think is fine.) -
What about similar-looking postfix-expressions like
x |> typeid()
orx |> static_cast<int>()
? (I think no, and nobody should ever write these seriously.) -
What about similar-looking unary-expressions like
x |> sizeof()
? (I think no, and nobody should ever write this seriously.)
The table
Here is a table of expressions extracted from the paper P2011R0 and from a long reflector thread
on the subject. For each expression, I provide two interpretations: the P2011R0 interpretation
(where |>
has the precedence of .
and must be followed by a primary-expression)
and the interpretation implied by the proposal above (where |>
has the precedence of |
).
In many realistic cases, the two proposals give the same interpretation, so some of these rows have the
same thing in both columns. At the same time, bear in mind that this table is heavily weighted toward
the corner cases. Personally I would hope that any compiler would diagnose x + y |> f()
and suggest
parentheses — either (x + y) |> f()
or x + (y |> f())
— the same way
GCC (but sadly only GCC) diagnoses x + y | f()
today.
Expression | P2011R0 | This post |
---|---|---|
r |> filter(f) |> transform(g) |
transform(filter(r,f),g) |
transform(filter(r,f),g) |
x |> f() |> g<0>(0) |
g<0>(f(x), 0) |
g<0>(f(x), 0) |
x |> A::staticmethod() |
A::staticmethod(x) |
A::staticmethod(x) |
x |> A{}.staticmethod() |
Syntax error | A{}.staticmethod(x) |
x |> A().staticmethod() |
Syntax error | A().staticmethod(x) |
x + y |> f() |
x + f(y) |
f(x + y) |
x |> c.f() |
Syntax error | c.f(x) |
x |> (f()).g() |
Syntax error | f().g(x) |
x |> f().g() |
f(x).g() |
f().g(x) |
x |> always(y)(z) |
always(x,y)(z) |
always(y)(x,z) |
x |> always(y)() |> split() |
split(always(x,y)()) |
split(always(y)(x)) |
x |> get()++ |
get(x)++ |
Syntax error |
x |> ++get() |
Syntax error | Syntax error |
++x |> get() |
++get(x) |
get(++x) |
x |> (y |> z()) |
Syntax error | Syntax error |
x |> f().g<0>(0) |
f(x).g<0>(0) |
f().g<0>(x,0) |
-3 |> std::abs() |
-std::abs(3) |
std::abs(-3) |
co_await x |> via(e) |
co_await via(x, e) |
via(co_await x, e) |
co_yield x |> via(e) |
co_yield via(x, e) |
co_yield via(x, e) |
throw x |> via(e) |
throw via(x, e) |
throw via(x, e) |
return x |> via(e) |
return via(x, e) |
return via(x, e) |
s |> rev() |> find_if(a).base() |
find_if(rev(s), a).base() |
find_if(a).base(rev(s)) |
x |> (f()) |
Syntax error | Syntax error |
ctr |> size() == max() |
size(ctr) == max() |
Syntax error |
(ctr |> size()) == max() |
size(ctr) == max() |
size(ctr) == max() |
x |> f() + g() |
f(x) + g() |
Syntax error |
x |> f() + 3 |
f(x) + 3 |
Syntax error |
(x |> f()) + 3 |
f(x) + 3 |
f(x) + 3 |
x |> get()[i] |
get(x)[i] |
Syntax error |
x |> v[i]() |
Syntax error | v[i](x) |
x |> v[i]()() |
Syntax error | v[i]()(x) |
(x |> v[i]())() |
Syntax error | v[i](x)() |
x |> (v[i])()() |
v[i](x)() |
v[i]()(x) |
"hi"sv |> ra::count('o') == 0 |
ra::count("hi"sv, 'o') == 0 |
Syntax error |
("hi"sv |> ra::count('o')) == 0 |
ra::count("hi"sv, 'o') == 0 |
ra::count("hi"sv, 'o') == 0 |
v |> filter(2) |> size() == 1 |
size(filter(v, 2)) == 0 |
Syntax error |
(v |> filter(2) |> size()) == 1 |
size(filter(v, 2)) == 0 |
size(filter(v, 2)) == 0 |
x |> y.operator+() |
Syntax error | y.operator+(x) |
x |> +y |
Syntax error | Syntax error |
a |> b() - c |> d() |
b(a) - d(c) |
Syntax error |
a |> b() | c() |> d() |
b(a) | d(c()) |
d(b(a) | c()) |
r |> filter(f) | transform(g) |
filter(r,f) | transform(g) |
filter(r,f) | transform(g) |
r | filter(f) |> transform(g) |
r | transform(filter(f), g) |
transform(r | filter(f), g) |
x + y |> f() + g() |> a.h() |
Syntax error | Syntax error |
x |> getA().member() |
getA(x).member() |
getA().member(x) |
x |> getA().*getMemptr() |
getA(x).*getMemptr() |
Syntax error |
c ? left : right |> split('/') |
c ? left : split(right, '/') |
c ? left : split(right, '/') |
(c ? left : right) |> split('/') |
split(c ? left : right) |
split(c ? left : right) |
c ? left |> split('/') : right |
c ? split('/', left) : right |
c ? split('/', left) : right |
x |> f() |> std::make_pair(y) |
std::make_pair(f(x), y) |
std::make_pair(f(x), y) |
x |> f() |> std::pair(y) |
Syntax error | std::pair(x, y) in the gray area |
x |> f() |> std::pair<X,Y>(y) |
Syntax error | std::pair<X,Y>(x, y) in the gray area |
x |> new T() |
Syntax error | Syntax error |
x |> [](int x, int y){ return x+y; }(1) |
[](int x, int y){ return x+y; }(x,1) |
[](int x, int y){ return x+y; }(x,1) |
x |> f() |> std::plus{}(1) |
Syntax error | std::plus{}(f(x),1) |
Since that table is a lot to read, I’ll pull out the four most important differences here:
Expression | P2011R0 | This post |
---|---|---|
co_await f() |> via(e) |
co_await via(f,e) |
via(co_await f(), e) |
v |> filter(2) |> size() == 1 |
size(filter(v, 2)) == 0 |
Syntax error |
r | filter(f) |> transform(g) |
r | transform(filter(f), g) |
transform(r | filter(f), g) |
-3 |> std::abs() |
-std::abs(3) |
std::abs(-3) |
The reason co_await
has this difference, but not any of the other “word-like” unary operations (co_yield
,
co_return
, return
, throw
) is a “bug” in the grammar of C++20. Notice that co_yield(1) + 2
means co_yield 3
,
but co_await(1) + 2
means (co_await 1) + 2
. Lewis Baker tells me he’s opening an issue for something along
these lines, although his main focus is on how !co_yield(x)
shouldn’t be a syntax error.
Finally, Tony Van Eerd provides food for thought: What if some future version of C++ were to provide
a placeholder, such as #
? If it did, then we’d have
Expression | P2011R0 (hypothetically) | This post (hypothetically) |
---|---|---|
x |> f(#) |> # + 3 |> g(#) |
f(x) + g(3) |
g(f(x) + 3) |
r = ints |> # + 3 |> # * 2 |> vec(#) |
ints + 3 * vec(2) |
vec((ints + 3) * 2) |