Return Value Optimization (RVO)
The problem and first thoughts
We were given the following C++ code and asked to find the output of the program.
#include <iostream>
class Friday {
private:
int x;
public:
explicit Friday(int x) : x(x) {}
void set_x(int x) { this->x = x; }
Friday(const Friday &other) { std::cout << "copy" << std::endl; }
};
Friday make_friday(int x) {
Friday f(x);
f.set_x(2 * x);
return f;
}
Friday which_friday(bool choice) {
Friday f1(22);
Friday f2(130);
if (choice) {
return f1;
}
return f2;
}
int main() {
Friday main_f1 = make_friday(0);
Friday main_f2 = which_friday(true);
return 0;
}
The class Friday has a copy constructor that does nothing except print the string "copy" as a side effect (it does not even copy x). Notice that there is no user-defined copy assignment operator. None is needed, because lines like Friday main_f1 = make_friday(0); are initializations, not assignments, so only the copy constructor can be involved.
At a quick glance, we can see the places where a copy constructor might be needed:
- return f copying f into a temporary in the make_friday function
- copying from that temporary to main_f1
- the same two steps in the which_friday function (into a temporary, then into main_f2)
Hence, there should be 4 copies. This is exactly what happens in C++11 if you turn off copy elision (which includes RVO) with the g++/clang flag -fno-elide-constructors. Since we did not turn it off here, the answer is not 4.
From C++17 onwards, there is no copy from the temporary to the variable in main (details later in this article), so with -fno-elide-constructors we would expect a total of 2 copies, and that is indeed what happens.
Critical Optimization – RVO
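Return Value Optimization has, over time, become partly mandatory in C++. Since C++17, when a function returns a prvalue such as return Friday(x);, the object is guaranteed to be constructed directly in the caller's storage. The named form used in make_friday (returning a local variable, often called NRVO) is still optional in the standard, but mainstream compilers such as g++ and clang perform it whenever they can. It is also one of the few optimizations that g++ and clang apply even when optimizations are switched off with -O0.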
RVO constructs the object returned by a function directly in the caller's stack frame instead of the callee's. If the caller/callee terminology is unfamiliar: the caller is the code that calls the function, and the callee is the function being called. Each function call gets its own stack frame, which holds the local objects it creates.
Without RVO, the copy constructor is needed to copy the result from the callee's frame into the caller's. With RVO, the caller reserves space for the result in its own frame and passes a pointer to that space to the function as a hidden argument; the callee then constructs the object directly at that address instead of in its own frame. A low-level view of this optimization is given later in the article.
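To make the mechanism concrete, here is a hand-written C++ model of what the compiler does under RVO for a function shaped like make_friday. It is a sketch, not compiler output: make_friday_lowered, caller_demo, get_x and the static copies counter are hypothetical additions, and the explicit void * parameter stands in for the hidden pointer that the compiler passes implicitly (the x86-64 listings later in the article show it arriving in rdi).

```cpp
#include <cassert>
#include <new>

// Friday from the article, with a copy counter in place of the "copy" print.
class Friday {
private:
    int x;
public:
    static int copies;  // number of copy-constructor calls so far
    explicit Friday(int x) : x(x) {}
    void set_x(int x) { this->x = x; }
    Friday(const Friday &other) : x(other.x) { ++copies; }
    int get_x() const { return x; }  // hypothetical accessor, added for the demo
};
int Friday::copies = 0;

// Hypothetical lowered form of make_friday under RVO. ret_slot plays the role
// of the hidden pointer argument: storage that belongs to the caller.
Friday *make_friday_lowered(void *ret_slot, int x) {
    Friday *f = ::new (ret_slot) Friday(x);  // "Friday f(x);" built in the caller's frame
    f->set_x(2 * x);                         // every use of f touches the caller's storage
    return f;                                // address handed back, as the callee does in rax
}

// Hypothetical caller: reserves aligned storage in its own frame, then calls.
int caller_demo() {
    alignas(Friday) unsigned char slot[sizeof(Friday)];
    Friday *main_f1 = make_friday_lowered(slot, 21);
    int value = main_f1->get_x();
    main_f1->~Friday();           // the caller owns the object and destroys it
    assert(Friday::copies == 0);  // the object was never copied
    return value;
}
```

Calling caller_demo() constructs the object once, in storage owned by the caller, and never invokes the copy constructor.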
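For the problem above, RVO applies to make_friday but not to which_friday. In which_friday, f1 and f2 exist at the same time and only one of them can occupy the caller's storage. Since which one is returned depends on a runtime value, the compiler cannot decide at compile time which object to construct there, so the returned object has to be copied.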
Hence, the correct answer is a single "copy" for both C++17 and C++11. With elision left on, C++11 also elides the copies from the temporaries into main_f1 and main_f2, so the only remaining copy is the one from f1 into the return value of which_friday.
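One way to check this count without reading assembly is to replace the print with a counter. The sketch below assumes a hypothetical static member copies, a hypothetical helper copies_during, and a hypothetical variant which_friday_prvalue that decides first and then returns a prvalue. It needs C++17 or later, and the expected counts are for g++ and clang with default flags, since NRVO in make_friday is permitted but not required by the standard.

```cpp
#include <cassert>

class Friday {
private:
    int x;
public:
    static int copies;  // incremented by every copy-constructor call
    explicit Friday(int x) : x(x) {}
    void set_x(int x) { this->x = x; }
    Friday(const Friday &other) : x(other.x) { ++copies; }
};
int Friday::copies = 0;

Friday make_friday(int x) {
    Friday f(x);
    f.set_x(2 * x);
    return f;  // a single named return value: an NRVO candidate
}

Friday which_friday(bool choice) {
    Friday f1(22);
    Friday f2(130);
    if (choice) {
        return f1;  // f1 and f2 are both alive here, so at most one of
    }               // them could live in the caller's storage
    return f2;
}

// Hypothetical variant: each return statement yields a prvalue, which
// C++17 guarantees initializes the caller's object directly.
Friday which_friday_prvalue(bool choice) {
    if (choice) {
        return Friday(22);
    }
    return Friday(130);
}

// Resets the counter, runs one call, and reports how many copies it caused.
template <typename F>
int copies_during(F call) {
    Friday::copies = 0;
    Friday result = call();
    (void)result;
    return Friday::copies;
}

// Expected with g++/clang defaults; the first line relies on NRVO.
void check_article_counts() {
    assert(copies_during([] { return make_friday(0); }) == 0);
    assert(copies_during([] { return which_friday(true); }) == 1);
    assert(copies_during([] { return which_friday_prvalue(true); }) == 0);
}
```

The prvalue variant also shows a way around the limitation: if the decision is made before any object is constructed, there is only ever one candidate for the caller's storage.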
Low-level analysis
To keep things simple, we will reduce the problem and inspect its x86-64 assembly. I will be using Rizin, a reverse-engineering tool, to analyse the binaries. Simplified code:
#include <iostream>
class Friday {
// ... Same as before ...
};
Friday make_friday(int x) {
Friday f(x);
return f;
}
int main() {
Friday main_f1 = make_friday(0);
return 0;
}
Expected output in this case:
- RVO not deactivated: 0 “copy” in output
- RVO deactivated in C++17: 1 “copy” in output
- RVO deactivated in C++11: 2 “copy” in output
RVO not deactivated in C++11/17
┌ int main(int argc, char **argv, char **envp);
│ ; var int64_t var_ch @ stack - 0xc
│ 0x00400724 push rbp
│ 0x00400725 mov rbp, rsp
│ 0x00400728 sub rsp, 0x10
│ 0x0040072c lea rax, qword [var_ch] // caller's stack addr
│ 0x00400730 mov esi, 0x00
│ 0x00400735 mov rdi, rax // passed as arg1
│ 0x00400738 call sym.make_friday_int
│ 0x0040073d mov eax, 0x00
│ 0x00400742 leave
└ 0x00400743 ret
┌ sym.make_friday_int(int64_t arg1, int64_t arg2);
│ ; arg int64_t arg1 @ rdi
│ ; arg int64_t arg2 @ rsi
│ ; var int64_t var_14h @ stack - 0x14
│ ; var int64_t var_10h @ stack - 0x10
│ 0x004006fd push rbp
│ 0x004006fe mov rbp, rsp
│ 0x00400701 sub rsp, 0x10
│ 0x00400705 mov qword [var_10h], rdi // addr stored in callee's stack
│ 0x00400709 mov dword [var_14h], esi
│ 0x0040070c mov edx, dword [var_14h]
│ 0x0040070f mov rax, qword [var_10h] // addr accessed
│ 0x00400713 mov esi, edx
│ 0x00400715 mov rdi, rax // and passed as arg1 to ctor
│ 0x00400718 call method.Friday.Friday_int
│ 0x0040071d nop
│ 0x0040071e mov rax, qword [var_10h]
│ 0x00400722 leave
└ 0x00400723 ret
Here, the constructor builds the object directly in the caller's stack frame, at the address main passed in rdi, so the copy constructor is never called.
RVO deactivated in C++17
┌ int main(int argc, char **argv, char **envp);
│ ; var int64_t var_ch @ stack - 0xc
│ 0x004008d6 push rbp
│ 0x004008d7 mov rbp, rsp
│ 0x004008da sub rsp, 0x10
│ 0x004008de lea rax, qword [var_ch] // caller's stack addr
│ 0x004008e2 mov esi, 0x00
│ 0x004008e7 mov rdi, rax // passed as arg1
│ 0x004008ea call sym.make_friday_int
│ 0x004008ef mov eax, 0x00
│ 0x004008f4 leave
└ 0x004008f5 ret
┌ sym.make_friday_int(int64_t arg1, int64_t arg2);
│ ; arg int64_t arg1 @ rdi
│ ; arg int64_t arg2 @ rsi
│ ; var int64_t var_24h @ stack - 0x24
│ ; var int64_t var_20h @ stack - 0x20
│ ; var int64_t var_ch @ stack - 0xc
│ 0x0040089d push rbp
│ 0x0040089e mov rbp, rsp
│ 0x004008a1 sub rsp, 0x20
│ 0x004008a5 mov qword [var_20h], rdi // caller's stack addr stored in callee's stack
│ 0x004008a9 mov dword [var_24h], esi
│ 0x004008ac mov edx, dword [var_24h]
│ 0x004008af lea rax, qword [var_ch] // callee's stack addr
│ 0x004008b3 mov esi, edx
│ 0x004008b5 mov rdi, rax // passed as arg1 to ctor
│ 0x004008b8 call method.Friday.Friday_int
│ 0x004008bd lea rdx, qword [var_ch]
│ 0x004008c1 mov rax, qword [var_20h]
│ 0x004008c5 mov rsi, rdx
│ 0x004008c8 mov rdi, rax // copying to caller's stack instead of temp
│ 0x004008cb call method.Friday.Friday_Friday_const
│ 0x004008d0 mov rax, qword [var_20h]
│ 0x004008d4 leave
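└ 0x004008d5 ret
In this case, the address of the caller's storage is still passed as the first argument, but f is now constructed in the callee's own frame (var_ch) and then copied into the caller's storage by the copy constructor method.Friday.Friday_Friday_const. Because in C++17 the prvalue make_friday(0) initializes main_f1 directly, that caller storage is main_f1 itself and no temporary is generated, so there is exactly one copy.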
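The C++17 listing can be read back as C++. The following is a hand-lowered model under stated assumptions, not compiler output: make_friday_lowered, main_lowered, get_x and the copies counter are hypothetical, and the void * parameter stands in for the hidden argument that holds the address of main_f1.

```cpp
#include <cassert>
#include <new>

class Friday {
private:
    int x;
public:
    static int copies;  // counts copy-constructor calls
    explicit Friday(int x) : x(x) {}
    Friday(const Friday &other) : x(other.x) { ++copies; }
    int get_x() const { return x; }  // hypothetical accessor for the demo
};
int Friday::copies = 0;

// Model of make_friday built with -fno-elide-constructors in C++17.
Friday *make_friday_lowered(void *ret_slot, int x) {
    Friday f(x);                        // built in the callee's own frame (var_ch)
    return ::new (ret_slot) Friday(f);  // copy into the address saved in var_20h
}                                       // f is destroyed here

// Model of main: the storage handed to the callee is main_f1 itself,
// so nothing else happens after the call returns.
int main_lowered() {
    Friday::copies = 0;
    alignas(Friday) unsigned char main_f1_storage[sizeof(Friday)];
    Friday *main_f1 = make_friday_lowered(main_f1_storage, 0);
    int value = main_f1->get_x();
    main_f1->~Friday();
    assert(Friday::copies == 1);  // the single copy expected for C++17
    return value;
}
```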
RVO deactivated in C++11
┌ int main(int argc, char **argv, char **envp);
│ ; var int64_t var_10h @ stack - 0x10
│ ; var int64_t var_ch @ stack - 0xc
│ 0x004008d6 push rbp
│ 0x004008d7 mov rbp, rsp
│ 0x004008da sub rsp, 0x10
│ 0x004008de lea rax, qword [var_ch]
│ 0x004008e2 mov esi, 0x00
│ 0x004008e7 mov rdi, rax
│ 0x004008ea call sym.make_friday_int
│ 0x004008ef lea rdx, qword [var_ch] // var_ch contains the temp, which is also in caller's stack
│ 0x004008f3 lea rax, qword [var_10h] // var_10h is the preferred location of object
│ 0x004008f7 mov rsi, rdx
│ 0x004008fa mov rdi, rax
│ 0x004008fd call method.Friday.Friday_Friday_const
│ 0x00400902 mov eax, 0x00
│ 0x00400907 leave
└ 0x00400908 ret
┌ sym.make_friday_int(int64_t arg1, int64_t arg2); // Same as C++17
│ ; arg int64_t arg1 @ rdi
│ ; arg int64_t arg2 @ rsi
│ ; var int64_t var_24h @ stack - 0x24
│ ; var int64_t var_20h @ stack - 0x20
│ ; var int64_t var_ch @ stack - 0xc
│ 0x0040089d push rbp
│ 0x0040089e mov rbp, rsp
│ 0x004008a1 sub rsp, 0x20
│ 0x004008a5 mov qword [var_20h], rdi
│ 0x004008a9 mov dword [var_24h], esi
│ 0x004008ac mov edx, dword [var_24h]
│ 0x004008af lea rax, qword [var_ch]
│ 0x004008b3 mov esi, edx
│ 0x004008b5 mov rdi, rax
│ 0x004008b8 call method.Friday.Friday_int
│ 0x004008bd lea rdx, qword [var_ch]
│ 0x004008c1 mov rax, qword [var_20h]
│ 0x004008c5 mov rsi, rdx
│ 0x004008c8 mov rdi, rax
│ 0x004008cb call method.Friday.Friday_Friday_const
│ 0x004008d0 mov rax, qword [var_20h]
│ 0x004008d4 leave
└ 0x004008d5 ret
Unlike the C++17 version, main in C++11 has two stack objects: var_ch holds the temporary that receives the return value, and var_10h is main_f1. In C++17 the returned object is used directly as main_f1, but in C++11 the temporary is copied into another stack object. Hence there are two calls to the copy constructor: one in sym.make_friday_int, which copies the object from the callee's frame into the temporary (in the caller's frame), and one in main, which copies the temporary into main_f1 (both in the caller's frame).
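The same kind of hand-lowered model makes the extra copy visible. As before, this is a sketch rather than compiler output: make_friday_lowered, main_lowered, get_x and the copies counter are hypothetical. The callee is identical to the C++17 case; the difference lies entirely in main.

```cpp
#include <cassert>
#include <new>

class Friday {
private:
    int x;
public:
    static int copies;  // counts copy-constructor calls
    explicit Friday(int x) : x(x) {}
    Friday(const Friday &other) : x(other.x) { ++copies; }
    int get_x() const { return x; }  // hypothetical accessor for the demo
};
int Friday::copies = 0;

// Same callee as in C++17: one copy into whatever storage it is given.
Friday *make_friday_lowered(void *ret_slot, int x) {
    Friday f(x);
    return ::new (ret_slot) Friday(f);
}

// Model of main built with -fno-elide-constructors in C++11.
int main_lowered() {
    Friday::copies = 0;
    alignas(Friday) unsigned char temp_storage[sizeof(Friday)];  // var_ch: the temporary
    Friday *temp = make_friday_lowered(temp_storage, 0);         // copy #1, inside the callee
    Friday main_f1(*temp);                                       // var_10h: copy #2
    temp->~Friday();  // the temporary dies at the end of the full-expression
    assert(Friday::copies == 2);
    return main_f1.get_x();
}
```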
C++ versions and RVO
Look at the code snippet below. Here, we have explicitly deleted the copy constructor, which rules out any copy. Because the copy constructor is user-declared, no move constructor is implicitly declared either.
#include <iostream>
class C {
private:
int x;
public:
explicit C(int x) : x(x) {}
void set_x(int x) { this->x = x; }
C(const C &other) = delete;
};
C make(int x) { return C(x); }
int main() {
C obj = make(0);
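}
This code compiles in C++17 because no copy or move constructor is needed anywhere: make returns a prvalue, and C++17 guarantees that a prvalue initializes the final object directly. In C++11, however, the code does not compile, even though a compiler would elide every copy. Before C++17, return C(x); and C obj = make(0); formally copy-initialize from temporaries, so an accessible copy or move constructor is required even when the copy itself is elided. In other words, C++17 guarantees this form of RVO, while C++11 only permits it.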
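To see where the C++17 guarantee stops, here is a sketch built on the same class C. It requires C++17 or later. make_either, get_x and sum_of_results are hypothetical additions. Returning prvalues from several branches still compiles, but returning a named local does not, because that case is NRVO, which remains optional and therefore still needs an accessible constructor.

```cpp
#include <cassert>

// Same shape as class C above: copy constructor deleted, so no implicit
// move constructor is declared either.
class C {
private:
    int x;
public:
    explicit C(int x) : x(x) {}
    C(const C &other) = delete;
    int get_x() const { return x; }  // hypothetical accessor for the demo
};

// Every return operand is a prvalue, so no copy or move constructor is needed.
C make(int x) { return C(x); }

C make_either(bool first) {
    if (first) {
        return C(22);
    }
    return C(130);
}

// Returning a named local is NRVO: permitted but not guaranteed, so it still
// needs an accessible copy/move constructor and does NOT compile:
// C make_named(int x) { C c(x); return c; }

int sum_of_results() {
    C a = make(5);             // initialized directly by make's return statement
    C b = make_either(false);  // likewise, no constructor call in between
    assert(a.get_x() == 5);
    return a.get_x() + b.get_x();
}
```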
Author – tushar3q34
