Discussion:
[PATCH x86_64] Optimize access to globals in "-fpie -pie" builds with copy relocations
Sriraman Tallam
2014-05-15 18:34:12 UTC
Optimize access to globals with -fpie, x86_64 only:

Currently, with -fPIE/-fpie, GCC accesses globals that are extern to the module
through the GOT. This takes two instructions: one to load the address of the
global from the GOT and another to load the value. If the global turns out to be
defined in the executable at link time, the access still goes through the
GOT, because by then it is too late to generate a direct access.

Examples:

foo.cc
------
int a_glob;
int main () {
return a_glob; // defined in this file
}

With -O2 -fpie -pie, the generated code accesses the global directly via a
PC-relative instruction:

5e0 <main>:
mov 0x165a(%rip),%eax # 1c40 <a_glob>

foo.cc
------

extern int a_glob;
int main () {
return a_glob; // declared extern here, defined elsewhere
}

With -O2 -fpie -pie, the generated code accesses the global via the GOT using
two memory loads:

6f0 <main>:
mov 0x1609(%rip),%rax # 1d00 <_DYNAMIC+0x230>
mov (%rax),%eax

This is true even if in the latter case the global was defined in the
executable through a different file.
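
The two access patterns can also be modeled in plain C++, without disassembling anything. This is only a sketch: `got_slot` is a hypothetical stand-in for the GOT entry that the dynamic loader would fill in, and the function names are made up for illustration. A real compiler emits the indirection itself.

```cpp
#include <cassert>

// Stand-in for the global from the examples above.
int a_glob = 42;

// Hypothetical model of a GOT slot: a pointer that the loader would
// initialize with the final address of a_glob. It is volatile so the
// compiler cannot fold the indirection away.
int *volatile got_slot = &a_glob;

// Direct access: a single load from a location known at link time.
int load_direct() { return a_glob; }

// GOT access: load the address from the slot, then load the value.
int load_via_got() {
  int *addr = got_slot;  // load 1: address of the global
  return *addr;          // load 2: the value itself
}

// Usage example: both paths observe the same object.
void demo_access_paths() {
  assert(load_direct() == load_via_got());
  a_glob = 7;                   // a store to the global ...
  assert(load_via_got() == 7);  // ... is visible through the slot
}
```

Both functions return the same value; the only difference is the extra dependent load in `load_via_got`, which is the cost discussed here.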

Some experiments on Google benchmarks show that the extra memory loads affect
performance by 1% to 5%.


Solution - Copy Relocations:

When the linker supports copy relocations, GCC can always assume that the
global will be defined in the executable. For globals that are truly extern
(come from shared objects), the linker will create copy relocations and have
them defined in the executable. As a result, no global access needs to go
through the GOT, which improves performance.
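
What a copy relocation arranges can be sketched as a small C++ model. Everything here is an assumption made for illustration: `SharedObject`, `Executable` and `apply_copy_relocation` are hypothetical names, and the real work is done by the linker and the dynamic loader, not by user code.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical model: the shared object holds the initial image of the
// variable and reaches it through its own GOT slot.
struct SharedObject {
  int initial_image = 5;           // value stored in the library's data
  int *got_slot = &initial_image;  // where the library looks for the variable
  int read() const { return *got_slot; }
  void write(int v) { *got_slot = v; }
};

// Hypothetical model: the executable reserves space for the variable so
// that its own code can address it directly.
struct Executable {
  int copy = 0;  // space reserved by the linker in the executable
  int read_direct() const { return copy; }
};

// Model of the startup work a copy relocation requests: copy the initial
// image into the executable, then make the library use that copy too.
void apply_copy_relocation(Executable &exe, SharedObject &lib) {
  std::memcpy(&exe.copy, &lib.initial_image, sizeof exe.copy);
  lib.got_slot = &exe.copy;
  assert(lib.read() == exe.read_direct());  // one shared definition now
}
```

After `apply_copy_relocation`, the executable reads the variable with a direct access while the library keeps going through its slot, and both see the same storage.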

This recently submitted patch to the gold linker allows gold to generate copy
relocations in -pie mode when necessary:
https://sourceware.org/ml/binutils/2014-05/msg00092.html

I have added an option, -mld-pie-copyrelocs, which enables this when combined
with -fpie. Note that the BFD linker does not yet support copy relocations for
PIE, so this option cannot be used with it.

Please review.


ChangeLog:

* config/i386/i386.opt (mld-pie-copyrelocs): New option.
* config/i386/i386.c (legitimate_pic_address_disp_p): Check if this
address is still legitimate in the presence of copy relocations
and -fpie.
* testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c: New test.
* testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c: New test.



Patch attached.
Thanks
Sri
Sriraman Tallam
2014-05-19 18:11:18 UTC
Ping.
Sriraman Tallam
2014-06-09 22:55:09 UTC
Ping.
Sriraman Tallam
2014-06-26 17:54:49 UTC
Hi Uros,

Could you please review this patch?

Thanks
Sri
Patch Updated.
Sri
Sriraman Tallam
2014-07-11 17:42:44 UTC
Ping.
Sriraman Tallam
2014-09-02 18:15:26 UTC
Ping.
Richard Henderson
2014-09-02 20:40:50 UTC
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 211826)
+++ config/i386/i386.c (working copy)
@@ -12691,7 +12691,9 @@ legitimate_pic_address_disp_p (rtx disp)
return true;
}
else if (!SYMBOL_REF_FAR_ADDR_P (op0)
- && SYMBOL_REF_LOCAL_P (op0)
+ && (SYMBOL_REF_LOCAL_P (op0)
+ || (TARGET_64BIT && ix86_copyrelocs && flag_pie
+ && !SYMBOL_REF_FUNCTION_P (op0)))
&& ix86_cmodel != CM_LARGE_PIC)
return true;
break;
This is the wrong place to patch.

You ought to be adjusting SYMBOL_REF_LOCAL_P, by providing a modified
TARGET_BINDS_LOCAL_P.

Note in particular that I believe that you are doing the wrong thing with weak
and COMMON symbols, in that you probably ought not force a copy reloc there.

Note the complexity of default_binds_local_p_1, and the fact that all you
really want to modify is

/* If PIC, then assume that any global name can be overridden by
symbols resolved from other modules. */
else if (shlib)
local_p = false;

near the bottom of that function.
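
To make the shape of that suggestion concrete, here is a toy model of an ordered decision chain in which the PIC check is one late branch. The struct, its fields and the function names are hypothetical, and the chain is far shorter than the real default_binds_local_p_1. It only illustrates that changing that single branch leaves the earlier decisions untouched.

```cpp
#include <cassert>

// Hypothetical, heavily simplified symbol description. These fields are
// illustrative and are not GCC's tree accessors.
struct ToySymbol {
  bool is_static = false;    // internal linkage
  bool is_external = false;  // declared here, defined elsewhere
  bool is_weak = false;      // may be overridden by a strong definition
};

// "shlib" plays the role of the flag in the snippet quoted above: true
// when a global name may be overridden by another module.
bool toy_binds_local(const ToySymbol &sym, bool shlib) {
  bool local_p;
  if (sym.is_static)
    local_p = true;   // decided before the PIC branch is reached
  else if (sym.is_external)
    local_p = false;  // decided before the PIC branch is reached
  else if (sym.is_weak)
    local_p = false;  // decided before the PIC branch is reached
  else if (shlib)
    local_p = false;  // the one branch the review singles out
  else
    local_p = true;
  return local_p;
}

// Usage example: only a plain, locally defined global depends on shlib.
void demo_chain() {
  ToySymbol defined_here;
  ToySymbol external;
  external.is_external = true;
  assert(toy_binds_local(defined_here, false));
  assert(!toy_binds_local(defined_here, true));
  assert(!toy_binds_local(external, false));
  assert(!toy_binds_local(external, true));
}
```

In this toy, weak and external symbols are settled by earlier branches, so they are unaffected by whatever policy is chosen for the final PIC check.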


r~
Bernhard Reutner-Fischer
2014-09-03 07:25:23 UTC
Reminds me of PR32219 https://gcc.gnu.org/ml/gcc-patches/2010-03/msg00665.html
although admittedly that one is not PIE-specific; it still fails on current trunk.
Sriraman Tallam
2014-09-08 22:19:34 UTC
Post by Richard Henderson
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 211826)
+++ config/i386/i386.c (working copy)
@@ -12691,7 +12691,9 @@ legitimate_pic_address_disp_p (rtx disp)
return true;
}
else if (!SYMBOL_REF_FAR_ADDR_P (op0)
- && SYMBOL_REF_LOCAL_P (op0)
+ && (SYMBOL_REF_LOCAL_P (op0)
+ || (TARGET_64BIT && ix86_copyrelocs && flag_pie
+ && !SYMBOL_REF_FUNCTION_P (op0)))
&& ix86_cmodel != CM_LARGE_PIC)
return true;
break;
This is the wrong place to patch.
You ought to be adjusting SYMBOL_REF_LOCAL_P, by providing a modified
TARGET_BINDS_LOCAL_P.
I have done this in the new attached patch, I added a new function
i386_binds_local_p which will check for this and call
default_binds_local_p otherwise.
Post by Richard Henderson
Note in particular that I believe that you are doing the wrong thing with weak
and COMMON symbols, in that you probably ought not force a copy reloc there.
I added an extra check to not do this for WEAK symbols. I also added a
check for DECL_EXTERNAL so I believe this will also not be called for
COMMON symbols.
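
A wrapper of the kind described above (handle the special case first, defer to the default otherwise) could look roughly like the sketch below. It is not the attached patch: `ToyDecl`, `ToyFlags` and `toy_i386_binds_local` are hypothetical names, and the exact combination of checks is an assumption pieced together from the conditions mentioned in this thread (64-bit, PIE, copy relocations enabled, external, not weak, not a function).

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for the properties discussed in this thread.
struct ToyDecl {
  bool is_external;  // defined outside this translation unit
  bool is_weak;
  bool is_function;
};

struct ToyFlags {
  bool target_64bit;
  bool pie;
  bool copyrelocs;   // the new option is enabled
};

// The default predicate is passed in so the sketch stays self-contained.
using BindsLocalFn = std::function<bool(const ToyDecl &)>;

bool toy_i386_binds_local(const ToyDecl &decl, const ToyFlags &flags,
                          const BindsLocalFn &fallback) {
  // Assumed special case: external, non-weak data in a 64-bit PIE build
  // with copy relocations is treated as local.
  const bool copyreloc_case = flags.target_64bit && flags.pie &&
                              flags.copyrelocs && decl.is_external &&
                              !decl.is_weak && !decl.is_function;
  if (copyreloc_case)
    return true;
  return fallback(decl);  // everything else keeps the default answer
}

// Usage example with a fallback that never claims local binding.
void demo_wrapper() {
  const ToyFlags on{true, true, true};
  const BindsLocalFn never_local = [](const ToyDecl &) { return false; };
  const ToyDecl extern_data{true, false, false};
  const ToyDecl weak_data{true, true, false};
  assert(toy_i386_binds_local(extern_data, on, never_local));
  assert(!toy_i386_binds_local(weak_data, on, never_local));
}
```

Passing the fallback as a parameter is only a convenience for this sketch; in a compiler the wrapper would call the default predicate directly.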
Post by Richard Henderson
Note the complexity of default_binds_local_p_1, and the fact that all you
really want to modify is
/* If PIC, then assume that any global name can be overridden by
symbols resolved from other modules. */
else if (shlib)
local_p = false;
near the bottom of that function.
I did not understand what you mean here? Were you suggesting an
alternative way of doing this?

Thanks for reviewing
Sri
Post by Richard Henderson
r~
Sriraman Tallam
2014-09-19 21:11:11 UTC
Hi Richard,

I also ran the gcc testsuite with
RUNTESTFLAGS="--tool_opts=-mcopyrelocs" to check for issues. The only
test that failed was g++.dg/tsan/default_options.C. It uses -fpie
-pie and BFD ld to link. Since BFD ld does not support copy
relocations with -pie, it does not link. I linked with gold to make
the test pass.

Could you please take another look at this patch?

Thanks
Sri
Sriraman Tallam
2014-09-29 17:57:06 UTC
Ping.
Sriraman Tallam
2014-10-06 20:43:42 UTC
Ping.