Discussion:
[PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c
Alan Lawrence
2014-06-30 19:05:34 UTC
Permalink
combine.c includes a check which prevents

(ashiftrt (xor A C2) C1)

from being commuted to

(xor (ashiftrt A C1) (ashiftrt C2 C1))

for constants C1, C2 if C2 has its sign bit set.

Specifically, this prevents (ashiftrt (not A) C1) from being commuted to

(not (ashiftrt A C1))

because the former is rewritten to (ashiftrt (xor A -1) C1) above, with a
comment /* Make this fit the case below. */ - which it no longer does.

If result_mode == shift_mode, I can see no reason to prevent this normalisation
(indeed, I'm not sure I can see any reason to prevent it even if result_mode !=
shift_mode - but I've not managed to produce such a case in my own testing, as
there are always intervening subreg's or sign_extend's, or to build a toolchain
on which to reproduce the original bug, so I'm being cautious). Hence this patch
allows commutation if the two modes are equal.
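
(As a quick sanity check of the identity the equal-modes case relies on, here
is a small host-side program - an illustration only, not part of the patch. It
assumes GCC's usual behaviour that >> on a signed type is an arithmetic shift,
which ISO C leaves implementation-defined.)

#include <assert.h>
#include <stdint.h>

int
main (void)
{
  /* Arithmetic right shift distributes over XOR within a single mode,
     even when the XOR constant has its sign bit set.  */
  int64_t as[] = { 0, 1, -1, 42, -42, INT64_MAX, INT64_MIN };
  int64_t cs[] = { -1, INT64_MIN, -12345, 7 };
  for (unsigned i = 0; i < sizeof as / sizeof as[0]; i++)
    for (unsigned j = 0; j < sizeof cs / sizeof cs[0]; j++)
      for (int s = 0; s < 64; s++)
        assert (((as[i] ^ cs[j]) >> s) == ((as[i] >> s) ^ (cs[j] >> s)));
  return 0;
}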

As an illustrative example, on AArch64, without this patch, compiling this
snippet of the current arm_neon.h:

typedef long long int int64x1_t __attribute__ ((__vector_size__ (8)));
int64x1_t vcgez_s64 (int64x1_t a)
{
  return (int64x1_t) {a >= 0 ? -1 : 0};
}

produces a sequence involving a logical not (mvn) followed by asr (plus some
inter-bank moves), as the combiner takes (ashiftrt (not x) 63), "simplifies",
and fails to match the resulting RTL

(set (reg:DI 78 [ D.2593 ])
(ashiftrt:DI (xor:DI (reg:DI 0 x0 [ aD.2565 ])
(const_int -1 [0xffffffffffffffff]))
(const_int 63 [0x3f])))

However, with this patch, the combiner simplifies to

(set (reg:DI 84 [ D.2604 ])
(neg:DI (ge:DI (reg:DI 32 v0 [ aD.2569 ])
(const_int 0 [0]))))

which matches a pattern to output the desired cmge instruction.
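
(The equivalence the combiner is recognising here can be checked directly. A
small host-side illustration - not compiler output - again assuming arithmetic
>> on signed types:)

#include <assert.h>
#include <stdint.h>

int
main (void)
{
  /* (ashiftrt (not a) 63) yields -1 when a >= 0 and 0 when a < 0,
     i.e. (neg (ge a 0)) - which is what cmge computes per lane.  */
  int64_t vals[] = { 0, 1, -1, 42, -42, INT64_MAX, INT64_MIN };
  for (unsigned i = 0; i < sizeof vals / sizeof vals[0]; i++)
    assert (((~vals[i]) >> 63) == -(int64_t) (vals[i] >= 0));
  return 0;
}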

(For the curious: the check against commutation was added in January 2004,
r76965, https://gcc.gnu.org/ml/gcc-patches/2004-01/msg03406.html - however, I've
not been able to build an Ada toolchain of that era, or an Alpha/VMS toolchain,
on which to reproduce the bug; if anyone has such and is willing and able to
investigate, by all means!)

Testing:

aarch64-none-elf: check-gcc (cross-tested)
x86-none-linux-gnu: bootstrapped; check-gcc, check-ada, check-g++
arm-none-linux-gnueabi: bootstrapped
arm-none-eabi: check-gcc (cross-tested)


gcc/ChangeLog:

* combine.c (simplify_shift_const_1): Allow commuting ASHIFTRT and XOR
if same mode.

---------------------------------
diff --git a/gcc/combine.c b/gcc/combine.c
index 4e7ef55..29a9fc8 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -10219,6 +10219,7 @@ simplify_shift_const_1 (enum rtx_code code, enum machine
/* We can't do this if we have (ashiftrt (xor)) and the
constant has its sign bit set in shift_mode. */
&& !(code == ASHIFTRT && GET_CODE (varop) == XOR
+ && result_mode != shift_mode
&& 0 > trunc_int_for_mode (INTVAL (XEXP (varop, 1)),
shift_mode))
&& (new_rtx = simplify_const_binary_operation
@@ -10239,6 +10240,7 @@ simplify_shift_const_1 (enum rtx_code code, enum machine
for some (ashiftrt (xor)). */
if (CONST_INT_P (XEXP (varop, 1))
&& !(code == ASHIFTRT && GET_CODE (varop) == XOR
+ && result_mode != shift_mode
&& 0 > trunc_int_for_mode (INTVAL (XEXP (varop, 1)),
shift_mode)))
Jeff Law
2014-06-30 21:11:13 UTC
Permalink
Post by Alan Lawrence
combine.c includes a check which prevents
(ashiftrt (xor A C2) C1)
from being commuted to
(xor (ashiftrt A C1) (ashiftrt C2 C1))
for constants C1, C2 if C2 has its sign bit set.
Specifically, this prevents (ashiftrt (not A) C1) from being commuted to
(not (ashiftrt A C1))
because the former is rewritten to (ashiftrt (xor A -1) C1) above, with
a comment /* Make this fit the case below. */ - which it no longer does.
If result_mode == shift_mode, I can see no reason to prevent this
normalisation (indeed, I'm not sure I can see any reason to prevent it
even if result_mode != shift_mode - but I've not managed to produce such
a case in my own testing, as there are always intervening subreg's or
sign_extend's, or to build a toolchain on which to reproduce the
original bug, so I'm being cautious). Hence this patch allows
commutation if the two modes are equal.
As an illustrative example, on AArch64, without this patch, compiling this
snippet of the current arm_neon.h:
typedef long long int int64x1_t __attribute__ ((__vector_size__((8))));
int64x1 vcgez_s64(int64x1_t a)
{
return (int64x1_t) {a >=0 ? -1 : 0};
}
produces a sequence involving a logical not (mvn) followed by asr (plus
some inter-bank moves), as the combiner takes (ashiftrt (not x) 63),
"simplifies", and fails to match the resulting RTL
(set (reg:DI 78 [ D.2593 ])
(ashiftrt:DI (xor:DI (reg:DI 0 x0 [ aD.2565 ])
(const_int -1 [0xffffffffffffffff]))
(const_int 63 [0x3f])))
However, with this patch, the combiner simplifies to
(set (reg:DI 84 [ D.2604 ])
(neg:DI (ge:DI (reg:DI 32 v0 [ aD.2569 ])
(const_int 0 [0]))))
which matches a pattern to output the desired cmge instruction.
(For the curious: the check against commutation was added in January
2004, r76965, https://gcc.gnu.org/ml/gcc-patches/2004-01/msg03406.html -
however, I've not been able to build an ADA toolchain of that era, or an
Alpha/VMS toolchain, on which to reproduce the bug; if anyone has such
and is willing and able to investigate, by all means!)
aarch64-none-elf: check-gcc (cross-tested)
x86-none-linux-gnu: bootstrapped; check-gcc, check-ada, check-g++
arm-none-linux-gnueabi: bootstrapped
arm-none-eabi: check-gcc (cross-tested)
* combine.c (simplify_shift_const_1): Allow commuting ASHIFTRT and XOR
if same mode.
You'll want to use your sample code from above to create a testcase.
You can either dump the RTL and search it, or scan the assembly code.

Look in gcc/testsuite/gcc.target/arm for examples.
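
(A sketch of what such a testcase might look like - the file name, options and
scan pattern below are illustrative only, not the testcase eventually posted:)

/* { dg-do compile { target lp64 } } */
/* { dg-options "-O2 -fdump-rtl-combine" } */

typedef long long int64_t;

int64_t
foo (int64_t a)
{
  return (~a) >> 63;
}

/* After combine, no insn should still contain the nested
   (ashiftrt (xor ...) 63) form.  */
/* { dg-final { scan-rtl-dump-not "ashiftrt\[^\n\]*xor" "combine" } } */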

Given that the original problem which prompted Kenner to make this change was
Ada-related on the Alpha, you might ping rth and/or uros to see if they
could do a "quick" regression run on those platforms with Ada enabled.

I'm naturally hesitant to approve since this was something Kenner pretty
explicitly tried to avoid, AFAICT. Kenner wasn't ever known for trying
to paper over problems -- if he checked in a change like this, much more
likely than not it was well thought out and analyzed. He also probably
knew combine.c better than anyone at that time (possibly even still true
today).


Jeff
Alan Lawrence
2014-07-16 15:24:22 UTC
Permalink
Thanks for the suggestions! I think I've got a reasonably platform-independent
testcase that scans the rtl dump, just trying it on a few more platforms now...

As to running on Alpha: bootstrap succeeds, and the regression testsuite doesn't
raise any issues (https://gcc.gnu.org/ml/gcc-testresults/2014-07/msg01236.html)
- and that's with a more aggressive patch that completely rolls back the
original r76965:

Index: combine.c
===================================================================
--- combine.c (revision 212523)
+++ combine.c (working copy)
@@ -10218,9 +10218,6 @@
if (CONST_INT_P (XEXP (varop, 1))
/* We can't do this if we have (ashiftrt (xor)) and the
constant has its sign bit set in shift_mode. */
- && !(code == ASHIFTRT && GET_CODE (varop) == XOR
- && 0 > trunc_int_for_mode (INTVAL (XEXP (varop, 1)),
- shift_mode))
&& (new_rtx = simplify_const_binary_operation
(code, result_mode,
gen_int_mode (INTVAL (XEXP (varop, 1)), result_mode),
@@ -10237,10 +10234,7 @@
logical expression, make a new logical expression, and apply
the inverse distributive law. This also can't be done
for some (ashiftrt (xor)). */
- if (CONST_INT_P (XEXP (varop, 1))
- && !(code == ASHIFTRT && GET_CODE (varop) == XOR
- && 0 > trunc_int_for_mode (INTVAL (XEXP (varop, 1)),
- shift_mode)))
+ if (CONST_INT_P (XEXP (varop, 1)))
{
rtx lhs = simplify_shift_const (NULL_RTX, code, shift_mode,
XEXP (varop, 0), count);

I'm testing this version more widely but initial indications are good.

However, I've not succeeded in checking Ada on Alpha, as GCC's Ada frontend
requires an Ada compiler to bootstrap. So I have to ask: does anyone actually
use Ada on Alpha? (And if so, would they be able to test the above patch?)

Moreover, I don't really see that we have much reason to believe the check
against commuting is required even for Ada/Alpha. GCC's internals have changed
substantially in the interim, with the Ada frontend no longer generating RTL
directly, as we now have intervening GENERIC/GIMPLE tree stages. Unless there is
a logical/bitwise explanation for why the commuting of ashiftrt and xor is
unsafe, is the best explanation that the Ada frontend was generating RTL that
may have looked OK at the time, but that we would now consider dubious,
malformed, or plain bad?

(E.g., these days I don't see how to produce an ashiftrt of one mode containing
an XOR of another without an intervening sign_extend, zero_extend or subreg.)

--Alan
Alan Lawrence
2014-07-17 16:56:03 UTC
Permalink
Ok, the attached tests are passing on x86_64-none-linux-gnu, aarch64-none-elf,
arm-none-eabi, and a bunch of smaller platforms for which I've only built a
stage 1 compiler (i.e. as far as necessary to assemble). That's with either
change to simplify_shift_const_1.

As to the addition of "result_mode != shift_mode", or removing the whole check
against XOR - I've now tested the latter:

bootstrapped on x86_64-none-linux-gnu, check-gcc and check-ada;
bootstrapped on arm-none-linux-gnueabihf;
bootstrapped on aarch64-none-linux-gnu;
cross-tested check-gcc on aarch64-none-elf;
cross-tested on arm-none-eabi;
(and Uros has bootstrapped on alpha and done a suite of tests, as per
https://gcc.gnu.org/ml/gcc-testresults/2014-07/msg01236.html).

From a perspective of paranoia, I'd lean towards adding "result_mode !=
shift_mode", but for neatness removing the whole check against XOR is nicer. So
I'd defer to the maintainers as to whether one might be preferable to the
other...(but my unproven suspicion is that the two are equivalent, and no case
where result_mode != shift_mode is possible!)

--Alan
Jeff Law
2014-09-05 18:06:41 UTC
Permalink
Post by Alan Lawrence
Ok, the attached tests are passing on x86_64-none-linux-gnu,
aarch64-none-elf, arm-none-eabi, and a bunch of smaller platforms for
which I've only built a stage 1 compiler (i.e. as far as necessary to
assemble). That's with either change to simplify_shift_const_1.
As to the addition of "result_mode != shift_mode", or removing the whole
check against XOR - I've now tested the latter:
bootstrapped on x86_64-none-linux-gnu, check-gcc and check-ada;
bootstrapped on arm-none-linux-gnueabihf;
bootstrapped on aarch64-none-linux-gnu;
cross-tested check-gcc on aarch64-none-elf;
cross-tested on arm-none-eabi;
(and Uros has bootstrapped on alpha and done a suite of tests, as per
https://gcc.gnu.org/ml/gcc-testresults/2014-07/msg01236.html).
From a perspective of paranoia, I'd lean towards adding "result_mode !=
shift_mode", but for neatness removing the whole check against XOR is
nicer. So I'd defer to the maintainers as to whether one might be
preferable to the other...(but my unproven suspicion is that the two are
equivalent, and no case where result_mode != shift_mode is possible!)
So first, whether or not someone cares about Alpha-VMS isn't the issue
at hand. It's whether or not the new code is correct.
Similarly, the fact that the code generation paths are radically
different now compared to 2004, and that we can't currently trigger the
cases where the modes are different, isn't the issue either; again, it's
whether your code is correct.

I think the key is to look at try_widen_shift_mode and realize that it
can return a mode larger than the original mode of the operations.
However, it only does so when it is presented with a case where it knows
the sign bit being shifted in from the left will be the same as the sign
bit in the original mode.
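
(To make that precondition concrete, a small host-side illustration - not the
combine.c code itself; the helper name below is made up for this sketch:)

#include <assert.h>
#include <stdint.h>

/* Doing an originally-SImode arithmetic right shift in DImode and then
   truncating the result is only safe when the DImode operand sign-extends
   from bit 31, i.e. its top 33 bits are all copies of the SImode sign bit.  */
static int
sign_extends_from_bit_31 (int64_t op)
{
  return op == (int64_t) (int32_t) op;
}

int
main (void)
{
  int64_t ops[] = { 0, 5, -5, INT32_MAX, INT32_MIN, -1,
                    0x100000000LL /* top bits are not copies of bit 31 */ };
  for (unsigned i = 0; i < sizeof ops / sizeof ops[0]; i++)
    for (int s = 0; s < 32; s++)
      if (sign_extends_from_bit_31 (ops[i]))
        assert ((int32_t) (ops[i] >> s) == ((int32_t) ops[i] >> s));
  return 0;
}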

In the case of an XOR whose constant has its sign bit set in shift_mode, that's not
going to be the case. We would violate the assumption made when we
decided to widen the shift to shift_mode.

So your relaxation is safe when shift_mode == result_mode, but unsafe
otherwise -- even though we don't currently have a testcase for the
shift_mode != result_mode case, we don't want to break that.

So your updated patch is correct. However, I would ask that you make
one additional change: the comment before each of the two fragments of
code you changed needs updating. Something like

"... and the constant has its sign bit set in shift_mode and shift_mode
is different than result_mode".

The second comment should get a similar clarification.

You should also include your testcase for a cursory review.

So with the inclusion of the testcase and updated comments, we should be
good to go. However, please post the final patch for that cursory
review of the testcase.

jeff
Alan Lawrence
2014-09-18 09:40:44 UTC
Permalink
Thanks for the reply - and the in-depth investigation. I agree that what
matters is the correctness of the compiler rather than any particular platform
such as Ada on Alpha.

Moreover, I think we both agree that if result_mode==shift_mode, the
transformation is correct. "Just" putting that check in achieves what I'm
trying for here, so I'd be happy to go with the attached patch and call it a
day. However, I'm a little concerned about the other cases - i.e. where
shift_mode is wider than result_mode.

If I understand correctly (and I'm not sure about that, let's see how far I
get), this means we'll perform the shift in (say) DImode, when we're only really
concerned about the lower (say) 32 bits (for an originally-SImode shift).
try_widen_shift_mode will in this case check that the result of the operation
*inside* the shift (in our case an XOR) has 33 sign bit copies (via
num_sign_bit_copies), i.e. that its *top* 32 bits are all equal to the original
SImode sign bit. <count> of these bits may be fed into the top of the desired
SImode result by the DImode shift. Right so far?

AFAICT, num_sign_bit_copies for an XOR conservatively returns the minimum of
the num_sign_bit_copies of its two operands. I'm not sure whether this is
behaviour we should rely on in its callers, or whether for the sake of
abstraction we should treat num_sign_bit_copies as a black box (which does what
it says on the, erm, tin).
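
(The underlying bit-level fact - that the true number of sign-bit copies of an
XOR is at least the minimum over its operands - is easy to spot-check by brute
force in a narrow mode. A sketch only; this is not how combine computes it:)

#include <assert.h>
#include <stdint.h>

/* Leading bits of an 8-bit value that are copies of its sign bit,
   counting the sign bit itself.  */
static int
sign_bit_copies_8 (uint8_t v)
{
  int sign = (v >> 7) & 1;
  int n = 1;
  for (int bit = 6; bit >= 0 && ((v >> bit) & 1) == sign; bit--)
    n++;
  return n;
}

int
main (void)
{
  for (int a = 0; a < 256; a++)
    for (int c = 0; c < 256; c++)
      {
        int na = sign_bit_copies_8 ((uint8_t) a);
        int nc = sign_bit_copies_8 ((uint8_t) c);
        int nx = sign_bit_copies_8 ((uint8_t) (a ^ c));
        assert (nx >= (na < nc ? na : nc));
      }
  return 0;
}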

If the former, doesn't having num_sign_bit_copies >= the difference in size
between result_mode and shift_mode, of both operands to the XOR, guarantee
safety of the commutation (whether the constant is positive or negative)? We
could perform the shift (in the larger mode) on both of the XOR operands safely,
then XOR together their lower parts.

If, however, we want to play it safe and ensure that we deal safely with any
XOR whose top (mode size difference + 1) bits are the same, then I think the
restriction that the XOR constant is positive is neither necessary nor
sufficient; rather (mirroring try_widen_shift_mode) I think we need that the
num_sign_bit_copies of the constant in shift_mode is more than the size
difference between result_mode and shift_mode.

Hmmm. I might try that patch at some point; I think it is the right check to
make. (Meta-comment: this would be *so* much easier if we could write unit
tests more easily!) In the meantime I'd be happy to settle for the attached...

(tests are as they were posted
https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01233.html .)

--Alan
Andreas Schwab
2014-10-05 08:06:22 UTC
Permalink
diff --git a/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c b/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..90e64fd10dc358f10ad03a90041605bc3ccb7011
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/combine_ashiftrt_1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile {target sparc64*-*-* aarch64*-*-* x86_64-*-* powerpc64*-*-*} } */
You should check for lp64 instead of matching 64 in target names, to
reject -m32.
diff --git a/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c b/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..fd6827caed230ea5dd2d6ec4431b11bf826531ea
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/combine_ashiftrt_2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile {target arm*-*-* i?86-*-* powerpc-*-* sparc-*-*} } */
Likewise, using ilp32 to reject -m64.
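
(I.e. something along the lines of the usual idiom - the exact form is of
course up to Alan. For combine_ashiftrt_1.c:

/* { dg-do compile { target lp64 } } */

and for combine_ashiftrt_2.c:

/* { dg-do compile { target ilp32 } } */  )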

Andreas.
--
Andreas Schwab, ***@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Jeff Law
2014-09-19 05:38:09 UTC
Permalink
Post by Alan Lawrence
Moreover, I think we both agree that if result_mode==shift_mode, the
transformation is correct. "Just" putting that check in, achieves
what I'm trying for here, so I'd be happy to go with the attached
patch and call it a day. However, I'm a little concerned about the
other cases - i.e. where shift_mode is wider than result_mode.
Let's go ahead and get the attached patch installed. I'm pretty sure
it's correct and I know you want to see something move forward here. We
can iterate further if we want.
Post by Alan Lawrence
If I understand correctly (and I'm not sure about that, let's see how
far I get), this means we'll perform the shift in (say) DImode, when
we're only really concerned about the lower (say) 32-bits (for an
originally-SImode shift).
That's the class of cases I'm concerned about.


Post by Alan Lawrence
try_widen_shift_mode will in this case
check that the result of the operation *inside* the shift (in our
case an XOR) has 33 sign bit copies (via num_sign_bit_copies), i.e.
that its *top* 32-bits are all equal to the original SImode sign bit.
<count> of these bits may be fed into the top of the desired SImode
result by the DImode shift. Right so far?
Correct.
Post by Alan Lawrence
AFAICT, num_sign_bit_copies for an XOR, conservatively returns the
minimum of the num_sign_bit_copies of its two operands. I'm not sure
whether this is behaviour we should rely on in its callers, or for
the sake of abstraction we should treat num_sign_bit_copies as a
black box (which does what it says on the, erm, tin).
Treat it as a black box. It returns the number of known sign bit
copies. There may be more, but never less.
Post by Alan Lawrence
If the former, doesn't having num_sign_bit_copies >= the difference
in size between result_mode and shift_mode, of both operands to the
XOR, guarantee safety of the commutation (whether the constant is
positive or negative)? We could perform the shift (in the larger
mode) on both of the XOR operands safely, then XOR together their
lower parts.
I had convinced myself that when we flip the sign bit via the XOR and
commute the XOR out, we invalidate the assumptions made when
widening. But I'm not so sure anymore. Damn, I hate changes made
without suitable tests :(

I almost convinced myself the problem is in the adjustment of C2 in the
widened case, but that's not a problem either. At least not on paper.
Post by Alan Lawrence
If, however, we want to play safe and ensure that we deal safely with
any XOR whose top (mode size difference + 1) bits were the same,
then I think the restriction that the XOR constant is positive is
neither necessary nor sufficient; rather (mirroring
try_widen_shift_mode) I think we need that num_sign_bit_copies of the
constant in shift_mode, is more than the size difference between
result_mode and shift_mode.
But isn't that the same? Isn't the only case where it isn't the same
when the constant has bits set that are outside the mode of the other
operand?

Hmm, what about (xor:QI A -1)? In that case -1 will be represented with
bits outside the precision of QImode.
Post by Alan Lawrence
Hmmm. I might try that patch at some point, I think it is the right
check to make. (Meta-comment: this would be *so*much* easier if we
could write unit tests more easily!) In the meantime I'd be happy to
settle for the attached...
No argument on the unit testing comment. It's a major failing in the
design of GCC that we can't easily build a unit testing framework.

Jeff
Alan Lawrence
2014-09-22 11:16:05 UTC
Permalink
Ok thanks Jeff. In that case I think I should draw this to the attention of the
AArch64 maintainers to check the testsuite updates are OK before I commit...?

Methinks it may be possible to get further, or at least increase our confidence,
if I "mock" out try_widen_shift_mode, and/or try injecting some dubious RTL from
a builtin, although this'll only give a momentary snapshot of behaviour. I may
or may not have time to look into this though ;)...

Cheers, Alan
Jeff Law
2014-09-22 17:02:21 UTC
Permalink
Post by Alan Lawrence
Ok thanks Jeff. In that case I think I should draw this to the attention
of the AArch64 maintainers to check the testsuite updates are OK before
I commit...?
Can't hurt.
Post by Alan Lawrence
Methinks it may be possible to get further, or at least increase our
confidence, if I "mock" out try_widen_shift_mode, and/or try injecting
some dubious RTL from a builtin, although this'll only give a momentary
snapshot of behaviour. I may or may not have time to look into this
though ;)...
Yea, it's something I'd pondered as well, though I tend to inject the
RTL I want directly in the debugger. The downside is that doing so doesn't
ensure various tables are updated properly, and I vaguely recall a
per-pseudo table for the combiner's nonzero_bits, signbit_copies and
friends.


jeff
Marcus Shawcroft
2014-09-23 11:32:51 UTC
Permalink
Post by Alan Lawrence
Ok thanks Jeff. In that case I think I should draw this to the attention of
the AArch64 maintainers to check the testsuite updates are OK before I
commit...?
OK with me.

Cheers
/Marcus