Discussion:
[PATCH] PR63404, gcc 5 miscompiles linux block layer
Jiong Wang
2014-09-29 18:12:18 UTC
It's exposed by the Linux kernel on x86.

The root cause is that the current "single_set" ignores CLOBBER & USE,
while we need to take them into account when handling shrink-wrapping.

This patch adds one parameter to single_set_2 so that it returns NULL_RTX
if we want to reject any side effect, and adds a new helper function
"single_set_no_clobber_use".

Passes x86-64 bootstrap & check-gcc/g++; I also manually checked that the
issue reported in PR 63404 is gone.

Also no regression in the aarch64-none-elf regression tests.

comments?

thanks.

2014-09-26 Jiong Wang <***@arm.com>

* rtl.h (single_set_no_clobber_use): New function.
(single_set_2): New parameter "fail_on_clobber_use".
(single_set): Likewise.
* config/ia64/ia64.c (ia64_single_set): Likewise.
* rtlanal.c (single_set_2): Return NULL_RTX if fail_on_clobber_use is true.
* shrink-wrap.c (move_insn_for_shrink_wrap): Use single_set_no_clobber_use.
Richard Henderson
2014-09-29 18:32:57 UTC
+inline rtx single_set_no_clobber_use (const rtx_insn *insn)
+{
+  if (!INSN_P (insn))
+    return NULL_RTX;
+
+  if (GET_CODE (PATTERN (insn)) == SET)
+    return PATTERN (insn);
+
+  /* Defer to the more expensive case, and return NULL_RTX if there is
+     USE or CLOBBER.  */
+  return single_set_2 (insn, PATTERN (insn), true);
 }
What more expensive case?

If you're disallowing USE and CLOBBER, then single_set is just GET_CODE == SET.

I think this function is somewhat useless, and should not be added.

An adjustment to move_insn_for_shrink_wrap may be reasonable though. I haven't
tried to understand the miscompilation yet. I can imagine that this would
disable quite a bit of shrink wrapping for x86 though. Can we do better in
understanding when the clobbered register is live at the location to which we'd
like to move the insns?


r~
Jiong Wang
2014-09-29 19:24:52 UTC
Post by Richard Henderson
+inline rtx single_set_no_clobber_use (const rtx_insn *insn)
+{
+  if (!INSN_P (insn))
+    return NULL_RTX;
+
+  if (GET_CODE (PATTERN (insn)) == SET)
+    return PATTERN (insn);
+
+  /* Defer to the more expensive case, and return NULL_RTX if there is
+     USE or CLOBBER.  */
+  return single_set_2 (insn, PATTERN (insn), true);
 }
Richard,

thanks for review.
Post by Richard Henderson
What more expensive case?
single_set_no_clobber_use is just a clone of single_set, I copied the comments with
only minor modifications.

I think the "more expensive case" here means the case where there are PARALLEL that
we need to check the inner rtx.
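
For example, patterns like these (illustrative only) need the PARALLEL walk
in single_set_2:

  (parallel [(set (reg) (op ...))
             (clobber (reg:CC 17 flags))])

  (parallel [(set (a) (...))
             (set (b) (...))])        ;; with a REG_UNUSED note for (b)

single_set returns the live SET for both of them today; the new
fail_on_clobber_use flag would only reject the first form.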
Post by Richard Henderson
If you're disallowing USE and CLOBBER, then single_set is just GET_CODE == SET.
I think this function is somewhat useless, and should not be added.
An adjustment to move_insn_for_shrink_wrap may be reasonable though. I haven't
tried to understand the miscompilation yet. I can imagine that this would
disable quite a bit of shrink wrapping for x86 though.
I don't think so. From the x86-64 bootstrap, there is no regression in the
number of functions shrink-wrapped. Actually, previously only a single
"mov dest, src" was handled, so disallowing USE/CLOBBER will not lose any
shrink-wrap opportunity that was allowed before.

And I am afraid that if we don't reuse single_set_2, we would need another
loop to check all those inner rtxes, which single_set_2 already does.

So, IMHO, just modifying single_set_2 will be more efficient.
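
Roughly, the change inside single_set_2 amounts to something like this
(a sketch of the idea only, not the exact patch text):

  /* Inside the loop over the elements of a PARALLEL pattern ...  */
  rtx sub = XVECEXP (pat, 0, i);
  if (fail_on_clobber_use
      && (GET_CODE (sub) == USE || GET_CODE (sub) == CLOBBER))
    return NULL_RTX;
  /* ... otherwise fall through to the existing handling, which keeps
     the one live SET and ignores dead SETs and side-effect elements.  */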
Post by Richard Henderson
Can we do better in
understanding when the clobbered register is live at the location to which we'd
like to move the insns?
Currently, the generic code in move_insn_for_shrink_wrap only handles a
single register for dest/src, so if there is a clobber or use we might need
to check multiple regs, which would need a few more modifications. I think
that is better done after all the single dest/src issues are fixed.

--
Jiong
Post by Richard Henderson
r~
David Malcolm
2014-09-29 20:34:57 UTC
Post by Jiong Wang
Post by Richard Henderson
+inline rtx single_set_no_clobber_use (const rtx_insn *insn)
+{
+  if (!INSN_P (insn))
+    return NULL_RTX;
+
+  if (GET_CODE (PATTERN (insn)) == SET)
+    return PATTERN (insn);
+
+  /* Defer to the more expensive case, and return NULL_RTX if there is
+     USE or CLOBBER.  */
+  return single_set_2 (insn, PATTERN (insn), true);
 }
Richard,
thanks for review.
Post by Richard Henderson
What more expensive case?
single_set_no_clobber_use is just a clone of single_set, I copied the comments with
only minor modifications.
I introduced that comment to single_set, in r215089, when making
single_set into an inline function (so that it could check that it
received an rtx_insn *, rather than an rtx).
Post by Jiong Wang
I think the "more expensive case" here means the case where there are PARALLEL that
we need to check the inner rtx.
My comment may have been misleading, sorry. IIRC, what I was thinking of
was that the old implementation had split single_set into a macro and a
function. This was by Honza (CCed), 14 years ago to the day back in
r36664 (on 2000-09-29):
https://gcc.gnu.org/ml/gcc-patches/2000-09/msg00893.html


/* Single set is implemented as macro for performance reasons. */
#define single_set(I) (INSN_P (I)                               \
                       ? (GET_CODE (PATTERN (I)) == SET         \
                          ? PATTERN (I) : single_set_1 (I))     \
                       : NULL_RTX)

I think by "the more expensive case" I meant having to make a function
call to handle the less-common cases (which indeed covers the PARALLEL
case), rather than having logic inline; preserving that inlined vs
not-inlined split was one of my aims for r215089.

Perhaps it should be rewritten to "Defer to a function call to handle
the less common cases", or somesuch?

[...snip rest of post...]

Dave
Jeff Law
2014-09-30 04:21:25 UTC
Post by Jiong Wang
I don't think so. From the x86-64 bootstrap, there is no regression in the
number of functions shrink-wrapped. Actually, previously only a single
"mov dest, src" was handled, so disallowing USE/CLOBBER will not lose any
shrink-wrap opportunity that was allowed before.
This is the key, of course. Shrink-wrapping is very restrictive in its
ability to sink insns: the only forms it will currently sink are simple
moves. Arithmetic, logicals, etc. are left alone. Thus disallowing
USE/CLOBBER does not impact the x86 port in any significant way.

I do agree with Richard that it would be useful to see the insns that
are incorrectly sunk and the surrounding context.
Jeff
Jiong Wang
2014-09-30 14:37:33 UTC
Post by Jeff Law
Post by Jiong Wang
I don't think so. From the x86-64 bootstrap, there is no regression in the
number of functions shrink-wrapped. Actually, previously only a single
"mov dest, src" was handled, so disallowing USE/CLOBBER will not lose any
shrink-wrap opportunity that was allowed before.
This is the key, of course. Shrink-wrapping is very restrictive in its
ability to sink insns: the only forms it will currently sink are simple
moves. Arithmetic, logicals, etc. are left alone. Thus disallowing
USE/CLOBBER does not impact the x86 port in any significant way.
Yes, and we could get +1.22% more functions shrink-wrapped (2567 compared
with 2536) after sinking more insns than the simple "mov dest, src" on the
x86-64 bootstrap, and I remember a similar percentage on a glibc build.

While on aarch64, the number of functions shrink-wrapped increased by 25%
on some programs; maybe we could gain the same on other RISC backends.
Post by Jeff Law
I do agree with Richard that it would be useful to see the insns that
are incorrectly sunk and the surrounding context.
Insns 14 and 182 are sunk incorrectly. Below are the details.

(insn 14 173 174 2 (parallel [
            (set (reg:QI 37 r8 [orig:86 D.32480 ] [86])
                (lshiftrt:QI (reg:QI 37 r8 [orig:86 D.32480 ] [86])
                    (const_int 2 [0x2])))
            (clobber (reg:CC 17 flags))
        ]) /home/andi/lsrc/linux/block/blk-flush2.c:50 547 {*lshrqi3_1}
     (expr_list:REG_EQUAL (lshiftrt:QI (mem:QI (plus:DI (reg/v/f:DI 43 r14 [orig:85 q ] [85])
                    (const_int 1612 [0x64c])) [20 *q_7+1612 S1 A32])
            (const_int 2 [0x2]))
        (nil)))

(insn 174 14 182 2 (set (reg:QI 44 r15 [orig:86 D.32480 ] [86])
        (reg:QI 37 r8 [orig:86 D.32480 ] [86])) /home/andi/lsrc/linux/block/blk-flush2.c:50 93 {*movqi_internal}
     (nil))

(insn 182 174 16 2 (parallel [
            (set (reg:SI 44 r15 [orig:86 D.32480 ] [86])
                (and:SI (reg:SI 44 r15 [orig:86 D.32480 ] [86])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) /home/andi/lsrc/linux/block/blk-flush2.c:50 376 {*andsi_1}
     (nil))
Post by Jeff Law
Jeff
Jeff Law
2014-09-30 16:30:21 UTC
Post by Jiong Wang
Post by Jeff Law
Post by Jiong Wang
I don't think so. From the x86-64 bootstrap, there is no regression in the
number of functions shrink-wrapped. Actually, previously only a single
"mov dest, src" was handled, so disallowing USE/CLOBBER will not lose any
shrink-wrap opportunity that was allowed before.
This is the key, of course. Shrink-wrapping is very restrictive in its
ability to sink insns: the only forms it will currently sink are simple
moves. Arithmetic, logicals, etc. are left alone. Thus disallowing
USE/CLOBBER does not impact the x86 port in any significant way.
Yes, and we could get +1.22% more functions shrink-wrapped (2567 compared
with 2536) after sinking more insns than the simple "mov dest, src" on the
x86-64 bootstrap, and I remember a similar percentage on a glibc build.
While on aarch64, the number of functions shrink-wrapped increased by 25%
on some programs; maybe we could gain the same on other RISC backends.
Post by Jeff Law
I do agree with Richard that it would be useful to see the insns that
are incorrectly sunk and the surrounding context.
So I must be missing something. I thought the shrink-wrapping code
wouldn't sink arithmetic/logical insns like we see with insn 14 and insn
182. I thought it was limited to reg-reg copies and constant
initializations.

Jeff
Jiong Wang
2014-09-30 18:36:54 UTC
Post by Jiong Wang
Post by Jeff Law
Post by Jiong Wang
I don't think so. From the x86-64 bootstrap, there is no regression in the
number of functions shrink-wrapped. Actually, previously only a single
"mov dest, src" was handled, so disallowing USE/CLOBBER will not lose any
shrink-wrap opportunity that was allowed before.
This is the key, of course. Shrink-wrapping is very restrictive in its
ability to sink insns: the only forms it will currently sink are simple
moves. Arithmetic, logicals, etc. are left alone. Thus disallowing
USE/CLOBBER does not impact the x86 port in any significant way.
Yes, and we could get +1.22% more functions shrink-wrapped (2567 compared
with 2536) after sinking more insns than the simple "mov dest, src" on the
x86-64 bootstrap, and I remember a similar percentage on a glibc build.
While on aarch64, the number of functions shrink-wrapped increased by 25%
on some programs; maybe we could gain the same on other RISC backends.
Post by Jeff Law
I do agree with Richard that it would be useful to see the insns that
are incorrectly sunk and the surrounding context.
So I must be missing something. I thought the shrink-wrapping code wouldn't
sink arithmetic/logical insns like we see with insn 14 and insn 182. I
thought it was limited to reg-reg copies and constant initializations.
Yes, it was limited to reg-reg copies, and my previous sink improvement
aimed to sink any rtx that:

A: is a single_set;
B: has a src operand that is any combination of no more than one register
and no non-constant objects.

Some operators like shift may have side effects, but IMHO all side effects
are reflected in the RTX. So together with this fail_on_clobber_use
modification, is the rtx returned by single_set_no_clobber_use safe to sink
if it meets limit B above and passes the later register use/def checks in
move_insn_for_shrink_wrap?
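
The gating I have in mind looks roughly like this (a sketch of the shape
only, not the actual shrink-wrap.c code):

  rtx set = single_set_no_clobber_use (insn);
  if (set == NULL_RTX)
    return false;               /* USE/CLOBBER present: do not sink.  */
  if (!REG_P (SET_DEST (set)))
    return false;
  /* Limit B: SET_SRC (set) may reference at most one register and
     otherwise only constants; the existing register use/def checks then
     decide whether the insn can really be moved.  */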

Regards,
Jiong
Jeff
Jiong Wang
2014-10-08 15:31:04 UTC
Post by Jiong Wang
Post by Jeff Law
I do agree with Richard that it would be useful to see the insns that
are incorrectly sunk and the surrounding context.
So I must be missing something. I thought the shrink-wrapping code wouldn't
sink arithmetic/logical insns like we see with insn 14 and insn 182. I
thought it was limited to reg-reg copies and constant initializations.
Yes, it was limited to reg-reg copies, and my previous sink improvement
aimed to sink any rtx that:
A: is a single_set;
B: has a src operand that is any combination of no more than one register
and no non-constant objects.
Some operators like shift may have side effects, but IMHO all side effects
are reflected in the RTX. So together with this fail_on_clobber_use
modification, is the rtx returned by single_set_no_clobber_use safe to sink
if it meets limit B above and passes the later register use/def checks in
move_insn_for_shrink_wrap?
Ping ~

And as there is a NONDEBUG_INSN_P check before move_insn_for_shrink_wrap is
invoked, we can avoid creating the new wrapper function by invoking
single_set_2 directly.
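
That is, the call site would then look roughly like (a sketch):

  rtx set = single_set_2 (insn, PATTERN (insn), /*fail_on_clobber_use=*/true);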

comments?

Bootstrap is OK on x86-64, and there is no regression on check-gcc/g++.
aarch64 bootstrap and regression testing are in progress.

2014-10-08 Jiong Wang <***@arm.com>

* rtl.h (single_set_2): New parameter "fail_on_clobber_use".
(single_set): Likewise.
* config/ia64/ia64.c (ia64_single_set): Likewise.
* rtlanal.c (single_set_2): Return NULL_RTX if fail_on_clobber_use is true.
* shrink-wrap.c (move_insn_for_shrink_wrap): Use single_set_2.

Regards,
Jiong
Post by Jiong Wang
Regards,
Jiong
Jeff
Richard Earnshaw
2014-09-30 14:15:27 UTC
Post by Richard Henderson
+inline rtx single_set_no_clobber_use (const rtx_insn *insn)
+{
+  if (!INSN_P (insn))
+    return NULL_RTX;
+
+  if (GET_CODE (PATTERN (insn)) == SET)
+    return PATTERN (insn);
+
+  /* Defer to the more expensive case, and return NULL_RTX if there is
+     USE or CLOBBER.  */
+  return single_set_2 (insn, PATTERN (insn), true);
 }
What more expensive case?
If you're disallowing USE and CLOBBER, then single_set is just GET_CODE == SET.
I think this function is somewhat useless, and should not be added.
An adjustment to move_insn_for_shrink_wrap may be reasonable though. I haven't
tried to understand the miscompilation yet. I can imagine that this would
disable quite a bit of shrink wrapping for x86 though. Can we do better in
understanding when the clobbered register is live at the location to which we'd
like to move the insns?
r~
I think part of the problem is in the naming of single_set(). From the
name it's not entirely obvious to users that this includes insns that
clobber registers or which write other registers that are unused after
that point. I've previously had to fix a bug where this assumption was
made (eg https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300)

Most uses of single_set prior to register allocation are probably safe;
but later uses are fraught with potential problems of this nature and
may well be bugs waiting to happen.

R.
Jeff Law
2014-09-30 16:45:42 UTC
Post by Richard Earnshaw
I think part of the problem is in the naming of single_set(). From the
name it's not entirely obvious to users that this includes insns that
clobber registers or which write other registers that are unused after
that point. I've previously had to fix a bug where this assumption was
made (eg https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300)
Most uses of single_set prior to register allocation are probably safe;
but later uses are fraught with potential problems of this nature and
may well be bugs waiting to happen.
Very possibly. There's a bit of a natural tension here in that often we
don't much care about the additional CLOBBERS, but when we get it wrong,
obviously it's bad.

I haven't done any research, but I suspect the change to ignore clobbers
in single_set came in as part of exposing the CC register and avoiding
regressions all over the place as a result.

I wonder what would happen if we ignored them prior to register allocation,
then rejected insns with those CLOBBERs once register allocation started.

Jeff
Richard Earnshaw
2014-09-30 16:51:32 UTC
Post by Jeff Law
Post by Richard Earnshaw
I think part of the problem is in the naming of single_set(). From the
name it's not entirely obvious to users that this includes insns that
clobber registers or which write other registers that are unused after
that point. I've previously had to fix a bug where this assumption was
made (eg https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300)
Most uses of single_set prior to register allocation are probably safe;
but later uses are fraught with potential problems of this nature and
may well be bugs waiting to happen.
Very possibly. There's a bit of a natural tension here in that often we
don't much care about the additional CLOBBERS, but when we get it wrong,
obviously it's bad.
I haven't done any research, but I suspect the change to ignore clobbers
in single_set came in as part of exposing the CC register and avoiding
regressions all over the place as a result.
It's not just clobbers; it ignores patterns like

(parallel
  [(set (a) (...))
   (set (b) (...))])
[(reg_note (reg_unused (b)))]

Which is probably fine before register allocation but definitely
something you have to think about afterwards.
Post by Jeff Law
I wonder what would happen if we ignored them prior to register allocation,
then rejected insns with those CLOBBERs once register allocation started.
Might work; but it might miss some useful cases as well...

R.
Steven Bosscher
2014-09-30 20:17:30 UTC
Post by Richard Earnshaw
It's not just clobbers; it ignores patterns like
(parallel
  [(set (a) (...))
   (set (b) (...))])
[(reg_note (reg_unused (b)))]
Which is probably fine before register allocation but definitely
something you have to think about afterwards.
Even before RA this isn't always fine. We have checks for
!multiple_sets for this.

Ciao!
Steven