Discussion:
RFA: improvement to if-conversion
(too old to reply)
Joern RENNECKE
2005-01-12 21:18:42 UTC
Permalink
I've finally found some time to finish the iinitial coding for
if-conversion / cross-jump merge.
I think full register liveness information would be harder to keep
up-to-date as if conversion progresses.
I think that global_live_at_start/end are kept up-to-date. Leastwise,
I'd be interested to know how badly they're off. We re-run life info
at the end, but that's to get death notes inserted properly.
It turns out that it's actually much worse. We don't have any life info
in the first
if-conversion pass at all. A lot of loop optimizations only make sense
after
if-conversion has been properly performed, so I don't think only relying
on the
second if-conversion pass is much good. I suppose that means that I'll
have to
run flow before if-conversion then.
Joern RENNECKE
2005-02-18 19:16:39 UTC
Permalink
Post by Joern RENNECKE
It turns out that it's actually much worse. We don't have any
life info in the first
if-conversion pass at all. A lot of loop optimizations only make
sense after
if-conversion has been properly performed, so I don't think only
relying on the
second if-conversion pass is much good. I suppose that means that
I'll have to
run flow before if-conversion then.
I've implemented this now.
Richard Henderson
2005-02-18 20:18:45 UTC
Permalink
Please create an enhancement PR, and make it depend on 17652,
the 4.1 pending patches meta-bug.

I havn't examined it in detail yet, but it looks generally plausible.


r~
Hans-Peter Nilsson
2005-02-21 19:16:27 UTC
Permalink
Post by Joern RENNECKE
I've implemented this now.
But not built cc1 on PA, I presume? Big typo in
pa_commutative_p. Kind of looks like a Perl function. :-)

brgds, H-P
Joern RENNECKE
2005-07-07 18:18:08 UTC
Permalink
A number of changes in mainline made it necessary to update the patch.
I am regression bootstrapping / testing in the sh-elf-4_1-branch
because of PR22258.
Joern RENNECKE
2005-07-12 12:24:06 UTC
Permalink
Post by Joern RENNECKE
A number of changes in mainline made it necessary to update the patch.
I am regression bootstrapping / testing in the sh-elf-4_1-branch
because of PR22258.
Regression testing in the updated sh-elf-4_1-branch uncovered two problems.
I have attached an updated patch which sucessfully bootstrapped
i686-pc-linux-gnu
and regression tested this platform and i686-pc-linux-gnu X sh-elf in the
sh-elf-4_1-branch (not in mainline because PR22258 is still not resolved).

If you have already begun to review the previous patch, you can find
(relatively small) incremental patches in bugzilla
( http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20070, comments 13 and 14).
Steven Bosscher
2005-07-12 13:47:21 UTC
Permalink
        PR rtl-optimization/20070
        * basic-block.h (STRUCT_EQUIV_START, STRUCT_EQUIV_RERUN): Define.
        (STRUCT_EQUIV_FINAL, STRUCT_EQUIV_MAX_LOCAL): Likewise.
        (struct struct_equiv_checkpoint, struct equiv_info): Likewise.
        (struct_equiv_block_eq): Declare.
        * cfgcleanup.c (reload.h, expr.h): #include.
        (IMPOSSIBLE_MOVE_FACTOR): Define.
        (flow_find_cross_jump): Remove.
        (assign_reg_reg_set, struct_equiv, struct_equiv_set): New functions.
        (struct_equiv_dst_mem, struct_equiv_make_checkpoint): Likewise.
        (struct_equiv_improve_checkpoint): Likewise.
        (struct_equiv_restore_checkpoint, struct_equiv_death): Likewise.
        (struct_equiv_block_eq): Likewise.
        (find_dying_inputs, resolve_input_conflict): Likewise.
        (try_crossjump_to_edge): Use struct_equiv_block_eq instead of
        flow_find_cross_jump.
        * ifcvt.c (noce_try_complex_cmove): New function.
        (noce_process_if_block): Call it.
        For flag_expensive_optimizations, update cond_exec_changed_p on
        success.
        (rest_of_handle_if_conversion): For flag_expensive_optimizations,
        provide if_convert with register lifeness info.
Hi Joern,

If I understand your patch correctly, it makes cross-jumping and one
special if-conversion case consider semantically equivalent blocks,
instead of requiring the blocks to be completely equal. Sounds like
a useful idea.

I have a few comments on / questions about the patch.

First of all, this entire code to match the blocks does IMVHO not
belong in basic-block.h and cfgcleanup.c, but in a separate file.
The code could be useful in a bunch of places outside cfgcleanupc,
as you show yourself by using it in ifcvt.c.


You have this huge comment before "struct equiv_info". Maybe you
can move the comments about the individual fields into the struct
definition, like we do elsewhere. E.g.

+struct equiv_info
+{
+ /* Two block being compared. */
+ basic_block x_block, y_block;
(..>)
+}

instead of the comment before the struct definition.


Why not write this in a human-readable form with ifs and elses? ;-)
+ return ((rvalue < 0
+ ? ((dst_code != STRICT_LOW_PART
+ && dst_code != ZERO_EXTEND
+ && dst_code != SIGN_EXTEND)
+ ? struct_equiv_dst_mem (SET_DEST (x), SET_DEST (y), info)
+ : struct_equiv (&SET_DEST (x), SET_DEST (y), 1, info))
+ : struct_equiv (&SET_DEST (x), SET_DEST (y), 0, info))
+ && struct_equiv (&SET_SRC (x), SET_SRC (y), 1, info));


Why pass a pointer to an rtx, instead of just the rth itself, as the
first argument to struct_equiv() ?
+struct_equiv (rtx *xp, rtx y, int rvalue, struct equiv_info *info)

I suppose this is for validate_change. But that means that struct_equiv
can change the rtx passed to it, right? The comment before the function
doesn't say that (or explain why this is necessary).


You haven't said anything about how this impacts compile time. Given
comments like,
+/* This function must be called up to three times for a successful cross-jump
+ match:
it makes me wonder if this entire idea isn't prohibitively expensive.


Ehm, yikes!
- if_convert (0);
+ if (flag_expensive_optimizations)
+ {
+ basic_block bb;
+
+ life_analysis (dump_file, PROP_REG_INFO);
+ if_convert (1);
+ count_or_remove_death_notes (NULL, 1);
+ FOR_EACH_BB (bb)
+ {
+ bb->il.rtl->global_live_at_start = 0;
+ bb->il.rtl->global_live_at_end = 0;
+ }
+ ENTRY_BLOCK_PTR->il.rtl->global_live_at_start = 0;
+ EXIT_BLOCK_PTR->il.rtl->global_live_at_start = 0;
+ }
+ else
+ if_convert (0);
Is it really crucial to do the complex cmove thing before combine?

Gr.
Steven
Joern RENNECKE
2005-07-12 15:51:33 UTC
Permalink
Post by Steven Bosscher
Hi Joern,
If I understand your patch correctly, it makes cross-jumping and one
special if-conversion case consider semantically equivalent blocks,
instead of requiring the blocks to be completely equal. Sounds like
Yes, that's right.
Post by Steven Bosscher
a useful idea.
I have a few comments on / questions about the patch.
First of all, this entire code to match the blocks does IMVHO not
belong in basic-block.h and cfgcleanup.c, but in a separate file.
The code could be useful in a bunch of places outside cfgcleanupc,
as you show yourself by using it in ifcvt.c.
This patch grew out of a smaller patch to ifcvt.c, which I merged with
cross-jumping on request of rth. See:
http://gcc.gnu.org/ml/gcc-patches/2004-01/msg03354.html
So it is no surprise that it is useful in ifcvt.c . The optimization falls
within the scope stated at the top of cfgcleanup.c - it implements
functionality
for optimizers of the control flow. Nonetheless, I am willing to split
this out
into its own file(s), if a global write/middle-end maintainer states that he
(or she, if a female one is appointed in the meantime) agrees that this
is the
way to go.

alternative proposals:
- keep the code in cfgcleanup.c & basic-block.h
- move the functions to struct-equiv.c, keep the declarations in
basic-block.h
- move the code into struct-equiv.c and struct-equiv.h
- any better ideas?
Post by Steven Bosscher
You have this huge comment before "struct equiv_info". Maybe you
can move the comments about the individual fields into the struct
definition, like we do elsewhere. E.g.
+struct equiv_info
+{
+ /* Two block being compared. */
+ basic_block x_block, y_block;
(..>)
+}
The problem is that meaningful comments for these fields are longer
than a line, so the entire definition would become harder to read than
the comment block is now. If you just want a shorthand reminder of
the purpose of a field, I think the type and the name of the field should
be sufficient. In the example above, the purpose of the entire struct is
comparing two blocks, and both the type and name of the fields say
that these are the blocks.
Post by Steven Bosscher
instead of the comment before the struct definition.
Why not write this in a human-readable form with ifs and elses? ;-)
+ return ((rvalue < 0
+ ? ((dst_code != STRICT_LOW_PART
+ && dst_code != ZERO_EXTEND
+ && dst_code != SIGN_EXTEND)
+ ? struct_equiv_dst_mem (SET_DEST (x), SET_DEST (y), info)
+ : struct_equiv (&SET_DEST (x), SET_DEST (y), 1, info))
+ : struct_equiv (&SET_DEST (x), SET_DEST (y), 0, info))
+ && struct_equiv (&SET_SRC (x), SET_SRC (y), 1, info));
Yes, this statement grew gradually without me noticing when it became
hard to parse for humans. How do you like this variant:

case SET:
{
/* Process destination. */
if (rvalue < 0)
{
/* Ignore the destinations role as a destination. Still,
we have to consider input registers embedded in the addresses
of a MEM. Some special forms also make the entire
destination
a source. */
enum rtx_code dst_code = GET_CODE (SET_DEST (x));

if ((dst_code != STRICT_LOW_PART
&& dst_code != ZERO_EXTEND
&& dst_code != SIGN_EXTEND)
? !struct_equiv_dst_mem (SET_DEST (x), SET_DEST (y), info)
: !struct_equiv (&SET_DEST (x), SET_DEST (y), 1, info))
return false;
}
else if (!struct_equiv (&SET_DEST (x), SET_DEST (y), 0, info))
return false;
/* Process source. */
return struct_equiv (&SET_SRC (x), SET_SRC (y), 1, info);
}
Post by Steven Bosscher
Why pass a pointer to an rtx, instead of just the rth itself, as the
first argument to struct_equiv() ?
+struct_equiv (rtx *xp, rtx y, int rvalue, struct equiv_info *info)
I suppose this is for validate_change. But that means that struct_equiv
can change the rtx passed to it, right? The comment before the function
doesn't say that (or explain why this is necessary).
Yes. How about this comment?

/* Check if *xp is equivalent to Y. Until an an unreconcilable
difference is
found, use in-group changes with validate_change on *xp to make register
assignments agree. It is the (not necessarily direct) callers
responsibility to verify / confirm / cancel these changes, as
appropriate.
RVALUE indicates if the processed piece of rtl is used as a
destination, in
which case we can't have different registers being an input. Returns
nonzero if the two blocks have been identified as equivalent, zero
otherwise.
RVALUE == 0: destination
RVALUE == 1: source
RVALUE == -1: source, ignore SET_DEST of SET / clobber. */
Post by Steven Bosscher
You haven't said anything about how this impacts compile time. Given
comments like,
+/* This function must be called up to three times for a successful cross-jump
it makes me wonder if this entire idea isn't prohibitively expensive.
A final pass is only done if the optimization indeed does go ahead.
A second non-final pass is done only if the first indicates that there
is something
worthwile to optimize, but we had to backtrack at some point, but
couldn't restore
all the state, or found an input conflict. I.e. this is done in corner
cases where we
will likely go ahead with the optimization.
The most common case when comparing two random blocks should be that the
last instructions (the first ones compared) look completely different,
thus the
first pass finds no match whatsoever, and the optimization fails without
doing
further checks.
Post by Steven Bosscher
Is it really crucial to do the complex cmove thing before combine?
Yes, it makes a significant difference for at least one EEMBC benchmark -
which was the motivation to write the patch in the first place.
Without doing the early pass, this doesn't work.
When you have a three-way minimum / maximum, you need this
if-conversion to untangle the if-then-else constructs. IIRC, the main
pojnt was that loop optimizations work a lot better when you have only a
single basic block inside, and after loop optimizations, only one level
of if-conversion can match anymore because of the transformations done.
Joern RENNECKE
2005-07-13 12:39:11 UTC
Permalink
This is the patch with the comment and ?: -> if-then-else
changes discussed yesterday. Bootstrapped & regtested as before.
Joern RENNECKE
2005-07-15 14:04:20 UTC
Permalink
Post by Steven Bosscher
Why not write this in a human-readable form with ifs and elses? ;-)
+ return ((rvalue < 0
+ ? ((dst_code != STRICT_LOW_PART
+ && dst_code != ZERO_EXTEND
+ && dst_code != SIGN_EXTEND)
+ ? struct_equiv_dst_mem (SET_DEST (x), SET_DEST (y), info)
+ : struct_equiv (&SET_DEST (x), SET_DEST (y), 1, info))
+ : struct_equiv (&SET_DEST (x), SET_DEST (y), 0, info))
+ && struct_equiv (&SET_SRC (x), SET_SRC (y), 1, info));
Yes, this statement grew gradually without me noticing when it became
{
/* Process destination. */
if (rvalue < 0)
{
/* Ignore the destinations role as a destination. Still,
we have to consider input registers embedded in the addresses
of a MEM. Some special forms also make the entire
destination
a source. */
enum rtx_code dst_code = GET_CODE (SET_DEST (x));
if ((dst_code != STRICT_LOW_PART
&& dst_code != ZERO_EXTEND
&& dst_code != SIGN_EXTEND)
? !struct_equiv_dst_mem (SET_DEST (x), SET_DEST (y), info)
: !struct_equiv (&SET_DEST (x), SET_DEST (y), 1, info))
return false;
}
else if (!struct_equiv (&SET_DEST (x), SET_DEST (y), 0, info))
return false;
/* Process source. */
return struct_equiv (&SET_SRC (x), SET_SRC (y), 1, info);
}
Why pass a pointer to an rtx, instead of just the rth itself, as the
first argument to struct_equiv() ?
+struct_equiv (rtx *xp, rtx y, int rvalue, struct equiv_info *info)
I suppose this is for validate_change. But that means that struct_equiv
can change the rtx passed to it, right? The comment before the function
doesn't say that (or explain why this is necessary).
Yes. How about this comment?
/* Check if *xp is equivalent to Y. Until an an unreconcilable
difference is
found, use in-group changes with validate_change on *xp to make register
assignments agree. It is the (not necessarily direct) callers
responsibility to verify / confirm / cancel these changes, as
appropriate.
RVALUE indicates if the processed piece of rtl is used as a
destination, in
which case we can't have different registers being an input. Returns
nonzero if the two blocks have been identified as equivalent, zero
otherwise.
RVALUE == 0: destination
RVALUE == 1: source
RVALUE == -1: source, ignore SET_DEST of SET / clobber. */
I've successfully re-tested the patch (i686-pc-linux-gnu bootstrap &
regtest,
i686-pc-linux-gnu X sh-elf build & regtest in sh-elf-4_1-branch) with
these changes,
and with a change that went missing before:
http://gcc.gnu.org/bugzilla/attachment.cgi?id=9280&action=view

I have appended the updated patch.
Joern RENNECKE
2005-07-26 18:23:39 UTC
Permalink
Joern Rennecke wrote in
------------------------------------------------------------------------
PR rtl-optimization/20070
* basic-block.h (STRUCT_EQUIV_START, STRUCT_EQUIV_RERUN): Define.
(STRUCT_EQUIV_FINAL, STRUCT_EQUIV_MAX_LOCAL): Likewise.
(struct struct_equiv_checkpoint, struct equiv_info): Likewise.
(struct_equiv_block_eq): Declare.
* cfgcleanup.c (reload.h, expr.h): #include.
(IMPOSSIBLE_MOVE_FACTOR): Define.
(flow_find_cross_jump): Remove.
(assign_reg_reg_set, struct_equiv, struct_equiv_set): New functions.
(struct_equiv_dst_mem, struct_equiv_make_checkpoint): Likewise.
(struct_equiv_improve_checkpoint): Likewise.
(struct_equiv_restore_checkpoint, struct_equiv_death): Likewise.
(struct_equiv_block_eq): Likewise.
(find_dying_inputs, resolve_input_conflict): Likewise.
(try_crossjump_to_edge): Use struct_equiv_block_eq instead of
flow_find_cross_jump.
* ifcvt.c (noce_try_complex_cmove): New function.
(noce_process_if_block): Call it.
For flag_expensive_optimizations, update cond_exec_changed_p on
success.
(rest_of_handle_if_conversion): For flag_expensive_optimizations,
provide if_convert with register lifeness info.
* recog.c (verify_changes): Make it static.
* recog.h: Remove the corresponding prototype.
I have successfuly bootstrapped & regression tested this patch in
mainline from
2005-07-22 13:00 UTC on i686-pc-linux-gnu.
Bernd Schmidt
2005-11-29 13:24:10 UTC
Permalink
When I mentioned on the mailing list that I was going to help review the
CFO branch since we (Analog Devices) are interested in code size
optimizations, people on IRC pointed me at this patch as a potentially
better starting point, so I'll try to review it first. Here's my first
pass of comments. More to come, probably, once I've played with the
patch some more and understand it better.
* ifcvt.c (noce_try_complex_cmove): New function.
(noce_process_if_block): Call it.
For flag_expensive_optimizations, update cond_exec_changed_p on
success.
(rest_of_handle_if_conversion): For flag_expensive_optimizations,
provide if_convert with register lifeness info.
Can we postpone the ifcvt changes? They look quite straightforward, but
it seems like they are independent of the rest. Once we've done the
heavy lifting to get the first part in, we can revisit this.

One general comment: You're matching entire pieces of RTL. Wouldn't it
be possible/simpler to match insn_codes, extract the operands, and then
just look at those? I have in mind something that looks a bit like
parts of regrename.

I was wondering about the use of the find_reg_equal_equiv_note - lots of
code seems to use that function, but do we really want to mess with
REG_EQUIV notes?

Like Stephen B., I'd like to see this moved to a separate file; then you
can also lose the struct_equiv_ prefix from most function names and make
them more meaningful.
+/* In cfgcleanup.c */
Needs updating when moving the code.
+/* A struct equiv_info is used to pass information to struct_equiv and
+ to gather state while two basic blocks are checked for structural
+ equivalence.
Like Stephen, I'd like to see the rest of this comment distributed so
that each structure member has its own comment in front of it. See also
below - I'd like to see a struct_equiv_checkpoint used inside this
monster struct to keep track of current state. Further improvements
might be to split off those members which track state, and others which
are just inputs set by the callers of the struct_equiv pass ("struct
struct_equiv_params"?). Such a split could either result in two
structures, or just two clearly separated areas inside the same
structure. There should be some kind of logical grouping instead of
just a collection of members.
+ X_BLOCK and Y_BLOCK are the two block being compared.
"blocks"
+ NEED_FULL_BLOCK is set when the entire basic blocks have to match, as
+ in if-conversion. It is cleared for cross-jumping.
Might be better as an arg to struct_equiv_block_eq or part of a params
struct, since it's only needed in the toplevel function. Anything to
get this struct here down in size...
+ bool local_rvalue[STRUCT_EQUIV_MAX_LOCAL];
+ rtx x_local[STRUCT_EQUIV_MAX_LOCAL], y_local[STRUCT_EQUIV_MAX_LOCAL];
Might want to turn this into an array of structs. Then we could also
make STRUCT_EQUIV_MAX_LOCAL a runtime param more easily.
+static int
+assign_reg_reg_set (regset set, rtx reg, int value)
+{
+ unsigned regno = REGNO (reg);
+ int nregs, i, old;
+
+ if (regno >= FIRST_PSEUDO_REGISTER)
+ {
+ if (reload_completed)
+ regno = reg_renumber[regno];
+ else
+ {
+ if ((value != 0) == REGNO_REG_SET_P (set, regno))
+ return 0;
+ if (value)
+ SET_REGNO_REG_SET (set, regno);
+ else
+ CLEAR_REGNO_REG_SET (set, regno);
+ return value ? 1 : -1;
+ }
+ }
+ old = 0;
+ for (i = nregs = hard_regno_nregs[regno][GET_MODE (reg)]; --i >=
0; regno++)
+ {
+ old += REGNO_REG_SET_P (set, regno);
+ if (value)
+ SET_REGNO_REG_SET (set, regno);
+ else
+ CLEAR_REGNO_REG_SET (set, regno);
+ }
+ return value ? nregs - old : -old;
+}
Looks slightly obfuscated to me with the special case for pseudos.
Please combine the two loops into one by setting a NREGS variable to 1
for pseudos.
+ RVALUE == 0: destination
+ RVALUE == 1: source
+ RVALUE == -1: source, ignore SET_DEST of SET / clobber. */
Better an enum.
+ it already registerd. */
"registered"
+ change *= IMPOSSIBLE_MOVE_FACTOR;
Just an assignment would be enough? Wouldn't like this to overflow, and
I don't think we need different grades of impossible.
+/* Do only the struct_equiv SET_DEST processing for SETs and
CLOBBERs. */
+static bool
+struct_equiv_set (rtx x, rtx y, struct equiv_info *info)
Not too happy with some of these function names. "set_dest_equiv_p"?
Should mention return value in comment. The rest of the description
could also be enlarged to describe the necessary details of the
algorithm to understand why we need this function.
+/* Record state about current inputs / local registers / lifeness
+ in CHECKPOINT[N] of INFO. */
+static void
+struct_equiv_make_checkpoint (int n, struct equiv_info *info)
+{
+ struct struct_equiv_checkpoint *p = &info->checkpoint[n];
+
+ p->ninsns = info->ninsns;
+ p->version = info->version;
+ p->local_count = info->local_count;
+ p->input_count = info->input_count;
+ p->x_start = info->x_start;
+ p->y_start = info->y_start;
+ p->input_valid = info->input_valid;
+}
Seems like you could use a checkpoint structure inside struct equiv_info
and do a struct copy here. That would give struct equiv_info a bit more
structure.
+/* Restore state about current inputs / local registers / lifeness
+ from CHECKPOINT[N] of INFO. */
+static void
+struct_equiv_restore_checkpoint (int n, struct equiv_info *info)
+{
+ struct struct_equiv_checkpoint *p = &info->checkpoint[n];
+
+ info->ninsns = p->ninsns;
+ info->x_start = p->x_start;
+ info->y_start = p->y_start;
+ info->input_count = p->input_count;
+ info->input_valid = p->input_valid;
+ while (info->local_count > p->local_count)
+ {
+ info->local_count--;
+ info->version--;
+ if (REGNO_REG_SET_P (info->x_local_live,
+ REGNO (info->x_local[info->local_count])))
+ {
+ assign_reg_reg_set (info->x_local_live,
+ info->x_local[info->local_count], 0);
+ assign_reg_reg_set (info->y_local_live,
+ info->y_local[info->local_count], 0);
+ info->version--;
+ }
+ }
I'm not sure I understand what's going on here. This goes back and gets
reg numbers out of x_local and y_local, indexed by local_count. For
this to work, these arrays must always (except for in this function)
grow and never shrink - correct?
But there's code after the comment
/* This might be a register local to each block. See if we
have
it already registerd. */
which seems to shrink the arrays and overwrite entries. How does this
work together?
+ if (info->version != p->version)
+ info->need_rerun = true;
+}
+
+/* Check if the set of REG_DEAD notes attached to I1 and I2 allows us to
+ go ahead with merging I1 and I2, which otherwise look fine.
+ Inputs / local registers for the inputs of I1 and I2 have already
been
+ set up. */
+static bool
+struct_equiv_death (rtx i1 ATTRIBUTE_UNUSED, rtx i2 ATTRIBUTE_UNUSED,
+ struct equiv_info *info ATTRIBUTE_UNUSED)
"death_notes_match_p"?
+{
+#ifdef STACK_REGS
+ /* If cross_jump_death_matters is not 0, the insn's mode
+ indicates whether or not the insn contains any stack-like regs. */
+
+ if (INSN_P (i1)
+ && (info->mode & CLEANUP_POST_REGSTACK) &&
stack_regs_mentioned (i1))

Is this bit always set after regstack, or do we need an extra test such
as reload_completed?
+/* Return number of matched insns. */
Matched how? I'd like a human-readable overview of the algorithm somewhere.
+/* This function must be called up to three times for a successful cross-jump
+ first to find out which instructions do match. While trying to match
+ another instruction that doesn't match, we destroy information in
info
+ about the actual inputs. So if there have been any before the last
+ match attempt, we need to callthis function again to recompute the
"call this"
+ actual inputs up to the actual start of the matching sequence.
+ When we are then satisfied that the cross-jump is worth while, we
"worthwhile"
+ if (info->x_start != x_stop) for (;;)
Formatting.
+ /* The following code helps take care of G++ cleanups. */
Comment not helpful. How does it help? What's special about G++ cleanups?


Bernd
Joern RENNECKE
2005-11-29 21:54:23 UTC
Permalink
Post by Bernd Schmidt
Can we postpone the ifcvt changes? They look quite straightforward,
but it seems like they are independent of the rest. Once we've done
the heavy lifting to get the first part in, we can revisit this.
That's a bit backwards with respect to the motivation of this patch, as
the starting point was the if-conversion enhancement, but I am willing
to split it up like this if you are prepared to review both parts.
Post by Bernd Schmidt
One general comment: You're matching entire pieces of RTL. Wouldn't
it be possible/simpler to match insn_codes, extract the operands, and
then just look at those? I have in mind something that looks a bit
like parts of regrename.
It wouldn't be simpler in code, since we have to match arbitrary rtl for
operands and notes. Even if we spent lots of time
to make genrecog emit a an optimized parser like bison or byacc do -
which I don't see that happening anytime soon - I
would still expect it to be slower than matching two rtl structures to
compare them. At the moment, I think using recog
will be much slower. Have you run a debugging session recently
figuring out why a pattern was not matched? The code
looks quite horrible.
We could also fail to merge memory attributes where the match_operand is
of for the memory address inside.
And finally, we could loose some optimizations when separate insn
patterns for registers and/or constants are used, and the ease of
implementing recognizing cases where constants will be used in
different-looking rtl than a variant using a register because of rtl
canonicalizations applied.
Post by Bernd Schmidt
I was wondering about the use of the find_reg_equal_equiv_note - lots
of code seems to use that function, but do we really want to mess with
REG_EQUIV notes?
We have to avoid leaving REG_EQUIV or REG_EQUAL notes around that are
factually wrong - they can lead to incorrect code in later optimization
steps.
It is also not a good idea to willy-nilly remove notes when we are not
going to do any optimization at all, or if the notes are still correct -
we pay with missed rematerialization in reload for missing notes.
Post by Bernd Schmidt
Like Stephen B., I'd like to see this moved to a separate file; then
you can also lose the struct_equiv_ prefix from most function names
and make them more meaningful.
struct-equiv.c ?
I suppose we can keep the header stuff in basic-block.h, it would be
confusing an error-prone if the STRUCT_EQUIV_START etc. numbers are in a
different file, but the same number sequence as the CLEANUP_* numbers.
Post by Bernd Schmidt
+/* A struct equiv_info is used to pass information to
struct_equiv and
+ to gather state while two basic blocks are checked for structural
+ equivalence.
Like Stephen, I'd like to see the rest of this comment distributed so
that each structure member has its own comment in front of it. See
also below - I'd like to see a struct_equiv_checkpoint used inside
this monster struct to keep track of current state. Further
improvements might be to split off those members which track state,
and others which are just inputs set by the callers of the
struct_equiv pass ("struct struct_equiv_params"?). Such a split could
either result in two structures, or just two clearly separated areas
inside the same structure. There should be some kind of logical
grouping instead of just a collection of members.
I don't think we should have multiple separate structures; that would
unnecessarily increase the argument passing.
Post by Bernd Schmidt
+ NEED_FULL_BLOCK is set when the entire basic blocks have to
match, as
+ in if-conversion. It is cleared for cross-jumping.
Might be better as an arg to struct_equiv_block_eq or part of a params
struct, since it's only needed in the toplevel function. Anything to
get this struct here down in size...
I think this can actually folded as a STRUCT_EQUIV_NEED_FULL_BLOCK into
mode.
I have attached a draft of the new struct declaration.
Post by Bernd Schmidt
+ bool local_rvalue[STRUCT_EQUIV_MAX_LOCAL];
+ rtx x_local[STRUCT_EQUIV_MAX_LOCAL],
y_local[STRUCT_EQUIV_MAX_LOCAL];
Might want to turn this into an array of structs. Then we could also
make STRUCT_EQUIV_MAX_LOCAL a runtime param more easily.
While x_local and y_local generally appear toghether, local_rvalue does
not. Having an array of a struct of two rtx and one bool also causes
inefficient use of space and awkward indexing. Is there any reason to
make MAX_LOCAL a runtime param? The algorithm is not designed to scale
well with a really large MAX_LOCAL.
Post by Bernd Schmidt
+ change *= IMPOSSIBLE_MOVE_FACTOR;
Just an assignment would be enough? Wouldn't like this to overflow,
and I don't think we need different grades of impossible.
Actually, we do. As explained in the comment above
IMPOSSIBLE_MOVE_FACTOR, we don't know during scanning if the register
pair under consideration is actually an input for the block we'll end up
with, or just a pair of local registers that are set and die in the
block. E.g. you could have a register in a class that needs a secondary
reload to move between registers in the same class, like the branch
target registers on the SH5; there are also at least three such classes
on ia64.
So, input_cost might be temporarily greater or equal to
IMPOSSIBLE_MOVE_FACTOR, but decrease later to a more mundane value.
Note that change is a signed value.
An overflow is not possible with 32 bit integers on the host for any
current target; you'd have to have more than 107374 hard registers for
that to happen, or patch STRUCT_EQUIV_MAX_LOCAL to a number larger than
that and run before reload.
Would you like me to add a sentence to the comment above
IMPOSSIBLE_MOVE_FACTOR, like below?

This number, multiplied with the larger of STRUCT_EQUIV_MAX_LOCAL and
FIRST_PSEUDO_REGISTER, must fit in the input_cost field of struct
equiv_info.
Post by Bernd Schmidt
+/* Do only the struct_equiv SET_DEST processing for SETs and
CLOBBERs. */
+static bool
+struct_equiv_set (rtx x, rtx y, struct equiv_info *info)
Not too happy with some of these function names. "set_dest_equiv_p"?
Yes, that makes sense if it is in a separate file. Although it makes it
less obvious that it's a variation on the same theme as struct_equiv.
Maybe it is better to rename that into rtx_equiv_p then.
What other function names?
Post by Bernd Schmidt
Should mention return value in comment.
"Return true fur success." ?
Post by Bernd Schmidt
The rest of the description could also be enlarged to describe the
necessary details of the algorithm to understand why we need this
function.
Obviously we could hand-inline this function, so I suppose you mean why
we need to process set/clobber destinations separately from input
values. This is really something struct_equiv_block_eq does, so should
I put it into the pre-function comment there?
This function scans backwards through the insns. In doing so, it has
to process first all the
SET_DESTs, then the SET_SRCes, then the REG_NOTES, in order to keep
the register
lifeness information consistent.
Post by Bernd Schmidt
+ while (info->local_count > p->local_count)
+ {
+ info->local_count--;
+ info->version--;
+ if (REGNO_REG_SET_P (info->x_local_live,
+ REGNO (info->x_local[info->local_count])))
+ {
+ assign_reg_reg_set (info->x_local_live,
+ info->x_local[info->local_count], 0);
+ assign_reg_reg_set (info->y_local_live,
+ info->y_local[info->local_count], 0);
+ info->version--;
+ }
+ }
I'm not sure I understand what's going on here. This goes back and
gets reg numbers out of x_local and y_local, indexed by local_count.
For this to work, these arrays must always (except for in this
function) grow and never shrink - correct?
But there's code after the comment
/* This might be a register local to each block. See if
we have
it already registerd. */
which seems to shrink the arrays and overwrite entries. How does this
work together?
After the old entry is deleted, a new one is added, and the version
number increased. Thus, for this register enlargement, there will be a
version increment, but no net change of info->cur.local_count. Hence,
the while loop will fail to restore info.cur.version to p->version, and
info->need_rerun will be set. I.e. the register liveness information is
known
to be flawed, and another pass will be required if the partial match so
far otherwise looks promising.
Post by Bernd Schmidt
+struct_equiv_death (rtx i1 ATTRIBUTE_UNUSED, rtx i2
ATTRIBUTE_UNUSED,
+ struct equiv_info *info ATTRIBUTE_UNUSED)
"death_notes_match_p"?
Ok.
Post by Bernd Schmidt
+{
+#ifdef STACK_REGS
+ /* If cross_jump_death_matters is not 0, the insn's mode
+ indicates whether or not the insn contains any stack-like
regs. */
+
+ if (INSN_P (i1)
+ && (info->mode & CLEANUP_POST_REGSTACK) &&
stack_regs_mentioned (i1))
Is this bit always set after regstack, or do we need an extra test
such as reload_completed?
AFAICS, there is no other pass that uses cleanup_cfg after regstack.
FWIW, this code is
stems basically from the old cross-jumping code.
Post by Bernd Schmidt
+/* Return number of matched insns. */
Matched how? I'd like a human-readable overview of the algorithm somewhere.
If it's gonna have its own file, we might as well put a blurb at the top.
/* Try to match two basic blocks - or their ends - for structural
equivalence.
We scan the blocks from their ends backwards. When we see mismatching
registers, we try to figure out if these could be registers local to
these (partial)
blocks, or input values.
The main entry point to this file is struct_equiv_block_eq. This
function uses a
struct equiv_info to accept some of its inputs, to keep track of its
internal state, to pass down
to its helper functions, and to communicate some of the results back
to the caller.
struct_equiv_block_eq has to be called first with the
STRUCT_EQUIV_START bit set in the
mode parameter. This will calculate the number of matched insns and
the number and types of
inputs. If the need_rerun field is set, the results are only
tentative, and the caller has to call
again with STRUCT_EQUIV_RERUN till need_rerun is false in order to
get a reliable match.
To install the changes necessary for the match, the function has to
be called again with
STRUCT_EQUIV_FINAL.
*/
Post by Bernd Schmidt
+ if (info->x_start != x_stop) for (;;)
Formatting.
IIRC there were 80-char-per-line problems with adding two indentation
levels there. Would you
consider:

bool loop;
...
for (loop = info->x_start != x_stop; loop; loop = true)
Post by Bernd Schmidt
+ /* The following code helps take care of G++ cleanups. */
Comment not helpful. How does it help? What's special about G++ cleanups?
I'm afraid that I have no idea. That stuff comes from the old
cross-jumping code too,
added by Jan Hubicka on the 11th July 2001 to flow.c , moved on the 10th
September 2001
to cfgcleanup.c, later broken out into its own function, and reformatted
a number of times.
Bernd Schmidt
2005-11-30 15:15:48 UTC
Permalink
Post by Joern RENNECKE
Post by Bernd Schmidt
Can we postpone the ifcvt changes? They look quite straightforward,
but it seems like they are independent of the rest. Once we've done
the heavy lifting to get the first part in, we can revisit this.
That's a bit backwards with respect to the motivation of this patch, as
the starting point was the if-conversion enhancement, but I am willing
to split it up like this if you are prepared to review both parts.
I intend to; but regardless of that, splitting things up into smaller,
logically independent parts will always help get things into the tree
more quickly.

Couple of new comments:
Why are we using true_regnum at all here, post reload? All regs should
be hardregs at that point, and subregs of them should have been
eliminated. Same goes for reg_overlap_mentioned_for_reload_p, and...

-: 1230: /* Make sure we enter this pair in its
renumbered form. */
259008: 1231: if (x_regno != REGNO (x))
#####: 1232: x = gen_rtx_REG (GET_MODE (x), x_regno);
259008: 1233: if (y_regno != REGNO (y))
#####: 1234: y = gen_rtx_REG (GET_MODE (y), y_regno);

... this bit appears to be dead code too. Speaking of which,

-: 1269: case SET:
-: 1270: {
-: 1271: /* Process destination. */
4647110: 1272: if (rvalue < 0)
-: 1273: {
-: 1274: /* Ignore the destinations role as a
destination. Still,
-: 1275: we have to consider input registers
embedded in the addresses
-: 1276: of a MEM. Some special forms also make
the entire destination
-: 1277: a source. */
4647110: 1278: enum rtx_code dst_code = GET_CODE (SET_DEST
(x));
-: 1279:
4647110: 1280: if ((dst_code != STRICT_LOW_PART
-: 1281: && dst_code != ZERO_EXTEND
-: 1282: && dst_code != SIGN_EXTEND)
-: 1283: ? !struct_equiv_dst_mem (SET_DEST (x),
SET_DEST (y), info)
-: 1284: : !struct_equiv (&SET_DEST (x),
SET_DEST (y), 1, info))
16497: 1285: return false;
-: 1286: }
#####: 1287: else if (!struct_equiv (&SET_DEST (x), SET_DEST
(y), 0, info))
#####: 1288: return false;
-: 1289: /* Process source. */
4630613: 1290: return struct_equiv (&SET_SRC (x), SET_SRC (y),
1, info);
-: 1291: }
-: 1292: case CLOBBER:
1904: 1293: return rvalue < 0 || struct_equiv (&XEXP (x, 0),
XEXP (y, 0), 0, info);

It would seem that when called for the toplevel pattern, rvalue is
always -1, so we shouldn't get a SET or CLOBBER with a different rvalue.
Have the code gcc_assert that and lose the dead code.
Post by Joern RENNECKE
Post by Bernd Schmidt
One general comment: You're matching entire pieces of RTL. Wouldn't
it be possible/simpler to match insn_codes, extract the operands, and
then just look at those? I have in mind something that looks a bit
like parts of regrename.
It wouldn't be simpler in code, since we have to match arbitrary rtl for
operands and notes.
True, but I was thinking we can tell quickly which operands are inputs
and which ones are outputs and handle them accordingly.

I won't insist, I just thought it would be kind of nice to have
similar-looking algorithms.
Post by Joern RENNECKE
Even if we spent lots of time
to make genrecog emit a an optimized parser like bison or byacc do -
which I don't see that happening anytime soon - I
would still expect it to be slower than matching two rtl structures to
compare them.
recog_memoized doesn't take long if we have an insn_code already (we
seem to, at this point in the compilation). insn_extract should also be
rather quick.
Post by Joern RENNECKE
At the moment, I think using recog
will be much slower. Have you run a debugging session recently
figuring out why a pattern was not matched? The code
looks quite horrible.
Has anything changed recently in that department?
Post by Joern RENNECKE
We could also fail to merge memory attributes where the match_operand is
of for the memory address inside.
Do any ports generate insns with different MEM attributes outside
operands? That sounds rather broken.
Post by Joern RENNECKE
And finally, we could loose some optimizations when separate insn
patterns for registers and/or constants are used, and the ease of
implementing recognizing cases where constants will be used in
different-looking rtl than a variant using a register because of rtl
canonicalizations applied.
Post by Bernd Schmidt
I was wondering about the use of the find_reg_equal_equiv_note - lots
of code seems to use that function, but do we really want to mess with
REG_EQUIV notes?
We have to avoid leaving REG_EQUIV or REG_EQUAL notes around that are
factually wrong - they can lead to incorrect code in later optimization
steps.
It is also not a good idea to willy-nilly remove notes when we are not
going to do any optimization at all, or if the notes are still correct -
we pay with missed rematerialization in reload for missing notes.
Post by Bernd Schmidt
Like Stephen B., I'd like to see this moved to a separate file; then
you can also lose the struct_equiv_ prefix from most function names
and make them more meaningful.
struct-equiv.c ?
Seems plausible. Maybe functions like insns_match_p can move too, then
a name like "match-insns.c" might be better. The exact name is
uncritical though.
Post by Joern RENNECKE
I suppose we can keep the header stuff in basic-block.h, it would be
confusing an error-prone if the STRUCT_EQUIV_START etc. numbers are in a
different file, but the same number sequence as the CLEANUP_* numbers.
Sure.
Post by Joern RENNECKE
I don't think we should have multiple separate structures; that would
unnecessarily increase the argument passing.
Ok. The new struct you appended at the end looks fine.
Post by Joern RENNECKE
Post by Bernd Schmidt
+ bool local_rvalue[STRUCT_EQUIV_MAX_LOCAL];
+ rtx x_local[STRUCT_EQUIV_MAX_LOCAL],
y_local[STRUCT_EQUIV_MAX_LOCAL];
Might want to turn this into an array of structs. Then we could also
make STRUCT_EQUIV_MAX_LOCAL a runtime param more easily.
While x_local and y_local generally appear toghether, local_rvalue does
not. Having an array of a struct of two rtx and one bool also causes
inefficient use of space and awkward indexing. Is there any reason to
make MAX_LOCAL a runtime param? The algorithm is not designed to scale
well with a really large MAX_LOCAL.
ISTR there seems to be a general rule/guideline to make this kind of
thing runtime tunable with a --param. If we can do it easily we
probably should, otherwise let's not bother.
Post by Joern RENNECKE
Post by Bernd Schmidt
Just an assignment would be enough? Wouldn't like this to overflow,
and I don't think we need different grades of impossible.
Actually, we do.
Understood, thanks.
Post by Joern RENNECKE
This number, multiplied with the larger of STRUCT_EQUIV_MAX_LOCAL and
FIRST_PSEUDO_REGISTER, must fit in the input_cost field of struct
equiv_info.
Might be helpful to add this.
Post by Joern RENNECKE
Post by Bernd Schmidt
+/* Do only the struct_equiv SET_DEST processing for SETs and
CLOBBERs. */
+static bool
+struct_equiv_set (rtx x, rtx y, struct equiv_info *info)
Not too happy with some of these function names. "set_dest_equiv_p"?
Yes, that makes sense if it is in a separate file. Although it makes it
less obvious that it's a variation on the same theme as struct_equiv.
Maybe it is better to rename that into rtx_equiv_p then.
What other function names?
Maybe struct_equiv_dst_mem -> set_dest_inputs_equiv_p?
Post by Joern RENNECKE
Post by Bernd Schmidt
Should mention return value in comment.
"Return true fur success." ?
Plausible :)
Post by Joern RENNECKE
Post by Bernd Schmidt
The rest of the description could also be enlarged to describe the
necessary details of the algorithm to understand why we need this
function.
Obviously we could hand-inline this function,
We could, but that's beside the point. The point is that a simple
comment along the lines of "This is the first step in processing each
insn; we handle set destinations first because we're scanning backwards,
and need to terminate the lifetimes of anything that's set in this insn"
would help a new reader understand the algorithm more quickly.
Post by Joern RENNECKE
so I suppose you mean why
we need to process set/clobber destinations separately from input
values. This is really something struct_equiv_block_eq does, so should
I put it into the pre-function comment there?
If you add a top-level overview at the start of the file, that's
probably a good place for this kind of description.
Post by Joern RENNECKE
After the old entry is deleted, a new one is added, and the version
number increased. Thus, for this register enlargement, there will be a
version increment, but no net change of info->cur.local_count. Hence,
the while loop will fail to restore info.cur.version to p->version, and
info->need_rerun will be set. I.e. the register liveness information is
known
to be flawed, and another pass will be required if the partial match so
far otherwise looks promising.
Still trying to wrap my head around this - the REG handling in function
seems difficult; maybe more comments describing the whys of what we are
doing might be helpful.
Couldn't we just enter the same register into the [xy]_local arrays
multiple times, along with a bit that records the change (live <->
dead)? We'd save the scan backwards, and we could always restore to a
previous checkpoint, which in turn means we no longer need the rerun
machinery? MAX_LOCAL probably needs to grow, but if we're not going to
loop over these arrays anymore, that's no problem.
Post by Joern RENNECKE
Post by Bernd Schmidt
+{
+#ifdef STACK_REGS
+ /* If cross_jump_death_matters is not 0, the insn's mode
+ indicates whether or not the insn contains any stack-like
regs. */
+
+ if (INSN_P (i1)
+ && (info->mode & CLEANUP_POST_REGSTACK) &&
stack_regs_mentioned (i1))
Is this bit always set after regstack, or do we need an extra test
such as reload_completed?
AFAICS, there is no other pass that uses cleanup_cfg after regstack.
FWIW, this code is
stems basically from the old cross-jumping code.
But we're (going to) use it in one new place, ifcvt. Don't know if this
is going to be a problem.
Post by Joern RENNECKE
If it's gonna have its own file, we might as well put a blurb at the top.
Yes.
Post by Joern RENNECKE
/* Try to match two basic blocks - or their ends - for structural
equivalence.
We scan the blocks from their ends backwards. When we see mismatching
registers, we try to figure out if these could be registers local to
these (partial)
blocks, or input values.
The main entry point to this file is struct_equiv_block_eq. This
function uses a
struct equiv_info to accept some of its inputs, to keep track of its
internal state, to pass down
to its helper functions, and to communicate some of the results back
to the caller.
struct_equiv_block_eq has to be called first with the
STRUCT_EQUIV_START bit set in the
mode parameter. This will calculate the number of matched insns and
the number and types of
inputs. If the need_rerun field is set, the results are only
tentative, and the caller has to call
again with STRUCT_EQUIV_RERUN till need_rerun is false in order to get
a reliable match.
To install the changes necessary for the match, the function has to be
called again with
STRUCT_EQUIV_FINAL.
*/
Sounds good. There should also be a description of the processing done
on each insn (i.e., SET_DESTs first, then another pass for inputs).
Post by Joern RENNECKE
Post by Bernd Schmidt
+ if (info->x_start != x_stop) for (;;)
Formatting.
IIRC there were 80-char-per-line problems with adding two indentation
levels there.
Can't see anything that would be problematic.
Post by Joern RENNECKE
Post by Bernd Schmidt
+ /* The following code helps take care of G++ cleanups. */
Comment not helpful. How does it help? What's special about G++ cleanups?
I'm afraid that I have no idea. That stuff comes from the old
cross-jumping code too,
added by Jan Hubicka on the 11th July 2001 to flow.c , moved on the 10th
September 2001
to cfgcleanup.c, later broken out into its own function, and reformatted
a number of times.
Ah, I see, it occurs somewhere else too. Common parts should move into
a separate function - unless the corresponding bits in insns_match_p
aren't actually needed anymore (gcov shows no coverage, but that may not
mean anything)?

Oh dear, this comment is even older than Jan's change - it was present
in the old jump.c cross jumper.
Post by Joern RENNECKE
------------------------------------------------------------------------
+ /* The following are ORed in on top of the CLEANUp* flags in calls to
"CLEANUP_*"

[New structure definitions snipped]

Much better, thank you.


Bernd
Joern RENNECKE
2005-11-30 21:18:44 UTC
Permalink
Post by Bernd Schmidt
Why are we using true_regnum at all here, post reload? All regs
should be hardregs at that point, and subregs of them should have been
eliminated. Same goes for reg_overlap_mentioned_for_reload_p, and...
Because the cross-jumping code used reg_renumbered_equal_p, which also
takes reg_renumber into account and looks into subregs.
I remember that we had REG_NOTES after reload that still referred to
pseudos; I don't know if that can still happen now.
I suppose for 4.2 I can replace the renumbering code with gcc_assert
(!reload_completed || regno < FIRST_PSEUDO_REGISTER) .
However, we have to be able to handle pseudos, and subregs of pseudos
and hardregs, for this code to be any use for if-convert.

We might also do a bit of cross-jumping before reload in the future,
e.g. when we find entire blocks that are equivalent.
...
Post by Bernd Schmidt
It would seem that when called for the toplevel pattern, rvalue is
always -1, so we shouldn't get a SET or CLOBBER with a different
rvalue. Have the code gcc_assert that and lose the dead code.
We can still be called for a PRE_MODIFY / POST_MODIFY.
Post by Bernd Schmidt
Post by Joern RENNECKE
At the moment, I think using recog
will be much slower. Have you run a debugging session recently
figuring out why a pattern was not matched? The code
looks quite horrible.
Has anything changed recently in that department?
Not that I know of. I said 'recently' because of the human tendency to
blur bad memories.
Post by Bernd Schmidt
Post by Joern RENNECKE
We could also fail to merge memory attributes where the match_operand
is of for the memory address inside.
Do any ports generate insns with different MEM attributes outside
operands? That sounds rather broken.
The mem attributes can be generated by machine-independent code. Often
optimizer code just generates rtl and tries
if it matches; thus, a mem with attributes can easily be matched against
a MEM that is a fixed part of the insn pattern.
These attributes might be irrelevant for the machine description, but
meaningful for the alias analysis; hence the need to merge them.
Post by Bernd Schmidt
Post by Joern RENNECKE
struct-equiv.c ?
Seems plausible. Maybe functions like insns_match_p can move too,
then a name like "match-insns.c" might be better. The exact name is
uncritical though.
Yes, that's one good thing about subversion. However, the response
times and the average command line lengths are still a lot longer than
for cvs :-(
Post by Bernd Schmidt
Post by Joern RENNECKE
While x_local and y_local generally appear toghether, local_rvalue
does not. Having an array of a struct of two rtx and one bool also
causes inefficient use of space and awkward indexing. Is there any
reason to make MAX_LOCAL a runtime param? The algorithm is not
designed to scale well with a really large MAX_LOCAL.
ISTR there seems to be a general rule/guideline to make this kind of
thing runtime tunable with a --param. If we can do it easily we
probably should, otherwise let's not bother.
As far as I can tell, 16 registers is plenty for matching structurally
equivalent code, i.e. having more would hardly increase code quality,
and the overhead to search through 16 items is in the ballpark or less
then what we'd spend with any elaborate data structure. --param is best
when there is a hard trade-off between compilation time and run time, or
different target processors or code bases have different optimum points
for a tunable value. Since neither appears to be the case here, I think
it would be unnecessary complexity to make this configurable.
Post by Bernd Schmidt
Maybe struct_equiv_dst_mem -> set_dest_inputs_equiv_p?
That would be misleading. STRICT_LOW_PART / ZERO_EXTEND / SIGN_EXTEND
is handled by the caller.
set_dest_addr_equiv_p ?
Post by Bernd Schmidt
Still trying to wrap my head around this - the REG handling in
function seems difficult; maybe more comments describing the whys of
what we are doing might be helpful.
Couldn't we just enter the same register into the [xy]_local arrays
multiple times, along with a bit that records the change (live <->
dead)? We'd save the scan backwards, and we could always restore to a
previous checkpoint, which in turn means we no longer need the rerun
machinery? MAX_LOCAL probably needs to grow, but if we're not going
to loop over these arrays anymore, that's no problem.
I don't think the rerun machinery is expensive. Let's face it, most of
the time, this optimization won't succeed, and we'll be able to tell
that straight ahead because we have matched too little instructions (or
too little instructions for the number of input we have). Thus,
actually having to re-run the scan will be much rarer than running the
first pass.
Also, we can't add willy-nilly new mappings; if there is an existing,
live one for one or both of the registers, we can only acept the new one
if it is identical to or a consistent extension of the old mapping.
One possible way to avoid a search is when we find that the registers
are not in x_local_live / y_local_live, set a flag for that, and if this
flag is set and the registers are both either pseudo registers, or hard
registers with hard_regno_nregs == 1, we know they have not been
encountered before.
I have considered handling multiple lifetimes with varying pairings, or
hard registers with hard_regno_nregs mismatch, but I suspect that's a
lot of extra trouble for little or no gain.
Post by Bernd Schmidt
But we're (going to) use it in one new place, ifcvt. Don't know if
this is going to be a problem.
All the current ifcvt passes happen before reg-stack conversion. If we
want another one after reg-stack, we can
set the CLEANUP_POST_REGSTACK bit appropriately.
Post by Bernd Schmidt
Post by Joern RENNECKE
Post by Bernd Schmidt
+ /* The following code helps take care of G++ cleanups. */
Comment not helpful. How does it help? What's special about G++ cleanups?
I'm afraid that I have no idea. That stuff comes from the old
cross-jumping code too,
added by Jan Hubicka on the 11th July 2001 to flow.c , moved on the
10th September 2001
to cfgcleanup.c, later broken out into its own function, and
reformatted a number of times.
Ah, I see, it occurs somewhere else too. Common parts should move
into a separate function - unless the corresponding bits in
insns_match_p aren't actually needed anymore (gcov shows no coverage,
but that may not mean anything)?
There are two remainig calls in outgoing_edges_match. One for
tablejumps, and one for the case that we match none of three specific
cases - which are:
- one outgoing edge for bb1
- two outgoing edges for bb1, and bb1 ends in a conditional jump without
other sets. (EDGE_COUNT (bb1->succs) == 2 && any_condjump_p (BB_END
(bb1)) && onlyjump_p (BB_END (bb1)))
- tablejump

Thus, it is not surprisong that you don't see any coverage for an
average program. I don't think there is a good way to extract the
common bits of insns_match_p and struct_equiv -
there is too much control flow interwoven with them. insns_match_p
could be replaced with or using rtx_equiv_p (aka struct_equiv) and/or
friends, but then it would need a struct equiv_info.
I suppose ideally we'd have try_crossjump_to_edge pass a pointer to its
struct equiv_info into outgoing_edges_match, and if there were any local
registers and/or inputs, take a copy of the struct_equiv for re-runs,
and tell struct_equiv_block_eq not to do a full initialization. This
would allow outgoing_edges_match to perform better for targets where
there is a choise what register to use for the condition in a branch
instruction. I could do some sh64-elf-sim testing, but I'm not sure
that that would give sufficient coverage.
Bernd Schmidt
2005-11-30 23:18:57 UTC
Permalink
Post by Joern RENNECKE
Post by Bernd Schmidt
Still trying to wrap my head around this - the REG handling in
function seems difficult; maybe more comments describing the whys of
what we are doing might be helpful.
Couldn't we just enter the same register into the [xy]_local arrays
multiple times, along with a bit that records the change (live <->
dead)? We'd save the scan backwards, and we could always restore to a
previous checkpoint, which in turn means we no longer need the rerun
machinery? MAX_LOCAL probably needs to grow, but if we're not going
to loop over these arrays anymore, that's no problem.
I don't think the rerun machinery is expensive.
Neither do I, but I think by making checkpoints always restorable, the
code could be simplified. I'll see if I can come up with a way of doing
that.

I'm still unable to understand all the details of the REG handling in
function struct_equiv, and would like simplifications if at all
possible. The code you write is always very clever, but for me at least
that makes it very hard to understand.

Let's take an example of something that's unclear to me: we're matching
two blocks where R0 and R5 are alive at the end of both; the last insn
in block X is

(insn:HI 47 44 48 5 (set (mem/s/f:SI (reg/f:SI 2 r2 [orig:162 ivtmp.1356
] [162]) [3 <variable>.subreg_loc+0 S4 A32])
(reg:SI 0 r0 [170])) 171 {movsi_i} (nil)
(expr_list:REG_EQUAL (const_int 0 [0x0])
(nil)))

and the last insn in block Y is

(insn:HI 30 28 76 3 (set (mem/s/f:SI (reg/f:SI 2 r2 [orig:162 ivtmp.1356
] [162]) [3 <variable>.subreg_loc+0 S4 A32])
(reg/v/f:SI 5 r5 [orig:166 y ] [166])) 171 {movsi_i} (nil)
(nil))

We eventually end up in the REG case of struct_equiv, with x == (reg:SI
0 r0 [170]), and y == (reg/v/f:SI 5 r5 [orig:166 y ] [166]). Both regs
are in common_live, so we end up in

else if (x_common_live)
{
if (! rvalue || info->input_cost < 0 || no_new_pseudos)
return false;
/* If info->live_update is not set, we are processing notes.
We then allow a match with x_input / y_input found in a
previous pass. */
if (info->live_update && !info->input_valid)
{
info->input_valid = true;
info->x_input = x;
info->y_input = y;
info->input_count += optimize_size ? 2 : 1;
if (info->input_reg
&& GET_MODE (info->input_reg) != GET_MODE
(info->x_input))
info->input_reg = NULL_RTX;
if (!info->input_reg)
info->input_reg = gen_reg_rtx (GET_MODE (info->x_input));
validate_change (info->x_start, xp, info->input_reg, 1);
}
else if ((info->live_update ? ! info->input_valid : !
info->x_input)
|| ! rtx_equal_p (x, info->x_input)
|| ! rtx_equal_p (y, info->y_input))
return false;
validate_change (info->x_start, xp, info->input_reg, 1);
}

Nevermind the fact that this has a duplicate call of validate_change.
What is this code trying to do? I would have naively thought that if we
have R5 and R0 live at the end, then previous uses of either register
must match exactly.

Can you post an RTL example of what you're trying to accomplish with the
code related to input_reg? What can be done with it that can't be done
by examining the local_regs still live at the start of a block?

This bit of code:
if (reload_completed
&& (code == REG || (code == SUBREG && REG_P (SUBREG_REG (y))))
&& rtx_renumbered_equal_p (x, y))
{
int regno = true_regnum (x);
int nregs = hard_regno_nregs[regno][GET_MODE (x)];
int i;

for (i = nregs; --i>= 0; regno++)
if (REGNO_REG_SET_P (info->x_local_live, regno)
|| REGNO_REG_SET_P (info->y_local_live, regno))
return false;

if (!rvalue && info->input_valid
&& (reg_overlap_mentioned_for_reload_p (x, info->x_input)
|| reg_overlap_mentioned_for_reload_p (x, info->y_input)))
return false;

/* Update liveness information. */
if (info->live_update
&& assign_reg_reg_set (info->common_live, x, rvalue))
info->version++;

return true;
}
is partially duplicated just a few lines down, without the
reload_completed guard. This is very confusing, please merge the two.
Also, it doesn't seem to test for rvalue when marking the reg as live?


Bernd
Joern RENNECKE
2005-11-30 23:53:44 UTC
Permalink
Post by Bernd Schmidt
Nevermind the fact that this has a duplicate call of validate_change.
What is this code trying to do? I would have naively thought that if
we have R5 and R0 live at the end, then previous uses of either
register must match exactly.
Can you post an RTL example of what you're trying to accomplish with
the code related to input_reg? What can be done with it that can't be
done by examining the local_regs still live at the start of a block?
input_reg is for a register that is *not* local to the block, but
remains unaltered. In principle, we could also match constants here,
but I haven't had sufficient motivation (as in: appears to be relevant
to performance according to the benchmarks I've looked at) so far. We
should only encounter it as an rvalue.
When we are processing REG_NOTES, e..g REG_EQUAL, it is quite possible
that we see a note later in the insn stream than the actual last use in
insn patterns, i.e. scanning backwards
we'll see these notes first. In order to recognize them, we have to use
information from a previous pass.
input_reg is expected to be only set once for each block matching, but
it is thinkable that strange backtracking with equivalences will cause
us to discard an input reg and use a different one.
This might not actually be possible in the current code, but it seems
too fragile to rely on that.
Bernd Schmidt
2005-12-01 00:14:54 UTC
Permalink
Post by Joern RENNECKE
Post by Bernd Schmidt
Nevermind the fact that this has a duplicate call of validate_change.
What is this code trying to do? I would have naively thought that if
we have R5 and R0 live at the end, then previous uses of either
register must match exactly.
Can you post an RTL example of what you're trying to accomplish with
the code related to input_reg? What can be done with it that can't be
done by examining the local_regs still live at the start of a block?
input_reg is for a register that is *not* local to the block, but
remains unaltered. In principle, we could also match constants here,
but I haven't had sufficient motivation (as in: appears to be relevant
to performance according to the benchmarks I've looked at) so far. We
should only encounter it as an rvalue.
When we are processing REG_NOTES, e..g REG_EQUAL, it is quite possible
that we see a note later in the insn stream than the actual last use in
insn patterns, i.e. scanning backwards
we'll see these notes first. In order to recognize them, we have to use
information from a previous pass.
I'll ask again, please post an example or two of the way this is
intended to work.


Bernd
Joern RENNECKE
2005-12-01 12:36:43 UTC
Permalink
Post by Bernd Schmidt
I'll ask again, please post an example or two of the way this is
intended to work.
(insn 101 0 102 (set (reg:SI 201) (mem:SI (plus:SI (reg:SI 14) (cont_int
32)))) -1 (nil) (nil))
(insn 102 101 103 (set (reg:SI 202) (mem:SI (plus:SI (reg:SI 14)
(cont_int 40)))) -1 (nil) (nil))
...
(jump_insn 199 198 202 (set (pc)
(if_then_else (ne (reg:SI r4)
(const_int 0 [0x0]))
(label_ref 300)
(pc))) -1 (nil)
(nil))
(insn 202 199 204 (set (reg:SI 203) (ashift:SI (reg:SI 201) (const_int
5))) -1 (nil) (nil))
(insn 204 202 205 (set (reg:SI 204) (ashift:SI (reg:SI 201) (const_int
3))) -1 (nil) (nil))
(insn 205 204 206 (set (reg:SI 203) (plus:SI (reg:SI 203) (reg:SI 204)))
-1 (nil) (expr_list:REG_EQUAL (mult:SI (reg:SI 201) (const_int 40)) (nil)))
(insn 206 205 207 (set (reg:SI 205) (plus:SI (reg:SI 205) (reg:SI 203))
-1 (nil) (nil))
(jump_insn 207 206 208 (set (pc) (label_ref 400)) -1 (nil) (nil))
(barrier 208 207 300)
(code_label 300 208 302 "" [1 uses])
(insn 302 300 304 (set (reg:SI 303) (ashift:SI (reg:SI 202) (const_int
6))) -1 (nil) (nil))
(insn 304 302 305 (set (reg:SI 304) (ashift:SI (reg:SI 202) (const_int
4))) -1 (nil) (nil))
(insn 305 304 306 (set (reg:SI 303) (plus:SI (reg:SI 303) (reg:SI 304)))
-1 (nil) (expr_list:REG_EQUAL (mult:SI (reg:SI 202) (const_int 80)) (nil)))
(insn 306 305 307 (set (reg:SI 205) (plus:SI (reg:SI 205) (reg:SI 303))
-1 (nil) (nil))
(code_label 400 399 401 "" [1 uses])

can be if-converted into:

(insn 101 0 102 (set (reg:SI 201) (mem:SI (plus:SI (reg:SI 14) (cont_int
32)))) -1 (nil) (nil))
(insn 102 101 103 (set (reg:SI 202) (mem:SI (plus:SI (reg:SI 14)
(cont_int 40)))) -1 (nil) (nil))
...

(insn 501 302 199 (set (reg:SI 206)
(if_then_else:SI (eq (reg:SI 4)
(const_int 0)) (reg:SI 201) (reg:SI 202)))
-1 (nil) (nil))
(insn 302 501 304 (set (reg:SI 303) (ashift:SI (reg:SI 206) (const_int
6))) -1 (nil) (nil))
(insn 304 302 305 (set (reg:SI 304) (ashift:SI (reg:SI 206) (const_int
4))) -1 (nil) (nil))
(insn 305 304 306 (set (reg:SI 303) (plus:SI (reg:SI 303) (reg:SI 304)))
-1 (nil) (nil) (nil)))
(insn 306 305 401 (set (reg:SI 205) (plus:SI (reg:SI 205) (reg:SI 303))
-1 (nil) (nil))



In principle, cross-jumping can also use this code, but we'd have to
enable cross-jumping in some pass before reload, and/or add register
scavenging code to allocate input_reg after reload.
Bernd Schmidt
2005-12-01 13:26:32 UTC
Permalink
Post by Bernd Schmidt
I'll ask again, please post an example or two of the way this is
intended to work.
[snip example]

This helps, thank you.


Bernd
Joern RENNECKE
2005-12-02 21:26:41 UTC
Permalink
Post by Bernd Schmidt
is partially duplicated just a few lines down, without the
reload_completed guard. This is very confusing, please merge the two.
Done. I have attached the new patches I am currently regtesting.
Post by Bernd Schmidt
Also, it doesn't seem to test for rvalue when marking the reg as live?
rvalue is passed as a parameter to assign_reg_reg_set.
Bernd Schmidt
2005-12-02 23:49:26 UTC
Permalink
This has patches against a previous version of struct-equiv.c - can't apply.


Bernd
Bernd Schmidt
2005-12-03 14:36:50 UTC
Permalink
Post by Bernd Schmidt
This has patches against a previous version of struct-equiv.c - can't apply.
Ok, "svn cp cfgcleanup.c struct-equiv.c" fixed that.


Bernd
Bernd Schmidt
2005-12-05 15:57:07 UTC
Permalink
The last patch looks like a good improvement, and I think we're close to
something that can go in.

I've attached a small patch: some formatting fixes, removed the new enum
since nothing was using the symbolic names, and added a bit of
commentary about aspects I found hard to understand. Please verify
these for accuracy and incorporate them if you're happy with them.
Post by Joern RENNECKE
Post by Bernd Schmidt
It would seem that when called for the toplevel pattern, rvalue is
always -1, so we shouldn't get a SET or CLOBBER with a different
rvalue. Have the code gcc_assert that and lose the dead code.
We can still be called for a PRE_MODIFY / POST_MODIFY.
So? These do not contain SETs or CLOBBERs. The comment near the MEM
handling seems confused, and I'm not convinced we're doing the right
thing for these RTL expressions.

A patch with no functional changes to split out condjump_equiv_p and
move other stuff into a new file should be committed separately first.
That is preapproved, please install after testing.


Bernd
Joern RENNECKE
2005-12-05 19:10:19 UTC
Permalink
+ We scan the blocks from their ends backwards, and expect that insns are
+ identical, except for certain cases involving registers. A mismatch
We don't actually expect the insns to be identical; most of the time, we
won't find a
match. Maybe:

We scan the blocks from their ends backwards, hoping to find a match, I.e.insns
are identical, except for certain cases involving registers. A mismatch
...
+ compensation code to move RY to RX. The function resolve_input_conflict
+ ensures that this can be done.
resolve_input_conflict gets only used when we find an input overlap at
the end of the
first pass. Maybe change the last sentece to:
If there are overlapping inputs, the function resolve_input_conflict
ensures that this can be done.



My i686-pc-linux-gnu bootstrap failed with an ICE during stage2 library
builiding - i.e. stage 1 was miscompiled. I've decided to go back to a
1-stage regtest to see if I could find some simple regressions first,
and I found one in execute/20050316-2.c -Os. We can't actually update
death notes after reg-stack. To fix this, I made struct_equiv_init
pass the CLEANUP_POST_REGSTACK flag to update_life_info_in_dirty_blocks,
and changed flow.c to leave death notes of stack registers alone if this
flag is set, as with the attached patch.

I don't know if we can get away without either propagating the life
information for stack registers exactly, or at least doing something
to perserve the global_live_at_{start,end} information we have about
them.

I'm currently running regression tests to see if there is a quick answer
to that question with a testcase...
Joern RENNECKE
2005-12-05 23:41:01 UTC
Permalink
Post by Bernd Schmidt
Post by Joern RENNECKE
We can still be called for a PRE_MODIFY / POST_MODIFY.
So? These do not contain SETs or CLOBBERs. The comment near the MEM
handling seems confused, and I'm not convinced we're doing the right
thing for these RTL expressions.
Oops, I they way I remembered them they contained a SET. They actually
contain a SET_SRC and a SET_DEST, except that these macros won't allow
these codes.
Could we change them to accept these codes? Then I can add PRE_MODIFY
and POST_MODIFY as additional case labels besides the SET: label.
Post by Bernd Schmidt
A patch with no functional changes to split out condjump_equiv_p and
move other stuff into a new file should be committed separately first.
That is preapproved, please install after testing.
I have attached what I have currently under test.

The life information handling for the full patch needs some more work;
I've found that I need a separate PROP_POST_REGSTACK flag, and also have
to do something
about the life information of the stack regs - I clear these life bits
when I find a post-regstack mismatch in struct_equiv_init. Moreover, I
disallow matches between
diffferent stack registers after regstack. Bootstrap has proceeded into
stage 3 so far, but the one-stage regtest found some compilation failures.
Bernd Schmidt
2005-12-06 02:00:40 UTC
Permalink
Post by Joern RENNECKE
Post by Bernd Schmidt
Post by Joern RENNECKE
We can still be called for a PRE_MODIFY / POST_MODIFY.
So? These do not contain SETs or CLOBBERs. The comment near the MEM
handling seems confused, and I'm not convinced we're doing the right
thing for these RTL expressions.
Oops, I they way I remembered them they contained a SET. They actually
contain a SET_SRC and a SET_DEST, except that these macros won't allow
these codes.
Could we change them to accept these codes? Then I can add PRE_MODIFY
and POST_MODIFY as additional case labels besides the SET: label.
Not sure this is such a good idea since the "SET_DEST" is actually an
in-out operand. I'd rather add them near with code similar to the
P{RE,OST}_{IN,DE}C cases.


Bernd
Joern RENNECKE
2005-12-06 09:09:35 UTC
Permalink
Post by Bernd Schmidt
Not sure this is such a good idea since the "SET_DEST" is actually an
in-out operand. I'd rather add them near with code similar to the
P{RE,OST}_{IN,DE}C cases.
No, the in part is represented in the SET_SRC - it's required to be a
PLUS with the destination as the first operand.
Bernd Schmidt
2005-12-06 10:22:14 UTC
Permalink
Post by Joern RENNECKE
Post by Bernd Schmidt
Not sure this is such a good idea since the "SET_DEST" is actually an
in-out operand. I'd rather add them near with code similar to the
P{RE,OST}_{IN,DE}C cases.
No, the in part is represented in the SET_SRC - it's required to be a
PLUS with the destination as the first operand.
Still, for a POST_MODIFY, the actual value of what you call the
"SET_DEST" is used as an input. I think allowing these two macros on
PRE/POST_MODIFY would be confusing.


Bernd
Joern RENNECKE
2005-12-06 10:47:24 UTC
Permalink
B
Post by Bernd Schmidt
Still, for a POST_MODIFY, the actual value of what you call the
"SET_DEST" is used as an input. I think allowing these two macros on
PRE/POST_MODIFY would be confusing.
No, the second value is the input, while the first is the output, just
like for SET.
Bernd Schmidt
2005-12-06 11:19:08 UTC
Permalink
Post by Joern RENNECKE
Post by Bernd Schmidt
Still, for a POST_MODIFY, the actual value of what you call the
"SET_DEST" is used as an input. I think allowing these two macros on
PRE/POST_MODIFY would be confusing.
No, the second value is the input, while the first is the output, just
like for SET.
??? The REG in the first operand contains the address. Hence, it's
conceptually an input. It's written to, so it's also an output. The
new value of the REG (the second operand) is also an input. You could
argue that currently the second operand has to contain the first reg, so
it'll be treated as an input anyway, but this is thinking about current
usage and specific algorithms, not the data structure itself, and is
conceptually the wrong way to look at a POST_MODIFY.

I'll grant you a certain similarity to a SET, but I do not think that
this warrants the potential for confusion of using SET_DEST on a
POST_MODIFY. It's trying to be too clever.


Bernd
Joern RENNECKE
2005-12-06 11:49:58 UTC
Permalink
Post by Bernd Schmidt
??? The REG in the first operand contains the address. Hence, it's
conceptually an input. It's written to, so it's also an output. The
new value of the REG (the second operand) is also an input. You could
argue that currently the second operand has to contain the first reg,
so it'll be treated as an input anyway, but this is thinking about
current usage and specific algorithms, not the data structure itself,
and is conceptually the wrong way to look at a POST_MODIFY.
I'll grant you a certain similarity to a SET, but I do not think that
this warrants the potential for confusion of using SET_DEST on a
POST_MODIFY. It's trying to be too clever.
I see your point. So I have to first process the first operand of the
POST_MODIFY as an lvalue, then both as an rvalue.
However, that would be incorrect for PRE_MODIFY - the first operand is a
pure lvalue. OTOH, a PRE_MODIFY can be handled like a SET, except for
the SET_DEST / SET_SRC code check.
Can we relax these checks to allow a PRE_MODIFY?

( I am assuming here that we don't need to enforce strict lvalue/rvalue
processing ordering in cases where the PRE_MODIFY destination is also a
source in another piece of rtl in the same insn, since then the
evaluation order is indeed undefined.)
Joern RENNECKE
2005-12-06 16:13:21 UTC
Permalink
I've just thought of another thing that won't work right with the
current infrastructure:
when we call validate_change, the change is installed instantly. Thus,
when we try to match
the same register again, we will fail. That is easy to fix for
PRE_MODIFY - just save the
register before doing the first potential change - but much harder when
you have a PARALLEL
where any SET_DEST could be a STRICT_LOW_PART, ZERO_EXTEND or SIGN_EXTEND.

I see four possible approaches here:

- split up validate_change into one function that does the registering
of the change request -which is
then used in rtx_equiv_p - and another one that does the tentative
installation of the change
(preferrably for a range of changes).
- Have a separate function like above that only registers the basic
change request, but let validate_changes
figure out if a tentative change needs to be installed first, and
cancel_changes if a change actually
needs to be backed out.

These first to options have the disadvantage of extra overhead for the
exisitng code that uses validate_change,
although the first one could be made to be cost-free with code
duplication (via inlining or in the source).

- Have a function that undoes changes akin to cancel_changes, but does
not throw away
the basic information nor decrement num_changes. This function is
called after the SET_DEST
processing. Then have another function that re-instates the changes,
which is called after the
SET_SRC processing.
- Change the STRICT_LOW_PART, ZERO_EXTEND or SIGN_EXTEND processing:
- gcc_assert that they refer to a MEM or a REG (Is that safe, assuming
an lvalue?).
- If a MEM, process the address only once (it is an input).
- if a REG, do the rvalue == 0 processing like we do now, but set the
register as live in the end
in local_live (if the regs were different) or common_live (if they
were the same. Ignore when
processing rvalue == -1.
Bernd Schmidt
2005-12-07 13:11:42 UTC
Permalink
Post by Joern RENNECKE
I've just thought of another thing that won't work right with the
when we call validate_change, the change is installed instantly. Thus,
when we try to match
the same register again, we will fail. That is easy to fix for
PRE_MODIFY - just save the
register before doing the first potential change - but much harder when
you have a PARALLEL
where any SET_DEST could be a STRICT_LOW_PART, ZERO_EXTEND or SIGN_EXTEND.
Maybe we could extend RVALUE to also describe the case that we have an
in-out operand, and the REG handling could be changed to handle this
case as well? Then we'd have to scan each part of the insn only once.


Bernd
Joern RENNECKE
2005-12-07 17:50:07 UTC
Permalink
Post by Bernd Schmidt
Maybe we could extend RVALUE to also describe the case that we have an
in-out operand, and the REG handling could be changed to handle this
case as well? Then we'd have to scan each part of the insn only once.
Handling this right in POST_MODIFY and ZERO_EXTEND / SIGN_EXTEND /
ZERO_EXTRACT seems simpler.
I have attached my current patch, which I am regtesting on
i686-pc-linux-gnu native and X sh-elf. X sh64-elf testing is currently
not possible, we need to fix a build error in mainline first.

I have to admit that I haven't thought what to do about ASMs yet. There
should be no actual in-out operand problem, since we duplicate the rtl
for in-out operands, but I suppose the rtl
of output operands will be processed as inputs. I hope that'll be a
straightforward extra case.
Joern RENNECKE
2005-12-07 19:15:43 UTC
Permalink
Post by Joern RENNECKE
I have to admit that I haven't thought what to do about ASMs yet.
There should be no actual in-out operand problem, since we duplicate
the rtl for in-out operands, but I suppose the rtl
of output operands will be processed as inputs. I hope that'll be a
straightforward extra case.
On second thought, processing ASMs should work just fine, since actually
use SETs to show which registers are set, and wrap them in a PARALLEL if
there is more than one.
And if we break a constraint, verify_changes will tell us.
Joern RENNECKE
2005-12-08 15:50:10 UTC
Permalink
PR rtl-optimization/20070 / part1
* flow.c (update_life_info): If PROP_POST_REGSTACK is set, call
count_or_remove_death_notes with kill == -1.
(mark_set_1): Don't add REG_DEAD / REG_UNUSED notes for stack
registers if PROP_POST_REGSTACK is set.
(mark_used_reg): Likewise.
(count_or_remove_death_notes): If kill is -1, don't remove REG_DEAD /
REG_UNUSED notes for stack regs.
* cfgcleanup.c (condjump_equiv_p): Change parameters and processing
to match rtx_equiv_p machinery. Change caller.
(outgoing_edges_match): Likewise.
(try_crossjump_to_edge): Use struct_equiv_block_eq
instead of flow_find_cross_jump.
* basic-block.h (PROP_POST_REGSTACK, STRUCT_EQUIV_START): Define.
(STRUCT_EQUIV_RERUN, STRUCT_EQUIV_FINAL): Likewise.
(STRUCT_EQUIV_NEED_FULL_BLOCK, STRUCT_EQUIV_MATCH_JUMPS): Likewise.
(STRUCT_EQUIV_MAX_LOCAL): Likewise.
(struct struct_equiv_checkpoint, struct equiv_info): Likewise.
(insns_match_p): Update prototype.
(flow_find_cross_jump): Remove prototype.
(struct_equiv_block_eq, struct_equiv_init): Declare.
(rtx_equiv_p, condjump_equiv_p): Likewise.
* struct-equiv.c: Include reload.h.
(IMPOSSIBLE_MOVE_FACTOR): Define.
(assign_reg_reg_set, struct_equiv_make_checkpoint): New functions.
(struct_equiv_improve_checkpoint): Likewise.
(struct_equiv_restore_checkpoint, rtx_equiv_p): Likewise.
(set_dest_equiv_p, set_dest_addr_equiv_p, struct_equiv_init): Likewise.
(struct_equiv_merge, find_dying_input): Likewise.
resolve_input_conflict): Likewise.
(death_notes_match_p): Change parameters and processing
to match rtx_equiv_p machinery. Change caller.
(insns_match_p): Likewise.
(struct_equiv_block_eq).
* recog.c (verify_changes): Make it static.
* recog.h: Remove the corresponding prototype.
Regression tests passed on i686-pc-linux-gnu native, X sh-elf and X
sh64-elf.
Bernd Schmidt
2005-12-09 11:44:10 UTC
Permalink
+ /* Make the source of the pc sets unreadable so that we can insns_match_p
+ won't process it.
Grammar.
if (b1->dest == b2->dest)
- prob2 = b2->probability;
+ prob2 = b2->probability;
Next time, please separate out formatting changes.
/* Return true iff outgoing edges of BB1 and BB2 match, together with
the branch instruction. This means that if we commonize the control
flow before end of the basic block, the semantic remains unchanged.
+ If we need to compare jumps, we set STRUCT_EQUIV_MATCH_JUMPS in *MODE,
+ and pass it to struct_equiv_init.
That doesn't say what else MODE does, and you've replaced BB1 and BB2
with INFO. Please adjust the comment.
+
+ if (secondary_reload_class
+ (0,
+ REGNO_REG_CLASS (y_regno), x_mode, x_inner) != NO_REGS
+#ifdef SECONDARY_MEMORY_NEEDED
+ || SECONDARY_MEMORY_NEEDED (REGNO_REG_CLASS (y_regno),
+ REGNO_REG_CLASS (x_regno),
+ x_mode)
+#endif
+ )
+ y_change *= IMPOSSIBLE_MOVE_FACTOR;
+ }
+ info->cur.input_count += y_change;
+ info->cur.version++;
This code appears twice. Please split out into a function.
+ {
+ int j;
+
+ if (!set_dest_equiv_p (x, y, info))
+ return false;
+ for (j = 0; j < XVECLEN (x, 0); ++j)
+ if (! rtx_equiv_p (&XVECEXP (x, 0, j), XVECEXP (y, 0, j), -1, info))
+ return false;
+ return true;
+ }
This doesn't do anything different from the normal recursive bit at the
end of the function, does it? If it isn't needed, delete this part.
+ /* After reg-stack. Remove bogus life info about stack regs. */
+ for (rn = FIRST_STACK_REG; rn < LAST_STACK_REG; rn++)
+ {
+ CLEAR_REGNO_REG_SET (info->x_block->il.rtl->global_live_at_end, rn);
+ CLEAR_REGNO_REG_SET (info->y_block->il.rtl->global_live_at_end, rn);
+ }
Are they really all dead at the end of each block? It's been a long
time since I looked at reg-stack, but this looks odd to me.

Apart from these comments, I think this patch is going to be ok.


Bernd
Joern RENNECKE
2005-12-12 18:58:28 UTC
Permalink
Post by Bernd Schmidt
This code appears twice. Please split out into a function.
While doing that, I also found that the second site was missing
processing of info->common_live.
Post by Bernd Schmidt
This doesn't do anything different from the normal recursive bit at
the end of the function, does it? If it isn't needed, delete this part.
It does something different - it first processes set destinations, and
then sources. That is the right thing to do if rvalue == 1.
However, AFAICS there is no caller left which can cause rvalue == 1 when
we are processing a PARALLEL. Instead, it will
always be -1, which is correctly handled by the default processing.
Therefore, I have replaced this code with a gcc_assert and a
break; .
Post by Bernd Schmidt
Are they really all dead at the end of each block? It's been a long
time since I looked at reg-stack, but this looks odd to me.
No, they are not all dead. But we don't have any up-to-date information
on these registers, because reg-stack changes the actual register
numbers used, but it doesn't update the global_live_at{start,end}. And
update_live_info lacks the smarts to do a proper update. (The
bogosity of the live info is taken care of by not changing register
usage by stack regs after reg-stack.) Since we are changing some
blocks by cross-jumping, we end up with live information on the stack
regs that is not only bogus, but also inconsistent between different
blocks with identical successors. I have amended the comment to make it
clear that this is not about generating correct information, but merely
consistent information.

I have attached the updated patch that takes your comments into
account. regtesting is underway.
Bernd Schmidt
2005-12-13 12:17:37 UTC
Permalink
Post by Joern RENNECKE
Post by Bernd Schmidt
This doesn't do anything different from the normal recursive bit at
the end of the function, does it? If it isn't needed, delete this part.
It does something different - it first processes set destinations, and
then sources. That is the right thing to do if rvalue == 1.
However, AFAICS there is no caller left which can cause rvalue == 1 when
we are processing a PARALLEL. Instead, it will
always be -1, which is correctly handled by the default processing.
Therefore, I have replaced this code with a gcc_assert and a
break; .
That's what I meant. Thanks.
Post by Joern RENNECKE
Post by Bernd Schmidt
Are they really all dead at the end of each block? It's been a long
time since I looked at reg-stack, but this looks odd to me.
No, they are not all dead. But we don't have any up-to-date information
on these registers, because reg-stack changes the actual register
numbers used, but it doesn't update the global_live_at{start,end}. And
update_live_info lacks the smarts to do a proper update. (The
bogosity of the live info is taken care of by not changing register
usage by stack regs after reg-stack.) Since we are changing some
blocks by cross-jumping, we end up with live information on the stack
regs that is not only bogus, but also inconsistent between different
blocks with identical successors.
Uh, ok. Sounds like there's a project to rewrite regstack in there
somewhere, but I guess it's out of scope for this :-)
Post by Joern RENNECKE
I have attached the updated patch that takes your comments into
account. regtesting is underway.
This is ok, please install after testing. Thanks!


Bernd
Alan Modra
2005-12-14 01:26:24 UTC
Permalink
Both powerpc-linux and powerpc64-linux die compiling libgfortran. Looks
to be your recent patches. The powerpc64 failure is

matmul_c4.c: In function 'matmul_c4':
matmul_c4.c:274: internal compiler error: in rtx_equiv_p, at struct-equiv.c:644

(gdb) up
#2 0x00000000103bb790 in set_dest_equiv_p (x=0x8000679010, y=0x80006791d0,
info=0x1ffffffeb80) at /src/gcc-current/gcc/struct-equiv.c:761
(gdb) p debug_rtx (x)
(parallel [
(set (parallel:SC [
(expr_list:REG_DEP_ANTI (reg:SF 33 1)
(const_int 0 [0x0]))
(expr_list:REG_DEP_ANTI (reg:SF 34 2)
(const_int 4 [0x4]))
])
(call (mem:SI (symbol_ref:DI ("__mulsc3") [flags 0x41] <function_decl 0x8000299200 __mulsc3>) [0 S4 A8])
(const_int 64 [0x40])))
(use (const_int 0 [0x0]))
(clobber (reg:SI 65 lr))
])

That one looks relatively easy to fix.

The powerpc-linux one is
/home/alan/build/ppc/gcc-curr/./gcc/xgcc
-B/home/alan/build/ppc/gcc-curr/./gcc/ -B/usr/local/powerpc-linux/bin/
-B/usr/local/powerpc-linux/lib/ -isystem /usr/local/powerpc-linux/include
-isystem /usr/local/powerpc-linux/sys-include -DHAVE_CONFIG_H
-I. -I/src/gcc-current/libgfortran -I. -iquote/src/gcc-current/libgfortran/io
-I/src/gcc-current/libgfortran/../gcc
-I/src/gcc-current/libgfortran/../gcc/config -I../.././gcc -D_GNU_SOURCE
-std=gnu99 -Wall -Wstrict-prototypes -Wmissing-prototypes
-Wold-style-definition -Wextra -Wwrite-strings -O2 -g -O2 -c
/src/gcc-current/libgfortran/generated/product_c8.c -fPIC -DPIC -o
.libs/product_c8.o
product_c8.c: In function 'product_c8':
product_c8.c:173: internal compiler error: in note_local_live, at struct-equiv.c:315

(gdb) up
#1 0x10385628 in note_local_live (info=0xffffdc88, x=0x40247fa8,
y=<value optimized out>, rvalue=<value optimized out>)
at /src/gcc-current/gcc/struct-equiv.c:315
(gdb)
#2 0x10386068 in rtx_equiv_p (xp=<value optimized out>, y=0x40247c90,
rvalue=1, info=0xffffdc88) at /src/gcc-current/gcc/struct-equiv.c:538
(gdb) p debug_rtx (x)
(reg:DF 5 5 [orig:272+8 ] [272])
$2 = void
(gdb) p debug_rtx (y)
(reg:DF 45 13 [266])
$3 = void
(gdb) up
#3 0x10385a24 in rtx_equiv_p (xp=0x402369ac, y=0x4024a3d0, rvalue=-1,
info=0xffffdc88) at /src/gcc-current/gcc/struct-equiv.c:557
(gdb)
#4 0x10386b54 in insns_match_p (i1=0x40236990, i2=0x40243cf0,
info=0xffffdc88) at /src/gcc-current/gcc/struct-equiv.c:926
(gdb) p debug_rtx (i1)
(insn:HI 341 340 342 32 /src/gcc-current/libgfortran/generated/product_c8.c:141 (set (mem/s:DF (plus:SI (reg/v/f:SI 28 28 [orig:163 dest ] [163])
(const_int 8 [0x8])) [17 S8 A64])
(reg:DF 5 5 [orig:272+8 ] [272])) 328 {*movdf_hardfloat32} (nil)
(expr_list:REG_DEAD (reg:DF 5 5 [orig:272+8 ] [272])
(nil)))
$4 = void
(gdb) p debug_rtx (i2)
(insn:HI 301 299 603 29 /src/gcc-current/libgfortran/generated/product_c8.c:133 (set (mem:DF (plus:SI (reg/v/f:SI 28 28 [orig:163 dest ] [163])
(const_int 8 [0x8])) [17 S8 A64])
(reg:DF 45 13 [266])) 328 {*movdf_hardfloat32} (insn_list:REG_DEP_TRUE 300 (nil))
(expr_list:REG_DEAD (reg:DF 45 13 [266])
(nil)))
$5 = void

This may not be enough info for you to figure out what is going on,
because I see rtx_equiv_p modifies its args. .i attached.
--
Alan Modra
IBM OzLabs - Linux Technology Centre
Joern RENNECKE
2005-12-14 14:20:30 UTC
Permalink
Post by Alan Modra
Both powerpc-linux and powerpc64-linux die compiling libgfortran. Looks
to be your recent patches. The powerpc64 failure is
matmul_c4.c:274: internal compiler error: in rtx_equiv_p, at struct-equiv.c:644
(gdb) up
#2 0x00000000103bb790 in set_dest_equiv_p (x=0x8000679010, y=0x80006791d0,
info=0x1ffffffeb80) at /src/gcc-current/gcc/struct-equiv.c:761
(gdb) p debug_rtx (x)
(parallel [
(set (parallel:SC [
(expr_list:REG_DEP_ANTI (reg:SF 33 1)
(const_int 0 [0x0]))
(expr_list:REG_DEP_ANTI (reg:SF 34 2)
(const_int 4 [0x4]))
])
(call (mem:SI (symbol_ref:DI ("__mulsc3") [flags 0x41] <function_decl 0x8000299200 __mulsc3>) [0 S4 A8])
(const_int 64 [0x40])))
(use (const_int 0 [0x0]))
(clobber (reg:SI 65 lr))
])
That one looks relatively easy to fix.
I have checked in the attached patch under the obvious rule.
Andreas Schwab
2005-12-14 14:23:11 UTC
Permalink
Post by Alan Modra
Both powerpc-linux and powerpc64-linux die compiling libgfortran. Looks
to be your recent patches. The powerpc64 failure is
matmul_c4.c:274: internal compiler error: in rtx_equiv_p, at struct-equiv.c:644
Same on ia64.

Andreas.
--
Andreas Schwab, SuSE Labs, ***@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Joern RENNECKE
2005-12-14 14:34:06 UTC
Permalink
Post by Alan Modra
(gdb) p debug_rtx (x)
(reg:DF 5 5 [orig:272+8 ] [272])
$2 = void
(gdb) p debug_rtx (y)
(reg:DF 45 13 [266])
$3 = void
Does HARD_REGNO_NREGS differ for these two registers?
My patch for PR 20070 part 2 has some code that makes the assert in
note_local_live to handle these properly.
Alan Modra
2005-12-14 15:14:20 UTC
Permalink
Post by Joern RENNECKE
Post by Alan Modra
(gdb) p debug_rtx (x)
(reg:DF 5 5 [orig:272+8 ] [272])
$2 = void
(gdb) p debug_rtx (y)
(reg:DF 45 13 [266])
$3 = void
Does HARD_REGNO_NREGS differ for these two registers?
Yes. On powerpc, reg 5 is r5, a 32-bit gp register, reg 45 is fr13, a
64-bit fp reg.
--
Alan Modra
IBM OzLabs - Linux Technology Centre
Joern RENNECKE
2005-12-14 16:40:59 UTC
Permalink
Post by Alan Modra
Yes. On powerpc, reg 5 is r5, a 32-bit gp register, reg 45 is fr13, a
64-bit fp reg.
I have checked in the attached hunk as an obvious fix.

Still, we should find out why insns 296 is deleted or moved by your
compiler, but not by mine.
Alan Modra
2005-12-14 23:37:47 UTC
Permalink
* struct-equiv.c (note_local_live): Handle hard regs with different
hard_regno_nregs.
Fixes the testcase failure, thanks! Trying another bootstrap.
--
Alan Modra
IBM OzLabs - Linux Technology Centre
Joern RENNECKE
2005-12-14 15:03:56 UTC
Permalink
Post by Alan Modra
product_c8.c:173: internal compiler error: in note_local_live, at struct-equiv.c:315
(gdb) up
#1 0x10385628 in note_local_live (info=0xffffdc88, x=0x40247fa8,
y=<value optimized out>, rvalue=<value optimized out>)
at /src/gcc-current/gcc/struct-equiv.c:315
(gdb)
Hmm, I can't reproduce this.

bash-2.05b$ ./cc1 product_c8.i -std=gnu99 -Wall -Wstrict-prototypes
-Wmissing-prototypes -Wold-style-definition -Wextra -Wwrite-strings -O2
-g -O2 -fPIC -DPIC -quiet;echo $?
0

What version have you checked out?
Alan Modra
2005-12-14 15:18:38 UTC
Permalink
Post by Joern RENNECKE
Hmm, I can't reproduce this.
bash-2.05b$ ./cc1 product_c8.i -std=gnu99 -Wall -Wstrict-prototypes
-Wmissing-prototypes -Wold-style-definition -Wextra -Wwrite-strings -O2
-g -O2 -fPIC -DPIC -quiet;echo $?
0
What version have you checked out?
108480, with a few local patches. I didn't have your PR 20070 part 2
patch.
--
Alan Modra
IBM OzLabs - Linux Technology Centre
Joern RENNECKE
2005-12-14 15:47:13 UTC
Permalink
Post by Alan Modra
(gdb) p debug_rtx (i2)
(insn:HI 301 299 603 29 /src/gcc-current/libgfortran/generated/product_c8.c:133 (set (mem:DF (plus:SI (reg/v/f:SI 28 28 [orig:163 dest ] [163])
(const_int 8 [0x8])) [17 S8 A64])
(reg:DF 45 13 [266])) 328 {*movdf_hardfloat32} (insn_list:REG_DEP_TRUE 300 (nil))
(expr_list:REG_DEAD (reg:DF 45 13 [266])
(nil)))
$5 = void
The match is not attempted with my cross compiler because insn 296 is in
the way:

(gdb) call debug_rtx_list(yi,-5)
(insn:HI 299 300 301 29
/src/gcc-current/libgfortran/generated/product_c8.c:133 (set (mem:DF
(reg/v/f:SI 28 28 [orig:163 dest ] [163]) [17 S8 A64])
(reg:DF 32 0 [265])) 257 {*movdf_hardfloat32}
(insn_list:REG_DEP_TRUE 298 (nil))
(expr_list:REG_DEAD (reg:DF 32 0 [265])
(nil)))

(insn:HI 301 299 296 29
/src/gcc-current/libgfortran/generated/product_c8.c:133 (set (mem:DF
(plus:SI (reg/v/f:SI 28 28 [orig:163 dest ] [163])
(const_int 8 [0x8])) [17 S8 A64])
(reg:DF 45 13 [266])) 257 {*movdf_hardfloat32}
(insn_list:REG_DEP_TRUE 300 (nil))
(expr_list:REG_DEAD (reg:DF 45 13 [266])
(nil)))

(insn 296 301 603 29
/src/gcc-current/libgfortran/generated/product_c8.c:133 (use
(symbol_ref/f:SI ("*.LC2") [flags 0x2] <complex_cst 0xb5900f30>)) -1 (nil)
(nil))

(jump_insn:HI 603 296 604 29 (set (pc)
(label_ref 342)) 413 {jump} (insn_list:REG_DEP_ANTI 296
(insn_list:REG_DEP_TRUE 298 (insn_list:REG_DEP_TRUE 300
(insn_list:REG_DEP_ANTI 299 (insn_list:REG_DEP_ANTI 301 (nil))))))
(nil))

(barrier:HI 604 603 304)

Since your insn numbers are otherwise the same, I suspect that insn 296
was also emitted in your compiler, but later deleted or moved.
Apparently, the code
that does not work correctly either or your or my system. Could you
find out when and where insn 296 is moved / deleted on your system?
Alan Modra
2005-12-14 23:36:28 UTC
Permalink
Post by Joern RENNECKE
The match is not attempted with my cross compiler because insn 296 is in
[snip]

Why is is "in the way"? BTW, the "use" is just the way the rs6000
backend ties together a toc load with whatever is stored in the toc. So
when we load an address from the toc, the item addressed won't be
deleted as unused.
Post by Joern RENNECKE
Since your insn numbers are otherwise the same, I suspect that insn 296
was also emitted in your compiler, but later deleted or moved.
Apparently, the code
that does not work correctly either or your or my system. Could you
find out when and where insn 296 is moved / deleted on your system?
Insn 296 is moved in sched1. I'd have thought that insn 296 and insn
297 ought to stay together, because the "use" belongs with the load of
reg 264. In which case whatever is deleting insn 297 in oldloop
probably ought to also remove insn 296.

296 is before 299 the first time I hit note_local_live.

Breakpoint 6, note_local_live (info=0xffffdc40, x=0x4024b7b0, y=0x4024b498,
rvalue=1) at /src/gcc-current/gcc/struct-equiv.c:321
(gdb) up
#1 0x10606260 in rtx_equiv_p (xp=0x4024ce4c, y=0x4024b498, rvalue=1,
info=0xffffdc40) at /src/gcc-current/gcc/struct-equiv.c:541
(gdb)
#2 0x106062e0 in rtx_equiv_p (xp=0x40237dfc, y=0x4024c820, rvalue=-1,
info=0xffffdc40) at /src/gcc-current/gcc/struct-equiv.c:554
(gdb)
#3 0x106075f0 in insns_match_p (i1=0x40237de0, i2=0x40246f60,
info=0xffffdc40) at /src/gcc-current/gcc/struct-equiv.c:932
(gdb)
#4 0x1060811c in struct_equiv_block_eq (mode=531, info=0xffffdc40)
at /src/gcc-current/gcc/struct-equiv.c:1168
(gdb) p debug_rtx_list (yi, -5)
(insn 296 300 299 29 /src/gcc-current/libgfortran/generated/product_c8.c:133 (use (symbol_ref/f:SI ("*.LC2") [flags 0x2] <complex_cst 0x4024b390>)) -1 (nil)
(nil))

(insn:HI 299 296 301 29 /src/gcc-current/libgfortran/generated/product_c8.c:133 (set (mem:DF (reg/v/f:SI 28 28 [orig:163 dest ] [163]) [17 S8 A64])
(reg:DF 32 0 [265])) 328 {*movdf_hardfloat32} (insn_list:REG_DEP_TRUE 298 (nil))
(expr_list:REG_DEAD (reg:DF 32 0 [265])
(nil)))

(insn:HI 301 299 603 29 /src/gcc-current/libgfortran/generated/product_c8.c:133 (set (mem:DF (plus:SI (reg/v/f:SI 28 28 [orig:163 dest ] [163])
(const_int 8 [0x8])) [17 S8 A64])
(reg:DF 45 13 [266])) 328 {*movdf_hardfloat32} (insn_list:REG_DEP_TRUE 300 (nil))
(expr_list:REG_DEAD (reg:DF 45 13 [266])
(nil)))

(jump_insn:HI 603 301 604 29 (set (pc)
(label_ref 342)) 545 {jump} (insn_list:REG_DEP_ANTI 296 (insn_list:REG_DEP_TRUE 298 (insn_list:REG_DEP_TRUE 300 (insn_list:REG_DEP_ANTI 299 (insn_list:REG_DEP_ANTI 301 (nil))))))
(nil))

(barrier:HI 604 603 304)

called from

(gdb) p *pass
$7 = {name = 0x10a40354 "csa",
gate = 0x1085bb9c <gate_handle_stack_adjustments>,
execute = 0x1085bbd4 <rest_of_handle_stack_adjustments>, sub = 0x0,
next = 0x10a5bd88, static_pass_number = 145, tv_id = 0,
properties_required = 264, properties_provided = 264,
properties_destroyed = 0, todo_flags_start = 0, todo_flags_finish = 3,
letter = 0 '\0'}


======== .00.expand
(insn 296 295 297 29 (use (symbol_ref/f:SI ("*.LC2") [flags 0x2] <complex_cst 0x4025a708>)) -1 (nil)
(nil))

(insn 297 296 298 29 (set (reg/f:SI 264)
(mem/u/c:SI (plus:SI (reg:SI 30 30)
(const:SI (minus:SI (symbol_ref/u:SI ("*.LC3") [flags 0x2])
(symbol_ref:SI ("*.LCTOC1"))))) [16 S4 A8])) -1 (nil)
(expr_list:REG_EQUAL (symbol_ref/f:SI ("*.LC2") [flags 0x2] <complex_cst 0x4025a708>)
(nil)))

(insn 298 297 299 29 (set (reg:DF 265)
(mem/i:DF (reg/f:SI 264) [1 S8 A64])) -1 (nil)
(nil))

(insn 299 298 300 29 (set (mem:DF (reg/v/f:SI 163 [ dest ]) [17 S8 A64])
(reg:DF 265)) -1 (nil)
(nil))

======== .11.oldloop has deleted 297 because reg 264 is already set
(insn 296 294 298 31 /src/gcc-current/libgfortran/generated/product_c8.c:133 (use (symbol_ref/f:SI ("*.LC2") [flags 0x2] <complex_cst 0x4025a708>)) -1 (nil)
(nil))

(insn 298 296 299 31 /src/gcc-current/libgfortran/generated/product_c8.c:133 (set (reg:DF 265)
(mem/i:DF (reg/f:SI 264) [1 S8 A64])) -1 (nil)
(nil))

(insn 299 298 300 31 /src/gcc-current/libgfortran/generated/product_c8.c:133 (set (mem:DF (reg/v/f:SI 163 [ dest ]) [17 S8 A64])
(reg:DF 265)) -1 (nil)
(nil))

======== .33.life2 has
(insn:HI 296 294 298 29 /src/gcc-current/libgfortran/generated/product_c8.c:133 (use (symbol_ref/f:SI ("*.LC2") [flags 0x2] <complex_cst 0x4025a708>)) -1 (nil)
(nil))

(insn:HI 298 296 299 29 /src/gcc-current/libgfortran/generated/product_c8.c:133 (set (reg:DF 265)
(mem/i:DF (reg/f:SI 264) [1 S8 A64])) 328 {*movdf_hardfloat32} (nil)
(expr_list:REG_EQUAL (mem/i:DF (symbol_ref/f:SI ("*.LC2") [flags 0x2] <complex_cst 0x4025a708>) [1 S8 A64])
(nil)))

(insn:HI 299 298 300 29 /src/gcc-current/libgfortran/generated/product_c8.c:133 (set (mem:DF (reg/v/f:SI 163 [ dest ]) [17 S8 A64])
(reg:DF 265)) 328 {*movdf_hardfloat32} (insn_list:REG_DEP_TRUE 298 (nil))
(expr_list:REG_DEAD (reg:DF 265)
(nil)))

======== .35.sched1 moves things around
(insn:HI 298 294 300 29 /src/gcc-current/libgfortran/generated/product_c8.c:133 (set (reg:DF 265)
(mem/i:DF (reg/f:SI 264) [1 S8 A64])) 328 {*movdf_hardfloat32} (nil)
(expr_list:REG_EQUAL (mem/i:DF (symbol_ref/f:SI ("*.LC2") [flags 0x2] <complex_cst 0x4025a708>) [1 S8 A64])
(nil)))

(insn:HI 300 298 296 29 /src/gcc-current/libgfortran/generated/product_c8.c:133 (set (reg:DF 266)
(mem/i:DF (plus:SI (reg/f:SI 264)
(const_int 8 [0x8])) [1 S8 A64])) 328 {*movdf_hardfloat32} (nil)
(expr_list:REG_EQUAL (mem/i:DF (const:SI (plus:SI (symbol_ref/f:SI ("*.LC2") [flags 0x2] <complex_cst 0x4025a708>)
(const_int 8 [0x8]))) [1 S8 A64])
(nil)))

(insn:HI 296 300 299 29 /src/gcc-current/libgfortran/generated/product_c8.c:133 (use (symbol_ref/f:SI ("*.LC2") [flags 0x2] <complex_cst 0x4025a708>)) -1 (nil)
(nil))

Hmm, I see sched disassociates the "use" from the insn that references
LC2 via the toc anyway, even where the insn that loads from the toc is
not deleted like insn 297.
--
Alan Modra
IBM OzLabs - Linux Technology Centre
Joern RENNECKE
2005-12-19 12:33:07 UTC
Permalink
Post by Alan Modra
Post by Joern RENNECKE
The match is not attempted with my cross compiler because insn 296 is in
[snip]
Why is is "in the way"?
The other block we compare against has no equivalent USE insn, so the
comparison
stops early with a mismatch.
Post by Alan Modra
BTW, the "use" is just the way the rs6000
backend ties together a toc load with whatever is stored in the toc. So
when we load an address from the toc, the item addressed won't be
deleted as unused.
AFAIK there are no guarantees that such a use will stay where you put
it. This sounds like you
lower too early.

possible alternative strategies:
- allow all constants till after reload, and fix them up in
machine_dependent_reorg (that's what the SH does - but there it is
compilcated by restrictions on the distance between reference and constant).

or

- instead of using a label_ref for the address of the constant, use an
unspec containing the constant. You might fix this up in
machine_dependent_reorg, or even build this modules contribution to the
toc lazily during final as instructions are emitted that use addresses
of entries.
Ian Lance Taylor
2005-12-14 01:15:13 UTC
Permalink
+/* Check if *XP is equivalent to Y. Until an an unreconcilable difference is
+ found, use in-group changes with validate_change on *XP to make register
+ assignments agree. It is the (not necessarily direct) callers
+ responsibility to verify / confirm / cancel these changes, as appropriate.
+ RVALUE indicates if the processed piece of rtl is used as a destination, in
+ which case we can't have different registers being an input. Returns
+ nonzero if the two blocks have been identified as equivalent, zero otherwise.
+ RVALUE == 0: destination
+ RVALUE == 1: source
+ RVALUE == -1: source, ignore SET_DEST of SET / clobber. */
+bool
+rtx_equiv_p (rtx *xp, rtx y, int rvalue, struct equiv_info *info)
I think that it's very unwise to have a function named rtx_equiv_p
which makes changes to its operands. The "_p" suffix implies that the
function is a predicate. Predicates should never change their
operands. The name rtx_equiv_p strongly suggests a function which
tests whether two rtx variables are equivalent. That is not what this
function does.

I also find the comment to be somewhat unclear; what does it mean "to
make register assignments agree?"

Ian
Joern RENNECKE
2005-12-14 14:57:10 UTC
Permalink
Post by Ian Lance Taylor
I think that it's very unwise to have a function named rtx_equiv_p
which makes changes to its operands. The "_p" suffix implies that the
function is a predicate. Predicates should never change their
operands. The name rtx_equiv_p strongly suggests a function which
tests whether two rtx variables are equivalent. That is not what this
function does.
Bernd asked for *_p names for these functions in:
http://gcc.gnu.org/ml/gcc-patches/2005-11/msg02031.html .

Can you please discuss this with him and then tell me what consensus you
reached (if any).
Post by Ian Lance Taylor
I also find the comment to be somewhat unclear; what does it mean "to
make register assignments agree?"
For registers that are dead at the end, it means to make them identical.
For registers that are live at the end and not written to within the
block, it means to replace the original register in *xp with a
placeholder register INFO->input_reg.
Ian Lance Taylor
2005-12-14 20:17:17 UTC
Permalink
Post by Joern RENNECKE
Post by Ian Lance Taylor
I think that it's very unwise to have a function named rtx_equiv_p
which makes changes to its operands. The "_p" suffix implies that the
function is a predicate. Predicates should never change their
operands. The name rtx_equiv_p strongly suggests a function which
tests whether two rtx variables are equivalent. That is not what this
function does.
http://gcc.gnu.org/ml/gcc-patches/2005-11/msg02031.html .
Can you please discuss this with him and then tell me what consensus
you reached (if any).
Basically I think rtx_equiv_p is a really bad name for a function
which is quite different from rtx_equal_p.

And, more generally, I think predicates shouldn't modify their
arguments.

Bernd, any comments?
Post by Joern RENNECKE
Post by Ian Lance Taylor
I also find the comment to be somewhat unclear; what does it mean "to
make register assignments agree?"
For registers that are dead at the end, it means to make them identical.
For registers that are live at the end and not written to within the
block, it means to replace the original register in *xp with a
placeholder register INFO->input_reg.
Can we put something like that in the comment? Or is the algorithm
described in some other comment somewhere?

Ian
Bernd Schmidt
2005-12-14 15:45:19 UTC
Permalink
Post by Joern RENNECKE
Post by Ian Lance Taylor
I think that it's very unwise to have a function named rtx_equiv_p
which makes changes to its operands. The "_p" suffix implies that the
function is a predicate. Predicates should never change their
operands. The name rtx_equiv_p strongly suggests a function which
tests whether two rtx variables are equivalent. That is not what this
function does.
http://gcc.gnu.org/ml/gcc-patches/2005-11/msg02031.html .
Can you please discuss this with him and then tell me what consensus you
reached (if any).
I'm open for better suggestions. I see your point that a predicate
shouldn't modify its inputs, but for the most part, this function _does_
verify that the two rtxes are equivalent.


Bernd
Mike Stump
2005-12-16 18:55:31 UTC
Permalink
Post by Bernd Schmidt
Post by Joern RENNECKE
http://gcc.gnu.org/ml/gcc-patches/2005-11/msg02031.html .
Can you please discuss this with him and then tell me what
consensus you reached (if any).
I'm open for better suggestions. I see your point that a predicate
shouldn't modify its inputs, but for the most part, this function
_does_ verify that the two rtxes are equivalent.
bla_p names are reserved for a small class of functions that never
hurt anything. If you can tell it did something, then, it can't be
called bla_p. An example of a reasonable use of _p would be if the
object contains some bits that are modified that serve as a cache or
a speed up to make things go faster. And example of something that
isn't ok would be if you merge bits from two delcs into one of them
and return whether or not they were the same.

A cursory look at the code appears to be a modification of at least
one of the operands. unify/merge/smash come to mind for names, not _p.
Ian Lance Taylor
2005-12-20 06:26:37 UTC
Permalink
Post by Bernd Schmidt
Post by Joern RENNECKE
Post by Ian Lance Taylor
I think that it's very unwise to have a function named rtx_equiv_p
which makes changes to its operands. The "_p" suffix implies that the
function is a predicate. Predicates should never change their
operands. The name rtx_equiv_p strongly suggests a function which
tests whether two rtx variables are equivalent. That is not what this
function does.
http://gcc.gnu.org/ml/gcc-patches/2005-11/msg02031.html .
Can you please discuss this with him and then tell me what consensus
you reached (if any).
I'm open for better suggestions. I see your point that a predicate
shouldn't modify its inputs, but for the most part, this function
_does_ verify that the two rtxes are equivalent.
Yes, but it does more than that too.

I would suggest renaming insns_match_p to struct_equiv_match_insns,
and renaming rtx_equiv_p to struct_equiv_match_rtx.

Ian
Ulrich Weigand
2005-12-14 19:33:26 UTC
Permalink
PR rtl-optimization/20070 / part1
This seems to cause a new bad-code regression on s390(x)
FAIL: gcc.c-torture/execute/stdarg-2.c execution, -Os

What happens is that before if-conversion (.ce3), I have

block A:
...
(set (reg 2 [71]) (symbol_ref X))
(set (mem (reg 2 [71])) (reg 1 [74]))
(set (pc) (label_ref L))

block B:
...
(set (reg 1 [84]) (symbol_ref X))
(set (mem (reg 1 [84])) (reg 2 [87]))
(set (pc) (label_ref L))

and after .ce3 I see

block A:
...
(set (reg 1 [84]) (reg 2 [71])) (***)
(set (reg 2 [87]) (reg 1 [74]))
(set (pc) (label_ref N))

block B:
...
new block (label_ref N):
(set (reg 1 [84]) (symbol_ref X))
(set (mem (reg 1 [84])) (reg 2 [87]))


It would appear the insn marked (***) is the incorrect one ...

Any suggestions how to fix or further debug this?

Bye,
Ulrich
--
Dr. Ulrich Weigand
Linux on zSeries Development
***@de.ibm.com
Joern RENNECKE
2005-12-19 12:13:33 UTC
Permalink
Post by Ulrich Weigand
PR rtl-optimization/20070 / part1
What happens is that before if-conversion (.ce3), I have
..
Post by Ulrich Weigand
and after .ce3 I see
..
Post by Ulrich Weigand
Any suggestions how to fix or further debug this?
part 1 doesn't even touch if-conversion. Either if-conversion got
confused by some invalid
input (e.g., is the flowgraph corrupt?), or it has a latent bug that is
triggered.
Ulrich Weigand
2005-12-19 17:03:51 UTC
Permalink
Post by Joern RENNECKE
part 1 doesn't even touch if-conversion. Either if-conversion got
confused by some invalid
input (e.g., is the flowgraph corrupt?), or it has a latent bug that is
triggered.
Sorry, I was reading the wrong file. The corrupt code gets introduced
already in .41.csa, during the cross-jumping optimization:

Cross jumping from bb 14 to bb 21; 2 common insns, 2 local registers

Bye,
Ulrich
--
Dr. Ulrich Weigand
Linux on zSeries Development
***@de.ibm.com
Ulrich Weigand
2005-12-19 20:46:50 UTC
Permalink
Post by Ulrich Weigand
...
(set (reg 2 [71]) (symbol_ref X))
(set (mem (reg 2 [71])) (reg 1 [74]))
(set (pc) (label_ref L))
...
(set (reg 1 [84]) (symbol_ref X))
(set (mem (reg 1 [84])) (reg 2 [87]))
(set (pc) (label_ref L))
and after .ce3 I see
...
(set (reg 1 [84]) (reg 2 [71])) (***)
(set (reg 2 [87]) (reg 1 [74]))
(set (pc) (label_ref N))
...
(set (reg 1 [84]) (symbol_ref X))
(set (mem (reg 1 [84])) (reg 2 [87]))
It would appear the insn marked (***) is the incorrect one ...
Some more data: the insn (***) is generated here

#0 make_insn_raw (pattern=0x77e720c0) at ../../gcc-head/gcc/emit-rtl.c:3314
#1 0x005a9912 in emit_insn (x=0x77e7b780) at ../../gcc-head/gcc/emit-rtl.c:4461
#2 0x0063ee1e in gen_movsi (operand0=0x77e720c0, operand1=0x77e67d50) at s390.md:1151
#3 0x005d2bec in emit_move_insn_1 (x=0x77e720c0, y=0x77e67d50) at ../../gcc-head/gcc/expr.c:3021
#4 0x006772ba in gen_move_insn (x=0x77e720c0, y=0x77e67d50) at ../../gcc-head/gcc/optabs.c:4233
#5 0x0053c566 in try_crossjump_to_edge (mode=19, e1=0x77e7b780, e2=0x77e614e0) at ../../gcc-head/gcc/cfgcleanup.c:1301
#6 0x0053d5b0 in try_crossjump_bb (mode=19, bb=0x77e5f2d8) at ../../gcc-head/gcc/cfgcleanup.c:1568
#7 0x0053e2c8 in cleanup_cfg (mode=19) at ../../gcc-head/gcc/cfgcleanup.c:1757
#8 0x00809fe2 in rest_of_handle_stack_adjustments () at ../../gcc-head/gcc/regmove.c:2512
#9 0x0071a128 in execute_one_pass (pass=0x92f878) at ../../gcc-head/gcc/passes.c:845
#10 0x0071a2ac in execute_pass_list (pass=0x92f878) at ../../gcc-head/gcc/passes.c:877
#11 0x0071a2bc in execute_pass_list (pass=0x92ea44) at ../../gcc-head/gcc/passes.c:878
#12 0x0071a2bc in execute_pass_list (pass=0x92ea78) at ../../gcc-head/gcc/passes.c:878
#13 0x0048de1a in tree_rest_of_compilation (fndecl=0x77e55100) at ../../gcc-head/gcc/tree-optimize.c:419
#14 0x00415646 in c_expand_body (fndecl=0x77e55100) at ../../gcc-head/gcc/c-decl.c:6648
#15 0x00763fb6 in cgraph_expand_function (node=0x77dba3f0) at ../../gcc-head/gcc/cgraphunit.c:1055
#16 0x00764fb8 in cgraph_optimize () at ../../gcc-head/gcc/cgraphunit.c:1121
#17 0x00417ed8 in c_write_global_declarations () at ../../gcc-head/gcc/c-decl.c:7754
#18 0x006e68c0 in toplev_main (argc=9819952, argv=0x586a2c) at ../../gcc-head/gcc/toplev.c:1003
#19 0x0046c5f8 in main (argc=2011674496, argv=0x77e34c00) at ../../gcc-head/gcc/main.c:35

in try_crossjump_to_edge:

1299 for (i = 0; i < info.cur.local_count; i++)
1300 if (info.local_rvalue[i])
1301 emit_insn_before (gen_move_insn (info.x_local[i], info.y_local[i]),
1302 y_active);

where

(gdb) print info.cur.local_count
$2 = 2
(gdb) call debug_rtx (info.x_local[0])
(reg/f:SI 1 %r1 [83])
(gdb) call debug_rtx (info.y_local[0])
(reg/f:SI 2 %r2 [70])
(gdb) call debug_rtx (info.x_local[1])
(reg:SI 2 %r2 [86])
(gdb) call debug_rtx (info.y_local[1])
(reg:SI 1 %r1 [73])

which doesn't seem right. In fact, there's code in resolve_input_conflict
that's supposed to prevent just this problem, but this function is never
even called in my test case, because info->check_input_conflict is never
set. This in turn happens because this piece of code in struct_equiv_block_eq,
which is supposed to set that flag as far as I can tell:

if (!input_conflict
&& info->dying_inputs > 1
&& bitmap_intersect_p (info->x_local_live, info->y_local_live))
{
regset_head clobbered_regs;

INIT_REG_SET (&clobbered_regs);
for (i = 0; i < info->cur.local_count; i++)
{
if (assign_reg_reg_set (&clobbered_regs, info->y_local[i], 0))
{
input_conflict = true;
break;
}
assign_reg_reg_set (&clobbered_regs, info->x_local[i], 1);
}
CLEAR_REG_SET (&clobbered_regs);
}
if (input_conflict && !info->check_input_conflict)
info->need_rerun = true;
info->check_input_conflict = input_conflict;
}

is never entered because the bitmap_intersect_p call fails:

(gdb) call debug_bitmap (info->x_local_live)

first = 0x9afd38 current = 0x9afd38 indx = 0
0x9afd38 next = (nil) prev = (nil) indx = 0
bits = { 2 }
(gdb) call debug_bitmap (info->y_local_live)

first = 0x9afdac current = 0x9afdac indx = 0
0x9afdac next = (nil) prev = (nil) indx = 0
bits = { 1 }

If I comment out the bitmap_intersect_p call, the cross-jump opportunity
is rejected and I get valid code. (However, this doesn't seem ideal either
since we *can* do a cross-jump in this particular case ...)

Any suggestions?

Thanks,
Ulrich
--
Dr. Ulrich Weigand
Linux on zSeries Development
***@de.ibm.com
Joern RENNECKE
2006-01-03 22:16:15 UTC
Permalink
Post by Ulrich Weigand
(gdb) call debug_bitmap (info->x_local_live)
first = 0x9afd38 current = 0x9afd38 indx = 0
0x9afd38 next = (nil) prev = (nil) indx = 0
bits = { 2 }
(gdb) call debug_bitmap (info->y_local_live)
first = 0x9afdac current = 0x9afdac indx = 0
0x9afdac next = (nil) prev = (nil) indx = 0
bits = { 1 }
At what point of the block matching did you look at x_local_live and
y_local_live ?
When the last insn of the match has been processed, bits 1 and 2 should
be set for both
blocks, and hence the bitmaps intersect.
Ulrich Weigand
2006-01-10 00:37:35 UTC
Permalink
Post by Joern RENNECKE
Post by Ulrich Weigand
(gdb) call debug_bitmap (info->x_local_live)
first = 0x9afd38 current = 0x9afd38 indx = 0
0x9afd38 next = (nil) prev = (nil) indx = 0
bits = { 2 }
(gdb) call debug_bitmap (info->y_local_live)
first = 0x9afdac current = 0x9afdac indx = 0
0x9afdac next = (nil) prev = (nil) indx = 0
bits = { 1 }
At what point of the block matching did you look at x_local_live and
y_local_live ?
At the first time the block of code at struct-equiv.c:1236 is entered.

(gdb) call debug_rtx (info->cur.x_start)
(insn 193 138 139 21 (set (reg/f:SI 1 %r1 [83])
(mem/u/c/i:SI (unspec:SI [
(symbol_ref/u:SI ("*.LC0") [flags 0x2])
(reg:SI 13 %r13)
] 212) [2 S4 A32])) 52 {*movsi_esa} (nil)
(expr_list:REG_DEAD (reg:SI 13 %r13)
(expr_list:REG_EQUIV (symbol_ref:SI ("foo_arg") <var_decl 0x77dba1c0 foo_arg>)
(nil))))

(gdb) call debug_rtx (info->cur.y_start)
(insn 194 87 88 14 (set (reg/f:SI 2 %r2 [70])
(mem/u/c/i:SI (unspec:SI [
(symbol_ref/u:SI ("*.LC0") [flags 0x2])
(reg:SI 13 %r13)
] 212) [2 S4 A32])) 52 {*movsi_esa} (nil)
(expr_list:REG_DEAD (reg:SI 13 %r13)
(expr_list:REG_EQUIV (symbol_ref:SI ("foo_arg") <var_decl 0x77dba1c0 foo_arg>)
(nil))))

Those are the top-most insns of the match. As I understand it, the [xy]_local_live
bitmaps now correspond to the live information at the points immediately preceding
those two insns.
Post by Joern RENNECKE
When the last insn of the match has been processed, bits 1 and 2 should
be set for both blocks, and hence the bitmaps intersect.
How come? Before those two insns, register 1 in block X is *not* live,
and same for register 2 in block Y.

Bye,
Ulrich
--
Dr. Ulrich Weigand
Linux on zSeries Development
***@de.ibm.com
Joern RENNECKE
2006-01-10 16:46:30 UTC
Permalink
Post by Ulrich Weigand
Those are the top-most insns of the match. As I understand it, the [xy]_local_live
bitmaps now correspond to the live information at the points immediately preceding
those two insns.
Post by Joern RENNECKE
When the last insn of the match has been processed, bits 1 and 2 should
be set for both blocks, and hence the bitmaps intersect.
How come? Before those two insns, register 1 in block X is *not* live,
and same for register 2 in block Y.
Oops, I got confused when you brougt up resolve_input_conflict. Since the
registers are not live at the start, they are not inputs; they are used
only locally.
Hence there is no input conflict, either.
Post by Ulrich Weigand
1299 for (i = 0; i < info.cur.local_count; i++)
1300 if (info.local_rvalue[i])
1301 emit_insn_before (gen_move_insn (info.x_local[i],
info.y_local[i]),
Post by Ulrich Weigand
1302 y_active);
where
(gdb) print info.cur.local_count
$2 = 2
(gdb) call debug_rtx (info.x_local[0])
(reg/f:SI 1 %r1 [83])
(gdb) call debug_rtx (info.y_local[0])
(reg/f:SI 2 %r2 [70])
(gdb) call debug_rtx (info.x_local[1])
(reg:SI 2 %r2 [86])
(gdb) call debug_rtx (info.y_local[1])
(reg:SI 1 %r1 [73])
You have to look at info.local_rvalue for what registers are actually
inputs.
Ulrich Weigand
2006-01-11 02:56:51 UTC
Permalink
Post by Joern RENNECKE
Oops, I got confused when you brougt up resolve_input_conflict. Since the
registers are not live at the start, they are not inputs; they are used
only locally.
Hence there is no input conflict, either.
[snip]
Post by Joern RENNECKE
You have to look at info.local_rvalue for what registers are actually
inputs.
Ah, this is indeed the bug: info.local_rvalue is true for *both* register
1 and 2, because of an off-by-one bug in find_dying_inputs fixed by the
patch below. This fixes my test case.

I'm still running a test cycle on this patch, once it completes I guess
I can commit this as obvious ...

Bye,
Ulrich


ChangeLog:

* struct-equiv.c (find_dying_inputs): Fix off-by-one bug.

Index: gcc/struct-equiv.c
===================================================================
*** gcc/struct-equiv.c (revision 109020)
--- gcc/struct-equiv.c (working copy)
*************** find_dying_inputs (struct equiv_info *in
*** 1280,1286 ****
int nregs = (regno >= FIRST_PSEUDO_REGISTER
? 1 : hard_regno_nregs[regno][GET_MODE (x)]);

! for (info->local_rvalue[i] = false; nregs >= 0; regno++, --nregs)
if (REGNO_REG_SET_P (info->x_local_live, regno))
{
info->dying_inputs++;
--- 1280,1286 ----
int nregs = (regno >= FIRST_PSEUDO_REGISTER
? 1 : hard_regno_nregs[regno][GET_MODE (x)]);

! for (info->local_rvalue[i] = false; nregs > 0; regno++, --nregs)
if (REGNO_REG_SET_P (info->x_local_live, regno))
{
info->dying_inputs++;
--
Dr. Ulrich Weigand
Linux on zSeries Development
***@de.ibm.com
Joern RENNECKE
2006-01-11 14:24:51 UTC
Permalink
Post by Ulrich Weigand
Ah, this is indeed the bug: info.local_rvalue is true for *both* register
1 and 2, because of an off-by-one bug in find_dying_inputs fixed by the
patch below. This fixes my test case.
Oops, indeed. I suppose that happened because I'm still used to the
idiom --nregs >= 0,
which we phase out for stylistic reasons and changes in prevalent
relative operation
costs.
Ulrich Weigand
2006-01-12 17:23:27 UTC
Permalink
Post by Joern RENNECKE
Oops, indeed. I suppose that happened because I'm still used to the
idiom --nregs >= 0, which we phase out for stylistic reasons and
changes in prevalent relative operation costs.
I've committed the fix now.

Bye,
Ulrich
--
Dr. Ulrich Weigand
Linux on zSeries Development
***@de.ibm.com
Joern RENNECKE
2005-12-07 13:37:54 UTC
Permalink
Post by Bernd Schmidt
A patch with no functional changes to split out condjump_equiv_p and
move other stuff into a new file should be committed separately first.
That is preapproved, please install after testing.
I have checked in the attached patch (with ChangeLog date of today).
Tested on i686-pc-linux-gnu native, X sh-elf and X sh64-elf.
Paul Brook
2005-12-14 23:03:12 UTC
Permalink
        PR rtl-optimization/20070 / part1
* basic-block.h (STRUCT_EQUIV_START, STRUCT_EQUIV_RERUN): Define.
(STRUCT_EQUIV_FINAL, STRUCT_EQUIV_MAX_LOCAL): Likewise.
(STRUCT_EQUIV_NEED_FULL_BLOCK): Likewise.
(struct struct_equiv_checkpoint, struct equiv_info): Likewise.
(insns_match_p, struct_equiv_block_eq, struct_equiv_init): Declare.
(rtx_equiv_p, condjump_equiv_p): Likewise.
* cfgcleanup.c (merge_memattrs, insns_match_p): Move from here into..
* struct-equiv.c: New file.
(IMPOSSIBLE_MOVE_FACTOR): Define.
(assign_reg_reg_set, rtx_equiv_p, struct_equiv_set): New functions.
(struct_equiv_dst_mem, struct_equiv_make_checkpoint): Likewise.
(struct_equiv_improve_checkpoint): Likewise.
+#ifdef HAVE_cc0
+ if (reg_mentioned_p (cc0_rtx, info->x_start) && !sets_cc0_p
(info->x_start))
+ return;
+#endif
This breaks cc0 targets:
../../gcc/gcc/struct-equiv.c: In function 'struct_equiv_improve_checkpoint':
../../gcc/gcc/struct-equiv.c:252: error: 'struct equiv_info' has no member
named 'x_start'

Paul
Steven Bosscher
2005-12-01 17:41:48 UTC
Permalink
Post by Bernd Schmidt
Post by Joern RENNECKE
Post by Bernd Schmidt
+ /* The following code helps take care of G++ cleanups. */
Comment not helpful. How does it help? What's special about G++ cleanups?
I'm afraid that I have no idea. That stuff comes from the old
cross-jumping code too,
added by Jan Hubicka on the 11th July 2001 to flow.c , moved on the 10th
September 2001
to cfgcleanup.c, later broken out into its own function, and reformatted
a number of times.
Ah, I see, it occurs somewhere else too. Common parts should move into
a separate function - unless the corresponding bits in insns_match_p
aren't actually needed anymore (gcov shows no coverage, but that may not
mean anything)?
Oh dear, this comment is even older than Jan's change - it was present
in the old jump.c cross jumper.
In fact it was already there in the initial check-in of jump.c in
the former old-gcc repository, back in January of 1992. See here:
http://gcc.gnu.org/viewcvs/trunk/gcc/jump.c?rev=236&view=markup

But that code never triggers for me. I bootstrapped and tested
all languages except Ada with this patch on x86_64-linux:

---------------------------------------
Index: cfgcleanup.c
===================================================================
--- cfgcleanup.c (revision 107822)
+++ cfgcleanup.c (working copy)
@@ -1073,7 +1073,7 @@ insns_match_p (int mode ATTRIBUTE_UNUSED
if (! rtx_renumbered_equal_p (p1, p2))
cancel_changes (0);
else if (apply_change_group ())
- return true;
+ gcc_unreachable ();
}
}
}
---------------------------------------

I didn't get any new failures. Maybe we should just remove this
code.

Gr.
Steven
Jan Hubicka
2005-12-01 18:06:30 UTC
Permalink
Post by Steven Bosscher
Post by Bernd Schmidt
Post by Joern RENNECKE
Post by Bernd Schmidt
+ /* The following code helps take care of G++ cleanups. */
Comment not helpful. How does it help? What's special about G++ cleanups?
I'm afraid that I have no idea. That stuff comes from the old
cross-jumping code too,
added by Jan Hubicka on the 11th July 2001 to flow.c , moved on the 10th
September 2001
to cfgcleanup.c, later broken out into its own function, and reformatted
a number of times.
Ah, I see, it occurs somewhere else too. Common parts should move into
a separate function - unless the corresponding bits in insns_match_p
aren't actually needed anymore (gcov shows no coverage, but that may not
mean anything)?
Oh dear, this comment is even older than Jan's change - it was present
in the old jump.c cross jumper.
In fact it was already there in the initial check-in of jump.c in
http://gcc.gnu.org/viewcvs/trunk/gcc/jump.c?rev=236&view=markup
But that code never triggers for me. I bootstrapped and tested
---------------------------------------
Index: cfgcleanup.c
===================================================================
--- cfgcleanup.c (revision 107822)
+++ cfgcleanup.c (working copy)
@@ -1073,7 +1073,7 @@ insns_match_p (int mode ATTRIBUTE_UNUSED
if (! rtx_renumbered_equal_p (p1, p2))
cancel_changes (0);
else if (apply_change_group ())
- return true;
+ gcc_unreachable ();
}
}
}
---------------------------------------
I didn't get any new failures. Maybe we should just remove this
code.
I would suggest so. The code was originally intended (I think, it is
really arcane) to help pre-reload crossjumping we never ended up doing
crossjumping on equivalent code sequences with different temporaries.

I experimented with enabling crossjuping before reload in the past and
it didn't work well enought to pay back (ie usually didn't get across
the different pseudos problem anyway) and there are better algorithms to
do that.

Honza
Post by Steven Bosscher
Gr.
Steven
Joern RENNECKE
2005-12-01 18:57:48 UTC
Permalink
Post by Jan Hubicka
I experimented with enabling crossjuping before reload in the past and
it didn't work well enought to pay back (ie usually didn't get across
the different pseudos problem anyway)
Well, with my patch, the different pseudos problem can be expected to be
solved,
as long as there are not too many registers involved (in which case this
wouldn't
work well as a local optimization anyway).
Post by Jan Hubicka
and there are better algorithms to
do that.
Which ones do you have in mind?
Steven Bosscher
2005-12-01 19:31:07 UTC
Permalink
Post by Joern RENNECKE
Post by Jan Hubicka
and there are better algorithms to
do that.
Which ones do you have in mind?
I was already thinking about using the tree-ssa value numbering
framework for this.

Gr.
Steven
Jan Hubicka
2005-12-01 19:40:53 UTC
Permalink
Post by Joern RENNECKE
Post by Jan Hubicka
I experimented with enabling crossjuping before reload in the past and
it didn't work well enought to pay back (ie usually didn't get across
the different pseudos problem anyway)
Well, with my patch, the different pseudos problem can be expected to be
solved,
as long as there are not too many registers involved (in which case this
wouldn't
work well as a local optimization anyway).
Could you try how it behaves pre-reload then? (if you can send me
patch, I can give it a SPEC run too) I saw pretty common case where code
duplication prevented ifconvert, so it might be interesting to see how
things are affected by this.
Post by Joern RENNECKE
Post by Jan Hubicka
and there are better algorithms to
do that.
Which ones do you have in mind?
I tend to believe that most of the savings should come from code
unification pretty much like one we have in gcse.c. The gcse
implementation don't work very well since it can't handle chained
unifications, but at least on tree-ssa this should be resonably easy.
I am not sure how much duplicates comes out from RTL lowering that are
important to be caught on RTL level, I must say.

Honza
Steven Bosscher
2005-12-01 20:02:57 UTC
Permalink
Post by Jan Hubicka
Post by Joern RENNECKE
Post by Jan Hubicka
I experimented with enabling crossjuping before reload in the past and
it didn't work well enought to pay back (ie usually didn't get across
the different pseudos problem anyway)
Well, with my patch, the different pseudos problem can be expected to be
solved,
as long as there are not too many registers involved (in which case this
wouldn't
work well as a local optimization anyway).
Could you try how it behaves pre-reload then? (if you can send me
patch, I can give it a SPEC run too) I saw pretty common case where code
duplication prevented ifconvert, so it might be interesting to see how
things are affected by this.
Our if-conversion pass should be made smart enough to do head- and
tail merging when comparing blocks. We have at least one open bug
report which shows how we miss a seemingly trivial if-conversion
opportunity because we fail to merge away a few insns. And AFAIU,
Joern's patch that we're discussing in this thread is actually also
trying to fix such a case.

Richard Guenther was working on a new pre-reload if-conversion pass,
perhaps we can work on it together and make it smart enough to do
this kind of thing...?
Post by Jan Hubicka
Post by Joern RENNECKE
Post by Jan Hubicka
and there are better algorithms to
do that.
Which ones do you have in mind?
I tend to believe that most of the savings should come from code
unification pretty much like one we have in gcse.c. The gcse
implementation don't work very well since it can't handle chained
unifications, but at least on tree-ssa this should be resonably easy.
We've discussed implementing hoisting and sinking in the tree-ssa-pre
framework before. It should be quite easy to do this. In fact, Dan
Berlin and I discussed some ideas as far back as June 2004. Something
like the following:

The hoist phase works as follows:
1. Find the hoisting candidates. Those are the ones in the ANTIC_IN
sets of each block using the normal global code hoisting data flow
equations (iirc ANTIC_IN in all successors of a block B, but not
in AVAIL_OUT of B, or something like that).
2. Determine the local cost of a value in each block (e.g. some cost
function that could taking into account things like estimated size,
register pressure, frequencies of the basic blocks that the value
iscomputed in, etc.), and select which values to hoist.
3. Insert expressions to compute the values at the appropriate points,
i.e. such that the value is available in the blocks you want to
hoist from, and updating the AVAIL_OUT blocks accordingly. Normal
FRE/PRE takes care of eliminating the expressions in those blocks.
Post by Jan Hubicka
I am not sure how much duplicates comes out from RTL lowering that are
important to be caught on RTL level, I must say.
Probably quite a bit, but those should be the easy cases.

Gr.
Steven
Joern RENNECKE
2005-12-01 20:28:01 UTC
Permalink
Post by Steven Bosscher
1. Find the hoisting candidates. Those are the ones in the ANTIC_IN
sets of each block using the normal global code hoisting data flow
equations (iirc ANTIC_IN in all successors of a block B, but not
in AVAIL_OUT of B, or something like that).
The code that originally prompted me to work on if-conversion was a
multi-way
min or max operation. If you don't have any way to express that several
blocks
compute a value that is an if_then_else, you loose.
Joern RENNECKE
2005-12-01 23:46:51 UTC
Permalink
Post by Jan Hubicka
Could you try how it behaves pre-reload then? (if you can send me
patch, I can give it a SPEC run too) I saw pretty common case where code
duplication prevented ifconvert, so it might be interesting to see how
things are affected by this.
I added some sanity checks and they trigger on i686-pc-linux-gnu. It
appears that
we don't have consistent register liveness information to start with.
I've started
to add some more debug code (not in the ChangeLog), but I can't finish
this today.

I have attached what I have so far.
Joern RENNECKE
2005-12-01 18:53:08 UTC
Permalink
Post by Steven Bosscher
But that code never triggers for me.
Since we are not running any cross-jumping pass before reload at the
moment, it can't trigger.
Joern RENNECKE
2005-12-02 19:33:42 UTC
Permalink
Post by Bernd Schmidt
Why are we using true_regnum at all here, post reload? All regs
should be hardregs at that point, and subregs of them should have been
eliminated.
Neither is the case even now.

i686-pc-linux-gnu bootstrap: REG_EQUAL notes containing pseudos:

(gdb) call debug_rtx(info->cur.x_start)
(insn:HI 161 160 162 23 ../../srcw/gcc/alias.c:1576 (set (reg:QI 0 ax [91])
(lt:QI (reg:CCGC 17 flags)
(const_int 0 [0x0]))) 346 {*setcc_1} (insn_list:REG_DEP_TRUE
160 (nil))
(expr_list:REG_DEAD (reg:CCGC 17 flags)
(expr_list:REG_EQUAL (lt:QI (reg:SI 2 cx [orig:64 iftmp.226 ] [64])
(reg:SI 88))
(nil))))
(gdb) call debug_rtx(info->cur.y_start)
(insn:HI 124 123 125 17 ../../srcw/gcc/alias.c:1572 (set (reg:QI 0 ax [83])
(lt:QI (reg:CCGC 17 flags)
(const_int 0 [0x0]))) 346 {*setcc_1} (insn_list:REG_DEP_TRUE
123 (nil))
(expr_list:REG_DEAD (reg:CCGC 17 flags)
(expr_list:REG_EQUAL (lt:QI (reg:SI 2 cx [orig:67 iftmp.224 ] [67])
(reg:SI 80))
(nil))))

sh64-elf build: SUBREGs of hard registers in REG_EQUAL notes.

(gdb) call debug_rtx(info->cur.x_start)
(insn:HI 391 773 1346 37 dp-bit.c:329 (set (reg:SI 2 r2)
(and:SI (reg:SI 2 r2)
(reg:SI 8 r8 [orig:324+4 ] [324]))) 83 {*andsi3_compact}
(insn_list:REG_DEP_TRUE 388 (insn_list:REG_DEP_TRUE 389
(insn_list:REG_DEP_TRUE 390 (nil))))
(expr_list:REG_DEAD (reg:SI 8 r8 [orig:324+4 ] [324])
(expr_list:REG_EQUAL (and:SI (subreg:SI (reg:DI 3 r3 [orig:321
exp ] [321]) 4)
(const_int 2047 [0x7ff]))
(expr_list:REG_NO_CONFLICT (reg:DI 3 r3 [orig:321 exp ] [321])
(expr_list:REG_NO_CONFLICT (reg:DI 7 r7 [324])
(nil))))))
(gdb) call debug_rtx(info->cur.y_start)
(insn:HI 258 656 1634 25 dp-bit.c:286 (set (reg:SI 2 r2)
(and:SI (reg:SI 2 r2)
(reg:SI 8 r8 [orig:254+4 ] [254]))) 83 {*andsi3_compact}
(insn_list:REG_DEP_TRUE 255 (insn_list:REG_DEP_TRUE 256
(insn_list:REG_DEP_TRUE 257 (nil))))
(expr_list:REG_DEAD (reg:SI 8 r8 [orig:254+4 ] [254])
(expr_list:REG_EQUAL (and:SI (subreg:SI (reg:DI 3 r3 [orig:251
prephitmp.61 ] [251]) 4)
(const_int 2047 [0x7ff]))
(expr_list:REG_NO_CONFLICT (reg:DI 3 r3 [orig:251
prephitmp.61 ] [251])
(expr_list:REG_NO_CONFLICT (reg:DI 7 r7 [254])
(nil))))))

This was building little-endian -m5-compact _pack_df.o; the REG_EQUAL
notes were still factually correct.
Joern RENNECKE
2005-12-02 19:45:32 UTC
Permalink
Post by Joern RENNECKE
Post by Bernd Schmidt
Why are we using true_regnum at all here, post reload? All regs
should be hardregs at that point, and subregs of them should have
been eliminated.
Neither is the case even now.
P.S.: This is the patch I've used. I both cases, a gcc_assert in
rtx_equiv_p is triggered.
Joern RENNECKE
2005-11-30 22:25:03 UTC
Permalink
Post by Bernd Schmidt
+ RVALUE == 0: destination
+ RVALUE == 1: source
+ RVALUE == -1: source, ignore SET_DEST of SET / clobber. */
Better an enum.
Hmm, in order to get rid of insns_match_p, I have to declare rtx_equiv_p
in basic-block.h,
and hence also this enum. I fear that can cause a lot of confuction in
gdb when 1, 0 and -1
is always displayed as the enum values.

enum equiv_rvalue
{ dst = 0, /* destination */
src = 1, /* source */
ignore_dst = -1 /* source, ignore SET_DEST of SET / clobber. */
};

extern int struct_equiv_block_eq (int mode, struct equiv_info *info);
extern bool rtx_equiv_p (rtx *xp, rtx y, equiv_rvalue rvalue,
Continue reading on narkive:
Loading...