Post by Bernd SchmidtCan we postpone the ifcvt changes? They look quite straightforward,
but it seems like they are independent of the rest. Once we've done
the heavy lifting to get the first part in, we can revisit this.
That's a bit backwards with respect to the motivation of this patch, as
the starting point was the if-conversion enhancement, but I am willing
to split it up like this if you are prepared to review both parts.
Post by Bernd SchmidtOne general comment: You're matching entire pieces of RTL. Wouldn't
it be possible/simpler to match insn_codes, extract the operands, and
then just look at those? I have in mind something that looks a bit
like parts of regrename.
It wouldn't be simpler in code, since we have to match arbitrary rtl for
operands and notes. Even if we spent lots of time
to make genrecog emit a an optimized parser like bison or byacc do -
which I don't see that happening anytime soon - I
would still expect it to be slower than matching two rtl structures to
compare them. At the moment, I think using recog
will be much slower. Have you run a debugging session recently
figuring out why a pattern was not matched? The code
looks quite horrible.
We could also fail to merge memory attributes where the match_operand is
of for the memory address inside.
And finally, we could loose some optimizations when separate insn
patterns for registers and/or constants are used, and the ease of
implementing recognizing cases where constants will be used in
different-looking rtl than a variant using a register because of rtl
canonicalizations applied.
Post by Bernd SchmidtI was wondering about the use of the find_reg_equal_equiv_note - lots
of code seems to use that function, but do we really want to mess with
REG_EQUIV notes?
We have to avoid leaving REG_EQUIV or REG_EQUAL notes around that are
factually wrong - they can lead to incorrect code in later optimization
steps.
It is also not a good idea to willy-nilly remove notes when we are not
going to do any optimization at all, or if the notes are still correct -
we pay with missed rematerialization in reload for missing notes.
Post by Bernd SchmidtLike Stephen B., I'd like to see this moved to a separate file; then
you can also lose the struct_equiv_ prefix from most function names
and make them more meaningful.
struct-equiv.c ?
I suppose we can keep the header stuff in basic-block.h, it would be
confusing an error-prone if the STRUCT_EQUIV_START etc. numbers are in a
different file, but the same number sequence as the CLEANUP_* numbers.
Post by Bernd Schmidt+/* A struct equiv_info is used to pass information to
struct_equiv and
+ to gather state while two basic blocks are checked for structural
+ equivalence.
Like Stephen, I'd like to see the rest of this comment distributed so
that each structure member has its own comment in front of it. See
also below - I'd like to see a struct_equiv_checkpoint used inside
this monster struct to keep track of current state. Further
improvements might be to split off those members which track state,
and others which are just inputs set by the callers of the
struct_equiv pass ("struct struct_equiv_params"?). Such a split could
either result in two structures, or just two clearly separated areas
inside the same structure. There should be some kind of logical
grouping instead of just a collection of members.
I don't think we should have multiple separate structures; that would
unnecessarily increase the argument passing.
Post by Bernd Schmidt+ NEED_FULL_BLOCK is set when the entire basic blocks have to
match, as
+ in if-conversion. It is cleared for cross-jumping.
Might be better as an arg to struct_equiv_block_eq or part of a params
struct, since it's only needed in the toplevel function. Anything to
get this struct here down in size...
I think this can actually folded as a STRUCT_EQUIV_NEED_FULL_BLOCK into
mode.
I have attached a draft of the new struct declaration.
Post by Bernd Schmidt+ bool local_rvalue[STRUCT_EQUIV_MAX_LOCAL];
+ rtx x_local[STRUCT_EQUIV_MAX_LOCAL],
y_local[STRUCT_EQUIV_MAX_LOCAL];
Might want to turn this into an array of structs. Then we could also
make STRUCT_EQUIV_MAX_LOCAL a runtime param more easily.
While x_local and y_local generally appear toghether, local_rvalue does
not. Having an array of a struct of two rtx and one bool also causes
inefficient use of space and awkward indexing. Is there any reason to
make MAX_LOCAL a runtime param? The algorithm is not designed to scale
well with a really large MAX_LOCAL.
Post by Bernd Schmidt+ change *= IMPOSSIBLE_MOVE_FACTOR;
Just an assignment would be enough? Wouldn't like this to overflow,
and I don't think we need different grades of impossible.
Actually, we do. As explained in the comment above
IMPOSSIBLE_MOVE_FACTOR, we don't know during scanning if the register
pair under consideration is actually an input for the block we'll end up
with, or just a pair of local registers that are set and die in the
block. E.g. you could have a register in a class that needs a secondary
reload to move between registers in the same class, like the branch
target registers on the SH5; there are also at least three such classes
on ia64.
So, input_cost might be temporarily greater or equal to
IMPOSSIBLE_MOVE_FACTOR, but decrease later to a more mundane value.
Note that change is a signed value.
An overflow is not possible with 32 bit integers on the host for any
current target; you'd have to have more than 107374 hard registers for
that to happen, or patch STRUCT_EQUIV_MAX_LOCAL to a number larger than
that and run before reload.
Would you like me to add a sentence to the comment above
IMPOSSIBLE_MOVE_FACTOR, like below?
This number, multiplied with the larger of STRUCT_EQUIV_MAX_LOCAL and
FIRST_PSEUDO_REGISTER, must fit in the input_cost field of struct
equiv_info.
Post by Bernd Schmidt+/* Do only the struct_equiv SET_DEST processing for SETs and
CLOBBERs. */
+static bool
+struct_equiv_set (rtx x, rtx y, struct equiv_info *info)
Not too happy with some of these function names. "set_dest_equiv_p"?
Yes, that makes sense if it is in a separate file. Although it makes it
less obvious that it's a variation on the same theme as struct_equiv.
Maybe it is better to rename that into rtx_equiv_p then.
What other function names?
Post by Bernd SchmidtShould mention return value in comment.
"Return true fur success." ?
Post by Bernd SchmidtThe rest of the description could also be enlarged to describe the
necessary details of the algorithm to understand why we need this
function.
Obviously we could hand-inline this function, so I suppose you mean why
we need to process set/clobber destinations separately from input
values. This is really something struct_equiv_block_eq does, so should
I put it into the pre-function comment there?
This function scans backwards through the insns. In doing so, it has
to process first all the
SET_DESTs, then the SET_SRCes, then the REG_NOTES, in order to keep
the register
lifeness information consistent.
Post by Bernd Schmidt+ while (info->local_count > p->local_count)
+ {
+ info->local_count--;
+ info->version--;
+ if (REGNO_REG_SET_P (info->x_local_live,
+ REGNO (info->x_local[info->local_count])))
+ {
+ assign_reg_reg_set (info->x_local_live,
+ info->x_local[info->local_count], 0);
+ assign_reg_reg_set (info->y_local_live,
+ info->y_local[info->local_count], 0);
+ info->version--;
+ }
+ }
I'm not sure I understand what's going on here. This goes back and
gets reg numbers out of x_local and y_local, indexed by local_count.
For this to work, these arrays must always (except for in this
function) grow and never shrink - correct?
But there's code after the comment
/* This might be a register local to each block. See if
we have
it already registerd. */
which seems to shrink the arrays and overwrite entries. How does this
work together?
After the old entry is deleted, a new one is added, and the version
number increased. Thus, for this register enlargement, there will be a
version increment, but no net change of info->cur.local_count. Hence,
the while loop will fail to restore info.cur.version to p->version, and
info->need_rerun will be set. I.e. the register liveness information is
known
to be flawed, and another pass will be required if the partial match so
far otherwise looks promising.
Post by Bernd Schmidt+struct_equiv_death (rtx i1 ATTRIBUTE_UNUSED, rtx i2
ATTRIBUTE_UNUSED,
+ struct equiv_info *info ATTRIBUTE_UNUSED)
"death_notes_match_p"?
Ok.
Post by Bernd Schmidt+{
+#ifdef STACK_REGS
+ /* If cross_jump_death_matters is not 0, the insn's mode
+ indicates whether or not the insn contains any stack-like
regs. */
+
+ if (INSN_P (i1)
+ && (info->mode & CLEANUP_POST_REGSTACK) &&
stack_regs_mentioned (i1))
Is this bit always set after regstack, or do we need an extra test
such as reload_completed?
AFAICS, there is no other pass that uses cleanup_cfg after regstack.
FWIW, this code is
stems basically from the old cross-jumping code.
Post by Bernd Schmidt+/* Return number of matched insns. */
Matched how? I'd like a human-readable overview of the algorithm somewhere.
If it's gonna have its own file, we might as well put a blurb at the top.
/* Try to match two basic blocks - or their ends - for structural
equivalence.
We scan the blocks from their ends backwards. When we see mismatching
registers, we try to figure out if these could be registers local to
these (partial)
blocks, or input values.
The main entry point to this file is struct_equiv_block_eq. This
function uses a
struct equiv_info to accept some of its inputs, to keep track of its
internal state, to pass down
to its helper functions, and to communicate some of the results back
to the caller.
struct_equiv_block_eq has to be called first with the
STRUCT_EQUIV_START bit set in the
mode parameter. This will calculate the number of matched insns and
the number and types of
inputs. If the need_rerun field is set, the results are only
tentative, and the caller has to call
again with STRUCT_EQUIV_RERUN till need_rerun is false in order to
get a reliable match.
To install the changes necessary for the match, the function has to
be called again with
STRUCT_EQUIV_FINAL.
*/
Post by Bernd Schmidt+ if (info->x_start != x_stop) for (;;)
Formatting.
IIRC there were 80-char-per-line problems with adding two indentation
levels there. Would you
consider:
bool loop;
...
for (loop = info->x_start != x_stop; loop; loop = true)
Post by Bernd Schmidt+ /* The following code helps take care of G++ cleanups. */
Comment not helpful. How does it help? What's special about G++ cleanups?
I'm afraid that I have no idea. That stuff comes from the old
cross-jumping code too,
added by Jan Hubicka on the 11th July 2001 to flow.c , moved on the 10th
September 2001
to cfgcleanup.c, later broken out into its own function, and reformatted
a number of times.