RFC: LRA for x86/x86-64 [7/9] -- continuation
Vladimir Makarov
2012-09-27 22:59:15 UTC
This is the second part of the previous patch.
Richard Sandiford
2012-10-03 11:11:59 UTC
Hi Vlad,

Some comments on lra-spills.c and lra-coalesce.c.
+ The pass creates necessary stack slots and assign spilled pseudos
s/assign/assigns/
+ (or insn memory constraints) might be not satisfied any more.
s/might be not/might not be/
+ For some targets, the pass can spill some pseudos into hard
+ registers of different class (usually into vector registers)
+ instead of spilling them into memory if it is possible and
+ profitable. Spilling GENERAL_REGS pseudo into SSE registers for
+ modern Intel x86/x86-64 processors is an example of such
+ optimization. And this is actually recommended by Intel
+ optimization guide.
Maybe mention core i7 specifically? "Modern" is a bit dangerous
in code that'll live a long time.
+/* The structure describes a stack slot which can be used for several
+ spilled pseudos. */
+struct slot
+{
Looks like this describes "a register or stack slot" given the hard_regno case.
+/* Array containing info about the stack slots. The array element is
+ indexed by the stack slot number in the range [0..slost_num). */
Typo: slots_num
+ /* Each pseudo has an inherent size which comes from its own mode,
+ and a total size which provides room for paradoxical subregs
+ which refer to the pseudo reg in wider modes.
+
+ We can use a slot already allocated if it provides both enough
+ inherent space and enough total space. Otherwise, we allocate a
+ new slot, making sure that it has no less inherent space, and no
+ less total space, then the previous slot. */
The second part of the comment seems a bit misplaced, since the following
code doesn't reuse stack slots. This is done elsewhere instead.
Maybe the first part would be better above the inherent_size assignment.
+ /* If we have any adjustment to make, or if the stack slot is the
+ wrong mode, make a new stack slot. */
+ x = adjust_address_nv (x, GET_MODE (regno_reg_rtx[i]), adjust);
We don't make a new slot here.
+/* Sort pseudos according their slot numbers putting ones with smaller
+ numbers first, or last when the frame pointer is not needed. So
+ pseudos with the first slot will be finally addressed with smaller
+ address displacement. */
+static int
+pseudo_reg_slot_compare (const void *v1p, const void *v2p)
+{
+ const int regno1 = *(const int *) v1p;
+ const int regno2 = *(const int *) v2p;
+ int diff, slot_num1, slot_num2;
+ int total_size1, total_size2;
+
+ slot_num1 = pseudo_slots[regno1].slot_num;
+ slot_num2 = pseudo_slots[regno2].slot_num;
+ if ((diff = slot_num1 - slot_num2) != 0)
+ return (frame_pointer_needed
+ || !FRAME_GROWS_DOWNWARD == STACK_GROWS_DOWNWARD ? diff : -diff);
The comment doesn't quite describe the condition. Maybe:

/* Sort pseudos according to their slots, putting the slots in the order
that they should be allocated. Slots with lower numbers have the highest
priority and should get the smallest displacement from the stack or
frame pointer (whichever is being used).

The first allocated slot is always closest to the frame pointer,
so prefer lower slot numbers when frame_pointer_needed. If the stack
and frame grow in the same direction, then the first allocated slot is
always closest to the initial stack pointer and furthest away from the
final stack pointer, so allocate higher numbers first when using the
stack pointer in that case. The reverse is true if the stack and
frame grow in opposite directions. */
+ total_size1 = MAX (PSEUDO_REGNO_BYTES (regno1),
+ GET_MODE_SIZE (lra_reg_info[regno1].biggest_mode));
+ total_size2 = MAX (PSEUDO_REGNO_BYTES (regno2),
+ GET_MODE_SIZE (lra_reg_info[regno2].biggest_mode));
+ if ((diff = total_size2 - total_size1) != 0)
+ return diff;
I think this could do with a bit more commentary. When is biggest_mode
ever smaller than PSEUDO_REGNO_BYTES? Is that for pseudos that are only
ever referenced as lowpart subregs? If so, why does PSEUDO_REGNO_BYTES
matter for those registers here but not when calculating biggest_mode?
+/* Assign spill hard registers to N pseudos in PSEUDO_REGNOS. Put the
+ pseudos which did not get a spill hard register at the beginning of
+ array PSEUDO_REGNOS. Return the number of such pseudos. */
It'd be worth saying that PSEUDO_REGNOS is sorted in order of highest
frequency first.
+ bitmap set_jump_crosses = regstat_get_setjmp_crosses ();
I notice you use "set_jump" here and "setjump" in parts of 7a.patch.
Probably better to use setjmp across the board.
+ /* Hard registers which can not be used for any purpose at given
+ program point because they are unallocatable or already allocated
+ for other pseudos. */
+ HARD_REG_SET *reserved_hard_regs;
+
+ if (! lra_reg_spill_p)
+ return n;
+ /* Set up reserved hard regs for every program point. */
+ reserved_hard_regs = (HARD_REG_SET *) xmalloc (sizeof (HARD_REG_SET)
+ * lra_live_max_point);
+ for (p = 0; p < lra_live_max_point; p++)
+ COPY_HARD_REG_SET (reserved_hard_regs[p], lra_no_alloc_regs);
+ for (i = FIRST_PSEUDO_REGISTER; i < regs_num; i++)
+ if (lra_reg_info[i].nrefs != 0
+ && (hard_regno = lra_get_regno_hard_regno (i)) >= 0)
+ for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
+ for (p = r->start; p <= r->finish; p++)
+ lra_add_hard_reg_set (hard_regno, lra_reg_info[i].biggest_mode,
+ &reserved_hard_regs[p]);
Since compilation time seems to be all the rage, I wonder if it would be
+ for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
+ for (p = r->start; p <= r->finish; p++)
+ IOR_HARD_REG_SET (conflict_hard_regs, reserved_hard_regs[p]);
+ /* Update reserved_hard_regs. */
+ for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
+ for (p = r->start; p <= r->finish; p++)
+ lra_add_hard_reg_set (hard_regno, lra_reg_info[regno].biggest_mode,
+ &reserved_hard_regs[p]);
would again be a merge.

Just an idea, not a merge requirement. If you've already tried this and
found it to be worse, that might be worth a comment.
+ first = pseudo_slots[regno].first = &pseudo_slots[slots[slot_num].regno];
+ pseudo_slots[regno].next = pseudo_slots[slots[slot_num].regno].next;
+ first->next = &pseudo_slots[regno];
Very minor nit, but I think this would be easier to read if the middle
line also used "first->next".
+/* Assign spill hard registers to N pseudos in PSEUDO_REGNOS. Put the
+ pseudos which did not get a spill hard register at the beginning of
+ array PSEUDO_REGNOS. Return the number of such pseudos. */
Here too I think it's worth mentioning that PSEUDO_REGNOS is sorted
with highest frequency first.
+/* Recursively process LOC in INSN and change spilled pseudos to the
+ corresponding memory or spilled hard reg. Ignore spilled pseudos
+ created from the scratches. */
+static bool
+remove_pseudos (rtx *loc, rtx insn)
The return value is now ignored -- we know in advance which insns need
changing -- so this could be simplified.
+/* Change spilled pseudos into memory or spill hard regs. The
+ function put changed insns on the constraint stack (these insns
+ will be considered on the next constraint pass). The changed insns
+ are all insns in which pseudos were changed. */
s/The function put/Put/
+/* Set up REMOVED_PSEUDOS_BITMAP and USED_PSEUDOS_BITMAP, and update
+ LR_BITMAP (a BB live info bitmap). */
+static void
+update_live_info (bitmap lr_bitmap)
+{
+ unsigned int j;
+ bitmap_iterator bi;
+
+ bitmap_clear (&removed_pseudos_bitmap);
+ bitmap_clear (&used_pseudos_bitmap);
+ EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, lr_bitmap,
+ FIRST_PSEUDO_REGISTER, j, bi)
+ {
+ bitmap_set_bit (&removed_pseudos_bitmap, j);
+ bitmap_set_bit (&used_pseudos_bitmap, first_coalesced_pseudo[j]);
+ }
+ if (! bitmap_empty_p (&removed_pseudos_bitmap))
+ {
+ bitmap_and_compl_into (lr_bitmap, &removed_pseudos_bitmap);
+ bitmap_ior_into (lr_bitmap, &used_pseudos_bitmap);
+ }
+}
Might be wrong, but it looks like nothing really uses removed_pseudos_bitmap
outside this function. I think this could simply be:

/* Set up REMOVED_PSEUDOS_BITMAP and USED_PSEUDOS_BITMAP, and update
LR_BITMAP (a BB live info bitmap). */
static void
update_live_info (bitmap lr_bitmap)
{
unsigned int j;
bitmap_iterator bi;

bitmap_clear (&used_pseudos_bitmap);
EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, lr_bitmap,
FIRST_PSEUDO_REGISTER, j, bi)
bitmap_set_bit (&used_pseudos_bitmap, first_coalesced_pseudo[j]);
if (! bitmap_empty_p (&used_pseudos_bitmap))
{
bitmap_and_compl_into (lr_bitmap, &coalesced_pseudos_bitmap);
bitmap_ior_into (lr_bitmap, &used_pseudos_bitmap);
}
}
+ && mem_move_p (sregno, dregno)
+ /* Don't coalesce inheritance pseudos because spilled
+ inheritance pseudos will be removed in subsequent 'undo
+ inheritance' pass. */
+ && lra_reg_info[sregno].restore_regno < 0
+ && lra_reg_info[dregno].restore_regno < 0
+ /* We undo splits for spilled pseudos whose live ranges
+ were split. So don't coalesce them, it is not
+ necessary and the undo transformations would be
+ wrong. */
+ && ! bitmap_bit_p (&split_origin_bitmap, sregno)
+ && ! bitmap_bit_p (&split_origin_bitmap, dregno)
+ && ! side_effects_p (set)
+ /* Don't coalesces bound pseudos. Bound pseudos has own
+ rules for finding live ranges. It is hard to maintain
+ this info with coalescing and it is not worth to do
+ it. */
+ && ! bitmap_bit_p (&lra_bound_pseudos, sregno)
+ && ! bitmap_bit_p (&lra_bound_pseudos, dregno)
+ /* We don't want to coalesce regnos with equivalences,
+ at least without updating this info. */
+ && ira_reg_equiv[sregno].constant == NULL_RTX
+ && ira_reg_equiv[sregno].memory == NULL_RTX
+ && ira_reg_equiv[sregno].invariant == NULL_RTX
+ && ira_reg_equiv[dregno].constant == NULL_RTX
+ && ira_reg_equiv[dregno].memory == NULL_RTX
+ && ira_reg_equiv[dregno].invariant == NULL_RTX
Probably personal preference, but I think this would be easier
to read as:

&& coalescable_reg_p (sregno)
&& coalescable_reg_p (dregno)
&& !side_effects_p (set)

with coalescable_reg_p checking reg_renumber (from mem_move_p)
and the open-coded stuff in the quote above.
+ for (; mv_num != 0;)
+ {
+ for (i = 0; i < mv_num; i++)
+ {
+ mv = sorted_moves[i];
+ set = single_set (mv);
+ lra_assert (set != NULL && REG_P (SET_SRC (set))
+ && REG_P (SET_DEST (set)));
+ sregno = REGNO (SET_SRC (set));
+ dregno = REGNO (SET_DEST (set));
+ if (! lra_intersected_live_ranges_p
+ (lra_reg_info[first_coalesced_pseudo[sregno]].live_ranges,
+ lra_reg_info[first_coalesced_pseudo[dregno]].live_ranges))
+ {
+ coalesced_moves++;
+ if (lra_dump_file != NULL)
+ fprintf
+ (lra_dump_file,
+ " Coalescing move %i:r%d(%d)-r%d(%d) (freq=%d)\n",
+ INSN_UID (mv), sregno, ORIGINAL_REGNO (SET_SRC (set)),
+ dregno, ORIGINAL_REGNO (SET_DEST (set)),
+ BLOCK_FOR_INSN (mv)->frequency);
+ bitmap_ior_into (&involved_insns_bitmap,
+ &lra_reg_info[sregno].insn_bitmap);
+ bitmap_ior_into (&involved_insns_bitmap,
+ &lra_reg_info[dregno].insn_bitmap);
+ merge_pseudos (sregno, dregno);
+ i++;
+ break;
+ }
+ }
+ /* Collect the rest of copies. */
+ for (n = 0; i < mv_num; i++)
+ {
+ mv = sorted_moves[i];
+ set = single_set (mv);
+ lra_assert (set != NULL && REG_P (SET_SRC (set))
+ && REG_P (SET_DEST (set)));
+ sregno = REGNO (SET_SRC (set));
+ dregno = REGNO (SET_DEST (set));
+ if (first_coalesced_pseudo[sregno] != first_coalesced_pseudo[dregno])
+ sorted_moves[n++] = mv;
+ else if (lra_dump_file != NULL)
+ {
+ coalesced_moves++;
+ fprintf
+ (lra_dump_file, " Coalescing move %i:r%d-r%d (freq=%d)\n",
+ INSN_UID (mv), sregno, dregno,
+ BLOCK_FOR_INSN (mv)->frequency);
+ }
+ }
+ mv_num = n;
I'm probably being dense here, sorry, but why the nested loops?
Why can't we have one loop along the lines of:

for (i = 0; i < mv_num; i++)
{
mv = sorted_moves[i];
set = single_set (mv);
lra_assert (set != NULL && REG_P (SET_SRC (set))
&& REG_P (SET_DEST (set)));
sregno = REGNO (SET_SRC (set));
dregno = REGNO (SET_DEST (set));
if (first_coalesced_pseudo[sregno] == first_coalesced_pseudo[dregno])
{
coalesced_moves++;
fprintf
(lra_dump_file, " Coalescing move %i:r%d-r%d (freq=%d)\n",
INSN_UID (mv), sregno, dregno,
BLOCK_FOR_INSN (mv)->frequency);
/* We updated involved_insns_bitmap when doing the merge.  */
}
else if (!(lra_intersected_live_ranges_p
(lra_reg_info[first_coalesced_pseudo[sregno]].live_ranges,
lra_reg_info[first_coalesced_pseudo[dregno]].live_ranges)))
{
coalesced_moves++;
if (lra_dump_file != NULL)
fprintf
(lra_dump_file,
" Coalescing move %i:r%d(%d)-r%d(%d) (freq=%d)\n",
INSN_UID (mv), sregno, ORIGINAL_REGNO (SET_SRC (set)),
dregno, ORIGINAL_REGNO (SET_DEST (set)),
BLOCK_FOR_INSN (mv)->frequency);
bitmap_ior_into (&involved_insns_bitmap,
&lra_reg_info[sregno].insn_bitmap);
bitmap_ior_into (&involved_insns_bitmap,
&lra_reg_info[dregno].insn_bitmap);
merge_pseudos (sregno, dregno);
}
}

(completely untested)
+ if ((set = single_set (insn)) != NULL_RTX
+ && REG_P (SET_DEST (set)) && REG_P (SET_SRC (set))
+ && REGNO (SET_SRC (set)) == REGNO (SET_DEST (set))
+ && ! side_effects_p (set))
Maybe use set_noop_p here?

Richard
Vladimir Makarov
2012-10-11 00:41:59 UTC
Post by Richard Sandiford
Hi Vlad,
Some comments on lra-spills.c and lra-coalesce.c.
+ The pass creates necessary stack slots and assign spilled pseudos
s/assign/assigns/
Fixed.
Post by Richard Sandiford
+ (or insn memory constraints) might be not satisfied any more.
s/might be not/might not be/
Fixed.
Post by Richard Sandiford
+ For some targets, the pass can spill some pseudos into hard
+ registers of different class (usually into vector registers)
+ instead of spilling them into memory if it is possible and
+ profitable. Spilling GENERAL_REGS pseudo into SSE registers for
+ modern Intel x86/x86-64 processors is an example of such
+ optimization. And this is actually recommended by Intel
+ optimization guide.
Maybe mention core i7 specifically? "Modern" is a bit dangerous
in code that'll live a long time.
Yes, right. Fixed. Another bad thing would be using the word "new".
Post by Richard Sandiford
+/* The structure describes a stack slot which can be used for several
+ spilled pseudos. */
+struct slot
+{
Looks like this describes "a register or stack slot" given the hard_regno case.
Fixed
Post by Richard Sandiford
+/* Array containing info about the stack slots. The array element is
+ indexed by the stack slot number in the range [0..slost_num). */
Typo: slots_num
Fixed.
Post by Richard Sandiford
+ /* Each pseudo has an inherent size which comes from its own mode,
+ and a total size which provides room for paradoxical subregs
+ which refer to the pseudo reg in wider modes.
+
+ We can use a slot already allocated if it provides both enough
+ inherent space and enough total space. Otherwise, we allocate a
+ new slot, making sure that it has no less inherent space, and no
+ less total space, then the previous slot. */
The second part of the comment seems a bit misplaced, since the following
code doesn't reuse stack slots. This is done elsewhere instead.
Maybe the first part would be better above the inherent_size assignment.
Right. I've changed comment to reflect the current state of the code.
Post by Richard Sandiford
+ /* If we have any adjustment to make, or if the stack slot is the
+ wrong mode, make a new stack slot. */
+ x = adjust_address_nv (x, GET_MODE (regno_reg_rtx[i]), adjust);
We don't make a new slot here.
I removed the comment. The same comment is present in reload1.c and
should probably also be removed.
Post by Richard Sandiford
+/* Sort pseudos according their slot numbers putting ones with smaller
+ numbers first, or last when the frame pointer is not needed. So
+ pseudos with the first slot will be finally addressed with smaller
+ address displacement. */
+static int
+pseudo_reg_slot_compare (const void *v1p, const void *v2p)
+{
+ const int regno1 = *(const int *) v1p;
+ const int regno2 = *(const int *) v2p;
+ int diff, slot_num1, slot_num2;
+ int total_size1, total_size2;
+
+ slot_num1 = pseudo_slots[regno1].slot_num;
+ slot_num2 = pseudo_slots[regno2].slot_num;
+ if ((diff = slot_num1 - slot_num2) != 0)
+ return (frame_pointer_needed
+ || !FRAME_GROWS_DOWNWARD == STACK_GROWS_DOWNWARD ? diff : -diff);
/* Sort pseudos according to their slots, putting the slots in the order
that they should be allocated. Slots with lower numbers have the highest
priority and should get the smallest displacement from the stack or
frame pointer (whichever is being used).
The first allocated slot is always closest to the frame pointer,
so prefer lower slot numbers when frame_pointer_needed. If the stack
and frame grow in the same direction, then the first allocated slot is
always closest to the initial stack pointer and furthest away from the
final stack pointer, so allocate higher numbers first when using the
stack pointer in that case. The reverse is true if the stack and
frame grow in opposite directions. */
I used your comment. Thanks.
Post by Richard Sandiford
+ total_size1 = MAX (PSEUDO_REGNO_BYTES (regno1),
+ GET_MODE_SIZE (lra_reg_info[regno1].biggest_mode));
+ total_size2 = MAX (PSEUDO_REGNO_BYTES (regno2),
+ GET_MODE_SIZE (lra_reg_info[regno2].biggest_mode));
+ if ((diff = total_size2 - total_size1) != 0)
+ return diff;
I think this could do with a bit more commentary. When is biggest_mode
ever smaller than PSEUDO_REGNO_BYTES? Is that for pseudos that are only
ever referenced as lowpart subregs? If so, why does PSEUDO_REGNO_BYTES
matter for those registers here but not when calculating biggest_mode?
The MAX code makes no sense to me either (it was probably wrongly
adapted from somewhere), so I removed the MAX.
Post by Richard Sandiford
+/* Assign spill hard registers to N pseudos in PSEUDO_REGNOS. Put the
+ pseudos which did not get a spill hard register at the beginning of
+ array PSEUDO_REGNOS. Return the number of such pseudos. */
It'd be worth saying that PSEUDO_REGNOS is sorted in order of highest
frequency first.
Fixed.
Post by Richard Sandiford
+ bitmap set_jump_crosses = regstat_get_setjmp_crosses ();
I notice you use "set_jump" here and "setjump" in parts of 7a.patch.
Probably better to use setjmp across the board.
Fixed. setjump is also used in other parts of GCC.
Post by Richard Sandiford
+ /* Hard registers which can not be used for any purpose at given
+ program point because they are unallocatable or already allocated
+ for other pseudos. */
+ HARD_REG_SET *reserved_hard_regs;
+
+ if (! lra_reg_spill_p)
+ return n;
+ /* Set up reserved hard regs for every program point. */
+ reserved_hard_regs = (HARD_REG_SET *) xmalloc (sizeof (HARD_REG_SET)
+ * lra_live_max_point);
+ for (p = 0; p < lra_live_max_point; p++)
+ COPY_HARD_REG_SET (reserved_hard_regs[p], lra_no_alloc_regs);
+ for (i = FIRST_PSEUDO_REGISTER; i < regs_num; i++)
+ if (lra_reg_info[i].nrefs != 0
+ && (hard_regno = lra_get_regno_hard_regno (i)) >= 0)
+ for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
+ for (p = r->start; p <= r->finish; p++)
+ lra_add_hard_reg_set (hard_regno, lra_reg_info[i].biggest_mode,
+ &reserved_hard_regs[p]);
Since compilation time seems to be all the rage, I wonder if it would be
+ for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
+ for (p = r->start; p <= r->finish; p++)
+ IOR_HARD_REG_SET (conflict_hard_regs, reserved_hard_regs[p]);
+ /* Update reserved_hard_regs. */
+ for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
+ for (p = r->start; p <= r->finish; p++)
+ lra_add_hard_reg_set (hard_regno, lra_reg_info[regno].biggest_mode,
+ &reserved_hard_regs[p]);
would again be a merge.
Just an idea, not a merge requirement. If you've already tried this and
found it to be worse, that might be worth a comment.
I checked profiles and coverage for different tests (including huge
ones) and did not see that this code is critical. But it is probably
worth trying. It might be a bit more complicated for multi-register pseudos.
I've just used the same pattern as in IRA fast allocation. On the other
hand, stack slots are allocated as you propose. It might be good to
unify the code. I'll put it on my (long) todo list.
Post by Richard Sandiford
+ first = pseudo_slots[regno].first = &pseudo_slots[slots[slot_num].regno];
+ pseudo_slots[regno].next = pseudo_slots[slots[slot_num].regno].next;
+ first->next = &pseudo_slots[regno];
Very minor nit, but I think this would be easier to read if the middle
line also used "first->next".
Fixed.
Post by Richard Sandiford
+/* Assign spill hard registers to N pseudos in PSEUDO_REGNOS. Put the
+ pseudos which did not get a spill hard register at the beginning of
+ array PSEUDO_REGNOS. Return the number of such pseudos. */
Here too I think it's worth mentioning that PSEUDO_REGNOS is sorted
with highest frequency first.
Fixed.
Post by Richard Sandiford
+/* Recursively process LOC in INSN and change spilled pseudos to the
+ corresponding memory or spilled hard reg. Ignore spilled pseudos
+ created from the scratches. */
+static bool
+remove_pseudos (rtx *loc, rtx insn)
The return value is now ignored -- we know in advance which insns need
changing -- so this could be simplified.
Fixed.
Post by Richard Sandiford
+/* Change spilled pseudos into memory or spill hard regs. The
+ function put changed insns on the constraint stack (these insns
+ will be considered on the next constraint pass). The changed insns
+ are all insns in which pseudos were changed. */
s/The function put/Put/
Fixed
Post by Richard Sandiford
+/* Set up REMOVED_PSEUDOS_BITMAP and USED_PSEUDOS_BITMAP, and update
+ LR_BITMAP (a BB live info bitmap). */
+static void
+update_live_info (bitmap lr_bitmap)
+{
+ unsigned int j;
+ bitmap_iterator bi;
+
+ bitmap_clear (&removed_pseudos_bitmap);
+ bitmap_clear (&used_pseudos_bitmap);
+ EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, lr_bitmap,
+ FIRST_PSEUDO_REGISTER, j, bi)
+ {
+ bitmap_set_bit (&removed_pseudos_bitmap, j);
+ bitmap_set_bit (&used_pseudos_bitmap, first_coalesced_pseudo[j]);
+ }
+ if (! bitmap_empty_p (&removed_pseudos_bitmap))
+ {
+ bitmap_and_compl_into (lr_bitmap, &removed_pseudos_bitmap);
+ bitmap_ior_into (lr_bitmap, &used_pseudos_bitmap);
+ }
+}
Might be wrong, but it looks like nothing really uses removed_pseudos_bitmap
/* Set up REMOVED_PSEUDOS_BITMAP and USED_PSEUDOS_BITMAP, and update
LR_BITMAP (a BB live info bitmap). */
static void
update_live_info (bitmap lr_bitmap)
{
unsigned int j;
bitmap_iterator bi;
bitmap_clear (&used_pseudos_bitmap);
EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, lr_bitmap,
FIRST_PSEUDO_REGISTER, j, bi)
bitmap_set_bit (&used_pseudos_bitmap, first_coalesced_pseudo[j]);
if (! bitmap_empty_p (&used_pseudos_bitmap))
{
bitmap_and_compl_into (lr_bitmap, &coalesced_pseudos_bitmap);
bitmap_ior_into (lr_bitmap, &used_pseudos_bitmap);
}
}
Yes. Thanks for finding such a nontrivial change. I fixed it.
Post by Richard Sandiford
+ && mem_move_p (sregno, dregno)
+ /* Don't coalesce inheritance pseudos because spilled
+ inheritance pseudos will be removed in subsequent 'undo
+ inheritance' pass. */
+ && lra_reg_info[sregno].restore_regno < 0
+ && lra_reg_info[dregno].restore_regno < 0
+ /* We undo splits for spilled pseudos whose live ranges
+ were split. So don't coalesce them, it is not
+ necessary and the undo transformations would be
+ wrong. */
+ && ! bitmap_bit_p (&split_origin_bitmap, sregno)
+ && ! bitmap_bit_p (&split_origin_bitmap, dregno)
+ && ! side_effects_p (set)
+ /* Don't coalesces bound pseudos. Bound pseudos has own
+ rules for finding live ranges. It is hard to maintain
+ this info with coalescing and it is not worth to do
+ it. */
+ && ! bitmap_bit_p (&lra_bound_pseudos, sregno)
+ && ! bitmap_bit_p (&lra_bound_pseudos, dregno)
+ /* We don't want to coalesce regnos with equivalences,
+ at least without updating this info. */
+ && ira_reg_equiv[sregno].constant == NULL_RTX
+ && ira_reg_equiv[sregno].memory == NULL_RTX
+ && ira_reg_equiv[sregno].invariant == NULL_RTX
+ && ira_reg_equiv[dregno].constant == NULL_RTX
+ && ira_reg_equiv[dregno].memory == NULL_RTX
+ && ira_reg_equiv[dregno].invariant == NULL_RTX
Probably personal preference, but I think this would be easier
&& coalescable_reg_p (sregno)
&& coalescable_reg_p (dregno)
&& !side_effects_p (set)
with coalescable_reg_p checking reg_renumber (from mem_move_p)
and the open-coded stuff in the quote above.
Ok. Fixed.
Post by Richard Sandiford
+ for (; mv_num != 0;)
+ {
+ for (i = 0; i < mv_num; i++)
+ {
+ mv = sorted_moves[i];
+ set = single_set (mv);
+ lra_assert (set != NULL && REG_P (SET_SRC (set))
+ && REG_P (SET_DEST (set)));
+ sregno = REGNO (SET_SRC (set));
+ dregno = REGNO (SET_DEST (set));
+ if (! lra_intersected_live_ranges_p
+ (lra_reg_info[first_coalesced_pseudo[sregno]].live_ranges,
+ lra_reg_info[first_coalesced_pseudo[dregno]].live_ranges))
+ {
+ coalesced_moves++;
+ if (lra_dump_file != NULL)
+ fprintf
+ (lra_dump_file,
+ " Coalescing move %i:r%d(%d)-r%d(%d) (freq=%d)\n",
+ INSN_UID (mv), sregno, ORIGINAL_REGNO (SET_SRC (set)),
+ dregno, ORIGINAL_REGNO (SET_DEST (set)),
+ BLOCK_FOR_INSN (mv)->frequency);
+ bitmap_ior_into (&involved_insns_bitmap,
+ &lra_reg_info[sregno].insn_bitmap);
+ bitmap_ior_into (&involved_insns_bitmap,
+ &lra_reg_info[dregno].insn_bitmap);
+ merge_pseudos (sregno, dregno);
+ i++;
+ break;
+ }
+ }
+ /* Collect the rest of copies. */
+ for (n = 0; i < mv_num; i++)
+ {
+ mv = sorted_moves[i];
+ set = single_set (mv);
+ lra_assert (set != NULL && REG_P (SET_SRC (set))
+ && REG_P (SET_DEST (set)));
+ sregno = REGNO (SET_SRC (set));
+ dregno = REGNO (SET_DEST (set));
+ if (first_coalesced_pseudo[sregno] != first_coalesced_pseudo[dregno])
+ sorted_moves[n++] = mv;
+ else if (lra_dump_file != NULL)
+ {
+ coalesced_moves++;
+ fprintf
+ (lra_dump_file, " Coalescing move %i:r%d-r%d (freq=%d)\n",
+ INSN_UID (mv), sregno, dregno,
+ BLOCK_FOR_INSN (mv)->frequency);
+ }
+ }
+ mv_num = n;
I'm probably being dense here, sorry, but why the nested loops?
for (i = 0; i < mv_num; i++)
{
mv = sorted_moves[i];
set = single_set (mv);
lra_assert (set != NULL && REG_P (SET_SRC (set))
&& REG_P (SET_DEST (set)));
sregno = REGNO (SET_SRC (set));
dregno = REGNO (SET_DEST (set));
if (first_coalesced_pseudo[sregno] == first_coalesced_pseudo[dregno])
{
coalesced_moves++;
fprintf
(lra_dump_file, " Coalescing move %i:r%d-r%d (freq=%d)\n",
INSN_UID (mv), sregno, dregno,
BLOCK_FOR_INSN (mv)->frequency);
/* We updated involved_insns_bitmap when doing the merge.  */
}
else if (!(lra_intersected_live_ranges_p
(lra_reg_info[first_coalesced_pseudo[sregno]].live_ranges,
lra_reg_info[first_coalesced_pseudo[dregno]].live_ranges)))
{
coalesced_moves++;
if (lra_dump_file != NULL)
fprintf
(lra_dump_file,
" Coalescing move %i:r%d(%d)-r%d(%d) (freq=%d)\n",
INSN_UID (mv), sregno, ORIGINAL_REGNO (SET_SRC (set)),
dregno, ORIGINAL_REGNO (SET_DEST (set)),
BLOCK_FOR_INSN (mv)->frequency);
bitmap_ior_into (&involved_insns_bitmap,
&lra_reg_info[sregno].insn_bitmap);
bitmap_ior_into (&involved_insns_bitmap,
&lra_reg_info[dregno].insn_bitmap);
merge_pseudos (sregno, dregno);
}
}
(completely untested)
As I remember, there was a more complicated coalescing algorithm where
sorting was done on each iteration after each coalesced move.

I changed the code.
Post by Richard Sandiford
+ if ((set = single_set (insn)) != NULL_RTX
+ && REG_P (SET_DEST (set)) && REG_P (SET_SRC (set))
+ && REGNO (SET_SRC (set)) == REGNO (SET_DEST (set))
+ && ! side_effects_p (set))
Maybe use set_noop_p here?
Ok. Why not. The code is rarely executed, so more general code can
be used. I changed it to set_noop_p.
Richard Sandiford
2012-10-04 15:50:58 UTC
Hi Vlad,

This message is for lra-assigns.c. Sorry for the piecemeal reviews,
never sure when I'll get time...
+/* This file contains a pass mostly assigning hard registers to reload
+ pseudos. There is no any RTL code transformation on this pass.
Maybe:

/* This file's main objective is to assign hard registers to reload pseudos.
It also tries to allocate hard registers to other pseudos, but at a lower
priority than the reload pseudos. The pass does not transform the RTL.

if that's accurate.
+ Reload pseudos get what they need (usually) hard registers in
+ anyway possibly by spilling non-reload pseudos and by assignment
+ reload pseudos with smallest number of available hard registers
+ first.
+
+ If reload pseudos can get hard registers only through spilling
+ other pseudos, we choose what pseudos to spill taking into account
+ how given reload pseudo benefits and also how other reload pseudos
+ not assigned yet benefit too (see function spill_for).
Maybe:

We must allocate a hard register to every reload pseudo. We try to
increase the chances of finding a viable allocation by assigning the
pseudos in order of fewest available hard registers first. If we
still fail to find a hard register, we spill other (non-reload)
pseudos in order to make room.

assign_hard_regno_for allocates registers without spilling.
spill_for does the same with spilling. Both functions use
a cost model to determine the most profitable choice of
hard and spill registers.
+ Non-reload pseudos can get hard registers too if it is possible and
+ improves the code. It might be possible because of spilling
+ non-reload pseudos on given pass.
Maybe:

Once we have finished allocating reload pseudos, we also try to
assign registers to other (non-reload) pseudos. This is useful
if hard registers were freed up by the spilling just described.
+ We try to assign hard registers processing pseudos by threads. The
+ thread contains reload and inheritance pseudos connected by copies
+ (move insns). It improves the chance to get the same hard register
+ to pseudos in the thread and, as the result, to remove some move
+ insns.
Maybe:

We try to assign hard registers by collecting pseudos into threads.
These threads contain reload and inheritance pseudos that are connected
by copies (move insns). Doing this improves the chances of pseudos
in the thread getting the same hard register and, as a result,
of allowing some move insns to be deleted.
+ When we assign hard register to a pseudo, we decrease the cost of
+ the hard registers for corresponding pseudos connected by copies.
Maybe:

When we assign a hard register to a pseudo, we decrease the cost of
using the same hard register for pseudos that are connected by copies.
+ If two hard registers are equally good for assigning the pseudo
+ with hard register cost point of view, we prefer a hard register in
+ smaller register bank. By default, there is only one register
+ bank. A target can define register banks by hook
+ register_bank. For example, x86-64 has a few register banks: hard
+ regs with and without REX prefixes are in different banks. It
+ permits to generate smaller code as insns without REX prefix are
+ shorter.
Maybe:

If two hard registers have the same frequency-derived cost,
we prefer hard registers in lower register banks. The mapping
of registers to banks is controlled by the register_bank target hook.
For example, x86-64 has a few register banks: hard registers with and
without REX prefixes are in different banks. This permits us
to generate smaller code as insns without REX prefixes are shorter.

although this might change if the name of the hook changes.
+/* Info about pseudo used during the assignment pass. Thread is a set
+ of connected reload and inheritance pseudos with the same set of
+ available hard reg set. Thread is a pseudo itself for other
+ cases. */
+struct regno_assign_info
Maybe:

/* Information about the thread to which a pseudo belongs. Threads are
a set of connected reload and inheritance pseudos with the same set of
available hard registers. Lone registers belong to their own threads. */
+ && (ira_class_hard_regs_num[regno_allocno_class_array[regno1]]
+ == ira_class_hard_regs_num[regno_allocno_class_array[regno2]]))
i.e. the same _number_ of available hard regs, but not necessarily the
same set.

"thread" might be more mnemonic than "regno_assign" in this file,
but that's bikeshed stuff.
+ for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+ {
+ regno_assign_info[i].first = i;
+ regno_assign_info[i].next = -1;
+ regno_assign_info[i].freq = lra_reg_info[i].freq;
+ }
Minor speedup, but it's probably worth caching max_reg_num () rather than
calling it in each loop iteration. Several other loops with the same thing.
+/* Process a pseudo copy with execution frequency COPY_FREQ connecting
+ REGNO1 and REGNO2 to form threads. */
+static void
+process_copy_to_form_thread (int regno1, int regno2, int copy_freq)
+{
+ int last, regno1_first, regno2_first;
+
+ lra_assert (regno1 >= lra_constraint_new_regno_start
+ && regno2 >= lra_constraint_new_regno_start);
+ regno1_first = regno_assign_info[regno1].first;
+ regno2_first = regno_assign_info[regno2].first;
+ if (regno1_first != regno2_first)
+ {
+ for (last = regno2_first;
+ regno_assign_info[last].next >= 0;
+ last = regno_assign_info[last].next)
+ regno_assign_info[last].first = regno1_first;
+ regno_assign_info[last].next = regno_assign_info[regno1_first].next;
+ regno_assign_info[regno1_first].first = regno2_first;
+ regno_assign_info[regno1_first].freq
+ += regno_assign_info[regno2_first].freq;
Couple of things I don't understand here:

- Why don't we set regno_assign_info[last].first (for final "last")
to regno1_first? I.e. the loop stops while "last" is still valid,
but only assigns to that element's "next" field, leaving "first"
as before.

- I might be wrong, but should:

regno_assign_info[regno1_first].first = regno2_first;

be:

regno_assign_info[regno1_first].next = regno2_first;

so that the list becomes:

regno1_first regno2_first ... last ...

The current version seems to create a cycle:

regno_assign_info[regno1_first].first == regno2_first
regno_assign_info[regno2_first].first == regno1_first
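To make the intended list structure concrete, here is a standalone sketch of the corrected splice. It uses a simplified array instead of LRA's real regno_assign_info, and applies both of the changes suggested above (updating "first" on the final element, and relinking the head through "next" rather than "first"):

```c
#include <assert.h>

/* Standalone model of the thread lists: each element records the first
   element of its thread, the next element (-1 at the end) and a
   frequency.  This is a simplified stand-in for regno_assign_info. */
#define N 8

struct node { int first, next, freq; };
static struct node info[N];

static void
init_threads (void)
{
  for (int i = 0; i < N; i++)
    {
      info[i].first = i;
      info[i].next = -1;
      info[i].freq = i + 1;
    }
}

/* Merge the thread of REGNO2 into the thread of REGNO1, splicing it
   in directly after the head of REGNO1's thread.  */
static void
merge_threads (int regno1, int regno2)
{
  int last;
  int regno1_first = info[regno1].first;
  int regno2_first = info[regno2].first;

  if (regno1_first == regno2_first)
    return;
  for (last = regno2_first; info[last].next >= 0; last = info[last].next)
    info[last].first = regno1_first;
  info[last].first = regno1_first;       /* the loop misses the final element */
  info[last].next = info[regno1_first].next;
  info[regno1_first].next = regno2_first;   /* "next", not "first" */
  info[regno1_first].freq += info[regno2_first].freq;
}
```

After merge_threads (2, 3) followed by merge_threads (0, 2), the list is 0 -> 2 -> 3 with every "first" field pointing at 0 and no cycle.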
+/* Update LIVE_HARD_REG_PSEUDOS and LIVE_PSEUDOS_REG_RENUMBER by
+ pseudo REGNO assignment or by the pseudo spilling if FREE_P. */
Maybe:

/* Update the LIVE_HARD_REG_PSEUDOS and LIVE_PSEUDOS_REG_RENUMBER
entries for pseudo REGNO. Assume that the register has been
spilled if FREE_P, otherwise assume that it has been assigned
reg_renumber[REGNO] (if >= 0). */
+/* Find and return best (or TRY_ONLY_HARD_REGNO) free hard register
+ for pseudo REGNO. In the failure case, return a negative number.
+ Return through *COST the cost of usage of the hard register for the
+ pseudo. Best free hard register has smallest cost of usage for
+ REGNO or smallest register bank if the cost is the same. */
Maybe:

/* Try to find a free hard register for pseudo REGNO. Return the
hard register on success and set *COST to the cost of using
that register. (If several registers have equal cost, the one with
the lowest register bank wins.) Return -1 on failure.

If TRY_ONLY_HARD_REGNO >= 0, consider only that hard register,
otherwise consider all hard registers in REGNO's class. */
+ if (hard_regno_costs_check[hard_regno] != curr_hard_regno_costs_check)
+ hard_regno_costs[hard_regno] = 0;
+ hard_regno_costs_check[hard_regno] = curr_hard_regno_costs_check;
+ hard_regno_costs[hard_regno]
+ -= lra_reg_info[regno].preferred_hard_regno_profit1;
This pattern occurs several times. I think it'd be clearer to have
an inline helper function (adjust_hard_regno_cost, or whatever).
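For illustration, one possible shape for such a helper (a standalone sketch; the register count is a stand-in value, not a real target's FIRST_PSEUDO_REGISTER):

```c
#include <assert.h>

/* Stand-in register count so the sketch is self-contained.  */
#define FIRST_PSEUDO_REGISTER 64

static int hard_regno_costs_check[FIRST_PSEUDO_REGISTER];
static int hard_regno_costs[FIRST_PSEUDO_REGISTER];
static int curr_hard_regno_costs_check = 1;

/* Lazily reset the cost of HARD_REGNO on its first use in the current
   round (identified by CURR_HARD_REGNO_COSTS_CHECK), then add INCR.
   The quoted pattern "-= profit" would become
   adjust_hard_regno_cost (hard_regno, -profit).  */
static inline void
adjust_hard_regno_cost (int hard_regno, int incr)
{
  if (hard_regno_costs_check[hard_regno] != curr_hard_regno_costs_check)
    hard_regno_costs[hard_regno] = 0;
  hard_regno_costs_check[hard_regno] = curr_hard_regno_costs_check;
  hard_regno_costs[hard_regno] += incr;
}
```

Bumping curr_hard_regno_costs_check starts a fresh round without having to clear the whole cost array.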
+ /* That is important for allocation of multi-word pseudos. */
+ IOR_COMPL_HARD_REG_SET (conflict_set, reg_class_contents[rclass]);
Maybe:

/* Make sure that all registers in a multi-word pseudo belong to the
required class. */
+ /* We can not use prohibited_class_mode_regs because it is
+ defined not for all classes. */
s/defined not/not defined/
+ && ! TEST_HARD_REG_BIT (impossible_start_hard_regs, hard_regno)
+ && (nregs_diff == 0
+#ifdef WORDS_BIG_ENDIAN
+ || (hard_regno - nregs_diff >= 0
+ && TEST_HARD_REG_BIT (reg_class_contents[rclass],
+ hard_regno - nregs_diff))
+#else
+ || TEST_HARD_REG_BIT (reg_class_contents[rclass],
+ hard_regno + nregs_diff)
+#endif
+ ))
+ conflict_hr = live_pseudos_reg_renumber[conflict_regno];
+ nregs = (hard_regno_nregs[conflict_hr]
+ [lra_reg_info[conflict_regno].biggest_mode]);
+ /* Remember about multi-register pseudos. For example, 2 hard
+ register pseudos can start on the same hard register but can
+ not start on HR and HR+1/HR-1. */
+ for (hr = conflict_hr + 1;
+ hr < FIRST_PSEUDO_REGISTER && hr < conflict_hr + nregs;
+ hr++)
+ SET_HARD_REG_BIT (impossible_start_hard_regs, hr);
+ for (hr = conflict_hr - 1;
+ hr >= 0 && hr + hard_regno_nregs[hr][biggest_mode] > conflict_hr;
+ hr--)
+ SET_HARD_REG_BIT (impossible_start_hard_regs, hr);
which I don't think copes with big-endian cases like:

other hard reg in widest mode: ........XXXX...
impossible_start_regs: .....XXX.XXX...
this hard reg in pseudo's mode: ............XX.
this hard reg in widest mode: ..........XXXX.

which AIUI is an invalid choice.

There are other corner cases too. If the other hard reg is narrower than
its widest mode, and that widest mode is wider than the current regno's
widest mode, then on big-endian targets we could have:

other hard reg in its own mode: ........XX....
other hard reg in widest mode: ......XXXX.....
impossible_start_regs: .......X.XXX... (*)
this hard reg in pseudo's mode: .....XX........
this hard reg in widest mode: .....XX........

(*) note no big-endian adjustment for the other hard reg's widest mode here.

Maybe it would be easier to track impossible end regs for
big-endian targets?
+/* Update HARD_REGNO preference for pseudos connected (directly or
+ indirectly) to a pseudo with REGNO. Use divisor DIV to the
+ corresponding copy frequency for the hard regno cost preference
+ calculation. The more indirectly a pseudo connected, the less the
+ cost preference. It is achieved by increasing the divisor for each
+ next recursive level move. */
"cost preference" seems a bit contradictory. Maybe:

/* Update the preference for using HARD_REGNO for pseudos that are
connected directly or indirectly with REGNO. Apply divisor DIV
to any preference adjustments.

The more indirectly a pseudo is connected, the smaller its effect
should be. We therefore increase DIV on each "hop". */
+static void
+update_hard_regno_preference (int regno, int hard_regno, int div)
+{
+ int another_regno, cost;
+ lra_copy_t cp, next_cp;
+
+ /* Search depth 5 seems to be enough. */
+ if (div > (1 << 5))
+ return;
+ for (cp = lra_reg_info[regno].copies; cp != NULL; cp = next_cp)
+ {
+ if (cp->regno1 == regno)
+ {
+ next_cp = cp->regno1_next;
+ another_regno = cp->regno2;
+ }
+ else if (cp->regno2 == regno)
+ {
+ next_cp = cp->regno2_next;
+ another_regno = cp->regno1;
+ }
+ else
+ gcc_unreachable ();
+ if (reg_renumber[another_regno] < 0
+ && (update_hard_regno_preference_check[another_regno]
+ != curr_update_hard_regno_preference_check))
+ {
+ update_hard_regno_preference_check[another_regno]
+ = curr_update_hard_regno_preference_check;
+ cost = cp->freq < div ? 1 : cp->freq / div;
+ lra_setup_reload_pseudo_preferenced_hard_reg
+ (another_regno, hard_regno, cost);
+ update_hard_regno_preference (another_regno, hard_regno, div * 2);
+ }
+ }
+}
Using a depth-first search for this seems a bit dangerous, because we
could end up processing a connected pseudo via a very indirect path
first, even though it is more tightly connected via a more direct path.
(Could be a well-known problem, sorry.)
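For concreteness, a hypothetical breadth-first variant could look like this. Processing pseudos in "hop" order guarantees each connected pseudo is first reached through its most direct path, so its preference is derived from the largest freq/div. The copy graph and all names here are invented for the sketch, not LRA's real data structures:

```c
#include <assert.h>

#define N 6

static int copy_freq[N][N];  /* 0 means no copy between the two pseudos */
static int visited[N];
static int pref[N];          /* accumulated hard register preference */

/* Propagate a hard register preference outwards from REGNO, halving
   the weight on each hop, visiting pseudos in breadth-first order.  */
static void
propagate_preference_bfs (int regno)
{
  int queue[N], div_of[N];
  int head = 0, tail = 0;

  visited[regno] = 1;
  queue[tail] = regno;
  div_of[tail++] = 1;
  while (head < tail)
    {
      int r = queue[head];
      int div = div_of[head++];

      if (div > (1 << 5))  /* the same depth cut-off as the patch */
        continue;
      for (int other = 0; other < N; other++)
        if (copy_freq[r][other] > 0 && !visited[other])
          {
            visited[other] = 1;
            pref[other] += (copy_freq[r][other] < div
                            ? 1 : copy_freq[r][other] / div);
            queue[tail] = other;
            div_of[tail++] = div * 2;
          }
    }
}
```

With copies 0--1 (freq 100), 1--2 (freq 100) and 0--2 (freq 60), pseudo 2 gets preference 60 from its direct copy, whereas a depth-first walk that happened to take 0-1-2 first would have given it only 100/2 = 50.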
+/* Update REG_RENUMBER and other pseudo preferences by assignment of
+ HARD_REGNO to pseudo REGNO and print about it if PRINT_P. */
+void
+lra_setup_reg_renumber (int regno, int hard_regno, bool print_p)
+{
+ int i, hr;
+
+ if ((hr = hard_regno) < 0)
+ hr = reg_renumber[regno];
+ reg_renumber[regno] = hard_regno;
+ lra_assert (hr >= 0);
+ for (i = 0; i < hard_regno_nregs[hr][PSEUDO_REGNO_MODE (regno)]; i++)
+ if (hard_regno < 0)
+ lra_hard_reg_usage[hr + i] -= lra_reg_info[regno].freq;
+ else
+ lra_hard_reg_usage[hr + i] += lra_reg_info[regno].freq;
Is it possible for this function to reallocate a register,
i.e. for reg_regnumber to be >= 0 both before and after the call?
If so, I think we'd need two loops. If not, an assert would be good.
+ mode = PSEUDO_REGNO_MODE (spill_regno);
+ if (lra_hard_reg_set_intersection_p
+ (live_pseudos_reg_renumber[spill_regno],
+ mode, reg_class_contents[rclass]))
+ {
+ hard_regno = live_pseudos_reg_renumber[spill_regno];
Very minor, sorry, but I think this would be more readable with the
hard_regno assignment before the condition and hard_regno used in it.
+/* Spill some pseudos for a reload pseudo REGNO and return hard
+ register which should be used for pseudo after spilling. The
+ function adds spilled pseudos to SPILLED_PSEUDO_BITMAP. When we
+ choose hard register (and pseudos occupying the hard registers and
+ to be spilled), we take into account not only how REGNO will
+ benefit from the spills but also how other reload pseudos not
+ assigned to hard registers yet benefit from the spills too. */
"...not yet assigned to hard registers benefit..."
+ curr_pseudo_check++; /* Invalidate try_hard_reg_pseudos elements. */
Comment on its own line.
+ bitmap_clear (&ignore_pseudos_bitmap);
+ bitmap_clear (&best_spill_pseudos_bitmap);
+ EXECUTE_IF_SET_IN_BITMAP (&lra_reg_info[regno].insn_bitmap, 0, uid, bi)
+ {
+ struct lra_insn_reg *ir;
+
+ for (ir = lra_get_insn_regs (uid); ir != NULL; ir = ir->next)
+ if (ir->regno >= FIRST_PSEUDO_REGISTER)
+ bitmap_set_bit (&ignore_pseudos_bitmap, ir->regno);
+ }
The name "ignore_pseudos_bitmap" doesn't seem to describe how the set is
actually used. We still allow the pseudos to be spilled, but the number
of such spills is the first-order cost. Maybe "insn_conflict_pseudos"
or something like that?
+ /* Spill pseudos. */
+ CLEAR_HARD_REG_SET (spilled_hard_regs);
+ EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
+ if ((int) spill_regno >= lra_constraint_new_regno_start
+ /* ??? */
+ && ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno)
+ && ! bitmap_bit_p (&lra_split_pseudos, spill_regno)
+ && ! bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno))
+ goto fail;
Leftover ??? (or lacks enough info if it's supposed to be kept)
+ EXECUTE_IF_SET_IN_BITMAP (&live_hard_reg_pseudos[r->start],
+ 0, k, bi2)
+ sparseset_set_bit (live_range_hard_reg_pseudos, k);
live_range_hard_reg_pseudos and &live_hard_reg_pseudos[r->start]
seem like similar quantities. Was there a reason for using
sparsesets for one and bitmaps for the other?
+ for (p = r->start + 1; p <= r->finish; p++)
+ {
+ lra_live_range_t r2;
+
+ for (r2 = lra_start_point_ranges[p];
+ r2 != NULL;
+ r2 = r2->start_next)
+ if (r2->regno >= lra_constraint_new_regno_start)
+ sparseset_set_bit (live_range_reload_pseudos, r2->regno);
+ }
This is probably just showing my ignorance, but -- taking the above two
quotes together -- why do we calculate these two live sets in different ways?
Also, does live_range_reload_pseudos really just contain "reload" pseudos,
or inheritance pseudos as well?
+ /* We are trying to spill a reload pseudo. That is wrong we
+ should assign all reload pseudos, otherwise we cannot reuse
+ the selected alternatives. */
+ hard_regno = find_hard_regno_for (regno, &cost, -1);
+ if (hard_regno >= 0)
+ {
Don't really understand this comment, sorry.

Also, why are we passing -1 to find_hard_regno_for, rather than hard_regno?
The loop body up till this point has been specifically freeing up registers
to make hard_regno allocatable. I realise that, by spilling everything
that overlaps this range, we might have freed up other registers too,
and so made others besides hard_regno allocatable. But wouldn't we want
to test those other hard registers in "their" iteration of the loop
instead of this one? The spills in those iterations ought to be more
directed (i.e. there should be less incidental spilling).

As things stand, doing an rclass_size * rclass_size scan seems
unnecessarily expensive, although probably off the radar.
+ assign_temporarily (regno, hard_regno);
+ n = 0;
+ EXECUTE_IF_SET_IN_SPARSESET (live_range_reload_pseudos, reload_regno)
+ if (live_pseudos_reg_renumber[reload_regno] < 0
+ && (hard_reg_set_intersect_p
+ (reg_class_contents
+ [regno_allocno_class_array[reload_regno]],
+ spilled_hard_regs)))
+ sorted_reload_pseudos[n++] = reload_regno;
+ qsort (sorted_reload_pseudos, n, sizeof (int),
+ reload_pseudo_compare_func);
+ for (j = 0; j < n; j++)
+ {
+ reload_regno = sorted_reload_pseudos[j];
+ if (live_pseudos_reg_renumber[reload_regno] < 0
Just trying to make sure I understand, but: isn't the final condition in
this quote redundant? I thought that was a requirement for the register
being in sorted_reload_pseudos to begin with.
+ && (reload_hard_regno
+ = find_hard_regno_for (reload_regno,
+ &reload_cost, -1)) >= 0
+ && (lra_hard_reg_set_intersection_p
+ (reload_hard_regno, PSEUDO_REGNO_MODE (reload_regno),
+ spilled_hard_regs)))
+ {
+ if (lra_dump_file != NULL)
+ fprintf (lra_dump_file, " assign %d(cost=%d)",
+ reload_regno, reload_cost);
+ assign_temporarily (reload_regno, reload_hard_regno);
+ cost += reload_cost;
It looks like registers that can be reallocated make hard_regno more
expensive (specifically by reload_cost), but registers that can't be
reallocated contribute no cost. Is that right? Seemed a little odd,
so maybe worth a comment.

Also, AIUI find_hard_regno_for is trying to allocate the register for
reload_regno on the basis that reload_regno has the same conflicts as
the current regno, and so it's only an approximation. Is that right?
Might be worth a comment if so (not least to explain why we don't commit
to this allocation if we end up choosing hard_regno).
+ if (best_insn_pseudos_num > insn_pseudos_num
+ || (best_insn_pseudos_num == insn_pseudos_num
+ && best_cost > cost))
Should we check the register bank and levelling here too,
for consistency?
+ /* Restore the live hard reg pseudo info for spilled pseudos. */
+ EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
+ update_lives (spill_regno, false);
I couldn't tell why this was outside the "hard_regno >= 0" condition.
Do we really change these registers even if find_hard_regno_for fails?
+ /* Spill: */
+ EXECUTE_IF_SET_IN_BITMAP (&best_spill_pseudos_bitmap, 0, spill_regno, bi)
Very minor, but I think it'd be worth asserting that best_hard_regno >= 0
before this loop.
+/* Constraint transformation can use equivalences and they can
+ contains pseudos assigned to hard registers. Such equivalence
+ usage might create new conflicts of pseudos with hard registers
+ (like ones used for parameter passing or call clobbered ones) or
+ other pseudos assigned to the same hard registers. Another very
+ rare risky transformation is restoring whole multi-register pseudo
+ when only one subreg lives and unused hard register is used already
+ for something else.
In a way, I found this comment almost too detailed. :-) Maybe:

/* The constraints pass is allowed to create equivalences between
pseudos that make the current allocation "incorrect" (in the sense
that pseudos are assigned to hard registers from their own conflict sets).
The global variable lra_risky_transformations_p says whether this might
have happened.

if that's accurate. The detail about when this occurs probably
belongs above lra_risky_transformations_p, although it's mostly
there already. (Haven't got to the ira-conflicts.c stuff yet,
so no comments about that here.)
+ Process pseudos assigned to hard registers (most frequently used
+ first), spill if a conflict is found, and mark the spilled pseudos
+ in SPILLED_PSEUDO_BITMAP. Set up LIVE_HARD_REG_PSEUDOS from
+ pseudos, assigned to hard registers. */
Why do we spill the most frequently used registers first? Probably worth
a comment.
+ for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+ if (reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
+ {
+ if (lra_risky_transformations_p)
+ sorted_pseudos[n++] = i;
+ else
+ update_lives (i, false);
+ }
+ if (! lra_risky_transformations_p)
+ return;
Seems like this would be more logically split into two (the
lra_risky_transformations_p case and the !lra_risky_transformations_p case).
+ /* If it is multi-register pseudos they should start on
+ the same hard register. */
+ || hard_regno != reg_renumber[conflict_regno])
This seems different from the find_hard_regno_for case, which took
biggest_mode into account.
+ /* Don't change reload pseudo allocation. It might have
+ this allocation for a purpose (e.g. bound to another
+ pseudo) and changing it can result in LRA cycling. */
+ if (another_regno < lra_constraint_new_regno_start
+ && (another_hard_regno = reg_renumber[another_regno]) >= 0
+ && another_hard_regno != hard_regno)
Seems like this excludes split pseudos as well as reload pseudos,
or are they never included in these copies? Might be worth mentioning
them either way.

The only general comment I have so far is that it's sometimes
difficult to follow which types of pseudos are being included
or excluded by a comparison with lra_constraint_new_regno_start.
Sometimes the comments talk about "reload pseudos", but other
similar checks imply that the registers could be inheritance
pseudos or split pseudos as well. Some thin inline wrappers
might help here.
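Such wrappers might look roughly like this (all names hypothetical; the pseudo sets are modeled as plain bool arrays so the sketch is self-contained, whereas in LRA they would be the bitmaps used throughout the patch):

```c
#include <stdbool.h>
#include <assert.h>

/* Stand-ins for the LRA globals, just to make the sketch runnable.  */
#define MAX_REGNO 128

static int lra_constraint_new_regno_start = 100;
static bool in_inheritance_pseudos[MAX_REGNO];
static bool in_split_pseudos[MAX_REGNO];

/* True for pseudos created by the constraints pass and later.  */
static inline bool
new_pseudo_p (int regno)
{
  return regno >= lra_constraint_new_regno_start;
}

static inline bool
inheritance_pseudo_p (int regno)
{
  return new_pseudo_p (regno) && in_inheritance_pseudos[regno];
}

static inline bool
split_pseudo_p (int regno)
{
  return new_pseudo_p (regno) && in_split_pseudos[regno];
}

/* New pseudos that are neither inheritance nor split pseudos.  */
static inline bool
reload_pseudo_p (int regno)
{
  return (new_pseudo_p (regno)
          && !inheritance_pseudo_p (regno)
          && !split_pseudo_p (regno));
}
```

The point is only that a bare `regno < lra_constraint_new_regno_start` test would then read as a named predicate, making it obvious which kinds of pseudo each check is meant to include or exclude.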
+ /* Remember that reload pseudos can be spilled on the
+ 1st pass. */
+ bitmap_clear_bit (&all_spilled_pseudos, regno);
+ assign_hard_regno (hard_regno, regno);
Maybe:

/* This register might have been spilled by the previous pass.
Indicate that it is no longer spilled. */
+ /* We can use inheritance pseudos in original insns
+ (not reload ones). */
+ if (regno < lra_constraint_new_regno_start
+ || bitmap_bit_p (&lra_inheritance_pseudos, regno)
+ || reg_renumber[regno] < 0)
+ continue;
+ sorted_pseudos[nfails++] = regno;
+ if (lra_dump_file != NULL)
+ fprintf (lra_dump_file,
+ " Spill reload r%d(hr=%d, freq=%d)\n",
+ regno, reg_renumber[regno],
+ lra_reg_info[regno].freq);
Same comment about types of pseudo as above. (I.e. the code checks for
inheritance pseudos, but not split pseudos.)
+ bitmap_initialize (&do_not_assign_nonreload_pseudos, &reg_obstack);
+ EXECUTE_IF_SET_IN_BITMAP (&lra_inheritance_pseudos, 0, u, bi)
+ if ((restore_regno = lra_reg_info[u].restore_regno) >= 0
+ && reg_renumber[u] < 0 && bitmap_bit_p (&lra_inheritance_pseudos, u))
+ bitmap_set_bit (&do_not_assign_nonreload_pseudos, restore_regno);
+ EXECUTE_IF_SET_IN_BITMAP (&lra_split_pseudos, 0, u, bi)
+ if ((restore_regno = lra_reg_info[u].restore_regno) >= 0
+ && reg_renumber[u] >= 0 && bitmap_bit_p (&lra_split_pseudos, u))
+ bitmap_set_bit (&do_not_assign_nonreload_pseudos, restore_regno);
+ for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+ if (((i < lra_constraint_new_regno_start
+ && ! bitmap_bit_p (&do_not_assign_nonreload_pseudos, i))
+ || (bitmap_bit_p (&lra_inheritance_pseudos, i)
+ && lra_reg_info[i].restore_regno >= 0)
+ || (bitmap_bit_p (&lra_split_pseudos, i)
+ && lra_reg_info[i].restore_regno >= 0)
+ || bitmap_bit_p (&lra_optional_reload_pseudos, i))
+ && reg_renumber[i] < 0 && lra_reg_info[i].nrefs != 0
+ && regno_allocno_class_array[i] != NO_REGS)
+ sorted_pseudos[n++] = i;
+ bitmap_clear (&do_not_assign_nonreload_pseudos);
where we test very similar things inline, and then clear
do_not_assign_nonreload_pseudos. Do we need d_n_a_n_p at all?
+ if (n != 0 && lra_dump_file != NULL)
+ fprintf (lra_dump_file, " Reassing non-reload pseudos\n");
"Reassigning"

Richard
Vladimir Makarov
2012-10-12 03:37:53 UTC
Post by Richard Sandiford
Hi Vlad,
This message is for lra-assigns.c. Sorry for the piecemeal reviews,
never sure when I'll get time...
+/* This file contains a pass mostly assigning hard registers to reload
+ pseudos. There is no any RTL code transformation on this pass.
/* This file's main objective is to assign hard registers to reload pseudos.
It also tries to allocate hard registers to other pseudos, but at a lower
priority than the reload pseudos. The pass does not transform the RTL.
if that's accurate.
Yes. That is better. I used your comment.
Post by Richard Sandiford
+ Reload pseudos get what they need (usually) hard registers in
+ anyway possibly by spilling non-reload pseudos and by assignment
+ reload pseudos with smallest number of available hard registers
+ first.
+
+ If reload pseudos can get hard registers only through spilling
+ other pseudos, we choose what pseudos to spill taking into account
+ how given reload pseudo benefits and also how other reload pseudos
+ not assigned yet benefit too (see function spill_for).
We must allocate a hard register to every reload pseudo. We try to
increase the chances of finding a viable allocation by assigning the
pseudos in order of fewest available hard registers first. If we
still fail to find a hard register, we spill other (non-reload)
pseudos in order to make room.
assign_hard_regno_for allocates registers without spilling.
spill_for does the same with spilling. Both functions use
a cost model to determine the most profitable choice of
hard and spill registers.
Ok. I just changed two sentences a bit:

find_hard_regno_for finds hard registers for allocation without
spilling. spill_for does the same with spilling.
Post by Richard Sandiford
+ Non-reload pseudos can get hard registers too if it is possible and
+ improves the code. It might be possible because of spilling
+ non-reload pseudos on given pass.
Once we have finished allocating reload pseudos, we also try to
assign registers to other (non-reload) pseudos. This is useful
if hard registers were freed up by the spilling just described.
Fixed.
Post by Richard Sandiford
+ We try to assign hard registers processing pseudos by threads. The
+ thread contains reload and inheritance pseudos connected by copies
+ (move insns). It improves the chance to get the same hard register
+ to pseudos in the thread and, as the result, to remove some move
+ insns.
We try to assign hard registers by collecting pseudos into threads.
These threads contain reload and inheritance pseudos that are connected
by copies (move insns). Doing this improves the chances of pseudos
in the thread getting the same hard register and, as a result,
of allowing some move insns to be deleted.
Fixed.
Post by Richard Sandiford
+ When we assign hard register to a pseudo, we decrease the cost of
+ the hard registers for corresponding pseudos connected by copies.
When we assign a hard register to a pseudo, we decrease the cost of
using the same hard register for pseudos that are connected by copies.
Fixed.
Post by Richard Sandiford
+ If two hard registers are equally good for assigning the pseudo
+ with hard register cost point of view, we prefer a hard register in
+ smaller register bank. By default, there is only one register
+ bank. A target can define register banks by hook
+ register_bank. For example, x86-64 has a few register banks: hard
+ regs with and without REX prefixes are in different banks. It
+ permits to generate smaller code as insns without REX prefix are
+ shorter.
If two hard registers have the same frequency-derived cost,
we prefer hard registers in lower register banks. The mapping
of registers to banks is controlled by the register_bank target hook.
For example, x86-64 has a few register banks: hard registers with and
without REX prefixes are in different banks. This permits us
to generate smaller code as insns without REX prefixes are shorter.
although this might change if the name of the hook changes.
With the recent change in the hook name, I modified it to:

If two hard registers have the same frequency-derived cost, we
prefer hard registers with bigger priorities. The mapping of
registers to priorities is controlled by the register_priority
target hook. For example, x86-64 has a few register priorities:
hard registers with and without REX prefixes have different
priorities. This permits us to generate smaller code as insns
without REX prefixes are shorter.
Post by Richard Sandiford
+/* Info about pseudo used during the assignment pass. Thread is a set
+ of connected reload and inheritance pseudos with the same set of
+ available hard reg set. Thread is a pseudo itself for other
+ cases. */
+struct regno_assign_info
/* Information about the thread to which a pseudo belongs. Threads are
a set of connected reload and inheritance pseudos with the same set of
available hard registers. Lone registers belong to their own threads. */
Fixed.
Post by Richard Sandiford
+ && (ira_class_hard_regs_num[regno_allocno_class_array[regno1]]
+ == ira_class_hard_regs_num[regno_allocno_class_array[regno2]]))
i.e. the same _number_ of available hard regs, but not necessarily the
same set.
It should be the same in most cases. This condition is just a faster
approximation of having the same available hard reg set.
Post by Richard Sandiford
"thread" might be more mnemonic than "regno_assign" in this file,
but that's bikeshed stuff.
+ for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+ {
+ regno_assign_info[i].first = i;
+ regno_assign_info[i].next = -1;
+ regno_assign_info[i].freq = lra_reg_info[i].freq;
+ }
Minor speedup, but it's probably worth caching max_reg_num () rather than
calling it in each loop iteration. Several other loops with the same thing.
That is not critical code, and LTO could solve the problem. But as we
usually don't use LTO for building GCC, I rewrote it.
Post by Richard Sandiford
+/* Process a pseudo copy with execution frequency COPY_FREQ connecting
+ REGNO1 and REGNO2 to form threads. */
+static void
+process_copy_to_form_thread (int regno1, int regno2, int copy_freq)
+{
+ int last, regno1_first, regno2_first;
+
+ lra_assert (regno1 >= lra_constraint_new_regno_start
+ && regno2 >= lra_constraint_new_regno_start);
+ regno1_first = regno_assign_info[regno1].first;
+ regno2_first = regno_assign_info[regno2].first;
+ if (regno1_first != regno2_first)
+ {
+ for (last = regno2_first;
+ regno_assign_info[last].next >= 0;
+ last = regno_assign_info[last].next)
+ regno_assign_info[last].first = regno1_first;
+ regno_assign_info[last].next = regno_assign_info[regno1_first].next;
+ regno_assign_info[regno1_first].first = regno2_first;
+ regno_assign_info[regno1_first].freq
+ += regno_assign_info[regno2_first].freq;
- Why don't we set regno_assign_info[last].first (for final "last")
to regno1_first? I.e. the loop stops while "last" is still valid,
but only assigns to that element's "next" field, leaving "first"
as before.
regno_assign_info[regno1_first].first = regno2_first;
regno_assign_info[regno1_first].next = regno2_first;
regno1_first regno2_first ... last ...
regno_assign_info[regno1_first].first == regno2_first
regno_assign_info[regno2_first].first == regno1_first
It is a typo. Fixed. Thanks. There is no looping danger, as we never
traverse this list through the first field, but the typo did result in
an assignment order different from what I assumed.
Post by Richard Sandiford
+/* Update LIVE_HARD_REG_PSEUDOS and LIVE_PSEUDOS_REG_RENUMBER by
+ pseudo REGNO assignment or by the pseudo spilling if FREE_P. */
/* Update the LIVE_HARD_REG_PSEUDOS and LIVE_PSEUDOS_REG_RENUMBER
entries for pseudo REGNO. Assume that the register has been
spilled if FREE_P, otherwise assume that it has been assigned
reg_renumber[REGNO] (if >= 0). */
Fixed. I also added a comment about the recently added
insert_in_live_range_start_chain call.
Post by Richard Sandiford
+/* Find and return best (or TRY_ONLY_HARD_REGNO) free hard register
+ for pseudo REGNO. In the failure case, return a negative number.
+ Return through *COST the cost of usage of the hard register for the
+ pseudo. Best free hard register has smallest cost of usage for
+ REGNO or smallest register bank if the cost is the same. */
/* Try to find a free hard register for pseudo REGNO. Return the
hard register on success and set *COST to the cost of using
that register. (If several registers have equal cost, the one with
the lowest register bank wins.) Return -1 on failure.
If TRY_ONLY_HARD_REGNO >= 0, consider only that hard register,
otherwise consider all hard registers in REGNO's class. */
Fixed with changing bank to priority.
Post by Richard Sandiford
+ if (hard_regno_costs_check[hard_regno] != curr_hard_regno_costs_check)
+ hard_regno_costs[hard_regno] = 0;
+ hard_regno_costs_check[hard_regno] = curr_hard_regno_costs_check;
+ hard_regno_costs[hard_regno]
+ -= lra_reg_info[regno].preferred_hard_regno_profit1;
This pattern occurs several times. I think it'd be clearer to have
an inline helper function (adjust_hard_regno_cost, or whatever).
Done.
Post by Richard Sandiford
+ /* That is important for allocation of multi-word pseudos. */
+ IOR_COMPL_HARD_REG_SET (conflict_set, reg_class_contents[rclass]);
/* Make sure that all registers in a multi-word pseudo belong to the
required class. */
Fixed.
Post by Richard Sandiford
+ /* We can not use prohibited_class_mode_regs because it is
+ defined not for all classes. */
s/defined not/not defined/
Fixed.
Post by Richard Sandiford
+ && ! TEST_HARD_REG_BIT (impossible_start_hard_regs, hard_regno)
+ && (nregs_diff == 0
+#ifdef WORDS_BIG_ENDIAN
+ || (hard_regno - nregs_diff >= 0
+ && TEST_HARD_REG_BIT (reg_class_contents[rclass],
+ hard_regno - nregs_diff))
+#else
+ || TEST_HARD_REG_BIT (reg_class_contents[rclass],
+ hard_regno + nregs_diff)
+#endif
+ ))
+ conflict_hr = live_pseudos_reg_renumber[conflict_regno];
+ nregs = (hard_regno_nregs[conflict_hr]
+ [lra_reg_info[conflict_regno].biggest_mode]);
+ /* Remember about multi-register pseudos. For example, 2 hard
+ register pseudos can start on the same hard register but can
+ not start on HR and HR+1/HR-1. */
+ for (hr = conflict_hr + 1;
+ hr < FIRST_PSEUDO_REGISTER && hr < conflict_hr + nregs;
+ hr++)
+ SET_HARD_REG_BIT (impossible_start_hard_regs, hr);
+ for (hr = conflict_hr - 1;
+ hr >= 0 && hr + hard_regno_nregs[hr][biggest_mode] > conflict_hr;
+ hr--)
+ SET_HARD_REG_BIT (impossible_start_hard_regs, hr);
other hard reg in widest mode: ........XXXX...
impossible_start_regs: .....XXX.XXX...
this hard reg in pseudo's mode: ............XX.
this hard reg in widest mode: ..........XXXX.
which AIUI is an invalid choice.
There are other corner cases too. If the other hard reg is narrower than
its widest mode, and that widest mode is wider than the current regno's
other hard reg in its own mode: ........XX....
other hard reg in widest mode: ......XXXX.....
impossible_start_regs: .......X.XXX... (*)
this hard reg in pseudo's mode: .....XX........
this hard reg in widest mode: .....XX........
(*) note no big-endian adjustment for the other hard reg's widest mode here.
Maybe it would be easier to track impossible end regs for
big-endian targets?
I'll look at this tomorrow.
Post by Richard Sandiford
+/* Update HARD_REGNO preference for pseudos connected (directly or
+ indirectly) to a pseudo with REGNO. Use divisor DIV to the
+ corresponding copy frequency for the hard regno cost preference
+ calculation. The more indirectly a pseudo connected, the less the
+ cost preference. It is achieved by increasing the divisor for each
+ next recursive level move. */
/* Update the preference for using HARD_REGNO for pseudos that are
connected directly or indirectly with REGNO. Apply divisor DIV
to any preference adjustments.
The more indirectly a pseudo is connected, the smaller its effect
should be. We therefore increase DIV on each "hop". */
Fixed. By the way, it is your invention from IRA.
Post by Richard Sandiford
+static void
+update_hard_regno_preference (int regno, int hard_regno, int div)
+{
+ int another_regno, cost;
+ lra_copy_t cp, next_cp;
+
+ /* Search depth 5 seems to be enough. */
+ if (div > (1 << 5))
+ return;
+ for (cp = lra_reg_info[regno].copies; cp != NULL; cp = next_cp)
+ {
+ if (cp->regno1 == regno)
+ {
+ next_cp = cp->regno1_next;
+ another_regno = cp->regno2;
+ }
+ else if (cp->regno2 == regno)
+ {
+ next_cp = cp->regno2_next;
+ another_regno = cp->regno1;
+ }
+ else
+ gcc_unreachable ();
+ if (reg_renumber[another_regno] < 0
+ && (update_hard_regno_preference_check[another_regno]
+ != curr_update_hard_regno_preference_check))
+ {
+ update_hard_regno_preference_check[another_regno]
+ = curr_update_hard_regno_preference_check;
+ cost = cp->freq < div ? 1 : cp->freq / div;
+ lra_setup_reload_pseudo_preferenced_hard_reg
+ (another_regno, hard_regno, cost);
+ update_hard_regno_preference (another_regno, hard_regno, div * 2);
+ }
+ }
+}
Using a depth-first search for this seems a bit dangerous, because we
could end up processing a connected pseudo via a very indirect path
first, even though it is more tightly connected via a more direct path.
(Could be a well-known problem, sorry.)
Actually I did this on purpose. The situation is a bit different from IRA.
The vast majority of copies form a straight line (hence my use of the term
threads). That is also why I use only two hard-register preferences
(from two copies involving hard registers). I was looking for a balance
between code simplicity and speed on one hand and more sophisticated but
slower heuristics on the other.
Post by Richard Sandiford
+/* Update REG_RENUMBER and other pseudo preferences by assignment of
+ HARD_REGNO to pseudo REGNO and print about it if PRINT_P. */
+void
+lra_setup_reg_renumber (int regno, int hard_regno, bool print_p)
+{
+ int i, hr;
+
+ if ((hr = hard_regno) < 0)
+ hr = reg_renumber[regno];
+ reg_renumber[regno] = hard_regno;
+ lra_assert (hr >= 0);
+ for (i = 0; i < hard_regno_nregs[hr][PSEUDO_REGNO_MODE (regno)]; i++)
+ if (hard_regno < 0)
+ lra_hard_reg_usage[hr + i] -= lra_reg_info[regno].freq;
+ else
+ lra_hard_reg_usage[hr + i] += lra_reg_info[regno].freq;
Is it possible for this function to reallocate a register,
i.e. for reg_renumber to be >= 0 both before and after the call?
If so, I think we'd need two loops. If not, an assert would be good.
No. I added an assert.
Post by Richard Sandiford
+ mode = PSEUDO_REGNO_MODE (spill_regno);
+ if (lra_hard_reg_set_intersection_p
+ (live_pseudos_reg_renumber[spill_regno],
+ mode, reg_class_contents[rclass]))
+ {
+ hard_regno = live_pseudos_reg_renumber[spill_regno];
Very minor, sorry, but I think this would be more readable with the
hard_regno assignment before the condition and hard_regno used in it.
Fixed.
Post by Richard Sandiford
+/* Spill some pseudos for a reload pseudo REGNO and return hard
+ register which should be used for pseudo after spilling. The
+ function adds spilled pseudos to SPILLED_PSEUDO_BITMAP. When we
+ choose hard register (and pseudos occupying the hard registers and
+ to be spilled), we take into account not only how REGNO will
+ benefit from the spills but also how other reload pseudos not
+ assigned to hard registers yet benefit from the spills too. */
"...not yet assigned to hard registers benefit..."
Fixed.
Post by Richard Sandiford
+ curr_pseudo_check++; /* Invalidate try_hard_reg_pseudos elements. */
Comment on its own line.
Fixed.
Post by Richard Sandiford
+ bitmap_clear (&ignore_pseudos_bitmap);
+ bitmap_clear (&best_spill_pseudos_bitmap);
+ EXECUTE_IF_SET_IN_BITMAP (&lra_reg_info[regno].insn_bitmap, 0, uid, bi)
+ {
+ struct lra_insn_reg *ir;
+
+ for (ir = lra_get_insn_regs (uid); ir != NULL; ir = ir->next)
+ if (ir->regno >= FIRST_PSEUDO_REGISTER)
+ bitmap_set_bit (&ignore_pseudos_bitmap, ir->regno);
+ }
The name "ignore_pseudos_bitmap" doesn't seem to describe how the set is
actually used. We still allow the pseudos to be spilled, but the number
of such spills is the first-order cost. Maybe "insn_conflict_pseudos"
or something like that?
Ok. Fixed.
Post by Richard Sandiford
+ /* Spill pseudos. */
+ CLEAR_HARD_REG_SET (spilled_hard_regs);
+ EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
+ if ((int) spill_regno >= lra_constraint_new_regno_start
+ /* ??? */
+ && ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno)
+ && ! bitmap_bit_p (&lra_split_pseudos, spill_regno)
+ && ! bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno))
+ goto fail;
Leftover ??? (or lacks enough info if it's supposed to be kept)
It is a leftover. As I remember, it was a reminder to check this code
when I worked on inheritance and splitting.

I removed it.
Post by Richard Sandiford
+ EXECUTE_IF_SET_IN_BITMAP (&live_hard_reg_pseudos[r->start],
+ 0, k, bi2)
+ sparseset_set_bit (live_range_hard_reg_pseudos, k);
live_range_hard_reg_pseudos and &live_hard_reg_pseudos[r->start]
seem like similar quantities. Was there a reason for using
sparsesets for one and bitmaps for the other?
+ for (p = r->start + 1; p <= r->finish; p++)
+ {
+ lra_live_range_t r2;
+
+ for (r2 = lra_start_point_ranges[p];
+ r2 != NULL;
+ r2 = r2->start_next)
+ if (r2->regno >= lra_constraint_new_regno_start)
+ sparseset_set_bit (live_range_reload_pseudos, r2->regno);
+ }
This is probably just showing my ignorance, but -- taking the above two
quotes together -- why do we calculate these two live sets in different ways?
Also, does live_range_reload_pseudos really just contain "reload" pseudos,
or inheritance pseudos as well?
Thanks for finding this. live_range_hard_reg_pseudos is not used in
this function.
As I remember, I used the same code as in find_hard_regno_for, and
live_(hard_)reg_pseudos contained all pseudos, including spilled ones.
But that was too expensive (especially when the register pressure was
high), so I switched to less accurate but faster heuristics.

So I am removing

+ EXECUTE_IF_SET_IN_BITMAP (&live_hard_reg_pseudos[r->start],
+ 0, k, bi2)
+ sparseset_set_bit (live_range_hard_reg_pseudos, k);

I changed

p = r->start + 1 to p = r->start


None of these code variants results in wrong code generation; they only
affect the quality of spilling (which in any case is no worse than
reload's).

live_range_reload_pseudos contains inheritance pseudos too (inheritance
pseudos are also short-live-range pseudos). I renamed it.
Post by Richard Sandiford
+ /* We are trying to spill a reload pseudo. That is wrong we
+ should assign all reload pseudos, otherwise we cannot reuse
+ the selected alternatives. */
+ hard_regno = find_hard_regno_for (regno, &cost, -1);
+ if (hard_regno >= 0)
+ {
Don't really understand this comment, sorry.
I removed the comment. It is from an older version of the code that
tried to guarantee an assignment to the reload pseudo.
Post by Richard Sandiford
Also, why are we passing -1 to find_hard_regno_for, rather than hard_regno?
The loop body up till this point has been specifically freeing up registers
to make hard_regno allocatable. I realise that, by spilling everything
that overlaps this range, we might have freed up other registers too,
and so made others besides hard_regno allocatable. But wouldn't we want
to test those other hard registers in "their" iteration of the loop
instead of this one? The spills in those iterations ought to be more
directed (i.e. there should be less incidental spilling).
As things stand, doing an rclass_size * rclass_size scan seems
unnecessarily expensive, although probably off the radar.
We cannot just pass hard_regno for a multi-word pseudo when
hard_regno - 1 is already free.
You are right about the possibility of speeding up the code, although in
the profiles I looked at (including the last huge tests) spill_for and
find_hard_regno_for called from it take little time. That is probably
because spilling is rarely needed. Freeing one long-live-range pseudo
permits finding a hard regno without spilling for many short-live-range
pseudos (reload and inheritance ones).
Also the rclass_size * rclass_size loop is not expensive; preparing the
data for the loop is.

I believe there is potential to speed up spill_for if we inline
find_hard_regno_for and remove the duplicated code, but that would
significantly complicate an already complicated function.
Post by Richard Sandiford
+ assign_temporarily (regno, hard_regno);
+ n = 0;
+ EXECUTE_IF_SET_IN_SPARSESET (live_range_reload_pseudos, reload_regno)
+ if (live_pseudos_reg_renumber[reload_regno] < 0
+ && (hard_reg_set_intersect_p
+ (reg_class_contents
+ [regno_allocno_class_array[reload_regno]],
+ spilled_hard_regs)))
+ sorted_reload_pseudos[n++] = reload_regno;
+ qsort (sorted_reload_pseudos, n, sizeof (int),
+ reload_pseudo_compare_func);
+ for (j = 0; j < n; j++)
+ {
+ reload_regno = sorted_reload_pseudos[j];
+ if (live_pseudos_reg_renumber[reload_regno] < 0
Just trying to make sure I understand, but: isn't the final condition in
this quote redundant? I thought that was a requirement for the register
being in sorted_reload_pseudos to begin with.
Yes, it is redundant. It is a leftover from some experiments with the
code. I converted it to an assert.
Post by Richard Sandiford
+ && (reload_hard_regno
+ = find_hard_regno_for (reload_regno,
+ &reload_cost, -1)) >= 0
+ && (lra_hard_reg_set_intersection_p
+ (reload_hard_regno, PSEUDO_REGNO_MODE (reload_regno),
+ spilled_hard_regs)))
+ {
+ if (lra_dump_file != NULL)
+ fprintf (lra_dump_file, " assign %d(cost=%d)",
+ reload_regno, reload_cost);
+ assign_temporarily (reload_regno, reload_hard_regno);
+ cost += reload_cost;
It looks like registers that can be reallocated make hard_regno more
expensive (specifically by reload_cost), but registers that can't be
reallocated contribute no cost. Is that right? Seemed a little odd,
so maybe worth a comment.
Reload cost is a negative cost; its magnitude is based on the pseudo's
frequency.
I added the comment. The better the hard register is, the more negative
the cost.
Post by Richard Sandiford
Also, AIUI find_hard_regno_for is trying to allocate the register for
reload_regno on the basis that reload_regno has the same conflicts as
the current regno, and so it's only an approximation. Is that right?
Might be worth a comment if so (not least to explain why we don't commit
to this allocation if we end up choosing hard_regno).
Sorry, I did not understand what you are asking. We try to find the
best pseudos to spill, so as to assign as many reload pseudos as
possible (on a cost basis). Pseudos for which find_hard_regno_for finds
non-spilled hard regs are ignored, as they can be assigned without
spilling.
Post by Richard Sandiford
+ if (best_insn_pseudos_num > insn_pseudos_num
+ || (best_insn_pseudos_num == insn_pseudos_num
+ && best_cost > cost))
Should we check the register bank and levelling here too,
for consistency?
As I remember I had a quick check of it on a few tests but did not find
a difference in the generated code. I think that is because the
probability of equal costs is smaller here than in the assignment code,
as we usually spill pseudos with different costs.
We should try it on bigger tests and add such code if it is worthwhile,
or write a comment about it. I'll put it on my todo list and work on it
later on the branch.
Post by Richard Sandiford
+ /* Restore the live hard reg pseudo info for spilled pseudos. */
+ EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
+ update_lives (spill_regno, false);
I couldn't tell why this was outside the "hard_regno >= 0" condition.
Do we really change these registers even if find_hard_regno_for fails?
First we spill some pseudos and then call find_hard_regno_for. So we
should restore the pre-spill state regardless of whether
find_hard_regno_for succeeds.
Post by Richard Sandiford
+ /* Spill: */
+ EXECUTE_IF_SET_IN_BITMAP (&best_spill_pseudos_bitmap, 0, spill_regno, bi)
Very minor, but I think it'd be worth asserting that best_hard_regno >= 0
before this loop.
Unfortunately, in very rare cases, best_hard_regno can be < 0. That is
why we have two iterations for the assignment of reload pseudos (see the
comment for the 2nd iteration of reload pseudo assignments).

I've added a comment saying that the function can return a negative
value.
Post by Richard Sandiford
+/* Constraint transformation can use equivalences and they can
+ contains pseudos assigned to hard registers. Such equivalence
+ usage might create new conflicts of pseudos with hard registers
+ (like ones used for parameter passing or call clobbered ones) or
+ other pseudos assigned to the same hard registers. Another very
+ rare risky transformation is restoring whole multi-register pseudo
+ when only one subreg lives and unused hard register is used already
+ for something else.
/* The constraints pass is allowed to create equivalences between
pseudos that make the current allocation "incorrect" (in the sense
that pseudos are assigned to hard registers from their own conflict sets).
The global variable lra_risky_transformations_p says whether this might
have happened.
if that's accurate. The detail about when this occurs probably
belongs above lra_risky_transformations_p, although it's mostly
there already. (Haven't got to the ira-conflicts.c stuff yet,
so no comments about that here.)
Ok. Fixed. It looks better to me. But more important is whether it
looks better to you, since you have a fresh view of the code.
Post by Richard Sandiford
+ Process pseudos assigned to hard registers (most frequently used
+ first), spill if a conflict is found, and mark the spilled pseudos
+ in SPILLED_PSEUDO_BITMAP. Set up LIVE_HARD_REG_PSEUDOS from
+ pseudos, assigned to hard registers. */
Why do we spill the most frequently used registers first? Probably worth
a comment.
It should be the less frequently used pseudos first, as an intuitive
heuristic. Although it does not seem to matter: even on
all_cp2k_fortran.f90 (a 500K-line test) I did not find that the order
affects the generated code. I fixed the code and the comment.
Post by Richard Sandiford
+ for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+ if (reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
+ {
+ if (lra_risky_transformations_p)
+ sorted_pseudos[n++] = i;
+ else
+ update_lives (i, false);
+ }
+ if (! lra_risky_transformations_p)
+ return;
Seems like this would be more logically split into two (the
lra_risky_transformations_p case and the !lra_risky_transformations_p case).
Fixed.
Post by Richard Sandiford
+ /* If it is multi-register pseudos they should start on
+ the same hard register. */
+ || hard_regno != reg_renumber[conflict_regno])
This seems different from the find_hard_regno_for case, which took
biggest_mode into account.
I think we should use common code for the two cases, so I'll fix this
along with the big-endian problem.
Post by Richard Sandiford
+ /* Don't change reload pseudo allocation. It might have
+ this allocation for a purpose (e.g. bound to another
+ pseudo) and changing it can result in LRA cycling. */
+ if (another_regno < lra_constraint_new_regno_start
+ && (another_hard_regno = reg_renumber[another_regno]) >= 0
+ && another_hard_regno != hard_regno)
Seems like this excludes split pseudos as well as reload pseudos,
or are they never included in these copies? Might be worth mentioning
them either way.
I fixed it.
Post by Richard Sandiford
The only general comment I have so far is that it's sometimes
difficult to follow which types of pseudos are being included
or excluded by a comparison with lra_constraint_new_regno_start.
Sometimes the comments talk about "reload pseudos", but other
similar checks imply that the registers could be inheritance
pseudos or split pseudos as well. Some thin inline wrappers
might help here.
Inheritance, split and reload pseudos created since the last constraint
pass have regno >= lra_constraint_new_regno_start.
Inheritance and split pseudos created on any pass are in the
corresponding bitmaps.
Inheritance and split pseudos created since the last constraint pass
also have restore_regno >= 0 until the split or inheritance
transformations are done.

I am putting a comment about this at the top of the file.
Post by Richard Sandiford
+ /* Remember that reload pseudos can be spilled on the
+ 1st pass. */
+ bitmap_clear_bit (&all_spilled_pseudos, regno);
+ assign_hard_regno (hard_regno, regno);
/* This register might have been spilled by the previous pass.
Indicate that it is no longer spilled. */
Fixed.
Post by Richard Sandiford
+ /* We can use inheritance pseudos in original insns
+ (not reload ones). */
+ if (regno < lra_constraint_new_regno_start
+ || bitmap_bit_p (&lra_inheritance_pseudos, regno)
+ || reg_renumber[regno] < 0)
+ continue;
+ sorted_pseudos[nfails++] = regno;
+ if (lra_dump_file != NULL)
+ fprintf (lra_dump_file,
+ " Spill reload r%d(hr=%d, freq=%d)\n",
+ regno, reg_renumber[regno],
+ lra_reg_info[regno].freq);
Same comment about types of pseudo as above. (I.e. the code checks for
inheritance pseudos, but not split pseudos.)
I modified the comment to

/* A reload pseudo did not get a hard register on the
first iteration because of the conflict with
another reload pseudos in the same insn. So we
consider only reload pseudos assigned to hard
registers. We shall exclude inheritance pseudos as
they can occur in original insns (not reload ones).
We can omit the check for split pseudos because
they occur only in move insns containing non-reload
pseudos. */

I hope it explains the code.
Post by Richard Sandiford
+ bitmap_initialize (&do_not_assign_nonreload_pseudos, &reg_obstack);
+ EXECUTE_IF_SET_IN_BITMAP (&lra_inheritance_pseudos, 0, u, bi)
+ if ((restore_regno = lra_reg_info[u].restore_regno) >= 0
+ && reg_renumber[u] < 0 && bitmap_bit_p (&lra_inheritance_pseudos, u))
+ bitmap_set_bit (&do_not_assign_nonreload_pseudos, restore_regno);
+ EXECUTE_IF_SET_IN_BITMAP (&lra_split_pseudos, 0, u, bi)
+ if ((restore_regno = lra_reg_info[u].restore_regno) >= 0
+ && reg_renumber[u] >= 0 && bitmap_bit_p (&lra_split_pseudos, u))
+ bitmap_set_bit (&do_not_assign_nonreload_pseudos, restore_regno);
Fixed.
Post by Richard Sandiford
+ for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+ if (((i < lra_constraint_new_regno_start
+ && ! bitmap_bit_p (&do_not_assign_nonreload_pseudos, i))
+ || (bitmap_bit_p (&lra_inheritance_pseudos, i)
+ && lra_reg_info[i].restore_regno >= 0)
+ || (bitmap_bit_p (&lra_split_pseudos, i)
+ && lra_reg_info[i].restore_regno >= 0)
+ || bitmap_bit_p (&lra_optional_reload_pseudos, i))
+ && reg_renumber[i] < 0 && lra_reg_info[i].nrefs != 0
+ && regno_allocno_class_array[i] != NO_REGS)
+ sorted_pseudos[n++] = i;
+ bitmap_clear (&do_not_assign_nonreload_pseudos);
where we test very similar things inline, and then clear
do_not_assign_nonreload_pseudos. Do we need d_n_a_n_p at all?
No, the code is right. We still need d_n_a_n_p because we cannot
easily calculate from what pseudo a given pseudo was inherited or split.
It is different from the loop where the new inheritance and split
pseudos themselves are checked (they are different from their origins,
which are marked in d_n_a_n_p).
Post by Richard Sandiford
+ if (n != 0 && lra_dump_file != NULL)
+ fprintf (lra_dump_file, " Reassing non-reload pseudos\n");
"Reassigning"
Fixed. :)

Richard, thank you for the review. This is the most useful review I
have ever had. There is a lot of code that can be simplified. I ran
numerous experiments with this code (and maybe with the code in
lra-constraints.c) during my work on LRA, so it can contain some
artifacts from those experiments.
Richard Sandiford
2012-10-12 16:13:09 UTC
Permalink
Post by Vladimir Makarov
Post by Richard Sandiford
+/* Info about pseudo used during the assignment pass. Thread is a set
+ of connected reload and inheritance pseudos with the same set of
+ available hard reg set. Thread is a pseudo itself for other
+ cases. */
+struct regno_assign_info
/* Information about the thread to which a pseudo belongs. Threads are
a set of connected reload and inheritance pseudos with the same set of
available hard registers. Lone registers belong to their own threads. */
Fixed.
Post by Richard Sandiford
+ && (ira_class_hard_regs_num[regno_allocno_class_array[regno1]]
+ == ira_class_hard_regs_num[regno_allocno_class_array[regno2]]))
i.e. the same _number_ of available hard regs, but not necessarily the
same set.
It should be the same in most cases. This condition is just a faster
approximation of having the same available hard reg set.
The distinction does seem important though. It's possible that
a target has two distinct register files of the same allocatable size.
Would something like:

(ira_class_subset_p[class1][class2]
&& ira_class_subset_p[class2][class1])

work instead?
Post by Vladimir Makarov
Post by Richard Sandiford
/* Update the preference for using HARD_REGNO for pseudos that are
connected directly or indirectly with REGNO. Apply divisor DIV
to any preference adjustments.
The more indirectly a pseudo is connected, the smaller its effect
should be. We therefore increase DIV on each "hop". */
Fixed. By the way, it is your invention from IRA.
Heh, I'd forgotten all about that.
Post by Vladimir Makarov
Post by Richard Sandiford
+ /* We are trying to spill a reload pseudo. That is wrong we
+ should assign all reload pseudos, otherwise we cannot reuse
+ the selected alternatives. */
+ hard_regno = find_hard_regno_for (regno, &cost, -1);
+ if (hard_regno >= 0)
+ {
Don't really understand this comment, sorry.
I removed the comment. It is from an older version of the code that
tried to guarantee an assignment to the reload pseudo.
Post by Richard Sandiford
Also, why are we passing -1 to find_hard_regno_for, rather than hard_regno?
The loop body up till this point has been specifically freeing up registers
to make hard_regno allocatable. I realise that, by spilling everything
that overlaps this range, we might have freed up other registers too,
and so made others besides hard_regno allocatable. But wouldn't we want
to test those other hard registers in "their" iteration of the loop
instead of this one? The spills in those iterations ought to be more
directed (i.e. there should be less incidental spilling).
As things stand, doing an rclass_size * rclass_size scan seems
unnecessarily expensive, although probably off the radar.
We cannot just pass hard_regno for a multi-word pseudo when
hard_regno - 1 is already free.
But this call is in a loop that iterates over all registers in the class:

for (i = 0; i < rclass_size; i++)
{
hard_regno = ira_class_hard_regs[rclass][i];

and we reach the find_hard_regno_for call unless there is some
conflicting register that we cannot spill. So if "hard_regno - 1"
belongs to the allocation class and is a viable choice, "its" iteration
of the loop would spill specifically for "hard_regno - 1" and get the
most accurate cost for that register. I couldn't see why any other
iteration of the loop would want to consider "hard_regno - 1".
Post by Vladimir Makarov
You are right about the possibility of speeding up the code, although
in the profiles I looked at (including the last huge tests) spill_for
and find_hard_regno_for called from it take little time. That is
probably because spilling is rarely needed. Freeing one long-live-range
pseudo permits finding a hard regno without spilling for many
short-live-range pseudos (reload and inheritance ones).
Also the rclass_size * rclass_size loop is not expensive; preparing the
data for the loop is.
OK, in that case maybe the efficiency concern wasn't justified.
FWIW, I still think passing hard_regno would be clearer though,
in terms of meeting expectations. It just seems odd to spill for
one specific register and then test all of them. Especially when the
spilling we actually do after choosing register X is based on "X's"
iteration of this loop.

(I realise I could well be missing the point here though, sorry.)
Post by Vladimir Makarov
Post by Richard Sandiford
+ && (reload_hard_regno
+ = find_hard_regno_for (reload_regno,
+ &reload_cost, -1)) >= 0
+ && (lra_hard_reg_set_intersection_p
+ (reload_hard_regno, PSEUDO_REGNO_MODE (reload_regno),
+ spilled_hard_regs)))
+ {
+ if (lra_dump_file != NULL)
+ fprintf (lra_dump_file, " assign %d(cost=%d)",
+ reload_regno, reload_cost);
+ assign_temporarily (reload_regno, reload_hard_regno);
+ cost += reload_cost;
It looks like registers that can be reallocated make hard_regno more
expensive (specifically by reload_cost), but registers that can't be
reallocated contribute no cost. Is that right? Seemed a little odd,
so maybe worth a comment.
Reload cost is a negative cost; its magnitude is based on the pseudo's
frequency.
Ah! Missed that, sorry.
Post by Vladimir Makarov
I added the comment. The better the hard register is, the more
negative the cost.
Thanks.
Post by Vladimir Makarov
Post by Richard Sandiford
Also, AIUI find_hard_regno_for is trying to allocate the register for
reload_regno on the basis that reload_regno has the same conflicts as
the current regno, and so it's only an approximation. Is that right?
Might be worth a comment if so (not least to explain why we don't commit
to this allocation if we end up choosing hard_regno).
Sorry, I did not understand what you are asking. We try to find the
best pseudos to spill, so as to assign as many reload pseudos as
possible (on a cost basis). Pseudos for which find_hard_regno_for finds
non-spilled hard regs are ignored, as they can be assigned without
spilling.
Yeah, sorry, ignore this. I think I'd taken a break here and then
forgotten that find_hard_regno_for works out the conflicts set for
the register itself. I think I'd assumed when writing it that
find_hard_regno_for used some state that had been precalculated
before the main "for (i = 0; i < rclass_size; i++)" loop.
Post by Vladimir Makarov
Post by Richard Sandiford
+ /* Restore the live hard reg pseudo info for spilled pseudos. */
+ EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
+ update_lives (spill_regno, false);
I couldn't tell why this was outside the "hard_regno >= 0" condition.
Do we really change these registers even if find_hard_regno_for fails?
First we spill some pseudos and then call find_hard_regno_for. So we
should restore the pre-spill state regardless of whether
find_hard_regno_for succeeds.
Yeah, ignore this too :-)
Post by Vladimir Makarov
Post by Richard Sandiford
+ /* Spill: */
+ EXECUTE_IF_SET_IN_BITMAP (&best_spill_pseudos_bitmap, 0, spill_regno, bi)
Very minor, but I think it'd be worth asserting that best_hard_regno >= 0
before this loop.
Unfortunately, in very rare cases, best_hard_regno can be < 0. That is
why we have two iterations for the assignment of reload pseudos (see the
comment for the 2nd iteration of reload pseudo assignments).
I've added a comment saying that the function can return a negative
value.
OK, thanks.
Post by Vladimir Makarov
Post by Richard Sandiford
The only general comment I have so far is that it's sometimes
difficult to follow which types of pseudos are being included
or excluded by a comparison with lra_constraint_new_regno_start.
Sometimes the comments talk about "reload pseudos", but other
similar checks imply that the registers could be inheritance
pseudos or split pseudos as well. Some thin inline wrappers
might help here.
Inheritance, split and reload pseudos created since the last constraint
pass have regno >= lra_constraint_new_regno_start.
Inheritance and split pseudos created on any pass are in the
corresponding bitmaps.
Inheritance and split pseudos created since the last constraint pass
also have restore_regno >= 0 until the split or inheritance
transformations are done.
OK. What prompted this was that some comments refer specifically to
"reload pseudo" whereas the accompanying code simply checks against
lra_constraint_new_regno_start. It then wasn't obvious whether the code
really did just include "reload pseudos" in what I thought was the
strict sense -- e.g. because no other type of LRA-created pseudo could
occur in that context, so there was no point checking anything else --
or whether the code was actually handling inheritance and split pseudos too.

Maybe it would help to have a term to refer all four of:

- reload pseudos
- optional reload pseudos
- inheritance pseudos
- split pseudos

although I won't suggest one because I'm useless at naming things.
Post by Vladimir Makarov
Post by Richard Sandiford
+ /* We can use inheritance pseudos in original insns
+ (not reload ones). */
+ if (regno < lra_constraint_new_regno_start
+ || bitmap_bit_p (&lra_inheritance_pseudos, regno)
+ || reg_renumber[regno] < 0)
+ continue;
+ sorted_pseudos[nfails++] = regno;
+ if (lra_dump_file != NULL)
+ fprintf (lra_dump_file,
+ " Spill reload r%d(hr=%d, freq=%d)\n",
+ regno, reg_renumber[regno],
+ lra_reg_info[regno].freq);
Same comment about types of pseudo as above. (I.e. the code checks for
inheritance pseudos, but not split pseudos.)
I modified the comment to
/* A reload pseudo did not get a hard register on the
first iteration because of the conflict with
another reload pseudos in the same insn. So we
consider only reload pseudos assigned to hard
registers. We shall exclude inheritance pseudos as
they can occur in original insns (not reload ones).
We can omit the check for split pseudos because
they occur only in move insns containing non-reload
pseudos. */
I hope it explains the code.
Yes, thanks.
Post by Vladimir Makarov
Post by Richard Sandiford
+ for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+ if (((i < lra_constraint_new_regno_start
+ && ! bitmap_bit_p (&do_not_assign_nonreload_pseudos, i))
+ || (bitmap_bit_p (&lra_inheritance_pseudos, i)
+ && lra_reg_info[i].restore_regno >= 0)
+ || (bitmap_bit_p (&lra_split_pseudos, i)
+ && lra_reg_info[i].restore_regno >= 0)
+ || bitmap_bit_p (&lra_optional_reload_pseudos, i))
+ && reg_renumber[i] < 0 && lra_reg_info[i].nrefs != 0
+ && regno_allocno_class_array[i] != NO_REGS)
+ sorted_pseudos[n++] = i;
+ bitmap_clear (&do_not_assign_nonreload_pseudos);
where we test very similar things inline, and then clear
do_not_assign_nonreload_pseudos. Do we need d_n_a_n_p at all?
No, the code is right. We still need d_n_a_n_p because we cannot
easily calculate from what pseudo a given pseudo was inherited or split.
It is different from the loop where the new inheritance and split
pseudos themselves are checked (they are different from their origins,
which are marked in d_n_a_n_p).
Yeah, sorry about that. I misread the indices.

Richard
Richard Sandiford
2012-10-10 15:31:44 UTC
Permalink
Hi Vlad,

Here's a review of the first part of ira-constraints.c. It's a big file,
and this is a long reply, so I'll send further comments another day in
another message.
+/* This file contains code for 3 passes: constraint pass,
+ inheritance/split pass, and pass for undoing failed inheritance and
+ split.
+
+ The major goal of constraint pass is to transform RTL to satisfy
+ o choosing insn alternatives;
+ o generating *reload insns* (or reloads in brief) and *reload
+ pseudos* which will got necessary hard registers later;
s/got/get/
+ o substituting pseudo equivalences (if it is done once, is done
+ everywhere) and removes insns initializing used equivalent
+ substitution.
Suggest:

o substituting pseudos with equivalent values and removing the
instructions that initialized those pseudos.
+ To speed the pass up we process only necessary insns (first time
+ all insns) and reuse of already chosen alternatives in some
+ cases.
Suggest:

On the first iteration of the pass we process every instruction and
choose an alternative for each one. On subsequent iterations we try
to avoid reprocessing instructions if we can be sure that the old
choice is still valid.
+ The inheritance/spilt pass is to transform code to achieve
+ ineheritance and live range splitting. It is done on backward
+ traverse of EBBs.
Typo: inheritance. "backward traversal".
+ The inheritance optimization goal is to reuse values in hard
+ registers. There is analogous optimization in old reload pass. The
+
+ reload_p1 <- p reload_p1 <- p
+ ... new_p <- reload_p1
+ ... => ...
+ reload_p2 <- p reload_p2 <- new_p
+
+ where p is spilled and not changed between the insns. Reload_p1 is
+ also called *original pseudo* and new_p is called *inheritance
+ pseudo*.
+
+ The subsequent assignment pass will try to assign the same (or
+ another if it is not possible) hard register to new_p as to
+ reload_p1 or reload_p2.
+
+ If it fails to assign a hard register, the opposite transformation
+ will restore the original code on (the pass called undoing
+ inheritance) because with spilled new_p the code would be much
+ worse. [...]
Maybe:

If the assignment pass fails to assign a hard register to new_p,
this file will undo the inheritance and restore the original code.
This is because implementing the above sequence with a spilled
new_p would make the code much worse.
+ Splitting (transformation) is also done in EBB scope on the same
+
+ r <- ... or ... <- r r <- ... or ... <- r
+ ... s <- r (new insn -- save)
+ ... =>
+ ... r <- s (new insn -- restore)
+ ... <- r ... <- r
+
+ The *split pseudo* s is assigned to the hard register of the
+ original pseudo or hard register r.
+
+ o In EBBs with high register pressure for global pseudos (living
+ in at least 2 BBs) and assigned to hard registers when there
+ are more one reloads needing the hard registers;
+ o for pseudos needing save/restore code around calls.
+
+ If the split pseudo still has the same hard register as the
+ original pseudo after the subsequent assignment pass, the opposite
+ transformation is done on the same pass for undoing inheritance. */
AIUI spill_for can spill split pseudos. I think the comment should say
what happens then. If I understand the code correctly, we keep the
split if "r" is a hard register or was assigned a hard register.
We undo it if "r" was not assigned a hard register. Is that right?
+/* Array whose element is (MEM:MODE BASE_REG) corresponding to the
+ mode (index) and where BASE_REG is a base hard register for given
+ memory mode. */
+static rtx indirect_mem[MAX_MACHINE_MODE];
Maybe:

/* Index M is an rtx of the form (mem:M BASE_REG), where BASE_REG
is a sample hard register that is a valid address for mode M.
The memory refers to the generic address space. */
+/* Return class of hard regno of REGNO or if it is was not assigned to
+ a hard register, return its allocno class but only for reload
+ pseudos created on the current constraint pass. Otherwise, return
+ NO_REGS. */
+static enum reg_class
+get_reg_class (int regno)
Maybe:

/* If REGNO is a hard register or has been allocated a hard register,
return the class of that register. If REGNO is a pseudo created
by the current constraints pass, assume that it will be allocated
a hard register and return the class that that register will have.
(This assumption is optimistic when REGNO is an inheritance or
split pseudo.) Return NO_REGS otherwise. */

if that's accurate. I dropped the term "reload pseudo" because of
the general comment in my earlier reply about the use of "reload pseudo"
when the code seems to include inheritance and split pseudos too.
+/* Return true if REGNO in REG_MODE satisfies reg class constraint CL.
+ For new reload pseudos we should make more accurate class
+ *NEW_CLASS (we set up it if it is not NULL) to satisfy the
+ constraints. Otherwise, set up NEW_CLASS to NO_REGS. */
+static bool
+in_class_p (int regno, enum machine_mode reg_mode,
+ enum reg_class cl, enum reg_class *new_class)
Same comment here, since it uses get_reg_class. I.e. for registers >=
new_regno_start, we're really testing whether the first allocatable
register in REGNO's allocno class satisfies CL.

Also, the only caller that doesn't directly pass REGNO and REG_MODE is process_addr_reg:
+ if (new_class != NULL)
+ *new_class = NO_REGS;
+ if (regno < FIRST_PSEUDO_REGISTER)
+ return TEST_HARD_REG_BIT (reg_class_contents[cl], regno);
+ rclass = get_reg_class (regno);
+ final_regno = regno = REGNO (reg);
+ if (regno < FIRST_PSEUDO_REGISTER)
+ {
+ rtx final_reg = reg;
+ rtx *final_loc = &final_reg;
+
+ lra_eliminate_reg_if_possible (final_loc);
+ final_regno = REGNO (*final_loc);
+ }
I.e. process_addr_reg applies eliminations before testing whereas
in_class_p doesn't. I couldn't really tell why the two were different.
Since the idea is that we use elimination source registers to represent
their targets, shouldn't in_class_p eliminate too?

With that difference removed, in_class_p could take the rtx instead
of a (REGNO, MODE) pair. It could then pass that rtx directly to
lra_eliminate_reg_if_possible. I think this would lead to a cleaner
interface and make things more regular.

Then the comment for in_class_p could be:

/* Return true if X satisfies (or will satisfy) reg class constraint CL.
If X is a pseudo created by this constraints pass, assume that it will
be allocated a hard register from its allocno class, but allow that
class to be narrowed to CL if it is currently a superset of CL.

If NEW_CLASS is nonnull, set *NEW_CLASS to the new allocno class
of REGNO (X), or NO_REGS if no change in its class was needed. */

That's a change in the meaning of NEW_CLASS, but seems easier for callers to handle. E.g. we could then change:
+ common_class = ira_reg_class_subset[rclass][cl];
+ if (new_class != NULL)
+ *new_class = common_class;
to:

common_class = ira_reg_class_subset[rclass][cl];
if (new_class != NULL && rclass != common_class)
*new_class = common_class;
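To make the intent of that tweak concrete, here is a tiny standalone model of it. Register classes are represented as bitmasks of hard registers and the ira_reg_class_subset lookup is approximated by intersection -- both are assumptions of this sketch, not GCC's real representation:

```c
#include <assert.h>

/* Classes are modelled as bitmasks of hard registers; the real
   ira_reg_class_subset table is approximated here by intersection.  */
typedef unsigned reg_class_mask;

#define NO_REGS_MASK 0u  /* stands in for NO_REGS */

/* Intersect RCLASS with CL.  Record the result in *NEW_CLASS only
   when it actually narrows RCLASS, mirroring the suggested
   "rclass != common_class" guard.  */
static reg_class_mask
narrow_class (reg_class_mask rclass, reg_class_mask cl,
              reg_class_mask *new_class)
{
  reg_class_mask common = rclass & cl;
  if (new_class && common != rclass)
    *new_class = common;
  return common;
}
```

With this guard, *NEW_CLASS is only written when the caller genuinely needs to change the pseudo's class, so "no change needed" and "narrow to CL" become distinguishable.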
+ if (regno < new_regno_start
+ /* Do not make more accurate class from reloads generated. They
+ are mostly moves with a lot of constraints. Making more
+ accurate class may results in very narrow class and
+ impossibility of find registers for several reloads of one
+ insn. */
Maybe:

/* Do not allow the constraints for reload instructions to
influence the classes of new pseudos. These reloads are
typically moves that have many alternatives, and restricting
reload pseudos for one alternative may lead to situations
where other reload pseudos are no longer allocatable. */
+ || INSN_UID (curr_insn) >= new_insn_uid_start)
+ return ((regno >= new_regno_start && rclass == ALL_REGS)
+ || (rclass != NO_REGS && ira_class_subset_p[rclass][cl]
+ && ! hard_reg_set_subset_p (reg_class_contents[cl],
+ lra_no_alloc_regs)));
Why the ALL_REGS special case? I think it deserves a comment.
+/* Return the defined and profitable equiv substitution of reg X, return
+ X otherwise. */
Maybe:

/* If we have decided to substitute X with another value, return that value,
otherwise return X. */
+/* Change class of pseudo REGNO to NEW_CLASS. Print info about it
+ using TITLE. Output a new line if NL_P. */
+static void
+change_class (int regno, enum reg_class new_class,
+ const char *title, bool nl_p)
+{
+ if (lra_dump_file != NULL)
+ fprintf (lra_dump_file, "%s to class %s for r%d",
+ title, reg_class_names[new_class], regno);
+ setup_reg_classes (regno, new_class, NO_REGS, new_class);
+ if (lra_dump_file != NULL && nl_p)
+ fprintf (lra_dump_file, "\n");
+}
I think either this or setup_reg_classes should have an assert
that REGNO is >= FIRST_PSEUDO_REGISTER. This matters more now
because a lot of LRA deals with hard and pseudo registers
side-by-side.
+/* Create a new pseudo using MODE, RCLASS, ORIGINAL, TITLE or reuse
+ already created input reload pseudo (only if TYPE is not OP_OUT).
+ The result pseudo is returned through RESULT_REG. Return TRUE if
+ we created a new pseudo, FALSE if we reused the already created
+ input reload pseudo. */
Maybe:

/* Store in *RESULT_REG a register for reloading ORIGINAL, which has
mode MODE. TYPE specifies the direction of the reload -- either OP_IN
or OP_OUT -- and RCLASS specifies the class of hard register required.

Try to reuse existing input reloads where possible. Return true if
*RESULT_REG is a new register, false if it is an existing one.
Use TITLE to describe new registers for debug purposes. */

although I admit that's a bit convoluted...
+ for (i = 0; i < curr_insn_input_reloads_num; i++)
+ if (rtx_equal_p (curr_insn_input_reloads[i].input, original))
+ break;
+ if (i >= curr_insn_input_reloads_num
+ || ! in_class_p (REGNO (curr_insn_input_reloads[i].reg),
+ GET_MODE (curr_insn_input_reloads[i].reg),
+ rclass, &new_class))
+ {
+ res_p = true;
+ *result_reg = lra_create_new_reg (mode, original, rclass, title);
+ }
+ else
+ {
+ lra_assert (! side_effects_p (original));
+ res_p = false;
+ *result_reg = curr_insn_input_reloads[i].reg;
+ regno = REGNO (*result_reg);
+ if (lra_dump_file != NULL)
+ {
+ fprintf (lra_dump_file, " Reuse r%d for reload ", regno);
+ print_value_slim (lra_dump_file, original, 1);
+ }
+ if (rclass != new_class)
+ change_class (regno, new_class, ", change", false);
+ if (lra_dump_file != NULL)
+ fprintf (lra_dump_file, "\n");
+ }
+ lra_assert (curr_insn_input_reloads_num < LRA_MAX_INSN_RELOADS);
+ curr_insn_input_reloads[curr_insn_input_reloads_num].input = original;
+ curr_insn_input_reloads[curr_insn_input_reloads_num++].reg = *result_reg;
+ return res_p;
It probably doesn't matter in practice, but I think this would
be better as:

for (i = 0; i < curr_insn_input_reloads_num; i++)
if (rtx_equal_p (curr_insn_input_reloads[i].input, original)
&& in_class_p (curr_insn_input_reloads[i].reg, rclass, &new_class))
{
...reuse case..
return false;
}
...new case...
return true;

which also copes with the unlikely case that the same input is used
three times, and that the third use requires the same class as the
second.
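A standalone sketch of that restructured loop, checking value and class compatibility together inside the loop. Strings stand in for input rtxes and plain ints for register classes; the real tests would be rtx_equal_p plus in_class_p with class narrowing:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One record per input reload created for the current insn.  */
struct reload { const char *input; int rclass; int reg; };

static struct reload reloads[8];
static int n_reloads;
static int next_reg = 100;

/* Return the reload register for INPUT in class RCLASS, reusing an
   earlier reload only when both the value and the class match.
   *NEW_P says whether a new register was created.  */
static int
get_reload_reg (const char *input, int rclass, bool *new_p)
{
  for (int i = 0; i < n_reloads; i++)
    if (strcmp (reloads[i].input, input) == 0
        && reloads[i].rclass == rclass)
      {
        *new_p = false;
        return reloads[i].reg;
      }
  reloads[n_reloads].input = input;
  reloads[n_reloads].rclass = rclass;
  reloads[n_reloads].reg = next_reg++;
  *new_p = true;
  return reloads[n_reloads++].reg;
}
```

Because the loop keeps scanning past a value match with an incompatible class, a third use of the same input can still reuse the second reload -- the break-then-test version stops at the first match and misses it.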
+/* The page contains code to extract memory address parts. */
+
+/* Info about base and index regs of an address. In some rare cases,
+ base/index register can be actually memory. In this case we will
+ reload it. */
+struct address
+{
+ rtx *base_reg_loc; /* NULL if there is no a base register. */
+ rtx *base_reg_loc2; /* Second location of {post/pre}_modify, NULL
+ otherwise. */
+ rtx *index_reg_loc; /* NULL if there is no an index register. */
+ rtx *index_loc; /* location of index reg * scale or index_reg_loc
+ otherwise. */
+ rtx *disp_loc; /* NULL if there is no a displacement. */
+ /* Defined if base_reg_loc is not NULL. */
+ enum rtx_code base_outer_code, index_code;
+ /* True if the base register is modified in the address, for
+ example, in PRE_INC. */
+ bool base_modify_p;
+};
Comments should be consistently above the fields rather than to the right.
+/* Process address part in space AS (or all address if TOP_P) with
+ location *LOC to extract address characteristics.
+
+ If CONTEXT_P is false, we are looking at the base part of an
+ address, otherwise we are looking at the index part.
+
+ MODE is the mode of the memory reference; OUTER_CODE and INDEX_CODE
+ give the context that the rtx appears in; MODIFY_P if *LOC is
+ modified. */
+static void
+extract_loc_address_regs (bool top_p, enum machine_mode mode, addr_space_t as,
+ rtx *loc, bool context_p, enum rtx_code outer_code,
+ enum rtx_code index_code,
+ bool modify_p, struct address *ad)
+{
+ rtx x = *loc;
+ enum rtx_code code = GET_CODE (x);
+ bool base_ok_p;
+
+ switch (code)
+ {
+ if (! context_p)
+ ad->disp_loc = loc;
This looks a bit odd. I assume it's trying to avoid treating MULT
scale factors as displacements, but I thought whether something was
a displacement or not depended on whether it is involved (possibly
indirectly) in a sum with the base. Seems like it'd be better
to check for that directly.
+ /* If this machine only allows one register per address, it
+ must be in the first operand. */
+ if (MAX_REGS_PER_ADDRESS == 1 || code == LO_SUM)
+ {
+ extract_loc_address_regs (false, mode, as, arg0_loc, false, code,
+ code1, modify_p, ad);
+ ad->disp_loc = arg1_loc;
+ }
+ /* If index and base registers are the same on this machine,
+ just record registers in any non-constant operands. We
+ assume here, as well as in the tests below, that all
+ addresses are in canonical form. */
+ else if (INDEX_REG_CLASS
+ == base_reg_class (VOIDmode, as, PLUS, SCRATCH)
+ && code0 != PLUS && code0 != MULT)
+ {
+ extract_loc_address_regs (false, mode, as, arg0_loc, false, PLUS,
+ code1, modify_p, ad);
+ if (! CONSTANT_P (arg1))
+ extract_loc_address_regs (false, mode, as, arg1_loc, true, PLUS,
+ code0, modify_p, ad);
+ else
+ ad->disp_loc = arg1_loc;
+ }
+
+ /* If the second operand is a constant integer, it doesn't
+ change what class the first operand must be. */
+ else if (code1 == CONST_INT || code1 == CONST_DOUBLE)
+ {
+ ad->disp_loc = arg1_loc;
+ extract_loc_address_regs (false, mode, as, arg0_loc, context_p,
+ PLUS, code1, modify_p, ad);
+ }
+ /* If the second operand is a symbolic constant, the first
+ operand must be an index register but only if this part is
+ all the address. */
+ else if (code1 == SYMBOL_REF || code1 == CONST || code1 == LABEL_REF)
+ {
+ ad->disp_loc = arg1_loc;
+ extract_loc_address_regs (false, mode, as, arg0_loc,
+ top_p ? true : context_p, PLUS, code1,
+ modify_p, ad);
+ }
What's the reason for the distinction between the last two, which AIUI
doesn't exist in reload? I'm not sure the:

top_p ? true : context_p

condition is safe: some targets use aligning addresses like
(and X (const_int -ALIGN)), but that shouldn't really affect whether
a register in X is treated as a base or an index.
+ /* If both operands are registers but one is already a hard
+ register of index or reg-base class, give the other the
+ class that the hard register is not. */
+ else if (code0 == REG && code1 == REG
+ && REGNO (arg0) < FIRST_PSEUDO_REGISTER
+ && ((base_ok_p
+ = ok_for_base_p_nonstrict (arg0, mode, as, PLUS, REG))
+ || ok_for_index_p_nonstrict (arg0)))
+ {
+ extract_loc_address_regs (false, mode, as, arg0_loc, ! base_ok_p,
+ PLUS, REG, modify_p, ad);
+ extract_loc_address_regs (false, mode, as, arg1_loc, base_ok_p,
+ PLUS, REG, modify_p, ad);
+ }
+ else if (code0 == REG && code1 == REG
+ && REGNO (arg1) < FIRST_PSEUDO_REGISTER
+ && ((base_ok_p
+ = ok_for_base_p_nonstrict (arg1, mode, as, PLUS, REG))
+ || ok_for_index_p_nonstrict (arg1)))
+ {
+ extract_loc_address_regs (false, mode, as, arg0_loc, base_ok_p,
+ PLUS, REG, modify_p, ad);
+ extract_loc_address_regs (false, mode, as, arg1_loc, ! base_ok_p,
+ PLUS, REG, modify_p, ad);
+ }
+ /* If one operand is known to be a pointer, it must be the
+ base with the other operand the index. Likewise if the
+ other operand is a MULT. */
+ else if ((code0 == REG && REG_POINTER (arg0)) || code1 == MULT)
+ {
+ extract_loc_address_regs (false, mode, as, arg0_loc, false, PLUS,
+ code1, modify_p, ad);
+ if (code1 == MULT)
+ ad->index_loc = arg1_loc;
+ extract_loc_address_regs (false, mode, as, arg1_loc, true, PLUS,
+ code0, modify_p, ad);
+ }
+ else if ((code1 == REG && REG_POINTER (arg1)) || code0 == MULT)
+ {
+ extract_loc_address_regs (false, mode, as, arg0_loc, true, PLUS,
+ code1, modify_p, ad);
+ if (code0 == MULT)
+ ad->index_loc = arg0_loc;
+ extract_loc_address_regs (false, mode, as, arg1_loc, false, PLUS,
+ code0, modify_p, ad);
+ }
Some targets care about the choice between index and base for
correctness reasons (PA IIRC) or for performance (some ppc targets IIRC),
so I'm not sure whether it's safe to give REG_POINTER such a low priority.
+ {
+ const char *fmt = GET_RTX_FORMAT (code);
+ int i;
+
+ if (GET_RTX_LENGTH (code) != 1
+ || fmt[0] != 'e' || GET_CODE (XEXP (x, 0)) != UNSPEC)
+ {
+ for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+ if (fmt[i] == 'e')
+ extract_loc_address_regs (false, mode, as, &XEXP (x, i),
+ context_p, code, SCRATCH,
+ modify_p, ad);
+ break;
+ }
+ /* fall through for case UNARY_OP (UNSPEC ...) */
+ }
+
+ if (ad->disp_loc == NULL)
+ ad->disp_loc = loc;
+ else if (ad->base_reg_loc == NULL)
+ {
+ ad->base_reg_loc = loc;
+ ad->base_outer_code = outer_code;
+ ad->index_code = index_code;
+ ad->base_modify_p = modify_p;
+ }
+ else
+ {
+ lra_assert (ad->index_reg_loc == NULL);
+ ad->index_reg_loc = loc;
+ }
+ break;
+
+ }
Which targets use a bare UNSPEC as a displacement? I thought a
displacement had to be a link-time constant, in which case it should
satisfy CONSTANT_P. For UNSPECs, that means wrapping it in a CONST.

I'm just a bit worried that the UNSPEC handling is sensitive to the
order that subrtxes are processed (unlike PLUS, which goes to some
trouble to work out what's what). It could be especially confusing
because the default case processes operands in reverse order while
PLUS processes them in forward order.

Also, which cases require the special UNARY_OP (UNSPEC ...) fallthrough?
Probably deserves a comment.

AIUI the base_reg_loc, index_reg_loc and disp_loc fields aren't just
recording where reloads of a particular class need to go (obviously
in the case of disp_loc, which isn't reloaded at all). The fields
have semantic value too. I.e. we use them to work out the value
of at least part of the address.

In that case it seems dangerous to look through general rtxes
in the way that the default case above does. Maybe just making
sure that DISP_LOC is involved in a sum with the base would be
enough, but another idea was:

----------------------------------------------------------------
I know of three ways of "mutating" (for want of a better word)
an address:

1. (and X (const_int -ALIGN)), to align
2. a subreg
3. a unary operator (such as truncation or extension)

So maybe we could:

a. remove outer mutations (using a helper function)
b. handle LO_SUM, PRE_*, POST_*: as now
c. otherwise treat the address of the sum of one, two or three pieces.
c1. Peel mutations of all pieces.
c2. Classify the pieces into base, index and displacement.
This would be similar to the jousting code above, but hopefully
easier because all three rtxes are to hand. E.g. we could
do the base vs. index thing in a similar way to
commutative_operand_precedence.
c3. Record which pieces were mutated (e.g. using something like the
index_loc vs. index_reg_loc distinction in the current code)

That should be general enough for current targets, but if it isn't,
we could generalise it further when we know what generalisation is needed.

That's still going to be a fair amount of code, but hopefully not more,
and we might have more confidence at each stage what each value is.
And it avoids the risk of treating "mutated" addresses as "unmutated" ones.
----------------------------------------------------------------

Just an idea though. Probably not for 4.8, although I might try it
if I find time.

It would be nice to sort out the disp_loc thing for 4.8 though.
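A toy model of step c2 above: given the two non-constant pieces of an address sum, decide which is the base and which the index via a precedence scheme in the spirit of commutative_operand_precedence. The piece kinds here are illustrative stand-ins; real code would inspect rtx codes, REG_POINTER and target hooks instead:

```c
#include <assert.h>

/* Kinds of address pieces after peeling mutations.  */
enum piece { PIECE_MULT, PIECE_REG, PIECE_POINTER_REG };

/* Higher values are more likely to be the base: a scaled MULT can
   only ever be an index, while a REG_POINTER register is the
   preferred base.  */
static int
base_precedence (enum piece p)
{
  switch (p)
    {
    case PIECE_POINTER_REG: return 2;
    case PIECE_REG: return 1;
    case PIECE_MULT: default: return 0;
    }
}

/* Return 0 if piece A should be the base, 1 if piece B should.  */
static int
choose_base (enum piece a, enum piece b)
{
  return base_precedence (a) >= base_precedence (b) ? 0 : 1;
}
```

Having all pieces to hand makes the decision a single comparison rather than the cascade of pairwise cases in the current PLUS handling.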
+/* Extract address characteristics in address with location *LOC in
+ space AS. Return them in AD. Parameter OUTER_CODE for MEM should
+ be MEM. Parameter OUTER_CODE for 'p' constraint should be ADDRESS
+ and MEM_MODE should be VOIDmode. */
Maybe:

/* Describe address *LOC in AD. There are two cases:

- *LOC is the address in a (mem ...). In this case OUTER_CODE is MEM
and AS is the mem's address space.

- *LOC is matched to an address constraint such as 'p'. In this case
OUTER_CODE is ADDRESS and AS is ADDR_SPACE_GENERIC. */
+/* Return start register offset of hard register REGNO in MODE. */
+int
+lra_constraint_offset (int regno, enum machine_mode mode)
+{
+ lra_assert (regno < FIRST_PSEUDO_REGISTER);
+ /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+ multiple hard register group of scalar integer registers, so that
+ for example (reg:DI 0) and (reg:SI 1) will be considered the same
+ register. */
+ if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (mode) > UNITS_PER_WORD
+ && SCALAR_INT_MODE_P (mode))
+ return hard_regno_nregs[regno][mode] - 1;
+ return 0;
+}
Maybe the head comment could be:

/* Return the offset from REGNO of the least significant register
in (reg:MODE REGNO).

This function is used to tell whether two registers satisfy
a matching constraint. (reg:MODE1 REGNO1) matches (reg:MODE2 REGNO2) if:

REGNO1 + lra_constraint_offset (REGNO1, MODE1)
== REGNO2 + lra_constraint_offset (REGNO2, MODE2) */

(and remove the inner comment).
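As a sanity check of that matching rule, here is a standalone model assuming a 32-bit WORDS_BIG_ENDIAN target where DImode occupies two word registers and SImode one. The mode enum and register counts are illustrative, not GCC's:

```c
#include <stdbool.h>

enum mode { SImode, DImode };

/* Number of word registers occupied by a value of mode M.  */
static int nregs (enum mode m) { return m == DImode ? 2 : 1; }

/* Offset from REGNO of the least significant register of (reg:M R):
   the last register of the group on a big-endian-words target.  */
static int
constraint_offset (enum mode m)
{
  return nregs (m) - 1;  /* WORDS_BIG_ENDIAN && SCALAR_INT_MODE_P */
}

/* (reg:M1 R1) matches (reg:M2 R2) if their least significant
   registers coincide.  */
static bool
regs_match_p (int r1, enum mode m1, int r2, enum mode m2)
{
  return r1 + constraint_offset (m1) == r2 + constraint_offset (m2);
}
```

This reproduces the example in the quoted code: (reg:DI 0) and (reg:SI 1) are considered the same register, while (reg:DI 0) and (reg:SI 0) are not.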
+/* Like rtx_equal_p except that it allows a REG and a SUBREG to match
+ if they are the same hard reg, and has special hacks for
+ auto-increment and auto-decrement. This is specifically intended for
+ process_alt_operands to use in determining whether two operands
+ match. X is the operand whose number is the lower of the two.
+
+ It is supposed that X is the output operand and Y is the input
+ operand. */
+static bool
+operands_match_p (rtx x, rtx y, int y_hard_regno)
Need to say what Y_HARD_REGNO is.
+ switch (code)
+ {
+ val = operands_match_p (XVECEXP (x, i, j), XVECEXP (y, i, j),
+ y_hard_regno);
+ if (val == 0)
+ return false;
Why do we pass the old y_hard_regno even though Y has changed?
Some of the earlier code assumes that GET_MODE (y) is the mode
of y_hard_regno.
+/* Reload pseudos created for matched input and output reloads whose
+ mode are different. Such pseudos has a modified rules for finding
+ their living ranges, e.g. assigning to subreg of such pseudo means
+ changing all pseudo value. */
+bitmap_head lra_bound_pseudos;
Maybe:

/* Reload pseudos created for matched input and output reloads whose
modes are different. Such pseudos have different live ranges from
other pseudos; e.g. any assignment to a subreg of these pseudos
changes the whole pseudo's value. */

Although that said, couldn't emit_move_insn_1 (called by gen_move_insn)
split a multiword pseudo move into two word moves? Using the traditional
clobber technique sounds better than having special liveness rules.
+/* True if C is a non-empty register class that has too few registers
+ to be safely used as a reload target class. */
+#define SMALL_REGISTER_CLASS_P(C) \
+ (reg_class_size [(C)] == 1 \
+ || (reg_class_size [(C)] >= 1 && targetm.class_likely_spilled_p (C)))
Feels like ira_class_hard_regs_num might be better, but since the
current definition is traditional, that shouldn't be a merge requirement.
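For comparison, a sketch of the ira_class_hard_regs_num variant, which counts only allocatable registers rather than all members of the class. The class names and sample counts below are made up for illustration:

```c
#include <stdbool.h>

#define N_CLASSES 3
enum { NO_REGS, CC_REG, GENERAL_REGS };

/* Allocatable registers per class, as ira_class_hard_regs_num would
   provide; sample data only.  */
static const int class_hard_regs_num[N_CLASSES] = { 0, 1, 14 };

/* Stand-in for targetm.class_likely_spilled_p.  */
static bool
class_likely_spilled_p (int c)
{
  return c == CC_REG;
}

/* True if C is a non-empty class with too few allocatable registers
   to be safely used as a reload target class.  */
static bool
small_register_class_p (int c)
{
  return class_hard_regs_num[c] == 1
         || (class_hard_regs_num[c] >= 1 && class_likely_spilled_p (c));
}
```

The difference from the reg_class_size version is that a class whose only extra members are fixed (non-allocatable) registers is still treated as small.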
+/* Return mode of WHAT inside of WHERE whose mode of the context is
+ OUTER_MODE. If WHERE does not contain WHAT, return VOIDmode. */
+static enum machine_mode
+find_mode (rtx *where, enum machine_mode outer_mode, rtx *what)
+{
+ int i, j;
+ enum machine_mode mode;
+ rtx x;
+ const char *fmt;
+ enum rtx_code code;
+
+ if (where == what)
+ return outer_mode;
+ if (*where == NULL_RTX)
+ return VOIDmode;
+ x = *where;
+ code = GET_CODE (x);
+ outer_mode = GET_MODE (x);
+ fmt = GET_RTX_FORMAT (code);
+ for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+ {
+ if (fmt[i] == 'e')
+ {
+ if ((mode = find_mode (&XEXP (x, i), outer_mode, what)) != VOIDmode)
+ return mode;
+ }
+ else if (fmt[i] == 'E')
+ {
+ for (j = XVECLEN (x, i) - 1; j >= 0; j--)
+ if ((mode = find_mode (&XVECEXP (x, i, j), outer_mode, what))
+ != VOIDmode)
+ return mode;
+ }
+ }
+ return VOIDmode;
+}
+
+/* Return mode for operand NOP of the current insn. */
+static inline enum machine_mode
+get_op_mode (int nop)
+{
+ rtx *loc;
+ enum machine_mode mode;
+ bool md_first_p = asm_noperands (PATTERN (curr_insn)) < 0;
+
+ /* Take mode from the machine description first. */
+ if (md_first_p && (mode = curr_static_id->operand[nop].mode) != VOIDmode)
+ return mode;
+ loc = curr_id->operand_loc[nop];
+ /* Take mode from the operand second. */
+ mode = GET_MODE (*loc);
+ if (mode != VOIDmode)
+ return mode;
+ if (! md_first_p && (mode = curr_static_id->operand[nop].mode) != VOIDmode)
+ return mode;
+ /* Here is a very rare case. Take mode from the context. */
+ return find_mode (&PATTERN (curr_insn), VOIDmode, loc);
+}
This looks a lot more complicated than the reload version. Why is
it needed? In reload the conditions for address operands were:

/* Address operands are reloaded in their existing mode,
no matter what is specified in the machine description. */
operand_mode[i] = GET_MODE (recog_data.operand[i]);

/* If the address is a single CONST_INT pick address mode
instead otherwise we will later not know in which mode
the reload should be performed. */
if (operand_mode[i] == VOIDmode)
operand_mode[i] = Pmode;

which for LRA might look like:

/* The mode specified in the .md file for address operands
is the mode of the addressed value, not the address itself.
We therefore need to get the mode from the operand rtx.
If the operand has no mode, assume it was Pmode. */

For other operands, recog_data.operand_mode ought to be correct.

find_mode assumes that the mode of an operand is the same as the mode of
the outer rtx, which isn't true when the outer rtx is a subreg, mem,
or one of several unary operators.

This is one that I think would be best decided for 4.8.
+/* If REG is a reload pseudo, try to make its class satisfying CL. */
+static void
+narrow_reload_pseudo_class (rtx reg, enum reg_class cl)
+{
+ int regno;
+ enum reg_class rclass;
+
+ /* Do not make more accurate class from reloads generated. They are
+ mostly moves with a lot of constraints. Making more accurate
+ class may results in very narrow class and impossibility of find
+ registers for several reloads of one insn. */
+ if (INSN_UID (curr_insn) >= new_insn_uid_start)
+ return;
+ if (GET_CODE (reg) == SUBREG)
+ reg = SUBREG_REG (reg);
+ if (! REG_P (reg) || (regno = REGNO (reg)) < new_regno_start)
+ return;
+ rclass = get_reg_class (regno);
+ rclass = ira_reg_class_subset[rclass][cl];
+ if (rclass == NO_REGS)
+ return;
+ change_class (regno, rclass, " Change", true);
+}
There seems to be an overlap in functionality with in_class_p here.
Maybe:

{
enum reg_class rclass;

if (in_class_p (reg, cl, &rclass) && rclass != NO_REGS)
change_class (REGNO (reg), rclass, " Change", true);
}

(assuming the change in in_class_p interface suggested above).
This avoids duplicating subtleties like the handling of reloads.
+ /* We create pseudo for out rtx because we always should keep
+ registers with the same original regno have synchronized
+ value (it is not true for out register but it will be
+ corrected by the next insn).
I don't understand this comment, sorry.
+ Do not reuse register because of the following situation: a <-
+ a op b, and b should be the same as a. */
This part is very convincing though :-) Maybe:

We cannot reuse the current output register because we might
have a situation like "a <- a op b", where the constraints force
the second input operand ("b") to match the output operand ("a").
"b" must then be copied into a new register so that it doesn't
clobber the current value of "a". */

We should probably keep the other reason too, of course.
+ /* Don't generate inheritance for the new register because we
+ can not use the same hard register for the corresponding
+ inheritance pseudo for input reload. */
+ bitmap_set_bit (&lra_matched_pseudos, REGNO (new_in_reg));
Suggest dropping this comment, since we don't do any inheritance here.
The comment above lra_matched_pseudos already says the same thing.
+ /* In and out operand can be got from transformations before
+ processing constraints. So the pseudos might have inaccurate
+ class and we should make their classes more accurate. */
+ narrow_reload_pseudo_class (in_rtx, goal_class);
+ narrow_reload_pseudo_class (out_rtx, goal_class);
I don't understand this, sorry. Does "transformations" mean inheritance
and reload splitting? So the registers we're changing here are inheritance
and split pseudos rather than reload pseudos created for this instruction?
If so, it sounds on face value like it conflicts with the comment quoted
above about not allowing reload instructions to the narrow the class
of pseudos. Might be worth saying why that's OK here but not there.

Also, I'm not sure I understand why it helps. Is it just trying
to encourage the pseudos to form a chain in lra-assigns.c?

E.g. MIPS16 has several instructions that require matched MIPS16 registers.
However, moves between MIPS16 registers and general registers are as cheap
as moves between two MIPS16 registers, so narrowing the reloaded values
from GENERAL_REGS to M16_REGS (if that ever happens) wouldn't necessarily
be a good thing.

Not saying this is wrong, just that it might need more commentary
to justify it.
+ for (i = 0; (in = ins[i]) >= 0; i++)
+ *curr_id->operand_loc[in] = new_in_reg;
The code assumes that all input operands have the same mode.
Probably worth asserting that here (or maybe further up; I don't mind),
just to make the assumption explicit.
+/* Return final hard regno (plus offset) which will be after
+ elimination. We do this for matching constraints because the final
+ hard regno could have a different class. */
+static int
+get_final_hard_regno (int hard_regno, int offset)
+{
+ if (hard_regno < 0)
+ return hard_regno;
+ hard_regno += offset;
+ return lra_get_elimation_hard_regno (hard_regno);
Why apply the offset before rather than after elimination?
AIUI, AVR's eliminable registers span more than one hard register,
and the elimination is based off the first.

Also, all uses but one of lra_get_hard_regno_and_offset follow
the pattern:

lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
/* The real hard regno of the operand after the allocation. */
x_hard_regno = get_final_hard_regno (x_hard_regno, offset);

so couldn't lra_get_hard_regno_and_offset just return the final
hard register, including elimination? Then it could apply the
elimination on the original rtx.

FWIW, the exception I mentioned was operands_match_p:

lra_get_hard_regno_and_offset (x, &i, &offset);
if (i < 0)
goto slow;
i += offset;

but I'm not sure why this is the only caller that would want
to ignore elimination.
+/* Return register class of OP. That is a class of the hard register
+ itself (if OP is a hard register), or class of assigned hard
+ register to the pseudo (if OP is pseudo), or allocno class of
+ unassigned pseudo (if OP is reload pseudo). Return NO_REGS
+ otherwise. */
+static enum reg_class
+get_op_class (rtx op)
+{
+ int regno, hard_regno, offset;
+
+ if (! REG_P (op))
+ return NO_REGS;
+ lra_get_hard_regno_and_offset (op, &hard_regno, &offset);
+ if (hard_regno >= 0)
+ {
+ hard_regno = get_final_hard_regno (hard_regno, offset);
+ return REGNO_REG_CLASS (hard_regno);
+ }
+ /* Reload pseudo will get a hard register in any case. */
+ if ((regno = REGNO (op)) >= new_regno_start)
+ return lra_get_allocno_class (regno);
+ return NO_REGS;
+}
This looks like it ought to be the same as:

return REG_P (x) ? get_reg_class (REGNO (x)) : NO_REGS;

If not, I think there should be a comment explaining the difference.
If so, the comment might be:

/* If OP is a register, return the class of the register as per
get_reg_class, otherwise return NO_REGS. */
+/* Return generated insn mem_pseudo:=val if TO_P or val:=mem_pseudo
+ otherwise. If modes of MEM_PSEUDO and VAL are different, use
+ SUBREG for VAL to make them equal. Assign CODE to the insn if it
+ is not recognized.
+
+ We can not use emit_move_insn in some cases because of bad used
+ practice in some machine descriptions. For example, power can use
+ only base+index addressing for altivec move insns and it is checked
+ by insn predicates. On the other hand, the same move insn
+ constraints permit to use offsetable memory for moving vector mode
+ values from/to general registers to/from memory. emit_move_insn
+ will transform offsetable address to one with base+index addressing
+ which is rejected by the constraint. So sometimes we need to
+ generate move insn without modifications and assign the code
+ explicitly because the generated move can be unrecognizable because
+ of the predicates. */
Ick :-) Can't we just say that fixing this is part of the process
of porting a target to LRA? It'd be nice not to carry hacks like
this around in shiny new code.
+static rtx
+emit_spill_move (bool to_p, rtx mem_pseudo, rtx val, int code)
+{
+ rtx insn, after;
+
+ start_sequence ();
+ if (GET_MODE (mem_pseudo) != GET_MODE (val))
+ val = gen_rtx_SUBREG (GET_MODE (mem_pseudo),
+ GET_CODE (val) == SUBREG ? SUBREG_REG (val) : val,
+ 0);
+ if (to_p)
+ insn = gen_move_insn (mem_pseudo, val);
+ else
+ insn = gen_move_insn (val, mem_pseudo);
+ if (recog_memoized (insn) < 0)
+ INSN_CODE (insn) = code;
+ emit_insn (insn);
+ after = get_insns ();
+ end_sequence ();
+ return after;
+}
this recog_memoized code effectively assumes that INSN is just one
instruction, whereas emit_move_insn_1 or the backend move expanders
could split moves into several instructions.

Since the code-forcing stuff is for rs6000, I think we could drop it
from 4.8 whatever happens.

The sequence stuff above looks redundant; we should just return
INSN directly.
+ /* Quick check on the right move insn which does not need
+ reloads. */
+ if ((dclass = get_op_class (dest)) != NO_REGS
+ && (sclass = get_op_class (src)) != NO_REGS
+ && targetm.register_move_cost (GET_MODE (src), dclass, sclass) == 2)
+ return true;
Suggest:

/* The backend guarantees that register moves of cost 2 never need
reloads. */
+ if (GET_CODE (dest) == SUBREG)
+ dreg = SUBREG_REG (dest);
+ if (GET_CODE (src) == SUBREG)
+ sreg = SUBREG_REG (src);
+ if (! REG_P (dreg) || ! REG_P (sreg))
+ return false;
+ sclass = dclass = NO_REGS;
+ dr = get_equiv_substitution (dreg);
+ if (dr != dreg)
+ dreg = copy_rtx (dr);
I think this copy is too early, because there are quite a few
conditions under which we never emit anything with DREG in it.
+ if (REG_P (dreg))
+ dclass = get_reg_class (REGNO (dreg));
+ if (dclass == ALL_REGS)
+ /* We don't know what class we will use -- let it be figured out
+ by curr_insn_transform function. Remember some targets does not
+ work with such classes through their implementation of
+ machine-dependent hooks like secondary_memory_needed. */
+ return false;
Don't really understand this comment, sorry.
+ sreg_mode = GET_MODE (sreg);
+ sr = get_equiv_substitution (sreg);
+ if (sr != sreg)
+ sreg = copy_rtx (sr);
This copy also seems too early.
+ sri.prev_sri = NULL;
+ sri.icode = CODE_FOR_nothing;
+ sri.extra_cost = 0;
+ secondary_class = NO_REGS;
+ /* Set up hard register for a reload pseudo for hook
+ secondary_reload because some targets just ignore unassigned
+ pseudos in the hook. */
+ if (dclass != NO_REGS
+ && REG_P (dreg) && (dregno = REGNO (dreg)) >= new_regno_start
+ && lra_get_regno_hard_regno (dregno) < 0)
+ reg_renumber[dregno] = ira_class_hard_regs[dclass][0];
+ else
+ dregno = -1;
+ if (sclass != NO_REGS
+ && REG_P (sreg) && (sregno = REGNO (sreg)) >= new_regno_start
+ && lra_get_regno_hard_regno (sregno) < 0)
+ reg_renumber[sregno] = ira_class_hard_regs[sclass][0];
+ else
+ sregno = -1;
I think this would be correct without the:

&& REG_P (dreg) && (dregno = REGNO (dreg)) >= new_regno_start

condition (and similarly for the src case). IMO it would be clearer too:
the decision about when to return a register class for unallocated pseudos
is then localised to get_reg_class rather than copied both here and there.
+ if (sclass != NO_REGS)
+ secondary_class
+ = (enum reg_class) targetm.secondary_reload (false, dest,
+ (reg_class_t) sclass,
+ GET_MODE (src), &sri);
+ if (sclass == NO_REGS
+ || ((secondary_class != NO_REGS || sri.icode != CODE_FOR_nothing)
+ && dclass != NO_REGS))
+ secondary_class
+ = (enum reg_class) targetm.secondary_reload (true, sreg,
+ (reg_class_t) dclass,
+ sreg_mode, &sri);
Hmm, so for register<-register moves, if the target says that the output
reload needs a secondary reload, we try again with an input reload and
hope for a different answer?

If the target is giving different answers in that case, I think that's
a bug in the target, and we should assert instead. The problem is that
if we allow the answers to be different, and both answers involve
secondary reloads, we have no way of knowing whether the second answer
is easier to implement or "more correct" than the first. An assert
avoids that, and puts the onus on the target to sort itself out.

Again, as long as x86 is free of this bug for 4.8, I don't think the
merge needs to cater for broken targets.
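To make the suggested contract concrete, here is a minimal stand-in sketch
(the enum values and function name are hypothetical, not GCC's): for a
reg<-reg move the target must give the same secondary reload answer whether
it is asked about the input side or the output side, and asserting that puts
the onus on the target rather than on LRA.

```c
#include <assert.h>

/* Hypothetical stand-in for GCC's reg_class; not the real enum. */
enum reg_class_stub { NO_REGS_S, GENERAL_REGS_S, SSE_REGS_S };

/* For a register<-register move, query both sides and insist the
   target answers consistently, instead of hoping the second query
   gives an easier-to-implement reload. */
static enum reg_class_stub
resolve_secondary_class (enum reg_class_stub out_answer,
                         enum reg_class_stub in_answer)
{
  /* A target answering differently for the two sides is a bug.  */
  assert (out_answer == in_answer);
  return out_answer;
}
```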
+ *change_p = true;
I think this is the point at which substituted values should be copied.
+ new_reg = NULL_RTX;
+ if (secondary_class != NO_REGS)
+ new_reg = lra_create_new_reg_with_unique_value (sreg_mode, NULL_RTX,
+ secondary_class,
+ "secondary");
+ start_sequence ();
+ if (sri.icode == CODE_FOR_nothing)
+ lra_emit_move (new_reg, sreg);
+ else
+ {
+ enum reg_class scratch_class;
+
+ scratch_class = (reg_class_from_constraints
+ (insn_data[sri.icode].operand[2].constraint));
+ scratch_reg = (lra_create_new_reg_with_unique_value
+ (insn_data[sri.icode].operand[2].mode, NULL_RTX,
+ scratch_class, "scratch"));
+ emit_insn (GEN_FCN (sri.icode) (new_reg != NULL_RTX ? new_reg : dest,
+ sreg, scratch_reg));
+ }
+ before = get_insns ();
+ end_sequence ();
+ lra_process_new_insns (curr_insn, before, NULL_RTX, "Inserting the move");
AIUI, the constraints pass will look at these instructions and generate
what are now known as tertiary reloads where needed (by calling this
function again). Is that right? Very nice if so: that's far more
natural than the current reload handling.
+/* The chosen reg classes which should be used for the corresponding
+ operands. */
+static enum reg_class goal_alt[MAX_RECOG_OPERANDS];
+/* True if the operand should be the same as another operand and the
+ another operand does not need a reload. */
s/and the another/and that other/
+/* Make reloads for addr register in LOC which should be of class CL,
+ add reloads to list BEFORE. If AFTER is not null emit insns to set
+ the register up after the insn (it is case of inc/dec, modify). */
Maybe:

/* Arrange for address element *LOC to be a register of class CL.
Add any input reloads to list BEFORE. AFTER is nonnull if *LOC is an
automodified value; handle that case by adding the required output
reloads to list AFTER. Return true if the RTL was changed. */
+static bool
+process_addr_reg (rtx *loc, rtx *before, rtx *after, enum reg_class cl)
+{
+ int regno, final_regno;
+ enum reg_class rclass, new_class;
+ rtx reg = *loc;
+ rtx new_reg;
+ enum machine_mode mode;
+ bool change_p = false;
+
+ mode = GET_MODE (reg);
+ if (! REG_P (reg))
+ {
+ /* Always reload memory in an address even if the target
+ supports such addresses. */
+ new_reg
+ = lra_create_new_reg_with_unique_value (mode, reg, cl, "address");
+ push_to_sequence (*before);
+ lra_emit_move (new_reg, reg);
+ *before = get_insns ();
+ end_sequence ();
+ *loc = new_reg;
+ if (after != NULL)
+ {
+ start_sequence ();
+ lra_emit_move (reg, new_reg);
+ emit_insn (*after);
+ *after = get_insns ();
+ end_sequence ();
+ }
+ return true;
Why does this need to be a special case, rather than reusing the code below?
+ }
+ lra_assert (REG_P (reg));
+ final_regno = regno = REGNO (reg);
+ if (regno < FIRST_PSEUDO_REGISTER)
+ {
+ rtx final_reg = reg;
+ rtx *final_loc = &final_reg;
+
+ lra_eliminate_reg_if_possible (final_loc);
+ final_regno = REGNO (*final_loc);
+ }
+ /* Use class of hard register after elimination because some targets
+ do not recognize virtual hard registers as valid address
+ registers. */
+ rclass = get_reg_class (final_regno);
+ if ((*loc = get_equiv_substitution (reg)) != reg)
+ {
+ if (lra_dump_file != NULL)
+ {
+ fprintf (lra_dump_file,
+ "Changing pseudo %d in address of insn %u on equiv ",
+ REGNO (reg), INSN_UID (curr_insn));
+ print_value_slim (lra_dump_file, *loc, 1);
+ fprintf (lra_dump_file, "\n");
+ }
+ *loc = copy_rtx (*loc);
+ change_p = true;
+ }
+ if (*loc != reg || ! in_class_p (final_regno, GET_MODE (reg), cl, &new_class))
+ {
+ reg = *loc;
+ if (get_reload_reg (OP_IN, mode, reg, cl, "address", &new_reg))
+ {
+ push_to_sequence (*before);
+ lra_emit_move (new_reg, reg);
+ *before = get_insns ();
+ end_sequence ();
+ }
+ *loc = new_reg;
+ if (after != NULL)
+ {
+ start_sequence ();
+ lra_emit_move (reg, new_reg);
+ emit_insn (*after);
+ *after = get_insns ();
+ end_sequence ();
+ }
+ change_p = true;
+ }
+ else if (new_class != NO_REGS && rclass != new_class)
+ change_class (regno, new_class, " Change", true);
+ return change_p;
+}
E.g.:

if ((*loc = get_equiv_substitution (reg)) != reg)
...as above...
if (*loc != reg || !in_class_p (reg, cl, &new_class))
...as above...
else if (new_class != NO_REGS && rclass != new_class)
change_class (regno, new_class, " Change", true);
return change_p;

(assuming change to in_class_p suggested earlier) seems like it
covers the same cases.

Also, should OP_IN be OP_INOUT for after != NULL, so that we don't try
to reuse existing reload pseudos? That would mean changing get_reload_reg
(both commentary and code) to handle OP_INOUT like OP_OUT.

Or maybe just pass OP_OUT instead of OP_INOUT, if that's more consistent.
I don't mind which.
+ /* Force reload if this is a constant or PLUS or if there may be a
+ problem accessing OPERAND in the outer mode. */
Suggest:

/* Force a reload of the SUBREG_REG if this ...
+ /* Constant mode ???? */
+ enum op_type type = curr_static_id->operand[nop].type;
Not sure what the comment means, but REG is still the original SUBREG_REG,
so there shouldn't be any risk of a VOIDmode constant. (subreg (const_int))
is invalid rtl.
+/* Return TRUE if *LOC refers for a hard register from SET. */
+static bool
+uses_hard_regs_p (rtx *loc, HARD_REG_SET set)
+{
Nothing seems to care about the address, so we could pass the rtx
rather than a pointer to it.
+ int i, j, x_hard_regno, offset;
+ enum machine_mode mode;
+ rtx x;
+ const char *fmt;
+ enum rtx_code code;
+
+ if (*loc == NULL_RTX)
+ return false;
+ x = *loc;
+ code = GET_CODE (x);
+ mode = GET_MODE (x);
+ if (code == SUBREG)
+ {
+ loc = &SUBREG_REG (x);
+ x = SUBREG_REG (x);
+ code = GET_CODE (x);
+ if (GET_MODE_SIZE (GET_MODE (x)) > GET_MODE_SIZE (mode))
+ mode = GET_MODE (x);
+ }
+
+ if (REG_P (x))
+ {
+ lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
+ /* The real hard regno of the operand after the allocation. */
+ x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
+ return (x_hard_regno >= 0
+ && lra_hard_reg_set_intersection_p (x_hard_regno, mode, set));
With the subreg mode handling above, this looks little-endian specific.
+ if (MEM_P (x))
+ {
+ struct address ad;
+ enum machine_mode mode = GET_MODE (x);
+ rtx *addr_loc = &XEXP (x, 0);
+
+ extract_address_regs (mode, MEM_ADDR_SPACE (x), addr_loc, MEM, &ad);
+ if (ad.base_reg_loc != NULL)
+ {
+ if (uses_hard_regs_p (ad.base_reg_loc, set))
+ return true;
+ }
+ if (ad.index_reg_loc != NULL)
+ {
+ if (uses_hard_regs_p (ad.index_reg_loc, set))
+ return true;
+ }
+ }
This MEM handling is independent of the subreg handling, so perhaps the
paradoxical subreg case should be handled separately, using simplify_subreg_regno.
+/* Major function to choose the current insn alternative and what
+ operands should be reloaded and how. If ONLY_ALTERNATIVE is not
+ negative we should consider only this alternative. Return false if
+ we can not choose the alternative or find how to reload the
+ operands. */
+static bool
+process_alt_operands (int only_alternative)
+{
+ bool ok_p = false;
+ int nop, small_class_operands_num, overall, nalt, offset;
+ int n_alternatives = curr_static_id->n_alternatives;
+ int n_operands = curr_static_id->n_operands;
+ /* LOSERS counts those that don't fit this alternative and would
+ require loading. */
+ int losers;
s/those/the operands/
+ /* Calculate some data common for all alternatives to speed up the
+ function. */
+ for (nop = 0; nop < n_operands; nop++)
+ {
+ op = no_subreg_operand[nop] = *curr_id->operand_loc[nop];
+ lra_get_hard_regno_and_offset (op, &hard_regno[nop], &offset);
+ /* The real hard regno of the operand after the allocation. */
+ hard_regno[nop] = get_final_hard_regno (hard_regno[nop], offset);
+
+ operand_reg[nop] = op;
+ biggest_mode[nop] = GET_MODE (operand_reg[nop]);
+ if (GET_CODE (operand_reg[nop]) == SUBREG)
+ {
+ operand_reg[nop] = SUBREG_REG (operand_reg[nop]);
+ if (GET_MODE_SIZE (biggest_mode[nop])
+ < GET_MODE_SIZE (GET_MODE (operand_reg[nop])))
+ biggest_mode[nop] = GET_MODE (operand_reg[nop]);
+ }
+ if (REG_P (operand_reg[nop]))
+ no_subreg_operand[nop] = operand_reg[nop];
+ else
+ operand_reg[nop] = NULL_RTX;
This looks odd: no_subreg_operand ends up being a subreg if the
SUBREG_REG wasn't a REG. Some more commentary might help.
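For what it's worth, the biggest_mode computation quoted above reduces to
taking the wider of the outer and inner modes when the operand is a subreg.
A toy model (sizes in bytes; names are mine, not GCC's):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the biggest_mode logic: for (subreg:outer (reg:inner))
   keep whichever mode is wider; a plain reg keeps its own mode.  A
   narrowing subreg has inner wider than outer; a paradoxical subreg
   has outer wider than inner, so the outer size already wins. */
static size_t
biggest_mode_size (size_t outer_size, size_t inner_size, int is_subreg)
{
  if (is_subreg && inner_size > outer_size)
    return inner_size;   /* narrowing subreg: inner register mode is wider */
  return outer_size;     /* plain reg, or paradoxical subreg */
}
```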
+ /* The constraints are made of several alternatives. Each operand's
+ constraint looks like foo,bar,... with commas separating the
+ alternatives. The first alternatives for all operands go
+ together, the second alternatives go together, etc.
+
+ First loop over alternatives. */
+ for (nalt = 0; nalt < n_alternatives; nalt++)
+ {
+ /* Loop over operands for one constraint alternative. */
+ if (
+#ifdef HAVE_ATTR_enabled
+ (curr_id->alternative_enabled_p != NULL
+ && ! curr_id->alternative_enabled_p[nalt])
+ ||
+#endif
+ (only_alternative >= 0 && nalt != only_alternative))
+ continue;
Probably more natural if split into two "if (...) continue;"s. E.g.:

#ifdef HAVE_ATTR_enabled
if (curr_id->alternative_enabled_p != NULL
&& !curr_id->alternative_enabled_p[nalt])
continue;
#endif
if (only_alternative >= 0 && nalt != only_alternative)
continue;
+ for (nop = 0; nop < n_operands; nop++)
+ {
+ const char *p;
+ char *end;
+ int len, c, m, i, opalt_num, this_alternative_matches;
+ bool win, did_match, offmemok, early_clobber_p;
+ /* false => this operand can be reloaded somehow for this
+ alternative. */
+ bool badop;
+ /* false => this operand can be reloaded if the alternative
+ allows regs. */
+ bool winreg;
+ /* False if a constant forced into memory would be OK for
+ this operand. */
+ bool constmemok;
+ enum reg_class this_alternative, this_costly_alternative;
+ HARD_REG_SET this_alternative_set, this_costly_alternative_set;
+ bool this_alternative_match_win, this_alternative_win;
+ bool this_alternative_offmemok;
+ int invalidate_m;
+ enum machine_mode mode;
+
+ opalt_num = nalt * n_operands + nop;
+ if (curr_static_id->operand_alternative[opalt_num].anything_ok)
+ {
+ /* Fast track for no constraints at all. */
+ curr_alt[nop] = NO_REGS;
+ CLEAR_HARD_REG_SET (curr_alt_set[nop]);
+ curr_alt_win[nop] = true;
+ curr_alt_match_win[nop] = false;
+ curr_alt_offmemok[nop] = false;
+ curr_alt_matches[nop] = -1;
+ continue;
+ }
Given that this code is pretty complex, it might be clearer to remove
the intermediate "this_*" variables and assign directly to curr_alt_*. I.e.:

curr_alt[nop] = NO_REGS;
CLEAR_HARD_REG_SET (curr_alt_set[nop]);
curr_alt_win[nop] = false;
curr_alt_match_win[nop] = false;
curr_alt_offmemok[nop] = false;
curr_alt_matches[nop] = -1;

opalt_num = nalt * n_operands + nop;
if (curr_static_id->operand_alternative[opalt_num].anything_ok)
{
/* Fast track for no constraints at all. */
curr_alt_win[nop] = true;
continue;
}
+ /* We update set of possible hard regs besides its class
+ because reg class might be inaccurate. For example,
+ union of LO_REGS (l), HI_REGS(h), and STACK_REG(k) in ARM
+ is translated in HI_REGS because classes are merged by
+ pairs and there is no accurate intermediate class. */
This comment is worth keeping somewhere though, either here or above the
declaration of curr_alt_set.
+ /* We are supposed to match a previous operand.
+ If we do, we win if that one did. If we do
+ not, count both of the operands as losers.
+ (This is too conservative, since most of the
+ time only a single reload insn will be needed
+ to make the two operands win. As a result,
+ this alternative may be rejected when it is
+ actually desirable.) */
+ /* If it conflicts with others. */
Last line looks incomplete/misplaced.
+ match_p = false;
+ if (operands_match_p (*curr_id->operand_loc[nop],
+ *curr_id->operand_loc[m], m_hregno))
+ {
+ int i;
+
+ for (i = 0; i < early_clobbered_regs_num; i++)
+ if (early_clobbered_nops[i] == m)
+ break;
+ /* We should reject matching of an early
+ clobber operand if the matching operand is
+ not dying in the insn. */
+ if (i >= early_clobbered_regs_num
Why not simply use operand m's early_clobber field?
+ || operand_reg[nop] == NULL_RTX
+ || (find_regno_note (curr_insn, REG_DEAD,
+ REGNO (operand_reg[nop]))
+ != NULL_RTX))
+ match_p = true;
...although I don't really understand this condition. If the two
operands are the same value X, then X must die here whatever the
notes say. So I assume this is coping with a case where the operands
are different but still match. If so, could you give an example?

Matched earlyclobbers explicitly guarantee that the earlyclobber doesn't
apply to the matched input operand; the earlyclobber only applies to
other input operands. So I'd have expected it was those operands
that might need reloading rather than this one.

E.g. if X occurs three times, twice in a matched earlyclobber pair
and once as an independent operand, it's the latter operand that would
need reloading.
+ /* Operands don't match. */
+ /* Retroactively mark the operand we had to
+ match as a loser, if it wasn't already and
+ it wasn't matched to a register constraint
+ (e.g it might be matched by memory). */
+ if (curr_alt_win[m]
+ && (operand_reg[m] == NULL_RTX
+ || hard_regno[m] < 0))
+ {
+ losers++;
+ if (curr_alt[m] != NO_REGS)
+ reload_nregs
+ += (ira_reg_class_max_nregs[curr_alt[m]]
+ [GET_MODE (*curr_id->operand_loc[m])]);
+ }
+ invalidate_m = m;
+ if (curr_alt[m] == NO_REGS)
+ continue;
I found this a bit confusing. If the operands don't match and operand m
allows no registers, don't we have to reject this constraint outright?
E.g. something like:

/* Operands don't match. Both operands must
allow a reload register, otherwise we cannot
make them match. */
if (curr_alt[m] == NO_REGS)
break;
/* Retroactively mark the operand we had to
match as a loser, if it wasn't already and
it wasn't matched to a register constraint
(e.g it might be matched by memory). */
if (curr_alt_win[m]
&& (operand_reg[m] == NULL_RTX
|| hard_regno[m] < 0))
{
losers++;
reload_nregs
+= (ira_reg_class_max_nregs[curr_alt[m]]
[GET_MODE (*curr_id->operand_loc[m])]);
}
+ /* This can be fixed with reloads if the operand
+ we are supposed to match can be fixed with
+ reloads. */
+ badop = false;
+ this_alternative = curr_alt[m];
+ COPY_HARD_REG_SET (this_alternative_set, curr_alt_set[m]);
+
+ /* If we have to reload this operand and some
+ previous operand also had to match the same
+ thing as this operand, we don't know how to do
+ that. So reject this alternative. */
+ if (! did_match)
+ for (i = 0; i < nop; i++)
+ if (curr_alt_matches[i] == this_alternative_matches)
+ badop = true;
OK, so this is another case of cruft from reload that I'd like to remove,
but do you know of any reason why this shouldn't be:

/* If we have to reload this operand and some previous
operand also had to match the same thing as this
operand, we don't know how to do that. */
if (!match_p || !curr_alt_win[m])
{
for (i = 0; i < nop; i++)
if (curr_alt_matches[i] == m)
break;
if (i < nop)
break;
}
else
---> did_match = true;

/* This can be fixed with reloads if the operand
we are supposed to match can be fixed with
reloads. */
---> this_alternative_matches = m;
---> invalidate_m = m;
badop = false;
this_alternative = curr_alt[m];
COPY_HARD_REG_SET (this_alternative_set, curr_alt_set[m]);

(although a helper function might be better than the awkward breaking)?
Note that the ---> lines have moved from further up.

This is the only time in the switch statement where one constraint
in a constraint string uses "badop = true" to reject the operand.
I.e. for "<something else>0" we should normally not reject the
alternative based solely on the "0", since the "<something else>"
might have been satisfied instead. And we should only record matching
information if we've decided the match can be implemented by reloads
(the last block).
+ /* We prefer no matching alternatives because
+ it gives more freedom in RA. */
+ if (operand_reg[nop] == NULL_RTX
+ || (find_regno_note (curr_insn, REG_DEAD,
+ REGNO (operand_reg[nop]))
+ == NULL_RTX))
+ reject += 2;
Looks like a new reject rule. I agree it makes conceptual sense though,
so I'm all for it.
+ || (REG_P (op)
+ && REGNO (op) >= FIRST_PSEUDO_REGISTER
+ && in_mem_p (REGNO (op))))
This pattern occurs several times. I think a helper function like
spilled_reg_p (op) would help. See 'g' below.
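The suggested helper might look something like the sketch below. The struct,
the regno threshold and the in-memory flag are stand-ins for GCC's rtx,
FIRST_PSEUDO_REGISTER and in_mem_p; only the shape of the predicate is the
point.

```c
#include <assert.h>
#include <stdbool.h>

#define FIRST_PSEUDO_REGISTER_S 76  /* stand-in value, not target-accurate */

/* Stand-in for an rtx: just enough fields for the predicate. */
struct rtx_stub
{
  bool is_reg;           /* REG_P */
  unsigned int regno;    /* REGNO */
  bool lives_in_memory;  /* in_mem_p (REGNO) */
};

/* True for a pseudo register that currently lives in a stack slot. */
static bool
spilled_reg_p (const struct rtx_stub *op)
{
  return (op->is_reg
          && op->regno >= FIRST_PSEUDO_REGISTER_S
          && op->lives_in_memory);
}
```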
+ if (CONST_INT_P (op)
+ || (GET_CODE (op) == CONST_DOUBLE && mode == VOIDmode))
...
+ if (CONST_INT_P (op)
+ || (GET_CODE (op) == CONST_DOUBLE && mode == VOIDmode))
After recent changes these should be CONST_SCALAR_INT_P (op)
+ if (/* A PLUS is never a valid operand, but LRA can
+ make it from a register when eliminating
+ registers. */
+ GET_CODE (op) != PLUS
+ && (! CONSTANT_P (op) || ! flag_pic
+ || LEGITIMATE_PIC_OPERAND_P (op))
+ && (! REG_P (op)
+ || (REGNO (op) >= FIRST_PSEUDO_REGISTER
+ && in_mem_p (REGNO (op)))))
Rather than the special case for PLUS, I think this would be better as:

if (MEM_P (op)
|| spilled_reg_p (op)
|| general_constant_p (op))
win = true;

where general_constant_p abstracts away:

(CONSTANT_P (op)
&& (! flag_pic || LEGITIMATE_PIC_OPERAND_P (op)))

general_constant_p probably ought to go in a common file because several
places need this condition (including other parts of this switch statement).
+#ifdef EXTRA_CONSTRAINT_STR
+ if (EXTRA_MEMORY_CONSTRAINT (c, p))
+ {
+ if (EXTRA_CONSTRAINT_STR (op, c, p))
+ win = true;
+ /* For regno_equiv_mem_loc we have to
+ check. */
+ else if (REG_P (op)
+ && REGNO (op) >= FIRST_PSEUDO_REGISTER
+ && in_mem_p (REGNO (op)))
Looks like an old comment from an earlier iteration. There doesn't
seem to be a function called regno_equiv_mem_loc in the current patch.
But...
+ {
+ /* We could transform spilled memory
+ finally to indirect memory. */
+ if (EXTRA_CONSTRAINT_STR
+ (get_indirect_mem (mode), c, p))
+ win = true;
+ }
...is this check really needed? It's a documented requirement that memory
constraints accept any memory operand whose address has been reloaded into
a base register, so the (mem (base)) form should always be acceptable.
+ /* If we didn't already win, we can reload
+ constants via force_const_mem, and other
+ MEMs by reloading the address like for
+ 'o'. */
+ if (CONST_POOL_OK_P (mode, op) || MEM_P (op))
+ badop = false;
It seems a bit inconsistent to treat a spilled pseudo whose address
might well need reloading as a win, while not treating existing MEMs
whose addresses need reloading as a win.
+ if (EXTRA_CONSTRAINT_STR (op, c, p))
+ win = true;
+ else if (REG_P (op)
+ && REGNO (op) >= FIRST_PSEUDO_REGISTER
+ && in_mem_p (REGNO (op)))
+ {
+ /* We could transform spilled memory finally
+ to indirect memory. */
+ if (EXTRA_CONSTRAINT_STR (get_indirect_mem (mode),
+ c, p))
+ win = true;
+ }
I don't understand why there are two copies of this. I think we have
to trust the target's classification of constraints, so if the target
says that something isn't a memory constraint, we shouldn't check the
(mem (base)) case.
+ if (c != ' ' && c != '\t')
+ costly_p = c == '*';
I think there needs to be a comment somewhere saying how we handle this.
Being costly seems to contribute one reject point (i.e. a sixth of a '?')
compared to the normal case, which is very different from the current
reload behaviour. We should probably update the "*" documentation
in md.texi too.

Some targets use "*" constraints to make sure that floating-point
registers don't get used as spill space in purely integer code
(so that task switches don't pay the FPU save/restore penalty).
Would that still "work" with this definition? (FWIW, I think
using "*" is a bad way to achieve this feature, just asking.)
+ /* We simulate the behaviour of old reload here.
+ Although scratches need hard registers and it
+ might result in spilling other pseudos, no reload
+ insns are generated for the scratches. So it
+ might cost something but probably less than old
+ reload pass believes. */
+ if (lra_former_scratch_p (REGNO (operand_reg[nop])))
+ reject += LOSER_COST_FACTOR;
Yeah, this caused me no end of trouble when tweaking the MIPS
multiply-accumulate patterns. However, unlike the other bits of
cruft I've been complaining about, this is one where I can't think
of any alternative that makes more inherent sense (to me). So I agree
that leaving it as-is is the best approach for now.
+ /* If the operand is dying, has a matching constraint,
+ and satisfies constraints of the matched operand
+ which failed to satisfy the own constraints, we do
+ not need to generate a reload insn for this
+ operand. */
+ if (this_alternative_matches < 0
+ || curr_alt_win[this_alternative_matches]
+ || ! REG_P (op)
+ || find_regno_note (curr_insn, REG_DEAD,
+ REGNO (op)) == NULL_RTX
+ || ((hard_regno[nop] < 0
+ || ! in_hard_reg_set_p (this_alternative_set,
+ mode, hard_regno[nop]))
+ && (hard_regno[nop] >= 0
+ || ! in_class_p (REGNO (op), GET_MODE (op),
+ this_alternative, NULL))))
+ losers++;
I think this might be clearer as:

if (!(this_alternative_matches >= 0
&& !curr_alt_win[this_alternative_matches]
&& REG_P (op)
&& find_regno_note (curr_insn, REG_DEAD, REGNO (op))
&& (hard_regno[nop] >= 0
? in_hard_reg_set_p (this_alternative_set,
mode, hard_regno[nop])
: in_class_p (op, this_alternative, NULL))))
losers++;
+ if (operand_reg[nop] != NULL_RTX)
+ {
+ int last_reload = (lra_reg_info[ORIGINAL_REGNO
+ (operand_reg[nop])]
+ .last_reload);
+
+ if (last_reload > bb_reload_num)
+ reload_sum += last_reload;
+ else
+ reload_sum += bb_reload_num;
+/* Overall number reflecting distances of previous reloading the same
+ value. It is used to improve inheritance chances. */
+static int best_reload_sum;
The "distances" wording made me think of distance from the current
instruction. I see it's actually something else, effectively a sum of
instruction numbers.

I assumed the idea was to prefer registers that were reloaded more
recently (closer the current instruction). In that case I thought that,
for a distance-based best_reload_sum, smaller would be better,
while for an instruction-number-based best_reload_sum, larger would
be better. It looks like we use instruction-number based best_reload_sums
+ && (reload_nregs < best_reload_nregs
+ || (reload_nregs == best_reload_nregs
+ && best_reload_sum < reload_sum))))))))
Is that intentional?

Also, is this value meaningful for output reloads, which aren't really
going to be able to inherit a value as such? We seem to apply the cost
regardless of whether it's an input or an output, so probably deserves
a comment.

Same for matched input operands, which as you say elsewhere aren't
inherited.
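To make the instruction-number interpretation concrete, here is a toy model
of the quoted accumulation and tie-break (the names are mine, not LRA's):
larger sums mean the operands were last reloaded by later, i.e. closer,
instructions.

```c
#include <assert.h>

/* Accumulate the quoted reload_sum: each operand contributes its last
   reload's instruction number, clamped from below by the first reload
   number of the current basic block. */
static int
add_reload_sum (int reload_sum, int last_reload, int bb_reload_num)
{
  return reload_sum + (last_reload > bb_reload_num
                       ? last_reload : bb_reload_num);
}

/* The quoted tie-break: fewer reload registers wins; on a tie, the
   larger (more recent) reload_sum wins. */
static int
better_reload_p (int nregs, int best_nregs, int sum, int best_sum)
{
  return nregs < best_nregs || (nregs == best_nregs && sum > best_sum);
}
```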
+ if (badop
+ /* Alternative loses if it has no regs for a reg
+ operand. */
+ || (REG_P (op) && no_regs_p
+ && this_alternative_matches < 0))
+ goto fail;
+ if (this_alternative_matches < 0
+ && no_regs_p && ! this_alternative_offmemok && ! constmemok)
+ goto fail;
+ /* If this operand could be handled with a reg, and some reg
+ is allowed, then this operand can be handled. */
+ if (winreg && this_alternative != NO_REGS)
+ badop = false;
which I think belongs in the same else statement. At least after the
matching changes suggested above, I think all three can be replaced by:

/* If this operand accepts a register, and if the register class
has at least one allocatable register, then this operand
can be reloaded. */
if (winreg && !no_regs_p)
badop = false;

if (badop)
goto fail;

which IMO belongs after the "no_regs_p" assignment. badop should never
be false if we have no way of reloading the value.
+ if (! no_regs_p)
+ reload_nregs
+ += ira_reg_class_max_nregs[this_alternative][mode];
I wasn't sure why we counted this even in the "const_to_mem && constmemok"
and "MEM_P (op) && offmemok" cases from:

/* We prefer to reload pseudos over reloading other
things, since such reloads may be able to be
eliminated later. So bump REJECT in other cases.
Don't do this in the case where we are forcing a
constant into memory and it will then win since we
don't want to have a different alternative match
then. */
if (! (REG_P (op)
&& REGNO (op) >= FIRST_PSEUDO_REGISTER)
&& ! (const_to_mem && constmemok)
/* We can reload the address instead of memory (so
do not punish it). It is preferable to do to
avoid cycling in some cases. */
&& ! (MEM_P (op) && offmemok))
reject += 2;
+ if (early_clobber_p)
+ reject++;
+ /* ??? Should we update the cost because early clobber
+ register reloads or it is a rare thing to be worth to do
+ it. */
+ overall = losers * LOSER_COST_FACTOR + reject;
Could you expand on the comment a bit?
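For reference, a toy model of the scoring scheme discussed in this thread.
The weights are inferred from the review text (LOSER_COST_FACTOR appears to
be 6, so '?' contributes six reject points and '*' one); treat the exact
values as assumptions rather than a statement about the committed code.

```c
#include <assert.h>

#define LOSER_COST_FACTOR_S 6  /* assumed weight; '?' = 6 points, '*' = 1 */

/* Overall cost of an alternative: each operand needing a reload costs
   LOSER_COST_FACTOR, plus the accumulated reject points. */
static int
alt_overall (int losers, int reject)
{
  return losers * LOSER_COST_FACTOR_S + reject;
}

/* The quoted early rejection: an alternative loses to the current best
   unless it needs no reloads while the best needed some. */
static int
prune_alternative_p (int best_losers, int best_overall,
                     int losers, int overall)
{
  return (best_losers == 0 || losers != 0) && best_overall < overall;
}
```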
+ if ((best_losers == 0 || losers != 0) && best_overall < overall)
+ goto fail;
+
+ curr_alt[nop] = this_alternative;
+ COPY_HARD_REG_SET (curr_alt_set[nop], this_alternative_set);
+ curr_alt_win[nop] = this_alternative_win;
+ curr_alt_match_win[nop] = this_alternative_match_win;
+ curr_alt_offmemok[nop] = this_alternative_offmemok;
+ curr_alt_matches[nop] = this_alternative_matches;
+
+ if (invalidate_m >= 0 && ! this_alternative_win)
+ curr_alt_win[invalidate_m] = false;
BTW, after the matching changes above, I don't think we need both
"invalidate_m" and "this_alternative_matches".
+ for (j = hard_regno_nregs[clobbered_hard_regno][biggest_mode[i]] - 1;
+ j >= 0;
+ j--)
+ SET_HARD_REG_BIT (temp_set, clobbered_hard_regno + j);
add_to_hard_reg_set.
+ else if (curr_alt_matches[j] == i && curr_alt_match_win[j])
+ {
+ /* This is a trick. Such operands don't conflict and
+ don't need a reload. But it is hard to transfer
+ this information to the assignment pass which
+ spills one operand without this info. We avoid the
+ conflict by forcing to use the same pseudo for the
+ operands hoping that the pseudo gets the same hard
+ regno as the operands and the reloads are gone. */
+ if (*curr_id->operand_loc[i] != *curr_id->operand_loc[j])
...
+ /* See the comment for the previous case. */
+ if (*curr_id->operand_loc[i] != *curr_id->operand_loc[j])
What are these last two if statements for? I wasn't sure how two operands
could have the same address.

Not saying they're wrong, but I think a comment would be good.
+ small_class_operands_num = 0;
+ for (nop = 0; nop < n_operands; nop++)
+ /* If this alternative can be made to work by reloading, and
+ it needs less reloading than the others checked so far,
+ record it as the chosen goal for reloading. */
+ small_class_operands_num
+ += SMALL_REGISTER_CLASS_P (curr_alt[nop]) ? 1 : 0;
Misplaced comment; I think it belongs after this line.

Richard
Richard Sandiford
2012-10-10 19:50:04 UTC
Permalink
Sorry, reading back in different surroundings made me notice a couple
of mistakes in my earlier comments:
if ((*loc = get_equiv_substitution (reg)) != reg)
...as above...
if (*loc != reg || !in_class_p (reg, cl, &new_class))
...as above...
else if (new_class != NO_REGS && rclass != new_class)
change_class (regno, new_class, " Change", true);
return change_p;
(assuming change to in_class_p suggested earlier) seems like it
covers the same cases.
...but that same in_class_p change means that the "rclass != new_class"
condition isn't needed. I.e.

if ((*loc = get_equiv_substitution (reg)) != reg)
...as above...
if (*loc != reg || !in_class_p (reg, cl, &new_class))
...as above...
else if (new_class != NO_REGS)
change_class (regno, new_class, " Change", true);
return change_p;
+ if (operand_reg[nop] != NULL_RTX)
+ {
+ int last_reload = (lra_reg_info[ORIGINAL_REGNO
+ (operand_reg[nop])]
+ .last_reload);
+
+ if (last_reload > bb_reload_num)
+ reload_sum += last_reload;
+ else
+ reload_sum += bb_reload_num;
+/* Overall number reflecting distances of previous reloading the same
+ value. It is used to improve inheritance chances. */
+static int best_reload_sum;
The "distances" wording made me think of distance from the current
instruction. I see it's actually something else, effectively a sum of
instruction numbers.
I assumed the idea was to prefer registers that were reloaded more
recently (closer the current instruction). In that case I thought that,
for a distance-based best_reload_sum, smaller would be better,
while for an instruction-number-based best_reload_sum, larger would
be better. It looks like we use instruction-number based best_reload_sums
+ && (reload_nregs < best_reload_nregs
+ || (reload_nregs == best_reload_nregs
+ && best_reload_sum < reload_sum))))))))
Is that intentional?
Clearly I can't read. The code _does_ prefer higher numbers. I still
think "distance" is a bit misleading though. :-)
Also, is this value meaningful for output reloads, which aren't really
going to be able to inherit a value as such? We seem to apply the cost
regardless of whether it's an input or an output, so probably deserves
a comment.
Same for matched input operands, which as you say elsewhere aren't
inherited.
still applies.

Richard
Vladimir Makarov
2012-10-14 17:36:40 UTC
Permalink
Post by Richard Sandiford
Hi Vlad,
Here's a review of the first part of ira-constraints.c. It's a big file,
and this is a long reply, so I'll send further comments another day in
another message.
+/* This file contains code for 3 passes: constraint pass,
+ inheritance/split pass, and pass for undoing failed inheritance and
+ split.
+
+ The major goal of constraint pass is to transform RTL to satisfy
+ o choosing insn alternatives;
+ o generating *reload insns* (or reloads in brief) and *reload
+ pseudos* which will got necessary hard registers later;
s/got/get/
Fixed.
Post by Richard Sandiford
+ o substituting pseudo equivalences (if it is done once, is done
+ everywhere) and removes insns initializing used equivalent
+ substitution.
o substituting pseudos with equivalent values and removing the
instructions that initialized those pseudos.
Fixed.
Post by Richard Sandiford
+ To speed the pass up we process only necessary insns (first time
+ all insns) and reuse of already chosen alternatives in some
+ cases.
On the first iteration of the pass we process every instruction and
choose an alternative for each one. On subsequent iterations we try
to avoid reprocessing instructions if we can be sure that the old
choice is still valid.
Fixed.
Post by Richard Sandiford
+ The inheritance/spilt pass is to transform code to achieve
+ ineheritance and live range splitting. It is done on backward
+ traverse of EBBs.
Typo: inheritance. "backward traversal".
Fixed.
Post by Richard Sandiford
+ The inheritance optimization goal is to reuse values in hard
+ registers. There is analogous optimization in old reload pass. The
+
+ reload_p1 <- p reload_p1 <- p
+ ... new_p <- reload_p1
+ ... => ...
+ reload_p2 <- p reload_p2 <- new_p
+
+ where p is spilled and not changed between the insns. Reload_p1 is
+ also called *original pseudo* and new_p is called *inheritance
+ pseudo*.
+
+ The subsequent assignment pass will try to assign the same (or
+ another if it is not possible) hard register to new_p as to
+ reload_p1 or reload_p2.
+
+ If it fails to assign a hard register, the opposite transformation
+ will restore the original code on (the pass called undoing
+ inheritance) because with spilled new_p the code would be much
+ worse. [...]
If the assignment pass fails to assign a hard register to new_p,
this file will undo the inheritance and restore the original code.
This is because implementing the above sequence with a spilled
new_p would make the code much worse.
Fixed.
Post by Richard Sandiford
+ Splitting (transformation) is also done in EBB scope on the same
+
+ r <- ... or ... <- r r <- ... or ... <- r
+ ... s <- r (new insn -- save)
+ ... =>
+ ... r <- s (new insn -- restore)
+ ... <- r ... <- r
+
+ The *split pseudo* s is assigned to the hard register of the
+ original pseudo or hard register r.
+
+ o In EBBs with high register pressure for global pseudos (living
+ in at least 2 BBs) and assigned to hard registers when there
+ are more one reloads needing the hard registers;
+ o for pseudos needing save/restore code around calls.
+
+ If the split pseudo still has the same hard register as the
+ original pseudo after the subsequent assignment pass, the opposite
+ transformation is done on the same pass for undoing inheritance. */
AIUI spill_for can spill split pseudos. I think the comment should say
what happens then. If I understand the code correctly, we keep the
split if "r" is a hard register or was assigned a hard register.
We undo it if "r" was not assigned a hard register. Is that right?
Yes. To be more precise, r and s are assigned the same hard reg
before the assignment pass. So if r is *spilled* (by the assignment pass)
or if r and s still have the same hard reg after the assignment pass, we
undo the transformation.

I added a comment.
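To make the condition concrete, here is a minimal stand-alone sketch of my reading of the explanation above. This is not GCC code: the function name and the convention that a hard regno of -1 means "spilled" are made up for illustration.

```c
#include <stdbool.h>

/* Sketch: when is a live-range split "r -> s" undone after the
   assignment pass?  A hard regno of -1 stands for "spilled"; both the
   name and the convention are illustrative, not GCC's.  */
static bool
undo_split_p (int r_hard_regno, int s_hard_regno)
{
  if (r_hard_regno < 0)
    return true;                        /* r was spilled: the split cannot help.  */
  return r_hard_regno == s_hard_regno;  /* same reg: save/restore is useless.  */
}
```

The split is kept only in the remaining case: r kept a hard register and it differs from the one s got.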
Post by Richard Sandiford
+/* Array whose element is (MEM:MODE BASE_REG) corresponding to the
+ mode (index) and where BASE_REG is a base hard register for given
+ memory mode. */
+static rtx indirect_mem[MAX_MACHINE_MODE];
/* Index M is an rtx of the form (mem:M BASE_REG), where BASE_REG
is a sample hard register that is a valid address for mode M.
The memory refers to the generic address space. */
Fixed.
Post by Richard Sandiford
+/* Return class of hard regno of REGNO or if it is was not assigned to
+ a hard register, return its allocno class but only for reload
+ pseudos created on the current constraint pass. Otherwise, return
+ NO_REGS. */
+static enum reg_class
+get_reg_class (int regno)
/* If REGNO is a hard register or has been allocated a hard register,
return the class of that register. If REGNO is a pseudo created
by the current constraints pass, assume that it will be allocated
a hard register and return the class that that register will have.
(This assumption is optimistic when REGNO is an inheritance or
split pseudo.) Return NO_REGS otherwise. */
I don't like

assume that it will be allocated
a hard register and return the class that that register will have

For example, I could read this as assigning ax to a pseudo of class
general_regs and then returning class a_reg instead of general_regs.

So I modified your comment a bit.
Post by Richard Sandiford
if that's accurate. I dropped the term "reload pseudo" because of
the general comment in my earlier reply about the use of "reload pseudo"
when the code seems to include inheritance and split pseudos too.
There are no inheritance or split pseudos yet; they are created after
the constraint pass.
So at this stage, regno >= new_regno_start means a reload pseudo.
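In other words, at this point in the pass a regno falls into one of three ranges. A tiny stand-alone model of that partition (the boundary values below are illustrative, not GCC's):

```c
enum regno_kind { HARD_REG, ORDINARY_PSEUDO, RELOAD_PSEUDO };

/* Model of the regno partition during the constraint pass: hard
   registers come first, then pre-existing pseudos, then pseudos
   created by the current pass.  Inheritance and split pseudos do not
   exist yet, so everything >= new_regno_start is a reload pseudo.  */
static enum regno_kind
classify_regno (int regno, int first_pseudo_register, int new_regno_start)
{
  if (regno < first_pseudo_register)
    return HARD_REG;
  if (regno < new_regno_start)
    return ORDINARY_PSEUDO;
  return RELOAD_PSEUDO;
}
```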
Post by Richard Sandiford
+/* Return true if REGNO in REG_MODE satisfies reg class constraint CL.
+ For new reload pseudos we should make more accurate class
+ *NEW_CLASS (we set up it if it is not NULL) to satisfy the
+ constraints. Otherwise, set up NEW_CLASS to NO_REGS. */
+static bool
+in_class_p (int regno, enum machine_mode reg_mode,
+ enum reg_class cl, enum reg_class *new_class)
Same comment here, since it uses get_reg_class. I.e. for registers >=
new_regno_start, we're really testing whether the first allocatable
register in REGNO's allocno class satisfies CL.
See my comment above.
Post by Richard Sandiford
Also, the only caller that doesn't directly pass REGNO and REG_MODE
+ if (new_class != NULL)
+ *new_class = NO_REGS;
+ if (regno < FIRST_PSEUDO_REGISTER)
+ return TEST_HARD_REG_BIT (reg_class_contents[cl], regno);
+ rclass = get_reg_class (regno);
+ final_regno = regno = REGNO (reg);
+ if (regno < FIRST_PSEUDO_REGISTER)
+ {
+ rtx final_reg = reg;
+ rtx *final_loc = &final_reg;
+
+ lra_eliminate_reg_if_possible (final_loc);
+ final_regno = REGNO (*final_loc);
+ }
I.e. process_addr_reg applies eliminations before testing whereas
in_class_p doesn't. I couldn't really tell why the two were different.
Since the idea is that we use elimination source registers to represent
their targets, shouldn't in_class_p eliminate too?
As I remember, that was a fix for a bug on some target which needed the
eliminated reg for legitimate address recognition. I did not see such
a problem for the constraints.
Post by Richard Sandiford
With that difference removed, in_class_p could take the rtx instead
of a (REGNO, MODE) pair. It could then pass that rtx directly to
lra_eliminate_reg_if_possible. I think this would lead to a cleaner
interface and make things more regular.
Thanks. I fixed it.
Post by Richard Sandiford
/* Return true if X satisfies (or will satisfy) reg class constraint CL.
If X is a pseudo created by this constraints pass, assume that it will
be allocated a hard register from its allocno class, but allow that
class to be narrowed to CL if it is currently a superset of CL.
If NEW_CLASS is nonnull, set *NEW_CLASS to the new allocno class
of REGNO (X), or NO_REGS if no change in its class was needed. */
Fixed. I just added 'X is a reload pseudo...'
Post by Richard Sandiford
That's a change in the meaning of NEW_CLASS, but seems easier for
+ common_class = ira_reg_class_subset[rclass][cl];
+ if (new_class != NULL)
+ *new_class = common_class;
common_class = ira_reg_class_subset[rclass][cl];
if (new_class != NULL && rclass != common_class)
*new_class = common_class;
This change results in infinite LRA looping on compilation of the first
libgcc file. Unfortunately I have no time to investigate it.
I'd like to say that most of this code is very sensitive to
changes. I see it a lot: you change something that looks obvious and a
target breaks.
I am going to investigate it when I have more time.
Post by Richard Sandiford
+ if (regno < new_regno_start
+ /* Do not make more accurate class from reloads generated. They
+ are mostly moves with a lot of constraints. Making more
+ accurate class may results in very narrow class and
+ impossibility of find registers for several reloads of one
+ insn. */
/* Do not allow the constraints for reload instructions to
influence the classes of new pseudos. These reloads are
typically moves that have many alternatives, and restricting
reload pseudos for one alternative may lead to situations
where other reload pseudos are no longer allocatable. */
Fixed.
Post by Richard Sandiford
+ || INSN_UID (curr_insn) >= new_insn_uid_start)
+ return ((regno >= new_regno_start && rclass == ALL_REGS)
+ || (rclass != NO_REGS && ira_class_subset_p[rclass][cl]
+ && ! hard_reg_set_subset_p (reg_class_contents[cl],
+ lra_no_alloc_regs)));
Why the ALL_REGS special case? I think it deserves a comment.
I added a comment.
Post by Richard Sandiford
+/* Return the defined and profitable equiv substitution of reg X, return
+ X otherwise. */
/* If we have decided to substitute X with another value, return that value,
otherwise return X. */
Fixed.
Post by Richard Sandiford
+/* Change class of pseudo REGNO to NEW_CLASS. Print info about it
+ using TITLE. Output a new line if NL_P. */
+static void
+change_class (int regno, enum reg_class new_class,
+ const char *title, bool nl_p)
+{
+ if (lra_dump_file != NULL)
+ fprintf (lra_dump_file, "%s to class %s for r%d",
+ title, reg_class_names[new_class], regno);
+ setup_reg_classes (regno, new_class, NO_REGS, new_class);
+ if (lra_dump_file != NULL && nl_p)
+ fprintf (lra_dump_file, "\n");
+}
I think either this or setup_reg_classes should have an assert
that REGNO is >= FIRST_PSEUDO_REGISTER. This matters more now
because a lot of LRA deals with hard and pseudo registers
side-by-side.
Ok. I added an assert.
Post by Richard Sandiford
+/* Create a new pseudo using MODE, RCLASS, ORIGINAL, TITLE or reuse
+ already created input reload pseudo (only if TYPE is not OP_OUT).
+ The result pseudo is returned through RESULT_REG. Return TRUE if
+ we created a new pseudo, FALSE if we reused the already created
+ input reload pseudo. */
/* Store in *RESULT_REG a register for reloading ORIGINAL, which has
mode MODE. TYPE specifies the direction of the reload -- either OP_IN
or OP_OUT -- and RCLASS specifies the class of hard register required.
It can be OP_INOUT too.
Post by Richard Sandiford
Try to reuse existing input reloads where possible. Return true if
*RESULT_REG is a new register, false if it is an existing one.
Use TITLE to describe new registers for debug purposes. */
although I admit that's a bit convoluted...
I combined the two comments into a better one.
Post by Richard Sandiford
+ for (i = 0; i < curr_insn_input_reloads_num; i++)
+ if (rtx_equal_p (curr_insn_input_reloads[i].input, original))
+ break;
+ if (i >= curr_insn_input_reloads_num
+ || ! in_class_p (REGNO (curr_insn_input_reloads[i].reg),
+ GET_MODE (curr_insn_input_reloads[i].reg),
+ rclass, &new_class))
+ {
+ res_p = true;
+ *result_reg = lra_create_new_reg (mode, original, rclass, title);
+ }
+ else
+ {
+ lra_assert (! side_effects_p (original));
+ res_p = false;
+ *result_reg = curr_insn_input_reloads[i].reg;
+ regno = REGNO (*result_reg);
+ if (lra_dump_file != NULL)
+ {
+ fprintf (lra_dump_file, " Reuse r%d for reload ", regno);
+ print_value_slim (lra_dump_file, original, 1);
+ }
+ if (rclass != new_class)
+ change_class (regno, new_class, ", change", false);
+ if (lra_dump_file != NULL)
+ fprintf (lra_dump_file, "\n");
+ }
+ lra_assert (curr_insn_input_reloads_num < LRA_MAX_INSN_RELOADS);
+ curr_insn_input_reloads[curr_insn_input_reloads_num].input = original;
+ curr_insn_input_reloads[curr_insn_input_reloads_num++].reg = *result_reg;
+ return res_p;
It probably doesn't matter in practice, but I think this would
for (i = 0; i < curr_insn_input_reloads_num; i++)
if (rtx_equal_p (curr_insn_input_reloads[i].input, original)
&& in_class_p (curr_insn_input_reloads[i].reg, rclass, &new_class))
{
...reuse case..
return false;
}
...new case...
return true;
which also copes with the unlikely case that the same input is used
three times, and that the third use requires the same class as the
second.
Ok. I fixed it.
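The fixed loop shape can be modelled in a few lines of stand-alone C. Inputs are strings and classes are plain ints here, and the in_class_p test is reduced to an equality check, so this is only a sketch of the control flow, not of the real predicate:

```c
#include <stdbool.h>
#include <string.h>

struct input_reload { const char *input; int rclass; };

/* Scan ALL recorded input reloads and reuse the first one whose input
   matches AND whose class is compatible; otherwise fall through to
   the new-register case.  Returns true if the caller must create a
   new register.  This copes with the same input being recorded more
   than once with different classes.  */
static bool
need_new_reload_reg (const struct input_reload *reloads, int n,
                     const char *input, int rclass)
{
  for (int i = 0; i < n; i++)
    if (strcmp (reloads[i].input, input) == 0
        && reloads[i].rclass == rclass)  /* stand-in for in_class_p */
      return false;  /* reuse case */
  return true;       /* new-register case */
}
```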
Post by Richard Sandiford
+/* The page contains code to extract memory address parts. */
+
+/* Info about base and index regs of an address. In some rare cases,
+ base/index register can be actually memory. In this case we will
+ reload it. */
+struct address
+{
+ rtx *base_reg_loc; /* NULL if there is no a base register. */
+ rtx *base_reg_loc2; /* Second location of {post/pre}_modify, NULL
+ otherwise. */
+ rtx *index_reg_loc; /* NULL if there is no an index register. */
+ rtx *index_loc; /* location of index reg * scale or index_reg_loc
+ otherwise. */
+ rtx *disp_loc; /* NULL if there is no a displacement. */
+ /* Defined if base_reg_loc is not NULL. */
+ enum rtx_code base_outer_code, index_code;
+ /* True if the base register is modified in the address, for
+ example, in PRE_INC. */
+ bool base_modify_p;
+};
Comments should be consistently above the fields rather than to the right.
Fixed.
Post by Richard Sandiford
+/* Process address part in space AS (or all address if TOP_P) with
+ location *LOC to extract address characteristics.
+
+ If CONTEXT_P is false, we are looking at the base part of an
+ address, otherwise we are looking at the index part.
+
+ MODE is the mode of the memory reference; OUTER_CODE and INDEX_CODE
+ give the context that the rtx appears in; MODIFY_P if *LOC is
+ modified. */
+static void
+extract_loc_address_regs (bool top_p, enum machine_mode mode, addr_space_t as,
+ rtx *loc, bool context_p, enum rtx_code outer_code,
+ enum rtx_code index_code,
+ bool modify_p, struct address *ad)
+{
+ rtx x = *loc;
+ enum rtx_code code = GET_CODE (x);
+ bool base_ok_p;
+
+ switch (code)
+ {
+ if (! context_p)
+ ad->disp_loc = loc;
This looks a bit odd. I assume it's trying to avoid treating MULT
scale factors as displacements, but I thought whether something was
a displacement or not depended on whether it is involved (possibly
indirectly) in a sum with the base. Seems like it'd be better
to check for that directly.
+ /* If this machine only allows one register per address, it
+ must be in the first operand. */
+ if (MAX_REGS_PER_ADDRESS == 1 || code == LO_SUM)
+ {
+ extract_loc_address_regs (false, mode, as, arg0_loc, false, code,
+ code1, modify_p, ad);
+ ad->disp_loc = arg1_loc;
+ }
+ /* If index and base registers are the same on this machine,
+ just record registers in any non-constant operands. We
+ assume here, as well as in the tests below, that all
+ addresses are in canonical form. */
+ else if (INDEX_REG_CLASS
+ == base_reg_class (VOIDmode, as, PLUS, SCRATCH)
+ && code0 != PLUS && code0 != MULT)
+ {
+ extract_loc_address_regs (false, mode, as, arg0_loc, false, PLUS,
+ code1, modify_p, ad);
+ if (! CONSTANT_P (arg1))
+ extract_loc_address_regs (false, mode, as, arg1_loc, true, PLUS,
+ code0, modify_p, ad);
+ else
+ ad->disp_loc = arg1_loc;
+ }
+
+ /* If the second operand is a constant integer, it doesn't
+ change what class the first operand must be. */
+ else if (code1 == CONST_INT || code1 == CONST_DOUBLE)
+ {
+ ad->disp_loc = arg1_loc;
+ extract_loc_address_regs (false, mode, as, arg0_loc, context_p,
+ PLUS, code1, modify_p, ad);
+ }
+ /* If the second operand is a symbolic constant, the first
+ operand must be an index register but only if this part is
+ all the address. */
+ else if (code1 == SYMBOL_REF || code1 == CONST || code1 == LABEL_REF)
+ {
+ ad->disp_loc = arg1_loc;
+ extract_loc_address_regs (false, mode, as, arg0_loc,
+ top_p ? true : context_p, PLUS, code1,
+ modify_p, ad);
+ }
What's the reason for the distinction between the last two, which AIUI
top_p ? true : context_p
condition is safe: some targets use aligning addresses like
(and X (const_int -ALIGN)), but that shouldn't really affect whether
a register in X is treated as a base or an index.
The code works for PPC, which uses aligning addresses.
Post by Richard Sandiford
+ /* If both operands are registers but one is already a hard
+ register of index or reg-base class, give the other the
+ class that the hard register is not. */
+ else if (code0 == REG && code1 == REG
+ && REGNO (arg0) < FIRST_PSEUDO_REGISTER
+ && ((base_ok_p
+ = ok_for_base_p_nonstrict (arg0, mode, as, PLUS, REG))
+ || ok_for_index_p_nonstrict (arg0)))
+ {
+ extract_loc_address_regs (false, mode, as, arg0_loc, ! base_ok_p,
+ PLUS, REG, modify_p, ad);
+ extract_loc_address_regs (false, mode, as, arg1_loc, base_ok_p,
+ PLUS, REG, modify_p, ad);
+ }
+ else if (code0 == REG && code1 == REG
+ && REGNO (arg1) < FIRST_PSEUDO_REGISTER
+ && ((base_ok_p
+ = ok_for_base_p_nonstrict (arg1, mode, as, PLUS, REG))
+ || ok_for_index_p_nonstrict (arg1)))
+ {
+ extract_loc_address_regs (false, mode, as, arg0_loc, base_ok_p,
+ PLUS, REG, modify_p, ad);
+ extract_loc_address_regs (false, mode, as, arg1_loc, ! base_ok_p,
+ PLUS, REG, modify_p, ad);
+ }
+ /* If one operand is known to be a pointer, it must be the
+ base with the other operand the index. Likewise if the
+ other operand is a MULT. */
+ else if ((code0 == REG && REG_POINTER (arg0)) || code1 == MULT)
+ {
+ extract_loc_address_regs (false, mode, as, arg0_loc, false, PLUS,
+ code1, modify_p, ad);
+ if (code1 == MULT)
+ ad->index_loc = arg1_loc;
+ extract_loc_address_regs (false, mode, as, arg1_loc, true, PLUS,
+ code0, modify_p, ad);
+ }
+ else if ((code1 == REG && REG_POINTER (arg1)) || code0 == MULT)
+ {
+ extract_loc_address_regs (false, mode, as, arg0_loc, true, PLUS,
+ code1, modify_p, ad);
+ if (code0 == MULT)
+ ad->index_loc = arg0_loc;
+ extract_loc_address_regs (false, mode, as, arg1_loc, false, PLUS,
+ code0, modify_p, ad);
+ }
Some targets care about the choice between index and base for
correctness reasons (PA IIRC) or for performance (some ppc targets IIRC),
so I'm not sure whether it's safe to give REG_POINTER such a low priority.
This code works for PPC and PARISC.
Post by Richard Sandiford
+ {
+ const char *fmt = GET_RTX_FORMAT (code);
+ int i;
+
+ if (GET_RTX_LENGTH (code) != 1
+ || fmt[0] != 'e' || GET_CODE (XEXP (x, 0)) != UNSPEC)
+ {
+ for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+ if (fmt[i] == 'e')
+ extract_loc_address_regs (false, mode, as, &XEXP (x, i),
+ context_p, code, SCRATCH,
+ modify_p, ad);
+ break;
+ }
+ /* fall through for case UNARY_OP (UNSPEC ...) */
+ }
+
+ if (ad->disp_loc == NULL)
+ ad->disp_loc = loc;
+ else if (ad->base_reg_loc == NULL)
+ {
+ ad->base_reg_loc = loc;
+ ad->base_outer_code = outer_code;
+ ad->index_code = index_code;
+ ad->base_modify_p = modify_p;
+ }
+ else
+ {
+ lra_assert (ad->index_reg_loc == NULL);
+ ad->index_reg_loc = loc;
+ }
+ break;
+
+ }
Which targets use a bare UNSPEC as a displacement? I thought a
displacement had to be a link-time constant, in which case it should
satisfy CONSTANT_P. For UNSPECs, that means wrapping it in a CONST.
I saw it somewhere. I guess IA64.
Post by Richard Sandiford
I'm just a bit worried that the UNSPEC handling is sensitive to the
order that subrtxes are processed (unlike PLUS, which goes to some
trouble to work out what's what). It could be especially confusing
because the default case processes operands in reverse order while
PLUS processes them in forward order.
Also, which cases require the special UNARY_OP (UNSPEC ...) fallthrough?
Probably deserves a comment.
I don't remember. To figure it out, I would have to switch it off and
try all the targets supported by LRA.
Post by Richard Sandiford
AIUI the base_reg_loc, index_reg_loc and disp_loc fields aren't just
recording where reloads of a particular class need to go (obviously
in the case of disp_loc, which isn't reloaded at all). The fields
have semantic value too. I.e. we use them to work out the value
of at least part of the address.
In that case it seems dangerous to look through general rtxes
in the way that the default case above does. Maybe just making
sure that DISP_LOC is involved in a sum with the base would be
----------------------------------------------------------------
I know of three ways of "mutating" (for want of a better word)
1. (and X (const_int X)), to align
2. a subreg
3. a unary operator (such as truncation or extension)
a. remove outer mutations (using a helper function)
b. handle LO_SUM, PRE_*, POST_*: as now
c. otherwise treat the address of the sum of one, two or three pieces.
c1. Peel mutations of all pieces.
c2. Classify the pieces into base, index and displacement.
This would be similar to the jousting code above, but hopefully
easier because all three rtxes are to hand. E.g. we could
do the base vs. index thing in a similar way to
commutative_operand_precedence.
c3. Record which pieces were mutated (e.g. using something like the
index_loc vs. index_reg_loc distinction in the current code)
That should be general enough for current targets, but if it isn't,
we could generalise it further when we know what generalisation is needed.
That's still going to be a fair amount of code, but hopefully not more,
and we might have more confidence at each stage what each value is.
And it avoids the risk of treating "mutated" addresses as "unmutated" ones.
----------------------------------------------------------------
Just an idea though. Probably not for 4.8, although I might try it
if I find time.
I am not sure that you have listed all the cases. It would be great if
you could, because with a complete list we could make this function
clearer. I tried to enumerate the cases first but kept finding new
ones, so I gave up and used a more general implementation.

This function has been rewritten and modified many times. I am afraid
to do it again while the clock is ticking.

It would be great if you could re-implement the function according to
your ideas; then we could try it on the 8 targets to which LRA has
already been ported. An LRA sub-branch would be a perfect place to do it.
Post by Richard Sandiford
It would be nice to sort out the disp_loc thing for 4.8 though.
+/* Extract address characteristics in address with location *LOC in
+ space AS. Return them in AD. Parameter OUTER_CODE for MEM should
+ be MEM. Parameter OUTER_CODE for 'p' constraint should be ADDRESS
+ and MEM_MODE should be VOIDmode. */
- *LOC is the address in a (mem ...). In this case OUTER_CODE is MEM
and AS is the mem's address space.
- *LOC is matched to an address constraint such as 'p'. In this case
OUTER_CODE is ADDRESS and AS is ADDR_SPACE_GENERIC. */
Fixed.
Post by Richard Sandiford
+/* Return start register offset of hard register REGNO in MODE. */
+int
+lra_constraint_offset (int regno, enum machine_mode mode)
+{
+ lra_assert (regno < FIRST_PSEUDO_REGISTER);
+ /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+ multiple hard register group of scalar integer registers, so that
+ for example (reg:DI 0) and (reg:SI 1) will be considered the same
+ register. */
+ if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (mode) > UNITS_PER_WORD
+ && SCALAR_INT_MODE_P (mode))
+ return hard_regno_nregs[regno][mode] - 1;
+ return 0;
+}
/* Return the offset from REGNO of the least significant register
in (reg:MODE REGNO).
This function is used to tell whether two registers satisfy
REGNO1 + lra_constraint_offset (REGNO1, MODE1)
== REGNO2 + lra_constraint_offset (REGNO2, MODE2) */
(and remove the inner comment).
Fixed.
Post by Richard Sandiford
+/* Like rtx_equal_p except that it allows a REG and a SUBREG to match
+ if they are the same hard reg, and has special hacks for
+ auto-increment and auto-decrement. This is specifically intended for
+ process_alt_operands to use in determining whether two operands
+ match. X is the operand whose number is the lower of the two.
+
+ It is supposed that X is the output operand and Y is the input
+ operand. */
+static bool
+operands_match_p (rtx x, rtx y, int y_hard_regno)
Need to say what Y_HARD_REGNO is.
Fixed.
Post by Richard Sandiford
+ switch (code)
+ {
Ok. It looks like I need another round of merging. Oh, well.
Post by Richard Sandiford
+ val = operands_match_p (XVECEXP (x, i, j), XVECEXP (y, i, j),
+ y_hard_regno);
+ if (val == 0)
+ return false;
Why do we pass the old y_hard_regno even though Y has changed?
Some of the earlier code assumes that GET_MODE (y) is the mode
of y_hard_regno.
It does not matter. We processed the REG and SUBREG cases above, and
y_hard_regno can be non-negative only in those cases, so we cannot
process it again. But I changed it to -1 for clarity.
Post by Richard Sandiford
+/* Reload pseudos created for matched input and output reloads whose
+ mode are different. Such pseudos has a modified rules for finding
+ their living ranges, e.g. assigning to subreg of such pseudo means
+ changing all pseudo value. */
+bitmap_head lra_bound_pseudos;
/* Reload pseudos created for matched input and output reloads whose
modes are different. Such pseudos have different live ranges from
other pseudos; e.g. any assignment to a subreg of these pseudos
changes the whole pseudo's value. */
Fixed.
Post by Richard Sandiford
Although that said, couldn't emit_move_insn_1 (called by gen_move_insn)
split a multiword pseudo move into two word moves? Using the traditional
clobber technique sounds better than having special liveness rules.
It is not only about multi-word pseudos. It is about representing this
situation with constructions that are semantically incorrect in other
parts of the compiler. Reload has no such problem because it does not
use RTL. So I don't think it splits, as I use emit_move_insn, which
calls emit_move_insn_1 too. I really needed special liveness treatment
(although I don't remember the details) and therefore I added it. I had
no detailed design for LRA; the code was shaped by numerous test
failures on different targets. There is a lot of code analogous to
reload's, and its necessity should probably be rigorously questioned.
I thought about this and modified part of the code, but unfortunately
not all of it.

Also, bound pseudos are rare. Their bitmap is very small, and testing
them in ira-lives.c (2 lines of code in total) is fast.
Post by Richard Sandiford
+/* True if C is a non-empty register class that has too few registers
+ to be safely used as a reload target class. */
+#define SMALL_REGISTER_CLASS_P(C) \
+ (reg_class_size [(C)] == 1 \
+ || (reg_class_size [(C)] >= 1 && targetm.class_likely_spilled_p (C)))
Feels like ira_class_hard_regs_num might be better, but since the
current definition is traditional, that shouldn't be a merge requirement.
Ok.
Post by Richard Sandiford
+/* Return mode of WHAT inside of WHERE whose mode of the context is
+ OUTER_MODE. If WHERE does not contain WHAT, return VOIDmode. */
+static enum machine_mode
+find_mode (rtx *where, enum machine_mode outer_mode, rtx *what)
+{
+ int i, j;
+ enum machine_mode mode;
+ rtx x;
+ const char *fmt;
+ enum rtx_code code;
+
+ if (where == what)
+ return outer_mode;
+ if (*where == NULL_RTX)
+ return VOIDmode;
+ x = *where;
+ code = GET_CODE (x);
+ outer_mode = GET_MODE (x);
+ fmt = GET_RTX_FORMAT (code);
+ for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+ {
+ if (fmt[i] == 'e')
+ {
+ if ((mode = find_mode (&XEXP (x, i), outer_mode, what)) != VOIDmode)
+ return mode;
+ }
+ else if (fmt[i] == 'E')
+ {
+ for (j = XVECLEN (x, i) - 1; j >= 0; j--)
+ if ((mode = find_mode (&XVECEXP (x, i, j), outer_mode, what))
+ != VOIDmode)
+ return mode;
+ }
+ }
+ return VOIDmode;
+}
+
+/* Return mode for operand NOP of the current insn. */
+static inline enum machine_mode
+get_op_mode (int nop)
+{
+ rtx *loc;
+ enum machine_mode mode;
+ bool md_first_p = asm_noperands (PATTERN (curr_insn)) < 0;
+
+ /* Take mode from the machine description first. */
+ if (md_first_p && (mode = curr_static_id->operand[nop].mode) != VOIDmode)
+ return mode;
+ loc = curr_id->operand_loc[nop];
+ /* Take mode from the operand second. */
+ mode = GET_MODE (*loc);
+ if (mode != VOIDmode)
+ return mode;
+ if (! md_first_p && (mode = curr_static_id->operand[nop].mode) != VOIDmode)
+ return mode;
+ /* Here is a very rare case. Take mode from the context. */
+ return find_mode (&PATTERN (curr_insn), VOIDmode, loc);
+}
This looks a lot more complicated than the reload version. Why is
/* Address operands are reloaded in their existing mode,
no matter what is specified in the machine description. */
operand_mode[i] = GET_MODE (recog_data.operand[i]);
/* If the address is a single CONST_INT pick address mode
instead otherwise we will later not know in which mode
the reload should be performed. */
if (operand_mode[i] == VOIDmode)
operand_mode[i] = Pmode;
/* The mode specified in the .md file for address operands
is the mode of the addressed value, not the address itself.
We therefore need to get the mode from the operand rtx.
If the operand has no mode, assume it was Pmode. */
For other operands, recog_data.operand_mode ought to be correct.
find_mode assumes that the mode of an operand is the same as the mode of
the outer rtx, which isn't true when the outer rtx is a subreg, mem,
or one of several unary operators.
This is one that I think would be best decided for 4.8.
I found the reason for this code:

http://old.nabble.com/-lra--patch-to-fix-SPEC2000-sixtrack-compiler-crash-p32189310.html

But I tried it again and cannot reproduce the problem, so I reverted the
patch. If we find a problem in stage 3, we will try to find a solution.
Post by Richard Sandiford
+/* If REG is a reload pseudo, try to make its class satisfying CL. */
+static void
+narrow_reload_pseudo_class (rtx reg, enum reg_class cl)
+{
+ int regno;
+ enum reg_class rclass;
+
+ /* Do not make more accurate class from reloads generated. They are
+ mostly moves with a lot of constraints. Making more accurate
+ class may results in very narrow class and impossibility of find
+ registers for several reloads of one insn. */
+ if (INSN_UID (curr_insn) >= new_insn_uid_start)
+ return;
+ if (GET_CODE (reg) == SUBREG)
+ reg = SUBREG_REG (reg);
+ if (! REG_P (reg) || (regno = REGNO (reg)) < new_regno_start)
+ return;
+ rclass = get_reg_class (regno);
+ rclass = ira_reg_class_subset[rclass][cl];
+ if (rclass == NO_REGS)
+ return;
+ change_class (regno, rclass, " Change", true);
+}
There seems to be an overlap in functionality with in_class_p here.
{
enum reg_class rclass;
if (in_class_p (reg, cl, &rclass) && rclass != NO_REGS)
change_class (REGNO (reg), rclass, " Change", true);
}
(assuming the change in in_class_p interface suggested above).
This avoids duplicating subtleties like the handling of reloads.
Fixed.
Post by Richard Sandiford
+ /* We create pseudo for out rtx because we always should keep
+ registers with the same original regno have synchronized
+ value (it is not true for out register but it will be
+ corrected by the next insn).
I don't understand this comment, sorry.
Pseudos have values -- see the comments for lra_reg_info. Different
pseudos with the same value do not conflict even if they live in the
same place. When we create a pseudo, we assign it the value of the
original pseudo (if any) from which the new pseudo was created. If we
created the pseudo from the input pseudo, the new pseudo would not
conflict with the input pseudo, which is wrong when the input pseudo
lives past the insn, because the new pseudo's value is changed by the
insn's output. Therefore we create the new pseudo from the output.

I hope this is more understandable. I changed the comment.
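The conflict rule being described can be sketched stand-alone. Each pseudo here has a single live range and an integer "value" id; all names are made up and nothing below is GCC code:

```c
#include <stdbool.h>

/* A pseudo with one live range [start, end] and a value id.  Two
   pseudos conflict only if their ranges overlap AND their values
   differ: pseudos carrying the same value never conflict, even when
   they live in the same place.  */
struct pseudo { int val, start, end; };

static bool
pseudos_conflict_p (const struct pseudo *a, const struct pseudo *b)
{
  bool overlap = a->start <= b->end && b->start <= a->end;
  return overlap && a->val != b->val;
}
```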
Post by Richard Sandiford
+ Do not reuse register because of the following situation: a <-
+ a op b, and b should be the same as a. */
We cannot reuse the current output register because we might
have a situation like "a <- a op b", where the constraints force
the second input operand ("b") to match the output operand ("a").
"b" must then be copied into a new register so that it doesn't
clobber the current value of "a". */
Ok. Fixed.
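The clobber hazard is easy to demonstrate with a two-slot "register file" in plain C (nothing GCC-specific; slot 0 plays the shared output/first-input register):

```c
/* "a <- a - b" with input b reloaded into register b_regno.  If b is
   (wrongly) reloaded into the output register itself (b_regno == 0),
   loading b clobbers the current value of a before the operation
   reads it, giving the wrong result.  */
static int
matched_op_result (int a, int b, int b_regno)
{
  int regs[2];
  regs[0] = a;                        /* output operand "a" */
  regs[b_regno] = b;                  /* reload of input "b" */
  regs[0] = regs[0] - regs[b_regno];  /* a <- a op b */
  return regs[0];
}
```

With a = 7, b = 2 the correct result 5 only comes out when b gets its own register; reusing slot 0 yields 0 instead.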
Post by Richard Sandiford
We should probably keep the other reason too, of course.
+ /* Don't generate inheritance for the new register because we
+ can not use the same hard register for the corresponding
+ inheritance pseudo for input reload. */
+ bitmap_set_bit (&lra_matched_pseudos, REGNO (new_in_reg));
Suggest dropping this comment, since we don't do any inheritance here.
The comment above lra_matched_pseudos already says the same thing.
Fixed.
Post by Richard Sandiford
+ /* In and out operand can be got from transformations before
+ processing constraints. So the pseudos might have inaccurate
+ class and we should make their classes more accurate. */
+ narrow_reload_pseudo_class (in_rtx, goal_class);
+ narrow_reload_pseudo_class (out_rtx, goal_class);
I don't understand this, sorry. Does "transformations" mean inheritance
and reload splitting? So the registers we're changing here are inheritance
and split pseudos rather than reload pseudos created for this instruction?
If so, it sounds on face value like it conflicts with the comment quoted
above about not allowing reload instructions to narrow the class
of pseudos. Might be worth saying why that's OK here but not there.
Again, inheritance and splitting are done after the constraint pass.

The transformations here are mostly reloads of subregs, which are done
before the reloads for the given insn. These transformations create
new pseudos whose register class is not known yet. When we don't know
a pseudo's register class yet, we assign ALL_REGS to the pseudo.
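A toy version of that narrowing, with register classes modeled as hard-register bitmasks (an illustration only; GCC's enum reg_class works differently, and the names here are mine):

```c
#include <assert.h>

typedef unsigned int toy_class;
#define TOY_NO_REGS       0x00u
#define TOY_GENERAL_REGS  0x0fu
#define TOY_SSE_REGS      0xf0u
#define TOY_ALL_REGS      0xffu

/* A pseudo created before constraint processing starts out as
   ALL_REGS; once the insn's goal class is known, narrow the pseudo's
   class to the intersection -- but never to the empty class.  */
static toy_class
toy_narrow_class (toy_class cur, toy_class goal)
{
  toy_class n = cur & goal;
  return n != TOY_NO_REGS ? n : cur;
}
```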
Post by Richard Sandiford
Also, I'm not sure I understand why it helps. Is it just trying
to encourage the pseudos to form a chain in lra-assigns.c?
E.g. MIPS16 has several instructions that require matched MIPS16 registers.
However, moves between MIPS16 registers and general registers are as cheap
as moves between two MIPS16 registers, so narrowing the reloaded values
from GENERAL_REGS to M16_REGS (if that ever happens) wouldn't necessarily
be a good thing.
Not saying this is wrong, just that it might need more commentary
to justify it.
+ for (i = 0; (in = ins[i]) >= 0; i++)
+ *curr_id->operand_loc[in] = new_in_reg;
The code assumes that all input operands have the same mode.
Probably worth asserting that here (or maybe further up; I don't mind),
just to make the assumption explicit.
I added an assert.
Post by Richard Sandiford
+/* Return final hard regno (plus offset) which will be after
+ elimination. We do this for matching constraints because the final
+ hard regno could have a different class. */
+static int
+get_final_hard_regno (int hard_regno, int offset)
+{
+ if (hard_regno < 0)
+ return hard_regno;
+ hard_regno += offset;
+ return lra_get_elimation_hard_regno (hard_regno);
Why apply the offset before rather than after elimination?
AIUI, AVR's eliminable registers span more than one hard register,
and the elimination is based off the first.
I fixed it.
Post by Richard Sandiford
Also, all uses but one of lra_get_hard_regno_and_offset follow
lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
/* The real hard regno of the operand after the allocation. */
x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
so couldn't lra_get_hard_regno_and_offset just return the final
hard register, including elimination? Then it could apply the
elimination on the original rtx.
lra_get_hard_regno_and_offset (x, &i, &offset);
if (i < 0)
goto slow;
i += offset;
but I'm not sure why this is the only caller that would want
to ignore elimination.
???
Post by Richard Sandiford
+/* Return register class of OP. That is a class of the hard register
+ itself (if OP is a hard register), or class of assigned hard
+ register to the pseudo (if OP is pseudo), or allocno class of
+ unassigned pseudo (if OP is reload pseudo). Return NO_REGS
+ otherwise. */
+static enum reg_class
+get_op_class (rtx op)
+{
+ int regno, hard_regno, offset;
+
+ if (! REG_P (op))
+ return NO_REGS;
+ lra_get_hard_regno_and_offset (op, &hard_regno, &offset);
+ if (hard_regno >= 0)
+ {
+ hard_regno = get_final_hard_regno (hard_regno, offset);
+ return REGNO_REG_CLASS (hard_regno);
+ }
+ /* Reload pseudo will get a hard register in any case. */
+ if ((regno = REGNO (op)) >= new_regno_start)
+ return lra_get_allocno_class (regno);
+ return NO_REGS;
+}
return REG_P (x) ? get_reg_class (REGNO (x)) : NO_REGS;
If not, I think there should be a comment explaining the difference.
/* If OP is a register, return the class of the register as per
get_reg_class, otherwise return NO_REGS. */
The difference is in elimination. But if I add elimination to
get_reg_class, they will be the same. I think that is the right thing
to do. It also permits removing the elimination code in
process_addr_reg. So I modified the code. It looks briefer and more
logical.
Post by Richard Sandiford
+/* Return generated insn mem_pseudo:=val if TO_P or val:=mem_pseudo
+ otherwise. If modes of MEM_PSEUDO and VAL are different, use
+ SUBREG for VAL to make them equal. Assign CODE to the insn if it
+ is not recognized.
+
+ We can not use emit_move_insn in some cases because of bad used
+ practice in some machine descriptions. For example, power can use
+ only base+index addressing for altivec move insns and it is checked
+ by insn predicates. On the other hand, the same move insn
+ constraints permit to use offsetable memory for moving vector mode
+ values from/to general registers to/from memory. emit_move_insn
+ will transform offsetable address to one with base+index addressing
+ which is rejected by the constraint. So sometimes we need to
+ generate move insn without modifications and assign the code
+ explicitly because the generated move can be unrecognizable because
+ of the predicates. */
Ick :-) Can't we just say that fixing this is part of the process
of porting a target to LRA? It'd be nice not to carry hacks like
this around in shiny new code.
It would be great, but I don't expect target maintainers to be so
cooperative. So my goal was to write LRA requiring minimal changes to
target code, or no changes at all.
Even keeping this goal, I am a bit pessimistic about how much time
will be needed to remove reload. With a requirement not to use such
hacks, it would take forever.

The biggest number of tricks I saw was on PPC. I spent a lot of time
porting LRA to it, and there is a lot of code in rs6000.c for LRA.
Post by Richard Sandiford
+static rtx
+emit_spill_move (bool to_p, rtx mem_pseudo, rtx val, int code)
+{
+ rtx insn, after;
+
+ start_sequence ();
+ if (GET_MODE (mem_pseudo) != GET_MODE (val))
+ val = gen_rtx_SUBREG (GET_MODE (mem_pseudo),
+ GET_CODE (val) == SUBREG ? SUBREG_REG (val) : val,
+ 0);
+ if (to_p)
+ insn = gen_move_insn (mem_pseudo, val);
+ else
+ insn = gen_move_insn (val, mem_pseudo);
+ if (recog_memoized (insn) < 0)
+ INSN_CODE (insn) = code;
+ emit_insn (insn);
+ after = get_insns ();
+ end_sequence ();
+ return after;
+}
this recog_memoized code effectively assumes that INSN is just one
instruction, whereas emit_move_insn_1 or the backend move expanders
could split moves into several instructions.
Since the code-forcing stuff is for rs6000, I think we could drop it
from 4.8 whatever happens.
The sequence stuff above looks redundant; we should just return
INSN directly.
OK, fixed. Although it makes my life harder, as some targets will be
broken on the branch after all these changes.
Post by Richard Sandiford
+ /* Quick check on the right move insn which does not need
+ reloads. */
+ if ((dclass = get_op_class (dest)) != NO_REGS
+ && (sclass = get_op_class (src)) != NO_REGS
+ && targetm.register_move_cost (GET_MODE (src), dclass, sclass) == 2)
+ return true;
/* The backend guarantees that register moves of cost 2 never need
reloads. */
Fixed.
Post by Richard Sandiford
+ if (GET_CODE (dest) == SUBREG)
+ dreg = SUBREG_REG (dest);
+ if (GET_CODE (src) == SUBREG)
+ sreg = SUBREG_REG (src);
+ if (! REG_P (dreg) || ! REG_P (sreg))
+ return false;
+ sclass = dclass = NO_REGS;
+ dr = get_equiv_substitution (dreg);
+ if (dr != dreg)
+ dreg = copy_rtx (dr);
I think this copy is too early, because there are quite a few
conditions under which we never emit anything with DREG in it.
Ok, fixed.
Post by Richard Sandiford
+ if (REG_P (dreg))
+ dclass = get_reg_class (REGNO (dreg));
+ if (dclass == ALL_REGS)
+ /* We don't know what class we will use -- let it be figured out
+ by curr_insn_transform function. Remember some targets does not
+ work with such classes through their implementation of
+ machine-dependent hooks like secondary_memory_needed. */
+ return false;
Don't really understand this comment, sorry.
Again, ALL_REGS is used for new pseudos created by transformations
like the reload of a SUBREG_REG. We don't know their class yet. We
should figure out the class by processing the insn constraints, not in
this fast-path function. Even if ALL_REGS were the right class for the
pseudo, the secondary_... hooks usually are not defined for ALL_REGS.

I fixed the comment.
Post by Richard Sandiford
+ sreg_mode = GET_MODE (sreg);
+ sr = get_equiv_substitution (sreg);
+ if (sr != sreg)
+ sreg = copy_rtx (sr);
This copy also seems too early.
Fixed.
Post by Richard Sandiford
+ sri.prev_sri = NULL;
+ sri.icode = CODE_FOR_nothing;
+ sri.extra_cost = 0;
+ secondary_class = NO_REGS;
+ /* Set up hard register for a reload pseudo for hook
+ secondary_reload because some targets just ignore unassigned
+ pseudos in the hook. */
+ if (dclass != NO_REGS
+ && REG_P (dreg) && (dregno = REGNO (dreg)) >= new_regno_start
+ && lra_get_regno_hard_regno (dregno) < 0)
+ reg_renumber[dregno] = ira_class_hard_regs[dclass][0];
+ else
+ dregno = -1;
+ if (sclass != NO_REGS
+ && REG_P (sreg) && (sregno = REGNO (sreg)) >= new_regno_start
+ && lra_get_regno_hard_regno (sregno) < 0)
+ reg_renumber[sregno] = ira_class_hard_regs[sclass][0];
+ else
+ sregno = -1;
&& REG_P (dreg) && (dregno = REGNO (dreg)) >= new_regno_start
the decision about when to return a register class for unallocated pseudos
is then localised to get_reg_class rather than copied both here and there.
Right. fixed.
Post by Richard Sandiford
+ if (sclass != NO_REGS)
+ secondary_class
+ = (enum reg_class) targetm.secondary_reload (false, dest,
+ (reg_class_t) sclass,
+ GET_MODE (src), &sri);
+ if (sclass == NO_REGS
+ || ((secondary_class != NO_REGS || sri.icode != CODE_FOR_nothing)
+ && dclass != NO_REGS))
+ secondary_class
+ = (enum reg_class) targetm.secondary_reload (true, sreg,
+ (reg_class_t) dclass,
+ sreg_mode, &sri);
Hmm, so for register<-register moves, if the target says that the output
reload needs a secondary reload, we try again with an input reload and
hope for a different answer?
If the target is giving different answers in that case, I think that's
a bug in the target, and we should assert instead. The problem is that
if we allow the answers to be different, and both answers involve
secondary reloads, we have no way of knowing whether the second answer
is easier to implement or "more correct" than the first. An assert
avoids that, and puts the onus on the target to sort itself out.
Again, as long as x86 is free of this bug for 4.8, I don't think the
merge needs to cater for broken targets.
I added an assert.
Post by Richard Sandiford
+ *change_p = true;
I think this is the point at which substituted values should be copied.
Fixed.
Post by Richard Sandiford
+ new_reg = NULL_RTX;
+ if (secondary_class != NO_REGS)
+ new_reg = lra_create_new_reg_with_unique_value (sreg_mode, NULL_RTX,
+ secondary_class,
+ "secondary");
+ start_sequence ();
+ if (sri.icode == CODE_FOR_nothing)
+ lra_emit_move (new_reg, sreg);
+ else
+ {
+ enum reg_class scratch_class;
+
+ scratch_class = (reg_class_from_constraints
+ (insn_data[sri.icode].operand[2].constraint));
+ scratch_reg = (lra_create_new_reg_with_unique_value
+ (insn_data[sri.icode].operand[2].mode, NULL_RTX,
+ scratch_class, "scratch"));
+ emit_insn (GEN_FCN (sri.icode) (new_reg != NULL_RTX ? new_reg : dest,
+ sreg, scratch_reg));
+ }
+ before = get_insns ();
+ end_sequence ();
+ lra_process_new_insns (curr_insn, before, NULL_RTX, "Inserting the move");
AIUI, the constraints pass will look at these instructions and generate
what are now known as tertiary reloads where needed (by calling this
function again). Is that right? Very nice if so: that's far more
natural than the current reload handling.
Yes.
Post by Richard Sandiford
+/* The chosen reg classes which should be used for the corresponding
+ operands. */
+static enum reg_class goal_alt[MAX_RECOG_OPERANDS];
+/* True if the operand should be the same as another operand and the
+ another operand does not need a reload. */
s/and the another/and that other/
Fixed.
Post by Richard Sandiford
+/* Make reloads for addr register in LOC which should be of class CL,
+ add reloads to list BEFORE. If AFTER is not null emit insns to set
+ the register up after the insn (it is case of inc/dec, modify). */
/* Arrange for address element *LOC to be a register of class CL.
Add any input reloads to list BEFORE. AFTER is nonnull if *LOC is an
automodified value; handle that case by adding the required output
reloads to list AFTER. Return true if the RTL was changed. */
Fixed.
Post by Richard Sandiford
+static bool
+process_addr_reg (rtx *loc, rtx *before, rtx *after, enum reg_class cl)
+{
+ int regno, final_regno;
+ enum reg_class rclass, new_class;
+ rtx reg = *loc;
+ rtx new_reg;
+ enum machine_mode mode;
+ bool change_p = false;
+
+ mode = GET_MODE (reg);
+ if (! REG_P (reg))
+ {
+ /* Always reload memory in an address even if the target
+ supports such addresses. */
+ new_reg
+ = lra_create_new_reg_with_unique_value (mode, reg, cl, "address");
+ push_to_sequence (*before);
+ lra_emit_move (new_reg, reg);
+ *before = get_insns ();
+ end_sequence ();
+ *loc = new_reg;
+ if (after != NULL)
+ {
+ start_sequence ();
+ lra_emit_move (reg, new_reg);
+ emit_insn (*after);
+ *after = get_insns ();
+ end_sequence ();
+ }
+ return true;
Why does this need to be a special case, rather than reusing the
I simplified the function by factoring out the common code.
Post by Richard Sandiford
+ }
+ lra_assert (REG_P (reg));
+ final_regno = regno = REGNO (reg);
+ if (regno < FIRST_PSEUDO_REGISTER)
+ {
+ rtx final_reg = reg;
+ rtx *final_loc = &final_reg;
+
+ lra_eliminate_reg_if_possible (final_loc);
+ final_regno = REGNO (*final_loc);
+ }
+ /* Use class of hard register after elimination because some targets
+ do not recognize virtual hard registers as valid address
+ registers. */
+ rclass = get_reg_class (final_regno);
+ if ((*loc = get_equiv_substitution (reg)) != reg)
+ {
+ if (lra_dump_file != NULL)
+ {
+ fprintf (lra_dump_file,
+ "Changing pseudo %d in address of insn %u on equiv ",
+ REGNO (reg), INSN_UID (curr_insn));
+ print_value_slim (lra_dump_file, *loc, 1);
+ fprintf (lra_dump_file, "\n");
+ }
+ *loc = copy_rtx (*loc);
+ change_p = true;
+ }
+ if (*loc != reg || ! in_class_p (final_regno, GET_MODE (reg), cl, &new_class))
+ {
+ reg = *loc;
+ if (get_reload_reg (OP_IN, mode, reg, cl, "address", &new_reg))
+ {
+ push_to_sequence (*before);
+ lra_emit_move (new_reg, reg);
+ *before = get_insns ();
+ end_sequence ();
+ }
+ *loc = new_reg;
+ if (after != NULL)
+ {
+ start_sequence ();
+ lra_emit_move (reg, new_reg);
+ emit_insn (*after);
+ *after = get_insns ();
+ end_sequence ();
+ }
+ change_p = true;
+ }
+ else if (new_class != NO_REGS && rclass != new_class)
+ change_class (regno, new_class, " Change", true);
+ return change_p;
+}
if ((*loc = get_equiv_substitution (reg)) != reg)
...as above...
if (*loc != reg || !in_class_p (reg, cl, &new_class))
...as above...
else if (new_class != NO_REGS && rclass != new_class)
change_class (regno, new_class, " Change", true);
return change_p;
(assuming change to in_class_p suggested earlier) seems like it
covers the same cases.
Also, should OP_IN be OP_INOUT for after != NULL, so that we don't try
to reuse existing reload pseudos? That would mean changing get_reload_reg
(both commentary and code) to handle OP_INOUT like OP_OUT.
Fixed.
Post by Richard Sandiford
Or maybe just pass OP_OUT instead of OP_INOUT, if that's more consistent.
I don't mind which.
+ /* Force reload if this is a constant or PLUS or if there may be a
+ problem accessing OPERAND in the outer mode. */
/* Force a reload of the SUBREG_REG if this ...
Fixed.
Post by Richard Sandiford
+ /* Constant mode ???? */
+ enum op_type type = curr_static_id->operand[nop].type;
Not sure what the comment means, but REG is still the original SUBREG_REG,
so there shouldn't be any risk of a VOIDmode constant. (subreg (const_int))
is invalid rtl.
I removed the comment. It was probably a question for myself.
Post by Richard Sandiford
+/* Return TRUE if *LOC refers for a hard register from SET. */
+static bool
+uses_hard_regs_p (rtx *loc, HARD_REG_SET set)
+{
Nothing seems to care about the address, so we could pass the rtx
rather than a pointer to it.
Fixed.
Post by Richard Sandiford
+ int i, j, x_hard_regno, offset;
+ enum machine_mode mode;
+ rtx x;
+ const char *fmt;
+ enum rtx_code code;
+
+ if (*loc == NULL_RTX)
+ return false;
+ x = *loc;
+ code = GET_CODE (x);
+ mode = GET_MODE (x);
+ if (code == SUBREG)
+ {
+ loc = &SUBREG_REG (x);
+ x = SUBREG_REG (x);
+ code = GET_CODE (x);
+ if (GET_MODE_SIZE (GET_MODE (x)) > GET_MODE_SIZE (mode))
+ mode = GET_MODE (x);
+ }
+
+ if (REG_P (x))
+ {
+ lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
+ /* The real hard regno of the operand after the allocation. */
+ x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
+ return (x_hard_regno >= 0
+ && lra_hard_reg_set_intersection_p (x_hard_regno, mode, set));
With the subreg mode handling above, this looks little-endian specific.
+ if (MEM_P (x))
+ {
+ struct address ad;
+ enum machine_mode mode = GET_MODE (x);
+ rtx *addr_loc = &XEXP (x, 0);
+
+ extract_address_regs (mode, MEM_ADDR_SPACE (x), addr_loc, MEM, &ad);
+ if (ad.base_reg_loc != NULL)
+ {
+ if (uses_hard_regs_p (ad.base_reg_loc, set))
+ return true;
+ }
+ if (ad.index_reg_loc != NULL)
+ {
+ if (uses_hard_regs_p (ad.index_reg_loc, set))
+ return true;
+ }
+ }
is independent of the subreg handling, so perhaps the paradoxical subreg
case should be handled separately, using simplify_subreg_regno.
+/* Major function to choose the current insn alternative and what
+ operands should be reloaded and how. If ONLY_ALTERNATIVE is not
+ negative we should consider only this alternative. Return false if
+ we can not choose the alternative or find how to reload the
+ operands. */
+static bool
+process_alt_operands (int only_alternative)
+{
+ bool ok_p = false;
+ int nop, small_class_operands_num, overall, nalt, offset;
+ int n_alternatives = curr_static_id->n_alternatives;
+ int n_operands = curr_static_id->n_operands;
+ /* LOSERS counts those that don't fit this alternative and would
+ require loading. */
+ int losers;
s/those/the operands/
Fixed.
Post by Richard Sandiford
+ /* Calculate some data common for all alternatives to speed up the
+ function. */
+ for (nop = 0; nop < n_operands; nop++)
+ {
+ op = no_subreg_operand[nop] = *curr_id->operand_loc[nop];
+ lra_get_hard_regno_and_offset (op, &hard_regno[nop], &offset);
+ /* The real hard regno of the operand after the allocation. */
+ hard_regno[nop] = get_final_hard_regno (hard_regno[nop], offset);
+
+ operand_reg[nop] = op;
+ biggest_mode[nop] = GET_MODE (operand_reg[nop]);
+ if (GET_CODE (operand_reg[nop]) == SUBREG)
+ {
+ operand_reg[nop] = SUBREG_REG (operand_reg[nop]);
+ if (GET_MODE_SIZE (biggest_mode[nop])
+ < GET_MODE_SIZE (GET_MODE (operand_reg[nop])))
+ biggest_mode[nop] = GET_MODE (operand_reg[nop]);
+ }
+ if (REG_P (operand_reg[nop]))
+ no_subreg_operand[nop] = operand_reg[nop];
+ else
+ operand_reg[nop] = NULL_RTX;
This looks odd: no_subreg_operand ends up being a subreg if the
SUBREG_REG wasn't a REG. Some more commentary might help.
Probably I should use a better name, no_subreg_reg_operand. I also
added comments for its definition and for the definition of operand_reg.
Post by Richard Sandiford
+ /* The constraints are made of several alternatives. Each operand's
+ constraint looks like foo,bar,... with commas separating the
+ alternatives. The first alternatives for all operands go
+ together, the second alternatives go together, etc.
+
+ First loop over alternatives. */
+ for (nalt = 0; nalt < n_alternatives; nalt++)
+ {
+ /* Loop over operands for one constraint alternative. */
+ if (
+#ifdef HAVE_ATTR_enabled
+ (curr_id->alternative_enabled_p != NULL
+ && ! curr_id->alternative_enabled_p[nalt])
+ ||
+#endif
+ (only_alternative >= 0 && nalt != only_alternative))
+ continue;
Fixed.
Post by Richard Sandiford
#ifdef HAVE_ATTR_enabled
if (curr_id->alternative_enabled_p != NULL
&& !curr_id->alternative_enabled_p[nalt])
continue;
#endif
if (only_alternative >= 0 && nalt != only_alternative))
continue;
+ for (nop = 0; nop < n_operands; nop++)
+ {
+ const char *p;
+ char *end;
+ int len, c, m, i, opalt_num, this_alternative_matches;
+ bool win, did_match, offmemok, early_clobber_p;
+ /* false => this operand can be reloaded somehow for this
+ alternative. */
+ bool badop;
+ /* false => this operand can be reloaded if the alternative
+ allows regs. */
+ bool winreg;
+ /* False if a constant forced into memory would be OK for
+ this operand. */
+ bool constmemok;
+ enum reg_class this_alternative, this_costly_alternative;
+ HARD_REG_SET this_alternative_set, this_costly_alternative_set;
+ bool this_alternative_match_win, this_alternative_win;
+ bool this_alternative_offmemok;
+ int invalidate_m;
+ enum machine_mode mode;
+
+ opalt_num = nalt * n_operands + nop;
+ if (curr_static_id->operand_alternative[opalt_num].anything_ok)
+ {
+ /* Fast track for no constraints at all. */
+ curr_alt[nop] = NO_REGS;
+ CLEAR_HARD_REG_SET (curr_alt_set[nop]);
+ curr_alt_win[nop] = true;
+ curr_alt_match_win[nop] = false;
+ curr_alt_offmemok[nop] = false;
+ curr_alt_matches[nop] = -1;
+ continue;
+ }
Given that this code is pretty complex, it might be clearer to remove
I'd rather not do this. The array element, rather than a simple
variable, occurs in about 50 places. I don't think it makes the code
cleaner.
Post by Richard Sandiford
curr_alt[nop] = NO_REGS;
CLEAR_HARD_REG_SET (curr_alt_set[nop]);
curr_alt_win[nop] = false;
curr_alt_match_win[nop] = false;
curr_alt_offmemok[nop] = false;
curr_alt_matches[nop] = -1;
opalt_num = nalt * n_operands + nop;
if (curr_static_id->operand_alternative[opalt_num].anything_ok)
{
/* Fast track for no constraints at all. */
curr_alt_win[nop] = true;
continue;
}
+ /* We update set of possible hard regs besides its class
+ because reg class might be inaccurate. For example,
+ union of LO_REGS (l), HI_REGS(h), and STACK_REG(k) in ARM
+ is translated in HI_REGS because classes are merged by
+ pairs and there is no accurate intermediate class. */
somewhere though, either here or above the declaration of curr_alt_set.
+ /* We are supposed to match a previous operand.
+ If we do, we win if that one did. If we do
+ not, count both of the operands as losers.
+ (This is too conservative, since most of the
+ time only a single reload insn will be needed
+ to make the two operands win. As a result,
+ this alternative may be rejected when it is
+ actually desirable.) */
+ /* If it conflicts with others. */
Last line looks incomplete/misplaced.
I have no idea where/what it should be. I removed it.
Post by Richard Sandiford
+ match_p = false;
+ if (operands_match_p (*curr_id->operand_loc[nop],
+ *curr_id->operand_loc[m], m_hregno))
+ {
+ int i;
+
+ for (i = 0; i < early_clobbered_regs_num; i++)
+ if (early_clobbered_nops[i] == m)
+ break;
+ /* We should reject matching of an early
+ clobber operand if the matching operand is
+ not dying in the insn. */
+ if (i >= early_clobbered_regs_num
Why not simply use operands m's early_clobber field?
Ok. Fixed.
Post by Richard Sandiford
+ || operand_reg[nop] == NULL_RTX
+ || (find_regno_note (curr_insn, REG_DEAD,
+ REGNO (operand_reg[nop]))
+ != NULL_RTX))
+ match_p = true;
...although I don't really understand this condition. If the two
operands are the same value X, then X must die here whatever the
notes say. So I assume this is coping with a case where the operands
are different but still match. If so, could you give an example?
I remember I saw such an insn, but I don't remember the details.
Post by Richard Sandiford
Matched earlyclobbers explicitly guarantee that the earlyclobber doesn't
apply to the matched input operand; the earlyclobber only applies to
other input operands. So I'd have expected it was those operands
that might need reloading rather than this one.
E.g. if X occurs three times, twice in a matched earlyclobber pair
and once as an independent operand, it's the latter operand that would
need reloading.
Yes, I know.
Post by Richard Sandiford
+ /* Operands don't match. */
+ /* Retroactively mark the operand we had to
+ match as a loser, if it wasn't already and
+ it wasn't matched to a register constraint
+ (e.g it might be matched by memory). */
+ if (curr_alt_win[m]
+ && (operand_reg[m] == NULL_RTX
+ || hard_regno[m] < 0))
+ {
+ losers++;
+ if (curr_alt[m] != NO_REGS)
+ reload_nregs
+ += (ira_reg_class_max_nregs[curr_alt[m]]
+ [GET_MODE (*curr_id->operand_loc[m])]);
+ }
+ invalidate_m = m;
+ if (curr_alt[m] == NO_REGS)
+ continue;
I found this a bit confusing. If the operands don't match and operand m
allows no registers, don't we have to reject this constraint outright?
Yes, this function, as in reload, can be investigated for a long time.
I cleaned it up a bit. Maybe it is the right time to clean it up more,
although time is a scarce resource.
I tried your variant. I did not find serious problems on x86, which is
the target most affected by this code. So I am using it.
Post by Richard Sandiford
/* Operands don't match. Both operands must
allow a reload register, otherwise we cannot
make them match. */
if (curr_alt[m] == NO_REGS)
break;
/* Retroactively mark the operand we had to
match as a loser, if it wasn't already and
it wasn't matched to a register constraint
(e.g it might be matched by memory). */
if (curr_alt_win[m]
&& (operand_reg[m] == NULL_RTX
|| hard_regno[m] < 0))
{
losers++;
reload_nregs
+= (ira_reg_class_max_nregs[curr_alt[m]]
[GET_MODE (*curr_id->operand_loc[m])]);
}
+ /* This can be fixed with reloads if the operand
+ we are supposed to match can be fixed with
+ reloads. */
+ badop = false;
+ this_alternative = curr_alt[m];
+ COPY_HARD_REG_SET (this_alternative_set, curr_alt_set[m]);
+
+ /* If we have to reload this operand and some
+ previous operand also had to match the same
+ thing as this operand, we don't know how to do
+ that. So reject this alternative. */
+ if (! did_match)
+ for (i = 0; i < nop; i++)
+ if (curr_alt_matches[i] == this_alternative_matches)
+ badop = true;
OK, so this is another case of cruft from reload that I'd like to remove,
/* If we have to reload this operand and some previous
operand also had to match the same thing as this
operand, we don't know how to do that. */
if (!match_p || !curr_alt_win[m])
{
for (i = 0; i < nop; i++)
if (curr_alt_matches[i] == m)
break;
if (i < nop)
break;
}
else
---> did_match = true;
/* This can be fixed with reloads if the operand
we are supposed to match can be fixed with
reloads. */
---> this_alternative_matches = m;
---> invalidate_m = m;
badop = false;
this_alternative = curr_alt[m];
COPY_HARD_REG_SET (this_alternative_set, curr_alt_set[m]);
(although a helper function might be better than the awkward breaking)?
Note that the ---> lines have moved from further up.
This is the only time in the switch statement where one constraint
in a constraint string uses "badop = true" to reject the operand.
I.e. for "<something else>0" we should normally not reject the
alternative based solely on the "0", since the "<something else>"
might have been satisfied instead. And we should only record matching
information if we've decided the match can be implemented by reloads
(the last block).
Yes, that is bad, especially as it is not documented, but people use it.
​
I tried it and did not find a problem with it. So I use your variant
(with one modification: setting invalidate_m only for !did_match).
Post by Richard Sandiford
+ /* We prefer no matching alternatives because
+ it gives more freedom in RA. */
+ if (operand_reg[nop] == NULL_RTX
+ || (find_regno_note (curr_insn, REG_DEAD,
+ REGNO (operand_reg[nop]))
+ == NULL_RTX))
+ reject += 2;
Looks like a new reject rule. I agree it makes conceptual sense though,
so I'm all for it.
+ || (REG_P (op)
+ && REGNO (op) >= FIRST_PSEUDO_REGISTER
+ && in_mem_p (REGNO (op))))
This pattern occurs several times. I think a helper function like
spilled_reg_p (op) would help. See 'g' below.
Ok. Fixed.
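A minimal sketch of the suggested helper, using toy stand-ins for GCC's data (the struct, the spill array, and TOY_FIRST_PSEUDO_REGISTER are mine; the real helper would test REG_P, FIRST_PSEUDO_REGISTER and in_mem_p):

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_FIRST_PSEUDO_REGISTER 64

struct toy_rtx
{
  bool is_reg;  /* stand-in for REG_P  */
  int regno;    /* stand-in for REGNO  */
};

static bool toy_spilled[256];   /* stand-in for LRA's in_mem_p data */

/* True if OP is a pseudo register that has been spilled to memory.  */
static bool
toy_spilled_reg_p (const struct toy_rtx *op)
{
  return (op->is_reg
          && op->regno >= TOY_FIRST_PSEUDO_REGISTER
          && toy_spilled[op->regno]);
}
```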
Post by Richard Sandiford
+ if (CONST_INT_P (op)
+ || (GET_CODE (op) == CONST_DOUBLE && mode == VOIDmode))
...
+ if (CONST_INT_P (op)
+ || (GET_CODE (op) == CONST_DOUBLE && mode == VOIDmode))
After recent changes these should be CONST_SCALAR_INT_P (op)
OK.
Post by Richard Sandiford
+ if (/* A PLUS is never a valid operand, but LRA can
+ make it from a register when eliminating
+ registers. */
+ GET_CODE (op) != PLUS
+ && (! CONSTANT_P (op) || ! flag_pic
+ || LEGITIMATE_PIC_OPERAND_P (op))
+ && (! REG_P (op)
+ || (REGNO (op) >= FIRST_PSEUDO_REGISTER
+ && in_mem_p (REGNO (op)))))
if (MEM_P (op)
|| spilled_reg_p (op)
|| general_constant_p (op))
win = true;
(CONSTANT_P (op)
&& (! flag_pic || LEGITIMATE_PIC_OPERAND_P (op)))
general_constant_p probably ought to go in a common file because several
places need this condition (including other parts of this switch statement).
OK. Fixed.
Post by Richard Sandiford
+#ifdef EXTRA_CONSTRAINT_STR
+ if (EXTRA_MEMORY_CONSTRAINT (c, p))
+ {
+ if (EXTRA_CONSTRAINT_STR (op, c, p))
+ win = true;
+ /* For regno_equiv_mem_loc we have to
+ check. */
+ else if (REG_P (op)
+ && REGNO (op) >= FIRST_PSEUDO_REGISTER
+ && in_mem_p (REGNO (op)))
Looks like an old comment from an earlier iteration. There doesn't
seem to be a function called regno_equiv_mem_loc in the current patch.
But...
Yes, it is from an early version. I removed it.
Post by Richard Sandiford
+ {
+ /* We could transform spilled memory
+ finally to indirect memory. */
+ if (EXTRA_CONSTRAINT_STR
+ (get_indirect_mem (mode), c, p))
+ win = true;
+ }
...is this check really needed? It's a documented requirement that memory
I removed it. At least it works for x86/x86-64.
Post by Richard Sandiford
+ /* If we didn't already win, we can reload
+ constants via force_const_mem, and other
+ MEMs by reloading the address like for
+ 'o'. */
+ if (CONST_POOL_OK_P (mode, op) || MEM_P (op))
+ badop = false;
It seems a bit inconsistent to treat a spilled pseudo whose address
might well need reloading as a win, while not treating existing MEMs
whose addresses need reloading as a win.
Well, the probability of reloading the address of a spilled pseudo is
very small on most targets, but reloading for a MEM in this case is
real. So I see it as logical.
Post by Richard Sandiford
+ if (EXTRA_CONSTRAINT_STR (op, c, p))
+ win = true;
+ else if (REG_P (op)
+ && REGNO (op) >= FIRST_PSEUDO_REGISTER
+ && in_mem_p (REGNO (op)))
+ {
+ /* We could transform spilled memory finally
+ to indirect memory. */
+ if (EXTRA_CONSTRAINT_STR (get_indirect_mem (mode),
+ c, p))
+ win = true;
+ }
I don't understand why there's two copies of this. I think we have
to trust the target's classification of constraints, so if the target
says that something isn't a memory constraint, we shouldn't check the
(mem (base)) case.
I removed it too.
Post by Richard Sandiford
+ if (c != ' ' && c != '\t')
+ costly_p = c == '*';
I think there needs to be a comment somewhere saying how we handle this.
Being costly seems to contribute one reject point (i.e. a sixth of a '?')
compared to the normal case, which is very different from the current
reload behaviour. We should probably update the "*" documentation
in md.texi too.
Yes, it is different. The new heuristics result in better code generation.
Post by Richard Sandiford
Some targets use "*" constraints to make sure that floating-point
registers don't get used as spill space in purely integer code
(so that task switches don't pay the FPU save/restore penalty).
Would that still "work" with this definition? (FWIW, I think
using "*" is a bad way to achieve this feature, just asking.)
There are two places for processing '*'. One is in ira-costs.c for
choosing classes. Therefore I believe it will work. At least I did not
find any problems on the 8 targets. I changed md.texi too.
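To make the arithmetic concrete, here is a toy version of the scoring being discussed (the factor of 6 matches the LOSER_COST_FACTOR mentioned later in this review; the function itself is my simplification, not the code from process_alt_operands):

```c
#include <assert.h>

#define TOY_LOSER_COST_FACTOR 6

/* An alternative's overall badness combines the number of operands
   needing reloads with accumulated reject points: each '?' adds a
   full LOSER_COST_FACTOR, while a costly '*' match adds a single
   point -- a sixth of a '?'.  */
static int
toy_overall (int losers, int n_question, int n_costly)
{
  int reject = n_question * TOY_LOSER_COST_FACTOR + n_costly;
  return losers * TOY_LOSER_COST_FACTOR + reject;
}
```

So in this model a '*' only nudges the choice between otherwise-equal alternatives, instead of vetoing one outright as in old reload.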
Post by Richard Sandiford
+ /* We simulate the behaviour of old reload here.
+ Although scratches need hard registers and it
+ might result in spilling other pseudos, no reload
+ insns are generated for the scratches. So it
+ might cost something but probably less than old
+ reload pass believes. */
+ if (lra_former_scratch_p (REGNO (operand_reg[nop])))
+ reject += LOSER_COST_FACTOR;
Yeah, this caused me no end of trouble when tweaking the MIPS
multiply-accumulate patterns. However, unlike the other bits of
cruft I've been complaining about, this is one where I can't think
of any alternative that makes more inherent sense (to me). So I agree
that leaving it as-is is the best approach for now.
+ /* If the operand is dying, has a matching constraint,
+ and satisfies constraints of the matched operand
+ which failed to satisfy the own constraints, we do
+ not need to generate a reload insn for this
+ operand. */
+ if (this_alternative_matches < 0
+ || curr_alt_win[this_alternative_matches]
+ || ! REG_P (op)
+ || find_regno_note (curr_insn, REG_DEAD,
+ REGNO (op)) == NULL_RTX
+ || ((hard_regno[nop] < 0
+ || ! in_hard_reg_set_p (this_alternative_set,
+ mode, hard_regno[nop]))
+ && (hard_regno[nop] >= 0
+ || ! in_class_p (REGNO (op), GET_MODE (op),
+ this_alternative, NULL))))
+ losers++;
Fixed.
Post by Richard Sandiford
if (!(this_alternative_matches >= 0
&& !curr_alt_win[this_alternative_matches]
&& REG_P (op)
&& find_regno_note (curr_insn, REG_DEAD, REGNO (op))
&& (hard_regno[nop] >= 0
? in_hard_reg_set_p (this_alternative_set,
mode, hard_regno[nop])
: in_class_p (op, this_alternative, NULL))))
losers++;
+ if (operand_reg[nop] != NULL_RTX)
+ {
+ int last_reload = (lra_reg_info[ORIGINAL_REGNO
+ (operand_reg[nop])]
+ .last_reload);
+
+ if (last_reload > bb_reload_num)
+ reload_sum += last_reload;
+ else
+ reload_sum += bb_reload_num;
+/* Overall number reflecting distances of previous reloading the same
+ value. It is used to improve inheritance chances. */
+static int best_reload_sum;
That comment is wrong. It should be

/* Overall number reflecting distances of previous reloading the same
value. The distances are counted from the current BB start. It is
used to improve inheritance chances. */

I fixed it. I am also decreasing the number by bb_reload_num every time
I increase reload_sum.
Post by Richard Sandiford
which made me think of distance from the current instruction. I see
it's actually something else, effectively a sum of instruction numbers.
I assumed the idea was to prefer registers that were reloaded more
recently (closer to the current instruction). In that case I thought that,
for a distance-based best_reload_sum, smaller would be better,
while for an instruction-number-based best_reload_sum, larger would
be better. It looks like we use instruction-number based best_reload_sums
+ && (reload_nregs < best_reload_nregs
+ || (reload_nregs == best_reload_nregs
+ && best_reload_sum < reload_sum))))))))
Is that intentional?
Now it makes sense: the bigger the number, the closer the last reload is
to the current insn.
Post by Richard Sandiford
Also, is this value meaningful for output reloads, which aren't really
going to be able to inherit a value as such? We seem to apply the cost
regardless of whether it's an input or an output, so probably deserves
a comment.
Same for matched input operands, which as you say elsewhere aren't
inherited.
Right. It could improve the heuristic more. I added the code.
Post by Richard Sandiford
+ if (badop
+ /* Alternative loses if it has no regs for a reg
+ operand. */
+ || (REG_P (op) && no_regs_p
+ && this_alternative_matches < 0))
+ goto fail;
+ if (this_alternative_matches < 0
+ && no_regs_p && ! this_alternative_offmemok && ! constmemok)
+ goto fail;
+ /* If this operand could be handled with a reg, and some reg
+ is allowed, then this operand can be handled. */
+ if (winreg && this_alternative != NO_REGS)
+ badop = false;
which I think belongs in the same else statement. At least after the
/* If this operand accepts a register, and if the register class
has at least one allocatable register, then this operand
can be reloaded. */
if (winreg && !no_regs_p)
badop = false;
if (badop)
goto fail;
which IMO belongs after the "no_regs_p" assignment. badop should never
be false if we have no way of reloading the value.
This change is non-trivial without knowing the semantics of all these
variables, but it looks ok to me.

So I changed the code.
Post by Richard Sandiford
+ if (! no_regs_p)
+ reload_nregs
+ += ira_reg_class_max_nregs[this_alternative][mode];
I wasn't sure why we counted this even in the "const_to_mem && constmemok"
/* We prefer to reload pseudos over reloading other
things, since such reloads may be able to be
eliminated later. So bump REJECT in other cases.
Don't do this in the case where we are forcing a
constant into memory and it will then win since we
don't want to have a different alternative match
then. */
if (! (REG_P (op)
&& REGNO (op) >= FIRST_PSEUDO_REGISTER)
&& ! (const_to_mem && constmemok)
/* We can reload the address instead of memory (so
do not punish it). It is preferable to do to
avoid cycling in some cases. */
&& ! (MEM_P (op) && offmemok))
reject += 2;
I think constmemok is obvious. It is not a reload; it is just putting a
constant into the constant pool. We should not punish it, as no
additional insns are generated.

There is a comment for the offmemok case. I think it describes it.
Apparently it was a fix for LRA cycling; I don't remember the details.
To restore them, I would need to remove the code and try it on many
targets. I guess it would take 3-4 days. But I removed this as it does
not affect x86/x86-64.
Post by Richard Sandiford
+ if (early_clobber_p)
+ reject++;
+ /* ??? Should we update the cost because early clobber
+ register reloads or it is a rare thing to be worth to do
+ it. */
+ overall = losers * LOSER_COST_FACTOR + reject;
Could you expand on the comment a bit?
Yes, I did it.
Post by Richard Sandiford
+ if ((best_losers == 0 || losers != 0) && best_overall < overall)
+ goto fail;
+
+ curr_alt[nop] = this_alternative;
+ COPY_HARD_REG_SET (curr_alt_set[nop], this_alternative_set);
+ curr_alt_win[nop] = this_alternative_win;
+ curr_alt_match_win[nop] = this_alternative_match_win;
+ curr_alt_offmemok[nop] = this_alternative_offmemok;
+ curr_alt_matches[nop] = this_alternative_matches;
+
+ if (invalidate_m >= 0 && ! this_alternative_win)
+ curr_alt_win[invalidate_m] = false;
BTW, after the matching changes above, I don't think we need both
"invalidate_m" and "this_alternative_matches".
Yes, if we use did_match, we could remove invalidate_m. I removed it.
Post by Richard Sandiford
+ for (j = hard_regno_nregs[clobbered_hard_regno][biggest_mode[i]] - 1;
+ j >= 0;
+ j--)
+ SET_HARD_REG_BIT (temp_set, clobbered_hard_regno + j);
add_to_hard_reg_set.
Fixed.
Post by Richard Sandiford
+ else if (curr_alt_matches[j] == i && curr_alt_match_win[j])
+ {
+ /* This is a trick. Such operands don't conflict and
+ don't need a reload. But it is hard to transfer
+ this information to the assignment pass which
+ spills one operand without this info. We avoid the
+ conflict by forcing to use the same pseudo for the
+ operands hoping that the pseudo gets the same hard
+ regno as the operands and the reloads are gone. */
+ if (*curr_id->operand_loc[i] != *curr_id->operand_loc[j])
...
+ /* See the comment for the previous case. */
+ if (*curr_id->operand_loc[i] != *curr_id->operand_loc[j])
What are these last two if statements for? I wasn't sure how two operands
could have the same address.
Not saying they're wrong, but I think a comment would be good.
I think it is a leftover from an older state of the code. If lra-assigns.c
does something wrong in this situation, I think the next round of
lra-constraints.c will fix it.
So I am removing the code.
Post by Richard Sandiford
+ small_class_operands_num = 0;
+ for (nop = 0; nop < n_operands; nop++)
+ /* If this alternative can be made to work by reloading, and
+ it needs less reloading than the others checked so far,
+ record it as the chosen goal for reloading. */
+ small_class_operands_num
+ += SMALL_REGISTER_CLASS_P (curr_alt[nop]) ? 1 : 0;
Misplaced comment; I think it belongs after this line.
Yes. Fixed.
That was a rigorous review. It took me 2 full days, numerous bootstraps,
and much testing to check your proposals and answer your questions.
But I am glad; I believe the code became clearer.
Richard Sandiford
2012-10-15 12:06:30 UTC
Permalink
Post by Vladimir Makarov
Post by Richard Sandiford
if that's accurate. I dropped the term "reload pseudo" because of
the general comment in my earlier reply about the use of "reload pseudo"
when the code seems to include inheritance and split pseudos too.
There is no inheritance and splitting yet. It is done after the
constraint pass.
So at this stage >= new_regno_start means reload pseudo.
Ah, OK.
Post by Vladimir Makarov
Post by Richard Sandiford
That's a change in the meaning of NEW_CLASS, but seems easier for
+ common_class = ira_reg_class_subset[rclass][cl];
+ if (new_class != NULL)
+ *new_class = common_class;
common_class = ira_reg_class_subset[rclass][cl];
if (new_class != NULL && rclass != common_class)
*new_class = common_class;
This change results in infinite LRA looping when compiling the first
libgcc file. Unfortunately I have no time to investigate it.
I'd like to say that most of this code is very sensitive to changes.
I see it a lot: you change something that looks obvious and a target
breaks.
I am going to investigate it when I have more time.
Thanks.
Post by Vladimir Makarov
Post by Richard Sandiford
+ {
+ const char *fmt = GET_RTX_FORMAT (code);
+ int i;
+
+ if (GET_RTX_LENGTH (code) != 1
+ || fmt[0] != 'e' || GET_CODE (XEXP (x, 0)) != UNSPEC)
+ {
+ for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+ if (fmt[i] == 'e')
+ extract_loc_address_regs (false, mode, as, &XEXP (x, i),
+ context_p, code, SCRATCH,
+ modify_p, ad);
+ break;
+ }
+ /* fall through for case UNARY_OP (UNSPEC ...) */
+ }
+
+ if (ad->disp_loc == NULL)
+ ad->disp_loc = loc;
+ else if (ad->base_reg_loc == NULL)
+ {
+ ad->base_reg_loc = loc;
+ ad->base_outer_code = outer_code;
+ ad->index_code = index_code;
+ ad->base_modify_p = modify_p;
+ }
+ else
+ {
+ lra_assert (ad->index_reg_loc == NULL);
+ ad->index_reg_loc = loc;
+ }
+ break;
+
+ }
Which targets use a bare UNSPEC as a displacement? I thought a
displacement had to be a link-time constant, in which case it should
satisfy CONSTANT_P. For UNSPECs, that means wrapping it in a CONST.
I saw it somewhere. I guess IA64.
Post by Richard Sandiford
I'm just a bit worried that the UNSPEC handling is sensitive to the
order that subrtxes are processed (unlike PLUS, which goes to some
trouble to work out what's what). It could be especially confusing
because the default case processes operands in reverse order while
PLUS processes them in forward order.
Also, which cases require the special UNARY_OP (UNSPEC ...) fallthrough?
Probably deserves a comment.
I don't remember. To figure it out, I would have to switch it off and
try all the targets supported by LRA.
Post by Richard Sandiford
AIUI the base_reg_loc, index_reg_loc and disp_loc fields aren't just
recording where reloads of a particular class need to go (obviously
in the case of disp_loc, which isn't reloaded at all). The fields
have semantic value too. I.e. we use them to work out the value
of at least part of the address.
In that case it seems dangerous to look through general rtxes
in the way that the default case above does. Maybe just making
sure that DISP_LOC is involved in a sum with the base would be
----------------------------------------------------------------
I know of three ways of "mutating" (for want of a better word)
1. (and X (const_int X)), to align
2. a subreg
3. a unary operator (such as truncation or extension)
a. remove outer mutations (using a helper function)
b. handle LO_SUM, PRE_*, POST_*: as now
c. otherwise treat the address as the sum of one, two or three pieces.
c1. Peel mutations of all pieces.
c2. Classify the pieces into base, index and displacement.
This would be similar to the jousting code above, but hopefully
easier because all three rtxes are to hand. E.g. we could
do the base vs. index thing in a similar way to
commutative_operand_precedence.
c3. Record which pieces were mutated (e.g. using something like the
index_loc vs. index_reg_loc distinction in the current code)
That should be general enough for current targets, but if it isn't,
we could generalise it further when we know what generalisation is needed.
That's still going to be a fair amount of code, but hopefully not more,
and we might have more confidence at each stage what each value is.
And it avoids the risk of treating "mutated" addresses as "unmutated" ones.
----------------------------------------------------------------
Just an idea though. Probably not for 4.8, although I might try it
if I find time.
I am not sure that you listed all the cases. It would be great if you
listed them all; then we could make this function clearer.
I tried to do this first but kept finding new cases. After that I
gave up and used a more general implementation.
This function was rewritten and modified many times. I am afraid to do
this again when the clock is ticking.
It would be great if you re-implemented the function according to your
ideas; then we could try it on the 8 targets to which LRA has already
been ported. An LRA sub-branch would be a perfect place to do it
Post by Richard Sandiford
It would be nice to sort out the disp_loc thing for 4.8 though.
though. My point is that base_plus_disp_to_reg assumes that
*base_loc and *disp_loc are involved in a sum. It adds them together,
replaces the base_loc with the new pseudo, and removes the disp_loc.
But extract_address_regs seems to be deliberately written in a way that
doesn't require base_loc and disp_loc to be involved in a sum, and uses
a more indirect approach to working out disp_loc instead. It just feels
like it leaves open the potential for a silent wrong-code bug.
Post by Vladimir Makarov
Post by Richard Sandiford
+/* Reload pseudos created for matched input and output reloads whose
+ mode are different. Such pseudos has a modified rules for finding
+ their living ranges, e.g. assigning to subreg of such pseudo means
+ changing all pseudo value. */
+bitmap_head lra_bound_pseudos;
/* Reload pseudos created for matched input and output reloads whose
modes are different. Such pseudos have different live ranges from
other pseudos; e.g. any assignment to a subreg of these pseudos
changes the whole pseudo's value. */
Fixed.
Post by Richard Sandiford
Although that said, couldn't emit_move_insn_1 (called by gen_move_insn)
split a multiword pseudo move into two word moves? Using the traditional
clobber technique sounds better than having special liveness rules.
It is not only about multi-word pseudos. It is about representing this
situation with constructions that are semantically incorrect in other
parts of the compiler. Reload has no such problem, as it does not use
RTL. So I don't think it splits, as I use emit_move_insn, which calls
emit_move_insn_1 too.
But my point is that emit_move_insn_1 _does_ split moves that have no
.md pattern of their own. E.g. some targets do not define double-word
move patterns because such moves are always equivalent to two individual
word moves. And if emit_move_insn_1 splits:

(set (reg:DI X) (reg:DI Y))

into:

(set (subreg:SI (reg:DI X) 0) (subreg:SI (reg:DI Y) 0))
(set (subreg:SI (reg:DI X) 4) (subreg:SI (reg:DI Y) 4))

then it would be wrong to say that the subreg in the second instruction
is a complete definition of X.
Post by Vladimir Makarov
I really needed a special liveness treatment (although I don't
remember the details) and therefore I added it. I had no detailed design
for LRA. The code was shaped by numerous test failures on different
targets. There is a lot of code analogous to reload's, and its
necessity should probably be rigorously questioned. I thought about and
modified part of this code, but unfortunately not all.
Also, bound pseudos are rare. Their bitmap is very small, and testing
them (2 lines of code overall) in ira-lives.c is fast.
FWIW, it wasn't really speed as much as correctness that I was worried about.
In a way, rarity makes having special rules seem even more dangerous.
Post by Vladimir Makarov
Post by Richard Sandiford
+ /* We create pseudo for out rtx because we always should keep
+ registers with the same original regno have synchronized
+ value (it is not true for out register but it will be
+ corrected by the next insn).
I don't understand this comment, sorry.
Pseudos have values -- see the comments for lra_reg_info. Different
pseudos with the same value do not conflict even if they live in the
same place. When we create a pseudo, we assign it the value of the
original pseudo (if any) from which we created it. If we created the
pseudo from the input pseudo, the new pseudo would not conflict with the
input pseudo, which is wrong when the input pseudo lives past the insn,
since the new pseudo's value is changed by the insn's output. Therefore
we create the new pseudo from the output.
I hope it is more understandable. I changed the comment.
Yeah, I think that makes it a lot clearer, thanks.
Post by Vladimir Makarov
Post by Richard Sandiford
+ /* In and out operand can be got from transformations before
+ processing constraints. So the pseudos might have inaccurate
+ class and we should make their classes more accurate. */
+ narrow_reload_pseudo_class (in_rtx, goal_class);
+ narrow_reload_pseudo_class (out_rtx, goal_class);
I don't understand this, sorry. Does "transformations" mean inheritance
and reload splitting? So the registers we're changing here are inheritance
and split pseudos rather than reload pseudos created for this instruction?
If so, it sounds on face value like it conflicts with the comment quoted
above about not allowing reload instructions to the narrow the class
of pseudos. Might be worth saying why that's OK here but not there.
Again, inheritance and splitting are done after the constraint pass.
The transformations here are mostly reloading of subregs, which is done
before the reloads for the given insn. In this transformation we create
new pseudos whose reg class we don't know yet; when we don't know a
pseudo's reg class, we assign ALL_REGS to it.
OK, thanks.
Post by Vladimir Makarov
Post by Richard Sandiford
Also, all uses but one of lra_get_hard_regno_and_offset follow
lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
/* The real hard regno of the operand after the allocation. */
x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
so couldn't lra_get_hard_regno_and_offset just return the final
hard register, including elimination? Then it could apply the
elimination on the original rtx.
lra_get_hard_regno_and_offset (x, &i, &offset);
if (i < 0)
goto slow;
i += offset;
but I'm not sure why this is the only caller that would want
to ignore elimination.
???
Not sure what you meant here :-) Was that a placeholder,
or something else? What I was getting at was that it would
be nice to replace all occurrences of:

lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
/* The real hard regno of the operand after the allocation. */
x_hard_regno = get_final_hard_regno (x_hard_regno, offset);

with something like:

x_hard_regno = lra_get_hard_regno (x);

and this operands_match_p code seemed to be the only place that didn't
apply get_final_hard_regno to the result of lra_get_hard_regno_and_offset.
I wasn't really sure why operands_match_p was different.
Post by Vladimir Makarov
Post by Richard Sandiford
+ int i, j, x_hard_regno, offset;
+ enum machine_mode mode;
+ rtx x;
+ const char *fmt;
+ enum rtx_code code;
+
+ if (*loc == NULL_RTX)
+ return false;
+ x = *loc;
+ code = GET_CODE (x);
+ mode = GET_MODE (x);
+ if (code == SUBREG)
+ {
+ loc = &SUBREG_REG (x);
+ x = SUBREG_REG (x);
+ code = GET_CODE (x);
+ if (GET_MODE_SIZE (GET_MODE (x)) > GET_MODE_SIZE (mode))
+ mode = GET_MODE (x);
+ }
+
+ if (REG_P (x))
+ {
+ lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
+ /* The real hard regno of the operand after the allocation. */
+ x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
+ return (x_hard_regno >= 0
+ && lra_hard_reg_set_intersection_p (x_hard_regno, mode, set));
With the subreg mode handling above, this looks little-endian specific.
+ if (MEM_P (x))
+ {
+ struct address ad;
+ enum machine_mode mode = GET_MODE (x);
+ rtx *addr_loc = &XEXP (x, 0);
+
+ extract_address_regs (mode, MEM_ADDR_SPACE (x), addr_loc, MEM, &ad);
+ if (ad.base_reg_loc != NULL)
+ {
+ if (uses_hard_regs_p (ad.base_reg_loc, set))
+ return true;
+ }
+ if (ad.index_reg_loc != NULL)
+ {
+ if (uses_hard_regs_p (ad.index_reg_loc, set))
+ return true;
+ }
+ }
is independent of the subreg handling, so perhaps the paradoxical subreg
case should be handled separately, using simplify_subreg_regno.
Not sure: did you have any thoughts on this?
Post by Vladimir Makarov
Post by Richard Sandiford
+ match_p = false;
+ if (operands_match_p (*curr_id->operand_loc[nop],
+ *curr_id->operand_loc[m], m_hregno))
+ {
+ int i;
+
+ for (i = 0; i < early_clobbered_regs_num; i++)
+ if (early_clobbered_nops[i] == m)
+ break;
+ /* We should reject matching of an early
+ clobber operand if the matching operand is
+ not dying in the insn. */
+ if (i >= early_clobbered_regs_num
Why not simply use operands m's early_clobber field?
Ok. Fixed.
Post by Richard Sandiford
+ || operand_reg[nop] == NULL_RTX
+ || (find_regno_note (curr_insn, REG_DEAD,
+ REGNO (operand_reg[nop]))
+ != NULL_RTX))
+ match_p = true;
...although I don't really understand this condition. If the two
operands are the same value X, then X must die here whatever the
notes say. So I assume this is coping with a case where the operands
are different but still match. If so, could you give an example?
I remember I saw such an insn, but I don't remember the details.
Post by Richard Sandiford
Matched earlyclobbers explicitly guarantee that the earlyclobber doesn't
apply to the matched input operand; the earlyclobber only applies to
other input operands. So I'd have expected it was those operands
that might need reloading rather than this one.
E.g. if X occurs three times, twice in a matched earlyclobber pair
and once as an independent operand, it's the latter operand that would
need reloading.
Yes, I know.
But in that case I don't understand the condition. If we have:

(set (reg X) (... (reg X) ...))

(which is the kind of thing operands_match_p is testing for)
then there is no requirement for a REG_DEAD note for X.
But it's still OK for two Xs to form an earlyclobber pair.
Post by Vladimir Makarov
Post by Richard Sandiford
+ /* If we didn't already win, we can reload
+ constants via force_const_mem, and other
+ MEMs by reloading the address like for
+ 'o'. */
+ if (CONST_POOL_OK_P (mode, op) || MEM_P (op))
+ badop = false;
It seems a bit inconsistent to treat a spilled pseudo whose address
might well need reloading as a win, while not treating existing MEMs
whose addresses need reloading as a win.
Well, the probability of reloading the address of a spilled pseudo is very
small on most targets, but reloading for a MEM in this case is real. So it
seems logical to me.
OK, that's probably true. :-)
Post by Vladimir Makarov
Post by Richard Sandiford
+ if (! no_regs_p)
+ reload_nregs
+ += ira_reg_class_max_nregs[this_alternative][mode];
I wasn't sure why we counted this even in the "const_to_mem && constmemok"
/* We prefer to reload pseudos over reloading other
things, since such reloads may be able to be
eliminated later. So bump REJECT in other cases.
Don't do this in the case where we are forcing a
constant into memory and it will then win since we
don't want to have a different alternative match
then. */
if (! (REG_P (op)
&& REGNO (op) >= FIRST_PSEUDO_REGISTER)
&& ! (const_to_mem && constmemok)
/* We can reload the address instead of memory (so
do not punish it). It is preferable to do to
avoid cycling in some cases. */
&& ! (MEM_P (op) && offmemok))
reject += 2;
I think constmemok is obvious. It is not a reload; it is just putting a
constant into the constant pool. We should not punish it, as no
additional insns are generated.
There is a comment for the offmemok case. I think it describes it.
Apparently it was a fix for LRA cycling; I don't remember the details.
To restore them, I would need to remove the code and try it on many
targets. I guess it would take 3-4 days. But I removed this as it does
not affect x86/x86-64.
Sorry, my comment wasn't as clear as it should have been. I think:

/* We prefer to reload pseudos over reloading other
things, since such reloads may be able to be
eliminated later. So bump REJECT in other cases.
Don't do this in the case where we are forcing a
constant into memory and it will then win since we
don't want to have a different alternative match
then. */
if (! (REG_P (op)
&& REGNO (op) >= FIRST_PSEUDO_REGISTER)
&& ! (const_to_mem && constmemok)
/* We can reload the address instead of memory (so
do not punish it). It is preferable to do to
avoid cycling in some cases. */
&& ! (MEM_P (op) && offmemok))
reject += 2;

is fine. The point I was trying to make is that this code (correctly IMO)
ignores register reload costs if the operand won't be reloaded into
a register. The operand won't be reloaded into a register if:

(const_to_mem && constmemok)
|| (MEM_P (op) && offmemok)

However:

if (! no_regs_p)
reload_nregs
+= ira_reg_class_max_nregs[this_alternative][mode];

seems to count the register reload cost regardless of whether
the operand would be reloaded into a register. It felt like
it ought to be something like:

if (!((const_to_mem && const_mem_ok)
|| (MEM_P (op) && offmemok)))
{
/* We prefer to reload pseudos over reloading other
things, since such reloads may be able to be
eliminated later. So bump REJECT in other cases. */
if (! (REG_P (op)
&& REGNO (op) >= FIRST_PSEUDO_REGISTER)
reject += 2;

if (! no_regs_p)
reload_nregs
+= ira_reg_class_max_nregs[this_alternative][mode];
}

Maybe other register-specific stuff could go in that if statement too.

Richard
Vladimir Makarov
2012-10-19 05:14:30 UTC
Permalink
Post by Richard Sandiford
Post by Vladimir Makarov
Post by Richard Sandiford
if that's accurate. I dropped the term "reload pseudo" because of
the general comment in my earlier reply about the use of "reload pseudo"
when the code seems to include inheritance and split pseudos too.
There is no inheritance and splitting yet. It is done after the
constraint pass.
So at this stage >= new_regno_start means reload pseudo.
Ah, OK.
Post by Vladimir Makarov
Post by Richard Sandiford
That's a change in the meaning of NEW_CLASS, but seems easier for
+ common_class = ira_reg_class_subset[rclass][cl];
+ if (new_class != NULL)
+ *new_class = common_class;
common_class = ira_reg_class_subset[rclass][cl];
if (new_class != NULL && rclass != common_class)
*new_class = common_class;
This change results in infinite LRA looping when compiling the first
libgcc file. Unfortunately I have no time to investigate it.
I'd like to say that most of this code is very sensitive to changes.
I see it a lot: you change something that looks obvious and a target
breaks.
I am going to investigate it when I have more time.
Thanks.
Post by Vladimir Makarov
Post by Richard Sandiford
+ {
+ const char *fmt = GET_RTX_FORMAT (code);
+ int i;
+
+ if (GET_RTX_LENGTH (code) != 1
+ || fmt[0] != 'e' || GET_CODE (XEXP (x, 0)) != UNSPEC)
+ {
+ for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+ if (fmt[i] == 'e')
+ extract_loc_address_regs (false, mode, as, &XEXP (x, i),
+ context_p, code, SCRATCH,
+ modify_p, ad);
+ break;
+ }
+ /* fall through for case UNARY_OP (UNSPEC ...) */
+ }
+
+ if (ad->disp_loc == NULL)
+ ad->disp_loc = loc;
+ else if (ad->base_reg_loc == NULL)
+ {
+ ad->base_reg_loc = loc;
+ ad->base_outer_code = outer_code;
+ ad->index_code = index_code;
+ ad->base_modify_p = modify_p;
+ }
+ else
+ {
+ lra_assert (ad->index_reg_loc == NULL);
+ ad->index_reg_loc = loc;
+ }
+ break;
+
+ }
Which targets use a bare UNSPEC as a displacement? I thought a
displacement had to be a link-time constant, in which case it should
satisfy CONSTANT_P. For UNSPECs, that means wrapping it in a CONST.
I saw it somewhere. I guess IA64.
Post by Richard Sandiford
I'm just a bit worried that the UNSPEC handling is sensitive to the
order that subrtxes are processed (unlike PLUS, which goes to some
trouble to work out what's what). It could be especially confusing
because the default case processes operands in reverse order while
PLUS processes them in forward order.
Also, which cases require the special UNARY_OP (UNSPEC ...) fallthrough?
Probably deserves a comment.
I don't remember. To figure it out, I would have to switch it off and
try all the targets supported by LRA.
Post by Richard Sandiford
AIUI the base_reg_loc, index_reg_loc and disp_loc fields aren't just
recording where reloads of a particular class need to go (obviously
in the case of disp_loc, which isn't reloaded at all). The fields
have semantic value too. I.e. we use them to work out the value
of at least part of the address.
In that case it seems dangerous to look through general rtxes
in the way that the default case above does. Maybe just making
sure that DISP_LOC is involved in a sum with the base would be
----------------------------------------------------------------
I know of three ways of "mutating" (for want of a better word)
1. (and X (const_int X)), to align
2. a subreg
3. a unary operator (such as truncation or extension)
a. remove outer mutations (using a helper function)
b. handle LO_SUM, PRE_*, POST_*: as now
c. otherwise treat the address as the sum of one, two or three pieces.
c1. Peel mutations of all pieces.
c2. Classify the pieces into base, index and displacement.
This would be similar to the jousting code above, but hopefully
easier because all three rtxes are to hand. E.g. we could
do the base vs. index thing in a similar way to
commutative_operand_precedence.
c3. Record which pieces were mutated (e.g. using something like the
index_loc vs. index_reg_loc distinction in the current code)
That should be general enough for current targets, but if it isn't,
we could generalise it further when we know what generalisation is needed.
That's still going to be a fair amount of code, but hopefully not more,
and we might have more confidence at each stage what each value is.
And it avoids the risk of treating "mutated" addresses as "unmutated" ones.
----------------------------------------------------------------
Just an idea though. Probably not for 4.8, although I might try it
if I find time.
I am not sure that you listed all the cases. It would be great if you
listed them all; then we could make this function clearer.
I tried to do this first but kept finding new cases. After that I
gave up and used a more general implementation.
This function was rewritten and modified many times. I am afraid to do
this again when the clock is ticking.
It would be great if you re-implemented the function according to your
ideas; then we could try it on the 8 targets to which LRA has already
been ported. An LRA sub-branch would be a perfect place to do it
Post by Richard Sandiford
It would be nice to sort out the disp_loc thing for 4.8 though.
though. My point is that base_plus_disp_to_reg assumes that
*base_loc and *disp_loc are involved in a sum. It adds them together,
replaces the base_loc with the new pseudo, and removes the disp_loc.
But extract_address_regs seems to be deliberately written in a way that
doesn't require base_loc and disp_loc to be involved in a sum, and uses
a more indirect approach to working out disp_loc instead. It just feels
like it leaves open the potential for a silent wrong-code bug.
I added an explicit check for the base + disp case. I also added a lot
of asserts checking that we don't assign twice.
Post by Richard Sandiford
Post by Vladimir Makarov
Post by Richard Sandiford
+/* Reload pseudos created for matched input and output reloads whose
+ mode are different. Such pseudos has a modified rules for finding
+ their living ranges, e.g. assigning to subreg of such pseudo means
+ changing all pseudo value. */
+bitmap_head lra_bound_pseudos;
/* Reload pseudos created for matched input and output reloads whose
modes are different. Such pseudos have different live ranges from
other pseudos; e.g. any assignment to a subreg of these pseudos
changes the whole pseudo's value. */
Fixed.
Post by Richard Sandiford
Although that said, couldn't emit_move_insn_1 (called by gen_move_insn)
split a multiword pseudo move into two word moves? Using the traditional
clobber technique sounds better than having special liveness rules.
It is not only about multi-word pseudos. It is about representing this
situation with constructions that are semantically incorrect in other
parts of the compiler. Reload has no such problem, as it does not use
RTL. So I don't think it splits; I use emit_move_insn, and that calls
emit_move_insn_1 too.
But my point is that emit_move_insn_1 _does_ split moves that have no
.md pattern of their own. E.g. some targets do not define double-word
move patterns because such moves are always equivalent to two
individual word moves. If the move
(set (reg:DI X) (reg:DI Y))
is split into
(set (subreg:SI (reg:DI X) 0) (subreg:SI (reg:DI Y) 0))
(set (subreg:SI (reg:DI X) 4) (subreg:SI (reg:DI Y) 4))
then it would be wrong to say that the subreg in the second instruction
is a complete definition of X.
Post by Vladimir Makarov
I really needed a special liveness treatment (although I don't
remember the details) and therefore I added it. I had no detailed
design for LRA; the code was shaped by numerous test failures on
different targets. There is a lot of code analogous to reload's, and
its necessity should probably be rigorously questioned. I thought about
this and modified part of the code, but unfortunately not all of it.
Also, bound pseudos are rare. Their bitmap is very small, and testing
them in ira-lives.c (2 lines of code in all) is fast.
FWIW, it wasn't really speed as much as correctness I was worried about.
In a way, rarity makes having special rules seem even more dangerous.
Right. I don't think it is more dangerous than clobbers, but I like what
you proposed. After a few tries, I managed to do it with clobbers. So
now there are no bound pseudos (or special reload pseudos, as they are
called now). The code is clear and there is no additional notion in
LRA. Thank you.
Post by Richard Sandiford
Post by Vladimir Makarov
Post by Richard Sandiford
+ /* We create pseudo for out rtx because we always should keep
+ registers with the same original regno have synchronized
+ value (it is not true for out register but it will be
+ corrected by the next insn).
I don't understand this comment, sorry.
Pseudos have values -- see the comments for lra_reg_info. Different
pseudos with the same value do not conflict even if they live in the
same place. When we create a pseudo, we assign it the value of the
original pseudo (if any) from which it was created. If we created the
new pseudo from the input pseudo, it would not conflict with the input
pseudo, which is wrong when the input pseudo lives past the insn,
because the new pseudo's value is changed by the insn's output.
Therefore we create the new pseudo from the output.
I hope this is more understandable. I changed the comment.
Yeah, I think that makes it a lot clearer, thanks.
Post by Vladimir Makarov
Post by Richard Sandiford
+ /* In and out operand can be got from transformations before
+ processing constraints. So the pseudos might have inaccurate
+ class and we should make their classes more accurate. */
+ narrow_reload_pseudo_class (in_rtx, goal_class);
+ narrow_reload_pseudo_class (out_rtx, goal_class);
I don't understand this, sorry. Does "transformations" mean inheritance
and reload splitting? So the registers we're changing here are inheritance
and split pseudos rather than reload pseudos created for this instruction?
If so, it sounds on face value like it conflicts with the comment quoted
above about not allowing reload instructions to the narrow the class
of pseudos. Might be worth saying why that's OK here but not there.
Again, inheritance and splitting are done after the constraint pass.
The transformations here are mostly subreg reloads, which are done
before the reloads for the given insn. In these transformations we
create new pseudos whose register class we don't know yet, and in that
case we assign ALL_REGS to the pseudo.
OK, thanks.
Post by Vladimir Makarov
Post by Richard Sandiford
Also, all uses but one of lra_get_hard_regno_and_offset follow
lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
/* The real hard regno of the operand after the allocation. */
x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
so couldn't lra_get_hard_regno_and_offset just return the final
hard register, including elimination? Then it could apply the
elimination on the original rtx.
lra_get_hard_regno_and_offset (x, &i, &offset);
if (i < 0)
goto slow;
i += offset;
but I'm not sure why this is the only caller that would want
to ignore elimination.
???
Sorry, I composed this email over two days and that was a placeholder
for a question of yours that I should answer or work on :) It looks
like I did not answer the question.
Post by Richard Sandiford
Not sure what you meant here :-) Was that a placeholder,
or something else? What I was getting at was that it would be simpler
to replace
lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
/* The real hard regno of the operand after the allocation. */
x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
with
x_hard_regno = lra_get_hard_regno (x);
and this operands_match_p code seemed to be the only place that didn't
apply get_final_hard_regno to the result of lra_get_hard_regno_and_offset.
I wasn't really sure why operands_match_p was different.
Fixed.
Post by Richard Sandiford
Post by Vladimir Makarov
Post by Richard Sandiford
+ int i, j, x_hard_regno, offset;
+ enum machine_mode mode;
+ rtx x;
+ const char *fmt;
+ enum rtx_code code;
+
+ if (*loc == NULL_RTX)
+ return false;
+ x = *loc;
+ code = GET_CODE (x);
+ mode = GET_MODE (x);
+ if (code == SUBREG)
+ {
+ loc = &SUBREG_REG (x);
+ x = SUBREG_REG (x);
+ code = GET_CODE (x);
+ if (GET_MODE_SIZE (GET_MODE (x)) > GET_MODE_SIZE (mode))
+ mode = GET_MODE (x);
+ }
+
+ if (REG_P (x))
+ {
+ lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
+ /* The real hard regno of the operand after the allocation. */
+ x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
+ return (x_hard_regno >= 0
+ && lra_hard_reg_set_intersection_p (x_hard_regno, mode, set));
With the subreg mode handling above, this looks little-endian specific.
+ if (MEM_P (x))
+ {
+ struct address ad;
+ enum machine_mode mode = GET_MODE (x);
+ rtx *addr_loc = &XEXP (x, 0);
+
+ extract_address_regs (mode, MEM_ADDR_SPACE (x), addr_loc, MEM, &ad);
+ if (ad.base_reg_loc != NULL)
+ {
+ if (uses_hard_regs_p (ad.base_reg_loc, set))
+ return true;
+ }
+ if (ad.index_reg_loc != NULL)
+ {
+ if (uses_hard_regs_p (ad.index_reg_loc, set))
+ return true;
+ }
+ }
is independent of the subreg handling, so perhaps the paradoxical subreg
case should be handled separately, using simplify_subreg_regno.
Not sure: did you have any thoughts on this?
No, sorry. I'll think about it in the morning; a bit tired today.
Post by Richard Sandiford
Post by Vladimir Makarov
Post by Richard Sandiford
+ match_p = false;
+ if (operands_match_p (*curr_id->operand_loc[nop],
+ *curr_id->operand_loc[m], m_hregno))
+ {
+ int i;
+
+ for (i = 0; i < early_clobbered_regs_num; i++)
+ if (early_clobbered_nops[i] == m)
+ break;
+ /* We should reject matching of an early
+ clobber operand if the matching operand is
+ not dying in the insn. */
+ if (i >= early_clobbered_regs_num
Why not simply use operands m's early_clobber field?
Ok. Fixed.
Post by Richard Sandiford
+ || operand_reg[nop] == NULL_RTX
+ || (find_regno_note (curr_insn, REG_DEAD,
+ REGNO (operand_reg[nop]))
+ != NULL_RTX))
+ match_p = true;
...although I don't really understand this condition. If the two
operands are the same value X, then X must die here whatever the
notes say. So I assume this is coping with a case where the operands
are different but still match. If so, could you give an example?
I remember seeing such an insn, but I don't remember the details.
Post by Richard Sandiford
Matched earlyclobbers explicitly guarantee that the earlyclobber doesn't
apply to the matched input operand; the earlyclobber only applies to
other input operands. So I'd have expected it was those operands
that might need reloading rather than this one.
E.g. if X occurs three times, twice in a matched earlyclobber pair
and once as an independent operand, it's the latter operand that would
need reloading.
Yes, I know.
(set (reg X) (... (reg X) ...))
(which is the kind of thing operands_match_p is testing for)
then there is no requirement for a REG_DEAD note for X.
But it's still OK for two Xs to form an earlyclobber pair.
I think I added this for some purpose. It might be IRA wrongly
assigning the same hard register in a situation like

(set (reg X1) (... (reg X2) ...)) REG_UNUSED:X1)
and X2 lives below.

I think it is better not to change this code. I promise I'll remove it
on the LRA branch to find out why I added it.
Post by Richard Sandiford
Post by Vladimir Makarov
Post by Richard Sandiford
+ /* If we didn't already win, we can reload
+ constants via force_const_mem, and other
+ MEMs by reloading the address like for
+ 'o'. */
+ if (CONST_POOL_OK_P (mode, op) || MEM_P (op))
+ badop = false;
It seems a bit inconsistent to treat a spilled pseudo whose address
might well need reloading as a win, while not treating existing MEMs
whose addresses need reloading as a win.
Well, the probability of reloading the address of a spilled pseudo is
very small on most targets, but reloading the address of an existing
MEM in this case is real. So it seems logical to me.
OK, that's probably true. :-)
Post by Vladimir Makarov
Post by Richard Sandiford
+ if (! no_regs_p)
+ reload_nregs
+ += ira_reg_class_max_nregs[this_alternative][mode];
I wasn't sure why we counted this even in the "const_to_mem && constmemok" case:
/* We prefer to reload pseudos over reloading other
things, since such reloads may be able to be
eliminated later. So bump REJECT in other cases.
Don't do this in the case where we are forcing a
constant into memory and it will then win since we
don't want to have a different alternative match
then. */
if (! (REG_P (op)
&& REGNO (op) >= FIRST_PSEUDO_REGISTER)
&& ! (const_to_mem && constmemok)
/* We can reload the address instead of memory (so
do not punish it). It is preferable to do to
avoid cycling in some cases. */
&& ! (MEM_P (op) && offmemok))
reject += 2;
I think constmemok is obvious. It is not a reload; it is just putting
the constant in the constant pool. We should not punish it, as no
additional insns are generated.
There is a comment for the offmemok case; I think it describes it.
Apparently it was a fix for LRA cycling, but I don't remember the
details. To recover them, I would need to remove the code and try it on
many targets, which I guess would take 3-4 days. But I removed this, as
it does not affect x86/x86-64.
/* We prefer to reload pseudos over reloading other
things, since such reloads may be able to be
eliminated later. So bump REJECT in other cases.
Don't do this in the case where we are forcing a
constant into memory and it will then win since we
don't want to have a different alternative match
then. */
if (! (REG_P (op)
&& REGNO (op) >= FIRST_PSEUDO_REGISTER)
&& ! (const_to_mem && constmemok)
/* We can reload the address instead of memory (so
do not punish it). It is preferable to do to
avoid cycling in some cases. */
&& ! (MEM_P (op) && offmemok))
reject += 2;
is fine. The point I was trying to make is that this code (correctly IMO)
ignores register reload costs if the operand won't be reloaded into a
register, i.e. if:
(const_to_mem && constmemok)
|| (MEM_P (op) && offmemok)
In contrast, this code:
if (! no_regs_p)
reload_nregs
+= ira_reg_class_max_nregs[this_alternative][mode];
seems to count the register reload cost regardless of whether
the operand would be reloaded into a register. It felt like it
should instead be:
if (!((const_to_mem && constmemok)
|| (MEM_P (op) && offmemok)))
{
/* We prefer to reload pseudos over reloading other
things, since such reloads may be able to be
eliminated later. So bump REJECT in other cases. */
if (! (REG_P (op)
&& REGNO (op) >= FIRST_PSEUDO_REGISTER))
reject += 2;
if (! no_regs_p)
reload_nregs
+= ira_reg_class_max_nregs[this_alternative][mode];
}
It makes sense. I fixed it using your variant.

Richard Sandiford
2012-10-12 14:29:56 UTC
Permalink
Hi Vlad,

Comments for the rest of ira-constraints.c.
+ saved_base_reg = saved_base_reg2 = saved_index_reg = NULL_RTX;
+ change_p = equiv_address_substitution (&ad, addr_loc, mode, as, code);
+ if (ad.base_reg_loc != NULL)
+ {
+ if (process_addr_reg
+ (ad.base_reg_loc, before,
+ (ad.base_modify_p && REG_P (*ad.base_reg_loc)
+ && find_regno_note (curr_insn, REG_DEAD,
+ REGNO (*ad.base_reg_loc)) == NULL
+ ? after : NULL),
+ base_reg_class (mode, as, ad.base_outer_code, ad.index_code)))
+ change_p = true;
+ if (ad.base_reg_loc2 != NULL)
+ *ad.base_reg_loc2 = *ad.base_reg_loc;
+ saved_base_reg = *ad.base_reg_loc;
+ lra_eliminate_reg_if_possible (ad.base_reg_loc);
+ if (ad.base_reg_loc2 != NULL)
+ {
+ saved_base_reg2 = *ad.base_reg_loc2;
+ lra_eliminate_reg_if_possible (ad.base_reg_loc2);
+ }
We unconditionally make *ad.base_reg_loc2 = *ad.base_reg_loc, so it
might be clearer without saved_base_reg2. More below...
+ /* The following addressing is checked by constraints and
+ usually target specific legitimate address hooks do not
+ consider them valid. */
+ || GET_CODE (*addr_loc) == POST_DEC || GET_CODE (*addr_loc) == POST_INC
+ || GET_CODE (*addr_loc) == PRE_DEC || GET_CODE (*addr_loc) == PRE_DEC
+ || GET_CODE (*addr_loc) == PRE_MODIFY
+ || GET_CODE (*addr_loc) == POST_MODIFY
the whole lot could just be replaced by ad.base_modify_p, or perhaps
+ /* In this case we can not do anything because if it is wrong
+ that is because of wrong displacement. Remember that any
+ address was legitimate in non-strict sense before LRA. */
+ || ad.disp_loc == NULL)
It doesn't seem worth validating the address at all for ad.disp_loc == NULL.
E.g. something like:

if (ad.base_reg_loc != NULL
&& (process_addr_reg
(ad.base_reg_loc, before,
(ad.base_modify_p && REG_P (*ad.base_reg_loc)
&& find_regno_note (curr_insn, REG_DEAD,
REGNO (*ad.base_reg_loc)) == NULL
? after : NULL),
base_reg_class (mode, as, ad.base_outer_code, ad.index_code))))
{
change_p = true;
if (ad.base_reg_loc2 != NULL)
*ad.base_reg_loc2 = *ad.base_reg_loc;
}

if (ad.index_reg_loc != NULL
&& process_addr_reg (ad.index_reg_loc, before, NULL, INDEX_REG_CLASS))
change_p = true;

/* The address was valid before LRA. We only change its form if the
address has a displacement, so if it has no displacement it must
still be valid. */
if (ad.disp_loc == NULL)
return change_p;

/* See whether the address is still valid. Some ports do not check
displacements for eliminable registers, so we replace them
temporarily with the elimination target. */
saved_base_reg = saved_index_reg = NULL_RTX;
...
if (ok_p)
return change_p;
+#ifdef HAVE_lo_sum
+ {
+ rtx insn;
+ rtx last = get_last_insn ();
+
+ /* disp => lo_sum (new_base, disp) */
+ insn = emit_insn (gen_rtx_SET
+ (VOIDmode, new_reg,
+ gen_rtx_HIGH (Pmode, copy_rtx (*ad.disp_loc))));
+ code = recog_memoized (insn);
+ if (code >= 0)
+ {
+ rtx save = *ad.disp_loc;
+
+ *ad.disp_loc = gen_rtx_LO_SUM (Pmode, new_reg, *ad.disp_loc);
+ if (! valid_address_p (mode, *ad.disp_loc, as))
+ {
+ *ad.disp_loc = save;
+ code = -1;
+ }
+ }
+ if (code < 0)
+ delete_insns_since (last);
+ }
+#endif
Nice :-)

Purely for the record, I wondered whether the high part should be
generated with emit_move_insn(_1) instead, with the rhs of the move
being the HIGH rtx. That would allow targets to cope with cases where
the high part isn't represented directly as a HIGH. E.g. on MIPS and
Alpha, small-data accesses use the global register as the high part instead.

However, both MIPS and Alpha accept small-data addresses as legitimate
constants and addresses before and during reload and only introduce the
split form after reload. And I think that's how any other cases that
aren't simple HIGHs should be handled too. E.g. MIPS also represents
GOT page loads as HIGHs until after reload, and only then lowers the
HIGH to a GOT load. Allowing the backend to generate anything other
than a plain HIGH set here would be a double-edged sword.

So after all that I agree that the gen_rtx_SET above is better than
calling the move expanders.
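Purely as a worked illustration of the HIGH/LO_SUM split above, the displacement arithmetic can be modelled outside GCC: the constant is broken into a high part loaded by a separate insn and a sign-extended low part folded into the access, MIPS-style. This is toy arithmetic only, not GCC code; the 16-bit low-part width and the names high_part/low_part are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the "disp => lo_sum (new_base, disp)" transformation:
   an address constant is split into a high part, loaded by a separate
   insn, and a sign-extended 16-bit low part folded into the access.
   The +0x8000 compensates for the sign extension of the low part.  */
static uint32_t
high_part (uint32_t addr)
{
  return (addr + 0x8000u) & 0xffff0000u;
}

static int32_t
low_part (uint32_t addr)
{
  /* Sign-extend the low 16 bits, as a LO_SUM operand would be.  */
  return (int32_t) (int16_t) (addr & 0xffffu);
}
```

For any address, high_part (a) + low_part (a) reconstructs a modulo 2^32, which is why the split form is safe.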
+ /* index * scale + disp => new base + index * scale */
+ enum reg_class cl = base_reg_class (mode, as, SCRATCH, SCRATCH);
+
+ lra_assert (INDEX_REG_CLASS != NO_REGS);
+ new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, "disp");
+ lra_assert (GET_CODE (*addr_loc) == PLUS);
+ lra_emit_move (new_reg, *ad.disp_loc);
+ if (CONSTANT_P (XEXP (*addr_loc, 1)))
+ XEXP (*addr_loc, 1) = XEXP (*addr_loc, 0);
+ XEXP (*addr_loc, 0) = new_reg;
The canonical form is (plus (mult ...) (reg)) rather than
(plus (reg) (mult ...)), but it looks like we create the latter.
+ /* Some targets like ARM, accept address operands in
+ specific order -- try exchange them if necessary. */
+ if (! valid_address_p (mode, *addr_loc, as))
+ {
+ exchange_plus_ops (*addr_loc);
+ if (! valid_address_p (mode, *addr_loc, as))
+ exchange_plus_ops (*addr_loc);
+ }
but I think we should try the canonical form first. And I'd prefer it
if we didn't try the other form at all, especially in 4.8. It isn't
really the backend's job to reject non-canonical rtl. This might well
be another case where some targets need a (hopefully small) tweak in
order to play by the rules.

Also, I suppose this section of code feeds back to my question on
Wednesday about the distinction that LRA seems to make between the
compile-time constant in:

(plus (reg X1) (const_int Y1))

and the link-time constant in:

(plus (reg X2) (symbol_ref Y2))

It looked like extract_address_regs classified X1 as a base register and
X2 as an index register. The difference between the two constants has
no run-time significance though, and I think we should handle both X1
and X2 as base registers (as I think reload does).

I think the path above would then be specific to scaled indices.
In the original address the "complex" index must come first and the
displacement second. In the modified address, the index would stay
first and the new base register would be second. More below.
+ /* We don't use transformation 'base + disp => base + new index'
+ because of bad practice used in some machine descriptions
+ (see comments for emit_spill_move). */
+ /* base + disp => new base */
As before when commenting on emit_spill_move, I think we should leave
the "bad machine description" stuff out of 4.8 and treat fixing the
machine descriptions as part of the LRA port.

In this case I think there's another reason not to reload the
displacement into an index though: IIRC postreload should be able
to optimise a sequence of address reloads that have the same base
and different displacements. LRA itself might try using "anchor"
bases in future -- although obviously not in the initial merge --
since that was one thing that LEGITIMIZE_RELOAD_ADDRESS was used for.

E.g. maybe the justification could be:

/* base + disp => new base */
/* Another option would be to reload the displacement into an
index register. However, postreload has code to optimize
address reloads that have the same base and different
displacements, so reloading into an index register would
not necessarily be a win. */
+ /* base + scale * index + disp => new base + scale * index */
+ new_reg = base_plus_disp_to_reg (mode, as, &ad);
+ *addr_loc = gen_rtx_PLUS (Pmode, new_reg, *ad.index_loc);
+ if (! valid_address_p (mode, *addr_loc, as))
+ {
+ /* Some targets like ARM, accept address operands in
+ specific order -- try exchange them if necessary. */
+ exchange_plus_ops (*addr_loc);
+ if (! valid_address_p (mode, *addr_loc, as))
+ exchange_plus_ops (*addr_loc);
+ }
Same comment as above about canonical rtl. Here we can have two
registers -- in which case the base should come first -- or a more
complex index -- in which case the index should come first.

We should be able to pass both rtxes to simplify_gen_binary (PLUS, ...),
with the operands in either order, and let it take care of the details.
Using simplify_gen_binary would help with the earlier index+disp case too.
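As a toy sketch of that canonicalisation rule, with made-up types rather than real rtxes (simplify_gen_binary itself handles many more cases): the more complex operand of a PLUS, such as a scaled index, is placed first and the simple base register second, whichever order the caller passes them in.

```c
#include <assert.h>

/* Rank operand kinds by "complexity": a MULT (scaled index) outranks
   a plain REG, so it goes first in the canonical PLUS.  */
enum op_kind { OP_REG = 0, OP_MULT = 1 };

struct operand { enum op_kind kind; };

struct plus_expr { struct operand *op0, *op1; };

static struct plus_expr
make_canonical_plus (struct operand *a, struct operand *b)
{
  struct plus_expr p;
  if (b->kind > a->kind)
    {
      p.op0 = b;
      p.op1 = a;
    }
  else
    {
      p.op0 = a;
      p.op1 = b;
    }
  return p;
}
```

The scaled index always ends up first, giving the (plus (mult ...) (reg)) form mentioned above.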
+ /* If this is post-increment, first copy the location to the reload reg. */
+ if (post && real_in != result)
+ emit_insn (gen_move_insn (result, real_in));
Nit, but real_in != result can never be true AIUI, and I was confused how
the code could be correct in that case. Maybe just remove it, or make
it an assert?
+ /* We suppose that there are insns to add/sub with the constant
+ increment permitted in {PRE/POST}_{DEC/INC/MODIFY}. At least the
+ old reload worked with this assumption. If the assumption
+ becomes wrong, we should use approach in function
+ base_plus_disp_to_reg. */
+ if (in == value)
+ {
+ /* See if we can directly increment INCLOC. */
+ last = get_last_insn ();
+ add_insn = emit_insn (plus_p
+ ? gen_add2_insn (incloc, inc)
+ : gen_sub2_insn (incloc, inc));
+
+ code = recog_memoized (add_insn);
+ /* We should restore recog_data for the current insn. */
Looks like this comment might be a left-over, maybe from before the
cached insn data?
+ /* Restore non-modified value for the result. We prefer this
+ way because it does not require an addition hard
+ register. */
+ if (plus_p)
+ {
+ if (CONST_INT_P (inc))
+ emit_insn (gen_add2_insn (result, GEN_INT (-INTVAL (inc))));
+ else
+ emit_insn (gen_sub2_insn (result, inc));
+ }
+ else if (CONST_INT_P (inc))
+ emit_insn (gen_add2_insn (result, inc));
The last two lines look redundant. The behaviour is the same as for
+ else
+ emit_insn (gen_add2_insn (result, inc));
and I don't think there are any cases where !plus && CONST_INT_P (inc)
would hold.
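A sketch of the collapsed restore logic (illustrative names, not the LRA code itself): undoing the increment needs only two branches, since the constant case behaves the same as the general one.

```c
#include <assert.h>

/* Restore the non-modified value after a post-increment reload: if
   the insn added INC, subtract it back, and vice versa.  A separate
   branch for a constant INC is redundant, because adding the negated
   constant and subtracting the constant are equivalent.  */
static long
restore_result (long result, long inc, int plus_p)
{
  return plus_p ? result - inc : result + inc;
}
```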
+/* Main entry point of this file: search the body of the current insn
s/this file/the constraints code/, since it's a static function.
+ if (change_p)
+ /* Changes in the insn might result in that we can not satisfy
+ constraints in lately used alternative of the insn. */
+ lra_set_used_insn_alternative (curr_insn, -1);
Maybe:

/* If we've changed the instruction then any alternative that
we chose previously may no longer be valid. */
+ rtx x;
+
+ curr_swapped = !curr_swapped;
+ if (curr_swapped)
+ {
+ x = *curr_id->operand_loc[commutative];
+ *curr_id->operand_loc[commutative]
+ = *curr_id->operand_loc[commutative + 1];
+ *curr_id->operand_loc[commutative + 1] = x;
+ /* Swap the duplicates too. */
+ lra_update_dup (curr_id, commutative);
+ lra_update_dup (curr_id, commutative + 1);
+ goto try_swapped;
+ }
+ else
+ {
+ x = *curr_id->operand_loc[commutative];
+ *curr_id->operand_loc[commutative]
+ = *curr_id->operand_loc[commutative + 1];
+ *curr_id->operand_loc[commutative + 1] = x;
+ lra_update_dup (curr_id, commutative);
+ lra_update_dup (curr_id, commutative + 1);
+ }
The swap code is the same in both cases, so I think it'd be better to
make it common. Or possibly a helper function, since the same code
appears again later on.
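A sketch of the suggested helper (hypothetical interface; the real code swaps rtx locations and would also call lra_update_dup for both operands):

```c
#include <assert.h>

/* Swap a commutative operand pair in place.  In LRA this would swap
   *curr_id->operand_loc[commutative] with its successor and then
   update the duplicates; here plain int pointers stand in for the
   rtx locations.  */
static void
swap_operands (int *operand_loc[], int commutative)
{
  int *x = operand_loc[commutative];
  operand_loc[commutative] = operand_loc[commutative + 1];
  operand_loc[commutative + 1] = x;
}
```

Both the curr_swapped and !curr_swapped branches then reduce to a single call to the helper.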
+ if (GET_CODE (op) == PLUS)
+ {
+ plus = op;
+ op = XEXP (op, 1);
+ }
Sorry, I'm complaining about old reload code again, but: does this
actually happen in LRA? In reload, a register operand could become a
PLUS because of elimination, but I thought LRA did things differently.
+ if (CONST_POOL_OK_P (mode, op)
+ && ((targetm.preferred_reload_class
+ (op, (enum reg_class) goal_alt[i]) == NO_REGS)
+ || no_input_reloads_p)
+ && mode != VOIDmode)
+ {
+ rtx tem = force_const_mem (mode, op);
+
+ change_p = true;
+ /* If we stripped a SUBREG or a PLUS above add it back. */
+ if (plus != NULL_RTX)
+ tem = gen_rtx_PLUS (mode, XEXP (plus, 0), tem);
and we shouldn't have (plus (constant ...) ...) after elimination
(or at all outside of a CONST). I don't understand why the code is
needed even in reload.
+ for (i = 0; i < n_operands; i++)
+ {
+ rtx old, new_reg;
+ rtx op = *curr_id->operand_loc[i];
+
+ if (goal_alt_win[i])
+ {
+ if (goal_alt[i] == NO_REGS
+ && REG_P (op)
+ && lra_former_scratch_operand_p (curr_insn, i))
+ change_class (REGNO (op), NO_REGS, " Change", true);
I think this could do with a comment. Does setting the class to NO_REGS
indirectly cause the operand to be switched back to a SCRATCH?
+ push_to_sequence (before);
+ rclass = base_reg_class (GET_MODE (op), MEM_ADDR_SPACE (op),
+ MEM, SCRATCH);
+ if (code == PRE_DEC || code == POST_DEC
+ || code == PRE_INC || code == POST_INC
+ || code == PRE_MODIFY || code == POST_MODIFY)
Very minor, but: GET_RTX_CLASS (code) == RTX_AUTOINC
+ enum machine_mode mode;
+ rtx reg, *loc;
+ int hard_regno, byte;
+ enum op_type type = curr_static_id->operand[i].type;
+
+ loc = curr_id->operand_loc[i];
+ mode = get_op_mode (i);
+ if (GET_CODE (*loc) == SUBREG)
+ {
+ reg = SUBREG_REG (*loc);
+ byte = SUBREG_BYTE (*loc);
+ if (REG_P (reg)
+ /* Strict_low_part requires reload the register not
+ the sub-register. */
+ && (curr_static_id->operand[i].strict_low
+ || (GET_MODE_SIZE (mode)
+ <= GET_MODE_SIZE (GET_MODE (reg))
+ && (hard_regno
+ = get_try_hard_regno (REGNO (reg))) >= 0
+ && (simplify_subreg_regno
+ (hard_regno,
+ GET_MODE (reg), byte, mode) < 0)
+ && (goal_alt[i] == NO_REGS
+ || (simplify_subreg_regno
+ (ira_class_hard_regs[goal_alt[i]][0],
+ GET_MODE (reg), byte, mode) >= 0)))))
+ {
+ loc = &SUBREG_REG (*loc);
+ mode = GET_MODE (*loc);
+ }
+ old = *loc;
I think this needs a bit more justifying commentary (although I'm glad
to see it's much simpler than the reload version :-)). One thing in
particular I didn't understand was why we don't reload the inner
register of a paradoxical subreg.
+ if (get_reload_reg (type, mode, old, goal_alt[i], "", &new_reg)
+ && type != OP_OUT)
+ {
+ push_to_sequence (before);
+ lra_emit_move (new_reg, old);
+ before = get_insns ();
+ end_sequence ();
+ }
+ *loc = new_reg;
+ if (type != OP_IN)
+ {
+ if (find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)
+ {
+ start_sequence ();
+ /* We don't want sharing subregs as the pseudo can
+ get a memory and the memory can be processed
+ several times for eliminations. */
+ lra_emit_move (GET_CODE (old) == SUBREG && type == OP_INOUT
+ ? copy_rtx (old) : old,
+ new_reg);
I think this should simply be:

lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);

leaving copy_rtx to figure out which rtxes can be shared. No comment
would be needed for that.
+ emit_insn (after);
+ after = get_insns ();
+ end_sequence ();
+ }
+ *loc = new_reg;
+ }
Very minor again, but: redundant *loc assignment (so that the two nested
if statements collapse to one).
+ else
+ {
+ lra_assert (INSN_CODE (curr_insn) < 0);
+ error_for_asm (curr_insn,
+ "inconsistent operand constraints in an %<asm%>");
+ /* Avoid further trouble with this insn. */
+ PATTERN (curr_insn) = gen_rtx_USE (VOIDmode, const0_rtx);
+ return false;
Is this code handling a different case from the corresponding error
code in curr_insn_transform? If so, it probably deserves a comment
explaining the difference.
+/* Process all regs in debug location *LOC and change them on
+ equivalent substitution. Return true if any change was done. */
+static bool
+debug_loc_equivalence_change_p (rtx *loc)
This doesn't keep the rtl in canonical form. Probably the easiest and
best fix is to use simplify_replace_fn_rtx, which handles all that for you.
(simplify_replace_fn_rtx returns the original rtx if no change was made.)
+ for (i = FIRST_PSEUDO_REGISTER; i < new_regno_start; i++)
+ ira_reg_equiv[i].profitable_p = true;
+ for (i = FIRST_PSEUDO_REGISTER; i < new_regno_start; i++)
+ if (lra_reg_info[i].nrefs != 0)
+ {
+ if ((hard_regno = lra_get_regno_hard_regno (i)) >= 0)
+ {
+ int j, nregs = hard_regno_nregs[hard_regno][PSEUDO_REGNO_MODE (i)];
+
+ for (j = 0; j < nregs; j++)
+ df_set_regs_ever_live (hard_regno + j, true);
+ }
+ else if ((x = get_equiv_substitution (regno_reg_rtx[i])) != NULL_RTX)
+ {
+ if (! first_p && contains_reg_p (x, false, false))
+ /* After RTL transformation, we can not guarantee that
+ pseudo in the substitution was not reloaded which
+ might make equivalence invalid. For example, in
+ reverse equiv of p0
+
+ p0 <- ...
+ ...
+ equiv_mem <- p0
+
+ the memory address register was reloaded before the
+ 2nd insn. */
+ ira_reg_equiv[i].defined_p = false;
+ if (contains_reg_p (x, false, true))
+ ira_reg_equiv[i].profitable_p = false;
+ }
+ }
Do we need two loops because the second may check for equivalences
of other pseudos besides "i"? I couldn't see how offhand, but I might
well have missed something. Might be worth a comment.
+ dest_reg = SET_DEST (set);
+ /* The equivalence pseudo could be set up as SUBREG in a
+ case when it is a call restore insn in a mode
+ different from the pseudo mode. */
+ if (GET_CODE (dest_reg) == SUBREG)
+ dest_reg = SUBREG_REG (dest_reg);
+ if ((REG_P (dest_reg)
+ && (x = get_equiv_substitution (dest_reg)) != dest_reg
+ /* Remove insns which set up a pseudo whose value
+ can not be changed. Such insns might be not in
+ init_insns because we don't update equiv data
+ during insn transformations.
+
+ As an example, let suppose that a pseudo got
+ hard register and on the 1st pass was not
+ changed to equivalent constant. We generate an
+ additional insn setting up the pseudo because of
+ secondary memory movement. Then the pseudo is
+ spilled and we use the equiv constant. In this
+ case we should remove the additional insn and
+ this insn is not init_insns list. */
+ && (! MEM_P (x) || MEM_READONLY_P (x)
+ || in_list_p (curr_insn,
+ ira_reg_equiv
+ [REGNO (dest_reg)].init_insns)))
This is probably a stupid question, sorry, but when do we ever want
to keep an assignment to a substituted pseudo? I.e. why isn't this just:

if ((REG_P (dest_reg)
&& (x = get_equiv_substitution (dest_reg)) != dest_reg)
+/* Info about last usage of registers in EBB to do inheritance/split
+ transformation. Inheritance transformation is done from a spilled
+ pseudo and split transformations from a hard register or a pseudo
+ assigned to a hard register. */
+struct usage_insns
+{
+ /* If the value is equal to CURR_USAGE_INSNS_CHECK, then the member
+ value INSNS is valid. The insns is chain of optional debug insns
+ and a finishing non-debug insn using the corresponding reg. */
+ int check;
+ /* Value of global reloads_num at the ???corresponding next insns. */
+ int reloads_num;
+ /* Value of global reloads_num at the ???corresponding next insns. */
+ int calls_num;
"???s". Probably "at the last instruction in INSNS" if that's accurate
(because debug insns in INSNS don't affect these fields).
+/* Process all regs OLD_REGNO in location *LOC and change them on the
+ reload pseudo NEW_REG. Return true if any change was done. */
+static bool
+substitute_pseudo (rtx *loc, int old_regno, rtx new_reg)
This is another case where I found the term "reload pseudo" a bit confusing,
since AIUI new_reg can be an inheritance or split pseudo rather than a pseudo
created solely for insn reloads. I'll follow up about that on the original
thread. Maybe just:

/* Replace all references to register OLD_REGNO in *LOC with pseudo register
NEW_REG. Return true if any change was made. */
+ code = GET_CODE (x);
+ if (code == REG && (int) REGNO (x) == old_regno)
+ {
+ *loc = new_reg;
+ return true;
+ }
Maybe assert that the modes are the same?
+/* Do inheritance transformation for insn INSN defining (if DEF_P) or
+ using ORIGINAL_REGNO where the subsequent insn(s) in EBB (remember
+ we traverse insns in the backward direction) for the original regno
+ is NEXT_USAGE_INSNS. The transformations look like
Maybe:

/* Do inheritance transformations for insn INSN, which defines (if DEF_P)
or uses ORIGINAL_REGNO. NEXT_USAGE_INSNS specifies which instruction
in the EBB next uses ORIGINAL_REGNO; it has the same form as the
"insns" field of usage_insns.
+
+ p <- ... i <- ...
+ ... p <- i (new insn)
+ ... =>
+ <- ... p ... <- ... i ...
+ or
+ ... i <- p (new insn)
+ <- ... p ... <- ... i ...
+ ... =>
+ <- ... p ... <- ... i ...
+ where p is a spilled original pseudo and i is a new inheritance pseudo.
+
+ The inheritance pseudo has the smallest class of two classes CL and
+ class of ORIGINAL REGNO. It will have unique value if UNIQ_P. The
+ unique value is necessary for correct assignment to inheritance
+ pseudo for input of an insn which should be the same as output
+ (bound pseudos). Return true if we succeed in such
+ transformation. */
This comment looks really good, but I still wasn't sure about the
UNIQ_P thing. AIUI this is for cases like:

i <- p [new insn]
r <- ... p ... r <- ... i ... [input reload]
r <- ... r ... => r <- ... r ... [original insn]
<- r <- r [output reload]
.... ......
<- ... p ... <- ... i ... [next ref]

where "r" is used on both sides of the original insn and where the
output reload assigns to something other than "p" (otherwise "next ref"
wouldn't be the next ref). But why does this affect the way "i" is created?
I think it'd be worth expanding that part a bit.
+ if (! ira_reg_classes_intersect_p[cl][rclass])
+ {
+ if (lra_dump_file != NULL)
+ {
+ fprintf (lra_dump_file,
+ " Rejecting inheritance for %d "
+ "because of too different classes %s and %s\n",
Suggest s/too different/disjoint/
+ if ((ira_class_subset_p[cl][rclass] && cl != rclass)
+ || ira_class_hard_regs_num[cl] < ira_class_hard_regs_num[rclass])
+ {
+ if (lra_dump_file != NULL)
+ fprintf (lra_dump_file, " Use smallest class of %s and %s\n",
+ reg_class_names[cl], reg_class_names[rclass]);
+
+ rclass = cl;
+ }
I don't understand the second line of the if statement. Why do we prefer
classes with fewer allocatable registers?

My guess before reading the code was that we'd use the subunion of CL and
RCLASS, so maybe a comment explaining why we use this choice would help.
+ if (NEXT_INSN (new_insns) != NULL_RTX)
+ {
+ if (lra_dump_file != NULL)
+ {
+ fprintf (lra_dump_file,
+ " Rejecting inheritance %d->%d "
+ "as it results in 2 or more insns:\n",
+ original_regno, REGNO (new_reg));
+ debug_rtl_slim (lra_dump_file, new_insns, NULL_RTX, -1, 0);
+ fprintf (lra_dump_file,
+ " >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n");
+ }
+ return false;
+ }
Hmm, I wasn't sure about this at first. Some targets define patterns for
multiword moves and split them later. Others expose the split straight away.
The two approaches don't really imply any difference in cost, so I didn't
want us to penalise the latter.

But I suppose on targets that split straight away, lower-subreg would
tend to replace the multiword pseudo with individual word-sized pseudos,
so LRA shouldn't see them. I suppose this check shouldn't matter in practice.
+ if (def_p)
+ lra_process_new_insns (insn, NULL_RTX, new_insns,
+ "Add original<-inheritance");
+ else
+ lra_process_new_insns (insn, new_insns, NULL_RTX,
+ "Add inheritance<-pseudo");
Maybe "original" rather than "pseudo" here too for consistency.
+/* Return true if we need a split for hard register REGNO or pseudo
+ REGNO which was assigned to a hard register.
+ POTENTIAL_RELOAD_HARD_REGS contains hard registers which might be
+ used for reloads since the EBB end. It is an approximation of the
+ used hard registers in the split range. The exact value would
+ require expensive calculations. If we were aggressive with
+ splitting because of the approximation, the split pseudo will save
+ the same hard register assignment and will be removed in the undo
+ pass. We still need the approximation because too aggressive
+ splitting would result in too inaccurate cost calculation in the
+ assignment pass because of too many generated moves which will be
+ probably removed in the undo pass. */
+static inline bool
+need_for_split_p (HARD_REG_SET potential_reload_hard_regs, int regno)
+{
+ int hard_regno = regno < FIRST_PSEUDO_REGISTER ? regno : reg_renumber[regno];
+
+ lra_assert (hard_regno >= 0);
+ return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno)
+ && ! TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno)
+ && (usage_insns[regno].reloads_num
+ + (regno < FIRST_PSEUDO_REGISTER ? 0 : 2) < reloads_num)
+ && ((regno < FIRST_PSEUDO_REGISTER
+ && ! bitmap_bit_p (&ebb_global_regs, regno))
+ || (regno >= FIRST_PSEUDO_REGISTER
+ && lra_reg_info[regno].nrefs > 3
+ && bitmap_bit_p (&ebb_global_regs, regno))))
+ || (regno >= FIRST_PSEUDO_REGISTER && need_for_call_save_p (regno)));
+}
Could you add more commentary about the thinking behind this particular
choice of heuristic? E.g. I wasn't sure what the reloads_num check did,
or why we only split hard registers that are local to the EBB and only
split pseudos that aren't.

The 2 and 3 numbers seemed a bit magic too. I suppose the 2 has
something to do with "one save and one restore", but I wasn't sure
why we applied it only for pseudos. (AIUI that arm of the check
deals with "genuine" split pseudos rather than call saves & restores.)

Still, it says a lot for the high quality of LRA that, out of all the
1000s of lines of code I've read so far, this is the only part that
didn't seem to have an intuitive justification.
+ for (i = 0;
+ (cl = reg_class_subclasses[allocno_class][i]) != LIM_REG_CLASSES;
+ i++)
+ if (! SECONDARY_MEMORY_NEEDED (cl, hard_reg_class, mode)
+ && ! SECONDARY_MEMORY_NEEDED (hard_reg_class, cl, mode)
+ && TEST_HARD_REG_BIT (reg_class_contents[cl], hard_regno)
+ && (best_cl == NO_REGS
+ || (hard_reg_set_subset_p (reg_class_contents[best_cl],
+ reg_class_contents[cl])
+ && ! hard_reg_set_equal_p (reg_class_contents[best_cl],
+ reg_class_contents[cl]))))
+ best_cl = cl;
OK, so this suggestion isn't backed up by any evidence, but what do
you think about this alternative:

&& (best_cl == NO_REGS
|| (ira_class_hard_regs_num[best_cl]
< ira_class_hard_regs_num[cl]))

which should choose the largest class that requires no secondary memory.
It looks like the subset version could get "stuck" on a single-register
class that happens to be early in the list but has no superclass smaller
than allocno_class.
+/* Do split transformation for insn INSN defining or
+ using ORIGINAL_REGNO where the subsequent insn(s) in EBB (remember
+ we traverse insns in the backward direction) for the original regno
+ is NEXT_USAGE_INSNS. The transformations look like
Same suggestion as for the inheritance function above.
+ if (call_save_p)
+ save = emit_spill_move (true, new_reg, original_reg, -1);
+ else
+ {
+ start_sequence ();
+ emit_move_insn (new_reg, original_reg);
+ save = get_insns ();
+ end_sequence ();
+ }
+ if (NEXT_INSN (save) != NULL_RTX)
+ {
+ lra_assert (! call_save_p);
Is emit_spill_move really guaranteed to return only one instruction in
cases where emit_move_insn might not? Both of them use emit_move_insn_1
internally, so I wouldn't have expected much difference.

In fact I wasn't really sure why:

save = gen_move_insn (new, original_reg);

wouldn't be correct for both.

Same comments for the restore code.
+ /* See which defined values die here. */
+ for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+ if (reg->type == OP_OUT && ! reg->early_clobber
+ && (! reg->subreg_p
+ || bitmap_bit_p (&lra_bound_pseudos, reg->regno)))
+ bitmap_clear_bit (&live_regs, reg->regno);
+ /* Mark each used value as live. */
+ for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+ if (reg->type == OP_IN
+ && bitmap_bit_p (&check_only_regs, reg->regno))
+ bitmap_set_bit (&live_regs, reg->regno);
+ /* Mark early clobber outputs dead. */
+ for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+ if (reg->type == OP_OUT && reg->early_clobber && ! reg->subreg_p)
+ bitmap_clear_bit (&live_regs, reg->regno);
I don't think this would be correct for unreloaded insns because an
unreloaded insn can have the same pseudo as an input and an earlyclobber
output. (Probably not an issue here, since we're called after the
constraints pass.) There's also the case of matched earlyclobber operands,
where the matched input is specifically not affected by the earlyclobber.

I'd have thought:

/* See which defined values die here. */
for (reg = curr_id->regs; reg != NULL; reg = reg->next)
if (reg->type == OP_OUT
&& (! reg->subreg_p
|| bitmap_bit_p (&lra_bound_pseudos, reg->regno)))
bitmap_clear_bit (&live_regs, reg->regno);
/* Mark each used value as live. */
for (reg = curr_id->regs; reg != NULL; reg = reg->next)
if (reg->type == OP_IN
&& bitmap_bit_p (&check_only_regs, reg->regno))
bitmap_set_bit (&live_regs, reg->regno);

ought to be correct, but perhaps I'm missing something.

(I'm still uneasy about the special treatment of bound pseudos here.
A clobber really does seem better.)
+ /* It is quite important to remove dead move insns because it
+ means removing dead store, we don't need to process them for
+ constraints, and unfortunately some subsequent optimizations
+ (like shrink-wrapping) currently based on assumption that
+ there are no trivial dead insns. */
Maybe best to drop the "subsequent optimizations" part. This comment
is unlikely to be updated after any change to shrink-wrapping & co.,
and the first two justifications seem convincing enough on their own.
+/* Add inheritance info REGNO and INSNS. */
+static void
+add_to_inherit (int regno, rtx insns)
+{
+ int i;
+
+ for (i = 0; i < to_inherit_num; i++)
+ if (to_inherit[i].regno == regno)
+ return;
Is the existing "insns" field guaranteed to match the "insns" parameter
in this case, or might they be different? Probably worth an assert or
comment respectively.
+/* Return first (if FIRST_P) or last non-debug insn in basic block BB.
+ Return null if there are no non-debug insns in the block. */
+static rtx
+get_non_debug_insn (bool first_p, basic_block bb)
+{
+ rtx insn;
+
+ for (insn = first_p ? BB_HEAD (bb) : BB_END (bb);
+ insn != NULL_RTX && ! NONDEBUG_INSN_P (insn);
+ insn = first_p ? NEXT_INSN (insn) : PREV_INSN (insn))
+ ;
+ if (insn != NULL_RTX && BLOCK_FOR_INSN (insn) != bb)
+ insn = NULL_RTX;
+ return insn;
+}
It probably doesn't matter in practice, but it looks like it'd be better
to limit the walk to the bb, rather than walking until null and then
testing the bb after the walk.

Maybe it would be easier to split into two functions, since first_p is
always constant. E.g.:

rtx insn;

FOR_BB_INSNS (bb, insn)
if (NONDEBUG_INSN_P (insn))
return insn;
return NULL_RTX;

for first_p. s/FOR_BB_INSNS/FOR_BB_INSNS_REVERSE/ for !first_p.
+/* Set up RES by registers living on edges FROM except the edge (FROM,
+ TO) or by registers set up in a jump insn in BB FROM. */
+static void
+get_live_on_other_edges (basic_block from, basic_block to, bitmap res)
+{
+ int regno;
+ rtx last;
+ struct lra_insn_reg *reg;
+ edge e;
+ edge_iterator ei;
+
+ lra_assert (to != NULL);
+ bitmap_clear (res);
+ FOR_EACH_EDGE (e, ei, from->succs)
+ if (e->dest != to)
+ bitmap_ior_into (res, DF_LR_IN (e->dest));
+ if ((last = get_non_debug_insn (false, from)) == NULL_RTX || ! JUMP_P (last))
+ return;
+ curr_id = lra_get_insn_recog_data (last);
+ for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+ if (reg->type != OP_IN
+ && (regno = reg->regno) >= FIRST_PSEUDO_REGISTER)
+ bitmap_set_bit (res, regno);
+}
Probably a silly question, sorry, but: why does the JUMP_P part only
include pseudo registers? The other calculations (here and elsewhere)
seem to handle both hard and pseudo registers.
+/* Do inheritance/split transformations in EBB starting with HEAD and
+ finishing on TAIL. We process EBB insns in the reverse order.
+ Return true if we did any inheritance/split transformation in the
+ EBB.
+
+ We should avoid excessive splitting which results in worse code
+ because of inaccurate cost calculations for spilling new split
+ pseudos in such case. To achieve this we do splitting only if
+ register pressure is high in given basic block and there reload
"...and there are reload"
+ pseudos requiring hard registers. We could do more register
+ pressure calculations at any given program point to avoid necessary
+ splitting even more but it is to expensive and the current approach
+ is well enough. */
"works well enough".
+ change_p = false;
+ curr_usage_insns_check++;
+ reloads_num = calls_num = 0;
+ /* Remember: we can remove the current insn. */
+ bitmap_clear (&check_only_regs);
+ last_processed_bb = NULL;
I couldn't tell which part of the code the comment is referring to.
Maybe left over?
+ after_p = (last_insn != NULL_RTX && ! JUMP_P (last_insn)
+ && (! CALL_P (last_insn)
+ || (find_reg_note (last_insn,
+ REG_NORETURN, NULL) == NULL_RTX
+ && ((next_insn
+ = next_nonnote_nondebug_insn (last_insn))
+ == NULL_RTX
+ || GET_CODE (next_insn) != BARRIER))));
Genuine question, but: when are the last four lines needed? The condition
that they're testing for sounds like a noreturn call.
+ if (src_regno < lra_constraint_new_regno_start
+ && src_regno >= FIRST_PSEUDO_REGISTER
+ && reg_renumber[src_regno] < 0
+ && dst_regno >= lra_constraint_new_regno_start
+ && (cl = lra_get_allocno_class (dst_regno)) != NO_REGS)
+ {
+ /* 'reload_pseudo <- original_pseudo'. */
+ reloads_num++;
+ succ_p = false;
+ if (usage_insns[src_regno].check == curr_usage_insns_check
+ && (next_usage_insns = usage_insns[src_regno].insns) != NULL_RTX)
+ succ_p = inherit_reload_reg (false,
+ bitmap_bit_p (&lra_matched_pseudos,
+ dst_regno),
+ src_regno, cl,
+ curr_insn, next_usage_insns);
+ if (succ_p)
+ change_p = true;
+ else
+ {
+ usage_insns[src_regno].check = curr_usage_insns_check;
+ usage_insns[src_regno].insns = curr_insn;
+ usage_insns[src_regno].reloads_num = reloads_num;
+ usage_insns[src_regno].calls_num = calls_num;
+ usage_insns[src_regno].after_p = false;
+ }
Looks like this and other places could use the add_next_usage_insn
helper function.
+ if (cl != NO_REGS
+ && hard_reg_set_subset_p (reg_class_contents[cl],
+ live_hard_regs))
+ IOR_HARD_REG_SET (potential_reload_hard_regs,
+ reg_class_contents[cl]);
Redundant "cl != NO_REGS" check. (Was a bit confused by that at first.)

I don't understand the way potential_reload_hard_regs is set up.
Why does it only include reload pseudos involved in moves of the form
"reload_pseudo <- original_pseudo" and "original_pseudo <- reload_pseudo",
but include those reloads regardless of whether inheritance is possible?

I wondered whether it might be deliberately selective in order to speed
up LRA, but we walk all the registers in an insn regardless.

Same for reloads_num.
+ if (cl != NO_REGS
+ && hard_reg_set_subset_p (reg_class_contents[cl],
+ live_hard_regs))
+ IOR_HARD_REG_SET (potential_reload_hard_regs,
+ reg_class_contents[cl]);
Same comment as for the previous block.
+ if (reg_renumber[dst_regno] < 0
+ || (reg->type == OP_OUT && ! reg->subreg_p))
+ /* Invalidate. */
+ usage_insns[dst_regno].check = 0;
Could you explain this condition a bit more? Why does reg_renumber
affect things?
+/* This value affects EBB forming. If probability of edge from EBB to
+ a BB is not greater than the following value, we don't add the BB
+ to EBB. */
+#define EBB_PROBABILITY_CUTOFF (REG_BR_PROB_BASE / 2)
It looks like schedule_ebbs uses a higher default cutoff for FDO.
Would the same distinction be useful here?

Maybe schedule_ebbs-like params would be good here too.
+ bitmap_and (&temp_bitmap_head, removed_pseudos, live);
+ EXECUTE_IF_SET_IN_BITMAP (&temp_bitmap_head, 0, regno, bi)
This isn't going to have much effect on compile time, but
EXECUTE_IF_AND_IN_BITMAP avoids the need for a temporary bitmap.
+/* Remove inheritance/split pseudos which are in REMOVE_PSEUDOS and
+ return true if we did any change. The undo transformations for
+ inheritance looks like
+ i <- i2
+ p <- i => p <- i2
+ or removing
+ p <- i, i <- p, and i <- i3
+ where p is original pseudo from which inheritance pseudo i was
+ created, i and i3 are removed inheritance pseudos, i2 is another
+ not removed inheritance pseudo. All split pseudos or other
+ occurrences of removed inheritance pseudos are changed on the
+ corresponding original pseudos. */
+static bool
+remove_inheritance_pseudos (bitmap remove_pseudos)
+{
+ basic_block bb;
+ int regno, sregno, prev_sregno, dregno, restore_regno;
+ rtx set, prev_set, prev_insn;
+ bool change_p, done_p;
+
+ change_p = ! bitmap_empty_p (remove_pseudos);
I wondered from the comment why we couldn't just return straight away
for the empty set, but it looks like the function also schedules a
constraints pass for instructions that keep their inheritance or
split pseudos. Is that right? Might be worth mentioning that
in the function comment if so.
+ else if (bitmap_bit_p (remove_pseudos, sregno)
+ && bitmap_bit_p (&lra_inheritance_pseudos, sregno))
+ {
+ /* Search the following pattern:
+ inherit_or_split_pseudo1 <- inherit_or_split_pseudo2
+ original_pseudo <- inherit_or_split_pseudo1
+ where the 2nd insn is the current insn and
+ inherit_or_split_pseudo2 is not removed. If it is found,
+ change the current insn onto:
+ original_pseudo1 <- inherit_or_split_pseudo2. */
s/original_pseudo1/original_pseudo/ I think (we don't change the destination).
+ for (prev_insn = PREV_INSN (curr_insn);
+ prev_insn != NULL_RTX && ! NONDEBUG_INSN_P (prev_insn);
+ prev_insn = PREV_INSN (prev_insn))
+ ;
+ if (prev_insn != NULL_RTX && BLOCK_FOR_INSN (prev_insn) == bb
+ && (prev_set = single_set (prev_insn)) != NULL_RTX
+ /* There should be no subregs in insn we are
+ searching because only the original reg might
+ be in subreg when we changed the mode of
+ load/store for splitting. */
+ && REG_P (SET_DEST (prev_set))
+ && REG_P (SET_SRC (prev_set))
+ && (int) REGNO (SET_DEST (prev_set)) == sregno
+ && ((prev_sregno = REGNO (SET_SRC (prev_set)))
+ >= FIRST_PSEUDO_REGISTER)
+ && (lra_reg_info[sregno].restore_regno
+ == lra_reg_info[prev_sregno].restore_regno)
+ && ! bitmap_bit_p (remove_pseudos, prev_sregno))
I'm sure the restore_regno comparison near the end is correct,
but could you add a comment to explain it? The substitution
itself seems OK either way.
+ struct lra_insn_reg *reg;
+ bool insn_change_p = false;
+
+ curr_id = lra_get_insn_recog_data (curr_insn);
+ for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+ if ((regno = reg->regno) >= lra_constraint_new_regno_start
+ && lra_reg_info[regno].restore_regno >= 0)
Is the first part of the comparison needed? Most other places don't check,
so it looked at first glance like there was something special here.
+ {
+ if (change_p && bitmap_bit_p (remove_pseudos, regno))
+ {
+ restore_regno = lra_reg_info[regno].restore_regno;
+ substitute_pseudo (&curr_insn, regno,
+ regno_reg_rtx[restore_regno]);
+ insn_change_p = true;
+ }
+ else if (NONDEBUG_INSN_P (curr_insn))
+ {
+ lra_push_insn_and_update_insn_regno_info (curr_insn);
+ lra_set_used_insn_alternative_by_uid
+ (INSN_UID (curr_insn), -1);
+ }
+ }
+ if (insn_change_p)
+ {
+ lra_update_insn_regno_info (curr_insn);
+ if (lra_dump_file != NULL)
+ {
+ fprintf (lra_dump_file, " Restore original insn:\n");
+ debug_rtl_slim (lra_dump_file,
+ curr_insn, curr_insn, -1, 0);
+ }
+ }
AIUI we could have a partial restore, keeping some registers but
restoring others. Is that right? The dump entry made it sounds
like a full restore.

Maybe something like:

struct lra_insn_reg *reg;
bool restored_regs_p = false;
bool kept_regs_p = false;

curr_id = lra_get_insn_recog_data (curr_insn);
for (reg = curr_id->regs; reg != NULL; reg = reg->next)
{
regno = reg->regno;
restore_regno = lra_reg_info[regno].restore_regno;
if (restore_regno >= 0)
{
if (change_p && bitmap_bit_p (remove_pseudos, regno))
{
substitute_pseudo (&curr_insn, regno,
regno_reg_rtx[restore_regno]);
restored_regs_p = true;
}
else
kept_regs_p = true;
}
}
if (NONDEBUG_INSN_P (curr_insn) && kept_regs_p)
{
/* The instruction has changed since the previous
constraints pass. */
lra_push_insn_and_update_insn_regno_info (curr_insn);
lra_set_used_insn_alternative_by_uid
(INSN_UID (curr_insn), -1);
}
else if (restored_regs_p)
/* The instruction has been restored to the form that
it had during the previous constraints pass. */
lra_update_insn_regno_info (curr_insn);

if (restored_regs_p && lra_dump_file != NULL)
{
fprintf (lra_dump_file,
" Insn after restoring regs:\n");
debug_rtl_slim (lra_dump_file, curr_insn, curr_insn, -1, 0);
}

(if correct) might make the partial case clearer, but that's personal
preference, so please feel free to ignore, chop or change.

Also, is regno_reg_rtx[restore_regno] always correct when restoring
registers? I thought restore_regno could be a hard register and that
the hard register might not necessarily be used in the same mode as
the regno_reg_rtx[] entry.

That just leaves lra.h, lra-int.h and lra.c itself. I'm hoping to have
read through those by the middle of next week, but any comments about them
will probably just be banal stuff (even more than my comments so far :-))
so I deliberately left them to last.

Richard
Richard Sandiford
2012-10-13 07:42:13 UTC
Permalink
I'm having to correct my own comments again, sorry.
Post by Richard Sandiford
+ /* If this is post-increment, first copy the location to the reload reg. */
+ if (post && real_in != result)
+ emit_insn (gen_move_insn (result, real_in));
Nit, but real_in != result can never be true AIUI, and I was confused how
the code could be correct in that case. Maybe just remove it, or make
it an assert?
Probably obvious, but I meant "real_in == result" can never be true.
"real_in != result" could be removed or turned into an assert.
Post by Richard Sandiford
+ if (GET_CODE (op) == PLUS)
+ {
+ plus = op;
+ op = XEXP (op, 1);
+ }
Sorry, I'm complaining about old reload code again, but: does this
actually happen in LRA? In reload, a register operand could become a
PLUS because of elimination, but I thought LRA did things differently.
+ if (CONST_POOL_OK_P (mode, op)
+ && ((targetm.preferred_reload_class
+ (op, (enum reg_class) goal_alt[i]) == NO_REGS)
+ || no_input_reloads_p)
+ && mode != VOIDmode)
+ {
+ rtx tem = force_const_mem (mode, op);
+
+ change_p = true;
+ /* If we stripped a SUBREG or a PLUS above add it back. */
+ if (plus != NULL_RTX)
+ tem = gen_rtx_PLUS (mode, XEXP (plus, 0), tem);
and we shouldn't have (plus (constant ...) ...) after elimination
(or at all outside of a CONST). I don't understand why the code is
needed even in reload.
Scratch the thing about needing it for reload. It's obviously the
second second operand we're reloading, not the first, and it could
well be that an elimination displacement needs to be reloaded via
the constant pool.

The question for LRA still stands though.

Richard
Vladimir Makarov
2012-10-17 00:54:20 UTC
Permalink
Post by Richard Sandiford
Hi Vlad,
Comments for the rest of ira-constraints.c.
+ saved_base_reg = saved_base_reg2 = saved_index_reg = NULL_RTX;
+ change_p = equiv_address_substitution (&ad, addr_loc, mode, as, code);
+ if (ad.base_reg_loc != NULL)
+ {
+ if (process_addr_reg
+ (ad.base_reg_loc, before,
+ (ad.base_modify_p && REG_P (*ad.base_reg_loc)
+ && find_regno_note (curr_insn, REG_DEAD,
+ REGNO (*ad.base_reg_loc)) == NULL
+ ? after : NULL),
+ base_reg_class (mode, as, ad.base_outer_code, ad.index_code)))
+ change_p = true;
+ if (ad.base_reg_loc2 != NULL)
+ *ad.base_reg_loc2 = *ad.base_reg_loc;
+ saved_base_reg = *ad.base_reg_loc;
+ lra_eliminate_reg_if_possible (ad.base_reg_loc);
+ if (ad.base_reg_loc2 != NULL)
+ {
+ saved_base_reg2 = *ad.base_reg_loc2;
+ lra_eliminate_reg_if_possible (ad.base_reg_loc2);
+ }
We unconditionally make *ad.base_reg_loc2 = *ad.base_reg_loc, so it
might be clearer without saved_base_reg2. More below...
+ /* The following addressing is checked by constraints and
+ usually target specific legitimate address hooks do not
+ consider them valid. */
+ || GET_CODE (*addr_loc) == POST_DEC || GET_CODE (*addr_loc) == POST_INC
+ || GET_CODE (*addr_loc) == PRE_DEC || GET_CODE (*addr_loc) == PRE_DEC
+ || GET_CODE (*addr_loc) == PRE_MODIFY
+ || GET_CODE (*addr_loc) == POST_MODIFY
the whole lot could just be replaced by ad.base_modify_p, or perhaps
+ /* In this case we can not do anything because if it is wrong
+ that is because of wrong displacement. Remember that any
+ address was legitimate in non-strict sense before LRA. */
+ || ad.disp_loc == NULL)
It doesn't seem worth validating the address at all for ad.disp_loc == NULL.
if (ad.base_reg_loc != NULL
&& (process_addr_reg
(ad.base_reg_loc, before,
(ad.base_modify_p && REG_P (*ad.base_reg_loc)
&& find_regno_note (curr_insn, REG_DEAD,
REGNO (*ad.base_reg_loc)) == NULL
? after : NULL),
base_reg_class (mode, as, ad.base_outer_code, ad.index_code))))
{
change_p = true;
if (ad.base_reg_loc2 != NULL)
*ad.base_reg_loc2 = *ad.base_reg_loc;
}
if (ad.index_reg_loc != NULL
&& process_addr_reg (ad.index_reg_loc, before, NULL, INDEX_REG_CLASS))
change_p = true;
/* The address was valid before LRA. We only change its form if the
address has a displacement, so if it has no displacement it must
still be valid. */
if (ad.disp_loc == NULL)
return change_p;
/* See whether the address is still valid. Some ports do not check
displacements for eliminable registers, so we replace them
temporarily with the elimination target. */
saved_base_reg = saved_index_reg = NULL_RTX;
...
if (ok_p)
return change_p;
Yes, it makes sense. I changed the code as you proposed.
Post by Richard Sandiford
+#ifdef HAVE_lo_sum
+ {
+ rtx insn;
+ rtx last = get_last_insn ();
+
+ /* disp => lo_sum (new_base, disp) */
+ insn = emit_insn (gen_rtx_SET
+ (VOIDmode, new_reg,
+ gen_rtx_HIGH (Pmode, copy_rtx (*ad.disp_loc))));
+ code = recog_memoized (insn);
+ if (code >= 0)
+ {
+ rtx save = *ad.disp_loc;
+
+ *ad.disp_loc = gen_rtx_LO_SUM (Pmode, new_reg, *ad.disp_loc);
+ if (! valid_address_p (mode, *ad.disp_loc, as))
+ {
+ *ad.disp_loc = save;
+ code = -1;
+ }
+ }
+ if (code < 0)
+ delete_insns_since (last);
+ }
+#endif
Nice :-)
Purely for the record, I wondered whether the high part should be
generated with emit_move_insn(_1) instead, with the rhs of the move
being the HIGH rtx. That would allow targets to cope with cases where
the high part isn't represented directly as a HIGH. E.g. on MIPS and
Alpha, small-data accesses use the global register as the high part instead.
However, both MIPS and Alpha accept small-data addresses as legitimate
constants and addresses before and during reload and only introduce the
split form after reload. And I think that's how any other cases that
aren't simple HIGHs should be handled too. E.g. MIPS also represents
GOT page loads as HIGHs until after reload, and only then lowers the
HIGH to a GOT load. Allowing the backend to generate anything other
than a plain HIGH set here would be a double-edged sword.
So after all that I agree that the gen_rtx_SET above is better than
calling the move expanders.
Thanks for sharing your knowledge.
Post by Richard Sandiford
+ /* index * scale + disp => new base + index * scale */
+ enum reg_class cl = base_reg_class (mode, as, SCRATCH, SCRATCH);
+
+ lra_assert (INDEX_REG_CLASS != NO_REGS);
+ new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, "disp");
+ lra_assert (GET_CODE (*addr_loc) == PLUS);
+ lra_emit_move (new_reg, *ad.disp_loc);
+ if (CONSTANT_P (XEXP (*addr_loc, 1)))
+ XEXP (*addr_loc, 1) = XEXP (*addr_loc, 0);
+ XEXP (*addr_loc, 0) = new_reg;
The canonical form is (plus (mult ...) (reg)) rather than
(plus (reg) (mult ...)), but it looks like we create the latter.
It might happen because of equiv substitution in LRA.
Post by Richard Sandiford
+ /* Some targets like ARM, accept address operands in
+ specific order -- try exchange them if necessary. */
+ if (! valid_address_p (mode, *addr_loc, as))
+ {
+ exchange_plus_ops (*addr_loc);
+ if (! valid_address_p (mode, *addr_loc, as))
+ exchange_plus_ops (*addr_loc);
+ }
but I think we should try the canonical form first. And I'd prefer it
if we didn't try the other form at all, especially in 4.8. It isn't
really the backend's job to reject non-canonical rtl. This might well
be another case where some targets need a (hopefully small) tweak in
order to play by the rules.
Also, I suppose this section of code feeds back to my question on
Wednesday about the distinction that LRA seems to make between the
(plus (reg X1) (const_int Y1))
(plus (reg X2) (symbol_ref Y2))
It looked like extract_address_regs classified X1 as a base register and
X2 as an index register. The difference between the two constants has
no run-time significance though, and I think we should handle both X1
and X2 as base registers (as I think reload does).
I think the path above would then be specific to scaled indices.
In the original address the "complex" index must come first and the
displacement second. In the modified address, the index would stay
first and the new base register would be second. More below.
As I wrote above, the problem is also that equiv substitution can
create non-canonical forms.
Post by Richard Sandiford
+ /* We don't use transformation 'base + disp => base + new index'
+ because of bad practice used in some machine descriptions
+ (see comments for emit_spill_move). */
+ /* base + disp => new base */
As before when commenting on emit_spill_move, I think we should leave
the "bad machine description" stuff out of 4.8 and treat fixing the
machine descriptions as part of the LRA port.
In this case I think there's another reason not to reload the
displacement into an index though: IIRC postreload should be able
to optimise a sequence of address reloads that have the same base
and different displacements. LRA itself might try using "anchor"
bases in future -- although obviously not in the initial merge --
since that was one thing that LEGITIMIZE_RELOAD_ADDRESS was used for.
/* base + disp => new base */
/* Another option would be to reload the displacement into an
index register. However, postreload has code to optimize
address reloads that have the same base and different
displacements, so reloading into an index register would
not necessarily be a win. */
Fixed.
Post by Richard Sandiford
+ /* base + scale * index + disp => new base + scale * index */
+ new_reg = base_plus_disp_to_reg (mode, as, &ad);
+ *addr_loc = gen_rtx_PLUS (Pmode, new_reg, *ad.index_loc);
+ if (! valid_address_p (mode, *addr_loc, as))
+ {
+ /* Some targets like ARM, accept address operands in
+ specific order -- try exchange them if necessary. */
+ exchange_plus_ops (*addr_loc);
+ if (! valid_address_p (mode, *addr_loc, as))
+ exchange_plus_ops (*addr_loc);
+ }
Same comment as above about canonical rtl. Here we can have two
registers -- in which case the base should come first -- or a more
complex index -- in which case the index should come first.
We should be able to pass both rtxes to simplify_gen_binary (PLUS, ...),
with the operands in either order, and let it take care of the details.
Using simplify_gen_binary would help with the earlier index+disp case too.
Equiv substitution can create non-canonical forms. There are 2 approaches:
o have code for dealing with non-canonical forms (equiv substitution,
target stupidity)
o always support canonical forms and require them from targets.

I decided to use the 1st variant but I am reconsidering it. I'll try
to fix it before inclusion. But I am not sure I have time for this.
All these changes make LRA unstable. In fact, I've just found that
changes I already made so far broke 2 SPEC2000 tests, although the GCC
testsuite and bootstrap look good.

As I wrote, we also have two different approaches to implementing LRA.
The first one is to make porting LRA to other targets easier (which
still requires some changes on the target side). The 2nd one is to
require targets to play by strict rules. I believe that the 2nd
approach will slow down the transition to LRA and the removal of
reload. But it is probably the more correct one.

After the recent changes you required, we are now somewhere in the
middle. A lot of targets on the lra branch are broken by these changes
and I will need some effort on the target-dependent side to make them
work again.

If it were only target stupidity (as in the ARM case), I would remove
this code right away. But unfortunately, equiv substitution is also a
problem. And right now, I have no time to fix it using your approach.
The whole idea of including LRA for x86 in 4.8 was to get real
experience and harsh testing for LRA. I am sorry, Richard, but I think
solving this problem the way you want might result in postponing LRA
inclusion until gcc4.9.
Post by Richard Sandiford
+ /* If this is post-increment, first copy the location to the reload reg. */
+ if (post && real_in != result)
+ emit_insn (gen_move_insn (result, real_in));
Nit, but real_in != result can never be true AIUI, and I was confused how
the code could be correct in that case. Maybe just remove it, or make
it an assert?
No, it might be true:

real_in = in == value ? incloc : in;
...
if (cond)
result = incloc;
else
result = ...

if (post && real_in != result)

So it is true if in==value && cond
Post by Richard Sandiford
+ /* We suppose that there are insns to add/sub with the constant
+ increment permitted in {PRE/POST}_{DEC/INC/MODIFY}. At least the
+ old reload worked with this assumption. If the assumption
+ becomes wrong, we should use approach in function
+ base_plus_disp_to_reg. */
+ if (in == value)
+ {
+ /* See if we can directly increment INCLOC. */
+ last = get_last_insn ();
+ add_insn = emit_insn (plus_p
+ ? gen_add2_insn (incloc, inc)
+ : gen_sub2_insn (incloc, inc));
+
+ code = recog_memoized (add_insn);
+ /* We should restore recog_data for the current insn. */
Looks like this comment might be a left-over, maybe from before the
cached insn data?
Yes. The very first variant used recog_data and was much slower. I
removed the comment.
Post by Richard Sandiford
+ /* Restore non-modified value for the result. We prefer this
+ way because it does not require an additional hard
+ register. */
+ if (plus_p)
+ {
+ if (CONST_INT_P (inc))
+ emit_insn (gen_add2_insn (result, GEN_INT (-INTVAL (inc))));
+ else
+ emit_insn (gen_sub2_insn (result, inc));
+ }
+ else if (CONST_INT_P (inc))
+ emit_insn (gen_add2_insn (result, inc));
The last two lines look redundant. The behaviour is the same as for
+ else
+ emit_insn (gen_add2_insn (result, inc));
Fixed.
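For what it's worth, the simplified restore logic can be checked with a toy standalone model (plain integer arithmetic standing in for the emitted add/sub insns; `restore_result` and its framing are mine, not the GCC code):

```c
#include <assert.h>

/* Toy model of the restore step under discussion, not the GCC code:
   after a post-increment reload, RESULT holds the incremented value
   and must be restored to the original one.  If the increment was
   added (plus_p), subtract it back; otherwise add it back.  The
   CONST_INT case folds into the same two branches, which is why the
   extra arm was redundant.  */
static long
restore_result (long result, long inc, int plus_p)
{
  return plus_p ? result - inc : result + inc;
}
```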
Post by Richard Sandiford
and I don't think there are any cases where !plus && CONST_INT_P (inc)
would hold.
It seems I thought about this and therefore I added a placeholder which
I can remove now.
Post by Richard Sandiford
+/* Main entry point of this file: search the body of the current insn
s/this file/the constraints code/, since it's a static function.
Fixed.
Post by Richard Sandiford
+ if (change_p)
+ /* Changes in the insn might result in that we can not satisfy
+ constraints in lately used alternative of the insn. */
+ lra_set_used_insn_alternative (curr_insn, -1);
/* If we've changed the instruction then any alternative that
we chose previously may no longer be valid. */
Fixed.
Post by Richard Sandiford
+ rtx x;
+
+ curr_swapped = !curr_swapped;
+ if (curr_swapped)
+ {
+ x = *curr_id->operand_loc[commutative];
+ *curr_id->operand_loc[commutative]
+ = *curr_id->operand_loc[commutative + 1];
+ *curr_id->operand_loc[commutative + 1] = x;
+ /* Swap the duplicates too. */
+ lra_update_dup (curr_id, commutative);
+ lra_update_dup (curr_id, commutative + 1);
+ goto try_swapped;
+ }
+ else
+ {
+ x = *curr_id->operand_loc[commutative];
+ *curr_id->operand_loc[commutative]
+ = *curr_id->operand_loc[commutative + 1];
+ *curr_id->operand_loc[commutative + 1] = x;
+ lra_update_dup (curr_id, commutative);
+ lra_update_dup (curr_id, commutative + 1);
+ }
The swap code is the same in both cases, so I think it'd be better to
make it common. Or possibly a helper function, since the same code
appears again later on.
Fixed.
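The agreed factoring might look roughly like the following standalone sketch (here `operand[]` stands in for `*curr_id->operand_loc[]` and `dup_of[]` crudely models what lra_update_dup maintains; both arrays and the function name are made up for illustration):

```c
#include <assert.h>

/* Sketch of a common helper for the commutative-operand swap that was
   duplicated in both branches.  Not the GCC code: operand[] models
   *curr_id->operand_loc[] and dup_of[] models the duplicate operands
   that lra_update_dup keeps in sync.  */
static int operand[4];
static int dup_of[4];

static void
swap_operands (int commutative)
{
  int x = operand[commutative];
  operand[commutative] = operand[commutative + 1];
  operand[commutative + 1] = x;
  /* Keep the duplicates in sync, as lra_update_dup would.  */
  dup_of[commutative] = operand[commutative];
  dup_of[commutative + 1] = operand[commutative + 1];
}
```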
Post by Richard Sandiford
+ if (GET_CODE (op) == PLUS)
+ {
+ plus = op;
+ op = XEXP (op, 1);
+ }
Sorry, I'm complaining about old reload code again, but: does this
actually happen in LRA? In reload, a register operand could become a
PLUS because of elimination, but I thought LRA did things differently.
No, I don't think it happens in LRA. It is a leftover from reload
code. I removed it.
Post by Richard Sandiford
+ if (CONST_POOL_OK_P (mode, op)
+ && ((targetm.preferred_reload_class
+ (op, (enum reg_class) goal_alt[i]) == NO_REGS)
+ || no_input_reloads_p)
+ && mode != VOIDmode)
+ {
+ rtx tem = force_const_mem (mode, op);
+
+ change_p = true;
+ /* If we stripped a SUBREG or a PLUS above add it back. */
+ if (plus != NULL_RTX)
+ tem = gen_rtx_PLUS (mode, XEXP (plus, 0), tem);
and we shouldn't have (plus (constant ...) ...) after elimination
(or at all outside of a CONST). I don't understand why the code is
needed even in reload.
I removed the plus code.
Post by Richard Sandiford
+ for (i = 0; i < n_operands; i++)
+ {
+ rtx old, new_reg;
+ rtx op = *curr_id->operand_loc[i];
+
+ if (goal_alt_win[i])
+ {
+ if (goal_alt[i] == NO_REGS
+ && REG_P (op)
+ && lra_former_scratch_operand_p (curr_insn, i))
+ change_class (REGNO (op), NO_REGS, " Change", true);
I think this could do with a comment. Does setting the class to NO_REGS
indirectly cause the operand to be switched back to a SCRATCH?
When we assign NO_REGS it means that we will not assign a hard register
to the scratch pseudo and the scratch pseudo will be spilled. Spilled
scratch pseudos are transformed back to scratches at the end of LRA. I've
added a comment.
Post by Richard Sandiford
+ push_to_sequence (before);
+ rclass = base_reg_class (GET_MODE (op), MEM_ADDR_SPACE (op),
+ MEM, SCRATCH);
+ if (code == PRE_DEC || code == POST_DEC
+ || code == PRE_INC || code == POST_INC
+ || code == PRE_MODIFY || code == POST_MODIFY)
Very minor, but: GET_RTX_CLASS (code) == RTX_AUTOINC
Fixed.
Post by Richard Sandiford
+ enum machine_mode mode;
+ rtx reg, *loc;
+ int hard_regno, byte;
+ enum op_type type = curr_static_id->operand[i].type;
+
+ loc = curr_id->operand_loc[i];
+ mode = get_op_mode (i);
+ if (GET_CODE (*loc) == SUBREG)
+ {
+ reg = SUBREG_REG (*loc);
+ byte = SUBREG_BYTE (*loc);
+ if (REG_P (reg)
+ /* Strict_low_part requires reload the register not
+ the sub-register. */
+ && (curr_static_id->operand[i].strict_low
+ || (GET_MODE_SIZE (mode)
+ <= GET_MODE_SIZE (GET_MODE (reg))
+ && (hard_regno
+ = get_try_hard_regno (REGNO (reg))) >= 0
+ && (simplify_subreg_regno
+ (hard_regno,
+ GET_MODE (reg), byte, mode) < 0)
+ && (goal_alt[i] == NO_REGS
+ || (simplify_subreg_regno
+ (ira_class_hard_regs[goal_alt[i]][0],
+ GET_MODE (reg), byte, mode) >= 0)))))
+ {
+ loc = &SUBREG_REG (*loc);
+ mode = GET_MODE (*loc);
+ }
+ old = *loc;
I think this needs a bit more justifying commentary (although I'm glad
to see it's much simpler than the reload version :-)). One thing in
particular I didn't understand was why we don't reload the inner
register of a paradoxical subreg.
It seems unnecessary. As we use the biggest mode, the pseudo gets all the
hard registers or a wide stack slot for a paradoxical subreg.
Post by Richard Sandiford
+ if (get_reload_reg (type, mode, old, goal_alt[i], "", &new_reg)
+ && type != OP_OUT)
+ {
+ push_to_sequence (before);
+ lra_emit_move (new_reg, old);
+ before = get_insns ();
+ end_sequence ();
+ }
+ *loc = new_reg;
+ if (type != OP_IN)
+ {
+ if (find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)
+ {
+ start_sequence ();
+ /* We don't want sharing subregs as the pseudo can
+ get a memory and the memory can be processed
+ several times for eliminations. */
+ lra_emit_move (GET_CODE (old) == SUBREG && type == OP_INOUT
+ ? copy_rtx (old) : old,
+ new_reg);
lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);
leaving copy_rtx to figure out which rtxes can be shared. No comment
would be needed for that.
Fixed.
Post by Richard Sandiford
+ emit_insn (after);
+ after = get_insns ();
+ end_sequence ();
+ }
+ *loc = new_reg;
+ }
Very minor again, but: redundant *loc assignment (so that the two nested
if statements collapse to one).
Fixed. It seems to be a leftover from previous modifications. The first variants
of the function were more complicated.
Post by Richard Sandiford
+ else
+ {
+ lra_assert (INSN_CODE (curr_insn) < 0);
+ error_for_asm (curr_insn,
+ "inconsistent operand constraints in an %<asm%>");
+ /* Avoid further trouble with this insn. */
+ PATTERN (curr_insn) = gen_rtx_USE (VOIDmode, const0_rtx);
+ return false;
Is this code handling a different case from the corresponding error
code in curr_insn_transform? If so, it probably deserves a comment
explaining the difference.
I have no idea. It is taken from reload. I think we could try to
remove it. If process_alt_operands finds an alternative, we should
generate code anyway. I'll make it unreachable.
Post by Richard Sandiford
+/* Process all regs in debug location *LOC and change them on
+ equivalent substitution. Return true if any change was done. */
+static bool
+debug_loc_equivalence_change_p (rtx *loc)
This doesn't keep the rtl in canonical form. Probably the easiest and
best fix is to use simplify_replace_fn_rtx, which handles all that for you.
(simplify_replace_fn_rtx returns the original rtx if no change was made.)
See comments above about canonical forms.
Post by Richard Sandiford
+ for (i = FIRST_PSEUDO_REGISTER; i < new_regno_start; i++)
+ ira_reg_equiv[i].profitable_p = true;
+ for (i = FIRST_PSEUDO_REGISTER; i < new_regno_start; i++)
+ if (lra_reg_info[i].nrefs != 0)
+ {
+ if ((hard_regno = lra_get_regno_hard_regno (i)) >= 0)
+ {
+ int j, nregs = hard_regno_nregs[hard_regno][PSEUDO_REGNO_MODE (i)];
+
+ for (j = 0; j < nregs; j++)
+ df_set_regs_ever_live (hard_regno + j, true);
+ }
+ else if ((x = get_equiv_substitution (regno_reg_rtx[i])) != NULL_RTX)
+ {
+ if (! first_p && contains_reg_p (x, false, false))
+ /* After RTL transformation, we can not guarantee that
+ pseudo in the substitution was not reloaded which
+ might make equivalence invalid. For example, in
+ reverse equiv of p0
+
+ p0 <- ...
+ ...
+ equiv_mem <- p0
+
+ the memory address register was reloaded before the
+ 2nd insn. */
+ ira_reg_equiv[i].defined_p = false;
+ if (contains_reg_p (x, false, true))
+ ira_reg_equiv[i].profitable_p = false;
+ }
+ }
Do we need two loops because the second may check for equivalences
of other pseudos besides "i"? I couldn't see how offhand, but I might
well have missed something. Might be worth a comment.
No, we don't need two loops. I merged them.
Post by Richard Sandiford
+ dest_reg = SET_DEST (set);
+ /* The equivalence pseudo could be set up as SUBREG in a
+ case when it is a call restore insn in a mode
+ different from the pseudo mode. */
+ if (GET_CODE (dest_reg) == SUBREG)
+ dest_reg = SUBREG_REG (dest_reg);
+ if ((REG_P (dest_reg)
+ && (x = get_equiv_substitution (dest_reg)) != dest_reg
+ /* Remove insns which set up a pseudo whose value
+ can not be changed. Such insns might be not in
+ init_insns because we don't update equiv data
+ during insn transformations.
+
+ As an example, let suppose that a pseudo got
+ hard register and on the 1st pass was not
+ changed to equivalent constant. We generate an
+ additional insn setting up the pseudo because of
+ secondary memory movement. Then the pseudo is
+ spilled and we use the equiv constant. In this
+ case we should remove the additional insn and
+ this insn is not init_insns list. */
+ && (! MEM_P (x) || MEM_READONLY_P (x)
+ || in_list_p (curr_insn,
+ ira_reg_equiv
+ [REGNO (dest_reg)].init_insns)))
This is probably a stupid question, sorry, but when do we ever want
if ((REG_P (dest_reg)
&& (x = get_equiv_substitution (dest_reg)) != dest_reg)
An equivalence can be a memory location, which means many different
values can be assigned to that location. Removing those insns would generate
wrong code. For example, if you used your simple variant of the code, you
would get > 100 test failures in the GCC testsuite.
Post by Richard Sandiford
+/* Info about last usage of registers in EBB to do inheritance/split
+ transformation. Inheritance transformation is done from a spilled
+ pseudo and split transformations from a hard register or a pseudo
+ assigned to a hard register. */
+struct usage_insns
+{
+ /* If the value is equal to CURR_USAGE_INSNS_CHECK, then the member
+ value INSNS is valid. The insns is chain of optional debug insns
+ and a finishing non-debug insn using the corresponding reg. */
+ int check;
+ /* Value of global reloads_num at the ???corresponding next insns. */
+ int reloads_num;
+ /* Value of global reloads_num at the ???corresponding next insns. */
+ int calls_num;
"???s". Probably "at the last instruction in INSNS" if that's accurate
(because debug insns in INSNS don't affect these fields).
Fixed.
Post by Richard Sandiford
+/* Process all regs OLD_REGNO in location *LOC and change them on the
+ reload pseudo NEW_REG. Return true if any change was done. */
+static bool
+substitute_pseudo (rtx *loc, int old_regno, rtx new_reg)
This is another case where I found the term "reload pseudo" a bit confusing,
since AIUI new_reg can be an inheritance or split pseudo rather than a pseudo
created solely for insn reloads. I'll follow up about that on the original
/* Replace all references to register OLD_REGNO in *LOC with pseudo register
NEW_REG. Return true if any change was made. */
Fixed.
Post by Richard Sandiford
+ code = GET_CODE (x);
+ if (code == REG && (int) REGNO (x) == old_regno)
+ {
+ *loc = new_reg;
+ return true;
+ }
Maybe assert that the modes are the same?
I've just realized that the modes might be different because of
secondary_memory_needed_mode in split_reg (probably small chance). So I
added code dealing with it. Thanks, Richard.
Post by Richard Sandiford
+/* Do inheritance transformation for insn INSN defining (if DEF_P) or
+ using ORIGINAL_REGNO where the subsequent insn(s) in EBB (remember
+ we traverse insns in the backward direction) for the original regno
+ is NEXT_USAGE_INSNS. The transformations look like
/* Do inheritance transformations for insn INSN, which defines (if DEF_P)
or uses ORIGINAL_REGNO. NEXT_USAGE_INSNS specifies which instruction
in the EBB next uses ORIGINAL_REGNO; it has the same form as the
"insns" field of usage_insns.
Fixed.
Post by Richard Sandiford
+
+ p <- ... i <- ...
+ ... p <- i (new insn)
+ ... =>
+ <- ... p ... <- ... i ...
+ or
+ ... i <- p (new insn)
+ <- ... p ... <- ... i ...
+ ... =>
+ <- ... p ... <- ... i ...
+ where p is a spilled original pseudo and i is a new inheritance pseudo.
+
+ The inheritance pseudo has the smallest class of two classes CL and
+ class of ORIGINAL REGNO. It will have unique value if UNIQ_P. The
+ unique value is necessary for correct assignment to inheritance
+ pseudo for input of an insn which should be the same as output
+ (bound pseudos). Return true if we succeed in such
+ transformation. */
This comment looks really good, but I still wasn't sure about the
i <- p [new insn]
r <- ... p ... r <- ... i ... [input reload]
r <- ... r ... => r <- ... r ... [original insn]
<- r <- r [output reload]
.... ......
<- ... p ... <- ... i ... [next ref]
where "r" is used on both sides of the original insn and where the
output reload assigns to something other than "p" (otherwise "next ref"
wouldn't be the next ref). But why does this affect the way "i" is created?
I think it'd be worth expanding that part a bit.
I realized that code is not necessary anymore. It was necessary when I
generated 2 pseudos for reloading matching operands with different
modes. Now I use one pseudo for this and use (maybe illegal in other
parts of GCC) subregs of this pseudo. I modified the code.
Post by Richard Sandiford
+ if (! ira_reg_classes_intersect_p[cl][rclass])
+ {
+ if (lra_dump_file != NULL)
+ {
+ fprintf (lra_dump_file,
+ " Rejecting inheritance for %d "
+ "because of too different classes %s and %s\n",
Suggest s/too different/disjoint/
Fixed.
Post by Richard Sandiford
+ if ((ira_class_subset_p[cl][rclass] && cl != rclass)
+ || ira_class_hard_regs_num[cl] < ira_class_hard_regs_num[rclass])
+ {
+ if (lra_dump_file != NULL)
+ fprintf (lra_dump_file, " Use smallest class of %s and %s\n",
+ reg_class_names[cl], reg_class_names[rclass]);
+
+ rclass = cl;
+ }
I don't understand the second line of the if statement. Why do we prefer
classes with fewer allocatable registers?
My guess before reading the code was that we'd use the subunion of CL and
RCLASS, so maybe a comment explaining why we use this choice would help.
I added a comment

/* We don't use a subset of two classes because it can be
NO_REGS. This transformation is still profitable in most
cases even if the classes do not intersect, as a register
move is probably cheaper than a memory load. */
Post by Richard Sandiford
+ if (NEXT_INSN (new_insns) != NULL_RTX)
+ {
+ if (lra_dump_file != NULL)
+ {
+ fprintf (lra_dump_file,
+ " Rejecting inheritance %d->%d "
+ "as it results in 2 or more insns:\n",
+ original_regno, REGNO (new_reg));
+ debug_rtl_slim (lra_dump_file, new_insns, NULL_RTX, -1, 0);
+ fprintf (lra_dump_file,
+ " >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n");
+ }
+ return false;
+ }
Hmm, I wasn't sure about this at first. Some targets define patterns for
multiword moves and split them later. Others expose the split straight away.
The two approaches don't really imply any difference in cost, so I didn't
want us to penalise the latter.
But I suppose on targets that split straight away, lower-subreg would
tend to replace the multiword pseudo with individual word-sized pseudos,
so LRA shouldn't see them. I suppose this check shouldn't matter in practice.
I think I saw it in practice, and therefore I added the code, although I
don't remember the details. I tried to document some specific LRA cases
in the comments. But probably I should have added more comments
with examples in the code because such cases are easy to forget (too many
cases and details).
Post by Richard Sandiford
+ if (def_p)
+ lra_process_new_insns (insn, NULL_RTX, new_insns,
+ "Add original<-inheritance");
+ else
+ lra_process_new_insns (insn, new_insns, NULL_RTX,
+ "Add inheritance<-pseudo");
Maybe "original" rather than "pseudo" here too for consistency.
Fixed.
Post by Richard Sandiford
+/* Return true if we need a split for hard register REGNO or pseudo
+ REGNO which was assigned to a hard register.
+ POTENTIAL_RELOAD_HARD_REGS contains hard registers which might be
+ used for reloads since the EBB end. It is an approximation of the
+ used hard registers in the split range. The exact value would
+ require expensive calculations. If we were aggressive with
+ splitting because of the approximation, the split pseudo will save
+ the same hard register assignment and will be removed in the undo
+ pass. We still need the approximation because too aggressive
+ splitting would result in too inaccurate cost calculation in the
+ assignment pass because of too many generated moves which will be
+ probably removed in the undo pass. */
+static inline bool
+need_for_split_p (HARD_REG_SET potential_reload_hard_regs, int regno)
+{
+ int hard_regno = regno < FIRST_PSEUDO_REGISTER ? regno : reg_renumber[regno];
+
+ lra_assert (hard_regno >= 0);
+ return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno)
+ && ! TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno)
+ && (usage_insns[regno].reloads_num
+ + (regno < FIRST_PSEUDO_REGISTER ? 0 : 2) < reloads_num)
+ && ((regno < FIRST_PSEUDO_REGISTER
+ && ! bitmap_bit_p (&ebb_global_regs, regno))
+ || (regno >= FIRST_PSEUDO_REGISTER
+ && lra_reg_info[regno].nrefs > 3
+ && bitmap_bit_p (&ebb_global_regs, regno))))
+ || (regno >= FIRST_PSEUDO_REGISTER && need_for_call_save_p (regno)));
+}
Could you add more commentary about the thinking behind this particular
choice of heuristic? E.g. I wasn't sure what the reloads_num check did,
or why we only split hard registers that are local to the EBB and only
split pseudos that aren't.
The 2 and 3 numbers seemed a bit magic too. I suppose the 2 has
something to do with "one save and one restore", but I wasn't sure
why we applied it only for pseudos. (AIUI that arm of the check
deals with "genuine" split pseudos rather than call saves & restores.)
Still, it says a lot for the high quality of LRA that, out of all the
1000s of lines of code I've read so far, this is the only part that
didn't seem to have an intuitive justification.
Yes, right. I checked many parameters and finally picked out the above
ones. What I found is that aggressive and even moderate splitting
usually results in worse code. Most splitting is undone, and if we split
aggressively we generate a lot of insns which will be removed,
but whose costs are still taken into account during the assignment pass.
Inheritance has a much better chance of succeeding. One reason why
splitting is undone is that spilling + inheritance of short-lived
pseudos substitutes for splitting in some way. Therefore I do
splitting only for long-lived pseudos. That means non-local pseudos.

I added the following comments.

return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno)
&& ! TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno)
/* We need at least 2 reloads to make pseudo splitting
profitable. We should provide for hard regno splitting in
any case to solve the 1st insn scheduling problem:
moving a hard register definition up might make it
impossible to find a hard register for a reload pseudo of
a small register class. */
&& (usage_insns[regno].reloads_num
+ (regno < FIRST_PSEUDO_REGISTER ? 0 : 2) < reloads_num)
&& (regno < FIRST_PSEUDO_REGISTER
/* For short-lived pseudos, spilling + inheritance can
be considered a substitute for splitting.
Therefore we do no splitting for local pseudos. It
also decreases the aggressiveness of splitting. The
minimal number of references is chosen taking into
account that splitting makes no sense for 2
references, as we can just spill the pseudo. */
|| (regno >= FIRST_PSEUDO_REGISTER
&& lra_reg_info[regno].nrefs > 3
&& bitmap_bit_p (&ebb_global_regs, regno))))

I removed the check that hard_regno is local. It was a parameter to
decrease the aggressiveness of hard register splitting, but I realized that
it contradicts the reason for splitting (solving the 1st insn scheduling
problem), because regional scheduling can move a hard register definition
beyond the BB.
Post by Richard Sandiford
+ for (i = 0;
+ (cl = reg_class_subclasses[allocno_class][i]) != LIM_REG_CLASSES;
+ i++)
+ if (! SECONDARY_MEMORY_NEEDED (cl, hard_reg_class, mode)
+ && ! SECONDARY_MEMORY_NEEDED (hard_reg_class, cl, mode)
+ && TEST_HARD_REG_BIT (reg_class_contents[cl], hard_regno)
+ && (best_cl == NO_REGS
+ || (hard_reg_set_subset_p (reg_class_contents[best_cl],
+ reg_class_contents[cl])
+ && ! hard_reg_set_equal_p (reg_class_contents[best_cl],
+ reg_class_contents[cl]))))
+ best_cl = cl;
OK, so this suggestion isn't backed up by any evidence, but what do
&& (best_cl == NO_REGS
|| (ira_class_hard_regs_num[best_cl]
< ira_class_hard_regs_num[cl]))
which should choose the largest class that requires no secondary memory.
It looks like the subset version could get "stuck" on a single-register
class that happens to be early in the list but has no superclass smaller
than allocno_class.
It seems reasonable. I chose your variant.
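The difference between the two selection rules is easy to see in a toy model (register classes as bit masks of hard registers, "size" judged by hard-register count as ira_class_hard_regs_num would; `nregs` and `pick_best` are invented names):

```c
#include <assert.h>

/* Toy model of the chosen rule, not the GCC code: keep the candidate
   class with the most hard registers.  Unlike the subset test, this
   cannot get stuck on an early single-register class that has no
   small superclass.  */
static int
nregs (unsigned cl)
{
  int n = 0;
  for (; cl != 0; cl &= cl - 1)	/* count set bits */
    n++;
  return n;
}

static unsigned
pick_best (const unsigned *classes, int n)
{
  unsigned best = 0;		/* 0 models best_cl == NO_REGS */
  for (int i = 0; i < n; i++)
    if (best == 0 || nregs (best) < nregs (classes[i]))
      best = classes[i];
  return best;
}
```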
Post by Richard Sandiford
+/* Do split transformation for insn INSN defining or
+ using ORIGINAL_REGNO where the subsequent insn(s) in EBB (remember
+ we traverse insns in the backward direction) for the original regno
+ is NEXT_USAGE_INSNS. The transformations look like
Same suggestion as for the inheritance function above.
Fixed.
Post by Richard Sandiford
+ if (call_save_p)
+ save = emit_spill_move (true, new_reg, original_reg, -1);
+ else
+ {
+ start_sequence ();
+ emit_move_insn (new_reg, original_reg);
+ save = get_insns ();
+ end_sequence ();
+ }
+ if (NEXT_INSN (save) != NULL_RTX)
+ {
+ lra_assert (! call_save_p);
Is emit_spill_move really guaranteed to return only one instruction in
cases where emit_move_insn might not? Both of them use emit_move_insn_1
internally, so I wouldn't have expected much difference.
save = gen_move_insn (new, original_reg);
For a caller save, the modes of new and original_reg can be different.
Post by Richard Sandiford
wouldn't be correct for both.
Same comments for the restore code.
I fixed it using only emit_spill_move.
Post by Richard Sandiford
+ /* See which defined values die here. */
+ for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+ if (reg->type == OP_OUT && ! reg->early_clobber
+ && (! reg->subreg_p
+ || bitmap_bit_p (&lra_bound_pseudos, reg->regno)))
+ bitmap_clear_bit (&live_regs, reg->regno);
+ /* Mark each used value as live. */
+ for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+ if (reg->type == OP_IN
+ && bitmap_bit_p (&check_only_regs, reg->regno))
+ bitmap_set_bit (&live_regs, reg->regno);
+ /* Mark early clobber outputs dead. */
+ for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+ if (reg->type == OP_OUT && reg->early_clobber && ! reg->subreg_p)
+ bitmap_clear_bit (&live_regs, reg->regno);
I don't think this would be correct for unreloaded insns because an
unreloaded insn can have the same pseudo as an input and an earlyclobber
output. (Probably not an issue here, since we're called after the
constraints pass.) There's also the case of matched earlyclobber operands,
where the matched input is specifically not affected by the earlyclobber.
/* See which defined values die here. */
for (reg = curr_id->regs; reg != NULL; reg = reg->next)
if (reg->type == OP_OUT
&& (! reg->subreg_p
|| bitmap_bit_p (&lra_bound_pseudos, reg->regno)))
bitmap_clear_bit (&live_regs, reg->regno);
/* Mark each used value as live. */
for (reg = curr_id->regs; reg != NULL; reg = reg->next)
if (reg->type == OP_IN
&& bitmap_bit_p (&check_only_regs, reg->regno))
bitmap_set_bit (&live_regs, reg->regno);
ought to be correct, but perhaps I'm missing something.
(I'm still uneasy about the special treatment of bound pseudos here.
A clobber really does seem better.)
It seems the right variant. I am going to use it.
Post by Richard Sandiford
+ /* It is quite important to remove dead move insns because it
+ means removing dead store, we don't need to process them for
+ constraints, and unfortunately some subsequent optimizations
+ (like shrink-wrapping) currently based on assumption that
+ there are no trivial dead insns. */
Maybe best to drop the "subsequent optimizations" part. This comment
is unlikely to be updated after any change to shrink-wrapping & co.,
and the first two justifications seem convincing enough on their own.
Fixed.
Post by Richard Sandiford
+/* Add inheritance info REGNO and INSNS. */
+static void
+add_to_inherit (int regno, rtx insns)
+{
+ int i;
+
+ for (i = 0; i < to_inherit_num; i++)
+ if (to_inherit[i].regno == regno)
+ return;
Is the existing "insns" field guaranteed to match the "insns" parameter
in this case, or might they be different? Probably worth an assert or
comment respectively.
I added a comment.
Post by Richard Sandiford
+/* Return first (if FIRST_P) or last non-debug insn in basic block BB.
+ Return null if there are no non-debug insns in the block. */
+static rtx
+get_non_debug_insn (bool first_p, basic_block bb)
+{
+ rtx insn;
+
+ for (insn = first_p ? BB_HEAD (bb) : BB_END (bb);
+ insn != NULL_RTX && ! NONDEBUG_INSN_P (insn);
+ insn = first_p ? NEXT_INSN (insn) : PREV_INSN (insn))
+ ;
+ if (insn != NULL_RTX && BLOCK_FOR_INSN (insn) != bb)
+ insn = NULL_RTX;
+ return insn;
+}
It probably doesn't matter in practice, but it looks like it'd be better
to limit the walk to the bb, rather than walking until null and then
testing the bb after the walk.
Maybe it would be easier to split into two functions, since first_p is
rtx insn;
FOR_BB_INSNS (bb, insn)
if (NONDEBUG_INSN_P (insn))
return insn;
return NULL_RTX;
for first_p. s/FOR_BB_INSNS/FOR_BB_INSNS_REVERSE/ for !first_p.
That is better. Fixed.
Post by Richard Sandiford
+/* Set up RES by registers living on edges FROM except the edge (FROM,
+ TO) or by registers set up in a jump insn in BB FROM. */
+static void
+get_live_on_other_edges (basic_block from, basic_block to, bitmap res)
+{
+ int regno;
+ rtx last;
+ struct lra_insn_reg *reg;
+ edge e;
+ edge_iterator ei;
+
+ lra_assert (to != NULL);
+ bitmap_clear (res);
+ FOR_EACH_EDGE (e, ei, from->succs)
+ if (e->dest != to)
+ bitmap_ior_into (res, DF_LR_IN (e->dest));
+ if ((last = get_non_debug_insn (false, from)) == NULL_RTX || ! JUMP_P (last))
+ return;
+ curr_id = lra_get_insn_recog_data (last);
+ for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+ if (reg->type != OP_IN
+ && (regno = reg->regno) >= FIRST_PSEUDO_REGISTER)
+ bitmap_set_bit (res, regno);
+}
Probably a silly question, sorry, but: why does the JUMP_P part only
include pseudo registers? The other calculations (here and elsewhere)
seem to handle both hard and pseudo registers.
No, that is not a silly question. I think it is a potential bug,
although its probability is very small. Originally, splitting and
inheritance were done only on pseudos. After that, splitting of hard
registers was added and that code was not changed. Thanks, Richard. I
removed this condition.
Post by Richard Sandiford
+/* Do inheritance/split transformations in EBB starting with HEAD and
+ finishing on TAIL. We process EBB insns in the reverse order.
+ Return true if we did any inheritance/split transformation in the
+ EBB.
+
+ We should avoid excessive splitting which results in worse code
+ because of inaccurate cost calculations for spilling new split
+ pseudos in such case. To achieve this we do splitting only if
+ register pressure is high in given basic block and there reload
"...and there are reload"
Fixed.
Post by Richard Sandiford
+ pseudos requiring hard registers. We could do more register
+ pressure calculations at any given program point to avoid necessary
+ splitting even more but it is to expensive and the current approach
+ is well enough. */
"works well enough".
Fixed.
Post by Richard Sandiford
+ change_p = false;
+ curr_usage_insns_check++;
+ reloads_num = calls_num = 0;
+ /* Remember: we can remove the current insn. */
+ bitmap_clear (&check_only_regs);
+ last_processed_bb = NULL;
I couldn't tell which part of the code the comment is referring to.
Maybe left over?
It is a leftover. I removed it.
Post by Richard Sandiford
+ after_p = (last_insn != NULL_RTX && ! JUMP_P (last_insn)
+ && (! CALL_P (last_insn)
+ || (find_reg_note (last_insn,
+ REG_NORETURN, NULL) == NULL_RTX
+ && ((next_insn
+ = next_nonnote_nondebug_insn (last_insn))
+ == NULL_RTX
+ || GET_CODE (next_insn) != BARRIER))));
Genuine question, but: when are the last four lines needed? The condition
that they're testing for sounds like a noreturn call.
Yes, probably you are right. I also cannot imagine such a situation. I
am removing them.
Post by Richard Sandiford
+ if (src_regno < lra_constraint_new_regno_start
+ && src_regno >= FIRST_PSEUDO_REGISTER
+ && reg_renumber[src_regno] < 0
+ && dst_regno >= lra_constraint_new_regno_start
+ && (cl = lra_get_allocno_class (dst_regno)) != NO_REGS)
+ {
+ /* 'reload_pseudo <- original_pseudo'. */
+ reloads_num++;
+ succ_p = false;
+ if (usage_insns[src_regno].check == curr_usage_insns_check
+ && (next_usage_insns = usage_insns[src_regno].insns) != NULL_RTX)
+ succ_p = inherit_reload_reg (false,
+ bitmap_bit_p (&lra_matched_pseudos,
+ dst_regno),
+ src_regno, cl,
+ curr_insn, next_usage_insns);
+ if (succ_p)
+ change_p = true;
+ else
+ {
+ usage_insns[src_regno].check = curr_usage_insns_check;
+ usage_insns[src_regno].insns = curr_insn;
+ usage_insns[src_regno].reloads_num = reloads_num;
+ usage_insns[src_regno].calls_num = calls_num;
+ usage_insns[src_regno].after_p = false;
+ }
Looks like this and other places could use the add_next_usage_insn
helper function.
Fixed.
Post by Richard Sandiford
+ if (cl != NO_REGS
+ && hard_reg_set_subset_p (reg_class_contents[cl],
+ live_hard_regs))
+ IOR_HARD_REG_SET (potential_reload_hard_regs,
+ reg_class_contents[cl]);
Redundant "cl != NO_REGS" check. (Was a bit confused by that at first.)
Fixed.
Post by Richard Sandiford
I don't understand the way potential_reload_hard_regs is set up.
Why does it only include reload pseudos involved in moves of the form
"reload_pseudo <- original_pseudo" and "original_pseudo <- reload_pseudo",
but include those reloads regardless of whether inheritance is possible?
I wondered whether it might be deliberately selective in order to speed
up LRA, but we walk all the registers in an insn regardless.
Same for reloads_num.
These are just simple heuristics, and they can be changed. Adding new
(more complicated) heuristics requires a lot of experiments; this
work can be done later. LRA as it is now is not frozen. Its development
will continue.
Post by Richard Sandiford
+ if (cl != NO_REGS
+ && hard_reg_set_subset_p (reg_class_contents[cl],
+ live_hard_regs))
+ IOR_HARD_REG_SET (potential_reload_hard_regs,
+ reg_class_contents[cl]);
Same comment as for the previous block.
Fixed.
Post by Richard Sandiford
+ if (reg_renumber[dst_regno] < 0
+ || (reg->type == OP_OUT && ! reg->subreg_p))
+ /* Invalidate. */
+ usage_insns[dst_regno].check = 0;
Could you explain this condition a bit more? Why does reg_renumber
affect things?
I added a comment.
Post by Richard Sandiford
+/* This value affects EBB forming. If probability of edge from EBB to
+ a BB is not greater than the following value, we don't add the BB
+ to EBB. */
+#define EBB_PROBABILITY_CUTOFF (REG_BR_PROB_BASE / 2)
It looks like schedule_ebbs uses a higher default cutoff for FDO.
Would the same distinction be useful here?
Maybe schedule_ebbs-like params would be good here too.
I am thinking about adding several parameters for IRA and LRA, but I'd
like to add them for the next release.
Post by Richard Sandiford
+ bitmap_and (&temp_bitmap_head, removed_pseudos, live);
+ EXECUTE_IF_SET_IN_BITMAP (&temp_bitmap_head, 0, regno, bi)
This isn't going to have much effect on compile time, but
EXECUTE_IF_AND_IN_BITMAP avoids the need for a temporary bitmap.
Fixed.
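(The idea behind EXECUTE_IF_AND_IN_BITMAP is simply to walk the set bits of the intersection on the fly instead of allocating a temporary bitmap for it. A minimal standalone sketch of that pattern, with a single 64-bit word standing in for GCC's chunked bitmap lists:)

```c
#include <stdint.h>

/* Visit each bit set in both A and B without materializing A & B in a
   separate bitmap object -- the point of EXECUTE_IF_AND_IN_BITMAP.
   GCC's bitmaps are linked element lists; one word is enough to show
   the pattern.  */
static void
for_each_and_bit (uint64_t a, uint64_t b,
                  void (*visit) (unsigned regno, void *data), void *data)
{
  uint64_t both = a & b;                /* lazy intersection, no alloc */
  while (both != 0)
    {
      unsigned bit = (unsigned) __builtin_ctzll (both); /* lowest set bit */
      visit (bit, data);
      both &= both - 1;                 /* clear that bit */
    }
}
```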
Post by Richard Sandiford
+/* Remove inheritance/split pseudos which are in REMOVE_PSEUDOS and
+ return true if we did any change. The undo transformations for
+ inheritance looks like
+ i <- i2
+ p <- i => p <- i2
+ or removing
+ p <- i, i <- p, and i <- i3
+ where p is original pseudo from which inheritance pseudo i was
+ created, i and i3 are removed inheritance pseudos, i2 is another
+ not removed inheritance pseudo. All split pseudos or other
+ occurrences of removed inheritance pseudos are changed on the
+ corresponding original pseudos. */
+static bool
+remove_inheritance_pseudos (bitmap remove_pseudos)
+{
+ basic_block bb;
+ int regno, sregno, prev_sregno, dregno, restore_regno;
+ rtx set, prev_set, prev_insn;
+ bool change_p, done_p;
+
+ change_p = ! bitmap_empty_p (remove_pseudos);
I wondered from the comment why we couldn't just return straight away
for the empty set, but it looks like the function also schedules a
constraints pass for instructions that keep their inheritance or
split pseudos. Is that right? Might be worth mentioning that
in the function comment if so.
Yes, that is right. I've added a comment.
Post by Richard Sandiford
+ else if (bitmap_bit_p (remove_pseudos, sregno)
+ && bitmap_bit_p (&lra_inheritance_pseudos, sregno))
+ {
+ inherit_or_split_pseudo1 <- inherit_or_split_pseudo2
+ original_pseudo <- inherit_or_split_pseudo1
+ where the 2nd insn is the current insn and
+ inherit_or_split_pseudo2 is not removed. If it is found,
+ original_pseudo1 <- inherit_or_split_pseudo2. */
s/original_pseudo1/original_pseudo/ I think (we don't change the destination).
Yes, it is a typo. Fixed.
Post by Richard Sandiford
+ for (prev_insn = PREV_INSN (curr_insn);
+ prev_insn != NULL_RTX && ! NONDEBUG_INSN_P (prev_insn);
+ prev_insn = PREV_INSN (prev_insn))
+ ;
+ if (prev_insn != NULL_RTX && BLOCK_FOR_INSN (prev_insn) == bb
+ && (prev_set = single_set (prev_insn)) != NULL_RTX
+ /* There should be no subregs in insn we are
+ searching because only the original reg might
+ be in subreg when we changed the mode of
+ load/store for splitting. */
+ && REG_P (SET_DEST (prev_set))
+ && REG_P (SET_SRC (prev_set))
+ && (int) REGNO (SET_DEST (prev_set)) == sregno
+ && ((prev_sregno = REGNO (SET_SRC (prev_set)))
+ >= FIRST_PSEUDO_REGISTER)
+ && (lra_reg_info[sregno].restore_regno
+ == lra_reg_info[prev_sregno].restore_regno)
+ && ! bitmap_bit_p (remove_pseudos, prev_sregno))
I'm sure the restore_regno comparison near the end is correct,
but could you add a comment to explain it? The substitution
itself seems OK either way.
I added a comment.
Post by Richard Sandiford
+ struct lra_insn_reg *reg;
+ bool insn_change_p = false;
+
+ curr_id = lra_get_insn_recog_data (curr_insn);
+ for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+ if ((regno = reg->regno) >= lra_constraint_new_regno_start
+ && lra_reg_info[regno].restore_regno >= 0)
Is the first part of the comparison needed? Most other places don't check,
so it looked at first glance like there was something special here.
I think it is a leftover from older implementation where I worked
differently with restore_reg. I removed.
Post by Richard Sandiford
+ {
+ if (change_p && bitmap_bit_p (remove_pseudos, regno))
+ {
+ restore_regno = lra_reg_info[regno].restore_regno;
+ substitute_pseudo (&curr_insn, regno,
+ regno_reg_rtx[restore_regno]);
+ insn_change_p = true;
+ }
+ else if (NONDEBUG_INSN_P (curr_insn))
+ {
+ lra_push_insn_and_update_insn_regno_info (curr_insn);
+ lra_set_used_insn_alternative_by_uid
+ (INSN_UID (curr_insn), -1);
+ }
+ }
+ if (insn_change_p)
+ {
+ lra_update_insn_regno_info (curr_insn);
+ if (lra_dump_file != NULL)
+ {
+ fprintf (lra_dump_file, " Restore original insn:\n");
+ debug_rtl_slim (lra_dump_file,
+ curr_insn, curr_insn, -1, 0);
+ }
+ }
AIUI we could have a partial restore, keeping some registers but
restoring others. Is that right? The dump entry makes it sound
like a full restore.
Yes, the dump is a bit misleading. It might be a partial restore.
Post by Richard Sandiford
struct lra_insn_reg *reg;
bool restored_regs_p = false;
bool kept_regs_p = false;
curr_id = lra_get_insn_recog_data (curr_insn);
for (reg = curr_id->regs; reg != NULL; reg = reg->next)
{
regno = reg->regno;
restore_regno = lra_reg_info[regno].restore_regno;
if (restore_regno >= 0)
{
if (change_p && bitmap_bit_p (remove_pseudos, regno))
{
substitute_pseudo (&curr_insn, regno,
regno_reg_rtx[restore_regno]);
restored_regs_p = true;
}
else
kept_regs_p = true;
}
}
if (NONDEBUG_INSN_P (curr_insn) && kept_regs_p)
{
/* The instruction has changed since the previous
constraints pass. */
lra_push_insn_and_update_insn_regno_info (curr_insn);
lra_set_used_insn_alternative_by_uid
(INSN_UID (curr_insn), -1);
}
else if (restored_regs_p)
/* The instruction has been restored to the form that
it had during the previous constraints pass. */
lra_update_insn_regno_info (curr_insn);
if (restored_regs_p && lra_dump_file != NULL)
{
fprintf (lra_dump_file,
" Insn after restoring regs:\n");
debug_rtl_slim (lra_dump_file, curr_insn, curr_insn, -1, 0);
}
(if correct) might make the partial case clearer, but that's personal
preference, so please feel free to ignore, chop or change.
I used your variant.
Post by Richard Sandiford
Also, is regno_reg_rtx[restore_regno] always correct when restoring
registers? I thought restore_regno could be a hard register and that
the hard register might not necessarily be used in the same mode as
the regno_reg_rtx[] entry.
Right. I fixed it.
Post by Richard Sandiford
That just leaves lra.h, lra-int.h and lra.c itself. I'm hoping to have
read through those by the middle of next week, but any comments about them
will probably just be banal stuff (even more than my comments so far :-))
so I deliberately left them to last.
Richard, thanks again for your invaluable feedback which made and will
make LRA better.
Richard Sandiford
2012-10-17 11:24:31 UTC
Permalink
Thanks for all the updates.
Post by Vladimir Makarov
Post by Richard Sandiford
+ /* index * scale + disp => new base + index * scale */
+ enum reg_class cl = base_reg_class (mode, as, SCRATCH, SCRATCH);
+
+ lra_assert (INDEX_REG_CLASS != NO_REGS);
+ new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, "disp");
+ lra_assert (GET_CODE (*addr_loc) == PLUS);
+ lra_emit_move (new_reg, *ad.disp_loc);
+ if (CONSTANT_P (XEXP (*addr_loc, 1)))
+ XEXP (*addr_loc, 1) = XEXP (*addr_loc, 0);
+ XEXP (*addr_loc, 0) = new_reg;
The canonical form is (plus (mult ...) (reg)) rather than
(plus (reg) (mult ...)), but it looks like we create the latter.
It might happen because of equiv substitution in LRA.
Post by Richard Sandiford
+ /* Some targets like ARM, accept address operands in
+ specific order -- try exchange them if necessary. */
+ if (! valid_address_p (mode, *addr_loc, as))
+ {
+ exchange_plus_ops (*addr_loc);
+ if (! valid_address_p (mode, *addr_loc, as))
+ exchange_plus_ops (*addr_loc);
+ }
but I think we should try the canonical form first. And I'd prefer it
if we didn't try the other form at all, especially in 4.8. It isn't
really the backend's job to reject non-canonical rtl. This might well
be another case where some targets need a (hopefully small) tweak in
order to play by the rules.
Also, I suppose this section of code feeds back to my question on
Wednesday about the distinction that LRA seems to make between the
(plus (reg X1) (const_int Y1))
(plus (reg X2) (symbol_ref Y2))
It looked like extract_address_regs classified X1 as a base register and
X2 as an index register. The difference between the two constants has
no run-time significance though, and I think we should handle both X1
and X2 as base registers (as I think reload does).
I think the path above would then be specific to scaled indices.
In the original address the "complex" index must come first and the
displacement second. In the modified address, the index would stay
first and the new base register would be second. More below.
As I wrote above, the problem is also that equiv substitution can
create non-canonical forms.
Right. Just in case there's a misunderstanding: I'm not complaining
about these routines internally using forms that are noncanonical
(which could happen because of equiv substitution, like you say).
I just think that what we eventually try to validate should be canonical.
In a way it's similar to how the simplify-rtx.c routines work.

If there are targets that only accept noncanonical rtl (which is after
all just a specific type of invalid rtl), they need to be fixed.
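(For reference, the canonical-form rule under discussion orders the operands of a commutative PLUS by complexity, so the more complex operand, e.g. a MULT, comes first. A toy sketch of that ordering; the codes and rank numbers here are made up for illustration, while GCC's real ordering lives in commutative_operand_precedence in rtlanal.c:)

```c
/* Toy rtx codes, ranked roughly the way GCC ranks them: constants
   lowest, plain objects (REG, SYMBOL_REF) next, arithmetic such as
   MULT highest.  The numeric values are illustrative only.  */
enum toy_code { T_CONST_INT, T_SYMBOL_REF, T_REG, T_MULT };

static int
rank (enum toy_code c)
{
  switch (c)
    {
    case T_CONST_INT:  return -8;
    case T_SYMBOL_REF: return -2;
    case T_REG:        return -1;
    case T_MULT:       return 4;
    default:           return 0;
    }
}

/* Canonical PLUS puts the higher-ranked operand first, so
   (plus (mult ...) (reg)) is canonical and (plus (reg) (mult ...))
   is not.  */
static void
canon_plus (enum toy_code *op0, enum toy_code *op1)
{
  if (rank (*op0) < rank (*op1))
    {
      enum toy_code tmp = *op0;
      *op0 = *op1;
      *op1 = tmp;
    }
}
```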
Post by Vladimir Makarov
Post by Richard Sandiford
+ /* base + scale * index + disp => new base + scale * index */
+ new_reg = base_plus_disp_to_reg (mode, as, &ad);
+ *addr_loc = gen_rtx_PLUS (Pmode, new_reg, *ad.index_loc);
+ if (! valid_address_p (mode, *addr_loc, as))
+ {
+ /* Some targets like ARM, accept address operands in
+ specific order -- try exchange them if necessary. */
+ exchange_plus_ops (*addr_loc);
+ if (! valid_address_p (mode, *addr_loc, as))
+ exchange_plus_ops (*addr_loc);
+ }
Same comment as above about canonical rtl. Here we can have two
registers -- in which case the base should come first -- or a more
complex index -- in which case the index should come first.
We should be able to pass both rtxes to simplify_gen_binary (PLUS, ...),
with the operands in either order, and let it take care of the details.
Using simplify_gen_binary would help with the earlier index+disp case too.
o have code for dealing with non-canonical forms (equiv substitution,
target stupidity)
o always support canonical forms and require them from targets.
I decided to use the 1st variant, but I am reconsidering it. I'll try to
fix this before inclusion, but I am not sure I have time for it. All
these changes make LRA unstable. In fact, I've just found that the changes
I've made so far resulted in 2 broken SPEC2000 tests, although the GCC
testsuite and bootstrap look good.
OK. I'm happy to try fixing the noncanonical thing.
Post by Vladimir Makarov
Post by Richard Sandiford
+ /* If this is post-increment, first copy the location to the reload reg. */
+ if (post && real_in != result)
+ emit_insn (gen_move_insn (result, real_in));
Nit, but real_in != result can never be true AIUI, and I was confused how
the code could be correct in that case. Maybe just remove it, or make
it an assert?
real_in = in == value ? incloc : in;
...
if (cond)
result = incloc;
else
result = ...
if (post && real_in != result)
So it is true if in==value && cond
Sorry, what I meant was that cond is "! post && REG_P (incloc)":

if (! post && REG_P (incloc))
result = incloc;
else
result = lra_create_new_reg (GET_MODE (value), value, new_rclass,
"INC/DEC result");

so it can never be true in the "post" case quoted above.
Post by Vladimir Makarov
Post by Richard Sandiford
+ dest_reg = SET_DEST (set);
+ /* The equivalence pseudo could be set up as SUBREG in a
+ case when it is a call restore insn in a mode
+ different from the pseudo mode. */
+ if (GET_CODE (dest_reg) == SUBREG)
+ dest_reg = SUBREG_REG (dest_reg);
+ if ((REG_P (dest_reg)
+ && (x = get_equiv_substitution (dest_reg)) != dest_reg
+ /* Remove insns which set up a pseudo whose value
+ can not be changed. Such insns might be not in
+ init_insns because we don't update equiv data
+ during insn transformations.
+
+ As an example, let suppose that a pseudo got
+ hard register and on the 1st pass was not
+ changed to equivalent constant. We generate an
+ additional insn setting up the pseudo because of
+ secondary memory movement. Then the pseudo is
+ spilled and we use the equiv constant. In this
+ case we should remove the additional insn and
+ this insn is not init_insns list. */
+ && (! MEM_P (x) || MEM_READONLY_P (x)
+ || in_list_p (curr_insn,
+ ira_reg_equiv
+ [REGNO (dest_reg)].init_insns)))
This is probably a stupid question, sorry, but when do we ever want
if ((REG_P (dest_reg)
&& (x = get_equiv_substitution (dest_reg)) != dest_reg)
An equivalence can be a memory location, which means many different
values can be assigned to that location. Removing the insns would
generate wrong code. For example, if you use your simple variant of the
code, you will have > 100 failures in the GCC testsuite.
OK :-)
Post by Vladimir Makarov
Post by Richard Sandiford
+/* Return true if we need a split for hard register REGNO or pseudo
+ REGNO which was assigned to a hard register.
+ POTENTIAL_RELOAD_HARD_REGS contains hard registers which might be
+ used for reloads since the EBB end. It is an approximation of the
+ used hard registers in the split range. The exact value would
+ require expensive calculations. If we were aggressive with
+ splitting because of the approximation, the split pseudo will save
+ the same hard register assignment and will be removed in the undo
+ pass. We still need the approximation because too aggressive
+ splitting would result in too inaccurate cost calculation in the
+ assignment pass because of too many generated moves which will be
+ probably removed in the undo pass. */
+static inline bool
+need_for_split_p (HARD_REG_SET potential_reload_hard_regs, int regno)
+{
+ int hard_regno = regno < FIRST_PSEUDO_REGISTER ? regno : reg_renumber[regno];
+
+ lra_assert (hard_regno >= 0);
+ return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno)
+ && ! TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno)
+ && (usage_insns[regno].reloads_num
+ + (regno < FIRST_PSEUDO_REGISTER ? 0 : 2) < reloads_num)
+ && ((regno < FIRST_PSEUDO_REGISTER
+ && ! bitmap_bit_p (&ebb_global_regs, regno))
+ || (regno >= FIRST_PSEUDO_REGISTER
+ && lra_reg_info[regno].nrefs > 3
+ && bitmap_bit_p (&ebb_global_regs, regno))))
+ || (regno >= FIRST_PSEUDO_REGISTER && need_for_call_save_p (regno)));
+}
Could you add more commentary about the thinking behind this particular
choice of heuristic? E.g. I wasn't sure what the reloads_num check did,
or why we only split hard registers that are local to the EBB and only
split pseudos that aren't.
The 2 and 3 numbers seemed a bit magic too. I suppose the 2 has
something to do with "one save and one restore", but I wasn't sure
why we applied it only for pseudos. (AIUI that arm of the check
deals with "genuine" split pseudos rather than call saves & restores.)
Still, it says a lot for the high quality of LRA that, out of all the
1000s of lines of code I've read so far, this is the only part that
didn't seem to have an intuitive justification.
Yes, right. I checked many parameters and finally picked out the above
ones. What I found is that aggressive and even moderate splitting
usually result in worse code. Most splitting is undone, and if we split
aggressively we generate a lot of insns which will be removed,
but their costs are taken into account during the assignment pass.
Inheritance has a much bigger chance to be successful. One reason why
splitting is undone is that spilling + inheritance of short-lived
pseudos is in some way a substitute for splitting. Therefore I do
splitting only for long-lived pseudos, that is, non-local pseudos.
I added the following comments.
return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno)
&& ! TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno)
/* We need at least 2 reloads to make pseudo splitting
profitable. We should provide hard regno splitting in
any case to solve 1st insn scheduling problem when
moving hard register definition up might result in
impossibility to find hard register for reload pseudo of
small register class. */
&& (usage_insns[regno].reloads_num
+ (regno < FIRST_PSEUDO_REGISTER ? 0 : 2) < reloads_num)
&& (regno < FIRST_PSEUDO_REGISTER
/* For short living pseudos, spilling + inheritance can
be considered a substitution for splitting.
Therefore we do not splitting for local pseudos. It
decreases also aggressiveness of splitting. The
minimal number of references is chosen taking into
account that for 2 references splitting has no sense
as we can just spill the pseudo. */
|| (regno >= FIRST_PSEUDO_REGISTER
&& lra_reg_info[regno].nrefs > 3
&& bitmap_bit_p (&ebb_global_regs, regno))))
I removed the check that hard_regno is local. It was a parameter to
decrease the aggressiveness of hard register splitting, but I realized
that it contradicts the rationale for splitting (solving the 1st insn
scheduling problem), because regional scheduling can move a hard
register definition beyond the BB.
Thanks, this looks much better to me FWIW.

Richard
Vladimir Makarov
2012-10-19 05:14:37 UTC
Permalink
Post by Richard Sandiford
Thanks for all the updates.
Post by Vladimir Makarov
Post by Richard Sandiford
+ /* index * scale + disp => new base + index * scale */
+ enum reg_class cl = base_reg_class (mode, as, SCRATCH, SCRATCH);
+
+ lra_assert (INDEX_REG_CLASS != NO_REGS);
+ new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, "disp");
+ lra_assert (GET_CODE (*addr_loc) == PLUS);
+ lra_emit_move (new_reg, *ad.disp_loc);
+ if (CONSTANT_P (XEXP (*addr_loc, 1)))
+ XEXP (*addr_loc, 1) = XEXP (*addr_loc, 0);
+ XEXP (*addr_loc, 0) = new_reg;
The canonical form is (plus (mult ...) (reg)) rather than
(plus (reg) (mult ...)), but it looks like we create the latter.
It might happen because of equiv substitution in LRA.
Post by Richard Sandiford
+ /* Some targets like ARM, accept address operands in
+ specific order -- try exchange them if necessary. */
+ if (! valid_address_p (mode, *addr_loc, as))
+ {
+ exchange_plus_ops (*addr_loc);
+ if (! valid_address_p (mode, *addr_loc, as))
+ exchange_plus_ops (*addr_loc);
+ }
but I think we should try the canonical form first. And I'd prefer it
if we didn't try the other form at all, especially in 4.8. It isn't
really the backend's job to reject non-canonical rtl. This might well
be another case where some targets need a (hopefully small) tweak in
order to play by the rules.
Also, I suppose this section of code feeds back to my question on
Wednesday about the distinction that LRA seems to make between the
(plus (reg X1) (const_int Y1))
(plus (reg X2) (symbol_ref Y2))
It looked like extract_address_regs classified X1 as a base register and
X2 as an index register. The difference between the two constants has
no run-time significance though, and I think we should handle both X1
and X2 as base registers (as I think reload does).
I think the path above would then be specific to scaled indices.
In the original address the "complex" index must come first and the
displacement second. In the modified address, the index would stay
first and the new base register would be second. More below.
As I wrote above, the problem is also that equiv substitution can
create non-canonical forms.
Right. Just in case there's a misunderstanding: I'm not complaining
about these routines internally using forms that are noncanonical
(which could happen because of equiv substitution, like you say).
I just think that what we eventually try to validate should be canonical.
In a way it's similar to how the simplify-rtx.c routines work.
If there are targets that only accept noncanonical rtl (which is after
all just a specific type of invalid rtl), they need to be fixed.
Agreed. In order not to forget to fix the targets, I am removing the
operand exchange.
Post by Richard Sandiford
Post by Vladimir Makarov
Post by Richard Sandiford
+ /* base + scale * index + disp => new base + scale * index */
+ new_reg = base_plus_disp_to_reg (mode, as, &ad);
+ *addr_loc = gen_rtx_PLUS (Pmode, new_reg, *ad.index_loc);
+ if (! valid_address_p (mode, *addr_loc, as))
+ {
+ /* Some targets like ARM, accept address operands in
+ specific order -- try exchange them if necessary. */
+ exchange_plus_ops (*addr_loc);
+ if (! valid_address_p (mode, *addr_loc, as))
+ exchange_plus_ops (*addr_loc);
+ }
Same comment as above about canonical rtl. Here we can have two
registers -- in which case the base should come first -- or a more
complex index -- in which case the index should come first.
We should be able to pass both rtxes to simplify_gen_binary (PLUS, ...),
with the operands in either order, and let it take care of the details.
Using simplify_gen_binary would help with the earlier index+disp case too.
o have code for dealing with non-canonical forms (equiv substitution,
target stupidity)
o always support canonical forms and require them from targets.
I decided to use the 1st variant, but I am reconsidering it. I'll try to
fix this before inclusion, but I am not sure I have time for it. All
these changes make LRA unstable. In fact, I've just found that the changes
I've made so far resulted in 2 broken SPEC2000 tests, although the GCC
testsuite and bootstrap look good.
OK. I'm happy to try fixing the noncanonical thing.
Post by Vladimir Makarov
Post by Richard Sandiford
+ /* If this is post-increment, first copy the location to the reload reg. */
+ if (post && real_in != result)
+ emit_insn (gen_move_insn (result, real_in));
Nit, but real_in != result can never be true AIUI, and I was confused how
the code could be correct in that case. Maybe just remove it, or make
it an assert?
real_in = in == value ? incloc : in;
...
if (cond)
result = incloc;
else
result = ...
if (post && real_in != result)
So it is true if in==value && cond
if (! post && REG_P (incloc))
result = incloc;
else
result = lra_create_new_reg (GET_MODE (value), value, new_rclass,
"INC/DEC result");
so it can never be true in the "post" case quoted above.
Fixed.
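(The point of this exchange can be checked mechanically: `result` can only alias `incloc` on the `!post` path, so in the `post` case `real_in != result` always holds and the guard reduces to just `post`. A toy model, with distinct integers standing in for distinct rtxes and names following the quoted code:)

```c
#include <stdbool.h>

/* Model the assignments under discussion.  result == incloc requires
   !post, so when post is true, real_in != result can never be false --
   which is Richard's point about the redundant check.  */
static bool
copy_guard (bool post, bool incloc_is_reg, bool in_is_value)
{
  int incloc = 1, in = 2, new_reg = 3;
  int real_in = in_is_value ? incloc : in;
  int result = (!post && incloc_is_reg) ? incloc : new_reg;
  return post && real_in != result;   /* the guard being simplified */
}
```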
Richard Sandiford
2012-10-15 16:49:40 UTC
Permalink
Hi Vlad,

Some comments about the rest of LRA. Nothing major here...
+/* Info about register in an insn. */
+struct lra_insn_reg
+{
+ /* The biggest mode through which the insn refers to the register
+ (remember the register can be accessed through a subreg in the
+ insn). */
+ ENUM_BITFIELD(machine_mode) biggest_mode : 16;
AFAICT, this is actually always the mode of a specific reference,
and if there are references to the same register in different modes,
those references get their own lra_insn_regs. "mode" might be better
than "biggest_mode" if so.
+/* Static part (common info for insns with the same ICODE) of LRA
+ internal insn info. It exists in at most one exemplar for each
+ non-negative ICODE. Warning: if the structure definition is
+ changed, the initializer for debug_insn_static_data in lra.c should
+ be changed too. */
Probably worth saying (before the warning) that there is also
one structure for each asm.
+/* LRA internal info about an insn (LRA internal insn
+ representation). */
+struct lra_insn_recog_data
+{
+ int icode; /* The insn code. */
+ rtx insn; /* The insn itself. */
+ /* Common data for insns with the same ICODE. */
+ struct lra_static_insn_data *insn_static_data;
Maybe worth mentioning asms here too.
+ /* Two arrays of size correspondingly equal to the operand and the
+ duplication numbers: */
+ rtx **operand_loc; /* The operand locations, NULL if no operands. */
+ rtx **dup_loc; /* The dup locations, NULL if no dups. */
+ /* Number of hard registers implicitly used in given call insn. The
+ value can be NULL or points to array of the hard register numbers
+ ending with a negative value. */
+ int *arg_hard_regs;
+#ifdef HAVE_ATTR_enabled
+ /* Alternative enabled for the insn. NULL for debug insns. */
+ bool *alternative_enabled_p;
+#endif
+ /* The alternative should be used for the insn, -1 if invalid, or we
+ should try to use any alternative, or the insn is a debug
+ insn. */
+ int used_insn_alternative;
+ struct lra_insn_reg *regs; /* Always NULL for a debug insn. */
Comments consistently above the field.
+extern void lra_expand_reg_info (void);
This doesn't exist any more.
+extern int lra_constraint_new_insn_uid_start;
Just saying in case: this seems to be write-only, with lra-constraints.c
instead using a static variable to track the uid start.

I realise you might want to keep it anyway for consistency with
lra_constraint_new_regno_start, or for debugging.
+extern rtx lra_secondary_memory[NUM_MACHINE_MODES];
This doesn't exist any more.
+/* lra-saves.c: */
+
+extern bool lra_save_restore (void);
Same for this file & function.
+/* The function returns TRUE if at least one hard register from ones
+ starting with HARD_REGNO and containing value of MODE are in set
+ HARD_REGSET. */
+static inline bool
+lra_hard_reg_set_intersection_p (int hard_regno, enum machine_mode mode,
+ HARD_REG_SET hard_regset)
+{
+ int i;
+
+ lra_assert (hard_regno >= 0);
+ for (i = hard_regno_nregs[hard_regno][mode] - 1; i >= 0; i--)
+ if (TEST_HARD_REG_BIT (hard_regset, hard_regno + i))
+ return true;
+ return false;
+}
This is the same as overlaps_hard_reg_set_p.
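(What the duplicated helper, and overlaps_hard_reg_set_p itself, computes: does a register set contain any of the consecutive hard registers covering a value? Sketched standalone below, with a uint64_t mask standing in for GCC's HARD_REG_SET and an explicit nregs replacing the target-specific hard_regno_nregs[][] lookup.)

```c
#include <stdbool.h>
#include <stdint.h>

/* True if REGSET contains any of the NREGS hard registers starting at
   HARD_REGNO -- the test both lra_hard_reg_set_intersection_p and
   overlaps_hard_reg_set_p perform.  */
static bool
overlaps_p (uint64_t regset, int hard_regno, int nregs)
{
  for (int i = nregs - 1; i >= 0; i--)
    if (regset & ((uint64_t) 1 << (hard_regno + i)))
      return true;
  return false;
}
```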
+/* Return hard regno and offset of (sub-)register X through arguments
+ HARD_REGNO and OFFSET. If it is not (sub-)register or the hard
+ register is unknown, then return -1 and 0 correspondingly. */
The function seems to return -1 for both.
+/* Add hard registers starting with HARD_REGNO and holding value of
+ MODE to the set S. */
+static inline void
+lra_add_hard_reg_set (int hard_regno, enum machine_mode mode, HARD_REG_SET *s)
+{
+ int i;
+
+ for (i = hard_regno_nregs[hard_regno][mode] - 1; i >= 0; i--)
+ SET_HARD_REG_BIT (*s, hard_regno + i);
+}
This is add_to_hard_reg_set.
+
+ ---------------------
+ | Undo inheritance | --------------- ---------------
+ | for spilled pseudos)| | Memory-memory | | New (and old) |
+ | and splits (for |<----| move coalesce |<-----| pseudos |
+ | pseudos got the | --------------- | assignment |
+ Start | same hard regs) | ---------------
+ | --------------------- ^
+ V | ---------------- |
+ ----------- V | Update virtual | |
+| Remove |----> ------------>| register | |
+| scratches | ^ | displacements | |
+ ----------- | ---------------- |
+ | | |
+ | V New |
+ ---------------- No ------------ pseudos -------------------
+ | Spilled pseudo | change |Constraints:| or insns | Inheritance/split |
+ | to memory |<-------| RTL |--------->| transformations |
+ | substitution | | transfor- | | in EBB scope |
+ ---------------- | mations | -------------------
+ | ------------
+ V
+ -------------------------
+ | Hard regs substitution, |
+ | devirtalization, and |------> Finish
+ | restoring scratches got |
+ | memory |
+ -------------------------
This is a great diagram, thanks.
+/* Create and return a new reg from register FROM corresponding to
+ machine description operand of mode MD_MODE. Initialize its
+ register class to RCLASS. Print message about assigning class
+ RCLASS containing new register name TITLE unless it is NULL. The
+ created register will have unique held value. */
+rtx
+lra_create_new_reg_with_unique_value (enum machine_mode md_mode, rtx original,
+ enum reg_class rclass, const char *title)
Comment says FROM, but parameter is called ORIGINAL. The code copes
with both null and non-register ORIGINALs, which aren't mentinoed in
the comment.
+/* Target checks operands through operand predicates to recognize an
+ insn. We should have a special precaution to generate add insns
+ which are frequent results of elimination.
+
+ Emit insns for x = y + z. X can be used to store intermediate
+ values and should be not in Y and Z when we use x to store an
+ intermediate value. */
I think this should say what Y and Z are allowed to be, since it's
more than just registers and constants.
+/* Map INSN_UID -> the operand alternative data (NULL if unknown). We
+ assume that this data is valid until register info is changed
+ because classes in the data can be changed. */
+struct operand_alternative *op_alt_data[LAST_INSN_CODE];
In that case I think it should be in a target_globals structure,
a bit like target_ira.
+ for (curr = list; curr != NULL; curr = curr->next)
+ if (curr->regno == regno)
+ break;
+ if (curr == NULL || curr->subreg_p != subreg_p
+ || curr->biggest_mode != mode)
+ {
+ /* This is a new hard regno or the info can not be
+ integrated into the found structure. */
+#ifdef STACK_REGS
+ early_clobber
+ = (early_clobber
+ /* This clobber is to inform popping floating
+ point stack only. */
+ && ! (FIRST_STACK_REG <= regno
+ && regno <= LAST_STACK_REG));
+#endif
+ list = new_insn_reg (regno, type, mode, subreg_p,
+ early_clobber, list);
+ }
+ else
+ {
+ if (curr->type != type)
+ curr->type = OP_INOUT;
+ if (curr->early_clobber != early_clobber)
+ curr->early_clobber = true;
+ }
OK, so this is probably only a technicality, but I think this should be:

for (curr = list; curr != NULL; curr = curr->next)
if (curr->regno == regno
&& curr->subreg_p == subreg_p
&& curr->biggest_mode == mode)
{
..reuse..;
return list;
}
..new entry..;
return list;
+ icode = INSN_CODE (insn);
+ if (icode < 0)
+ /* It might be a new simple insn which is not recognized yet. */
+ INSN_CODE (insn) = icode = recog (PATTERN (insn), insn, 0);
Any reason not to use recog_memoized here?
+ n = insn_static_data->n_operands;
+ if (n == 0)
+ locs = NULL;
+ else
+ {
+
+ locs = (rtx **) xmalloc (n * sizeof (rtx *));
+ memcpy (locs, recog_data.operand_loc, n * sizeof (rtx *));
+ }
Excess blank line after "else" (sorry!)
+ /* Some output operand can be recognized only from the context not
+ from the constraints which are empty in this case. Call insn may
+ contain a hard register in set destination with empty constraint
+ and extract_insn treats them as an input. */
+ for (i = 0; i < insn_static_data->n_operands; i++)
+ {
+ int j;
+ rtx pat, set;
+ struct lra_operand_data *operand = &insn_static_data->operand[i];
+
+ /* ??? Should we treat 'X' the same way. It looks to me that
+ 'X' means anything and empty constraint means we do not
+ care. */
FWIW, I think any X output operand has to be "=X" or "+X"; just "X"
would be as wrong as "r". genrecog is supposed to complain about that
for insns, and parse_output_constraint for asms.

So I agree the code is correct in just handling empty constraints.
+/* Update all the insn info about INSN. It is usually called when
+ something in the insn was changed. Return the udpated info. */
Typo: updated.
+ for (i = 0; i < reg_info_size; i++)
+ {
+ bitmap_initialize (&lra_reg_info[i].insn_bitmap, &reg_obstack);
+#ifdef STACK_REGS
+ lra_reg_info[i].no_stack_p = false;
+#endif
+ CLEAR_HARD_REG_SET (lra_reg_info[i].conflict_hard_regs);
+ lra_reg_info[i].preferred_hard_regno1 = -1;
+ lra_reg_info[i].preferred_hard_regno2 = -1;
+ lra_reg_info[i].preferred_hard_regno_profit1 = 0;
+ lra_reg_info[i].preferred_hard_regno_profit2 = 0;
+ lra_reg_info[i].live_ranges = NULL;
+ lra_reg_info[i].nrefs = lra_reg_info[i].freq = 0;
+ lra_reg_info[i].last_reload = 0;
+ lra_reg_info[i].restore_regno = -1;
+ lra_reg_info[i].val = get_new_reg_value ();
+ lra_reg_info[i].copies = NULL;
+ }
The same loop (with a different start index) appears in expand_reg_info.
It'd be nice to factor it out, so that there's only one place to update
if the structure is changed.
+ for (curr = data->regs; curr != NULL; curr = curr->next)
+ if (curr->regno == regno)
+ break;
+ if (curr->subreg_p != subreg_p || curr->biggest_mode != mode)
+ /* The info can not be integrated into the found
+ structure. */
+ data->regs = new_insn_reg (regno, type, mode, subreg_p,
+ early_clobber, data->regs);
+ else
+ {
+ if (curr->type != type)
+ curr->type = OP_INOUT;
+ if (curr->early_clobber != early_clobber)
+ curr->early_clobber = true;
+ }
+ lra_assert (curr != NULL);
+ }
Same loop comment as for collect_non_operand_hard_regs. Maybe another
factoring opportunity.
+ /* Some ports don't recognize the following addresses
+ as legitimate. Although they are legitimate if
+ they satisfies the constraints and will be checked
+ by insn constraints which we ignore here. */
+ && GET_CODE (XEXP (op, 0)) != UNSPEC
+ && GET_CODE (XEXP (op, 0)) != PRE_DEC
+ && GET_CODE (XEXP (op, 0)) != PRE_INC
+ && GET_CODE (XEXP (op, 0)) != POST_DEC
+ && GET_CODE (XEXP (op, 0)) != POST_INC
+ && GET_CODE (XEXP (op, 0)) != PRE_MODIFY
+ && GET_CODE (XEXP (op, 0)) != POST_MODIFY)
GET_RTX_CLASS (GET_CODE (XEXP (op, 0))) == RTX_AUTOINC
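For what it's worth, the collapse can be sketched with a self-contained toy (the enums and class table below are invented for illustration, not GCC's real rtl.h): each code carries a class, so the six per-code comparisons reduce to a single class test, while the UNSPEC check would stay separate.

```c
#include <stdbool.h>

/* Toy model only: in real GCC, GET_RTX_CLASS indexes a generated
   table mapping every rtx code to its class, and the five autoinc
   codes plus the two modify codes all map to RTX_AUTOINC.  */

enum toy_code { TOY_REG, TOY_MEM, TOY_UNSPEC,
                TOY_PRE_DEC, TOY_PRE_INC, TOY_POST_DEC,
                TOY_POST_INC, TOY_PRE_MODIFY, TOY_POST_MODIFY };

enum toy_class { TOY_CLASS_OBJ, TOY_CLASS_EXTRA, TOY_CLASS_AUTOINC };

/* One entry per toy_code, in declaration order.  */
static const enum toy_class toy_class_table[] = {
  TOY_CLASS_OBJ, TOY_CLASS_OBJ, TOY_CLASS_EXTRA,
  TOY_CLASS_AUTOINC, TOY_CLASS_AUTOINC, TOY_CLASS_AUTOINC,
  TOY_CLASS_AUTOINC, TOY_CLASS_AUTOINC, TOY_CLASS_AUTOINC
};

/* Replaces the chain
     code != PRE_DEC && code != PRE_INC && ... && code != POST_MODIFY
   with one table lookup.  */
static bool
toy_autoinc_p (enum toy_code code)
{
  return toy_class_table[code] == TOY_CLASS_AUTOINC;
}
```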
+/* Determine if the current function has an exception receiver block
+ that reaches the exit block via non-exceptional edges */
+static bool
+has_nonexceptional_receiver (void)
+{
+ edge e;
+ edge_iterator ei;
+ basic_block *tos, *worklist, bb;
+
+ /* If we're not optimizing, then just err on the safe side. */
+ if (!optimize)
+ return true;
+
+ /* First determine which blocks can reach exit via normal paths. */
+ tos = worklist = XNEWVEC (basic_block, n_basic_blocks + 1);
+
+ FOR_EACH_BB (bb)
+ bb->flags &= ~BB_REACHABLE;
+
+ /* Place the exit block on our worklist. */
+ EXIT_BLOCK_PTR->flags |= BB_REACHABLE;
+ *tos++ = EXIT_BLOCK_PTR;
+
+ /* Iterate: find everything reachable from what we've already seen. */
+ while (tos != worklist)
+ {
+ bb = *--tos;
+
+ FOR_EACH_EDGE (e, ei, bb->preds)
+ if (!(e->flags & EDGE_ABNORMAL))
+ {
+ basic_block src = e->src;
+
+ if (!(src->flags & BB_REACHABLE))
+ {
+ src->flags |= BB_REACHABLE;
+ *tos++ = src;
+ }
+ }
+ }
+ free (worklist);
+
+ /* Now see if there's a reachable block with an exceptional incoming
+ edge. */
+ FOR_EACH_BB (bb)
+ if (bb->flags & BB_REACHABLE)
+ FOR_EACH_EDGE (e, ei, bb->preds)
+ if (e->flags & EDGE_ABNORMAL)
+ return true;
+
+ /* No exceptional block reached exit unexceptionally. */
+ return false;
+}
Looks like we could just early out on the first loop and get rid
of the second.
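The merged form can be sketched as a self-contained toy (the block/edge structures here are invented, not GCC's CFG types): the backward walk from the exit visits exactly the normally-reachable blocks, so an abnormal predecessor edge seen during the walk can return true immediately.

```c
#include <stdbool.h>

#define MAX_BLOCKS 16

struct toy_edge { int src; bool abnormal; };
struct toy_block { struct toy_edge preds[MAX_BLOCKS]; int n_preds; };

/* Return true if some block that reaches EXIT_BLOCK via normal edges
   has an abnormal incoming edge.  The second scan of the original is
   folded into the worklist walk itself.  */
static bool
toy_has_nonexceptional_receiver (struct toy_block *blocks, int exit_block)
{
  bool reachable[MAX_BLOCKS] = { false };
  int worklist[MAX_BLOCKS];
  int tos = 0;

  reachable[exit_block] = true;
  worklist[tos++] = exit_block;

  while (tos > 0)
    {
      int bb = worklist[--tos];

      for (int i = 0; i < blocks[bb].n_preds; i++)
        {
          struct toy_edge *e = &blocks[bb].preds[i];

          if (e->abnormal)
            /* Early out: BB is already known to reach exit normally,
               so there is no need for a second loop over the CFG.  */
            return true;
          if (!reachable[e->src])
            {
              reachable[e->src] = true;
              worklist[tos++] = e->src;
            }
        }
    }
  return false;
}
```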
+/* Remove all REG_DEAD and REG_UNUSED notes and regenerate REG_INC.
+ We change pseudos by hard registers without notification of DF and
+ that can make the notes obsolete. DF-infrastructure does not deal
+ with REG_INC notes -- so we should regenerate them here. */
These days passes are supposed to regenerate REG_DEAD and REG_UNUSED
notes if they need them, so that part might not be necessary.
The REG_INC bit is still needed though...
+/* Initialize LRA data once per function. */
+void
+lra_init (void)
+{
+ init_op_alt_data ();
+}
I think it's more like:

/* Initialize LRA whenever register-related information is changed. */


In summary, LRA looks really good to me FWIW. Thanks for all your hard work.

Getting rid of reload always seemed like a pipe dream, and if the only
known drawback of this replacement is that it takes a while on extreme
testcases, that's an amazing achievement. (Not to say compile time
isn't important, just that there were so many other hurdles to overcome.)

It looks like opinion has crystalised in favour of merging LRA for 4.8.
I hope that's what happens. I don't see that anything would be gained
by delaying it to 4.9. The code's not going to get any more testing on the
branch that it already has; whenever we merge, the stress test is always
going to be trunk.

Richard
Vladimir Makarov
2012-10-17 19:53:49 UTC
Permalink
Post by Richard Sandiford
Hi Vlad,
Some comments about the rest of LRA. Nothing major here...
+/* Info about register in an insn. */
+struct lra_insn_reg
+{
+ /* The biggest mode through which the insn refers to the register
+ (remember the register can be accessed through a subreg in the
+ insn). */
+ ENUM_BITFIELD(machine_mode) biggest_mode : 16;
AFAICT, this is actually always the mode of a specific reference,
and if there are references to the same register in different modes,
those references get their own lra_insn_regs. "mode" might be better
than "biggest_mode" if so.
It seems mode is also not good. I've just modified the comment to
reflect the fact that it is just a reference.
Post by Richard Sandiford
+/* Static part (common info for insns with the same ICODE) of LRA
+ internal insn info. It exists in at most one exemplar for each
+ non-negative ICODE. Warning: if the structure definition is
+ changed, the initializer for debug_insn_static_data in lra.c should
+ be changed too. */
Probably worth saying (before the warning) that there is also
one structure for each asm.
Good point. I added a comment.
Post by Richard Sandiford
+/* LRA internal info about an insn (LRA internal insn
+ representation). */
+struct lra_insn_recog_data
+{
+ int icode; /* The insn code. */
+ rtx insn; /* The insn itself. */
+ /* Common data for insns with the same ICODE. */
+ struct lra_static_insn_data *insn_static_data;
Maybe worth mentioning asms here too.
Fixed.
Post by Richard Sandiford
+ /* Two arrays of size correspondingly equal to the operand and the
+ duplication numbers: */
+ rtx **operand_loc; /* The operand locations, NULL if no operands. */
+ rtx **dup_loc; /* The dup locations, NULL if no dups. */
+ /* Number of hard registers implicitly used in given call insn. The
+ value can be NULL or points to array of the hard register numbers
+ ending with a negative value. */
+ int *arg_hard_regs;
+#ifdef HAVE_ATTR_enabled
+ /* Alternative enabled for the insn. NULL for debug insns. */
+ bool *alternative_enabled_p;
+#endif
+ /* The alternative should be used for the insn, -1 if invalid, or we
+ should try to use any alternative, or the insn is a debug
+ insn. */
+ int used_insn_alternative;
+ struct lra_insn_reg *regs; /* Always NULL for a debug insn. */
Comments consistently above the field.
Fixed.
Post by Richard Sandiford
+extern void lra_expand_reg_info (void);
This doesn't exist any more.
Fixed.
Post by Richard Sandiford
+extern int lra_constraint_new_insn_uid_start;
Just saying in case: this seems to be write-only, with lra-constraints.c
instead using a static variable to track the uid start.
I realise you might want to keep it anyway for consistency with
lra_constraint_new_regno_start, or for debugging.
+extern rtx lra_secondary_memory[NUM_MACHINE_MODES];
This doesn't exist any more.
Removed. Thanks.
Post by Richard Sandiford
+/* lra-saves.c: */
+
+extern bool lra_save_restore (void);
Same for this file & function.
Removed.
Post by Richard Sandiford
+/* The function returns TRUE if at least one hard register from ones
+ starting with HARD_REGNO and containing value of MODE are in set
+ HARD_REGSET. */
+static inline bool
+lra_hard_reg_set_intersection_p (int hard_regno, enum machine_mode mode,
+ HARD_REG_SET hard_regset)
+{
+ int i;
+
+ lra_assert (hard_regno >= 0);
+ for (i = hard_regno_nregs[hard_regno][mode] - 1; i >= 0; i--)
+ if (TEST_HARD_REG_BIT (hard_regset, hard_regno + i))
+ return true;
+ return false;
+}
This is the same as overlaps_hard_reg_set_p.
I removed it and started to use the function overlaps_hard_reg_set_p.
Post by Richard Sandiford
+/* Return hard regno and offset of (sub-)register X through arguments
+ HARD_REGNO and OFFSET. If it is not (sub-)register or the hard
+ register is unknown, then return -1 and 0 correspondingly. */
The function seems to return -1 for both.
Fixed. It does not matter for the rest of code as offset is used only
when hard_regno >= 0.
Post by Richard Sandiford
+/* Add hard registers starting with HARD_REGNO and holding value of
+ MODE to the set S. */
+static inline void
+lra_add_hard_reg_set (int hard_regno, enum machine_mode mode, HARD_REG_SET *s)
+{
+ int i;
+
+ for (i = hard_regno_nregs[hard_regno][mode] - 1; i >= 0; i--)
+ SET_HARD_REG_BIT (*s, hard_regno + i);
+}
This is add_to_hard_reg_set.
Removed.
Post by Richard Sandiford
+
+ ---------------------
+ | Undo inheritance | --------------- ---------------
+ | for spilled pseudos)| | Memory-memory | | New (and old) |
+ | and splits (for |<----| move coalesce |<-----| pseudos |
+ | pseudos got the | --------------- | assignment |
+ Start | same hard regs) | ---------------
+ | --------------------- ^
+ V | ---------------- |
+ ----------- V | Update virtual | |
+| Remove |----> ------------>| register | |
+| scratches | ^ | displacements | |
+ ----------- | ---------------- |
+ | | |
+ | V New |
+ ---------------- No ------------ pseudos -------------------
+ | Spilled pseudo | change |Constraints:| or insns | Inheritance/split |
+ | to memory |<-------| RTL |--------->| transformations |
+ | substitution | | transfor- | | in EBB scope |
+ ---------------- | mations | -------------------
+ | ------------
+ V
+ -------------------------
+ | Hard regs substitution, |
+ | devirtalization, and |------> Finish
+ | restoring scratches got |
+ | memory |
+ -------------------------
This is a great diagram, thanks.
+/* Create and return a new reg from register FROM corresponding to
+ machine description operand of mode MD_MODE. Initialize its
+ register class to RCLASS. Print message about assigning class
+ RCLASS containing new register name TITLE unless it is NULL. The
+ created register will have unique held value. */
+rtx
+lra_create_new_reg_with_unique_value (enum machine_mode md_mode, rtx original,
+ enum reg_class rclass, const char *title)
Comment says FROM, but parameter is called ORIGINAL. The code copes
with both null and non-register ORIGINALs, which aren't mentioned in
the comment.
Fixed.
Post by Richard Sandiford
+/* Target checks operands through operand predicates to recognize an
+ insn. We should have a special precaution to generate add insns
+ which are frequent results of elimination.
+
+ Emit insns for x = y + z. X can be used to store intermediate
+ values and should be not in Y and Z when we use x to store an
+ intermediate value. */
I think this should say what Y and Z are allowed to be, since it's
more than just registers and constants.
Fixed.
Post by Richard Sandiford
+/* Map INSN_UID -> the operand alternative data (NULL if unknown). We
+ assume that this data is valid until register info is changed
+ because classes in the data can be changed. */
+struct operand_alternative *op_alt_data[LAST_INSN_CODE];
In that case I think it should be in a target_globals structure,
a bit like target_ira.
Fixed.
Post by Richard Sandiford
+ for (curr = list; curr != NULL; curr = curr->next)
+ if (curr->regno == regno)
+ break;
+ if (curr == NULL || curr->subreg_p != subreg_p
+ || curr->biggest_mode != mode)
+ {
+ /* This is a new hard regno or the info can not be
+ integrated into the found structure. */
+#ifdef STACK_REGS
+ early_clobber
+ = (early_clobber
+ /* This clobber is to inform popping floating
+ point stack only. */
+ && ! (FIRST_STACK_REG <= regno
+ && regno <= LAST_STACK_REG));
+#endif
+ list = new_insn_reg (regno, type, mode, subreg_p,
+ early_clobber, list);
+ }
+ else
+ {
+ if (curr->type != type)
+ curr->type = OP_INOUT;
+ if (curr->early_clobber != early_clobber)
+ curr->early_clobber = true;
+ }
for (curr = list; curr != NULL; curr = curr->next)
if (curr->regno == regno
&& curr->subreg_p == subreg_p
&& curr->biggest_mode == mode)
{
..reuse..;
return list;
}
..new entry..;
return list;
Fixed by using your approach. It cannot be a return there because the
return originally comes after a loop covering this code.
Post by Richard Sandiford
+ icode = INSN_CODE (insn);
+ if (icode < 0)
+ /* It might be a new simple insn which is not recognized yet. */
+ INSN_CODE (insn) = icode = recog (PATTERN (insn), insn, 0);
Any reason not to use recog_memoized here?
It will always end up calling recog anyway, as icode < 0. But since it
has a simpler interface, I've changed it.
Post by Richard Sandiford
+ n = insn_static_data->n_operands;
+ if (n == 0)
+ locs = NULL;
+ else
+ {
+
+ locs = (rtx **) xmalloc (n * sizeof (rtx *));
+ memcpy (locs, recog_data.operand_loc, n * sizeof (rtx *));
+ }
Excess blank line after "else" (sorry!)
It looks like it is already fixed.
Post by Richard Sandiford
+ /* Some output operand can be recognized only from the context not
+ from the constraints which are empty in this case. Call insn may
+ contain a hard register in set destination with empty constraint
+ and extract_insn treats them as an input. */
+ for (i = 0; i < insn_static_data->n_operands; i++)
+ {
+ int j;
+ rtx pat, set;
+ struct lra_operand_data *operand = &insn_static_data->operand[i];
+
+ /* ??? Should we treat 'X' the same way. It looks to me that
+ 'X' means anything and empty constraint means we do not
+ care. */
FWIW, I think any X output operand has to be "=X" or "+X"; just "X"
would be as wrong as "r". genrecog is supposed to complain about that
for insns, and parse_output_constraint for asms.
So I agree the code is correct in just handling empty constraints.
Ok.
Post by Richard Sandiford
+/* Update all the insn info about INSN. It is usually called when
+ something in the insn was changed. Return the udpated info. */
Typo: updated.
Fixed.
Post by Richard Sandiford
+ for (i = 0; i < reg_info_size; i++)
+ {
+ bitmap_initialize (&lra_reg_info[i].insn_bitmap, &reg_obstack);
+#ifdef STACK_REGS
+ lra_reg_info[i].no_stack_p = false;
+#endif
+ CLEAR_HARD_REG_SET (lra_reg_info[i].conflict_hard_regs);
+ lra_reg_info[i].preferred_hard_regno1 = -1;
+ lra_reg_info[i].preferred_hard_regno2 = -1;
+ lra_reg_info[i].preferred_hard_regno_profit1 = 0;
+ lra_reg_info[i].preferred_hard_regno_profit2 = 0;
+ lra_reg_info[i].live_ranges = NULL;
+ lra_reg_info[i].nrefs = lra_reg_info[i].freq = 0;
+ lra_reg_info[i].last_reload = 0;
+ lra_reg_info[i].restore_regno = -1;
+ lra_reg_info[i].val = get_new_reg_value ();
+ lra_reg_info[i].copies = NULL;
+ }
The same loop (with a different start index) appears in expand_reg_info.
It'd be nice to factor it out, so that there's only one place to update
if the structure is changed.
Fixed.
Post by Richard Sandiford
+ for (curr = data->regs; curr != NULL; curr = curr->next)
+ if (curr->regno == regno)
+ break;
+ if (curr->subreg_p != subreg_p || curr->biggest_mode != mode)
+ /* The info can not be integrated into the found
+ structure. */
+ data->regs = new_insn_reg (regno, type, mode, subreg_p,
+ early_clobber, data->regs);
+ else
+ {
+ if (curr->type != type)
+ curr->type = OP_INOUT;
+ if (curr->early_clobber != early_clobber)
+ curr->early_clobber = true;
+ }
+ lra_assert (curr != NULL);
+ }
Same loop comment as for collect_non_operand_hard_regs. Maybe another
factoring opportunity.
Fixed.
Post by Richard Sandiford
+ /* Some ports don't recognize the following addresses
+ as legitimate. Although they are legitimate if
+ they satisfies the constraints and will be checked
+ by insn constraints which we ignore here. */
+ && GET_CODE (XEXP (op, 0)) != UNSPEC
+ && GET_CODE (XEXP (op, 0)) != PRE_DEC
+ && GET_CODE (XEXP (op, 0)) != PRE_INC
+ && GET_CODE (XEXP (op, 0)) != POST_DEC
+ && GET_CODE (XEXP (op, 0)) != POST_INC
+ && GET_CODE (XEXP (op, 0)) != PRE_MODIFY
+ && GET_CODE (XEXP (op, 0)) != POST_MODIFY)
GET_RTX_CLASS (GET_CODE (XEXP (op, 0))) == RTX_AUTOINC
Fixed.
Post by Richard Sandiford
+/* Determine if the current function has an exception receiver block
+ that reaches the exit block via non-exceptional edges */
+static bool
+has_nonexceptional_receiver (void)
+{
+ edge e;
+ edge_iterator ei;
+ basic_block *tos, *worklist, bb;
+
+ /* If we're not optimizing, then just err on the safe side. */
+ if (!optimize)
+ return true;
+
+ /* First determine which blocks can reach exit via normal paths. */
+ tos = worklist = XNEWVEC (basic_block, n_basic_blocks + 1);
+
+ FOR_EACH_BB (bb)
+ bb->flags &= ~BB_REACHABLE;
+
+ /* Place the exit block on our worklist. */
+ EXIT_BLOCK_PTR->flags |= BB_REACHABLE;
+ *tos++ = EXIT_BLOCK_PTR;
+
+ /* Iterate: find everything reachable from what we've already seen. */
+ while (tos != worklist)
+ {
+ bb = *--tos;
+
+ FOR_EACH_EDGE (e, ei, bb->preds)
+ if (!(e->flags & EDGE_ABNORMAL))
+ {
+ basic_block src = e->src;
+
+ if (!(src->flags & BB_REACHABLE))
+ {
+ src->flags |= BB_REACHABLE;
+ *tos++ = src;
+ }
+ }
+ }
+ free (worklist);
+
+ /* Now see if there's a reachable block with an exceptional incoming
+ edge. */
+ FOR_EACH_BB (bb)
+ if (bb->flags & BB_REACHABLE)
+ FOR_EACH_EDGE (e, ei, bb->preds)
+ if (e->flags & EDGE_ABNORMAL)
+ return true;
+
+ /* No exceptional block reached exit unexceptionally. */
+ return false;
+}
Looks like we could just early out on the first loop and get rid
of the second.
It seems so. I fixed it. This code came from reload; the code in
reload could be fixed too.
Post by Richard Sandiford
+/* Remove all REG_DEAD and REG_UNUSED notes and regenerate REG_INC.
+ We change pseudos by hard registers without notification of DF and
+ that can make the notes obsolete. DF-infrastructure does not deal
+ with REG_INC notes -- so we should regenerate them here. */
These days passes are supposed to regenerate REG_DEAD and REG_UNUSED
notes if they need them, so that part might not be necessary.
The REG_INC bit is still needed though...
Ok. I fixed it.
Post by Richard Sandiford
+/* Initialize LRA data once per function. */
+void
+lra_init (void)
+{
+ init_op_alt_data ();
+}
/* Initialize LRA whenever register-related information is changed. */
Fixed.
Post by Richard Sandiford
In summary, LRA looks really good to me FWIW. Thanks for all your hard work.
Getting rid of reload always seemed like a pipe dream, and if the only
known drawback of this replacement is that it takes a while on extreme
testcases, that's an amazing achievement. (Not to say compile time
isn't important, just that there were so many other hurdles to overcome.)
It is my second attempt. The first one was the YARA project. I got a lot
of experience from that project and learned how not to do this.
LRA will still be a long-lasting project. I don't think I found all the
weirdness of reload just by trying 8 targets (fixing one bug on one target
frequently resulted in new bugs on other targets, so it frequently
required cardinal changes to the original code). Only after trying the
8 targets did I get the feeling that this approach could work well. There
are still many more targets (and subtargets) to port to LRA. And I hope
that people will help me. I am very glad, and appreciate that you did
such a rigorous review, because it means that you now understand LRA very
well (sometimes I think even better than me).
Post by Richard Sandiford
It looks like opinion has crystalised in favour of merging LRA for 4.8.
I hope that's what happens. I don't see that anything would be gained
by delaying it to 4.9. The code's not going to get any more testing on the
branch that it already has; whenever we merge, the stress test is always
going to be trunk.
Richard, thanks for your invaluable help.
Steven Bosscher
2012-10-17 20:12:01 UTC
Permalink
Post by Richard Sandiford
Getting rid of reload always seemed like a pipe dream, and if the only
known drawback of this replacement is that it takes a while on extreme
testcases, that's an amazing achievement. (Not to say compile time
isn't important, just that there were so many other hurdles to overcome.)
Just to be clear, LRA now does no worse from a compile time POV than,
say, tree-ssa-live. Most of the scalability problems have been
addressed.
It is my second attempt. The first one was the YARA project. I got a lot of
experience from that project and learned how not to do this.
LRA will still be a long-lasting project. I don't think I found all the
weirdness of reload just by trying 8 targets (fixing one bug on one target
frequently resulted in new bugs on other targets, so it frequently required
cardinal changes to the original code). Only after trying the 8
targets did I get the feeling that this approach could work well.
Hats off to you, Vlad, for your years of effort on improving GCC's RA!

Ciao!
Steven