Discussion:
[RFC][PATCH][MIPS] Patch to enable LRA for MIPS backend
(too old to reply)
Robert Suchanek
2014-03-29 01:07:40 UTC
Permalink
Hi All,

This patch enables LRA by default for MIPS. The classic reload is still
available and can be enabled via -mreload switch.

All regression are fixed, with one exception described below.

There was a necessary change in the LRA core as I believe there was a genuine
unhandled case in LRA when processing addresses. It is specific to MIPS16
as store/load[unsigned] halfword/byte instructions cannot access the stack pointer
directly. Potentially, it can affect other architectures if they have similar
limitation. One of the problems showed an RTL that contained $frame as the base register
(without any offset, simple move) but LRA temporarily eliminated it to $sp before
calling the target hook to validate the address.
The backend rejected it because of the mode and $sp. Then, LRA tried to emit base+disp
but ICEd because there never was any displacement. Another testcase, revealed offset
not being used and unnecessary 'add' instructions were inserted preventing the use of offsets.
Marking an insn with STACK_POINTER_REGNUM as valid was not an option as LRA would
generate an insn with $sp and fail during coherency check. The patch attempts to
reload $sp into a register and re-validate the address with offset (if there is one).
If this fails it sticks to the original plan inserting base+disp.

The generated code optimized for size is fairly acceptable. CSiBE shows a slight
advantage of LRA over the reload for MIPS16 with some minor regression for mips32*,
mips64*, on average less than 0.5%. The code size improvements are being investigated.

The patch has been tested on the following variations:
- cross-tested mips-mti-elf, mips-mti-linux-gnu (languages=c,c++):
{-mips32,-mips32r2}{-EL,-EB}{-mhard-float,-msoft-float}{-mno-mips16,-mips16}
-mips64r2 -mabi=n32 {-mhard-float,-msoft-float}
-mips64r2 -mabi=64 {-mhard-float,-msoft-float}
- bootstrapped and regtested x86_84-unknown-linux-gnu (all languages)

There are two known DejaGNU failures on mips64 with mabi=64, namely, m{add,sub}-8 tests
because of the subtleties in LRA costing model but it's not a correctness issue.
The *mul_{add,sub}_si patterns are tuned explicitly for LRA and all failures
have been resolved with m{add,sub}-* except the above. By saying failures I mean
the differences between tests ran with/without -mreload switch. A number of failures
already existed on ToT at the time of testing.

The patch is intended for Stage 1. As for the legal part, the company-wide
copyright assignment is in process.

Regards,
Robert

testsuite/ChangeLog:

2014-03-26 Robert Suchanek <***@imgtec.com>

* lra-constraints.c (base_to_reg): New function.
(process_address): Use new function.
* rtlanal.c (get_base_term): Add CONSTANT_P (*inner).

* config/mips/constraints.md ("d"): BASE_REG_CLASS
replaced by ADDR_REG_CLASS.
* config/mips/mips.c (mips_regno_mode_ok_for_base_p):
Remove use !strict_p for MIPS16.
(mips_register_priority): New function that implements
the target hook TARGET_REGISTER_PRIORITY.
(mips_spill_class): Likewise for TARGET_SPILL_CLASS
(mips_lra_p): Likewise for TARGET_LRA_P.
* config/mips/mips.h (reg_class): Add M16F_REGS and SPILL_REGS
classes.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(BASE_REG_CLASS): Use M16F_REGS.
(ADDR_REG_CLASS): Define.
(IRA_HARD_REGNO_ADD_COST_MULTIPLIER): Define.
* config/mips/mips.md (*mul_acc_si, *mul_sub_si): Add alternative
tuned for LRA. New set attribute to enable alternatives
depending on the register allocator used.
(*and<mode>3_mips16): Remove the load alternatives.
(*lea64): Disable pattern for MIPS16.
* config/mips/mips.opt
(mreload): New option.
---
gcc/config/mips/constraints.md | 2 +-
gcc/config/mips/mips.c | 51 +++++++++++++++++-
gcc/config/mips/mips.h | 17 +++++-
gcc/config/mips/mips.md | 112 +++++++++++++++++++++++-----------------
gcc/config/mips/mips.opt | 4 ++
gcc/lra-constraints.c | 44 +++++++++++++++-
gcc/rtlanal.c | 3 +-
7 files changed, 181 insertions(+), 52 deletions(-)

diff --git gcc/config/mips/constraints.md gcc/config/mips/constraints.md
index 49e4895..3810ac3 100644
--- gcc/config/mips/constraints.md
+++ gcc/config/mips/constraints.md
@@ -19,7 +19,7 @@

;; Register constraints

-(define_register_constraint "d" "BASE_REG_CLASS"
+(define_register_constraint "d" "ADDR_REG_CLASS"
"An address register. This is equivalent to @code{r} unless
generating MIPS16 code.")

diff --git gcc/config/mips/mips.c gcc/config/mips/mips.c
index 143169b..f27a801 100644
--- gcc/config/mips/mips.c
+++ gcc/config/mips/mips.c
@@ -2255,7 +2255,7 @@ mips_regno_mode_ok_for_base_p (int regno, enum machine_mode mode,
All in all, it seems more consistent to only enforce this restriction
during and after reload. */
if (TARGET_MIPS16 && regno == STACK_POINTER_REGNUM)
- return !strict_p || GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
+ return GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;

return TARGET_MIPS16 ? M16_REG_P (regno) : GP_REG_P (regno);
}
@@ -12114,6 +12114,32 @@ mips_register_move_cost (enum machine_mode mode,
return 0;
}

+/* Return a register priority for hard reg REGNO. */
+
+static int
+mips_register_priority (int hard_regno)
+{
+ if (TARGET_MIPS16)
+ {
+ /* Treat MIPS16 registers with higher priority than other regs. */
+ switch (hard_regno)
+ {
+ case 2:
+ case 3:
+ case 4:
+ case 5:
+ case 6:
+ case 7:
+ case 16:
+ case 17:
+ return 1;
+ default:
+ return 0;
+ }
+ }
+ return 0;
+}
+
/* Implement TARGET_MEMORY_MOVE_COST. */

static int
@@ -18897,6 +18923,22 @@ mips_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
*update = build2 (COMPOUND_EXPR, void_type_node, *update,
atomic_feraiseexcept_call);
}
+
+static reg_class_t
+mips_spill_class (reg_class_t rclass ATTRIBUTE_UNUSED,
+ enum machine_mode mode ATTRIBUTE_UNUSED)
+{
+ if (TARGET_MIPS16)
+ return SPILL_REGS;
+ return NO_REGS;
+}
+
+static bool
+mips_lra_p (void)
+{
+ return !TARGET_RELOAD;
+}
+


/* Initialize the GCC target structure. */
#undef TARGET_ASM_ALIGNED_HI_OP
@@ -18960,6 +19002,8 @@ mips_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
#define TARGET_VALID_POINTER_MODE mips_valid_pointer_mode
#undef TARGET_REGISTER_MOVE_COST
#define TARGET_REGISTER_MOVE_COST mips_register_move_cost
+#undef TARGET_REGISTER_PRIORITY
+#define TARGET_REGISTER_PRIORITY mips_register_priority
#undef TARGET_MEMORY_MOVE_COST
#define TARGET_MEMORY_MOVE_COST mips_memory_move_cost
#undef TARGET_RTX_COSTS
@@ -19134,6 +19178,11 @@ mips_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
#undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
#define TARGET_ATOMIC_ASSIGN_EXPAND_FENV mips_atomic_assign_expand_fenv

+#undef TARGET_SPILL_CLASS
+#define TARGET_SPILL_CLASS mips_spill_class
+#undef TARGET_LRA_P
+#define TARGET_LRA_P mips_lra_p
+
struct gcc_target targetm = TARGET_INITIALIZER;


#include "gt-mips.h"
diff --git gcc/config/mips/mips.h gcc/config/mips/mips.h
index a786d4c..712008f 100644
--- gcc/config/mips/mips.h
+++ gcc/config/mips/mips.h
@@ -1871,10 +1871,12 @@ enum reg_class
{
NO_REGS, /* no registers in set */
M16_REGS, /* mips16 directly accessible registers */
+ M16F_REGS, /* mips16 + frame */
T_REG, /* mips16 T register ($24) */
M16_T_REGS, /* mips16 registers plus T register */
PIC_FN_ADDR_REG, /* SVR4 PIC function address register */
V1_REG, /* Register $v1 ($3) used for TLS access. */
+ SPILL_REGS, /* All but $sp and call preserved regs are in here */
LEA_REGS, /* Every GPR except $25 */
GR_REGS, /* integer registers */
FP_REGS, /* floating point registers */
@@ -1908,10 +1910,12 @@ enum reg_class
{ \
"NO_REGS", \
"M16_REGS", \
+ "M16F_REGS", \
"T_REG", \
"M16_T_REGS", \
"PIC_FN_ADDR_REG", \
"V1_REG", \
+ "SPILL_REGS", \
"LEA_REGS", \
"GR_REGS", \
"FP_REGS", \
@@ -1948,10 +1952,12 @@ enum reg_class
{ \
{ 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 }, /* NO_REGS */ \
{ 0x000300fc, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 }, /* M16_REGS */ \
+ { 0x200300fc, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 }, /* M16F_REGS */ \
{ 0x01000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 }, /* T_REG */ \
{ 0x010300fc, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 }, /* M16_T_REGS */ \
{ 0x02000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 }, /* PIC_FN_ADDR_REG */ \
{ 0x00000008, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 }, /* V1_REG */ \
+ { 0x0003fffc, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 }, /* SPILL_REGS */ \
{ 0xfdffffff, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 }, /* LEA_REGS */ \
{ 0xffffffff, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 }, /* GR_REGS */ \
{ 0x00000000, 0xffffffff, 0x00000000, 0x00000000, 0x00000000, 0x00000000 }, /* FP_REGS */ \
@@ -1984,7 +1990,9 @@ enum reg_class
valid base register must belong. A base register is one used in
an address which is the register value plus a displacement. */

-#define BASE_REG_CLASS (TARGET_MIPS16 ? M16_REGS : GR_REGS)
+#define BASE_REG_CLASS (TARGET_MIPS16 ? M16F_REGS : GR_REGS)
+
+#define ADDR_REG_CLASS (TARGET_MIPS16 ? M16_REGS : GR_REGS)

/* A macro whose definition is the name of the class to which a
valid index register must belong. An index register is one used
@@ -1994,6 +2002,13 @@ enum reg_class

#define INDEX_REG_CLASS NO_REGS

+/* Add costs to hard registers based on frequency. This helps to negate
+ some of the reduced cost associated with argument registers which
+ unfairly promotes their use and increases register pressure */
+#define IRA_HARD_REGNO_ADD_COST_MULTIPLIER(REGNO) \
+ (TARGET_MIPS16 && optimize_size \
+ ? ((REGNO) >= 4 && (REGNO) <= 7 ? 2 : 0) : 0)
+
/* We generally want to put call-clobbered registers ahead of
call-saved ones. (IRA expects this.) */

diff --git gcc/config/mips/mips.md gcc/config/mips/mips.md
index 1e3e9e6..ababd5e 100644
--- gcc/config/mips/mips.md
+++ gcc/config/mips/mips.md
@@ -1622,40 +1622,66 @@
;; copy instructions. Reload therefore thinks that the second alternative
;; is two reloads more costly than the first. We add "*?*?" to the first
;; alternative as a counterweight.
+;;
+;; LRA simulates reload but the cost of reloading scratches is lower
+;; than of the classic reload. For the time being, removing the counterweight
+;; for LRA is more profitable.
(define_insn "*mul_acc_si"
- [(set (match_operand:SI 0 "register_operand" "=l*?*?,d?")
- (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d")
- (match_operand:SI 2 "register_operand" "d,d"))
- (match_operand:SI 3 "register_operand" "0,d")))
- (clobber (match_scratch:SI 4 "=X,l"))
- (clobber (match_scratch:SI 5 "=X,&d"))]
+ [(set (match_operand:SI 0 "register_operand" "=l*?*?,l,d?")
+ (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d,d")
+ (match_operand:SI 2 "register_operand" "d,d,d"))
+ (match_operand:SI 3 "register_operand" "0,0,d")))
+ (clobber (match_scratch:SI 4 "=X,X,l"))
+ (clobber (match_scratch:SI 5 "=X,X,&d"))]
"GENERATE_MADD_MSUB && !TARGET_MIPS16"
"@
madd\t%1,%2
+ madd\t%1,%2
#"
[(set_attr "type" "imadd")
(set_attr "accum_in" "3")
(set_attr "mode" "SI")
- (set_attr "insn_count" "1,2")])
+ (set_attr "insn_count" "1,1,2")
+ (set (attr "enabled")
+ (cond [(and (eq_attr "alternative" "0")
+ (match_test "TARGET_RELOAD"))
+ (const_string "yes")
+ (and (eq_attr "alternative" "1")
+ (match_test "!TARGET_RELOAD"))
+ (const_string "yes")
+ (eq_attr "alternative" "2")
+ (const_string "yes")]
+ (const_string "no")))])

;; The same idea applies here. The middle alternative needs one less
;; clobber than the final alternative, so we add "*?" as a counterweight.
(define_insn "*mul_acc_si_r3900"
- [(set (match_operand:SI 0 "register_operand" "=l*?*?,d*?,d?")
- (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d,d")
- (match_operand:SI 2 "register_operand" "d,d,d"))
- (match_operand:SI 3 "register_operand" "0,l,d")))
- (clobber (match_scratch:SI 4 "=X,3,l"))
- (clobber (match_scratch:SI 5 "=X,X,&d"))]
+ [(set (match_operand:SI 0 "register_operand" "=l*?*?,l,d*?,d?")
+ (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d,d,d")
+ (match_operand:SI 2 "register_operand" "d,d,d,d"))
+ (match_operand:SI 3 "register_operand" "0,0,l,d")))
+ (clobber (match_scratch:SI 4 "=X,X,3,l"))
+ (clobber (match_scratch:SI 5 "=X,X,X,&d"))]
"TARGET_MIPS3900 && !TARGET_MIPS16"
"@
madd\t%1,%2
+ madd\t%1,%2
madd\t%0,%1,%2
#"
[(set_attr "type" "imadd")
(set_attr "accum_in" "3")
(set_attr "mode" "SI")
- (set_attr "insn_count" "1,1,2")])
+ (set_attr "insn_count" "1,1,1,2")
+ (set (attr "enabled")
+ (cond [(and (eq_attr "alternative" "0")
+ (match_test "TARGET_RELOAD"))
+ (const_string "yes")
+ (and (eq_attr "alternative" "1")
+ (match_test "!TARGET_RELOAD"))
+ (const_string "yes")
+ (eq_attr "alternative" "2,3")
+ (const_string "yes")]
+ (const_string "no")))])

;; Split *mul_acc_si if both the source and destination accumulator
;; values are GPRs.
@@ -1859,20 +1885,31 @@

;; See the comment above *mul_add_si for details.
(define_insn "*mul_sub_si"
- [(set (match_operand:SI 0 "register_operand" "=l*?*?,d?")
- (minus:SI (match_operand:SI 1 "register_operand" "0,d")
- (mult:SI (match_operand:SI 2 "register_operand" "d,d")
- (match_operand:SI 3 "register_operand" "d,d"))))
- (clobber (match_scratch:SI 4 "=X,l"))
- (clobber (match_scratch:SI 5 "=X,&d"))]
+ [(set (match_operand:SI 0 "register_operand" "=l*?*?,l,d?")
+ (minus:SI (match_operand:SI 1 "register_operand" "0,0,d")
+ (mult:SI (match_operand:SI 2 "register_operand" "d,d,d")
+ (match_operand:SI 3 "register_operand" "d,d,d"))))
+ (clobber (match_scratch:SI 4 "=X,X,l"))
+ (clobber (match_scratch:SI 5 "=X,X,&d"))]
"GENERATE_MADD_MSUB"
"@
msub\t%2,%3
+ msub\t%2,%3
#"
[(set_attr "type" "imadd")
(set_attr "accum_in" "1")
(set_attr "mode" "SI")
- (set_attr "insn_count" "1,2")])
+ (set_attr "insn_count" "1,1,2")
+ (set (attr "enabled")
+ (cond [(and (eq_attr "alternative" "0")
+ (match_test "TARGET_RELOAD"))
+ (const_string "yes")
+ (and (eq_attr "alternative" "1")
+ (match_test "!TARGET_RELOAD"))
+ (const_string "yes")
+ (eq_attr "alternative" "2")
+ (const_string "yes")]
+ (const_string "no")))])

;; Split *mul_sub_si if both the source and destination accumulator
;; values are GPRs.
@@ -2986,31 +3023,14 @@
(set_attr "mode" "<MODE>")])

(define_insn "*and<mode>3_mips16"
- [(set (match_operand:GPR 0 "register_operand" "=d,d,d,d,d")
- (and:GPR (match_operand:GPR 1 "nonimmediate_operand" "%W,W,W,d,0")
- (match_operand:GPR 2 "and_operand" "Yb,Yh,Yw,Yw,d")))]
+ [(set (match_operand:GPR 0 "register_operand" "=d,d")
+ (and:GPR (match_operand:GPR 1 "register_operand" "%d,0")
+ (match_operand:GPR 2 "and_operand" "Yw,d")))]
"TARGET_MIPS16 && and_operands_ok (<MODE>mode, operands[1], operands[2])"
-{
- switch (which_alternative)
- {
- case 0:
- operands[1] = gen_lowpart (QImode, operands[1]);
- return "lbu\t%0,%1";
- case 1:
- operands[1] = gen_lowpart (HImode, operands[1]);
- return "lhu\t%0,%1";
- case 2:
- operands[1] = gen_lowpart (SImode, operands[1]);
- return "lwu\t%0,%1";
- case 3:
- return "#";
- case 4:
- return "and\t%0,%2";
- default:
- gcc_unreachable ();
- }
-}
- [(set_attr "move_type" "load,load,load,shift_shift,logical")
+ "@
+ #
+ and\t%0,%2"
+ [(set_attr "move_type" "shift_shift,logical")
(set_attr "mode" "<MODE>")])

(define_expand "ior<mode>3"
@@ -4139,7 +4159,7 @@
[(set (match_operand:DI 0 "register_operand" "=d")
(match_operand:DI 1 "absolute_symbolic_operand" ""))
(clobber (match_scratch:DI 2 "=&d"))]
- "TARGET_EXPLICIT_RELOCS && ABI_HAS_64BIT_SYMBOLS && cse_not_expected"
+ "!TARGET_MIPS16 && TARGET_EXPLICIT_RELOCS && ABI_HAS_64BIT_SYMBOLS && cse_not_expected"
"#"
"&& reload_completed"
[(set (match_dup 0) (high:DI (match_dup 3)))
diff --git gcc/config/mips/mips.opt gcc/config/mips/mips.opt
index 6ee5398..a044031 100644
--- gcc/config/mips/mips.opt
+++ gcc/config/mips/mips.opt
@@ -380,6 +380,10 @@ msynci
Target Report Mask(SYNCI)
Use synci instruction to invalidate i-cache

+mreload
+Target Report Var(TARGET_RELOAD)
+Use reload instead of lra
+
mtune=
Target RejectNegative Joined Var(mips_tune_option) ToLower Enum(mips_arch_opt_value)
-mtune=PROCESSOR Optimize the output for PROCESSOR
diff --git gcc/lra-constraints.c gcc/lra-constraints.c
index ba4d489..ab85495 100644
--- gcc/lra-constraints.c
+++ gcc/lra-constraints.c
@@ -2615,6 +2615,39 @@ valid_address_p (struct address_info *ad)
return ok_p;
}

+/* Make reload base reg from address AD. */
+static rtx
+base_to_reg (struct address_info *ad)
+{
+ enum reg_class cl;
+ int code = -1;
+ rtx new_inner = NULL_RTX;
+ rtx new_reg = NULL_RTX;
+ rtx insn;
+ rtx last_insn = get_last_insn();
+
+ lra_assert (ad->base == ad->base_term && ad->disp == ad->disp_term);
+ cl = base_reg_class (ad->mode, ad->as, ad->base_outer_code,
+ get_index_code (ad));
+ new_reg = lra_create_new_reg (GET_MODE (*ad->base_term), NULL_RTX,
+ cl, "base");
+ new_inner = simplify_gen_binary (PLUS, GET_MODE (new_reg), new_reg,
+ ad->disp_term == NULL
+ ? gen_int_mode (0, ad->mode)
+ : *ad->disp_term);
+ if (!valid_address_p (ad->mode, new_inner, ad->as))
+ return NULL_RTX;
+ insn = emit_insn (gen_rtx_SET (ad->mode, new_reg, *ad->base_term));
+ code = recog_memoized (insn);
+ if (code < 0)
+ {
+ delete_insns_since (last_insn);
+ return NULL_RTX;
+ }
+
+ return new_inner;
+}
+
/* Make reload base reg + disp from address AD. Return the new pseudo. */
static rtx
base_plus_disp_to_reg (struct address_info *ad)
@@ -2818,6 +2851,8 @@ process_address (int nop, rtx *before, rtx *after)

3) the address is a frame address with an invalid offset.

+ 4) the address is a frame address with an invalid base.
+
All these cases involve a non-autoinc address, so there is no
point revalidating other types. */
if (ad.autoinc_p || valid_address_p (&ad))
@@ -2899,14 +2934,19 @@ process_address (int nop, rtx *before, rtx *after)
int regno;
enum reg_class cl;
rtx set, insns, last_insn;
+ /* Try to reload base into register only if the base is invalid
+ for the address but with valid offset, case (4) above. */
+ start_sequence ();
+ new_reg = base_to_reg (&ad);
+
/* base + disp => new base, cases (1) and (3) above. */
/* Another option would be to reload the displacement into an
index register. However, postreload has code to optimize
address reloads that have the same base and different
displacements, so reloading into an index register would
not necessarily be a win. */
- start_sequence ();
- new_reg = base_plus_disp_to_reg (&ad);
+ if (new_reg == NULL_RTX)
+ new_reg = base_plus_disp_to_reg (&ad);
insns = get_insns ();
last_insn = get_last_insn ();
/* If we generated at least two insns, try last insn source as
diff --git gcc/rtlanal.c gcc/rtlanal.c
index 7a9efec..f699d17 100644
--- gcc/rtlanal.c
+++ gcc/rtlanal.c
@@ -5577,7 +5577,8 @@ get_base_term (rtx *inner)
inner = strip_address_mutations (&XEXP (*inner, 0));
if (REG_P (*inner)
|| MEM_P (*inner)
- || GET_CODE (*inner) == SUBREG)
+ || GET_CODE (*inner) == SUBREG
+ || CONSTANT_P (*inner))
return inner;
return 0;
}
--
1.7.9.5
Richard Sandiford
2014-03-29 10:52:01 UTC
Permalink
First of all, thanks a lot for doing this.
Post by Robert Suchanek
diff --git gcc/config/mips/mips.c gcc/config/mips/mips.c
index 143169b..f27a801 100644
--- gcc/config/mips/mips.c
+++ gcc/config/mips/mips.c
@@ -2255,7 +2255,7 @@ mips_regno_mode_ok_for_base_p (int regno, enum machine_mode mode,
All in all, it seems more consistent to only enforce this restriction
during and after reload. */
if (TARGET_MIPS16 && regno == STACK_POINTER_REGNUM)
- return !strict_p || GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
+ return GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
return TARGET_MIPS16 ? M16_REG_P (regno) : GP_REG_P (regno);
}
Not sure about this one. We would need to update the comment that
explains why "!strict_p" is there, but AFAIK reason (1) would still apply.

Was this needed for correctness or because it gave better code?
Post by Robert Suchanek
+/* Return a register priority for hard reg REGNO. */
+
+static int
+mips_register_priority (int hard_regno)
+{
+ if (TARGET_MIPS16)
+ {
+ /* Treat MIPS16 registers with higher priority than other regs. */
+ switch (hard_regno)
+ {
+ return 1;
+ return 0;
+ }
+ }
+ return 0;
+}
+
Easier as:

if (TARGET_MIPS16
&& TEST_HARD_REG_BIT (reg_class_contents[M16_REGS], hard_regno))
return 1;
return 0;
Post by Robert Suchanek
diff --git gcc/config/mips/mips.h gcc/config/mips/mips.h
index a786d4c..712008f 100644
--- gcc/config/mips/mips.h
+++ gcc/config/mips/mips.h
@@ -1871,10 +1871,12 @@ enum reg_class
{
NO_REGS, /* no registers in set */
M16_REGS, /* mips16 directly accessible registers */
+ M16F_REGS, /* mips16 + frame */
Constraints are supposed to be operating on real registers, after
elimination, so it seems odd to include a fake register. What went
wrong with just M16_REGS?
Post by Robert Suchanek
+ SPILL_REGS, /* All but $sp and call preserved regs are in here */
...
Post by Robert Suchanek
+ { 0x0003fffc, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 }, /* SPILL_REGS */ \
These two don't seem to match. I think literally it would be 0x0300fffc,
but maybe you had to make SPILL_REGS a superset of M16_REGs?
Post by Robert Suchanek
+/* Add costs to hard registers based on frequency. This helps to negate
+ some of the reduced cost associated with argument registers which
+ unfairly promotes their use and increases register pressure */
+#define IRA_HARD_REGNO_ADD_COST_MULTIPLIER(REGNO) \
+ (TARGET_MIPS16 && optimize_size \
+ ? ((REGNO) >= 4 && (REGNO) <= 7 ? 2 : 0) : 0)
So we would be trying to use, say, $4 for the first incoming argument
even after it had been spilled? Hmm.

Since this change is aimed specifically at one heuristic, I wonder
whether we should parameterise that heuristic somehow rather than
try to use a general hook to undo it. But I don't think there's
anything particularly special about MIPS16 here, so maybe it's too
eager for all targets.
Post by Robert Suchanek
diff --git gcc/config/mips/mips.md gcc/config/mips/mips.md
index 1e3e9e6..ababd5e 100644
--- gcc/config/mips/mips.md
+++ gcc/config/mips/mips.md
@@ -1622,40 +1622,66 @@
;; copy instructions. Reload therefore thinks that the second alternative
;; is two reloads more costly than the first. We add "*?*?" to the first
;; alternative as a counterweight.
+;;
+;; LRA simulates reload but the cost of reloading scratches is lower
+;; than of the classic reload. For the time being, removing the counterweight
+;; for LRA is more profitable.
(define_insn "*mul_acc_si"
- [(set (match_operand:SI 0 "register_operand" "=l*?*?,d?")
- (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d")
- (match_operand:SI 2 "register_operand" "d,d"))
- (match_operand:SI 3 "register_operand" "0,d")))
- (clobber (match_scratch:SI 4 "=X,l"))
- (clobber (match_scratch:SI 5 "=X,&d"))]
+ [(set (match_operand:SI 0 "register_operand" "=l*?*?,l,d?")
+ (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d,d")
+ (match_operand:SI 2 "register_operand" "d,d,d"))
+ (match_operand:SI 3 "register_operand" "0,0,d")))
+ (clobber (match_scratch:SI 4 "=X,X,l"))
+ (clobber (match_scratch:SI 5 "=X,X,&d"))]
"GENERATE_MADD_MSUB && !TARGET_MIPS16"
madd\t%1,%2
+ madd\t%1,%2
#"
[(set_attr "type" "imadd")
(set_attr "accum_in" "3")
(set_attr "mode" "SI")
- (set_attr "insn_count" "1,2")])
+ (set_attr "insn_count" "1,1,2")
+ (set (attr "enabled")
+ (cond [(and (eq_attr "alternative" "0")
+ (match_test "TARGET_RELOAD"))
+ (const_string "yes")
+ (and (eq_attr "alternative" "1")
+ (match_test "!TARGET_RELOAD"))
+ (const_string "yes")
+ (eq_attr "alternative" "2")
+ (const_string "yes")]
+ (const_string "no")))])
Looks good. Glad to see the operand 0 hack going away. :-)
Post by Robert Suchanek
@@ -2986,31 +3023,14 @@
(set_attr "mode" "<MODE>")])
(define_insn "*and<mode>3_mips16"
- [(set (match_operand:GPR 0 "register_operand" "=d,d,d,d,d")
- (and:GPR (match_operand:GPR 1 "nonimmediate_operand" "%W,W,W,d,0")
- (match_operand:GPR 2 "and_operand" "Yb,Yh,Yw,Yw,d")))]
+ [(set (match_operand:GPR 0 "register_operand" "=d,d")
+ (and:GPR (match_operand:GPR 1 "register_operand" "%d,0")
+ (match_operand:GPR 2 "and_operand" "Yw,d")))]
"TARGET_MIPS16 && and_operands_ok (<MODE>mode, operands[1], operands[2])"
-{
- switch (which_alternative)
- {
- operands[1] = gen_lowpart (QImode, operands[1]);
- return "lbu\t%0,%1";
- operands[1] = gen_lowpart (HImode, operands[1]);
- return "lhu\t%0,%1";
- operands[1] = gen_lowpart (SImode, operands[1]);
- return "lwu\t%0,%1";
- return "#";
- return "and\t%0,%2";
- gcc_unreachable ();
- }
-}
- [(set_attr "move_type" "load,load,load,shift_shift,logical")
+ #
+ and\t%0,%2"
+ [(set_attr "move_type" "shift_shift,logical")
(set_attr "mode" "<MODE>")])
I think we want to keep the LWU case at the very least, since I assume
otherwise we'll load the full 64-bit spill slot and use a pair of shifts
to zero-extend it. Reloading the stack address into a base register
should always be better than that.

I agree it's less clear for the byte and halfword cases. All other
things -- including size -- being equal, reloading a stack address into
a base register and using an extending load should be better than
reloading the full spill slot and extending it, since the reloaded stack
address is more likely to be reused in other instructions.

The original MIPS16 didn't have ZEB and ZEH, so on the basis above,
I think using LBU and LHU there was probably a win. I can imagine
it making sense to disable LBU and LHU for MIPS16e though.

It might be better to do any changes to this pattern as a follow-on
patch, since I think LRA should cope with it as-is.
Post by Robert Suchanek
@@ -4139,7 +4159,7 @@
[(set (match_operand:DI 0 "register_operand" "=d")
(match_operand:DI 1 "absolute_symbolic_operand" ""))
(clobber (match_scratch:DI 2 "=&d"))]
- "TARGET_EXPLICIT_RELOCS && ABI_HAS_64BIT_SYMBOLS && cse_not_expected"
+ "!TARGET_MIPS16 && TARGET_EXPLICIT_RELOCS && ABI_HAS_64BIT_SYMBOLS && cse_not_expected"
"#"
"&& reload_completed"
[(set (match_dup 0) (high:DI (match_dup 3)))
Minor nit, but please keep within 80 chars:

"!TARGET_MIPS16
&& TARGET_EXPLICIT_RELOCS
&& ABI_HAS_64BIT_SYMBOLS
&& cse_not_expected"
Post by Robert Suchanek
diff --git gcc/lra-constraints.c gcc/lra-constraints.c
index ba4d489..ab85495 100644
--- gcc/lra-constraints.c
+++ gcc/lra-constraints.c
@@ -2615,6 +2615,39 @@ valid_address_p (struct address_info *ad)
return ok_p;
}
+/* Make reload base reg from address AD. */
+static rtx
+base_to_reg (struct address_info *ad)
+{
+ enum reg_class cl;
+ int code = -1;
+ rtx new_inner = NULL_RTX;
+ rtx new_reg = NULL_RTX;
+ rtx insn;
+ rtx last_insn = get_last_insn();
+
+ lra_assert (ad->base == ad->base_term && ad->disp == ad->disp_term);
+ cl = base_reg_class (ad->mode, ad->as, ad->base_outer_code,
+ get_index_code (ad));
+ new_reg = lra_create_new_reg (GET_MODE (*ad->base_term), NULL_RTX,
+ cl, "base");
+ new_inner = simplify_gen_binary (PLUS, GET_MODE (new_reg), new_reg,
+ ad->disp_term == NULL
+ ? gen_int_mode (0, ad->mode)
+ : *ad->disp_term);
+ if (!valid_address_p (ad->mode, new_inner, ad->as))
+ return NULL_RTX;
+ insn = emit_insn (gen_rtx_SET (ad->mode, new_reg, *ad->base_term));
+ code = recog_memoized (insn);
+ if (code < 0)
+ {
+ delete_insns_since (last_insn);
+ return NULL_RTX;
+ }
+
+ return new_inner;
+}
+
/* Make reload base reg + disp from address AD. Return the new pseudo. */
static rtx
base_plus_disp_to_reg (struct address_info *ad)
@@ -2818,6 +2851,8 @@ process_address (int nop, rtx *before, rtx *after)
3) the address is a frame address with an invalid offset.
+ 4) the address is a frame address with an invalid base.
+
All these cases involve a non-autoinc address, so there is no
point revalidating other types. */
if (ad.autoinc_p || valid_address_p (&ad))
@@ -2899,14 +2934,19 @@ process_address (int nop, rtx *before, rtx *after)
int regno;
enum reg_class cl;
rtx set, insns, last_insn;
+ /* Try to reload base into register only if the base is invalid
+ for the address but with valid offset, case (4) above. */
+ start_sequence ();
+ new_reg = base_to_reg (&ad);
+
/* base + disp => new base, cases (1) and (3) above. */
/* Another option would be to reload the displacement into an
index register. However, postreload has code to optimize
address reloads that have the same base and different
displacements, so reloading into an index register would
not necessarily be a win. */
- start_sequence ();
- new_reg = base_plus_disp_to_reg (&ad);
+ if (new_reg == NULL_RTX)
+ new_reg = base_plus_disp_to_reg (&ad);
insns = get_insns ();
last_insn = get_last_insn ();
/* If we generated at least two insns, try last insn source as
Obviously this is Vlad's call, but the idea looks good to me.
Speculatively calling lra_create_new_reg for cases where (4) doesn't
end up applying is probably too wasteful though.
Post by Robert Suchanek
diff --git gcc/rtlanal.c gcc/rtlanal.c
index 7a9efec..f699d17 100644
--- gcc/rtlanal.c
+++ gcc/rtlanal.c
@@ -5577,7 +5577,8 @@ get_base_term (rtx *inner)
inner = strip_address_mutations (&XEXP (*inner, 0));
if (REG_P (*inner)
|| MEM_P (*inner)
- || GET_CODE (*inner) == SUBREG)
+ || GET_CODE (*inner) == SUBREG
+ || CONSTANT_P (*inner))
return inner;
return 0;
}
This doesn't look right; the general form is base+index+displacement+segment,
with the constant term being treated as the displacement.

Thanks,
Richard
Robert Suchanek
2014-04-09 09:59:58 UTC
Permalink
Post by Jakub Jelinek
FYI, all other targets that have LRA optionally selectable or deselectable
use -mno-lra for this (even when -mlra is the default), it would be better
for consistency not to invent new switch names for that.
Agreed.
Post by Jakub Jelinek
Post by Robert Suchanek
- return !strict_p || GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
+ return GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
return TARGET_MIPS16 ? M16_REG_P (regno) : GP_REG_P (regno);
}
Not sure about this one. We would need to update the comment that
explains why "!strict_p" is there, but AFAIK reason (1) would still apply.
Was this needed for correctness or because it gave better code?
"!strict_p" has been removed because of correctness issue. When LRA validates
memory addresses pseudos are temporarily eliminated to hard registers (if possible)
but the target hook is always called as non-strict. This only affects MIPS16 instructions with
not directly accessible $sp. The strict variant, as I understand, was used in the reload
pass to indicate if a pseudo-register has been allocated a hard register. Unless LRA
should be setting the strict/non-strict depending on whether a temporal elimination
to hard reg was successful or there is something else that I missed?
Post by Jakub Jelinek
if (TARGET_MIPS16
&& TEST_HARD_REG_BIT (reg_class_contents[M16_REGS], hard_regno))
return 1;
return 0;
Indeed.
Post by Jakub Jelinek
Post by Robert Suchanek
+ M16F_REGS, /* mips16 + frame */
Constraints are supposed to be operating on real registers, after
elimination, so it seems odd to include a fake register. What went
wrong with just M16_REGS?
Only the stack pointer has been added to M16_REGS. A number of patterns need to accept
it otherwise LRA inserts a lot of reloads and the code size goes up by about 10%.
The change does have also a positive effect on reload but marginally.
"frame" meant to indicate inclusion of both the stack and hard frame pointers in the class
but perhaps I should name it differently to avoid confusion.
Post by Jakub Jelinek
Post by Robert Suchanek
+ SPILL_REGS, /* All but $sp and call preserved regs are in here */
...
Post by Robert Suchanek
+ { 0x0003fffc, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 }, /* SPILL_REGS */ \
These two don't seem to match. I think literally it would be 0x0300fffc,
but maybe you had to make SPILL_REGS a superset of M16_REGs?
I initially used 0x0300fffc but did some experiments and it turned out that 0x0003fffc (with $16, $17 regs)
gives slightly better code. I haven't updated the comment though. There is yet more to do
and need to return to another thread with MIPS16 at some point as I found some limitations
of IRA/LRA to generate better code. $8-$15 are currently inaccessible as temporary storage
because these registers are marked as fixed (when optimizing for size) but leaving them
as fixed are better for the code size. I don't expect a big gain by using hard registers
for spilling but it more likely to improve the performance.
Post by Jakub Jelinek
Post by Robert Suchanek
+/* Add costs to hard registers based on frequency. This helps to negate
+ some of the reduced cost associated with argument registers which
+ unfairly promotes their use and increases register pressure */
+#define IRA_HARD_REGNO_ADD_COST_MULTIPLIER(REGNO) \
+ (TARGET_MIPS16 && optimize_size \
+ ? ((REGNO) >= 4 && (REGNO) <= 7 ? 2 : 0) : 0)
So we would be trying to use, say, $4 for the first incoming argument
even after it had been spilled? Hmm.
Since this change is aimed specifically at one heuristic, I wonder
whether we should parameterise that heuristic somehow rather than
try to use a general hook to undo it. But I don't think there's
anything particularly special about MIPS16 here, so maybe it's too
eager for all targets.
In a number of cases argument registers appeared to be unfairly promoted
increasing the register pressure and increasing the number of reloads.
Bumping up the cost of using those registers encourages IRA to spill
into memory but this appears to help LRA to do a better allocation. Of course,
not always it is a win but generally the gain outweighs the loss.

I've seen an codesize improvements for other optimization levels
but I'm unsure whether we should make this change generic.
This part of the patch is not crucial though and can be send separately.
Post by Jakub Jelinek
Post by Robert Suchanek
(define_insn "*and<mode>3_mips16"
...
I think we want to keep the LWU case at the very least, since I assume
otherwise we'll load the full 64-bit spill slot and use a pair of shifts
to zero-extend it. Reloading the stack address into a base register
should always be better than that.
I agree it's less clear for the byte and halfword cases. All other
things -- including size -- being equal, reloading a stack address into
a base register and using an extending load should be better than
reloading the full spill slot and extending it, since the reloaded stack
address is more likely to be reused in other instructions.
The original MIPS16 didn't have ZEB and ZEH, so on the basis above,
I think using LBU and LHU there was probably a win. I can imagine
it making sense to disable LBU and LHU for MIPS16e though.
It might be better to do any changes to this pattern as a follow-on
patch, since I think LRA should cope with it as-is.
Keeping LWU case should be fine. Alternatives were removed because
LRA was ICEing for the same reason of $sp not accessible for the byte
and halfword cases. I tried to find a testcase to trigger LWU case
but couldn't. I presume it must be a rare case? The problem with other
alternatives was that spilled pseudos were changed to memory without
checking if the use of $sp is valid. On the other hand, keeping LWU
may only delay triggering the problem because the stack pointer
should not be used. It could be a missing case in LRA too.
Post by Jakub Jelinek
"!TARGET_MIPS16
&& TARGET_EXPLICIT_RELOCS
&& ABI_HAS_64BIT_SYMBOLS
&& cse_not_expected"
Overlooked it.
Post by Jakub Jelinek
Post by Robert Suchanek
diff --git gcc/rtlanal.c gcc/rtlanal.c
index 7a9efec..f699d17 100644
--- gcc/rtlanal.c
+++ gcc/rtlanal.c
@@ -5577,7 +5577,8 @@ get_base_term (rtx *inner)
inner = strip_address_mutations (&XEXP (*inner, 0));
if (REG_P (*inner)
|| MEM_P (*inner)
- || GET_CODE (*inner) == SUBREG)
+ || GET_CODE (*inner) == SUBREG
+ || CONSTANT_P (*inner))
return inner;
return 0;
}
This doesn't look right; the general form is base+index+displacement+segment,
with the constant term being treated as the displacement.
I'm not particularly happy with this either. This was an attempt to fix an ICE for
the following RTL (gcc.dg/torture/asm-subreg-1.c compiled with -mips32r2 -mips16 -O1):

(insn 9 8 0 2 (asm_operands/v ("") ("") 0 [
(mem/v/j/c:HI (lo_sum:SI (const:SI (unspec:SI [
(const_int 0 [0])
] UNSPEC_GP))
(symbol_ref:SI ("_const_32") [flags 0x6] <var_decl 0x7f50acd17558 _const_32>)) [0 _const_32+0 S2 A32])]
[(asm_input:HI ("X") (null):0)]
[] asm-subreg-1.c:13) asm-subreg-1.c:13 -1 (nil))

Any suggestions to handle this case?

Regards,
Robert
Richard Sandiford
2014-04-09 21:24:06 UTC
Permalink
Post by Robert Suchanek
Post by Jakub Jelinek
FYI, all other targets that have LRA optionally selectable or deselectable
use -mno-lra for this (even when -mlra is the default), it would be better
for consistency not to invent new switch names for that.
Agreed.
Post by Jakub Jelinek
Post by Robert Suchanek
- return !strict_p || GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
+ return GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
return TARGET_MIPS16 ? M16_REG_P (regno) : GP_REG_P (regno);
}
Not sure about this one. We would need to update the comment that
explains why "!strict_p" is there, but AFAIK reason (1) would still apply.
Was this needed for correctness or because it gave better code?
"!strict_p" has been removed because of correctness issue. When LRA
validates memory addresses pseudos are temporarily eliminated to hard
registers (if possible) but the target hook is always called as
non-strict. This only affects MIPS16 instructions with not directly
accessible $sp. The strict variant, as I understand, was used in the
reload pass to indicate if a pseudo-register has been allocated a hard
register. Unless LRA should be setting the strict/non-strict depending
on whether a temporal elimination to hard reg was successful or there
is something else that I missed?
Hmm, OK, in that case I agree reason (2) doesn't apply. That part was
always more of a consistency thing anyway, so I agree it's not worth
keeping around for reload. I also had a look to see why
instantiate_virtual_regs_in_insn didn't complain about cases like:

struct s { unsigned char c; };
void foo (int, int, int, int, struct s);
void bar (struct s *ptr) { foo (1, 2, 3, 4, *ptr); }

and I think it's because of the later:

2008-02-14 Michael Matz <***@suse.de>

PR target/34930
* function.c (instantiate_virtual_regs_in_insn): Reload address
before falling back to reloading the whole operand.

which correctly reloads the address if necessary.

So yeah, I agree this is right after all, sorry. Let's delete the
comment starting at "There are two problems here:" at the same time.
Post by Robert Suchanek
Post by Jakub Jelinek
Post by Robert Suchanek
+ M16F_REGS, /* mips16 + frame */
Constraints are supposed to be operating on real registers, after
elimination, so it seems odd to include a fake register. What went
wrong with just M16_REGS?
Only the stack pointer has been added to M16_REGS.
Sorry, I'd read "frame" as meaning "$frame", the soft frame pointer.
I agree M16_REGS + $sp is OK.

mips_regno_to_class should then map $sp to the new class, since it's now
the smallest containing class. (We really should set that up automatically
one day...)
Post by Robert Suchanek
A number of patterns need to accept it otherwise LRA inserts a lot of
reloads and the code size goes up by about 10%. The change does have
also a positive effect on reload but marginally. "frame" meant to
indicate inclusion of both the stack and hard frame pointers in the
class but perhaps I should name it differently to avoid confusion.
How about M16_SP_REGS, to match M16_T_REGS?

Also, the BASE_REG_CLASS/ADDR_REG_CLASS distinction isn't all that
obvious from the names. ADDR_REG_CLASS is only needed for the "d"
constraint so maybe we could just use TARGET_MIPS16 ? M16_REGS : GR_REGS
directly for now.
Post by Robert Suchanek
Post by Jakub Jelinek
Post by Robert Suchanek
+ SPILL_REGS, /* All but $sp and call preserved regs are in here */
...
Post by Robert Suchanek
+ { 0x0003fffc, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000 }, /* SPILL_REGS */ \
These two don't seem to match. I think literally it would be 0x0300fffc,
but maybe you had to make SPILL_REGS a superset of M16_REGs?
I initially used 0x0300fffc but did some experiments and it turned out
that 0x0003fffc (with $16, $17 regs) gives slightly better code. I
haven't updated the comment though.
I can imagine including all M16_REGS makes sense, but it seems odd to
drop the 2 temporaries. Does 0x0303fffc have the same problem?
Post by Robert Suchanek
There is yet more to do and need to return to another thread with
MIPS16 at some point as I found some limitations of IRA/LRA to
generate better code. $8-$15 are currently inaccessible as temporary
storage because these registers are marked as fixed (when optimizing
for size) but leaving them as fixed are better for the code size. I
don't expect a big gain by using hard registers for spilling but it
more likely to improve the performance.
Hmm, marking them fixed was supposed to be a temporary reload-only thing,
until the move to LRA. It should never be worse to spill to these GPRs
over spilling to the stack, if the value isn't live across a call.

But that certainly doesn't need to be part of the initial patch.
Post by Robert Suchanek
Post by Jakub Jelinek
Post by Robert Suchanek
+/* Add costs to hard registers based on frequency. This helps to negate
+ some of the reduced cost associated with argument registers which
+ unfairly promotes their use and increases register pressure */
+#define IRA_HARD_REGNO_ADD_COST_MULTIPLIER(REGNO) \
+ (TARGET_MIPS16 && optimize_size \
+ ? ((REGNO) >= 4 && (REGNO) <= 7 ? 2 : 0) : 0)
So we would be trying to use, say, $4 for the first incoming argument
even after it had been spilled? Hmm.
Since this change is aimed specifically at one heuristic, I wonder
whether we should parameterise that heuristic somehow rather than
try to use a general hook to undo it. But I don't think there's
anything particularly special about MIPS16 here, so maybe it's too
eager for all targets.
In a number of cases argument registers appeared to be unfairly promoted
increasing the register pressure and increasing the number of reloads.
Bumping up the cost of using those registers encourages IRA to spill
into memory but this appears to help LRA to do a better allocation. Of course,
not always it is a win but generally the gain outweighs the loss.
I've seen an codesize improvements for other optimization levels
but I'm unsure whether we should make this change generic.
This part of the patch is not crucial though and can be send separately.
OK, thanks, doing it separately sounds better.
Post by Robert Suchanek
Post by Jakub Jelinek
Post by Robert Suchanek
(define_insn "*and<mode>3_mips16"
...
I think we want to keep the LWU case at the very least, since I assume
otherwise we'll load the full 64-bit spill slot and use a pair of shifts
to zero-extend it. Reloading the stack address into a base register
should always be better than that.
I agree it's less clear for the byte and halfword cases. All other
things -- including size -- being equal, reloading a stack address into
a base register and using an extending load should be better than
reloading the full spill slot and extending it, since the reloaded stack
address is more likely to be reused in other instructions.
The original MIPS16 didn't have ZEB and ZEH, so on the basis above,
I think using LBU and LHU there was probably a win. I can imagine
it making sense to disable LBU and LHU for MIPS16e though.
It might be better to do any changes to this pattern as a follow-on
patch, since I think LRA should cope with it as-is.
Keeping LWU case should be fine. Alternatives were removed because
LRA was ICEing for the same reason of $sp not accessible for the byte
and halfword cases. I tried to find a testcase to trigger LWU case
but couldn't. I presume it must be a rare case? The problem with other
alternatives was that spilled pseudos were changed to memory without
checking if the use of $sp is valid. On the other hand, keeping LWU
may only delay triggering the problem because the stack pointer
should not be used. It could be a missing case in LRA too.
Did you see the failures even after your mips_regno_mode_ok_for_base_p
change? LRA should know how to reload a "W" address.
Post by Robert Suchanek
Post by Jakub Jelinek
Post by Robert Suchanek
diff --git gcc/rtlanal.c gcc/rtlanal.c
index 7a9efec..f699d17 100644
--- gcc/rtlanal.c
+++ gcc/rtlanal.c
@@ -5577,7 +5577,8 @@ get_base_term (rtx *inner)
inner = strip_address_mutations (&XEXP (*inner, 0));
if (REG_P (*inner)
|| MEM_P (*inner)
- || GET_CODE (*inner) == SUBREG)
+ || GET_CODE (*inner) == SUBREG
+ || CONSTANT_P (*inner))
return inner;
return 0;
}
This doesn't look right; the general form is base+index+displacement+segment,
with the constant term being treated as the displacement.
I'm not particularly happy with this either. This was an attempt to fix an ICE for
(insn 9 8 0 2 (asm_operands/v ("") ("") 0 [
(mem/v/j/c:HI (lo_sum:SI (const:SI (unspec:SI [
(const_int 0 [0])
] UNSPEC_GP))
(symbol_ref:SI ("_const_32") [flags 0x6] <var_decl 0x7f50acd17558 _const_32>)) [0 _const_32+0 S2 A32])]
[(asm_input:HI ("X") (null):0)]
[] asm-subreg-1.c:13) asm-subreg-1.c:13 -1 (nil))
Any suggestions to handle this case?
Thanks for the pointer. I think this shows a more fundamental problem
with the handling of "X" constraints. With something like:

void
foo (int **x, int y, int z)
{
int *ptr = *x + y * z / 11;
__asm__ __volatile__ ("" : : "X" (*ptr));
}

the entire expression gets treated as a MEM address, which neither
reload nor LRA can handle. And with something like that, it isn't
obvious what class all the registers in the address should have.
With a sufficiently-complicated expression you could run out of registers.

So perhaps we should limit the propagation to things that
decompose_mem_address can handle.

Thanks,
Richard
Richard Sandiford
2014-04-10 20:29:30 UTC
Permalink
Post by Richard Sandiford
Post by Robert Suchanek
I'm not particularly happy with this either. This was an attempt to fix an ICE for
(insn 9 8 0 2 (asm_operands/v ("") ("") 0 [
(mem/v/j/c:HI (lo_sum:SI (const:SI (unspec:SI [
(const_int 0 [0])
] UNSPEC_GP))
(symbol_ref:SI ("_const_32") [flags 0x6] <var_decl 0x7f50acd17558 _const_32>)) [0 _const_32+0 S2 A32])]
[(asm_input:HI ("X") (null):0)]
[] asm-subreg-1.c:13) asm-subreg-1.c:13 -1 (nil))
Any suggestions to handle this case?
Thanks for the pointer. I think this shows a more fundamental problem
void
foo (int **x, int y, int z)
{
int *ptr = *x + y * z / 11;
__asm__ __volatile__ ("" : : "X" (*ptr));
}
the entire expression gets treated as a MEM address, which neither
reload nor LRA can handle. And with something like that, it isn't
obvious what class all the registers in the address should have.
With a sufficiently-complicated expression you could run out of registers.
So perhaps we should limit the propagation to things that
decompose_mem_address can handle.
Even that might be too loose, since invalid scales will need to be reloaded
as a multiplication or shift, and there's no guarantee that the target
can do that without clobbering the flags. So maybe we should do something
like the patch below.

Alternatively we could stick to the decompose_mem_address-based check
above and teach LRA to keep invalid addresses for 'X'. The problem then
is that we might ICE while printing the operand.

Thanks,
Richard


gcc/
* recog.c (asm_operand_ok): Tighten MEM validity for 'X'.

gcc/testsuite/
* gcc.dg/torture/asm-x-constraint-1.c: New test.

Index: gcc/recog.c
===================================================================
--- gcc/recog.c 2014-04-10 21:18:02.778009424 +0100
+++ gcc/recog.c 2014-04-10 21:18:02.996011570 +0100
@@ -1840,7 +1840,11 @@ asm_operand_ok (rtx op, const char *cons
break;

case 'X':
- result = 1;
+ /* Still enforce memory requirements for non-constant addresses,
+ since we can't reload MEMs with completely arbitrary addresses. */
+ result = (!MEM_P (op)
+ || CONSTANT_P (XEXP (op, 0))
+ || memory_operand (op, VOIDmode));
break;

case 'g':
Index: gcc/testsuite/gcc.dg/torture/asm-x-constraint-1.c
===================================================================
--- /dev/null 2014-04-10 19:40:00.640011981 +0100
+++ gcc/testsuite/gcc.dg/torture/asm-x-constraint-1.c 2014-04-10 21:19:05.405623027 +0100
@@ -0,0 +1,6 @@
+void
+foo (int **x, int y, int z)
+{
+ int *ptr = *x + y * z / 11;
+ __asm__ __volatile__ ("foo %0" : : "X" (*ptr));
+}
Robert Suchanek
2014-04-14 11:12:59 UTC
Permalink
Post by Richard Sandiford
So yeah, I agree this is right after all, sorry. Let's delete the
comment starting at "There are two problems here:" at the same time.
Ok.
Post by Richard Sandiford
mips_regno_to_class should then map $sp to the new class, since it's now
the smallest containing class. (We really should set that up automatically
one day...)
Indeed, it is easy to overlook it.
Post by Richard Sandiford
How about M16_SP_REGS, to match M16_T_REGS?
Also, the BASE_REG_CLASS/ADDR_REG_CLASS distinction isn't all that
obvious from the names. ADDR_REG_CLASS is only needed for the "d"
constraint so maybe we could just use TARGET_MIPS16 ? M16_REGS : GR_REGS
directly for now.
Fair enough. I'll remove ADDR_REG_CLASS.
Post by Richard Sandiford
I can imagine including all M16_REGS makes sense, but it seems odd to
drop the 2 temporaries. Does 0x0303fffc have the same problem?
It's a midway between 0x0003fffc and 0x0300fffc. I think 0x0303fffc will be
a good trade-off. The difference between 0x0303fffc and others is about
~600 bytes in CSiBE which is less than 0.01% of code size change.
Post by Richard Sandiford
Hmm, marking them fixed was supposed to be a temporary reload-only thing,
until the move to LRA. It should never be worse to spill to these GPRs
over spilling to the stack, if the value isn't live across a call.
I would say this also affects IRA/LRA integration. I found that it is more
profitable to hide registers (MIPS16 only) in IRA to encourage spilling to
memory. Otherwise $8-$15 would be treated like any other registers and LRA
would inserts reloads to move in/out values of these registers. My assumption is
that if we could hide some of the registers in IRA but enable them in LRA
then all registers in SPILL_REGS would be available keeping reasonable code
size. Another way would be to increase the cost of moving values between
M16_REGS and GR_REGS but it was already mentioned, and is true that there
should be no difference of costs and it feels like a hack to make things work.
Post by Richard Sandiford
Did you see the failures even after your mips_regno_mode_ok_for_base_p
change? LRA should know how to reload a "W" address.
Yes but I realize there is more. It fails because $sp is now included
in BASE_REG_CLASS and "W" is based on it. However, I suppose that
it would be too eager to say it is wrong and likely there is something missing
in LRA if we want to keep all alternatives. Currently there is no check
if a reloaded operand has a valid address, use of $sp in lbu/lhu cases.
Even if we added extra checks we are less likely to benefit as we need
to reload the base into register.
Post by Richard Sandiford
Even that might be too loose, since invalid scales will need to be reloaded
as a multiplication or shift, and there's no guarantee that the target
can do that without clobbering the flags. So maybe we should do something
like the patch below.
Alternatively we could stick to the decompose_mem_address-based check
above and teach LRA to keep invalid addresses for 'X'. The problem then
is that we might ICE while printing the operand.
Tightening validity for 'X' appears to be reasonable. Will you commit this patch?

Regards,
Robert
Richard Sandiford
2014-05-03 19:21:29 UTC
Permalink
Not sure how the constraint would/should exclude $sp-based address in
LRA. In this particular case, a spilled pseudo is changed to memory
(insn 30 29 31 4 (set (reg:SI 4 $4)
(and:SI (mem/c:SI (plus:SI (reg/f:SI 78 $frame)
(const_int 16 [0x10])) [7 %sfp+16 S4 A32])
(const_int 65535 [0xffff]))) shell.i:17 161 {*andsi3_mips16}
(expr_list:REG_DEAD (reg:SI 194 [ D.1469 ])
(nil)))
The operand 1 during alternative selection is not marked as a bad
operand as it is a memory operand. $frame appears to be fine as it
could be eliminated later to hard register. No reloads are inserted
for the instructions concerned. Unless, $frame should be temporarily
eliminated and then a reload would be inserted?
Yeah, I think the lack of elimination is the problem. process_address
eliminates $frame temporarily before checking whether the address
is valid, but the places that check EXTRA_CONSTRAINT_STR pass the
original uneliminated address. So the legitimate_address_p hook sees
the $sp-based address but the "W" constraint only sees the $frame-based
address (which might or might not be valid, depending on whether $frame
is eliminated to the stack or hard frame pointer). I think the constraints
should see the eliminated address too.
This patch seems to fix it for me. Tested on x86_64-linux-gnu.
Vlad, is this OK for trunk?
#define MODE_BASE_REG_CLASS(MODE) \
(TARGET_MIPS16 \
? ((MODE) == SImode || (MODE) == DImode ? M16_SP_REGS : M16_REGS) \
: GR_REGS)
instead of BASE_REG_CLASS. It might lead to slightly better code
(or not -- if it doesn't then don't bother :-)).
If this patch is OK then I think the only thing blocking the switch
to LRA is the asm-subreg-1.c failure. I think it'd be fine to XFAIL
that test on MIPS for now, until there's a consensus about what "X" means
for asms.
gcc/
* lra-constraints.c (valid_address_p): Move earlier in file.
Add a constraint argument to the address_info version.
(satisfies_memory_constraint_p): New function.
(satisfies_address_constraint_p): Likewise.
(process_alt_operands, curr_insn_transform): Use them.
(process_address): Pass the constraint to valid_address_p when
checking address operands.
Yes, it looks ok for me, Richard. Thanks on working on this.
I am on vacation till May 4th. If the patch results in problems on other
targets, I hope you revert it. But to be honest, I believe it is very
safe and don't expect any problems at all.
Thanks Vlad, belatedly committed on that basis. Like you say I'll revert
it at the first sign of trouble (although it ended up being closer to
your return than originally intended. :-))

Richard
Kyrill Tkachov
2014-05-06 14:06:23 UTC
Permalink
Post by Richard Sandiford
Not sure how the constraint would/should exclude $sp-based address in
LRA. In this particular case, a spilled pseudo is changed to memory
(insn 30 29 31 4 (set (reg:SI 4 $4)
(and:SI (mem/c:SI (plus:SI (reg/f:SI 78 $frame)
(const_int 16 [0x10])) [7 %sfp+16 S4 A32])
(const_int 65535 [0xffff]))) shell.i:17 161 {*andsi3_mips16}
(expr_list:REG_DEAD (reg:SI 194 [ D.1469 ])
(nil)))
The operand 1 during alternative selection is not marked as a bad
operand as it is a memory operand. $frame appears to be fine as it
could be eliminated later to hard register. No reloads are inserted
for the instructions concerned. Unless, $frame should be temporarily
eliminated and then a reload would be inserted?
Yeah, I think the lack of elimination is the problem. process_address
eliminates $frame temporarily before checking whether the address
is valid, but the places that check EXTRA_CONSTRAINT_STR pass the
original uneliminated address. So the legitimate_address_p hook sees
the $sp-based address but the "W" constraint only sees the $frame-based
address (which might or might not be valid, depending on whether $frame
is eliminated to the stack or hard frame pointer). I think the constraints
should see the eliminated address too.
This patch seems to fix it for me. Tested on x86_64-linux-gnu.
Vlad, is this OK for trunk?
#define MODE_BASE_REG_CLASS(MODE) \
(TARGET_MIPS16 \
? ((MODE) == SImode || (MODE) == DImode ? M16_SP_REGS : M16_REGS) \
: GR_REGS)
instead of BASE_REG_CLASS. It might lead to slightly better code
(or not -- if it doesn't then don't bother :-)).
If this patch is OK then I think the only thing blocking the switch
to LRA is the asm-subreg-1.c failure. I think it'd be fine to XFAIL
that test on MIPS for now, until there's a consensus about what "X" means
for asms.
gcc/
* lra-constraints.c (valid_address_p): Move earlier in file.
Add a constraint argument to the address_info version.
(satisfies_memory_constraint_p): New function.
(satisfies_address_constraint_p): Likewise.
(process_alt_operands, curr_insn_transform): Use them.
(process_address): Pass the constraint to valid_address_p when
checking address operands.
Yes, it looks ok for me, Richard. Thanks on working on this.
I am on vacation till May 4th. If the patch results in problems on other
targets, I hope you revert it. But to be honest, I believe it is very
safe and don't expect any problems at all.
Thanks Vlad, belatedly committed on that basis. Like you say I'll revert
it at the first sign of trouble (although it ended up being closer to
your return than originally intended. :-))
Hi all,
This caused some testsuite failures on arm:
FAIL: gcc.target/arm/vfp-ldmdbd.c scan-assembler fldmdbd
FAIL: gcc.target/arm/vfp-ldmdbs.c scan-assembler fldmdbs
FAIL: gcc.target/arm/vfp-ldmiad.c scan-assembler fldmiad
FAIL: gcc.target/arm/vfp-ldmias.c scan-assembler fldmias

From the vfp-ldmdbd.c test this patch changed the codegen from:
fldmdbd r5!, {d7}

into
sub r5, r5, #8
fldd d7, [r5]

Which broke the test.

Kyrill
Richard Sandiford
2014-05-06 19:22:52 UTC
Permalink
Post by Kyrill Tkachov
Post by Richard Sandiford
Not sure how the constraint would/should exclude $sp-based address in
LRA. In this particular case, a spilled pseudo is changed to memory
(insn 30 29 31 4 (set (reg:SI 4 $4)
(and:SI (mem/c:SI (plus:SI (reg/f:SI 78 $frame)
(const_int 16 [0x10])) [7 %sfp+16 S4 A32])
(const_int 65535 [0xffff]))) shell.i:17 161 {*andsi3_mips16}
(expr_list:REG_DEAD (reg:SI 194 [ D.1469 ])
(nil)))
The operand 1 during alternative selection is not marked as a bad
operand as it is a memory operand. $frame appears to be fine as it
could be eliminated later to hard register. No reloads are inserted
for the instructions concerned. Unless, $frame should be temporarily
eliminated and then a reload would be inserted?
Yeah, I think the lack of elimination is the problem. process_address
eliminates $frame temporarily before checking whether the address
is valid, but the places that check EXTRA_CONSTRAINT_STR pass the
original uneliminated address. So the legitimate_address_p hook sees
the $sp-based address but the "W" constraint only sees the $frame-based
address (which might or might not be valid, depending on whether $frame
is eliminated to the stack or hard frame pointer). I think the constraints
should see the eliminated address too.
This patch seems to fix it for me. Tested on x86_64-linux-gnu.
Vlad, is this OK for trunk?
#define MODE_BASE_REG_CLASS(MODE) \
(TARGET_MIPS16 \
? ((MODE) == SImode || (MODE) == DImode ? M16_SP_REGS : M16_REGS) \
: GR_REGS)
instead of BASE_REG_CLASS. It might lead to slightly better code
(or not -- if it doesn't then don't bother :-)).
If this patch is OK then I think the only thing blocking the switch
to LRA is the asm-subreg-1.c failure. I think it'd be fine to XFAIL
that test on MIPS for now, until there's a consensus about what "X" means
for asms.
gcc/
* lra-constraints.c (valid_address_p): Move earlier in file.
Add a constraint argument to the address_info version.
(satisfies_memory_constraint_p): New function.
(satisfies_address_constraint_p): Likewise.
(process_alt_operands, curr_insn_transform): Use them.
(process_address): Pass the constraint to valid_address_p when
checking address operands.
Yes, it looks ok for me, Richard. Thanks on working on this.
I am on vacation till May 4th. If the patch results in problems on other
targets, I hope you revert it. But to be honest, I believe it is very
safe and don't expect any problems at all.
Thanks Vlad, belatedly committed on that basis. Like you say I'll revert
it at the first sign of trouble (although it ended up being closer to
your return than originally intended. :-))
Hi all,
FAIL: gcc.target/arm/vfp-ldmdbd.c scan-assembler fldmdbd
FAIL: gcc.target/arm/vfp-ldmdbs.c scan-assembler fldmdbs
FAIL: gcc.target/arm/vfp-ldmiad.c scan-assembler fldmiad
FAIL: gcc.target/arm/vfp-ldmias.c scan-assembler fldmias
fldmdbd r5!, {d7}
into
sub r5, r5, #8
fldd d7, [r5]
Which broke the test.
Sorry for the breakage. I've reverted the patch for now and will send a
fixed version when I have time.

Thanks,
Richard
Matthew Fortune
2014-05-09 22:58:32 UTC
Permalink
Richard/Vlad,
...snip...
Post by Richard Sandiford
Post by Kyrill Tkachov
Hi all,
FAIL: gcc.target/arm/vfp-ldmdbd.c scan-assembler fldmdbd
FAIL: gcc.target/arm/vfp-ldmdbs.c scan-assembler fldmdbs
FAIL: gcc.target/arm/vfp-ldmiad.c scan-assembler fldmiad
FAIL: gcc.target/arm/vfp-ldmias.c scan-assembler fldmias
fldmdbd r5!, {d7}
into
sub r5, r5, #8
fldd d7, [r5]
Which broke the test.
Sorry for the breakage. I've reverted the patch for now and will send a
fixed version when I have time.
The problem appears to lie with the new satisfies_memory_constraint_p
which is passing just the address to valid_address_p but the constraint
is a memory constraint (which wants the mem to be retained).

One option is to re-construct the MEM using the address_info provided
to valid_address_p. This has mode, address space and whether it was
actually a MEM to start with so there is enough information.

Another issue noticed while looking at this:
process_address does not seem to make an attempt to check for
EXTRA_MEMORY_CONSTRAINT even though it does decompose memory addresses.
For the lea address case then the address is checked with valid_address_p.
This seems inconsistent, is it intentional?

The patch below on top of Richard's addresses both problems and for the
fldmdbd test gets the correct output. I haven't done any more testing
than that though. I suspect there may be a better approach to achieve
the same effect but at least this shows what is wrong with the original
change.

Regards,
Matthew

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index f59bf55..22bb273 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -348,6 +348,9 @@ valid_address_p (struct address_info *ad, const char *constraint = 0)
rtx saved_index_reg = NULL_RTX;
rtx *base_term = strip_subreg (ad->base_term);
rtx *index_term = strip_subreg (ad->index_term);
+#ifdef EXTRA_CONSTRAINT_STR
+ rtx orig_op = NULL_RTX;
+#endif
if (base_term != NULL)
{
saved_base_reg = *base_term;
@@ -360,9 +363,18 @@ valid_address_p (struct address_info *ad, const char *constraint = 0)
saved_index_reg = *index_term;
lra_eliminate_reg_if_possible (index_term);
}
+#ifdef EXTRA_CONSTRAINT_STR
+ if (ad->addr_outer_code == MEM)
+ {
+ orig_op = gen_rtx_MEM (ad->mode, *ad->outer);
+ MEM_ADDR_SPACE (orig_op) = ad->as;
+ }
+ else
+ orig_op = *ad->outer;
+#endif
bool ok_p = (constraint
#ifdef EXTRA_CONSTRAINT_STR
- ? EXTRA_CONSTRAINT_STR (*ad->outer, constraint[0], constraint)
+ ? EXTRA_CONSTRAINT_STR (orig_op, constraint[0], constraint)
#else
? false
#endif
@@ -2865,7 +2877,8 @@ process_address (int nop, rtx *before, rtx *after)
/* Target hooks sometimes reject extra constraint addresses -- use
EXTRA_CONSTRAINT_STR for the validation. */
if (constraint[0] != 'p'
- && EXTRA_ADDRESS_CONSTRAINT (constraint[0], constraint)
+ && (EXTRA_ADDRESS_CONSTRAINT (constraint[0], constraint)
+ || EXTRA_MEMORY_CONSTRAINT (constraint[0], constraint))
&& valid_address_p (&ad, constraint))
return change_p;
#endif
Richard Sandiford
2014-05-10 18:44:29 UTC
Permalink
Thanks for looking at this.
Post by Matthew Fortune
Post by Richard Sandiford
Post by Kyrill Tkachov
Hi all,
FAIL: gcc.target/arm/vfp-ldmdbd.c scan-assembler fldmdbd
FAIL: gcc.target/arm/vfp-ldmdbs.c scan-assembler fldmdbs
FAIL: gcc.target/arm/vfp-ldmiad.c scan-assembler fldmiad
FAIL: gcc.target/arm/vfp-ldmias.c scan-assembler fldmias
fldmdbd r5!, {d7}
into
sub r5, r5, #8
fldd d7, [r5]
Which broke the test.
Sorry for the breakage. I've reverted the patch for now and will send a
fixed version when I have time.
The problem appears to lie with the new satisfies_memory_constraint_p
which is passing just the address to valid_address_p but the constraint
is a memory constraint (which wants the mem to be retained).
Yeah.
Post by Matthew Fortune
One option is to re-construct the MEM using the address_info provided
to valid_address_p. This has mode, address space and whether it was
actually a MEM to start with so there is enough information.
We don't want to create garbage rtl though. The substitution happens
in-place, so the routine does temporarily turn the original MEM into the
eliminated form.

My first reaction after seeing the bug was to pass the operand in as
another parameter to valid_address_p. That made the interface a bit
ridiculous though, so I didn't post it and reverted the original patch
instead.

I think a cleaner way of doing it would be to have helper functions
that switch in and out of the eliminated form, storing the old form
in fields of a new structure (either separate from address_info,
or a local inheritance of it). We probably also want to have arrays
of address_infos, one for each operand, so that we don't analyse the
same address too many times during the same insn.
Post by Matthew Fortune
process_address does not seem to make an attempt to check for
EXTRA_MEMORY_CONSTRAINT even though it does decompose memory addresses.
For the lea address case then the address is checked with valid_address_p.
This seems inconsistent, is it intentional?
Yeah, I think so. Memory constraints are different in two main ways:

(a) it's obvious from the MEM what the mode and address space of the
access actually are. legitimate_address_p is all about the most
general address, so in practice no memory constraint should rely
on accepting more than legitimate_address_p does.

(b) it's useful for one pattern to have several alternatives with
different memory constraints (e.g. "ZS", "ZT" and "m" in the 32-bit
MIPS move patterns). There isn't really anything special about the
first alternative.

IMO it'd be good to get rid of this special handling for extra address
constraints, but I've no idea how easy that would be.

Thanks,
Richard
Robert Suchanek
2014-05-14 13:24:14 UTC
Permalink
Hi Richard,

Are you working on the solution to fix the breakage? I'm about
to look into this and wanted to find out how far we got with this.

Regards,
Robert
-----Original Message-----
Sent: 10 May 2014 19:44
To: Matthew Fortune
Tkachov
Subject: Re: [RFC][PATCH][MIPS] Patch to enable LRA for MIPS backend
Thanks for looking at this.
Post by Matthew Fortune
Post by Richard Sandiford
Post by Kyrill Tkachov
Hi all,
FAIL: gcc.target/arm/vfp-ldmdbd.c scan-assembler fldmdbd
FAIL: gcc.target/arm/vfp-ldmdbs.c scan-assembler fldmdbs
FAIL: gcc.target/arm/vfp-ldmiad.c scan-assembler fldmiad
FAIL: gcc.target/arm/vfp-ldmias.c scan-assembler fldmias
fldmdbd r5!, {d7}
into
sub r5, r5, #8
fldd d7, [r5]
Which broke the test.
Sorry for the breakage. I've reverted the patch for now and will send a
fixed version when I have time.
The problem appears to lie with the new satisfies_memory_constraint_p
which is passing just the address to valid_address_p but the constraint
is a memory constraint (which wants the mem to be retained).
Yeah.
Post by Matthew Fortune
One option is to re-construct the MEM using the address_info provided
to valid_address_p. This has mode, address space and whether it was
actually a MEM to start with so there is enough information.
We don't want to create garbage rtl though. The substitution happens
in-place, so the routine does temporarily turn the original MEM into the
eliminated form.
My first reaction after seeing the bug was to pass the operand in as
another parameter to valid_address_p. That made the interface a bit
ridiculous though, so I didn't post it and reverted the original patch
instead.
I think a cleaner way of doing it would be to have helper functions
that switch in and out of the eliminated form, storing the old form
in fields of a new structure (either separate from address_info,
or a local inheritance of it). We probably also want to have arrays
of address_infos, one for each operand, so that we don't analyse the
same address too many times during the same insn.
Post by Matthew Fortune
process_address does not seem to make an attempt to check for
EXTRA_MEMORY_CONSTRAINT even though it does decompose memory addresses.
For the lea address case then the address is checked with valid_address_p.
This seems inconsistent, is it intentional?
(a) it's obvious from the MEM what the mode and address space of the
access actually are. legitimate_address_p is all about the most
general address, so in practice no memory constraint should rely
on accepting more than legitimate_address_p does.
(b) it's useful for one pattern to have several alternatives with
different memory constraints (e.g. "ZS", "ZT" and "m" in the 32-bit
MIPS move patterns). There isn't really anything special about the
first alternative.
IMO it'd be good to get rid of this special handling for extra address
constraints, but I've no idea how easy that would be.
Thanks,
Richard
Robert Suchanek
2014-05-15 21:05:08 UTC
Permalink
Ping.
________________________________________
From: Robert Suchanek
Sent: 14 May 2014 14:24
To: Richard Sandiford; Matthew Fortune
Cc: Vladimir Makarov; gcc-***@gcc.gnu.org; Kyrill Tkachov
Subject: RE: [RFC][PATCH][MIPS] Patch to enable LRA for MIPS backend

Hi Richard,

Are you working on the solution to fix the breakage? I'm about
to look into this and wanted to find out how far we got with this.

Regards,
Robert
-----Original Message-----
Sent: 10 May 2014 19:44
To: Matthew Fortune
Tkachov
Subject: Re: [RFC][PATCH][MIPS] Patch to enable LRA for MIPS backend
Thanks for looking at this.
Post by Matthew Fortune
Post by Richard Sandiford
Post by Kyrill Tkachov
Hi all,
FAIL: gcc.target/arm/vfp-ldmdbd.c scan-assembler fldmdbd
FAIL: gcc.target/arm/vfp-ldmdbs.c scan-assembler fldmdbs
FAIL: gcc.target/arm/vfp-ldmiad.c scan-assembler fldmiad
FAIL: gcc.target/arm/vfp-ldmias.c scan-assembler fldmias
fldmdbd r5!, {d7}
into
sub r5, r5, #8
fldd d7, [r5]
Which broke the test.
Sorry for the breakage. I've reverted the patch for now and will send a
fixed version when I have time.
The problem appears to lie with the new satisfies_memory_constraint_p
which is passing just the address to valid_address_p but the constraint
is a memory constraint (which wants the mem to be retained).
Yeah.
Post by Matthew Fortune
One option is to re-construct the MEM using the address_info provided
to valid_address_p. This has mode, address space and whether it was
actually a MEM to start with so there is enough information.
We don't want to create garbage rtl though. The substitution happens
in-place, so the routine does temporarily turn the original MEM into the
eliminated form.
My first reaction after seeing the bug was to pass the operand in as
another parameter to valid_address_p. That made the interface a bit
ridiculous though, so I didn't post it and reverted the original patch
instead.
I think a cleaner way of doing it would be to have helper functions
that switch in and out of the eliminated form, storing the old form
in fields of a new structure (either separate from address_info,
or a local inheritance of it). We probably also want to have arrays
of address_infos, one for each operand, so that we don't analyse the
same address too many times during the same insn.
Post by Matthew Fortune
process_address does not seem to make an attempt to check for
EXTRA_MEMORY_CONSTRAINT even though it does decompose memory addresses.
For the lea address case then the address is checked with valid_address_p.
This seems inconsistent, is it intentional?
(a) it's obvious from the MEM what the mode and address space of the
access actually are. legitimate_address_p is all about the most
general address, so in practice no memory constraint should rely
on accepting more than legitimate_address_p does.
(b) it's useful for one pattern to have several alternatives with
different memory constraints (e.g. "ZS", "ZT" and "m" in the 32-bit
MIPS move patterns). There isn't really anything special about the
first alternative.
IMO it'd be good to get rid of this special handling for extra address
constraints, but I've no idea how easy that would be.
Thanks,
Richard
Richard Sandiford
2014-05-15 21:34:39 UTC
Permalink
Post by Robert Suchanek
Are you working on the solution to fix the breakage? I'm about
to look into this and wanted to find out how far we got with this.
You mean the "cleaner way" I suggested, or something else?
If you want to have a go then feel free. Otherwise I'll try to get
to it over the weekend.

Richard
Robert Suchanek
2014-05-16 21:05:09 UTC
Permalink
I was thinking of something else but it doesn't appear to be good enough
and most likely will follow the suggested way.

Regards,
Robert
________________________________________
From: Richard Sandiford [***@googlemail.com]
Sent: 15 May 2014 22:34
To: Robert Suchanek
Cc: Matthew Fortune; Vladimir Makarov; gcc-***@gcc.gnu.org; Kyrill Tkachov
Subject: Re: [RFC][PATCH][MIPS] Patch to enable LRA for MIPS backend
Post by Robert Suchanek
Are you working on the solution to fix the breakage? I'm about
to look into this and wanted to find out how far we got with this.
You mean the "cleaner way" I suggested, or something else?
If you want to have a go then feel free. Otherwise I'll try to get
to it over the weekend.

Richard
Richard Sandiford
2014-05-18 19:37:56 UTC
Permalink
Post by Richard Sandiford
I think a cleaner way of doing it would be to have helper functions
that switch in and out of the eliminated form, storing the old form
in fields of a new structure (either separate from address_info,
or a local inheritance of it). We probably also want to have arrays
of address_infos, one for each operand, so that we don't analyse the
same address too many times during the same insn.
In the end maintaining the array of address_infos seemed like too much
work. It was hard to keep it up-to-date with various other changes
that can be made, including swapping commutative operands, to the point
where it wasn't obvious whether it was really an optimisation or not.

Here's a patch that does the first. Tested on x86_64-linux-gnu.
This time I also compared the assembly output for gcc.dg, g++.dg
and gcc.c-torture at -O2 on:

arch64-linux-gnu arm-eabi mipsisa64-sde-elf s390x-linux-gnu
powerpc64-linux-gnu x86_64-linux-gnu

s390x in particular is very good at exposing problems with this code.
(It caught bugs in the aborted attempt to keep an array of address_infos.)

OK to install?

Thanks,
Richard


gcc/
* lra-constraints.c (valid_address_p): Move earlier in file.
(address_eliminator): New structure.
(satisfies_memory_constraint_p): New function.
(satisfies_address_constraint_p): Likewise.
(process_alt_operands, process_address, curr_insn_transform): Use them.

Index: gcc/lra-constraints.c
===================================================================
--- gcc/lra-constraints.c 2014-05-17 17:49:19.071639652 +0100
+++ gcc/lra-constraints.c 2014-05-18 20:36:17.499181467 +0100
@@ -317,6 +317,118 @@ in_mem_p (int regno)
return get_reg_class (regno) == NO_REGS;
}

+/* Return 1 if ADDR is a valid memory address for mode MODE in address
+ space AS, and check that each pseudo has the proper kind of hard
+ reg. */
+static int
+valid_address_p (enum machine_mode mode ATTRIBUTE_UNUSED,
+ rtx addr, addr_space_t as)
+{
+#ifdef GO_IF_LEGITIMATE_ADDRESS
+ lra_assert (ADDR_SPACE_GENERIC_P (as));
+ GO_IF_LEGITIMATE_ADDRESS (mode, addr, win);
+ return 0;
+
+ win:
+ return 1;
+#else
+ return targetm.addr_space.legitimate_address_p (mode, addr, 0, as);
+#endif
+}
+
+namespace {
+ /* Temporarily eliminates registers in an address (for the lifetime of
+ the object). */
+ class address_eliminator {
+ public:
+ address_eliminator (struct address_info *ad);
+ ~address_eliminator ();
+
+ private:
+ struct address_info *m_ad;
+ rtx *m_base_loc;
+ rtx m_base_reg;
+ rtx *m_index_loc;
+ rtx m_index_reg;
+ };
+}
+
+address_eliminator::address_eliminator (struct address_info *ad)
+ : m_ad (ad),
+ m_base_loc (strip_subreg (ad->base_term)),
+ m_base_reg (NULL_RTX),
+ m_index_loc (strip_subreg (ad->index_term)),
+ m_index_reg (NULL_RTX)
+{
+ if (m_base_loc != NULL)
+ {
+ m_base_reg = *m_base_loc;
+ lra_eliminate_reg_if_possible (m_base_loc);
+ if (m_ad->base_term2 != NULL)
+ *m_ad->base_term2 = *m_ad->base_term;
+ }
+ if (m_index_loc != NULL)
+ {
+ m_index_reg = *m_index_loc;
+ lra_eliminate_reg_if_possible (m_index_loc);
+ }
+}
+
+address_eliminator::~address_eliminator ()
+{
+ if (m_base_loc && *m_base_loc != m_base_reg)
+ {
+ *m_base_loc = m_base_reg;
+ if (m_ad->base_term2 != NULL)
+ *m_ad->base_term2 = *m_ad->base_term;
+ }
+ if (m_index_loc && *m_index_loc != m_index_reg)
+ *m_index_loc = m_index_reg;
+}
+
+/* Return true if the eliminated form of AD is a legitimate target address. */
+static bool
+valid_address_p (struct address_info *ad)
+{
+ address_eliminator eliminator (ad);
+ return valid_address_p (ad->mode, *ad->outer, ad->as);
+}
+
+#ifdef EXTRA_CONSTRAINT_STR
+/* Return true if the eliminated form of memory reference OP satisfies
+ extra address constraint CONSTRAINT. */
+static bool
+satisfies_memory_constraint_p (rtx op, const char *constraint)
+{
+ struct address_info ad;
+
+ decompose_mem_address (&ad, op);
+ address_eliminator eliminator (&ad);
+ return EXTRA_CONSTRAINT_STR (op, *constraint, constraint);
+}
+
+/* Return true if the eliminated form of address AD satisfies extra
+ address constraint CONSTRAINT. */
+static bool
+satisfies_address_constraint_p (struct address_info *ad,
+ const char *constraint)
+{
+ address_eliminator eliminator (ad);
+ return EXTRA_CONSTRAINT_STR (*ad->outer, *constraint, constraint);
+}
+
+/* Return true if the eliminated form of address OP satisfies extra
+ address constraint CONSTRAINT. */
+static bool
+satisfies_address_constraint_p (rtx op, const char *constraint)
+{
+ struct address_info ad;
+
+ decompose_lea_address (&ad, &op);
+ return satisfies_address_constraint_p (&ad, constraint);
+}
+#endif
+
/* Initiate equivalences for LRA. As we keep original equivalences
before any elimination, we need to make copies otherwise any change
in insns might change the equivalences. */
@@ -1941,7 +2053,8 @@ process_alt_operands (int only_alternati
#ifdef EXTRA_CONSTRAINT_STR
if (EXTRA_MEMORY_CONSTRAINT (c, p))
{
- if (EXTRA_CONSTRAINT_STR (op, c, p))
+ if (MEM_P (op)
+ && satisfies_memory_constraint_p (op, p))
win = true;
else if (spilled_pseudo_p (op))
win = true;
@@ -1960,7 +2073,7 @@ process_alt_operands (int only_alternati
}
if (EXTRA_ADDRESS_CONSTRAINT (c, p))
{
- if (EXTRA_CONSTRAINT_STR (op, c, p))
+ if (satisfies_address_constraint_p (op, p))
win = true;

/* If we didn't already win, we can reload
@@ -2576,60 +2689,6 @@ process_alt_operands (int only_alternati
return ok_p;
}

-/* Return 1 if ADDR is a valid memory address for mode MODE in address
- space AS, and check that each pseudo has the proper kind of hard
- reg. */
-static int
-valid_address_p (enum machine_mode mode ATTRIBUTE_UNUSED,
- rtx addr, addr_space_t as)
-{
-#ifdef GO_IF_LEGITIMATE_ADDRESS
- lra_assert (ADDR_SPACE_GENERIC_P (as));
- GO_IF_LEGITIMATE_ADDRESS (mode, addr, win);
- return 0;
-
- win:
- return 1;
-#else
- return targetm.addr_space.legitimate_address_p (mode, addr, 0, as);
-#endif
-}
-
-/* Return whether address AD is valid. */
-
-static bool
-valid_address_p (struct address_info *ad)
-{
- /* Some ports do not check displacements for eliminable registers,
- so we replace them temporarily with the elimination target. */
- rtx saved_base_reg = NULL_RTX;
- rtx saved_index_reg = NULL_RTX;
- rtx *base_term = strip_subreg (ad->base_term);
- rtx *index_term = strip_subreg (ad->index_term);
- if (base_term != NULL)
- {
- saved_base_reg = *base_term;
- lra_eliminate_reg_if_possible (base_term);
- if (ad->base_term2 != NULL)
- *ad->base_term2 = *ad->base_term;
- }
- if (index_term != NULL)
- {
- saved_index_reg = *index_term;
- lra_eliminate_reg_if_possible (index_term);
- }
- bool ok_p = valid_address_p (ad->mode, *ad->outer, ad->as);
- if (saved_base_reg != NULL_RTX)
- {
- *base_term = saved_base_reg;
- if (ad->base_term2 != NULL)
- *ad->base_term2 = *ad->base_term;
- }
- if (saved_index_reg != NULL_RTX)
- *index_term = saved_index_reg;
- return ok_p;
-}
-
/* Make reload base reg + disp from address AD. Return the new pseudo. */
static rtx
base_plus_disp_to_reg (struct address_info *ad)
@@ -2832,7 +2891,7 @@ process_address (int nop, rtx *before, r
EXTRA_CONSTRAINT_STR for the validation. */
if (constraint[0] != 'p'
&& EXTRA_ADDRESS_CONSTRAINT (constraint[0], constraint)
- && EXTRA_CONSTRAINT_STR (op, constraint[0], constraint))
+ && satisfies_address_constraint_p (&ad, constraint))
return change_p;
#endif

@@ -3539,7 +3598,7 @@ curr_insn_transform (void)
break;
#ifdef EXTRA_CONSTRAINT_STR
if (EXTRA_MEMORY_CONSTRAINT (c, constraint)
- && EXTRA_CONSTRAINT_STR (tem, c, constraint))
+ && satisfies_memory_constraint_p (tem, constraint))
break;
#endif
}
Richard Sandiford
2014-06-02 19:36:57 UTC
Permalink
Ping. Imagination's copyright assignment has now gone through and so
in principle we're ready for the MIPS LRA switch to go in. We need
this LRA patch as a prequisite though.

Robert: you also had an LRA change, but is it still needed after this one?
If so, could you repost it and explain the case it handles?

Thanks,
Richard
Post by Richard Sandiford
Post by Richard Sandiford
I think a cleaner way of doing it would be to have helper functions
that switch in and out of the eliminated form, storing the old form
in fields of a new structure (either separate from address_info,
or a local inheritance of it). We probably also want to have arrays
of address_infos, one for each operand, so that we don't analyse the
same address too many times during the same insn.
In the end maintaining the array of address_infos seemed like too much
work. It was hard to keep it up-to-date with various other changes
that can be made, including swapping commutative operands, to the point
where it wasn't obvious whether it was really an optimisation or not.
Here's a patch that does the first. Tested on x86_64-linux-gnu.
This time I also compared the assembly output for gcc.dg, g++.dg
arch64-linux-gnu arm-eabi mipsisa64-sde-elf s390x-linux-gnu
powerpc64-linux-gnu x86_64-linux-gnu
s390x in particular is very good at exposing problems with this code.
(It caught bugs in the aborted attempt to keep an array of address_infos.)
OK to install?
Thanks,
Richard
gcc/
* lra-constraints.c (valid_address_p): Move earlier in file.
(address_eliminator): New structure.
(satisfies_memory_constraint_p): New function.
(satisfies_address_constraint_p): Likewise.
(process_alt_operands, process_address, curr_insn_transform): Use them.
Index: gcc/lra-constraints.c
===================================================================
--- gcc/lra-constraints.c 2014-05-17 17:49:19.071639652 +0100
+++ gcc/lra-constraints.c 2014-05-18 20:36:17.499181467 +0100
@@ -317,6 +317,118 @@ in_mem_p (int regno)
return get_reg_class (regno) == NO_REGS;
}
+/* Return 1 if ADDR is a valid memory address for mode MODE in address
+ space AS, and check that each pseudo has the proper kind of hard
+ reg. */
+static int
+valid_address_p (enum machine_mode mode ATTRIBUTE_UNUSED,
+ rtx addr, addr_space_t as)
+{
+#ifdef GO_IF_LEGITIMATE_ADDRESS
+ lra_assert (ADDR_SPACE_GENERIC_P (as));
+ GO_IF_LEGITIMATE_ADDRESS (mode, addr, win);
+ return 0;
+
+ return 1;
+#else
+ return targetm.addr_space.legitimate_address_p (mode, addr, 0, as);
+#endif
+}
+
+namespace {
+ /* Temporarily eliminates registers in an address (for the lifetime of
+ the object). */
+ class address_eliminator {
+ address_eliminator (struct address_info *ad);
+ ~address_eliminator ();
+
+ struct address_info *m_ad;
+ rtx *m_base_loc;
+ rtx m_base_reg;
+ rtx *m_index_loc;
+ rtx m_index_reg;
+ };
+}
+
+address_eliminator::address_eliminator (struct address_info *ad)
+ : m_ad (ad),
+ m_base_loc (strip_subreg (ad->base_term)),
+ m_base_reg (NULL_RTX),
+ m_index_loc (strip_subreg (ad->index_term)),
+ m_index_reg (NULL_RTX)
+{
+ if (m_base_loc != NULL)
+ {
+ m_base_reg = *m_base_loc;
+ lra_eliminate_reg_if_possible (m_base_loc);
+ if (m_ad->base_term2 != NULL)
+ *m_ad->base_term2 = *m_ad->base_term;
+ }
+ if (m_index_loc != NULL)
+ {
+ m_index_reg = *m_index_loc;
+ lra_eliminate_reg_if_possible (m_index_loc);
+ }
+}
+
+address_eliminator::~address_eliminator ()
+{
+ if (m_base_loc && *m_base_loc != m_base_reg)
+ {
+ *m_base_loc = m_base_reg;
+ if (m_ad->base_term2 != NULL)
+ *m_ad->base_term2 = *m_ad->base_term;
+ }
+ if (m_index_loc && *m_index_loc != m_index_reg)
+ *m_index_loc = m_index_reg;
+}
+
+/* Return true if the eliminated form of AD is a legitimate target address. */
+static bool
+valid_address_p (struct address_info *ad)
+{
+ address_eliminator eliminator (ad);
+ return valid_address_p (ad->mode, *ad->outer, ad->as);
+}
+
+#ifdef EXTRA_CONSTRAINT_STR
+/* Return true if the eliminated form of memory reference OP satisfies
+ extra address constraint CONSTRAINT. */
+static bool
+satisfies_memory_constraint_p (rtx op, const char *constraint)
+{
+ struct address_info ad;
+
+ decompose_mem_address (&ad, op);
+ address_eliminator eliminator (&ad);
+ return EXTRA_CONSTRAINT_STR (op, *constraint, constraint);
+}
+
+/* Return true if the eliminated form of address AD satisfies extra
+ address constraint CONSTRAINT. */
+static bool
+satisfies_address_constraint_p (struct address_info *ad,
+ const char *constraint)
+{
+ address_eliminator eliminator (ad);
+ return EXTRA_CONSTRAINT_STR (*ad->outer, *constraint, constraint);
+}
+
+/* Return true if the eliminated form of address OP satisfies extra
+ address constraint CONSTRAINT. */
+static bool
+satisfies_address_constraint_p (rtx op, const char *constraint)
+{
+ struct address_info ad;
+
+ decompose_lea_address (&ad, &op);
+ return satisfies_address_constraint_p (&ad, constraint);
+}
+#endif
+
/* Initiate equivalences for LRA. As we keep original equivalences
before any elimination, we need to make copies otherwise any change
in insns might change the equivalences. */
@@ -1941,7 +2053,8 @@ process_alt_operands (int only_alternati
#ifdef EXTRA_CONSTRAINT_STR
if (EXTRA_MEMORY_CONSTRAINT (c, p))
{
- if (EXTRA_CONSTRAINT_STR (op, c, p))
+ if (MEM_P (op)
+ && satisfies_memory_constraint_p (op, p))
win = true;
else if (spilled_pseudo_p (op))
win = true;
@@ -1960,7 +2073,7 @@ process_alt_operands (int only_alternati
}
if (EXTRA_ADDRESS_CONSTRAINT (c, p))
{
- if (EXTRA_CONSTRAINT_STR (op, c, p))
+ if (satisfies_address_constraint_p (op, p))
win = true;
/* If we didn't already win, we can reload
@@ -2576,60 +2689,6 @@ process_alt_operands (int only_alternati
return ok_p;
}
-/* Return 1 if ADDR is a valid memory address for mode MODE in address
- space AS, and check that each pseudo has the proper kind of hard
- reg. */
-static int
-valid_address_p (enum machine_mode mode ATTRIBUTE_UNUSED,
- rtx addr, addr_space_t as)
-{
-#ifdef GO_IF_LEGITIMATE_ADDRESS
- lra_assert (ADDR_SPACE_GENERIC_P (as));
- GO_IF_LEGITIMATE_ADDRESS (mode, addr, win);
- return 0;
-
- return 1;
-#else
- return targetm.addr_space.legitimate_address_p (mode, addr, 0, as);
-#endif
-}
-
-/* Return whether address AD is valid. */
-
-static bool
-valid_address_p (struct address_info *ad)
-{
- /* Some ports do not check displacements for eliminable registers,
- so we replace them temporarily with the elimination target. */
- rtx saved_base_reg = NULL_RTX;
- rtx saved_index_reg = NULL_RTX;
- rtx *base_term = strip_subreg (ad->base_term);
- rtx *index_term = strip_subreg (ad->index_term);
- if (base_term != NULL)
- {
- saved_base_reg = *base_term;
- lra_eliminate_reg_if_possible (base_term);
- if (ad->base_term2 != NULL)
- *ad->base_term2 = *ad->base_term;
- }
- if (index_term != NULL)
- {
- saved_index_reg = *index_term;
- lra_eliminate_reg_if_possible (index_term);
- }
- bool ok_p = valid_address_p (ad->mode, *ad->outer, ad->as);
- if (saved_base_reg != NULL_RTX)
- {
- *base_term = saved_base_reg;
- if (ad->base_term2 != NULL)
- *ad->base_term2 = *ad->base_term;
- }
- if (saved_index_reg != NULL_RTX)
- *index_term = saved_index_reg;
- return ok_p;
-}
-
/* Make reload base reg + disp from address AD. Return the new pseudo. */
static rtx
base_plus_disp_to_reg (struct address_info *ad)
@@ -2832,7 +2891,7 @@ process_address (int nop, rtx *before, r
EXTRA_CONSTRAINT_STR for the validation. */
if (constraint[0] != 'p'
&& EXTRA_ADDRESS_CONSTRAINT (constraint[0], constraint)
- && EXTRA_CONSTRAINT_STR (op, constraint[0], constraint))
+ && satisfies_address_constraint_p (&ad, constraint))
return change_p;
#endif
@@ -3539,7 +3598,7 @@ curr_insn_transform (void)
break;
#ifdef EXTRA_CONSTRAINT_STR
if (EXTRA_MEMORY_CONSTRAINT (c, constraint)
- && EXTRA_CONSTRAINT_STR (tem, c, constraint))
+ && satisfies_memory_constraint_p (tem, constraint))
break;
#endif
}
Vladimir Makarov
2014-06-03 20:50:08 UTC
Permalink
Post by Richard Sandiford
Ping. Imagination's copyright assignment has now gone through and so
in principle we're ready for the MIPS LRA switch to go in. We need
this LRA patch as a prequisite though.
Robert: you also had an LRA change, but is it still needed after this one?
If so, could you repost it and explain the case it handles?
Thanks,
Richard
Post by Richard Sandiford
Post by Richard Sandiford
I think a cleaner way of doing it would be to have helper functions
that switch in and out of the eliminated form, storing the old form
in fields of a new structure (either separate from address_info,
or a local inheritance of it). We probably also want to have arrays
of address_infos, one for each operand, so that we don't analyse the
same address too many times during the same insn.
In the end maintaining the array of address_infos seemed like too much
work. It was hard to keep it up-to-date with various other changes
that can be made, including swapping commutative operands, to the point
where it wasn't obvious whether it was really an optimisation or not.
Here's a patch that does the first. Tested on x86_64-linux-gnu.
This time I also compared the assembly output for gcc.dg, g++.dg
arch64-linux-gnu arm-eabi mipsisa64-sde-elf s390x-linux-gnu
powerpc64-linux-gnu x86_64-linux-gnu
s390x in particular is very good at exposing problems with this code.
(It caught bugs in the aborted attempt to keep an array of address_infos.)
OK to install?
Yes, Richard. Thanks for the wide testing. It was also a pleasure to
read this C++ code. Really nice.
Richard Sandiford
2014-06-04 17:44:54 UTC
Permalink
Post by Vladimir Makarov
Post by Richard Sandiford
Ping. Imagination's copyright assignment has now gone through and so
in principle we're ready for the MIPS LRA switch to go in. We need
this LRA patch as a prequisite though.
Robert: you also had an LRA change, but is it still needed after this one?
If so, could you repost it and explain the case it handles?
Thanks,
Richard
Post by Richard Sandiford
Post by Richard Sandiford
I think a cleaner way of doing it would be to have helper functions
that switch in and out of the eliminated form, storing the old form
in fields of a new structure (either separate from address_info,
or a local inheritance of it). We probably also want to have arrays
of address_infos, one for each operand, so that we don't analyse the
same address too many times during the same insn.
In the end maintaining the array of address_infos seemed like too much
work. It was hard to keep it up-to-date with various other changes
that can be made, including swapping commutative operands, to the point
where it wasn't obvious whether it was really an optimisation or not.
Here's a patch that does the first. Tested on x86_64-linux-gnu.
This time I also compared the assembly output for gcc.dg, g++.dg
arch64-linux-gnu arm-eabi mipsisa64-sde-elf s390x-linux-gnu
powerpc64-linux-gnu x86_64-linux-gnu
s390x in particular is very good at exposing problems with this code.
(It caught bugs in the aborted attempt to keep an array of address_infos.)
OK to install?
Yes, Richard. Thanks for the wide testing.
Thanks Vlad, committed as r211242. Given the problems with my first version,
I suppose it'd make sense to wait a few days to see whether there's any
fallout before applying the MIPS backend patch.

Richard
Robert Suchanek
2014-06-11 11:30:22 UTC
Permalink
Hi Richard,
Post by Richard Sandiford
Robert: you also had an LRA change, but is it still needed after this one?
If so, could you repost it and explain the case it handles?
For just turning the LRA for the MIPS backend is not needed but we have issues
with the code size for MIPS16. LRA inserted a lot of reloads and the code size
increased on average by about 10% IIRC. To fix this, a number of patterns
have to accept the stack pointer and a new class, M16_SP_REGS with
M16_REGS + $sp was added.

However, this triggered a reloading problem as the stack pointer was rejected
by the back end and LRA tried to insert base+disp with the displacement not
always present. It only affects $sp not directly accessible as in MIPS16 case.

Regards,
Robert

gcc/
* lra-constraints.c (base_to_reg): New function.
(process_address): Use new function.

diff --git gcc/lra-constraints.c gcc/lra-constraints.c
index 08716fe..d5ed37f 100644
--- gcc/lra-constraints.c
+++ gcc/lra-constraints.c
@@ -2686,6 +2686,39 @@ process_alt_operands (int only_alternative)
return ok_p;
}

+/* Make reload base reg from address AD. */
+static rtx
+base_to_reg (struct address_info *ad)
+{
+ enum reg_class cl;
+ int code = -1;
+ rtx new_inner = NULL_RTX;
+ rtx new_reg = NULL_RTX;
+ rtx insn;
+ rtx last_insn = get_last_insn();
+
+ lra_assert (ad->base == ad->base_term && ad->disp == ad->disp_term);
+ cl = base_reg_class (ad->mode, ad->as, ad->base_outer_code,
+ get_index_code (ad));
+ new_reg = lra_create_new_reg (GET_MODE (*ad->base_term), NULL_RTX,
+ cl, "base");
+ new_inner = simplify_gen_binary (PLUS, GET_MODE (new_reg), new_reg,
+ ad->disp_term == NULL
+ ? gen_int_mode (0, ad->mode)
+ : *ad->disp_term);
+ if (!valid_address_p (ad->mode, new_inner, ad->as))
+ return NULL_RTX;
+ insn = emit_insn (gen_rtx_SET (ad->mode, new_reg, *ad->base_term));
+ code = recog_memoized (insn);
+ if (code < 0)
+ {
+ delete_insns_since (last_insn);
+ return NULL_RTX;
+ }
+
+ return new_inner;
+}
+
/* Make reload base reg + disp from address AD. Return the new pseudo. */
static rtx
base_plus_disp_to_reg (struct address_info *ad)
@@ -2908,6 +2941,8 @@ process_address_1 (int nop, rtx *before, rtx *after)

3) the address is a frame address with an invalid offset.

+ 4) the address is a frame address with an invalid base.
+
All these cases involve a non-autoinc address, so there is no
point revalidating other types. */
if (ad.autoinc_p || valid_address_p (&ad))
@@ -2989,14 +3024,19 @@ process_address_1 (int nop, rtx *before, rtx *after)
int regno;
enum reg_class cl;
rtx set, insns, last_insn;
+ /* Try to reload base into register only if the base is invalid
+ for the address but with valid offset, case (4) above. */
+ start_sequence ();
+ new_reg = base_to_reg (&ad);
+
/* base + disp => new base, cases (1) and (3) above. */
/* Another option would be to reload the displacement into an
index register. However, postreload has code to optimize
address reloads that have the same base and different
displacements, so reloading into an index register would
not necessarily be a win. */
- start_sequence ();
- new_reg = base_plus_disp_to_reg (&ad);
+ if (new_reg == NULL_RTX)
+ new_reg = base_plus_disp_to_reg (&ad);
insns = get_insns ();
last_insn = get_last_insn ();
/* If we generated at least two insns, try last insn source as

Jakub Jelinek
2014-03-29 12:21:04 UTC
Permalink
Post by Robert Suchanek
This patch enables LRA by default for MIPS. The classic reload is still
available and can be enabled via -mreload switch.
FYI, all other targets that have LRA optionally selectable or deselectable
use -mno-lra for this (even when -mlra is the default), it would be better
for consistency not to invent new switch names for that.

Jakub
Continue reading on narkive:
Loading...