Alan Lawrence
2014-09-18 11:41:06 UTC
The end goal here is to remove this code from tree-vect-loop.c
(vect_create_epilog_for_reduction):
    if (BYTES_BIG_ENDIAN)
      bitpos = size_binop (MULT_EXPR,
                           bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1),
                           TYPE_SIZE (scalar_type));
    else
as this is the root cause of PR/61114 (see the testcase there, which fails on all
big-endian targets supporting reduc_[us]plus_optab). Quoting Richard Biener, "all
code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is suspicious". The
code snippet above is used on two paths:
(Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR, i.e.
reduc_[us](plus|min|max)_optab.
The optab is documented as "the scalar result is stored in the least significant
bits of operand 0", but the tree code is documented as "the first element in the
vector holding the result of the reduction of all elements of the operand". On
big-endian targets these are opposite ends of the vector, so when the tree code
is folded, the code snippet above reads the result from the wrong end.
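To make the mismatch concrete, here is a small C model (not GCC code; every name in it is hypothetical). A vector is an array of lanes numbered in memory order, the way GCC numbers vector elements, and I assume that the register's least significant bits hold lane 0 on little-endian targets and the last lane on big-endian ones. The extraction helper mirrors the BYTES_BIG_ENDIAN snippet: it reads the last lane on big-endian and lane 0 otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative model only: lanes are numbered in memory order (lane 0 at
   the lowest address).  All names here are hypothetical, not GCC internals. */
enum { NLANES = 4 };

/* Assumption: the register's least significant bits hold lane 0 on a
   little-endian target and lane NLANES-1 on a big-endian one. */
size_t lsb_lane (bool big_endian)
{
  return big_endian ? NLANES - 1 : 0;
}

int sum_lanes (const int v[NLANES])
{
  int s = 0;
  for (size_t i = 0; i < NLANES; i++)
    s += v[i];
  return s;
}

/* Optab semantics: the sum lands in the least significant bits.  Other
   lanes are zeroed here purely to keep the model deterministic. */
void reduc_plus_optab_model (int out[NLANES], const int in[NLANES],
                             bool big_endian)
{
  int s = sum_lanes (in);
  for (size_t i = 0; i < NLANES; i++)
    out[i] = 0;
  out[lsb_lane (big_endian)] = s;
}

/* Tree-code semantics, as produced when the expression is folded: the sum
   lands in the first element, whatever the endianness. */
void reduc_plus_fold_model (int out[NLANES], const int in[NLANES])
{
  int s = sum_lanes (in);
  for (size_t i = 0; i < NLANES; i++)
    out[i] = 0;
  out[0] = s;
}

/* Mirrors the epilogue snippet: on big-endian, read the last lane
   (bitpos = (NLANES - 1) * element size); otherwise read lane 0. */
int extract_result (const int v[NLANES], bool big_endian)
{
  size_t lane = lsb_lane (big_endian);
  assert (lane < NLANES);
  return v[lane];
}
```

With the input {1, 2, 3, 4}, extraction yields 10 for the optab result on either endianness and for the folded result on little-endian; only the folded result on a big-endian target reads the wrong lane (0 in this model).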
The strategy (as per https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html)
is to define new tree codes and optabs that produce scalar results directly;
this seems better than tying the vector element that receives the result to the
endianness of the target, and it avoids generating extra moves on current
big-endian targets. However, the previous optabs are retained for now as a
migration path so as not to break existing backends; individual platforms will
be moved over in follow-up patches.
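By contrast, a reduction that returns a scalar leaves no lane to choose. The C sketch below models the signed/unsigned plus/min/max family with hypothetical functions that return the scalar directly; it illustrates only the shape of the interface and is not the actual tree code or optab definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar-result reductions, loosely mirroring
   reduc_[us](plus|min|max).  The caller receives a plain scalar, so no
   BYTES_BIG_ENDIAN-dependent lane selection is needed afterwards. */
int reduc_plus_scal_model (const int *v, size_t n)
{
  assert (n > 0);
  int s = 0;
  for (size_t i = 0; i < n; i++)
    s += v[i];
  return s;
}

int reduc_smax_scal_model (const int *v, size_t n)
{
  assert (n > 0);
  int m = v[0];
  for (size_t i = 1; i < n; i++)
    if (v[i] > m)
      m = v[i];
  return m;
}

unsigned reduc_umin_scal_model (const unsigned *v, size_t n)
{
  assert (n > 0);
  unsigned m = v[0];
  for (size_t i = 1; i < n; i++)
    if (v[i] < m)
      m = v[i];
  return m;
}
```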
A complication arises on AArch64, where we generate REDUC_PLUS_EXPRs directly
from intrinsics in gimple_fold_builtin; I temporarily remove this folding in
order to decouple the midend from the AArch64 backend.
(Path 2) (patches 7-13) Reductions using whole-vector shifts, i.e.
VEC_RSHIFT_EXPR and vec_shr_optab. Here both the tree code and the optab are
defined in an endianness-dependent way, which leads to significant complication
in fold-const.c. (Moreover, the "equivalent" vec_shl_optab is never used!) Few
platforms appear to implement vec_shr_optab, and fewer still are big-endian (I
see only PowerPC and MIPS), so it seems reasonable to change the existing optab
to be endianness-neutral.
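The shift-based reduction can be modelled in C as follows, assuming the endianness-neutral definition moves data toward element 0: a whole-vector right shift by k elements gives lane i the value of lane i + k and fills the vacated lanes with zeros. The helper names are hypothetical, and the model counts the shift in elements rather than bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { NLANES = 8 };  /* must be a power of two for the halving loop */

/* Hypothetical model of an endianness-neutral whole-vector right shift by
   K elements: lane i receives lane i + K (data moves toward lane 0) and the
   vacated high lanes become zero. */
void vec_shr_model (int out[NLANES], const int in[NLANES], size_t k)
{
  for (size_t i = 0; i < NLANES; i++)
    out[i] = (i + k < NLANES) ? in[i + k] : 0;
}

/* Reduce by repeatedly adding the vector to a copy of itself shifted by
   half the remaining width; after log2 (NLANES) steps lane 0 holds the sum. */
int reduce_plus_by_shifts (const int in[NLANES])
{
  int acc[NLANES], shifted[NLANES];

  assert ((NLANES & (NLANES - 1)) == 0);
  memcpy (acc, in, sizeof acc);
  for (size_t k = NLANES / 2; k > 0; k /= 2)
    {
      vec_shr_model (shifted, acc, k);
      for (size_t i = 0; i < NLANES; i++)
        acc[i] += shifted[i];
    }
  return acc[0];  /* lane 0 on every target: no endianness check */
}
```

Under this definition the total ends up in lane 0 on every target, so the epilogue can extract lane 0 unconditionally.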
Patch 10 defines vec_shr for AArch64 under the old specification; patch 13
updates that implementation to the new endianness-neutral specification and
can serve as a guide for other existing backends. Patches/RFCs 15 and 16 are the
equivalents for MIPS and PowerPC; I haven't tested these, but I hope they act as
useful pointers for the port maintainers.
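The difference between the old and new specifications can be sketched the same way. The model below assumes that the old, endianness-dependent definition behaves like a right shift of the whole register viewed as one integer, so that on a big-endian target (where, by assumption, the low-order bits hold the last lane) data moves toward the last lane rather than toward lane 0. The function names are hypothetical; this is not the AArch64, MIPS or PowerPC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NLANES = 4 };

/* Hypothetical model of the OLD, endianness-dependent vec_shr: a right
   shift of the whole register viewed as one integer.  Assumption: the
   low-order bits hold lane 0 on little-endian, lane NLANES-1 on big-endian. */
void vec_shr_old_model (int out[NLANES], const int in[NLANES], size_t k,
                        bool big_endian)
{
  assert (k <= NLANES);
  for (size_t i = 0; i < NLANES; i++)
    {
      if (!big_endian)
        out[i] = (i + k < NLANES) ? in[i + k] : 0;  /* toward lane 0 */
      else
        out[i] = (i >= k) ? in[i - k] : 0;          /* toward the last lane */
    }
}

/* Hypothetical model of the NEW, endianness-neutral vec_shr: data always
   moves toward lane 0, whatever the target's endianness. */
void vec_shr_new_model (int out[NLANES], const int in[NLANES], size_t k)
{
  assert (k <= NLANES);
  for (size_t i = 0; i < NLANES; i++)
    out[i] = (i + k < NLANES) ? in[i + k] : 0;
}
```

Shifting {1, 2, 3, 4} by one element gives {2, 3, 4, 0} under the new definition on every target, while the old definition gives {0, 1, 2, 3} on big-endian; in this model, that direction flip is what a big-endian backend has to absorb when moving to the new specification.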
Finally, patch 14 cleans up the affected part of tree-vect-loop.c
(vect_create_epilog_for_reduction).
--Alan