Discussion:
[PATCH, ARM] attribute target (thumb,arm)
Christian Bruel
2014-09-29 11:03:30 UTC
Permalink
Hi Ramana, Richard,

This patch implements the attribute target (and pragma) to allow
function based interworking.

as in the updated documentation, the syntax is:

__attribute__((target("thumb"))) int foo()
Forces thumb mode for function foo only. If the file was compiled with
-mthumb iit has no effect.

Similarly

__attribute__((target("arm"))) int foo()
Forces arm mode for function foo. It has no effect if the file was not
compiled with -mthumb.

and regions can be grouped together with

#pragma GCC target ("thumb")
or
#pragma GCC target ("arm")

a few notes
- Inlining is allowed between functions of the same mode (compilation
switch, #pragma and attribute)
- 'arm_option_override' is now reorganized around
'arm_option_override_internal' for thumb related macros
- I kept TARGET_UNIFIED_ASM to minimize changes. Although removing it
would avoid to switch between unified/divided asms
and simplify arm_declare_function_name. Should be considered at some
point.
- It is only available for Thumb2 variants (for thumb1 lack of interest
and a few complications I was unable to test, although this could be
added easily if needed, I think)

Tested for no regression for arm-none-eabi [,-with-arch=armv7-a]

OK for trunk ?

many thanks,

Christian
Ramana Radhakrishnan
2014-10-08 13:05:57 UTC
Permalink
Hi Christian,

Thanks for looking at this. I will need to read the code in detail but
this is a first top level reivew.
Post by Christian Bruel
Hi Ramana, Richard,
This patch implements the attribute target (and pragma) to allow
function based interworking.
__attribute__((target("thumb"))) int foo()
Forces thumb mode for function foo only. If the file was compiled with
-mthumb iit has no effect.
Indeed
Post by Christian Bruel
Similarly
__attribute__((target("arm"))) int foo()
Forces arm mode for function foo. It has no effect if the file was not
compiled with -mthumb.
Indeed.
Post by Christian Bruel
and regions can be grouped together with
#pragma GCC target ("thumb")
or
#pragma GCC target ("arm")
a few notes
- Inlining is allowed between functions of the same mode (compilation
switch, #pragma and attribute)
Why shouldn't we allow inlining between functions of ARM mode vs Thumb
mode ? After all the choice of mode is irrelevant at the time of
inlining (except possibly for inline assembler).

Perhaps an option is to try to disable inlining in the presence of
inline assembler or if not gate it from a command line option.
Post by Christian Bruel
- 'arm_option_override' is now reorganized around
'arm_option_override_internal' for thumb related macros
Looks like a reasonable start - We need a couple of tests to make sure
that __attribute__(("arm")) on a file compiled for the M profile results
in a syntax error. v7(e)m is Thumb2 only.

for bonus points it would be great to get __attribute__(("target"))
working properly in the backend. I suspect a number of the tuning flags
and the global architecture state needs to be moved into this as well to
handle cases where __attribute__(("arm")) used with M profile options is
error'd out.
Post by Christian Bruel
- I kept TARGET_UNIFIED_ASM to minimize changes. Although removing it
would avoid to switch between unified/divided asms
I know Terry's been trying to get Thumb1 to also switch by default to
unified asm. So I think a lot of the logic with "emit_thumb" could just
go away. Maybe we should just consider switching ARM state to unified
syntax and that would be as simple as changing TARGET_UNIFIED_SYNTAX in
arm.h to be TARGET_32BIT. Long overdue IMHO.

The only gotcha here is inline assembler but GAS is so permissive that
I'm not too worried about it in ARM state and Thumb2 state. I'm a bit
worried about Thumb1.
Post by Christian Bruel
and simplify arm_declare_function_name. Should be considered at some
point.
I think that can be done for a lot of newer cores - some of that logic
is dated now IIUC.

I remember why my original project failed - I couldn't get enough of the
backend in shape for the state to be saved and restored and then I moved
on to other more interesting things, so whatever is done here needs to
make sure that all ISA mode state is saved and restored.

One thing I experimented with while doing this work was adding something
like the mflip-mips16 option and then have the testsuite run with this
option to try and make sure enough of the backend state is saved and
restored properly.

This will give more testing coverage hopefully to the switching logic
for the attributes and hopefully expose any issues that are there with
respect to saving and restoring state. The problem you'll probably find
is that in some of the gcc.target/arm tests the flipping of state will
cause various interesting failures. The other side is making sure that
all state that is global is now captured and reinitialized everytime we
switch context between ARM and Thumb.

I'm not sure how to best test the pragma switching logic but given that
the 2 hang off each other, it should just work (TM) if one is right. In
any case adding some testcases that direct test this would be useful.
Post by Christian Bruel
- It is only available for Thumb2 variants (for thumb1 lack of interest
and a few complications I was unable to test, although this could be
added easily if needed, I think)
I don't think we should restrict this to Thumb1 or Thumb2 it should be
allowed if the architecture allows it.

For example __attribute__((thumb)) on a function compiled with
-march=armv5te -mfloat-abi=softfp -mfpu=vfpv3 should give a syntax error
as this is not supported. No VFP instructions in Thumb1. Explicit tests
for this would be appreciated.

IIRC all other cases should be accepted.
Post by Christian Bruel
Tested for no regression for arm-none-eabi [,-with-arch=armv7-a]
OK for trunk ?
Sorry not yet.

I would also like some documentation added to extend.texi for these
attributes.

So in summary.

1. Some documentation in extend.texi please.
2. TARGET_UNIFIED_SYNTAX to be turned on for ARM state too.
3. Testcases for this and some testing with a mflip-thumbness switch
(added only for testing).
4. Investigate further giving up restriction on Thumb1.
5. Investigate lifting inlining restriction for __attribute__(("arm"))
or __attribute__ (("thumb")) or maybe gate it off a command line option.
6. Split this patch into smaller logical chunks that can help with
review i.e. the restructuring from the attribute and pragma addition,
the testcases.
7. Tests for the diagnostics and error cases i.e. __attribute__(("arm"))
used with -march=armv7em -march=armv7m command line options.


regards
Ramana
Christian Bruel
2014-10-08 14:38:21 UTC
Permalink
Hi Ramana,

Thanks for your feedback. Just a few comments while you continue the review

1) about the documentation in extend.texi, it was in the patch already
: did I miss a part ?
* doc/extend.texi (arm, thumb): Document target attributes.
* doc/invoke.texi (arm, thumb): Mention target attributes.

2) about supporting thumb1
OK I'll suppress this limitation. But I covered the testing only for
thumb2 as I don't have a thumb1 platform, if it's OK with you thumb1
will only be covered by "visual checking". Can you help to test this mode ?

3) about inlining
I dislike inlining different modes, From a conceptual use, a user
might want to switch mode only when changing a function's hotness.
Usually inlining a cold function into a hot one is not what the user
explicitly required when setting different mode attributes for them,

The compiler would take a decision that is not what the user wrote. And
in addition if you consider the few instructions to modify R15 to switch
state that would end up with more code executed in the critical path,
voiding a possible size of speed gain.

4) about coverage.
Thanks for your idea about a mflip like internal option for the
testsuite. I'll give it a try. Note that in the meantime I gave a few
successful tries with LTO, and I'm in the process of running a
combinatorial exploration of a set of larger benchmarcks.
Thanks for you hint about testing the -march=armv7em -march=armv7m
error cases. This is indeed needed.

5) I'm still not sure about what to do with TARGET_UNIFIED_ASM. In one
hand I'm reluctant to bind to this development an improvement that
should be orthogonal (or a prerequisite), in another hand I don't really
like the logic with "emit_thumb". If your recommendation is to make
TARGET_UNIFIED_ASM the default for ARM that great, but I'm still
worrying for thumb1. Terry's feedback might be useful for this.

I'll resent the patch in different parts and thumb1 support. For your
inlining concern do you agree that inlining different modes might not be
mandatory (or even counter-productive) at this stage ?

Best Regards

Christian
Post by Ramana Radhakrishnan
Hi Christian,
Thanks for looking at this. I will need to read the code in detail but
this is a first top level reivew.
Post by Christian Bruel
Hi Ramana, Richard,
This patch implements the attribute target (and pragma) to allow
function based interworking.
__attribute__((target("thumb"))) int foo()
Forces thumb mode for function foo only. If the file was compiled with
-mthumb iit has no effect.
Indeed
Post by Christian Bruel
Similarly
__attribute__((target("arm"))) int foo()
Forces arm mode for function foo. It has no effect if the file was not
compiled with -mthumb.
Indeed.
Post by Christian Bruel
and regions can be grouped together with
#pragma GCC target ("thumb")
or
#pragma GCC target ("arm")
a few notes
- Inlining is allowed between functions of the same mode (compilation
switch, #pragma and attribute)
Why shouldn't we allow inlining between functions of ARM mode vs Thumb
mode ? After all the choice of mode is irrelevant at the time of
inlining (except possibly for inline assembler).
Perhaps an option is to try to disable inlining in the presence of
inline assembler or if not gate it from a command line option.
Post by Christian Bruel
- 'arm_option_override' is now reorganized around
'arm_option_override_internal' for thumb related macros
Looks like a reasonable start - We need a couple of tests to make sure
that __attribute__(("arm")) on a file compiled for the M profile results
in a syntax error. v7(e)m is Thumb2 only.
for bonus points it would be great to get __attribute__(("target"))
working properly in the backend. I suspect a number of the tuning flags
and the global architecture state needs to be moved into this as well to
handle cases where __attribute__(("arm")) used with M profile options is
error'd out.
Post by Christian Bruel
- I kept TARGET_UNIFIED_ASM to minimize changes. Although removing it
would avoid to switch between unified/divided asms
I know Terry's been trying to get Thumb1 to also switch by default to
unified asm. So I think a lot of the logic with "emit_thumb" could just
go away. Maybe we should just consider switching ARM state to unified
syntax and that would be as simple as changing TARGET_UNIFIED_SYNTAX in
arm.h to be TARGET_32BIT. Long overdue IMHO.
The only gotcha here is inline assembler but GAS is so permissive that
I'm not too worried about it in ARM state and Thumb2 state. I'm a bit
worried about Thumb1.
Post by Christian Bruel
and simplify arm_declare_function_name. Should be considered at some
point.
I think that can be done for a lot of newer cores - some of that logic
is dated now IIUC.
I remember why my original project failed - I couldn't get enough of the
backend in shape for the state to be saved and restored and then I moved
on to other more interesting things, so whatever is done here needs to
make sure that all ISA mode state is saved and restored.
One thing I experimented with while doing this work was adding something
like the mflip-mips16 option and then have the testsuite run with this
option to try and make sure enough of the backend state is saved and
restored properly.
This will give more testing coverage hopefully to the switching logic
for the attributes and hopefully expose any issues that are there with
respect to saving and restoring state. The problem you'll probably find
is that in some of the gcc.target/arm tests the flipping of state will
cause various interesting failures. The other side is making sure that
all state that is global is now captured and reinitialized everytime we
switch context between ARM and Thumb.
I'm not sure how to best test the pragma switching logic but given that
the 2 hang off each other, it should just work (TM) if one is right. In
any case adding some testcases that direct test this would be useful.
Post by Christian Bruel
- It is only available for Thumb2 variants (for thumb1 lack of interest
and a few complications I was unable to test, although this could be
added easily if needed, I think)
I don't think we should restrict this to Thumb1 or Thumb2 it should be
allowed if the architecture allows it.
For example __attribute__((thumb)) on a function compiled with
-march=armv5te -mfloat-abi=softfp -mfpu=vfpv3 should give a syntax error
as this is not supported. No VFP instructions in Thumb1. Explicit tests
for this would be appreciated.
IIRC all other cases should be accepted.
Post by Christian Bruel
Tested for no regression for arm-none-eabi [,-with-arch=armv7-a]
OK for trunk ?
Sorry not yet.
I would also like some documentation added to extend.texi for these
attributes.
So in summary.
1. Some documentation in extend.texi please.
2. TARGET_UNIFIED_SYNTAX to be turned on for ARM state too.
3. Testcases for this and some testing with a mflip-thumbness switch
(added only for testing).
4. Investigate further giving up restriction on Thumb1.
5. Investigate lifting inlining restriction for __attribute__(("arm"))
or __attribute__ (("thumb")) or maybe gate it off a command line option.
6. Split this patch into smaller logical chunks that can help with
review i.e. the restructuring from the attribute and pragma addition,
the testcases.
7. Tests for the diagnostics and error cases i.e. __attribute__(("arm"))
used with -march=armv7em -march=armv7m command line options.
regards
Ramana
Ramana Radhakrishnan
2014-10-08 16:56:36 UTC
Permalink
Hi Christian,
Post by Christian Bruel
Hi Ramana,
Thanks for your feedback. Just a few comments while you continue the review
Sure - all thoughts are welcome, this isn't a trivial project and I'm
dredging tertiary storage in my brain and old notes for context on all
the gotchas involved in this.
Post by Christian Bruel
1) about the documentation in extend.texi, it was in the patch already
: did I miss a part ?
* doc/extend.texi (arm, thumb): Document target attributes.
* doc/invoke.texi (arm, thumb): Mention target attributes.
Indeed - I don't know how I missed it and as I said it was a brief
review. I am more interested in discussing the semantics up front.
Post by Christian Bruel
2) about supporting thumb1
OK I'll suppress this limitation. But I covered the testing only for
thumb2 as I don't have a thumb1 platform, if it's OK with you thumb1
will only be covered by "visual checking". Can you help to test this mode ?
3) about inlining
I dislike inlining different modes, From a conceptual use, a user
might want to switch mode only when changing a function's hotness.
Usually inlining a cold function into a hot one is not what the user
explicitly required when setting different mode attributes for them,
__attribute__((thumb)) should not imply coldness or hotness. Inlining
between cold and hot functions should be done based on profile feedback.
The choice of compiling in "Thumb1" state for coldness is a separate one
because that's where the choice needs to be made.
Post by Christian Bruel
The compiler would take a decision that is not what the user wrote. And
in addition if you consider the few instructions to modify R15 to switch
state that would end up with more code executed in the critical path,
voiding a possible size of speed gain.
I do not expect there to be any additional instructions needed to switch
state. If function x is inlined into function y the state would be lost
and the state would be in terms of the state of function x.

Obviously if the user doesn't want inlining - the user would add
attributes to disable inlining. You do have extensions such as
__attribute__((noinline)) and __attribute__((never_inline)) to give the
user that control and those bits need to be used in addition.

The attribute then purely reflects then the output instruction state of
the function if a copy of it's body is laid out separately in the output.

IMHO, the heuristics for inlining should be the best judge of when
functions should be inlined between one and another and we shouldn't be
second guessing that in the backend.

If there is a copy of the function to be put out by the compiler, only
then should we choose this based on the state of the "target" i.e. arm
or thumb.
Post by Christian Bruel
4) about coverage.
Thanks for your idea about a mflip like internal option for the
testsuite. I'll give it a try. Note that in the meantime I gave a few
successful tries with LTO, and I'm in the process of running a
combinatorial exploration of a set of larger benchmarcks.
Would be interesting to hear in terms of how you played with LTO.
Post by Christian Bruel
Thanks for you hint about testing the -march=armv7em -march=armv7m
error cases. This is indeed needed.
Adding some directed tests for the pragmas would also be required.
Post by Christian Bruel
5) I'm still not sure about what to do with TARGET_UNIFIED_ASM. In one
hand I'm reluctant to bind to this development an improvement that
should be orthogonal (or a prerequisite), in another hand I don't really
like the logic with "emit_thumb". If your recommendation is to make
TARGET_UNIFIED_ASM the default for ARM that great, but I'm still
worrying for thumb1. Terry's feedback might be useful for this.
I want to think about this carefully too.
Post by Christian Bruel
I'll resent the patch in different parts and thumb1 support. For your
inlining concern do you agree that inlining different modes might not be
mandatory (or even counter-productive) at this stage ?
More practically and from memory I remember that there maybe quite a lot
of false positives in the other parts of the testsuite if you put in the
-mflip-thumbness switch if inlining is turned off.

I think I've also explained my reasons for allowing inlining above.

Also with the target macros that you are changing between ARM and Thumb
it would be good to make sure that you capture everything. I am
concerned about the __ARM_ARCH_ISA_ARM and __ARM_ARCH_ISA_THUMB macros
with the M profile cores. Those probably also need handling please.

I'll review this again when the next patch set arrives with the
different parts.

regards
Ramana
Post by Christian Bruel
Best Regards
Christian
Post by Ramana Radhakrishnan
Hi Christian,
Thanks for looking at this. I will need to read the code in detail but
this is a first top level reivew.
Post by Christian Bruel
Hi Ramana, Richard,
This patch implements the attribute target (and pragma) to allow
function based interworking.
__attribute__((target("thumb"))) int foo()
Forces thumb mode for function foo only. If the file was compiled with
-mthumb iit has no effect.
Indeed
Post by Christian Bruel
Similarly
__attribute__((target("arm"))) int foo()
Forces arm mode for function foo. It has no effect if the file was not
compiled with -mthumb.
Indeed.
Post by Christian Bruel
and regions can be grouped together with
#pragma GCC target ("thumb")
or
#pragma GCC target ("arm")
a few notes
- Inlining is allowed between functions of the same mode (compilation
switch, #pragma and attribute)
Why shouldn't we allow inlining between functions of ARM mode vs Thumb
mode ? After all the choice of mode is irrelevant at the time of
inlining (except possibly for inline assembler).
Perhaps an option is to try to disable inlining in the presence of
inline assembler or if not gate it from a command line option.
Post by Christian Bruel
- 'arm_option_override' is now reorganized around
'arm_option_override_internal' for thumb related macros
Looks like a reasonable start - We need a couple of tests to make sure
that __attribute__(("arm")) on a file compiled for the M profile results
in a syntax error. v7(e)m is Thumb2 only.
for bonus points it would be great to get __attribute__(("target"))
working properly in the backend. I suspect a number of the tuning flags
and the global architecture state needs to be moved into this as well to
handle cases where __attribute__(("arm")) used with M profile options is
error'd out.
Post by Christian Bruel
- I kept TARGET_UNIFIED_ASM to minimize changes. Although removing it
would avoid to switch between unified/divided asms
I know Terry's been trying to get Thumb1 to also switch by default to
unified asm. So I think a lot of the logic with "emit_thumb" could just
go away. Maybe we should just consider switching ARM state to unified
syntax and that would be as simple as changing TARGET_UNIFIED_SYNTAX in
arm.h to be TARGET_32BIT. Long overdue IMHO.
The only gotcha here is inline assembler but GAS is so permissive that
I'm not too worried about it in ARM state and Thumb2 state. I'm a bit
worried about Thumb1.
Post by Christian Bruel
and simplify arm_declare_function_name. Should be considered at some
point.
I think that can be done for a lot of newer cores - some of that logic
is dated now IIUC.
I remember why my original project failed - I couldn't get enough of the
backend in shape for the state to be saved and restored and then I moved
on to other more interesting things, so whatever is done here needs to
make sure that all ISA mode state is saved and restored.
One thing I experimented with while doing this work was adding something
like the mflip-mips16 option and then have the testsuite run with this
option to try and make sure enough of the backend state is saved and
restored properly.
This will give more testing coverage hopefully to the switching logic
for the attributes and hopefully expose any issues that are there with
respect to saving and restoring state. The problem you'll probably find
is that in some of the gcc.target/arm tests the flipping of state will
cause various interesting failures. The other side is making sure that
all state that is global is now captured and reinitialized everytime we
switch context between ARM and Thumb.
I'm not sure how to best test the pragma switching logic but given that
the 2 hang off each other, it should just work (TM) if one is right. In
any case adding some testcases that direct test this would be useful.
Post by Christian Bruel
- It is only available for Thumb2 variants (for thumb1 lack of interest
and a few complications I was unable to test, although this could be
added easily if needed, I think)
I don't think we should restrict this to Thumb1 or Thumb2 it should be
allowed if the architecture allows it.
For example __attribute__((thumb)) on a function compiled with
-march=armv5te -mfloat-abi=softfp -mfpu=vfpv3 should give a syntax error
as this is not supported. No VFP instructions in Thumb1. Explicit tests
for this would be appreciated.
IIRC all other cases should be accepted.
Post by Christian Bruel
Tested for no regression for arm-none-eabi [,-with-arch=armv7-a]
OK for trunk ?
Sorry not yet.
I would also like some documentation added to extend.texi for these
attributes.
So in summary.
1. Some documentation in extend.texi please.
2. TARGET_UNIFIED_SYNTAX to be turned on for ARM state too.
3. Testcases for this and some testing with a mflip-thumbness switch
(added only for testing).
4. Investigate further giving up restriction on Thumb1.
5. Investigate lifting inlining restriction for __attribute__(("arm"))
or __attribute__ (("thumb")) or maybe gate it off a command line option.
6. Split this patch into smaller logical chunks that can help with
review i.e. the restructuring from the attribute and pragma addition,
the testcases.
7. Tests for the diagnostics and error cases i.e. __attribute__(("arm"))
used with -march=armv7em -march=armv7m command line options.
regards
Ramana
Christian Bruel
2014-10-09 11:35:25 UTC
Permalink
Post by Ramana Radhakrishnan
Hi Christian,
<< snipped agreed stuf >>
Post by Ramana Radhakrishnan
3) about inlining
I dislike inlining different modes, From a conceptual use, a user
might want to switch mode only when changing a function's hotness.
Usually inlining a cold function into a hot one is not what the user
explicitly required when setting different mode attributes for them,
__attribute__((thumb)) should not imply coldness or hotness. Inlining
between cold and hot functions should be done based on profile feedback.
The choice of compiling in "Thumb1" state for coldness is a separate one
because that's where the choice needs to be made.
Ideally yes. but I think that a user motivation to use target attribute
(("thumb") is to reduce code size even in the cases where PFO is not
available (libraries, kernel or user application build spec packaged
without profile data). And there are cases where static probabilities
are not enough and that a user wants it own control with gprof or oprofile.
But in this case, we could point to the __attribute__ (("cold")) on the
function ? That would probably be the best workaround to propose if we
recommend this

But here is another scenario: Using of attribute (("arm")) for exception
entry points is indeed not related to hotness. But consider a typical
thumb binary with an entry point in arm compiled in C (ex handler, a
kernel...). Today due to the file boundary the thumb part is not inlined
into the arm point. (Using -flto is not possible because the whole
gimple would be thumb).

Now, using attribute (("target")) for the functions others than the
entry point, with your approach they would all be inlined (assuming the
cost allow this) and we would end up with a arm binary instead of a
thumb binary...

But there are still 3 points :

- At least 2 other target (i386, Powerpc) that support attribute_target
disable inlining between modes that are not subsets. I like to think
about homogeneity between targets and I find odd to have different
inlining rules...

- Scanning the function body to check for ASM_INPUT does not look very
elegant (if this matters) because the asm could well be unrelated

The only case when it will always be a win to inline thumb into arm is
when the cost of the inlined body is less than a BX instruction (but
still, with branch prediction this cost is pondered).
Post by Ramana Radhakrishnan
Post by Christian Bruel
The compiler would take a decision that is not what the user wrote. And
in addition if you consider the few instructions to modify R15 to switch
state that would end up with more code executed in the critical path,
voiding a possible size of speed gain.
I do not expect there to be any additional instructions needed to switch
state. If function x is inlined into function y the state would be lost
and the state would be in terms of the state of function x.
Yes, indeed. I was in a LCM/mode-switching thinking mode when writing
this. In this case the mode is inherited.
Post by Ramana Radhakrishnan
Obviously if the user doesn't want inlining - the user would add
attributes to disable inlining. You do have extensions such as
__attribute__((noinline)) and __attribute__((never_inline)) to give the
user that control and those bits need to be used in addition.
Those attributes are overkill. They would disable inlining between
caller-callee of a same mode. This is not what we want
Post by Ramana Radhakrishnan
The attribute then purely reflects then the output instruction state of
the function if a copy of it's body is laid out separately in the output.
IMHO, the heuristics for inlining should be the best judge of when
functions should be inlined between one and another and we shouldn't be
second guessing that in the backend
If there is a copy of the function to be put out by the compiler, only
then should we choose this based on the state of the "target" i.e. arm
or thumb.
Yes,

So to summarize, we can:

1) don't inline between different modes. Same behavior with other
targets. Solves asm case
2) always inline unless the function contains asm statements. ( I
reject adding a new compilation switch)
3) always inline. But recommend the use of attribute ((noinline)) to
handle asm or attribute ((cold,hot)) in the absence of profile datas

I obviously prefer 1) safe and homogenous, 3) is the worse as it
requires additional user action (poor user). 2) is less worse.

Thanks for supporting me ::)

Christian
Richard Earnshaw
2014-10-09 14:11:25 UTC
Permalink
Post by Christian Bruel
Post by Ramana Radhakrishnan
Hi Christian,
<< snipped agreed stuf >>
Post by Ramana Radhakrishnan
3) about inlining
I dislike inlining different modes, From a conceptual use, a user
might want to switch mode only when changing a function's hotness.
Usually inlining a cold function into a hot one is not what the user
explicitly required when setting different mode attributes for them,
__attribute__((thumb)) should not imply coldness or hotness. Inlining
between cold and hot functions should be done based on profile feedback.
The choice of compiling in "Thumb1" state for coldness is a separate one
because that's where the choice needs to be made.
Ideally yes. but I think that a user motivation to use target attribute
(("thumb") is to reduce code size even in the cases where PFO is not
available (libraries, kernel or user application build spec packaged
without profile data). And there are cases where static probabilities
are not enough and that a user wants it own control with gprof or oprofile.
But in this case, we could point to the __attribute__ (("cold")) on the
function ? That would probably be the best workaround to propose if we
recommend this
Hot vs cold is interesting, but arm/thumb shouldn't be used to imply
that. The days when ARM=fast, thumb=small are in the past now, and
thumb2 code should be both fast and small. Indeed, smaller thumb2 code
can be faster than larger ARM code simply because you can get more of it
in the cache. The use of arm vs thumb is likely to be much more subtle now.
Post by Christian Bruel
But here is another scenario: Using of attribute (("arm")) for exception
entry points is indeed not related to hotness. But consider a typical
thumb binary with an entry point in arm compiled in C (ex handler, a
kernel...). Today due to the file boundary the thumb part is not inlined
into the arm point. (Using -flto is not possible because the whole
gimple would be thumb).
We have no-inline attributes for scenarios like that. I don't think a
specific use case should dominate other cases.
Post by Christian Bruel
Now, using attribute (("target")) for the functions others than the
entry point, with your approach they would all be inlined (assuming the
cost allow this) and we would end up with a arm binary instead of a
thumb binary...
- At least 2 other target (i386, Powerpc) that support attribute_target
disable inlining between modes that are not subsets. I like to think
about homogeneity between targets and I find odd to have different
inlining rules...
That's because use of specific instructions must not be allowed to leak
past a gating check that is in the caller. It would be a disaster if a
function that used a neon register, for example, was allowed to leak
into code that might run on a target with no Neon register file. The
ARM/thumb distinction shouldn't, by default, be limited in that manner.

I believe inlining could happen from a subset of the archtiecture into a
function using a superset, just not vice-versa.
Post by Christian Bruel
- Scanning the function body to check for ASM_INPUT does not look very
elegant (if this matters) because the asm could well be unrelated
The only case when it will always be a win to inline thumb into arm is
when the cost of the inlined body is less than a BX instruction (but
still, with branch prediction this cost is pondered).
One of the problems with not inlining is that the C++ abstraction
penalty is likely to shoot up. There will be many major lost
optimization opportunities if we start down that path.

So I think inlining should only be disabled if there's some technical
reason why it should be disabled, not because of some 'it might not
always be ideal' feelings. Furthermore, we should expect users to use
the other attributes consistently when they expect specific behaviours
to occur.

My 2p.

R.
Post by Christian Bruel
Post by Ramana Radhakrishnan
Post by Christian Bruel
The compiler would take a decision that is not what the user wrote. And
in addition if you consider the few instructions to modify R15 to switch
state that would end up with more code executed in the critical path,
voiding a possible size of speed gain.
I do not expect there to be any additional instructions needed to switch
state. If function x is inlined into function y the state would be lost
and the state would be in terms of the state of function x.
Yes, indeed. I was in a LCM/mode-switching thinking mode when writing
this. In this case the mode is inherited.
Post by Ramana Radhakrishnan
Obviously if the user doesn't want inlining - the user would add
attributes to disable inlining. You do have extensions such as
__attribute__((noinline)) and __attribute__((never_inline)) to give the
user that control and those bits need to be used in addition.
Those attributes are overkill. They would disable inlining between
caller-callee of a same mode. This is not what we want
Post by Ramana Radhakrishnan
The attribute then purely reflects then the output instruction state of
the function if a copy of it's body is laid out separately in the output.
IMHO, the heuristics for inlining should be the best judge of when
functions should be inlined between one and another and we shouldn't be
second guessing that in the backend
If there is a copy of the function to be put out by the compiler, only
then should we choose this based on the state of the "target" i.e. arm
or thumb.
Yes,
1) don't inline between different modes. Same behavior with other
targets. Solves asm case
2) always inline unless the function contains asm statements. ( I
reject adding a new compilation switch)
3) always inline. But recommend the use of attribute ((noinline)) to
handle asm or attribute ((cold,hot)) in the absence of profile datas
I obviously prefer 1) safe and homogenous, 3) is the worse as it
requires additional user action (poor user). 2) is less worse.
Thanks for supporting me ::)
Christian
Loading...