Discussion:
support for -fdump-ada-spec
(too old to reply)
Arnaud Charlet
2010-04-26 12:46:45 UTC
Permalink
This patch adds to GCC a capability which is very useful for Ada developers:
the ability to automatically generate Ada specs from C and C++ header files.

This feature has been in production on our GCC 4.3 based compilers at AdaCore
for more than a year now, and used by many customers, and has been updated to
GCC 4.4 and then GCC 4.5 and GCC 4.6 as it evolved.

The documentation for this feature is actually already in place:

http://gcc.gnu.org/onlinedocs/gnat_ugn_unw/Generating-Ada-Bindings-for-C-and-C_002b_002b-headers.html

and in particular:

http://gcc.gnu.org/onlinedocs/gnat_ugn_unw/Running-the-binding-generator.html#Running-the-binding-generator

This patch requires approval from both C and C++ (or global I guess)
maintainers, although the C++ part can be committed separately if the first
part if approved/in place.

The general idea behind the patch is to traverse all declarations generated
by the C or C++ front-end after parsing is performed, and generate
corresponding Ada code when -fdump-ada-spec is specified (and do nothing if
not). In addition, macros and comments stored by libcpp
are also translated into their Ada equivalent whenever possible.

All the macro and tree transformations is performed in a new file which is
used by the C and C++ front-end: c-ada-spec.c

The only side effect which is visible outside the use -f the -fdump-ada-spec
switch from this patch is that we now allow -C (keep comments) switch to be
used without -E: this is needed so that we can take C/C++ comments from header
files and convert them into Ada comment.

To achieve this, we modify the C/C++ specs to allow -C without -E and ignore
the CPP_COMMENT nodes in c-lex.c (c_lex_with_flags).

Although if this change is felt too intrusive/controversial, I could remove
it from my patch, that would still leave -fdump-ada-spec functional, even
though it would generate slightly less useful Ada spec files.

We also modify the C/C++ specs files to allow calling the gcc driver on a
.h file when using -fdump-ada-spec without generating precompiled (.pch)
files, since by default gcc assumes that when called with a .h file, it
should always generate .pch files and do nothing else (--output-pch=xxx).

Other than that, I've tried to make the patch as little intrusive as possible,
and almost all the work is done in c-ada-spec.c, although some work is
needed in the C (c-decl.c) and C++ (decl2.c) front-ends to collect the
relevant nodes to pass to dump_ada_specs.

Note that the idea after this patch is integrated, is to work on removing
some of the hand made Ada run-time files (such as system.os_interface,
s-osinte-*.ads files) by automatically generate them via -fdump-ada-spec
during the Ada run-time build.

Tested on x86_64-pc-linux-gnu, OK for the mainline?

gcc/

2010-04-26 Arnaud Charlet <***@adacore.com>
Matthew Gingell <***@adacore.com>

* tree-dump.c (dump_files): Add ada-spec.
(FIRST_AUTO_NUMBERED_DUMP): Bump to 8.
* tree-pass.h (tree_dump_index): Add TDI_ada.
* diagnostic.h (decl_sloc, to_ada_name, collect_ada_nodes,
collect_source_ref, dump_ada_specs, dump_ada_macros): Declare.
* gcc.c: Add support for -C without -E and for -fdump-ada-spec.
(cpp_unique_options): Do not reject -C or -CC when -E isn't present.
(default_compilers) <@c-header>: Allow -fdump-ada-spec on header files.
* c-decl.c (collect_source_ref_cb, collect_all_refs,
for_each_global_decl): New functions.
(c_write_global_declarations): Add handling of -fdump-ada-spec.
* c-lex.c (c_lex_with_flags): Add handling of CPP_COMMENT.
* Makefile.in (C_AND_OBJC_OBJS): Add c-ada-spec.o.
* c-ada-spec.c: New file.

cp/

2010-04-26 Arnaud Charlet <***@adacore.com>
Matthew Gingell <***@adacore.com>

* Make-lang.in (CXX_C_OBJS): Add c-ada-spec.o.
* decl2.c: Include langhooks.h and diagnostic.h.
(cpp_check, collect_source_refs, collect_ada_namespace,
collect_all_refs): New functions.
(cp_write_global_declarations): Add handling of -fdump-ada-spec.
* lang-specs.h: Ditto.
Manuel López-Ibáñez
2010-04-26 13:45:52 UTC
Permalink
       * diagnostic.h (decl_sloc, to_ada_name, collect_ada_nodes,
       collect_source_ref, dump_ada_specs, dump_ada_macros): Declare.
I don't want to sound picky but I disagree with polluting the
diagnostics interface further. If diagnostics.c was a stand-alone
library, these functions wouldn't be declared here. In fact, we should
start moving in the opposite direction and moving declarations out of
diagnostics.h that have nothing to do with diagnostics.c. Why these
functions cannot be declared in their own module/header file
c-ada-spec.h? why they need to be included by every user of
diagnostics.h?

Cheers,

Manuel.
Arnaud Charlet
2010-04-26 13:52:02 UTC
Permalink
Post by Manuel López-Ibáñez
       * diagnostic.h (decl_sloc, to_ada_name, collect_ada_nodes,
       collect_source_ref, dump_ada_specs, dump_ada_macros): Declare.
I don't want to sound picky but I disagree with polluting the
diagnostics interface further. If diagnostics.c was a stand-alone
library, these functions wouldn't be declared here. In fact, we should
start moving in the opposite direction and moving declarations out of
diagnostics.h that have nothing to do with diagnostics.c. Why these
functions cannot be declared in their own module/header file
c-ada-spec.h? why they need to be included by every user of
diagnostics.h?
I guess I just "followed the trend" here, but I certainly have no troubles
moving these declarations elsewhere (e.g. in c-ada-spec.h) if people
think it's better this way, let me know.

FWIW, I agree that without the existing "wrong" usage, it would be far cleaner
to have each .c file has it's own corresponding .h file (that's mandatory
in Ada actually :-).

Arno
Manuel López-Ibáñez
2010-04-26 14:03:06 UTC
Permalink
Post by Arnaud Charlet
FWIW, I agree that without the existing "wrong" usage, it would be far cleaner
to have each .c file has it's own corresponding .h file (that's mandatory
in Ada actually :-).
Or one .h file (the interface) for a module/library of several *.c
files. But those functions in diagnostics.h are not part of the
interface of the diagnostics machinery.

Thanks!

Manuel.
Mark Mitchell
2010-04-26 14:44:13 UTC
Permalink
Post by Arnaud Charlet
FWIW, I agree that without the existing "wrong" usage, it would be far cleaner
to have each .c file has it's own corresponding .h file (that's mandatory
in Ada actually :-).
Without taking any position on general practices for C or C++, and with
apologies for any previous language-specific bits in diagnostic.h, I
think that in this case it would certainly be better to move the
Ada-specific bits out of diagnostic.h.

I think this patch is generally useful. The C++ bits are mostly OK, but
I find cpp_check very unappealing due to the way in which it uses
predefined constants. Please (a) make an enum (rather than macros) for
the various things for which you need to check, (b) define that type in
a header, and (c) have the case statement in cpp_check switch on that:

switch (op)
{
case IS_ABSTRACT:
return DECL_PURE_VIRTUAL (t);
...
}

Also, please make sure that *all* functions have a comment which
documents *all* parameters, using the ALL_CAPS format, e.g.,:

/* Collect source file references recursively, starting
from NAMESPC (a NAMESPACE_DECL). */

With those changes, the C++ front-end changes are OK.
--
Mark Mitchell
CodeSourcery
***@codesourcery.com
(650) 331-3385 x713
Arnaud Charlet
2010-04-26 15:17:52 UTC
Permalink
Post by Mark Mitchell
With those changes, the C++ front-end changes are OK.
Thanks for the quick review.
Here is an updated patch taking all comments into account (left to review is
the C front-end and "driver" parts, thanks in advance).

2010-04-26 Arnaud Charlet <***@adacore.com>
Matthew Gingell <***@adacore.com>

* tree-dump.c (dump_files): Add ada-spec.
(FIRST_AUTO_NUMBERED_DUMP): Bump to 8.
* tree-pass.h (tree_dump_index): Add TDI_ada.
* gcc.c: Add support for -C without -E and for -fdump-ada-spec.
(cpp_unique_options): Do not reject -C or -CC when -E isn't present.
(default_compilers) <@c-header>: Allow -fdump-ada-spec on header files.
* c-decl.c: Include c-ada-spec.h.
(collect_source_ref_cb, collect_all_refs, for_each_global_decl): New
functions.
(c_write_global_declarations): Add handling of -fdump-ada-spec.
* c-lex.c (c_lex_with_flags): Add handling of CPP_COMMENT.
* Makefile.in (C_AND_OBJC_OBJS): Add c-ada-spec.o.
* c-ada-spec.h, c-ada-spec.c: New files.

cp/
* Make-lang.in (CXX_C_OBJS): Add c-ada-spec.o.
* decl2.c: Include langhooks.h and c-ada-spec.h.
(cpp_check, collect_source_refs, collect_ada_namespace,
collect_all_refs): New functions.
(cp_write_global_declarations): Add handling of -fdump-ada-spec.
* lang-specs.h: Ditto.
Joseph S. Myers
2010-04-26 17:01:48 UTC
Permalink
Post by Arnaud Charlet
* tree-dump.c (dump_files): Add ada-spec.
(FIRST_AUTO_NUMBERED_DUMP): Bump to 8.
* tree-pass.h (tree_dump_index): Add TDI_ada.
* gcc.c: Add support for -C without -E and for -fdump-ada-spec.
(cpp_unique_options): Do not reject -C or -CC when -E isn't present.
* c-decl.c: Include c-ada-spec.h.
(collect_source_ref_cb, collect_all_refs, for_each_global_decl): New
functions.
(c_write_global_declarations): Add handling of -fdump-ada-spec.
* c-lex.c (c_lex_with_flags): Add handling of CPP_COMMENT.
* Makefile.in (C_AND_OBJC_OBJS): Add c-ada-spec.o.
* c-ada-spec.h, c-ada-spec.c: New files.
The main manual (invoke.texi) needs to document the new option with a
cross-reference to the Ada manual.

As I understand it, this feature

(a) is deliberately limited to a subset of C and C++ declarations and
macros;

(b) may, for some input, generate invalid Ada output rather than a
diagnostic.

If this is as intended it doesn't itself block the feature, but it would
seem like a good idea to have documentation of what is / is not supported.

The driver changes are OK once the rest of the changes are OK. The
following is not an exhaustive review of the C changes.

It looks like you will always output extended identifiers as UTF-8. Is
this a form that will always be acceptable as input to the Ada compiler?
Post by Arnaud Charlet
+static void
+print_ada_macros (pretty_printer *pp, cpp_hashnode **macros, int max_ada_macros)
+{
+ int j, num_macros = 0, prev_line = -1;
+
+ for (j = 0; j < max_ada_macros; j++)
+ {
+ cpp_hashnode *node = macros [j];
+ const cpp_macro *macro = node->value.macro;
+ unsigned i;
+ unsigned char s [4096], params [4096];
+ unsigned char *buffer = s, *buf_param = params, *prev;
I can see no checks that you don't overflow these fixed-width buffers.
Buffer overruns are never OK; you should preferably calculate the memory
needed and always allocate enough (that might need two passes, one to
calculate and one to write the output after allocation), but failing that
make sure to avoid buffer overruns.
Post by Arnaud Charlet
+ if (!is_string)
+ switch (*prev)
+ {
What if there are both string and non-string tokens in the macro
expansion? And will the string code handle string concatenation?
Post by Arnaud Charlet
+ if (prev + 2 == buffer && prev [1] == '<')
+ {
+ if (prev [-1] == '1')
+ prev [-1] = '2';
+ else if (prev [-1] == ' '
+ && prev [-2] == '1')
+ prev [-2] = '2';
+
+ prev [0] = '*';
+ prev [1] = '*';
+ }
Quite what is this code meant to be doing?
Post by Arnaud Charlet
+ if (prev + 1 < buffer && prev [1] == 'x')
+ {
+ /* convert 0x.... into 16#....# */
Don't you need to handle 0X the same as 0x, and handle initial 0, not 0x
or 0X, on integer but not floating-point constants, as octal? (Then there
are GNU C binary constants starting 0b.)

Hex floats would also need appropriate handling of the exponent. Floating
constants in general can have 'f' suffixes in C.
Post by Arnaud Charlet
+ if (*buffer != 'L' && *buffer != 'l' && *buffer != 'U')
+ buffer++;
+ if (buffer [-1] == 'L'
+ || buffer [-1] == 'l'
+ || buffer [-1] == 'U')
In both these cases, you can have suffixes such as "UL" or "LLU".

In essence, the point with all these cases is that what you are doing with
taking a C string representation of the macro expansion and trying to
convert it to an Ada string seems very fragile. It would be better to
take the actual tokens, interpret their semantics and produce equivalent
Ada tokens - call cpp_classify_number then the various subsequent
functions the C lexer uses to generate trees for the constants, then write
out an Ada constant with the same value.

Much the same applies to other types of tokens. If you work on the
preprocessing tokens rather than a string, new forms of constants don't
cause problems; nor do C escape sequences in strings (I see nothing in
this code to handle them); nor do digraphs and C++ alternate spellings. I
don't know if there are cases where C and Ada operator precedence differ -
if there are, those could still cause issues in principle - but using
preprocessing tokens for their semantics rather than their spelling seems
generally likely to produce better results on a wider range of input, as
well as more reliably detecting cases that you can't convert to Ada.
Post by Arnaud Charlet
+ /* Special case null for now, need to find a more generic approach */
+ if (!strcasecmp ((const char *)NODE_NAME (node), "null"))
+ supported = 0;
You have an array ada_reserved later in this code....

(Also, note formatting. "." and two spaces at end of comment, space after
(type name) in cast.)
Post by Arnaud Charlet
+ macros = (cpp_hashnode **) alloca (sizeof (cpp_hashnode *) * max_ada_macros);
There are lots of casted allocations in this patch that would better be
using the macros from libiberty.h such as XALLOCAVEC (in this case).
--
Joseph S. Myers
***@codesourcery.com
Arnaud Charlet
2010-04-26 17:27:01 UTC
Permalink
Post by Joseph S. Myers
The main manual (invoke.texi) needs to document the new option with a
cross-reference to the Ada manual.
Fair enough, I'll add that.
Post by Joseph S. Myers
As I understand it, this feature
(a) is deliberately limited to a subset of C and C++ declarations and
macros;
Right.
Post by Joseph S. Myers
(b) may, for some input, generate invalid Ada output rather than a
diagnostic.
Right, the aim is to help developers doing most of the tedious work of
generating bindings rather than doing it manually, the intent is not to
aim at a 100% solution which would require a full C to Ada compiler.
Post by Joseph S. Myers
If this is as intended it doesn't itself block the feature, but it would
seem like a good idea to have documentation of what is / is not supported.
That part in itself is actually hard to do since by nature, the light approach
taken allows for some "holes", without necessarily knowing them all, although
I'll try to add some mentions of unsupported features, and in particular
explain that the translation of macros is pretty limited currently, more
on that below.
Post by Joseph S. Myers
The driver changes are OK once the rest of the changes are OK. The
following is not an exhaustive review of the C changes.
OK, thanks for the review, I'll work on a new version taking your comments
into account.
Post by Joseph S. Myers
It looks like you will always output extended identifiers as UTF-8. Is
this a form that will always be acceptable as input to the Ada compiler?
Yes, GNAT has a -gnatW8 switch to take identifiers as UTF-8.
Post by Joseph S. Myers
I can see no checks that you don't overflow these fixed-width buffers.
Buffer overruns are never OK; you should preferably calculate the memory
needed and always allocate enough (that might need two passes, one to
calculate and one to write the output after allocation), but failing that
make sure to avoid buffer overruns.
Fair enough.
Post by Joseph S. Myers
Post by Arnaud Charlet
+ if (prev + 2 == buffer && prev [1] == '<')
+ {
+ if (prev [-1] == '1')
+ prev [-1] = '2';
+ else if (prev [-1] == ' '
+ && prev [-2] == '1')
+ prev [-2] = '2';
+
+ prev [0] = '*';
+ prev [1] = '*';
+ }
Quite what is this code meant to be doing?
Transform 1 << N into 2 ** N, I'll add a comment.
Post by Joseph S. Myers
Post by Arnaud Charlet
+ if (prev + 1 < buffer && prev [1] == 'x')
+ {
+ /* convert 0x.... into 16#....# */
Don't you need to handle 0X the same as 0x, and handle initial 0, not 0x
or 0X, on integer but not floating-point constants, as octal? (Then there
are GNU C binary constants starting 0b.)
We don't "need" to handle all these forms since we're not aiming at
transforming 100% of all possible cases, this code (in particular the
macro transformation) is only handling the most common cases.

I'll document that for now as part of the limitations, and put as a TODO
to add this support, since it shouldn't be too hard to add.
Post by Joseph S. Myers
Post by Arnaud Charlet
+ if (buffer [-1] == 'L'
+ || buffer [-1] == 'l'
+ || buffer [-1] == 'U')
In both these cases, you can have suffixes such as "UL" or "LLU".
Right, same answer as above: that can certainly be put on the TODO list
and be left "unsupported" for now (and documented as such, since now
identified).
Post by Joseph S. Myers
In essence, the point with all these cases is that what you are doing with
taking a C string representation of the macro expansion and trying to
convert it to an Ada string seems very fragile.
Right, the macro is certainly much more fragile than the 'node' part,
the idea is really to try to handle common simple cases for now such as

#define X 12
#define Y "foo"

and not much more, and possibly enhance this later.

The idea behind this patch is that the capability is already useful as is,
and it's much easier to enhance it by steps once it's there (and I expect
other people from the Ada community, possibly e.g. Laurent Guerby, to
be able to contribute improvements to this capability once committed).
Post by Joseph S. Myers
It would be better to take the actual tokens, interpret their semantics and
produce equivalent Ada tokens - call cpp_classify_number then the various
subsequent functions the C lexer uses to generate trees for the constants,
then write out an Ada constant with the same value.
Possibly, although this sounds like much more work. It certainly sounds
like in the medium term, that would indeed be the way to go though.

Would you consider this point blocking for an initial integration? If so
I guess the other option would be to remove completely support for macros
as a first step and get the support for nodes in, and then work on redoing
the macro support.

In other words, it is kind of "expected" that the macro translation isn't
very powerful right now, and I'll certainly make sure this is properly
documented as part of the limitations.
Post by Joseph S. Myers
Post by Arnaud Charlet
+ /* Special case null for now, need to find a more generic approach */
+ if (!strcasecmp ((const char *)NODE_NAME (node), "null"))
+ supported = 0;
You have an array ada_reserved later in this code....
Right, this code can actually be removed completely, since we're now using
to_ada_name to handle this as you noted. I'll remove it.

I'll work on the other comments, thanks again for your review.

Arno
Joseph S. Myers
2010-04-26 19:31:21 UTC
Permalink
Post by Arnaud Charlet
Post by Joseph S. Myers
Post by Arnaud Charlet
+ if (prev + 2 == buffer && prev [1] == '<')
+ {
+ if (prev [-1] == '1')
+ prev [-1] = '2';
+ else if (prev [-1] == ' '
+ && prev [-2] == '1')
+ prev [-2] = '2';
+
+ prev [0] = '*';
+ prev [1] = '*';
+ }
Quite what is this code meant to be doing?
Transform 1 << N into 2 ** N, I'll add a comment.
It would seem to have unfortunate effects if the previous token ends with
1 but isn't just the integer 1 (or octal 1, or hex 1), and fail to be
effective (but you don't mark the macro unsupported in that case) if the
previous token doesn't end with 1. Detecting those cases should be
straightforward; detecting cases where precedence comes in (1 + 1 << N
means (1 + 1) << N not 1 + (1 << N)) would probably be rather harder.
Post by Arnaud Charlet
Post by Joseph S. Myers
It would be better to take the actual tokens, interpret their semantics and
produce equivalent Ada tokens - call cpp_classify_number then the various
subsequent functions the C lexer uses to generate trees for the constants,
then write out an Ada constant with the same value.
Possibly, although this sounds like much more work. It certainly sounds
like in the medium term, that would indeed be the way to go though.
I don't think it's really more work; you have a check for CPP_MACRO_ARG
followed by calling cpp_spell_token and a switch over characters and you
could change that to a simple switch over the token type, where cases
including the default could still call cpp_spell_token if desired. That
is, you could effectively switch over one token type at a time to working
based on the token type rather than the spelling.

Actually parsing (to deal with precedence issues etc.) would be much more
work, but working with tokens rather than their spellings seems like a
similar amount of work to the present version while being much more
robust.
Post by Arnaud Charlet
Would you consider this point blocking for an initial integration? If so
I guess the other option would be to remove completely support for macros
as a first step and get the support for nodes in, and then work on redoing
the macro support.
Not blocking, but as noted above I think working with tokens has similar
complexity while being more reliable and covering more cases "for free".
--
Joseph S. Myers
***@codesourcery.com
Arnaud Charlet
2010-04-27 10:11:52 UTC
Permalink
Post by Joseph S. Myers
Actually parsing (to deal with precedence issues etc.) would be much more
work, but working with tokens rather than their spellings seems like a
similar amount of work to the present version while being much more
robust.
Indeed, I was thinking about parsing rather than working with tokens.

I agree that using tokens is cleaner and not more difficult, actually a bit
simpler indeed.

Here is a reworked version taking into account all your comments, with
added documentation xref, improved comments and reworked macro handling as
per above.

I'm attaching the new patch file, as well as a diff between previous and
new version of c-ada-spec.c where I did most of the changes based on your
review.

gcc/

2010-04-27 Arnaud Charlet <***@adacore.com>
Matthew Gingell <***@adacore.com>

* doc/invoke.texi: Mention -fdump-ada-spec.
* tree-dump.c (dump_files): Add ada-spec.
(FIRST_AUTO_NUMBERED_DUMP): Bump to 8.
* tree-pass.h (tree_dump_index): Add TDI_ada.
* gcc.c: Add support for -C without -E and for -fdump-ada-spec.
(cpp_unique_options): Do not reject -C or -CC when -E isn't present.
(default_compilers) <@c-header>: Allow -fdump-ada-spec on header files.
* c-decl.c: Include c-ada-spec.h.
(collect_source_ref_cb, collect_all_refs, for_each_global_decl): New
functions.
(c_write_global_declarations): Add handling of -fdump-ada-spec.
* c-lex.c (c_lex_with_flags): Add handling of CPP_COMMENT.
* Makefile.in (C_AND_OBJC_OBJS): Add c-ada-spec.o.
* c-ada-spec.h, c-ada-spec.c: New files.

gcc/ada

2010-04-27 Arnaud Charlet <***@adacore.com>

* gnat_ugn.texi: Improve doc on -fdump-ada-spec, mention limitations.

gcc/cp/

2010-04-27 Arnaud Charlet <***@adacore.com>
Matthew Gingell <***@adacore.com>

* Make-lang.in (CXX_C_OBJS): Add c-ada-spec.o.
* decl2.c: Include langhooks.h and c-ada-spec.h.
(cpp_check, collect_source_refs, collect_ada_namespace,
collect_all_refs): New functions.
(cp_write_global_declarations): Add handling of -fdump-ada-spec.
* lang-specs.h: Ditto.

Thanks in advance,

Arno
Arnaud Charlet
2010-04-27 10:53:48 UTC
Permalink
Post by Arnaud Charlet
I'm attaching the new patch file, as well as a diff between previous and
new version of c-ada-spec.c where I did most of the changes based on your
review.
Actually we can't support out of the box "complex" chars/strings, so I've
modified the macro handling to do:

case CPP_WSTRING:
case CPP_STRING16:
case CPP_STRING32:
case CPP_UTF8STRING:
case CPP_WCHAR:
case CPP_CHAR16:
case CPP_CHAR32:
case CPP_NAME:
if (!macro->fun_like)
supported = 0;
else
buffer = cpp_spell_token (parse_in, token, buffer, false);
break;

case CPP_STRING:
is_string = 1;
buffer = cpp_spell_token (parse_in, token, buffer, false);
break;

case CPP_CHAR:
is_char = 1;
buffer = cpp_spell_token (parse_in, token, buffer, false);
break;

instead of handling e.g. CPP_WSTRING like CPP_STRING
Arnaud Charlet
2010-05-01 10:59:00 UTC
Permalink
I've added even more comments and a few clean ups on my previous patch.

I believe this version should be suitable (or very close) for inclusion
on the mainline, so would appreciate your review.

Tested on i686-pc-linux-gnu, OK for the mainline?

gcc/

2010-05-01 Arnaud Charlet <***@adacore.com>
Matthew Gingell <***@adacore.com>

* doc/invoke.texi: Mention -fdump-ada-spec.
* tree-dump.c (dump_files): Add ada-spec.
(FIRST_AUTO_NUMBERED_DUMP): Bump to 8.
* tree-pass.h (tree_dump_index): Add TDI_ada.
* gcc.c: Add support for -C without -E and for -fdump-ada-spec.
(cpp_unique_options): Do not reject -C or -CC when -E isn't present.
(default_compilers) <@c-header>: Allow -fdump-ada-spec on header
files.
* c-decl.c: Include c-ada-spec.h.
(collect_source_ref_cb, collect_all_refs, for_each_global_decl): New
functions.
(c_write_global_declarations): Add handling of -fdump-ada-spec.
* c-lex.c (c_lex_with_flags): Add handling of CPP_COMMENT.
* Makefile.in (C_AND_OBJC_OBJS): Add c-ada-spec.o.
* c-ada-spec.h, c-ada-spec.c: New files.

gcc/ada

* gnat_ugn.texi: Improve doc on -fdump-ada-spec, mention limitations.

gcc/cp/

* Make-lang.in (CXX_C_OBJS): Add c-ada-spec.o.
* decl2.c: Include langhooks.h and c-ada-spec.h.
(cpp_check, collect_source_refs, collect_ada_namespace,
collect_all_refs): New functions.
(cp_write_global_declarations): Add handling of -fdump-ada-spec.
* lang-specs.h: Ditto.
Joseph S. Myers
2010-05-05 16:32:46 UTC
Permalink
+-fdump-ada-spec[-slim]}
+Ada spec (via the -fdump-ada-spec switch).
@option{-fdump-ada-spec}.
Same point as above on [].
+/* Dump all digits/hex chars from NUMBER to BUFFER.
+ Returns a pointer to the character after the last character written. */
+
+static unsigned char *
+dump_number (unsigned char *number, unsigned char *buffer)
+{
+ while (*number != '\0' && *number != 'U' && *number != 'l' && *number != 'L')
+ *buffer++ = *number++;
You may as well handle 'u' as well here, even if you don't want to handle
hex floats and "float" literals ending 'f' or 'F'.
+ if (token->flags & STRINGIFY_ARG || token->flags & PASTE_LEFT)
+ {
+ supported = 0;
Weren't those cases detected earlier as unsupported when you calculated
the length?
+ *buffer++ = 'n'; *buffer++ = 'o'; *buffer++ = 't'; break;
+ *buffer++ = 'm'; *buffer++ = 'o'; *buffer++ = 'd'; break;
+ *buffer++ = 'a'; *buffer++ = 'n'; *buffer++ = 'd'; break;
+ *buffer++ = 'o'; *buffer++ = 'r'; break;
+ *buffer++ = 'x'; *buffer++ = 'o'; *buffer++ = 'r'; break;
+ strcpy ((char *) buffer, "and then");
+ buffer += 8;
+ break;
+ strcpy ((char *) buffer, "or else");
+ buffer += 7;
+ break;
Do you need to insert spaces before and after these strings? The C tokens
could be adjacent to identifiers with no spaces in between, but you don't
want "a&b" to become "aandb".
+ if (!macro->fun_like)
+ supported = 0;
+ else
+ buffer = cpp_spell_token (parse_in, token, buffer, false);
+ break;
+
+ is_string = 1;
+ buffer = cpp_spell_token (parse_in, token, buffer, false);
+ break;
+
+ is_char = 1;
+ buffer = cpp_spell_token (parse_in, token, buffer, false);
+ break;
+
Although better than the previous code, you'd still get correct results
for more cases if you interpret the value of the token rather than the
spelling - for example, that way you could handle escape sequences
appearing in strings and convert them appropriately for Ada without any
need to know about the exact semantics of C escape sequences.
And 'u', similarly.
And 'B'.
+ if (tmp[1] == '\0' || tmp[1] == 'l'
+ || tmp[1] == 'L' || tmp [1] == 'U')
And 'u'.
+ if (!macro->fun_like)
+ supported = 0;
I'm not clear on the logic of saying these things are supported for
function-like macros but not otherwise. Are there actually cases where
respelling these C tokens will produce valid Ada, or should these tokens
just make the whole macro expansion unsupported for Ada, or are you
hoping a partial conversion, though not valid Ada, will still be useful
to the users of this feature?
+ macros = XALLOCAVEC (cpp_hashnode *, sizeof (cpp_hashnode *)*max_ada_macros);
XALLOCAVEC already does the multiplication by the size of the type passed,
so it seems suspicious for the second argument to this call to be doing
the multiplication itself.

I note that there are no testcases included with this patch - it would
seem desirable to have testsuite coverage for this feature. Practically
this may mean having input files that provide reasonable coverage of the
various cases covered in the code (for the whole feature, not just macros)
and associated expected output files for comparison, though to avoid
having to update lots of output files (or lots of cases in one big output
file) when making minor changes to the output it would be better in
principle to test not the output text but that it is correct Ada that
passes through the Ada compiler and declares things in the expected way
for Ada code using the generated declarations.
--
Joseph S. Myers
***@codesourcery.com
Arnaud Charlet
2010-05-06 13:00:49 UTC
Permalink
Fixed.
Post by Joseph S. Myers
+Ada spec (via the -fdump-ada-spec switch).
@option{-fdump-ada-spec}.
Fixed.
Post by Joseph S. Myers
Same point as above on [].
Fixed.
Post by Joseph S. Myers
You may as well handle 'u' as well here, even if you don't want to handle
hex floats and "float" literals ending 'f' or 'F'.
Done.
Post by Joseph S. Myers
+ if (token->flags & STRINGIFY_ARG || token->flags & PASTE_LEFT)
+ {
+ supported = 0;
Weren't those cases detected earlier as unsupported when you calculated
the length?
Yes, but I decided to make the two algortihms usable independently, so that
if one part is modified, it might not invalidate the other.

If you think it would be better to remove the extra test, that can be done
easily though.
Post by Joseph S. Myers
Do you need to insert spaces before and after these strings? The C tokens
could be adjacent to identifiers with no spaces in between, but you don't
want "a&b" to become "aandb".
Indeed, fixed.
Post by Joseph S. Myers
Although better than the previous code, you'd still get correct results
for more cases if you interpret the value of the token rather than the
spelling - for example, that way you could handle escape sequences
appearing in strings and convert them appropriately for Ada without any
need to know about the exact semantics of C escape sequences.
OK, See updated patch below. If this is not what you had in mind, could
you clarify what you mean by 'interpret the value of the token'?
Post by Joseph S. Myers
And 'u', similarly.
Done.
Post by Joseph S. Myers
And 'B'.
Done.
Post by Joseph S. Myers
And 'u'.
Done.
Post by Joseph S. Myers
+ if (!macro->fun_like)
+ supported = 0;
I'm not clear on the logic of saying these things are supported for
function-like macros but not otherwise. Are there actually cases where
respelling these C tokens will produce valid Ada, or should these tokens
just make the whole macro expansion unsupported for Ada, or are you
hoping a partial conversion, though not valid Ada, will still be useful
to the users of this feature?
For function-like macros, we do not attempt to convert them to valid Ada, but
instead we generate a possible conversion in the form of comments that can
help the user to manually edit the file and create real inline functions.

In other words, for fun_like, we only generate comments, so it does not
really matter if the syntax used is C syntax, or a mix of Ada and C.

For example, the following fun-like macro:

#define FOO(A,B) call (a, b)

will generate the following Ada comments:

-- arg-macro: procedure FOO (A, B)
-- call (a, b)
Post by Joseph S. Myers
XALLOCAVEC already does the multiplication by the size of the type passed,
so it seems suspicious for the second argument to this call to be doing
the multiplication itself.
Indeed, fixed.
Post by Joseph S. Myers
I note that there are no testcases included with this patch - it would
seem desirable to have testsuite coverage for this feature.
Agreed. We do have several tests, but not being familiar with dejagnu, that's
why I didn't include them in my initial patch.

Now that we're apparently converging and are getting close to inclusion (at
least I hope so!), I've investigated how to add such tests in the current
framework and came up with the attached patch which adds 18 tests with most
difficult/tricky issues that we've had to deal with as well as some simple
tests. The tests behave as you suggested indeed: first we generate .ads files
using -fdump-ada-spec, and then we compile the .ads files using the Ada
compiler and make sure that no errors are generated, which works nicely
in our experience (that's how we've been testing this capability at AdaCore
for the past 2 years).

Updated ChangeLog and patch attached, thanks in advance!

gcc/

2010-05-06 Arnaud Charlet <***@adacore.com>
Matthew Gingell <***@adacore.com>

* doc/invoke.texi: Mention -fdump-ada-spec.
* tree-dump.c (dump_files): Add ada-spec.
(FIRST_AUTO_NUMBERED_DUMP): Bump to 8.
* tree-pass.h (tree_dump_index): Add TDI_ada.
* gcc.c: Add support for -C without -E and for -fdump-ada-spec.
(cpp_unique_options): Do not reject -C or -CC when -E isn't present.
(default_compilers) <@c-header>: Allow -fdump-ada-spec on header files.
* c-decl.c: Include c-ada-spec.h.
(collect_source_ref_cb, collect_all_refs, for_each_global_decl): New
functions.
(c_write_global_declarations): Add handling of -fdump-ada-spec.
* c-lex.c (c_lex_with_flags): Add handling of CPP_COMMENT.
* Makefile.in (C_AND_OBJC_OBJS): Add c-ada-spec.o.
* c-ada-spec.h, c-ada-spec.c: New files.

gcc/ada

* gnat_ugn.texi: Improve doc on -fdump-ada-spec, mention limitations.

gcc/cp/

* Make-lang.in (CXX_C_OBJS): Add c-ada-spec.o.
* decl2.c: Include langhooks.h and c-ada-spec.h.
(cpp_check, collect_source_refs, collect_ada_namespace,
collect_all_refs): New functions.
(cp_write_global_declarations): Add handling of -fdump-ada-spec.
* lang-specs.h: Ditto.

gcc/testsuite

* gnat.dg/c-specs: New directory with initial set of -fdump-ada-spec
tests.
* lib/gcc-dg.exp (gcc-dg-test-1): Add support for -fdump-ada-spec tests.
* lib/gnat-dg.exp (gnat-dg-test): Ditto.
Joseph S. Myers
2010-05-06 14:35:11 UTC
Permalink
Post by Arnaud Charlet
Post by Joseph S. Myers
Although better than the previous code, you'd still get correct results
for more cases if you interpret the value of the token rather than the
spelling - for example, that way you could handle escape sequences
appearing in strings and convert them appropriately for Ada without any
need to know about the exact semantics of C escape sequences.
OK, See updated patch below. If this is not what you had in mind, could
you clarify what you mean by 'interpret the value of the token'?
What I mean is convert the preprocessing token to a tree value the same
way the C front end does, then convert the tree to an appropriate Ada
representation. This should work for the tokens giving INTEGER_CSTs,
REAL_CSTs, STRING_CSTs (ordinary or UTF-8 ones at least; I don't know
about Ada equivalents to wide, UTF-16 or UTF-32 strings); if you get
something else such as a FIXED_CST or COMPLEX_CST you could mark it
unsupported and fall back to the original C source representation of the
token.

That way you don't need to duplicate any logic for interpreting escape
sequences in strings, for example, since you let the existing code produce
the sequence of bytes that would represent the string on the target. Nor
do you need to know about the various different forms of integer and real
constants unless you feel it's important for the Ada code to use the same
base for them as the C code does.
--
Joseph S. Myers
***@codesourcery.com
Arnaud Charlet
2010-05-06 15:09:39 UTC
Permalink
Post by Joseph S. Myers
What I mean is convert the preprocessing token to a tree value the same
way the C front end does, then convert the tree to an appropriate Ada
representation.
Could you give some pointers to do that? Initially I said this sounded
like lots of extra work, but if it's simple to do, why not. I'm in particular
concerned that I'll need to add an expression converter (from C expressions
to Ada expression), which is currently intentionally avoided, we only
convert tree declarations, and no expressions.

If it's complex, I'd like to defer it to a second stage, since this is
more in the 'nice to have category'.
Post by Joseph S. Myers
That way you don't need to duplicate any logic for interpreting escape
sequences in strings, for example, since you let the existing code produce
the sequence of bytes that would represent the string on the target. Nor
do you need to know about the various different forms of integer and real
constants unless you feel it's important for the Ada code to use the same
base for them as the C code does.
Actually keeping the information on the 'base' is certainly important and
useful, so that's one area where we do want to keep the original info/token.

As for strings, we will need to perform so extra processing no matter what
representation is used (e.g. all non printable chars will need to be
either declared unsupported or handled as done in my patch for \n, \t, \r),
so I'm not convinced of the added value.

Again, the idea as far as preprocessor handling is to convert the most common
cases which are also the simple cases: simple ASCII strings and simple
integer literals. I am not convinced that converting to a tree value would
add much value for these cases, even though in the medium term, doing such
convertion may prove useful for more complex cases.

Arno
Joseph S. Myers
2010-05-06 15:59:55 UTC
Permalink
Post by Arnaud Charlet
Post by Joseph S. Myers
What I mean is convert the preprocessing token to a tree value the same
way the C front end does, then convert the tree to an appropriate Ada
representation.
Could you give some pointers to do that? Initially I said this sounded
like lots of extra work, but if it's simple to do, why not. I'm in particular
concerned that I'll need to add an expression converter (from C expressions
to Ada expression), which is currently intentionally avoided, we only
convert tree declarations, and no expressions.
See what c_lex_with_flags does.

I'm concerned about having multiple separate implementations of processing
C tokens that make it unclear what should be updated when a new type of
constant or string is added, and are liable to get out of sync.
--
Joseph S. Myers
***@codesourcery.com
Arnaud Charlet
2010-05-07 08:10:27 UTC
Permalink
Post by Joseph S. Myers
See what c_lex_with_flags does.
I see. I've looked at c_lex_with_flags, and this subprogram uses the
following routines mainly:

for characters: lex_charconst, which uses cpp_interpret_charconst,
this one can be reused
for strings: lex_string
for numbers: interpret_integer and interpret_float

lex_string, interpret_integer and interpret_float are functions local
to c-lex.c, so cannot be reused outside this file.

Duplicating their work would defeat your suggestion, so are you
suggesting that we make interpret_integer/interpret_float/lex_string
global functions instead?

That would actually not work, since e.g. lex_string uses cpp_get_token(),
in other words, will get tokens from the parsers, while in c-ada-spec.c,
we only have access to the MACRO definitions, in other words, we're
working before the preprocessor expansion, not after.
Post by Joseph S. Myers
I'm concerned about having multiple separate implementations of
processing C tokens that make it unclear what should be updated when a new
type of constant or string is added, and are liable to get out of sync.
Given that we only process a subset of CPP tokens in c-ada-spec.c, I
doubt this will be an issue, and am of course willing to maintain this part
when such update is needed. In most cases, it will be a no-op (part of
a 'default:' in the switch) anyway.

Here is the updated patch which uses cpp_interpret_charconst for characters
which is indeed an improvement (same ChangeLog).

Tested on i686-pc-linux-gnu, OK for mainline?

Arno
Joseph S. Myers
2010-05-07 15:18:21 UTC
Permalink
Post by Arnaud Charlet
Post by Joseph S. Myers
See what c_lex_with_flags does.
I see. I've looked at c_lex_with_flags, and this subprogram uses the
for characters: lex_charconst, which uses cpp_interpret_charconst,
this one can be reused
for strings: lex_string
for numbers: interpret_integer and interpret_float
lex_string, interpret_integer and interpret_float are functions local
to c-lex.c, so cannot be reused outside this file.
Duplicating their work would defeat your suggestion, so are you
suggesting that we make interpret_integer/interpret_float/lex_string
global functions instead?
I'm not making a suggestion at the level of making particular functions
global; rather, that there should be some function used in both c-lex.c
and for this c-ada-spec.c code. It's possible it might correspond to the
entire CPP_NUMBER case, if that's more useful than exporting the
individual interpret_integer and interpret_float functions used as part of
that case, for example.
--
Joseph S. Myers
***@codesourcery.com
Arnaud Charlet
2010-05-07 15:47:43 UTC
Permalink
Post by Joseph S. Myers
I'm not making a suggestion at the level of making particular functions
global; rather, that there should be some function used in both c-lex.c
and for this c-ada-spec.c code. It's possible it might correspond to the
entire CPP_NUMBER case, if that's more useful than exporting the
individual interpret_integer and interpret_float functions used as part of
that case, for example.
OK, well sounds like we're talking non trivial work here. Do you
consider this blocking for initial inclusion?

If yes, can you give more precise ideas about what parts should be
made visible to c-ada-spec.c? Are we talking about handling of numbers?
Handling of strings? Both? More?

If not, I'd still be interested in going this route in the medium term
(so answers to the above questions would be helpful), but at least having
the current version integrated would really make things simpler to manage
incrementally.

Arno
Joseph S. Myers
2010-05-07 16:21:14 UTC
Permalink
Post by Arnaud Charlet
Post by Joseph S. Myers
I'm not making a suggestion at the level of making particular functions
global; rather, that there should be some function used in both c-lex.c
and for this c-ada-spec.c code. It's possible it might correspond to the
entire CPP_NUMBER case, if that's more useful than exporting the
individual interpret_integer and interpret_float functions used as part of
that case, for example.
OK, well sounds like we're talking non trivial work here. Do you
consider this blocking for initial inclusion?
Not necessarily.
Post by Arnaud Charlet
If yes, can you give more precise ideas about what parts should be
made visible to c-ada-spec.c? Are we talking about handling of numbers?
Handling of strings? Both? More?
Both, with whatever changes are needed to be able to take a sequence of
strings from either the source file (normal lexing) or a macro expansion
(c-ada-spec), do the concatenation and interpretation of escape sequences
and return the final tree.
--
Joseph S. Myers
***@codesourcery.com
Arnaud Charlet
2010-05-07 19:11:32 UTC
Permalink
Post by Joseph S. Myers
Post by Arnaud Charlet
OK, well sounds like we're talking non trivial work here. Do you
consider this blocking for initial inclusion?
Not necessarily.
Great.
Post by Joseph S. Myers
Post by Arnaud Charlet
If yes, can you give more precise ideas about what parts should be
made visible to c-ada-spec.c? Are we talking about handling of numbers?
Handling of strings? Both? More?
Both, with whatever changes are needed to be able to take a sequence of
strings from either the source file (normal lexing) or a macro expansion
(c-ada-spec), do the concatenation and interpretation of escape
sequences and return the final tree.
Note that it's not a macro expansion, but a macro definition (in case it
makes a difference).

As I said, this looks like a useful medium term plan, since this will
require getting familiar with c-lex.c and carefully change this common
code, I'd prefer to do it in a second step if you don't mind.

So do you see other blocking points in the patch I submitted?

Arno
Arnaud Charlet
2010-05-10 09:09:54 UTC
Permalink
Here is a new version of the patch which removes all the problematic
issues you've raised by, for now, not handling strings or numbers in
macros. I've updated the doc accordingly.

Tested on i686-pc-linux-gnu, OK for mainline?

gcc/

2010-05-10 Arnaud Charlet <***@adacore.com>
Matthew Gingell <***@adacore.com>

* doc/invoke.texi: Mention -fdump-ada-spec.
* tree-dump.c (dump_files): Add ada-spec.
(FIRST_AUTO_NUMBERED_DUMP): Bump to 8.
* tree-pass.h (tree_dump_index): Add TDI_ada.
* gcc.c: Add support for -C without -E and for -fdump-ada-spec.
(cpp_unique_options): Do not reject -C or -CC when -E isn't present.
(default_compilers) <@c-header>: Allow -fdump-ada-spec on header files.
* c-decl.c: Include c-ada-spec.h.
(collect_source_ref_cb, collect_all_refs, for_each_global_decl): New
functions.
(c_write_global_declarations): Add handling of -fdump-ada-spec.
* c-lex.c (c_lex_with_flags): Add handling of CPP_COMMENT.
* Makefile.in (C_AND_OBJC_OBJS): Add c-ada-spec.o.
* c-ada-spec.h, c-ada-spec.c: New files.

gcc/ada

* gnat_ugn.texi: Improve doc on -fdump-ada-spec, mention limitations.

gcc/cp/

* Make-lang.in (CXX_C_OBJS): Add c-ada-spec.o.
* decl2.c: Include langhooks.h and c-ada-spec.h.
(cpp_check, collect_source_refs, collect_ada_namespace,
collect_all_refs): New functions.
(cp_write_global_declarations): Add handling of -fdump-ada-spec.
* lang-specs.h: Ditto.

gcc/testsuite

* gnat.dg/c-specs: New directory with initial set of -fdump-ada-spec
tests.
* lib/gcc-dg.exp (gcc-dg-test-1): Add support for -fdump-ada-spec tests.
* lib/gnat-dg.exp (gnat-dg-test): Ditto.
Joseph S. Myers
2010-05-11 00:04:37 UTC
Permalink
Post by Arnaud Charlet
Here is a new version of the patch which removes all the problematic
issues you've raised by, for now, not handling strings or numbers in
macros. I've updated the doc accordingly.
Tested on i686-pc-linux-gnu, OK for mainline?
+ /* Include enough extra space to handle e.g. special characters as
+ handled by handle_escape_character below. */
+ *buffer_len += (cpp_token_len (token) + 1) * 8;
There's no handle_escape_character function in this patch version so the
comment should not reference it.
Post by Arnaud Charlet
+ if (withs == NULL)
+ withs = (struct with *) xmalloc (withs_max * sizeof (struct with));
Should use XNEWVEC. Likewise all other casted calls to xmalloc / xrealloc
should be using XNEWVEC / XRESIZEVEC.
Post by Arnaud Charlet
+/* ??? would be nice to specify this list via a config file, so that users
+ can create their own dictionnary of conflicts. */
"dictionary".

OK with those changes.
--
Joseph S. Myers
***@codesourcery.com
Joseph S. Myers
2010-05-11 00:06:21 UTC
Permalink
Post by Joseph S. Myers
OK with those changes.
To be clear, that OK is for the C front-end changes. I have previously
approved the driver changes and Mark approved the C++ changes; I can't
speak for any other parts of the patch that may require approval.
--
Joseph S. Myers
***@codesourcery.com
Arnaud Charlet
2010-06-01 13:09:22 UTC
Permalink
Post by Joseph S. Myers
To be clear, that OK is for the C front-end changes. I have previously
approved the driver changes and Mark approved the C++ changes; I can't
speak for any other parts of the patch that may require approval.
Thanks. That covers everything actually, except for the small test
infrastructure changes in gcc/testsuite/lib:

gcc/testsuite

* lib/gcc-dg.exp (gcc-dg-test-1): Add support for -fdump-ada-spec tests.
* lib/gnat-dg.exp (gnat-dg-test): Ditto.

Who can review/approve these bits which were modelled after the -frepo*
testing approach?

The idea as discussed under this thread is, for testing -fdump-ada-spec,
to first run gcc -c -fdump-ada-spec on .h files, and then run gcc
(gnat) on the resulting Ada file to verify that the generated file
is proper Ada (gcc -c xxx_h.ads).

I don't see any 'testsuite' maintainer listed under MAINTAINERS, although
I've seen Uros approve this kind of patch recently, so tentavely cc:ing
him, thanks in advance!
Steven Bosscher
2010-06-01 22:45:35 UTC
Permalink
Post by Arnaud Charlet
* c-ada-spec.h, c-ada-spec.c: New files.
New files, clearly blindly copied from some other file without
cleaning up what you do not need. You are importing almost the entire
GIMPLE part of the middle end into the C front end.

Could reviewers please also watch out that this kind of thing doesn't
happen again?

Will commit after bootstrap&testing.

Ciao!
Steven


* c-ada-spec.c: Clean up redundant includes.

Index: c-ada-spec.c
===================================================================
--- c-ada-spec.c (revision 160125)
+++ c-ada-spec.c (working copy)
@@ -26,16 +26,6 @@ along with GCC; see the file COPYING3.
#include "tree.h"
#include "output.h"
#include "c-ada-spec.h"
-#include "real.h"
-#include "hashtab.h"
-#include "tree-flow.h"
-#include "langhooks.h"
-#include "tree-iterator.h"
-#include "tree-chrec.h"
-#include "tree-pass.h"
-#include "fixed-value.h"
-#include "value-prof.h"
-#include "predict.h"
#include "cpplib.h"
#include "c-pragma.h"
#include "cpp-id-data.h"
Arnaud Charlet
2010-06-02 06:44:08 UTC
Permalink
Post by Steven Bosscher
New files, clearly blindly copied from some other file without
cleaning up what you do not need. You are importing almost the entire
GIMPLE part of the middle end into the C front end.
Could reviewers please also watch out that this kind of thing doesn't
happen again?
Will commit after bootstrap&testing.
Thanks for the clean up, and sorry about that!

Arno

Continue reading on narkive:
Loading...