| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Here we document details of how the preprocessor's implementation affects its user-visible behavior. You should try to avoid undue reliance on behavior described here, as it is possible that it will change subtly in future implementations.
Also documented here are obsolete features and changes from previous versions of CPP.
11.1 Implementation-defined behavior
11.2 Implementation limits
11.3 Obsolete Features
11.4 Differences from previous versions
11.1 Implementation-defined behavior
This is how CPP behaves in all the cases which the C standard describes as implementation-defined. This term means that the implementation is free to do what it likes, but must document its choice and stick to it.
The input character set can be specified using the `-finput-charset' option, while the execution character set may be controlled using the `-fexec-charset' and `-fwide-exec-charset' options.
The C and C++ standards allow identifiers to be composed of `_' and the alphanumeric characters. C++ and C99 also allow universal character names, and C99 further permits implementation-defined characters. GCC currently only permits universal character names if `-fextended-identifiers' is used, because the implementation of universal character names in identifiers is experimental.
GCC allows the `$' character in identifiers as an extension for most targets. This is true regardless of the `-std=' switch, since this extension cannot conflict with standards-conforming programs. When preprocessing assembler, however, dollars are not identifier characters by default.
Currently the targets that by default do not permit `$' are AVR, IP2K, MMIX, MIPS Irix 3, ARM aout, and PowerPC targets for the AIX operating system.
You can override the default with `-fdollars-in-identifiers' or `-fno-dollars-in-identifiers'. See fdollars-in-identifiers.
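As a small illustration of this extension, the sketch below uses `$' in a variable name and a function name. It assumes a target on which dollars are permitted by default (or compilation with `-fdollars-in-identifiers'); the names total$count and add$one are made up for the example.

```c
/* Hypothetical names; assumes `$' is accepted in identifiers on this
   target, which is the default for most but not all targets. */
static int total$count = 0;

/* Returns value + 1 and records that one more call was made. */
static int add$one(int value)
{
    total$count += 1;
    return value + 1;
}
```

Code like this is not portable to compilers or targets without the extension, so it is best kept out of new programs.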
In textual output, each whitespace sequence is collapsed to a single space. For aesthetic reasons, the first token on each non-directive line of output is preceded with sufficient spaces that it appears in the same column as it did in the original source file.
The preprocessor and compiler interpret character constants in the same way; i.e. escape sequences such as `\a' are given the values they would have on the target machine.
The compiler evaluates a multi-character character constant a character
at a time, shifting the previous value left by the number of bits per
target character, and then or-ing in the bit-pattern of the new
character truncated to the width of a target character. The final
bit-pattern is given type int, and is therefore signed,
regardless of whether single characters are signed or not (a slight
change from versions 3.1 and earlier of GCC). If there are more
characters in the constant than would fit in the target int the
compiler issues a warning, and the excess leading characters are
ignored.
For example, 'ab' for a target with an 8-bit char would be
interpreted as `(int) ((unsigned char) 'a' * 256 + (unsigned char)
'b')', and '\234a' as `(int) ((unsigned char) '\234' *
256 + (unsigned char) 'a')'.
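The rule above can be sketched as a small function. This is an illustration of the description, not GCC's own code: multichar_value is a hypothetical helper, and it assumes that int and unsigned int have the same width and that converting an out-of-range unsigned value to int keeps the bit pattern (as GCC does).

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper that mimics the documented evaluation of a
   multi-character constant whose characters are chars[0..count-1]. */
static int multichar_value(const char *chars, size_t count)
{
    unsigned int bits = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        /* Shift the previous value left by one target character, then
           OR in the new character truncated to the character width.
           Bits shifted past the top of the word are lost, so excess
           leading characters are ignored. */
        bits = (bits << CHAR_BIT) | (unsigned char) chars[i];
    }

    /* The final bit pattern is given type int. */
    return (int) bits;
}
```

With an 8-bit char, multichar_value("ab", 2) yields the same value as the expression given for 'ab' above.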
For a discussion on how the preprocessor locates header files, see Include Operation.
For the interpretation of the filename resulting from a macro-expanded `#include' directive, see section Computed Includes.
No macro expansion occurs on any `#pragma' directive line, so the question of how to treat a `#pragma' directive that becomes a standard pragma after macro expansion does not arise.
Note that GCC does not yet implement any of the standard pragmas.
11.2 Implementation limits
CPP has a small number of internal limits. This section lists the limits which the C standard requires to be no lower than some minimum, and all the others known. It is intended that there should be as few limits as possible. If you encounter an undocumented or inconvenient limit, please report that as a bug. See section `Reporting Bugs' in Using the GNU Compiler Collection (GCC).
Where we say something is limited only by available memory, that
means that internal data structures impose no intrinsic limit, and space
is allocated with malloc or equivalent. The actual limit will
therefore depend on many things, such as the size of other things
allocated by the compiler at the same time, the amount of memory
consumed by other processes on the same computer, etc.
Nesting levels of `#include' files: we impose an arbitrary limit of 200 levels, to avoid runaway recursion. The standard requires at least 15 levels.
Nesting levels of conditional inclusion: the C standard mandates this be at least 63. CPP is limited only by available memory.
Levels of parenthesized expressions within a full expression: the C standard requires this to be at least 63. In preprocessor conditional expressions, it is limited only by available memory.
Significant initial characters in an identifier or macro name: the preprocessor treats all characters as significant. The C standard requires only that the first 63 be significant.
Number of macros simultaneously defined in a single translation unit: the standard requires at least 4095 be possible. CPP is limited only by available memory.
Number of parameters in a macro definition and arguments in a macro call: we allow USHRT_MAX, which is no smaller than 65,535. The minimum required by the standard is 127.
Number of characters on a logical source line: the C standard requires a minimum of 4096 be permitted. CPP places no limits on this, but you may get incorrect column numbers reported in diagnostics for lines longer than 65,535 characters.
Maximum size of a source file: the standard does not specify any lower limit on the maximum size of a source file. GNU cpp maps files into memory, so it is limited by the available address space. This is generally at least two gigabytes. Depending on the operating system, the size of physical memory may or may not be a limitation.
11.3 Obsolete Features
CPP has some features which are present mainly for compatibility with older programs. We discourage their use in new code. In some cases, we plan to remove the feature in a future version of GCC.
Assertions are a deprecated alternative to macros in writing conditionals to test what sort of computer or system the compiled program will run on. Assertions are usually predefined, but you can define them with preprocessing directives or command-line options.
Assertions were intended to provide a more systematic way to describe the compiler's target system and we added them for compatibility with existing compilers. In practice they are just as unpredictable as the system-specific predefined macros. In addition, they are not part of any standard, and only a few compilers support them. Therefore, the use of assertions is less portable than the use of system-specific predefined macros. We recommend you do not use them at all.
An assertion looks like this:
#predicate (answer)
predicate must be a single identifier. answer can be any
sequence of tokens; all characters are significant except for leading
and trailing whitespace, and differences in internal whitespace
sequences are ignored. (This is similar to the rules governing macro
redefinition.) Thus, (x + y) is different from (x+y) but
equivalent to ( x + y ). Parentheses do not nest inside an
answer.
To test an assertion, you write it in an `#if'. For example, this
conditional succeeds if either vax or ns16000 has been
asserted as an answer for machine.
#if #machine (vax) || #machine (ns16000)
You can test whether any answer is asserted for a predicate by omitting the answer in the conditional:
#if #machine
Assertions are made with the `#assert' directive. Its sole argument is the assertion to make, without the leading `#' that identifies assertions in conditionals.
#assert predicate (answer)
You may make several assertions with the same predicate and different answers. Subsequent assertions do not override previous ones for the same predicate. All the answers for any given predicate are simultaneously true.
Assertions can be canceled with the `#unassert' directive. It has the same syntax as `#assert'. In that form it cancels only the answer which was specified on the `#unassert' line; other answers for that predicate remain true. You can cancel an entire predicate by leaving out the answer:
#unassert predicate
In either form, if no such assertion has been made, `#unassert' has no effect.
You can also make or cancel assertions using command line options. See section Invocation.
11.4 Differences from previous versions
This section details behavior which has changed from previous versions of CPP. We do not plan to change it again in the near future, but we do not promise not to, either.
The "previous versions" discussed here are 2.95 and before. The behavior of GCC 3.0 is mostly the same as the behavior of the widely used 2.96 and 2.97 development snapshots. Where there are differences, they generally represent bugs in the snapshots.
The `-I-' option has been deprecated since GCC 4.0. `-iquote' is meant to replace the need for this option.
The standard does not specify the order of evaluation of a chain of `##' operators, nor whether `#' is evaluated before, after, or at the same time as `##'. You should therefore not write any code which depends on any specific ordering. It is possible to guarantee an ordering, if you need one, by suitable use of nested macros.
An example of where this might matter is pasting the arguments `1', `e' and `-2'. This would be fine for left-to-right pasting, but right-to-left pasting would produce an invalid token `e-2'.
GCC 3.0 evaluates `#' and `##' at the same time and strictly left to right. Older versions evaluated all `#' operators first, then all `##' operators, in an unreliable order.
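One way to guarantee an ordering with nested macros is sketched below. The macro names are hypothetical. PASTE adds a level of indirection so that its arguments are fully macro-expanded before PASTE_ joins them; the inner call is therefore always pasted first, whatever order the preprocessor would choose for a plain chain of `##' operators.

```c
/* Hypothetical helper macros, not part of CPP. */
#define PASTE_(a, b) a ## b
#define PASTE(a, b) PASTE_(a, b)

/* The nesting decides which paste happens first. */
#define PASTE3_LEFT_FIRST(a, b, c)  PASTE(PASTE(a, b), c)
#define PASTE3_RIGHT_FIRST(a, b, c) PASTE(a, PASTE(b, c))

/* Both orders build the identifier foobarbaz in this simple case. */
static int left_first_demo(void)
{
    int foobarbaz = 7;
    return PASTE3_LEFT_FIRST(foo, bar, baz);
}

static int right_first_demo(void)
{
    int foobarbaz = 9;
    return PASTE3_RIGHT_FIRST(foo, bar, baz);
}
```

For plain identifiers the two orders give the same result; the choice only matters when one of the intermediate pastes would not form a valid token.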
For the current textual format of whitespace between tokens in preprocessor output, see section Preprocessor Output. This is also the format used by stringification. Normally, the preprocessor communicates tokens directly to the compiler's parser, and whitespace does not come up at all.
Older versions of GCC preserved all whitespace provided by the user and inserted lots more whitespace of their own, because they could not accurately predict when extra spaces were needed to prevent accidental token pasting.
As an extension, GCC permits you to omit the variable arguments entirely when you use a variable argument macro. This is forbidden by the 1999 C standard, and will provoke a pedantic warning with GCC 3.0. Previous versions accepted it silently.
Formerly, in a macro expansion, if `##' appeared before a variable arguments parameter, and the set of tokens specified for that argument in the macro invocation was empty, CPP would back up and remove the preceding sequence of non-whitespace characters (not the preceding token). This extension is in direct conflict with the 1999 C standard and has been drastically pared back.
In the current version of the preprocessor, if `##' appears between a comma and a variable arguments parameter, and the variable argument is omitted entirely, the comma will be removed from the expansion. If the variable argument is empty, or the token before `##' is not a comma, then `##' behaves as a normal token paste.
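The comma-removal rule can be seen in a small wrapper around snprintf. FORMAT_INTO and the two functions are hypothetical names chosen for this sketch, and the macro relies on the GNU extension, so a pedantic compilation may warn about it.

```c
#include <stdio.h>

/* Hypothetical macro.  `##' sits between a comma and the variable
   arguments, so the comma is dropped when they are omitted entirely. */
#define FORMAT_INTO(buf, size, fmt, ...) \
    snprintf((buf), (size), fmt, ## __VA_ARGS__)

/* Variable arguments omitted: expands to snprintf((out), (size), "plain"). */
static int format_without_args(char *out, size_t size)
{
    return FORMAT_INTO(out, size, "plain");
}

/* Variable arguments present: the comma is kept and `value' follows it. */
static int format_with_args(char *out, size_t size, int value)
{
    return FORMAT_INTO(out, size, "value=%d", value);
}
```

Without the `##', the first call would expand to a call with a trailing comma and fail to compile.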
The `#line' directive used to change GCC's notion of the "directory containing the current file", used by `#include' with a double-quoted header file name. In 3.0 and later, it does not. See section Line Control, for further explanation.
In GCC 2.95 and previous, the string constant argument to `#line' was treated the same way as the argument to `#include': backslash escapes were not honored, and the string ended at the second `"'. This is not compliant with the C standard. In GCC 3.0, an attempt was made to correct the behavior, so that the string was treated as a real string constant, but it turned out to be buggy. In 3.1, the bugs have been fixed. (We are not fixing the bugs in 3.0 because they affect relatively few people and the fix is quite invasive.)
This document was generated on March, 29 2011 using texi2html 1.76.