The C pre-processor

The "C pre-processor" is the '#' stuff in C.

It's a separate language, superimposed on top of your C source code files -- a transformation on the document before it gets presented to the rest of the C compiler.

Pre-processor commands begin with '#'. They must be on their own lines, and the '#' must be the first non-whitespace character on the line.

You've seen "#include". This simply copies in the specified file at this point. By convention, files which contain type declarations and other similar stuff which is meant to be #included at the top of your .c file are named (modulename).h, but the C compiler doesn't enforce this.

The file name following "#include" can be enclosed either in double quotes or in angle brackets (< >) (a very weird choice of syntax, really). If, as usual, it is a relative path name (i.e. it doesn't begin with a slash), then if in double quotes it is interpreted relative to the directory the file doing the #include is in (and if it isn't found there, the search falls back to the angle-bracket rule); if in angle brackets it is interpreted relative to a specified system directory, which in unix is typically /usr/include.

The original C programming language had no "const". The "const" keyword was introduced by the 1989 ANSI C standard, but it's not as flexible as in C++ (or as 'final' in Java). A "const" in C is really much more like a variable which you just can't modify than a symbolic constant like in C++. So although these days we usually prefer to use const for computation-oriented constants, we can't use it for things such as the size of an array. And because of history, we often use the pre-processor for computational constants too (value of pi, that sort of thing).

For symbolic constants in C, we have two choices: "enum" (part of the C language, although it wasn't there in the very beginning), and #define (part of the C pre-processor).

We write a line of the form

#define identifier replacement

Example:

#define FOOSIZE 10

It has its tricky bits. Consider the following:

#define i 3;
j = i - 2;
The semicolon at the end of the #define line is a mistake; C pre-processor commands are not terminated with semicolons, but with end-of-line. But it's an easy mistake to make.

But when substituted, the assignment line becomes "j = 3; - 2;", which does not have the desired effect. Actually, "- 2;" is a valid (if useless) expression statement, so this compiles without error, and j is assigned to be 3!

This shows why real consts, like in C++, are a vast improvement over #define. But of course the main reason I mention it is because this is a lurking danger which you should understand so as to be able to avoid, and to be able to diagnose if you do introduce a bug in your program in this sort of way. Remember that the preprocessor commands are a separate language. They don't do C-ish things like end with semi-colons.

More #define warts:
Consider this:

#define x 5
#define y x+2
Now suppose we use the expression "y*3".
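The result of the substitution is "x+2*3", and then "5+2*3"! This does not evaluate as intended: the multiplication binds more tightly than the addition, so we get 11 rather than 21.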
The solution is that the #define for y actually has to be
#define y (x+2)
This way, no matter what context it's substituted into, the x+2 remains a syntactic unit.

We also have "function-like macros", which look like this:

#define twice(i) ((i) * 2)
You could also call this a "parameterized #define".
We can use it by writing something like "twice(x+3)", which would expand to "((x+3) * 2)".
There are a lot of parentheses in this #define. But they're all necessary. The need for the inner parentheses is actually demonstrated by this example substitution. More generally, you need to parenthesize any substituted parameter in case it contains lower-precedence operators than the context you're substituting it in; and you need to parenthesize the overall string in case the context it is being substituted into contains higher-precedence operators than your string.

We can continue to get weirder here. Some people write things like

#define printstring(s) printf("%s\n", (s))
In addition to requiring the extra parentheses, note the absence of a terminating semi-colon here! The idea is that you will write something like
printstring("foo");
so the caller supplies the semi-colon.
Also consider its use in a context such as this:
if (x == 3)
    printstring("foo");
else
    y = 12;
While an extra semi-colon often wouldn't hurt (just forming a spurious null statement), in this last case it would introduce a syntax error because the 'else' would no longer be immediately subsequent to the 'if' as required.

So, you might ask, why would someone do something like the above?

I would like to suggest that you do not do something like the above!

Once upon a time, people fretted about the "function call overhead" -- when you do a function call, several machine instructions are executed to set up the arguments, do the transfer of control, and do the transfer of control back at the end of the function.

But these days, the "function call overhead" is not something we worry about very much. It was always overrated. Don't overuse these sorts of "macros". #define is best just for "constants". C++ has a facility for stating constants in the language, but in C you need to use #define for at least some kinds of constants. You can restrict your use of #define to that.

More notes about parameterized #defines:

#define twice(i) ((i) * 2)
There MUST NOT be a space between the 'twice' and the '(i)', because this is how the pre-processor tells that "(i)" is a parameter list, rather than part of the substitution. Similarly, in "#define y (x+2)" there MUST be a space between the y and the left parenthesis! It's a very low-level situation, reminiscent more of things like shell scripts than a high-level language, and it's worth steering clear of the dark areas.

The C++ programming language adds language features to replace almost all uses of the C pre-processor, except for #include. Unlike in C, the C++ "const" qualifier makes a genuine constant that can even be used for the dimension in non-dynamic array definitions. And to replace "function-like macros" there are "inline functions", which are basically the same concept except that they obey the rules of the programming language, with all relevant syntax and semantics checking, thus eliminating all of the pitfalls of function-like macros in the C pre-processor.

But since C++ is a superset of C, with almost all C programs being valid C++ programs, it was not possible to eliminate the C pre-processor entirely. This is very different from the situation with Java, which although syntactically C-like in many ways is expressly not designed to be literally C compatible, so that its designers were able to make changes such as removing the C pre-processor entirely.

Conditional compilation

"Conditional compilation" is not generally part of CSC 209, but is implemented in C using the C pre-processor so I'll talk about it here too.

The idea of conditional compilation is that sometimes, not all of your code is presented to the compiler. The easiest example use of this is when code is necessarily platform-specific.

We try to write code to be portable as much as possible, but sometimes this is not possible. Imagine if you were developing for a platform which had unix-like pipes, except that the designers foolishly decided to insert an additional first parameter saying the size of the argument array, which has to be two. This platform is called "stupidnix". So instead of saying

	if (pipe(pipefd)) {
	    ...
on the stupidnix platform you have to say
	if (pipe(2, pipefd)) {
	    ...

If you are happy with having your code only run on stupidnix, that's no problem. But there are many more computers out there running unix than stupidnix, so you really want your program to compile and run on unix too.

This is a classic case for conditional compilation. We write (in a larger code segment as shown):

	int pipefd[2];

	if (
    #ifdef stupidnix
		pipe(2, pipefd)
    #else
		pipe(pipefd)
    #endif
				) {
	    perror("pipe");
	    exit(1);
	}
	...

"#ifdef" tests whether a particular name is "defined" in the pre-processor. This could be a value you have #defined in the usual way, or, platforms generally define a certain predefined set of names which you can use for testing in this way.

Like all C pre-processor directives, this affects what the core compiler sees. The C pre-processor will present only one of those two lines to the compiler, as appropriate.

The above looks pretty ugly, but I did my best to make it readable under the circumstances. The code excerpt which is in the condition is a syntactic unit (the call to pipe()), but as usual with the C pre-processor you can play arbitrary games with this. But for actual code, we try to keep it looking as clear as we can.

Actually, better than testing simply which platform you are on is to do "feature-testing". For example, if some versions of unix have a signal number "SIGFOOEY" defined in signal.h, and some don't, then rather than testing for your particular version of unix you can test "#ifdef SIGFOOEY". This remains accurate as platforms' set of offered features changes over time.

But "#ifdef" doesn't accommodate all desired conditional compilation. Suppose, for example, there is a #define of a value FOO, and this particular code needs a special case for when FOO is 4.

We could use a normal 'if' statement, but traditionally, if this is a #define, people would be happier to have the resolution at compile time, and if FOO isn't 4, you don't even have the code for dealing with the FOO==4 case in your compiled program at all.

For this we would do "#if FOO == 4", just like that. Instead of "#ifdef", which just checks whether something is defined, you can write an arbitrary boolean expression. You later use "#endif", and optionally "#else", in the same way. There is also "#elif" (note the spelling: it is not "#elseif").

The expression which follows #if can use the usual C arithmetic, comparison, and logical operators, but it is evaluated by the pre-processor at compile time, independently of the C compiler proper. This means that it can't refer to variables in your C program, and, perhaps less obviously, it can't use sizeof (e.g. you can't do "#if sizeof(int) == 8").

Conditional compilation: an important programming note

There is a programming guideline for using conditional compilation which is important for keeping your programs under control.

The basic form of this guideline about using conditional compilation is:

Don't.

More subtly, sometimes conditional compilation is necessary, but you should use it as little as possible. Almost all other programming techniques are preferable.

If you do use conditional compilation, use it in as few and as isolated places as possible. For example, don't have

	#ifdef SOMETHING
	    4
	#else
	    8
	#endif
in twenty places in your program, but instead have something like
	#ifdef SOMETHING
	#define BARSIZE 4
	#else
	#define BARSIZE 8
	#endif
once, at the top or in a .h file.