structs

Here are some notes on structs in C.

Introduction to structs

A struct is an aggregate data type, like an array.

However, an array has a regularity which a struct does not have. An array consists of some number of elements, all of the same type: e.g. an array of 100 ints, each of which is an int. You can select which of the 100 ints you want to refer to with an arbitrary expression as the index. This is only possible because all of the elements are the same size and are stored consecutively, in order. This observation is closely related to the statement that array indexing is pointer arithmetic.

A struct, on the other hand, is an aggregate whose members can all be of different types. They are referred to by name, and there is nothing analogous to using a non-constant expression as an index: which member you mean is fixed when you write the program.

Example: ordered pairs representing points in a plane.

struct point {
    int x, y;
};

"point" is called the "struct tag". Altogether the above declares a type which we can call "struct point". It does not declare any objects (variables).

We can later say

struct point topleft, bottomright;
to represent a rectangle whose sides are parallel to the axes.

Here we use some structs as parameters, after the above type declaration:

#include <math.h>   /* for sqrt() */

double dist(struct point a, struct point b)
{
    double xdist = a.x - b.x;
    double ydist = a.y - b.y;
    return(sqrt(xdist * xdist + ydist * ydist));
}

Struct "members" are accessed by writing the struct variable name, a dot, and then the member name.
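A small compilable sketch of the dot operator, used both to assign members and to read them. The helper names rect_area() and demo_area() are hypothetical, made up for this illustration; struct point is the type declared above.

```c
struct point {
    int x, y;
};

/* Hypothetical helper: area of the axis-parallel rectangle whose
   opposite corners are topleft and bottomright.
   Members are read with the dot operator. */
int rect_area(struct point topleft, struct point bottomright)
{
    int width = bottomright.x - topleft.x;
    int height = bottomright.y - topleft.y;
    if (width < 0)
        width = -width;
    if (height < 0)
        height = -height;
    return width * height;
}

/* Hypothetical demo: members are assigned with the dot operator too. */
int demo_area(void)
{
    struct point topleft, bottomright;
    topleft.x = 1;
    topleft.y = 2;
    bottomright.x = 4;
    bottomright.y = 6;
    return rect_area(topleft, bottomright);   /* a 3 by 4 rectangle */
}
```

Calling demo_area() should give 12.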

You may notice that this looks a lot like a class declaration. In fact structs are basically data-only classes (no "methods"). The "class" syntax in C++ (and subsequently in java) was derived from the C "struct" syntax.

However, a C struct, unlike a java class reference, is the data, not a reference to it. For example, in java, if you have an object of type "point" similar to the above, in which a.x is 3, then after

b = a;
b.x = 4;

a.x is 4 as well, because the "b=a" assignment copies the reference.

Whereas in C, if you have an object of type "struct point" as above, in which a.x is 3, then after

b = a;
b.x = 4;

a.x is still 3, because the "b=a" assignment copies the data, not a reference (pointer) to the data.

That is, "struct point a;" actually allocates space for the data. The analogue in C of java's "point a;" would be "struct point *a;", to allocate a pointer to a struct point, rather than to allocate a struct point itself.
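The difference can be checked with a short sketch. copy_demo() and pointer_demo() are hypothetical names chosen for this illustration: the first assigns one struct to another, and the second copies a pointer instead, which gives the java-like sharing behaviour.

```c
struct point {
    int x, y;
};

/* Struct assignment copies every member, so changing b leaves a alone. */
int copy_demo(void)
{
    struct point a, b;
    a.x = 3;
    a.y = 0;
    b = a;            /* copies the data */
    b.x = 4;
    return a.x;       /* still 3 */
}

/* Copying a pointer instead: pa and pb point at the same struct. */
int pointer_demo(void)
{
    struct point a;
    struct point *pa, *pb;
    a.x = 3;
    a.y = 0;
    pa = &a;
    pb = pa;          /* copies the pointer, not the struct */
    (*pb).x = 4;
    return (*pa).x;   /* now 4 */
}
```

copy_demo() should return 3 and pointer_demo() should return 4.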

Similarly, if you pass a struct to a function and the function modifies some struct members, it is modifying a copy, not the original: as with all parameter passing in C, the argument is copied.
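A minimal sketch of this, with two hypothetical functions: shift_copy() receives a copy of the struct, so its change is lost, while shift_in_place() receives the struct's address and so can modify the caller's struct.

```c
struct point {
    int x, y;
};

/* p is a copy of the caller's struct; this change is lost on return. */
void shift_copy(struct point p)
{
    p.x += 10;
}

/* To modify the caller's struct, pass its address instead. */
void shift_in_place(struct point *p)
{
    (*p).x += 10;
}
```

If q.x is 1, then after shift_copy(q) it is still 1, and after shift_in_place(&q) it is 11.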

Structs are also useful for function return values. A C function can return only a single value, but you can return (for example) two ints by declaring a struct of two ints and returning an item of that struct type.
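For example, here is a sketch of a function returning both the minimum and the maximum of an array at once. struct minmax and find_minmax() are hypothetical names, and the function assumes the array has at least one element.

```c
struct minmax {
    int min, max;
};

/* Hypothetical helper: smallest and largest of a[0..n-1], returned
   together in one struct.  Assumes n >= 1. */
struct minmax find_minmax(const int a[], int n)
{
    struct minmax r;
    int i;

    r.min = r.max = a[0];
    for (i = 1; i < n; i++) {
        if (a[i] < r.min)
            r.min = a[i];
        if (a[i] > r.max)
            r.max = a[i];
    }
    return r;   /* the whole struct is copied back to the caller */
}
```

The standard library uses the same trick: div() in <stdlib.h> returns a div_t containing both the quotient (quot) and the remainder (rem).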


Full syntax

It looks like we have two totally different struct declaration syntaxes above, but actually they're both special cases of the general form.

A struct declaration in C can declare a struct tag (type), or items of that struct type, or both.

The syntax begins with the keyword "struct". Then there is an optional tag name, an optional struct member list (in braces), an optional list of variable names being declared, and then a semicolon.

If there is a struct member list in braces, then this declaration is creating that struct type. In this case, if there is a tag then it creates that struct tag name and associates it with this member list.

If there is a struct tag but no struct member list in braces, then this declaration is declaring variables in the list to be of that struct type, but not saying what the struct type is. In most cases the struct tag must have previously been declared with the actual struct member list in braces. (One exception to this is if you are only declaring objects of type pointer-to-that-struct, rather than objects of type that-struct.)
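A minimal sketch of the pointer-only exception, assuming two hypothetical types, struct forest and struct tree, which refer to each other. At the point where struct forest is defined, struct tree has no member list yet, but a pointer to it is still allowed. A member of type struct tree itself (not a pointer) would be an error there, because the compiler would not yet know its size.

```c
struct tree;                 /* tag only: member list not known yet */

struct forest {
    int ntrees;
    struct tree *first;      /* fine: only a pointer to struct tree */
};

struct tree {                /* the member list is supplied here */
    int value;
    struct forest *children;
};
```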

Then, the variable list, if present, causes variables to be declared in the normal C way. Other modifiers such as storage classes and declaration type modifiers can be applied. For example, you can say "extern struct something x;" to declare that "x" is a global variable of type "struct something"; or you can say "struct something a[100];" to declare 'a' to be an array of 100 struct somethings.

Examples:

1. Create type "struct point", as above.

struct point {
    int x, y;
};

2. Simultaneously create type "struct point" and declare variable "a" to be of this type.

struct point {
    int x, y;
} a;

3. Declare variable "a" to be of type "struct point", when that type was previously created (e.g. by #1 above).

struct point a;

4. Declare variable "b" to be an array of 100 structs each containing an int and a double.

struct {
    int x;
    double y;
} b[100];

5. Assign the value 123.456 to the second member (y) of element 38 (that is, b[38]) of the array defined in #4.

b[38].y = 123.456;


Pointers to structs

First, you need to know about the idea of the "null pointer" (or "nil"): a special signal value which compares unequal to a pointer to any object. It is like "null" in Java, except:

In C, you get a null pointer of a given type by converting a constant zero to pointer type.

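stdio.h says something like:   #define NULL 0
(Some systems use ((void *)0) instead; either way it is a null pointer constant. NULL is also defined in <stddef.h>, <stdlib.h>, <string.h> and other standard headers. stdio.h is a rather odd place for it, but #include <stdio.h> is more common than #include of any other one file.)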

So we can declare a pointer to struct list like this:

struct list *top;
or we can declare and initialize it to the null pointer value like this:
struct list *top = NULL;
NULL is a constant zero; here it initializes a variable of type pointer-to-struct-list, so it is converted to that type; and converting a constant zero to a pointer type causes the resulting value to be the null pointer of type pointer-to-struct-list.
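A short sketch of the usual convention that an empty linked list is represented by a null pointer. The member names data and next match the struct list described below; list_is_empty() is a hypothetical helper.

```c
#include <stddef.h>   /* NULL */

struct list {
    int data;
    struct list *next;
};

/* Hypothetical helper: an empty list is just a null pointer. */
int list_is_empty(const struct list *top)
{
    return top == NULL;
}
```

A pointer initialized to NULL tests as empty, while the address of any actual struct list never compares equal to NULL.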

You've seen a pointer to a struct in your use of the stat() system call. Your call went something like this:

struct stat statbuf;
... stat("...", &statbuf)
A stat() library function would be defined starting something like this:
int stat(char *filename, struct stat *p)
Inside the function stat, to set the st_size value to 38, we could write:
(*p).st_size = 38;
This is a bit clumsy, and the parentheses are needed because of operator precedence. However, there is another operator which dereferences and selects at the same time:
p->st_size = 38;
This is the same as the previous statement, just a nicer syntax. That is, a->b is defined as (*a).b.
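The same idea with the real stat() call, as a minimal sketch for a POSIX system (stat() and struct stat are declared in <sys/stat.h>). The helpers size_from() and file_size() are hypothetical names for this illustration; size_from() receives a pointer, so it uses ->.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: p->st_size means exactly (*p).st_size. */
long size_from(const struct stat *p)
{
    return (long)p->st_size;
}

/* Hypothetical helper: size in bytes of the named file,
   or -1 if stat() fails (e.g. the file does not exist). */
long file_size(const char *filename)
{
    struct stat statbuf;
    if (stat(filename, &statbuf) != 0)
        return -1;
    return size_from(&statbuf);
}
```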

This "->" syntax is actually quite useful. If struct list contains an item "data" and an item "next", and if our linked list is of length at least three, we can refer to the data item in the third item of the linked list with top->next->next->data, as opposed to (*(*(*top).next).next).data.
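Both spellings side by side, as a sketch. third_data() and third_data_clumsy() are hypothetical names, and both assume the list has at least three items.

```c
struct list {
    int data;
    struct list *next;
};

/* Data in the third item, using ->.  Assumes at least three items. */
int third_data(struct list *top)
{
    return top->next->next->data;
}

/* Exactly the same thing without ->. */
int third_data_clumsy(struct list *top)
{
    return (*(*(*top).next).next).data;
}
```

For a list holding 10, 20, 30, both functions should return 30.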


Dynamic allocation of structs

Another thing we need to be able to do to make a linked list is to allocate and deallocate memory similarly to the use of "new" in java or C++.

For this purpose we use a library function "malloc" (declared in <stdlib.h>), short for "memory-allocate".

malloc's single parameter specifies a number of bytes of memory to allocate, and it returns a pointer to the beginning of the allocated data area, or NULL if there is not enough memory available.

To figure out how much to allocate, we often use the C operator "sizeof". You can apply "sizeof" to a data object or to a type; when the operand is a type name, it must be parenthesized.

top = malloc(sizeof(struct list));
if (top == NULL) {
    fprintf(stderr, "out of memory\n");
    exit(1);
}
(This way of handling a NULL return from malloc will actually be adequate for most CSC 209 assignments and for many unix tools: running out of memory is not expected, and is quite surprising when it happens. However, if you don't even test for it, you'll be sorry one day.)

Struct members do not start off as zero!
And there's no "constructor" to initialize them!

top->data = 5;
top->next = NULL;
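Allocation, the null check, and initialization of every member can be packaged together. This is a sketch with a hypothetical helper, new_node(); taking next as a parameter lets you push a new item onto the front of a list with top = new_node(4, top).

```c
#include <stdio.h>
#include <stdlib.h>

struct list {
    int data;
    struct list *next;
};

/* Hypothetical helper: allocate one node, exit on failure, and set
   both members, since malloc() leaves them holding garbage. */
struct list *new_node(int data, struct list *next)
{
    struct list *p = malloc(sizeof(struct list));
    if (p == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    p->data = data;
    p->next = next;
    return p;
}
```

Writing malloc(sizeof *p) would also work here (sizeof of a data object), and keeps the size correct if p's type ever changes.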

The opposite of malloc is free:

free(top);
after which the value in 'top' is invalid: you must not use it to access that memory any more, just as you must not use an uninitialized variable.

All memory is freed upon program termination. As with fopen()/fclose(), we tend to use free() if the memory is being re-malloc'd in a loop, and otherwise to allow it to vanish with everything else at program termination. Larger programs generally need to free() all malloc'd memory eventually, and small unix tools often don't.
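When a larger program does free a whole linked list, the rule that a pointer is invalid after free() matters: the next pointer has to be saved before the node is freed. A sketch, with free_list() as a hypothetical helper that also returns how many nodes it freed:

```c
#include <stdlib.h>

struct list {
    int data;
    struct list *next;
};

/* Hypothetical helper: free every node and return the count.
   top->next must be read BEFORE free(top), since afterwards top is invalid. */
int free_list(struct list *top)
{
    int count = 0;
    while (top != NULL) {
        struct list *next = top->next;   /* save it while top is valid */
        free(top);
        top = next;
        count++;
    }
    return count;
}
```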


Another malloc use: dynamically-sized arrays.

int x[10]; ← ok
int y[z]; ← not ok in C89 if 'z' is a variable

instead:

int *y;
y = malloc(z * sizeof(int));
then we can use y[i], etc
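A complete version of this pattern, as a sketch: sum_squares() is a hypothetical function whose array size z is known only at run time, so it uses malloc instead of int y[z]. It assumes z > 0 (malloc(0) may legitimately return NULL).

```c
#include <stdlib.h>

/* Hypothetical example: returns 0*0 + 1*1 + ... + (z-1)*(z-1),
   or -1 if malloc fails.  Assumes z > 0. */
long sum_squares(int z)
{
    int *y;
    int i;
    long sum = 0;

    if ((y = malloc(z * sizeof(int))) == NULL)
        return -1;            /* out of memory */
    for (i = 0; i < z; i++)
        y[i] = i * i;         /* y[i] works just as with an array */
    for (i = 0; i < z; i++)
        sum += y[i];
    free(y);                  /* finished with it */
    return sum;
}
```

sum_squares(4) should be 0 + 1 + 4 + 9 = 14.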

malloc'd memory persists after the function returns. You can assign malloc results to global variables and use them later.

You can't say:

int *f()
{
    int a[10];
    ...
    return(a);
}
because that 'a' array is deallocated as the function returns.

But returning a pointer to malloc'd memory is fine:

int *f()
{
    int *a;
    if ((a = malloc(10 * sizeof(int))) == NULL)
	...
    ...
    return(a);
}
because the malloc'd memory persists until free() is called on the pointer — its existence is not tied to the duration of the execution of the function.
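A complete function in the same shape as f(), as a sketch; make_range() is a hypothetical name, and returning NULL on failure (rather than exiting) is one possible choice, leaving the decision to the caller.

```c
#include <stdlib.h>

/* Hypothetical example: returns a malloc'd array holding 0, 1, ..., n-1,
   or NULL if malloc fails.  The caller must eventually free() it. */
int *make_range(int n)
{
    int *a;
    int i;

    if ((a = malloc(n * sizeof(int))) == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        a[i] = i;
    return a;    /* still valid after return: malloc'd, not a local array */
}
```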