Strings in C

We make a string out of chars by using an array. The size of the array then indicates the longest string we can store.
#include <stdio.h>
#include <string.h>

int main()
{
    char a[20];
    strcpy(a, "Hello");
    printf("%s, world\n", a);
    return(0);
}

How does it know to print just the first five chars of a[]?

Convention: use the byte zero to terminate a string. Almost all of the string library functions rely on this.

printf's "%s" format uses this. It expects an array of char which ends with a zero byte.

String literals use this. When you say "hello", that defines an unnamed array which is used right there. It is of size 6, not 5.
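
To see the extra byte for yourself, here is a minimal sketch. The function name show_literal_size is made up for illustration; sizeof and strlen are standard. sizeof reports the storage the literal occupies, while strlen stops counting at the zero byte.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical demo function: report how much storage the literal takes
 * versus how long the string is. */
void show_literal_size(void)
{
    /* 6: five letters plus the terminating zero byte */
    printf("sizeof(\"Hello\") = %zu\n", sizeof("Hello"));
    /* 5: the zero byte is not counted */
    printf("strlen(\"Hello\") = %zu\n", strlen("Hello"));
}
```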

#include <stdio.h>

int main()
{
    int i;
    char a[10];
    for (i = 0; i < 5; i++)
	a[i] = 'a' + i;
    a[5] = 0;
    printf("%s\n", a);
    return(0);
}
When we're writing the string terminator as a literal, we usually write '\0'. Thus, usually we would write
    a[5] = '\0';
'\0' and 0 are EXACTLY the same (both type int, even). But they mean something different to the human reading your code.


Writing strcpy ourselves:

void mystrcpy(char *a, char *b)
{
    int i;
    for (i = 0; b[i] != '\0'; i++)
	a[i] = b[i];
    a[i] = '\0';
}

We can save the extra assignment by making this a post-tested loop:

void mystrcpy(char *a, char *b)
{
    int i = 0;
    do {
	a[i] = b[i];
    } while (b[i++] != '\0');
}

In practice, we might write:

void mystrcpy(char *a, char *b)
{
    while ((*a = *b) != '\0') {
	a++;
	b++;
    }
}

In fact, any 1970s unix programmer would surely write:

void mystrcpy(char *s, char *t)
{
    while (*s++ = *t++)
	;
}
That's all you need! (These days we would prefer at least to include additional parentheses to make it clear that we are really intending the embedded assignment...)


Another string function: strcat

#include <stdio.h>
#include <string.h>

int main()
{
    char a[20];
    strcpy(a, "Hello");
    strcat(a, ", world");
    printf("%s\n", a);
    return(0);
}
What happens if that 20-byte array isn't big enough?

Another string.h function: strlen("Hello") -> 5

Used in dynamic memory allocation, and in checking whether something will fit in a target array.
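
As a sketch of the "will it fit" check, here is one way to guard a strcat call with strlen. The helper name append_if_fits and its return convention are assumptions made for this example; nothing like it is in string.h.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of string.h): append src to dst only if
 * the result, including the terminating zero byte, fits in dstsize bytes.
 * Returns 1 on success, 0 if it would not fit (dst is left unchanged).
 */
int append_if_fits(char *dst, size_t dstsize, const char *src)
{
    size_t used = strlen(dst);
    size_t need = strlen(src);
    /* we need used + need + 1 <= dstsize; written to avoid overflow */
    if (used >= dstsize || need >= dstsize - used)
        return(0);
    strcat(dst, src);
    return(1);
}
```

With the 20-byte array above, appending ", world" to "Hello" needs 13 bytes in all, so the call succeeds; with a 12-byte array it would be refused rather than writing past the end.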

strcmp:

strcmp("hello", "hello") -> 0
strcmp("goodbye", "hello") -> negative (e.g. -1)
strcmp("hello", "goodbye") -> positive (e.g. 1)

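Can be used for sorting. The order is by byte value, not dictionary order (in ASCII, every uppercase letter sorts before every lowercase one), but it is used, for example, by ls.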

Traditionally, returns a[i]-b[i] at first difference. This is no longer the case (it becomes too tricky in the event of bytes whose high bits are not zero), but it's still a good way to remember when it returns negative and when positive. However, in the commonest use by far, we only care about the zero case:

if (strcmp(s, "foo") == 0)
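
Going back to the sorting use: a minimal sketch of sorting an array of strings with the standard qsort function, using strcmp for the comparison. The names compare_strings and sort_strings are made up for this example.

```c
#include <stdlib.h>
#include <string.h>

/* qsort hands the comparator pointers to the array elements; here each
 * element is a char *, so the arguments are really pointers to char *. */
static int compare_strings(const void *p, const void *q)
{
    const char *s = *(const char * const *)p;
    const char *t = *(const char * const *)q;
    return(strcmp(s, t));
}

/* Sort an array of n strings into strcmp order (byte values, so in ASCII
 * every uppercase letter sorts before every lowercase one). */
void sort_strings(const char **v, size_t n)
{
    qsort(v, n, sizeof v[0], compare_strings);
}
```

Only the sign of the strcmp result matters to qsort, which is why it does not matter whether strcmp returns -1 or some other negative number.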


int mystrlen(char *s)
{
    int count = 0;
    while (*s++)
	count++;
    return(count);
}

strchr: a linear search.

char *mystrchr(char *s, int c)
{
    while (*s && *s != c)
	s++;
    if (*s == '\0')
	return(NULL);
    else
	return(s);
}
Actually you can call strchr(s, '\0') to find the terminating zero, so my "mystrchr" above isn't complete.
Adding this facility requires only a tiny change: the above loop does in fact find the zero; it's just that the subsequent 'if' misdiagnoses it. Just add "&& c":
char *mystrchr(char *s, int c)
{
    while (*s && *s != c)
	s++;
    if (*s == '\0' && c)
	return(NULL);
    else
	return(s);
}

strrchr -- 'r' for 'right' -- easiest algorithm does NOT go right-to-left.

char *mystrrchr(char *s, int c)
{
    char *retval = NULL;
    for (; *s; s++)
	if (*s == c)
	    retval = s;
    return(retval);
}

You could go right to left; try it, and you'll see it's longer, because you first have to find the end. (And you have more to keep track of, because you now have to make sure you don't go off the beginning of the array.)
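
For comparison, here is one possible right-to-left version, as a sketch only. The name mystrrchr_rtl is made up, and it leans on strlen to find the end, which hides a full left-to-right pass inside the library call. Unlike the left-to-right mystrrchr above, it also happens to find the terminating zero when c is '\0'.

```c
#include <stddef.h>
#include <string.h>

char *mystrrchr_rtl(char *s, int c)
{
    char *p = s + strlen(s);    /* start at the terminating zero byte */
    for (;;) {
        if (*p == (char)c)
            return(p);
        if (p == s)             /* reached the beginning: not found */
            return(NULL);
        p--;
    }
}
```

The "p == s" test is the extra bookkeeping: without it, the loop would walk off the front of the array.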

void mystrcpy(char *s, char *t)
{
    while ((*s++ = *t++))
	;
}
N.B. that is a single equals sign: an assignment, not a comparison.

int mystrcmp(char *s, char *t)
{
    while (*s == *t && *s) {
	s++;
	t++;
    }
    return(*s - *t);
}
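
One caveat about bytes with the high bit set: the C standard says strcmp decides the sign of its result by comparing the differing bytes as unsigned char. Where plain char is signed, the subtraction above can give the opposite sign for such bytes. A sketch of a variant that follows the standard's rule (the name mystrcmp_unsigned is made up for this example):

```c
/* Like mystrcmp, but the final comparison treats the bytes as unsigned
 * char, which is how the C standard says strcmp determines the sign. */
int mystrcmp_unsigned(const char *s, const char *t)
{
    while (*s == *t && *s) {
        s++;
        t++;
    }
    return((unsigned char)*s - (unsigned char)*t);
}
```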

How to write something which adds to a string?
E.g. we want:

printf("%s\n", helloize("world"));

Answer:

char *helloize(char *s)
{
    static char buf[100];
    sprintf(buf, "Hello, %s", s);
    return(buf);
}
buf must be static so that it doesn't get deallocated when the function returns.

What is the bug in this function?
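Array overflow possibility: "Hello, " takes 7 bytes and the terminating zero takes one, so any s longer than 92 chars makes sprintf write past the end of buf.
Fix: add a limit on the string in the format: "Hello, %.90s"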
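
Putting the fix into the function, as a sketch. The first variant uses the "%.90s" limit described above. The second swaps in C99's snprintf, which is a different technique from the one in these notes: it is told the buffer size and truncates the output to fit. Both function names are made up so as not to clash with helloize.

```c
#include <stdio.h>

/* Variant 1: the precision in "%.90s" caps how much of s is copied.
 * 7 bytes of "Hello, " + at most 90 from s + the zero byte = 98 <= 100. */
char *helloize_limited(const char *s)
{
    static char buf[100];
    sprintf(buf, "Hello, %.90s", s);
    return(buf);
}

/* Variant 2 (a swapped-in technique, not from the notes above): C99
 * snprintf is told the buffer size and truncates the output to fit. */
char *helloize_bounded(const char *s)
{
    static char buf[100];
    snprintf(buf, sizeof buf, "Hello, %s", s);
    return(buf);
}
```

The snprintf version has the advantage that the limit is tied to the array size, so the two cannot drift apart if someone later changes one of them.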

A program like cat -s:

#include <stdio.h>

int main()
{
    int nlcount = 1;
    int c;
    while ((c = getchar()) != EOF) {
	if (c == '\n') {
	    if (++nlcount <= 2)
		putchar(c);
	} else {
	    nlcount = 0;
	    putchar(c);
	}
    }
    return(0);
}

The fgets library function, without the EOF signal:

void myfgets(char *s, int size, FILE *fp)
{
    int pos = 0, limit = size - 1;
    int c;
    /* test for room first, so we never read a char we can't store */
    while (pos < limit && (c = getc(fp)) != EOF) {
	s[pos++] = c;
	if (c == '\n')
	    break;
    }
    s[pos] = '\0';
}

Or, better in my opinion (avoiding the array address calculations):

void myfgets(char *s, int size, FILE *fp)
{
    char *limit = s + size - 1;
    int c;
    /* test for room first, so we never read a char we can't store */
    while (s < limit && (c = getc(fp)) != EOF) {
	*s++ = c;
	if (c == '\n')
	    break;
    }
    *s = '\0';
}
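
For completeness, a sketch of how the EOF signal might be added back: the real fgets returns its first argument on success and NULL when end-of-file arrives before any characters are read. The name myfgets_eof is made up. This version assumes size is at least 1, and it does not separately report read errors the way the real fgets does. It tests for room before calling getc, so a character is never read and then dropped.

```c
#include <stdio.h>

char *myfgets_eof(char *s, int size, FILE *fp)
{
    char *start = s;
    char *limit = s + size - 1;     /* assumes size >= 1 */
    int c = 0;
    while (s < limit && (c = getc(fp)) != EOF) {
        *s++ = c;
        if (c == '\n')
            break;
    }
    if (s == start && c == EOF)     /* nothing read at all: signal EOF */
        return(NULL);
    *s = '\0';
    return(start);
}
```

The usual calling pattern is then a loop of the form while (myfgets_eof(buf, sizeof buf, fp) != NULL), processing one line (or one buffer-sized piece of a long line) per iteration.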

String function summary

Here are some string functions you want to be sure to be aware of from the string library, mostly discussed above: The char pointers b, c, d, e, f, g, h, i, and j (i.e. all of the string arguments except for 'a') must point to properly zero-terminated strings.
The char pointers a and c must point into an array with enough space to store the result.