CSC 209 Assignment 2, Summer 2017

Due by the end of Sunday July 9, 2017; no late assignments without written explanation.

1. A tool to put a box around file contents

The box command which you will write for this part of the assignment transforms input data which looks like this:
	"It's impossible to foresee the consequences of being clever."
	                -- Christopher Strachey
into something which looks like this:
	**********************************************************************
	* "It's impossible to foresee the consequences of being clever."     *
	*                 -- Christopher Strachey                            *
	**********************************************************************

Your program will take zero or more file names in the usual way, including accepting the use of '-' for the standard input. It will also take options "-e", "-w", and "-c". The "-w" and "-c" options take an argument. You must use getopt() to parse the command-line options in the standard manner.

"-w" sets the width of the box (including the asterisks and spaces), which defaults to 70. "-c" sets the character used for making the box, which defaults to asterisk. If a line in the input file is too wide to fit in the box, by default it is cut off so that the right side of the box remains straight, and further characters on that line are discarded. However, the "-e" option makes it output the entire wide line without regard for the right side of the box.

Algorithm: Where width is the -w value or 70 as applicable, and replacing "asterisk" by the -c character as appropriate: After option parsing and checking, your program will print a row of width asterisks. Then it loops through the files. For each line of each file, it outputs asterisk, space; then up to width - 4 characters of the line; then enough spaces to make the number of characters output for this line so far add up to width - 2 characters; then another space and another asterisk. (This needs to be modified as appropriate for -e.) After processing all of the files, it outputs a row of width asterisks again. Note that rows of asterisks do not appear between the files.
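
The per-line step of the algorithm above can be sketched as follows. The function names box_border and box_line are hypothetical, and the -e handling shown (print the whole line, then the usual space and box character) is only one possible reading of "modified as appropriate".

```c
#include <stdio.h>

/* Print a row of `width` copies of the box character `ch`. */
void box_border(FILE *out, int width, int ch)
{
    int i;

    for (i = 0; i < width; i++)
        putc(ch, out);
    putc('\n', out);
}

/*
 * Print one input line inside the box.  `line` may or may not end in '\n'.
 * If `entire` is zero, at most width - 4 characters of the line are printed.
 * If `entire` is nonzero (assumed -e behaviour), the whole line is printed
 * even if that pushes the right side of the box out of alignment.
 */
void box_line(FILE *out, const char *line, int width, int ch, int entire)
{
    int col;  /* number of characters output so far on this row */
    int i;

    putc(ch, out);
    putc(' ', out);
    col = 2;

    for (i = 0; line[i] != '\0' && line[i] != '\n'; i++) {
        if (!entire && i >= width - 4)
            break;  /* discard the rest of an over-long line */
        putc(line[i], out);
        col++;
    }

    /* Pad with spaces until width - 2 characters have been output. */
    for (; col < width - 2; col++)
        putc(' ', out);

    putc(' ', out);
    putc(ch, out);
    putc('\n', out);
}
```

With a zero or negative width the loops simply do nothing, so the output is ugly but nothing crashes. With width 10, the line "hi" comes out as "* hi     *" and the line "abcdefghij" is cut to "* abcdef *".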

You do not need to check the validity of the arguments to the options. For example, you can simply call atoi() on the -w argument. If the result is zero or negative, your program will probably produce unusably ugly output, although it must not crash.

2. strstr

Write the "strstr" string library function, which searches for one string in another. If the search string is not found, it returns the null pointer (which you get, as always, by converting a constant zero to the appropriate pointer type).

Call your function "mystrstr" for ease of testing (and you must submit it under the name "mystrstr" too).

Your function must not call any string library functions.

The supplied "testmystrstr.c" might be helpful for testing your function, including for comparing it to the behaviour of the "real" strstr(). Note that your submitted file should not contain a main(), just mystrstr().
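
To make the comparison with the real strstr() concrete, here is a small sketch of a helper which reports where a match was found. It is not the supplied testmystrstr.c; the names strstr_fn and match_offset are hypothetical, and it assumes your mystrstr() has the same prototype as strstr().

```c
#include <stddef.h>
#include <string.h>

/* Assumed prototype shared by strstr() and your mystrstr(). */
typedef char *(*strstr_fn)(const char *haystack, const char *needle);

/*
 * Call find(haystack, needle) and return the offset of the match within
 * haystack, or -1 if find() returned the null pointer.
 */
long match_offset(strstr_fn find, const char *haystack, const char *needle)
{
    const char *p = find(haystack, needle);

    return p == NULL ? -1L : (long)(p - haystack);
}
```

Passing the real strstr as the first argument, "hello world" and "o w" give 4, "aaab" and "aab" give 1, "hello" and "xyz" give -1, and an empty search string gives 0 (the real strstr() returns the start of the string being searched in that case). Passing mystrstr instead should give the same values.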

3. A tool to find binaries (compiler output)

There are a number of situations in which we want to reduce the disk space consumption of a directory subtree, just to save space, or for packaging for transmission or posting on the web, or any other reason to reduce file sizes.

One tool for reducing disk space consumption is compression. But some files don't need to be preserved at all, and deleting a file saves more space than compressing it.

In most cases, any file which is automatically generated can be removed because it can be generated again. In this assignment question you will write a C program named findbin to find compiler output files: .o files and a.out files. (Your program will just find them, not remove them; it could be piped into "xargs rm" to remove them.)

Rather than going by file name (which is particularly impossible in the case of a.out files), you will examine the first four bytes of the file contents. By convention, this is called the file's "magic number" and has unique values for particular non-text file types.

Both .o files and a.out files are in the "ELF" format, in which the first four bytes are 127, 'E', 'L', and 'F', in that order. Given that a file is in the "ELF" format, subsequent bytes indicate further details, such as what platform it's for; but we don't need to look at those to write findbin.
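
The magic number test just described might look something like the following sketch. The function name is_elf and its return convention are hypothetical choices, not part of the assignment.

```c
#include <stdio.h>

/*
 * Return 1 if the file at `path` begins with the four ELF magic bytes,
 * 0 if it does not (including files shorter than four bytes),
 * and -1 if the file cannot be opened.
 */
int is_elf(const char *path)
{
    static const unsigned char magic[4] = { 127, 'E', 'L', 'F' };
    unsigned char buf[4];
    FILE *fp;
    size_t n;
    int i;

    if ((fp = fopen(path, "rb")) == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);

    if (n != sizeof buf)
        return 0;  /* too short to hold a magic number */
    for (i = 0; i < 4; i++)
        if (buf[i] != magic[i])
            return 0;
    return 1;
}
```

Reading into an unsigned char array avoids any surprises from comparing possibly-signed char values, although all four magic bytes happen to be below 128.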

Your program takes one or more† command-line arguments, which are directory names. Examine all files under each of those points in the filesystem hierarchy, and for "ELF" format files output their full path name beginning with the original command-line argument. For example, if the argument is "foo/bar", then a possible output line might be "foo/bar/a/b". So this is very similar to the output of "find ... -print".

Remember to use lstat() instead of stat(), and do not report (nor follow) symbolic links. Your program never outputs the path names of directories, only plain files.
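
One way to apply the lstat() rule is to sort each path into "plain file", "directory", or "anything else" before deciding what to do with it. The following is only a sketch; the enum and the function name classify are hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical classification of a path, for illustration only. */
enum kind { KIND_ERROR, KIND_FILE, KIND_DIR, KIND_OTHER };

/*
 * Classify `path` using lstat(), which does not follow symbolic links.
 * A symbolic link therefore shows up as KIND_OTHER, even if it points
 * at a plain file or a directory.
 */
enum kind classify(const char *path)
{
    struct stat st;

    if (lstat(path, &st) != 0)
        return KIND_ERROR;
    if (S_ISREG(st.st_mode))
        return KIND_FILE;   /* candidate for the magic number test */
    if (S_ISDIR(st.st_mode))
        return KIND_DIR;    /* descend into it, but never print it */
    return KIND_OTHER;      /* symbolic links, devices, etc.: skip */
}
```

Had stat() been used instead, a symbolic link to a directory would be reported as a directory, and the program would follow it.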

findbin should not do a chdir() (which you might be tempted to do to avoid string processing). If it does a chdir(), your program will no longer always work with multiple directory arguments, since a chdir() in processing the first directory may invalidate the name of the specified second directory.

You may assume a maximum likely complete path name of, say, 2000 characters, so long as your program aborts with an error, rather than exceeding array bounds, should the situation turn out to be more complex than this.
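
Since chdir() is ruled out, each path name has to be built up by string processing, and the length limit has to be checked before copying. Here is a minimal sketch, assuming a hypothetical helper named join_path; the caller is expected to print an error and exit with status 1 if it returns -1.

```c
#include <string.h>

/*
 * Build "dir/name" in buf, which has room for bufsize bytes including
 * the terminating '\0'.  Returns 0 on success, or -1 (leaving buf
 * untouched) if the result would not fit.
 */
int join_path(char *buf, size_t bufsize, const char *dir, const char *name)
{
    /* dir + '/' + name + terminating '\0' */
    size_t need = strlen(dir) + 1 + strlen(name) + 1;

    if (need > bufsize)
        return -1;
    strcpy(buf, dir);
    strcat(buf, "/");
    strcat(buf, name);
    return 0;
}
```

For example, with a 2001-byte buffer, joining "foo/bar" and "a" yields "foo/bar/a", and the same buffer can then be passed on to lstat() or to a recursive call for a subdirectory.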

You may not use ftw() or fts().


† The HTML version of the assignment file said "Your program takes one mandatory command-line argument". This was a version control error. This was brought to my attention and I fixed it only on July 5, so either version will be accepted for the assignment -- you can write your program to take only one argument if you wish. Your argc checking and usage message must reflect the actual behaviour of your program. See ../notes/tiny/usage.html for a reminder of how usage messages work.

Other notes

Your C programs must be in standard C. They must compile on the teach.cs machines with "gcc -Wall" with no errors or warning messages, and may not use Linux-specific or GNU-specific features.

Pay attention to process exit statuses. Your programs must return exit status zero or one as appropriate.

Call your assignment files box.c, mystrstr.c, and findbin.c. Your files must have the correct names so as to be processed correctly by the grading software, and auxiliary files are not permitted. Submit with the command

	submit -c csc209h -a a2 box.c mystrstr.c findbin.c
and the other 'submit' commands are as before.

Please see the assignment Q&A web page at http://www.teach.cs.toronto.edu/~ajr/209/a2/qna.html for other reminders, and answers to common questions.

Remember:

This assignment is due by midnight at the end of Sunday, July 9. Late assignments are not ordinarily accepted, and always require a written explanation. If you have not finished your assignment by the submission deadline, you should just submit what you have, for partial marks.

Despite the above, I'd like to be clear that if there is a legitimate reason for lateness, please do submit your assignment late and send me that written explanation.

And above all: Do not commit an academic offence. Your work must be your own. Do not look at other students' assignments, and do not show your assignment (complete or partial) to other students. Collaborate with other students only on material which is not being submitted for course credit.