Assignment 2: Rent-a-Bike

Due Date: Tuesday 31 October 2017 before 9:00pm

This assignment must be completed alone (no partners). Please see the syllabus for information about academic offenses.

Introduction

Toronto's bike share network debuted in 2011, offering rental bikes to Torontonians and visitors in the downtown core. This network consists of hundreds of docking stations scattered around downtown. Bikes can be rented from any docking station and returned to any docking station in the city. In this assignment, you will write several functions to help manage and track bike rentals across this network. Using real data from Toronto's bike share system, your functions will simulate bike rentals and returns as well as keep track of the current state of the network and even provide directions to riders.

The data that you will work with is provided by the Toronto bike share network. The data contains information about the docking stations, such as the location of the station and how many bikes are currently available. More information about the data provided and where it comes from is given later in this handout.

The purpose of this assignment is to give you practice using the programming concepts that you have seen in the course so far, including (but not limited to) strings, lists and list methods, and loops.

This handout explains the problem you are to solve, and the tasks you need to complete for the assignment. Please read it carefully.

Files to Download

Please download the Assignment 2 Data Files and extract the zip archive. A description of each of the files that we have provided is given in the paragraphs below:

Starter code: bikes.py

The bikes.py file contains some constants, and a couple of complete helper functions that you may use. You must not modify the provided helper functions.

The bikes.py file also contains function headers and docstrings for the A2 functions to which you are required to add function bodies. For each function, read the header and docstring (especially the examples) to learn what task the function performs. Doing so may help you to determine what you need to do for each required function. To gain a better understanding of each function, you may want to add another example to the docstring.

Data: stations.csv

The stations.csv file contains bike share data in comma-separated values (CSV) format. You must not modify this file.

Checker: checker.py

We have provided a checker program (checker.py) that tests two things:

The checker program does not test the correctness of your functions, so you must do that yourself.

The Data

For this assignment, you will use data from a Comma Separated Value (CSV) file named stations.csv. Each row of this file contains the following information about a single docking station:

Note: While the sum of the number of bikes available at a station and the number of docks available at a station will usually equal the station's capacity, this need not be the case. When a bike or a dock is broken, the sum of the two availability numbers will not match the capacity.

We have provided a function named csv_to_list, which reads a CSV file and returns its contents as a List[List[str]]. As you develop your program, you can use the csv_to_list function to produce a larger data set for testing your code. See the main block at the end of bikes.py for an example.
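To get a feel for what a List[List[str]] read from a CSV file looks like, here is a minimal sketch that uses Python's csv module. This is not the provided csv_to_list function; the name read_rows and the choice to take an open file as its parameter are assumptions made only for this illustration.

```python
import csv
import io
from typing import List, TextIO


def read_rows(csv_file: TextIO) -> List[List[str]]:
    """Return the rows of the open CSV file csv_file as lists of strings.

    Hypothetical stand-in for illustration only; it is not the
    csv_to_list helper from the starter code.
    """
    rows = []
    for row in csv.reader(csv_file):
        rows.append(row)
    return rows


# io.StringIO lets us try the function without a file on disk.
sample = io.StringIO(
    '7000,Ft. York / Capreol Crt.,43.639832,-79.395954,31,20,11,True,True\n')
rows = read_rows(sample)
print(rows)
```

Every item in the result is a string, including the numbers and the 'True' values.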

Your Tasks

Imagine that it is your job to manage Toronto's bike share system. As the manager, you need to know everything about the system. But, there are hundreds of docking stations, which is way too many to keep track of in your head. To make your life easier, you will write Python functions to help you manage the system.

Your functions will fall into three categories: functions for data cleaning, functions for data queries, and functions for data modification.

Data cleaning

We provided a function named csv_to_list that reads data from a CSV file and returns it in a List[List[str]]. Here is a sample of the type of list returned:

[['7000', 'Ft. York / Capreol Crt.', '43.639832', '-79.395954', '31', '20', '11', 'True', 'True'],
['7001', 'Lower Jarvis St / The Esplanade', '43.647992', '-79.370907', '15', '5', '10', 'True', 'True']]

Notice that all of the data in the inner lists are represented as strings. You are to write the function clean_data, which should make modifications to the list according to the following rules:

After applying the clean_data function to the example list, it should look like:

[[7000, 'Ft. York / Capreol Crt.', 43.639832, -79.395954, 31, 20, 11, True, True],
[7001, 'Lower Jarvis St / The Esplanade', 43.647992, -79.370907, 15, 5, 10, True, True]]

Before you write the clean_data function body, please note:

Data cleaning function to implement in bikes.py.
Function name
(Parameter types) -> Return type
Full Description
clean_data
(List[list]) -> None
The parameter represents a list of lists of strings. The list could have the format of stations data, but is not required to. See the starter code docstring for some examples.

Modify the parameter so that strings that represent whole numbers are converted to ints, strings that represent numbers that are not whole numbers are converted to floats, and strings that represent Boolean values are converted to bools.
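
The conversion described above could be sketched as follows. This is only one possible approach under stated assumptions, not the required implementation: the names convert_value and clean_rows are hypothetical, the sketch assumes Boolean values are spelled exactly 'True' and 'False', it treats a string such as '31.0' as a float rather than an int, and it uses try/except, which you may not have seen yet. It covers only the three conversions named in the description.

```python
from typing import List


def convert_value(text: str):
    """Return text as a bool, int, or float if it represents one;
    otherwise return text unchanged. (Hypothetical helper.)
    """
    # Assumption: Booleans appear exactly as 'True' or 'False'.
    if text == 'True':
        return True
    if text == 'False':
        return False
    # Try the stricter conversion (int) before the looser one (float).
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text


def clean_rows(data: List[list]) -> None:
    """Modify data in place, converting each item with convert_value."""
    for row in data:
        # Assign by index so that the inner list itself is changed.
        for i in range(len(row)):
            row[i] = convert_value(row[i])


sample = [['7000', 'Ft. York / Capreol Crt.', '43.639832', '-79.395954',
           '31', '20', '11', 'True', 'True']]
clean_rows(sample)
print(sample)
```

Note that the function returns None and changes the list it was given, which is why the loop assigns to row[i] instead of building a new list.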

Data queries

Once the data has been cleaned, you can use the following functions to extract information from the data. All the examples shown below assume that you are calling the function on the cleaned example list shown above.

List of data query functions to implement in bikes.py.
Function name
(Parameter types) -> Return type
Full Description
get_station_info
(int, List[list]) -> list
The first parameter represents a station ID number and the second parameter represents cleaned stations data.

Return a list containing the name, the number of bikes available, and the number of docks available (in this order), for the station whose station ID is the first parameter.

Precondition: the station ID will appear in the cleaned stations data.
get_total
(int, List[list]) -> int
The first parameter represents an index and the second parameter represents cleaned stations data.

Return the sum of the values at the given index in each inner list of the cleaned stations data.

Precondition: the items in the list at the given index position are integers.
get_station_with_max_bikes
(List[list]) -> int
The parameter represents cleaned stations data.

Return the station ID of the station that has the most bikes available. If there is a tie for the most available, return the station ID that appears first in the stations list.

Precondition: the cleaned stations data will contain at least one station.
get_stations_with_n_docks
(int, List[list]) -> List[int]
The first parameter represents a required minimum number of available docks and the second parameter represents cleaned stations data.

Return a list of the station IDs of stations that have at least the required minimum number of available docks. The station IDs should appear in the same order as in the given stations data list.

Precondition: the required minimum number of available docks will be >= 0.
get_direction
(int, int, List[list]) -> str
The first two parameters represent station ID numbers and the third represents cleaned stations data.

Return a string that contains the direction to travel to get from the first station to the second. The string should contain one of 'NORTH', 'SOUTH', or '', followed by one of 'EAST', 'WEST', or ''. Make sure your spelling is correct so your strings match our tests!

For example, if the first ID is 7000 and the second is 7001, the function should return 'NORTHEAST'. Similarly, if the first ID is 7001 and the second is 7000, the function should return 'SOUTHWEST'. Take a look at the latitude and longitude values for these stations in the CSV file, and if needed, look up the meaning of latitude and longitude. Try drawing some pictures!

Precondition: the two station ID numbers will appear in the cleaned stations data.
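
To make the latitude and longitude reasoning for get_direction concrete, here is a minimal sketch. The index constants ID, LATITUDE, and LONGITUDE are assumptions based on the column order of the example list (the constants in bikes.py may have different names), and find_station and direction_between are hypothetical names. The sketch relies on latitude increasing to the north and longitude increasing to the east.

```python
from typing import List

# Assumed column positions, taken from the example list in this handout.
ID = 0
LATITUDE = 2
LONGITUDE = 3


def find_station(station_id: int, stations: List[list]) -> list:
    """Return the inner list for station_id. (Hypothetical helper.)

    Precondition: station_id appears in stations.
    """
    for station in stations:
        if station[ID] == station_id:
            return station
    return []  # not reached when the precondition holds


def direction_between(start_id: int, end_id: int,
                      stations: List[list]) -> str:
    """Return the direction to travel from start_id to end_id."""
    start = find_station(start_id, stations)
    end = find_station(end_id, stations)

    direction = ''
    # A larger latitude is farther north.
    if end[LATITUDE] > start[LATITUDE]:
        direction = direction + 'NORTH'
    elif end[LATITUDE] < start[LATITUDE]:
        direction = direction + 'SOUTH'
    # A larger (less negative) longitude is farther east.
    if end[LONGITUDE] > start[LONGITUDE]:
        direction = direction + 'EAST'
    elif end[LONGITUDE] < start[LONGITUDE]:
        direction = direction + 'WEST'
    return direction


stations = [
    [7000, 'Ft. York / Capreol Crt.', 43.639832, -79.395954,
     31, 20, 11, True, True],
    [7001, 'Lower Jarvis St / The Esplanade', 43.647992, -79.370907,
     15, 5, 10, True, True]]
print(direction_between(7000, 7001, stations))
```

When the two latitudes (or the two longitudes) are equal, nothing is added for that part, which is how the '' case in the description arises.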

Data modification

The functions that we have described up to this point have allowed us to clean our data and extract specific information from it. Now we will describe functions that let us change the data.

List of data modification functions to implement in bikes.py.
Function name
(Parameter types) -> Return type
Full Description
rent_bike
(int, List[list]) -> bool
The first parameter represents a station ID and the second represents cleaned stations data.

A bike can be rented from a station if and only if:
  • there is at least one bike available at that station, and
  • the station is renting bikes (according to the station's IS_RENTING information).
If the conditions above are met, this function successfully rents a single bike from the station. A successful bike rental requires updating the bikes available count and the docks available count as if a single bike was removed from the station. Return True if the bike rental is successful, and False otherwise.

Precondition: the station ID will appear in the cleaned stations data.
return_bike
(int, List[list]) -> bool
The first parameter represents a station ID and the second represents cleaned stations data.

A bike can be returned to a station if and only if:
  • there is at least one dock available at that station, and
  • the station is allowing returns (according to the station's IS_RETURNING information).
If the conditions above are met, this function successfully returns a single bike to the station. A successful bike return requires updating the bikes available count and the docks available count as if a single bike was added to the station. Return True if the bike return is successful, and False otherwise.

Precondition: the station ID will appear in the cleaned stations data.
balance_all_bikes
(List[list]) -> int
The parameter represents cleaned stations data.

Modify the stations data so that the percentage of bikes available at each station is as close as possible to the overall percentage of bikes available across all stations. To calculate the overall percentage of bikes available across all stations, include all stations, regardless of whether they are currently renting or returning.

Return the difference between the number of bikes rented and the number returned after completing this modification. If more bikes were returned than were rented, the function should produce a negative number. (Note: this means that this function could add or remove bikes from the overall bike network.)

To illustrate this, let's consider our cleaned example list from before. This list contains two stations, one that is 65% full (20 bikes available out of a 31 dock capacity) and one that is 33% full (5 bikes available out of a 15 dock capacity). We want both of these docking stations to have a percentage available that is as close as possible to the percentage available across all stations. In our example, 20+5 bikes out of 31+15 docks gives a goal percentage of 54%. For each station, based on its capacity, we calculate how close we can get to the goal percentage. With the cleaned example list, we are aiming for 17 bikes in the first station (54% of 31 is 17, after rounding to a whole number of bikes) and 8 in the other station (54% of 15 is 8, after rounding to a whole number of bikes). Now, for each station, we rent and/or return enough bikes to reach the target.

As with the other data modification functions, you should remove a bike from a station only if the station is renting and there is a bike available to rent, and you should return a bike only if the station is allowing returns and there is a dock available. Keep track of the overall number of bikes rented and returned. This function is to return the difference (a positive number if more bikes were rented than returned, 0 if the same number were rented as returned, and a negative number if fewer bikes were rented than returned). For the example above, 3 bikes were rented from one station and 3 were returned to the other, so the function returns 0.

This function must be able to redistribute the bikes no matter how many stations are in the cleaned stations data.
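
The arithmetic in the example above can be traced with a short sketch. It is written under several assumptions and is not the required implementation: the index constants mirror the column order of the example list, the name balance_sketch is hypothetical, the total capacity is assumed to be greater than zero, and rounding is done with Python's built-in round (which rounds exact halves to the nearest even number; the handout does not say how ties should be handled).

```python
from typing import List

# Assumed column positions, taken from the example list in this handout.
CAPACITY = 4
BIKES_AVAILABLE = 5
DOCKS_AVAILABLE = 6
IS_RENTING = 7
IS_RETURNING = 8


def balance_sketch(stations: List[list]) -> int:
    """Move each station toward the overall percentage of bikes
    available and return bikes rented minus bikes returned.
    """
    total_bikes = 0
    total_capacity = 0
    for station in stations:
        total_bikes = total_bikes + station[BIKES_AVAILABLE]
        total_capacity = total_capacity + station[CAPACITY]
    goal = total_bikes / total_capacity  # 25 / 46 in the example

    difference = 0
    for station in stations:
        target = round(goal * station[CAPACITY])
        # Rent one bike at a time while the station is above its target.
        while (station[BIKES_AVAILABLE] > target and station[IS_RENTING]
               and station[BIKES_AVAILABLE] > 0):
            station[BIKES_AVAILABLE] = station[BIKES_AVAILABLE] - 1
            station[DOCKS_AVAILABLE] = station[DOCKS_AVAILABLE] + 1
            difference = difference + 1
        # Return one bike at a time while the station is below its target.
        while (station[BIKES_AVAILABLE] < target and station[IS_RETURNING]
               and station[DOCKS_AVAILABLE] > 0):
            station[BIKES_AVAILABLE] = station[BIKES_AVAILABLE] + 1
            station[DOCKS_AVAILABLE] = station[DOCKS_AVAILABLE] - 1
            difference = difference - 1
    return difference


stations = [
    [7000, 'Ft. York / Capreol Crt.', 43.639832, -79.395954,
     31, 20, 11, True, True],
    [7001, 'Lower Jarvis St / The Esplanade', 43.647992, -79.370907,
     15, 5, 10, True, True]]
print(balance_sketch(stations))
```

On the example list, the first station ends with 17 bikes and 14 docks available and the second with 8 bikes and 7 docks available. A station that is not renting, not allowing returns, or out of bikes or docks may stop short of its target, which is why the returned difference is not always 0.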

Testing your Code

It is strongly recommended that you test each function as you write it. As usual, follow the Function Design Recipe (we've done the first couple of steps for you). Once you've implemented a function, run it on the examples in the docstring.

Here are a few tips:

Remember to run the checker!

Additional requirements

Marking

These are the aspects of your work that will be marked for Assignment 2:

What to Hand In

The very last thing you do before submitting should be to run the checker program one last time. Otherwise, you could make a small error in your final changes before submitting that causes your code to receive zero for correctness.

Submit bikes.py on MarkUs by following the instructions on the course website. Remember that spelling of filenames, including case, counts: your file must be named exactly as above.

Getting data from the web (Optional)

In this assignment, you worked with real-world data. In this section, we'll tell you a bit more about this dataset. Note that this part is optional and not for any marks, so read it only to satisfy your curiosity!

The City of Toronto Bike Share website provides data about stations in a file format called JSON. We used two of the City's bike share JSON files, one with station information and another with station status.

Although we could have manually downloaded the files from the bike share website, we wrote a program to do that for us. We are computer scientists after all! Our program, download_data.py, is used to read the data from the bike share website and save it to two JSON files (current_info.json and current_status.json) on our local computer. Once we have those two files, we then run a second program, prepare_data.py, to convert the data from the JSON files to CSV format, remove some fields, and merge data about the same station into a single row.

Although the download_data.py and prepare_data.py files may be a bit challenging to read and understand at the moment, you should be able to write this type of code yourself by the end of the course. In the meantime, you can use our programs to get the most current bike share data.