# Chapter 3 – Working with Batches of Data

Solutions to exercises in Chapter 3 of Accelerated C++, “Working with batches of data.”

## Exercise 3-1

Suppose we wish to find the median of a collection of values. Assume that we have read some of the values so far, and that we have no idea how many values remain to be read. Prove that we cannot afford to discard any of the values that we have read. *Hint:* One proof strategy is to assume that we can discard a value, and then find values for the unread – and therefore unknown – part of our collection that would cause the median to be the value that we discarded.

### Hint

There is more than one way to prove this. Consider the definition of *median* and the steps described in the chapter to calculate it.

### Solution

Given the following:

- The median is the numeric value separating the higher half of a set of values from the lower half.
- The set of values must be sorted (e.g., arranged from lowest value to highest value) in order to determine the median.
- The midpoint is determined by examining the complete set of sorted values.

It follows that if any previously read items in the set are discarded, and a value is read that changes the midpoint of the set to a number that was previously discarded, the midpoint cannot be accurately calculated.

For example, we will assume a strategy of retaining the middle three numbers of an odd number of inputs and the middle two numbers of an even number of inputs. New inputs will be sorted into the retained set.

The numbers 1, 3, and 5 are read, and the midpoint of this set is determined to be 3. The next number read is 15, so the median may be calculated as the average of 3 and 5 (4). The values 3 and 5 are retained, and the first and last values (1 and 15) are discarded. The next number read is 1, so the new midpoint is assumed to be 3. The numbers 1, 3, and 5 are retained. If the final numbers read are 29, 40, and 33, then the median of the complete set should be 15. This number, however, has been discarded, and cannot be calculated from the numbers that have been retained.

## Exercise 3-2

Write a program to compute and print the quartiles (that is, the quarter of the numbers with the largest values, the next highest quarter, and so on) of a set of integers.

### Hint

Calculate the median on the entire set of integers, then calculate the median of the lower half and the median of the upper half to obtain the quartiles. Also, consider that even though the input values are integers, the median calculations may contain fractional values.

### Solution

This solution reads integers from the user, but the integers are placed into a `vector`

of `double`

values. The quartile calculations are `doubles`

as well, since a quartile may be the average of two integers such as 2 and 3, yielding a floating-point result of 2.5.

The calculation of the lower and upper quartiles is a little more tricky than the calculation of the median because there are four situations that need to be handled:

- An odd number of total values with an odd number of values above and below the midpoint.
- An odd number of total values with an even number of values above and below the midpoint.
- An even number of total values with an odd number of values above and below the midpoint.
- An even number of total values with an even number of values above and below the midpoint.

#include <algorithm> #include <iostream> #include <vector> using std::cin; using std::cout; using std::endl; using std::sort; using std::vector; int main() { // ask for and read the integers cout << "Please enter the integers that are to be " "arranged into quartiles, followed by end-of-file: "; vector<double> values; int x; // invariant: values contains all of the integers read so far while (cin >> x) values.push_back(x); typedef vector<double>::size_type vec_sz; vec_sz size = values.size(); // check that the user entered more than one value, since a // single value would cause an error if (size < 2) { cout << endl << "You must enter at least two values. " "Please try again." << endl; return 1; } // sort the values sort(values.begin(), values.end()); // find the median of the entire set of values vec_sz mid = size/2; double median; median = size % 2 == 0 ? (values[mid] + values[mid-1]) / 2 : values[mid]; // find the size of the lower and upper sets vec_sz half_size = size % 2 == 0 ? mid : (mid % 2 == 0 ? mid - 1 : mid); // find the midpoint of the lower and upper sets vec_sz half_mid = half_size / 2; // find the median of the lower set, which is the lower quartile double lower_quartile; lower_quartile = half_size % 2 == 0 ? (values[half_mid] + values[half_mid-1]) / 2 : values[half_mid]; // find the median of the upper set, which is the upper quartile vec_sz upper_mid = size % 2 == 0 ? half_size + half_mid : half_size + half_mid + 1; double upper_quartile; upper_quartile = half_size % 2 == 0 ? (values[upper_mid] + values[upper_mid-1]) / 2 : values[upper_mid]; cout << "The quartiles are " << lower_quartile << ", " << median << ", and " << upper_quartile << endl; return 0; }

When executed, the program gives the following output:

Please enter the integers that are to be arranged into quartiles, followed by end-of-file: 11 43 7 24 89 10 ^Z The quartiles are 10, 17.5, and 43

## Exercise 3-3

Write a program to count how many times each distinct word appears in its input.

### Hint

Use a `vector`

to store the words, then sort the `vector`

. That way you can count the words, in order, as you iterate over the `vector`

.

### Solution

This solution stores the words in a `vector`

, sorts the `vector`

, then retrieves each word from the `vector`

and increments a count each time the same word is retrieved from the `vector`

. When a word is retrieved from the `vector`

that does not match the previous word, the count is reported for the last word, then reset for the new word.

Note that this program does not account for differences in case.

#include <algorithm> #include <iostream> #include <string> #include <vector> using std::cin; using std::cout; using std::endl; using std::sort; using std::string; using std::vector; int main() { // Ask for and read the words cout << "Please enter a few words, followed by end-of-file: "; vector<string> words; string word; // Invariant: words contains all of the words read so far while (cin >> word) words.push_back(word); typedef vector<string>::size_type vec_sz; vec_sz size = words.size(); // Check that the user entered some words if (size == 0) { cout << endl << "You didn't enter any words. " "Please try again." << endl; return 1; } // sort the words sort(words.begin(), words.end()); string current_word; int count; // Set the initial word to the first word in the vector current_word = words[0]; // Set the initial count for the first word count = 1; // Invariant: we have counted current_index of the total words // in the vector for (vec_sz current_index = 1; current_index < size; ++current_index) { // Report the count for the current word if it does not match // the word at the current index in the vector, and reset the // count to zero so that it will one when the variable is // incremented outside the if statement. if (current_word != words[current_index]) { cout << "The word \"" << current_word << "\" appears " << count << " times." << endl; current_word = words[current_index]; count = 0; } ++count; } // Report the count for the final word cout << "The word \"" << current_word << "\" appears " << count << " times." << endl; // We have reported the count of all the words in the vector, so exit. return 0; }

When executed, the program gives the following output:

Please enter a few words, followed by end-of-file: Na na na na hey hey goodbye ^Z The word "Na" appears 1 times. The word "goodbye" appears 1 times. The word "hey" appears 2 times. The word "na" appears 3 times.

## Exercise 3-4

Write a program to report the length of the longest and shortest `string`

in its input.

### Hint

For each word that is input, keep track of the shortest and longest words you’ve encountered.

### Solution

Create two variables of type `string::size_type`

: one to hold the shortest size, and one to hold the longest size. As the program proceeds through the input loop it first checks to see if `shortest_size`

has been assigned a value other than zero. If not, it assigns it the first string’s size; otherwise it checks to see if the current string is shortest than `shortest_size`

. It then checks to see if the current string is longer than `longest_size`

.

#include <iostream> #include <string> using std::cin; using std::cout; using std::endl; using std::string; int main() { // Ask for and read the words cout << "Please enter a few words, followed by end-of-file: "; string::size_type shortest_size = 0; string::size_type longest_size = 0; string word; while (cin >> word) { // If shortest_size is zero, it hasn't been assigned yet. // Otherwise, see if the current word is shorter. If so, assign // the current word's size to shortest_size. if (shortest_size == 0 || word.size() < shortest_size) shortest_size = word.size(); // See if the current word is longer than longest_size. If so, // assign the current word's size to longest_size. if (word.size() > longest_size) longest_size = word.size(); } // Report the lengths of the shortest and longest words cout << "The length of the shortest word is " << shortest_size << " and the length of the longest word is " << longest_size << endl; return 0; }

When executed the output looks like the following:

Please enter a few words, followed by end-of-file: this was their finest hour ^Z The length of the shortest word is 3 and the length of the longest word is 6

## Exercise 3-5

Write a program that will keep track of grades for several students at once. The program could keep two `vector`

s in sync. The first should hold the student’s names, and the second the final grades that can be computed as input is read. For now, you should assume a fixed number of homework grades. We’ll see in §4.1.3/56 how to handle a variable number of grades intermixed with student names.

### Hint

As suggested, use two vectors. Define a maximum number of grades to request for each student. The grades vector will contain that number of grades multiplied by the number of students entered.

### Solution

This solution uses two vectors as suggested in the exercise description. The first vector contains the names of each student, and the second vector contains five grades per student. The total number of grades requested for each student is controlled by a constant, `max_grades`

.

This program expects the user to press the enter key after each grade to prevent the program from gathering too many grades. If the user enters more than `max_grades`

numbers on one line, the program will function incorrectly. We will see in later chapters how to avoid this problem.

After the user presses the end-of-file key at the prompt for the next student name, the program will loop through the `student_names`

and `student_grades`

vectors to output each student’s grades and the average of those grades.

#include <iostream> #include <string> #include <vector> using std::cin; using std::cout; using std::endl; using std::string; using std::vector; int main() { /* The number of grades accepted may be modified by changing this constant. */ const int max_grades = 5; vector<string> student_names; vector<double> student_grades; cout << "Please enter the first student's name: "; string name; int grade; int grade_count; while (cin >> name) { cout << "Enter " << max_grades << " grades for " << name << ": "; student_names.push_back(name); grade_count = 0; while (grade_count < max_grades && cin >> grade) { student_grades.push_back(grade); ++grade_count; } cout << "Please enter the next student's name " "or end-of-file to exit: "; } cout << endl; cout << "** Grades and averages for each student **" << endl; typedef vector<string>::size_type name_vec_sz; name_vec_sz student_count = student_names.size(); typedef vector<double>::size_type grade_vec_sz; grade_vec_sz grade_index = 0; double grade_total; /* For each student in student_names, retrieve max_grades values from student_grades, output each grade, then calculate and output average. */ for (name_vec_sz student_name_index = 0; student_name_index < student_count; ++student_name_index) { /* Reset the accumulator for each student */ grade_total = 0; cout << "Student: " << student_names[student_name_index] << endl; cout << "Grades: "; /* Retrieve only max_grades grades from student_grades vector */ for (grade_count = 0; grade_count < max_grades; ++grade_count) { cout << student_grades[grade_index] << " "; grade_total += student_grades[grade_index]; ++grade_index; } cout << endl; cout << "Average: " << grade_total / max_grades << endl; cout << endl; } return 0; }

When executed the output is as follows:

Please enter the first student's name: Larry Enter 5 grades for Larry: 89 88 78 98 85 Please enter the next student's name or end-of-file to exit: Moe Enter 5 grades for Moe: 99 88 79 78 96 Please enter the next student's name or end-of-file to exit: Curly Enter 5 grades for Curly: 99 98 99 97 100 Please enter the next student's name or end-of-file to exit: ^Z ** Grades and averages for each student ** Student: Larry Grades: 89 88 78 98 85 Average: 87.6 Student: Moe Grades: 99 88 79 78 96 Average: 88 Student: Curly Grades: 99 98 99 97 100 Average: 98.6

## Exercise 3-6

The average-grade computation in §3.1/36 might divide by zero if the student didn’t enter any grades. Division by zero is undefined in C++, which means that the implementation is permitted to do anything it likes. What does your C++ implementation do in this case? Rewrite the program so that its behavior does not depend on how the implementation treats division by zero.

### Hint

Look in the code and see which variable could be used as the denominator in division. Test that value against zero prior to the calculation and take appropriate action.

### Solution

On my Windows 7 64-bit system, running the exercise as compiled by Visual Studio 2010 Professional, pressing end-of-file when the first grades are requested generates the following output:

Please enter your first name: Paul Hello, Paul! Please enter your midterm and final exam grades: ^Z Enter all your homework grades, followed by end-of-file: Your final grade is -1.#J

The odd-looking output “-1.#J” is the result of the division of `sum`

by zero. Internally, this number is “-1.#IND” which means “indeterminate.” Other systems might output “nan” or “NaN,” meaning “not a number.”

To prevent division by zero, test the `count`

variable before using it as the denominator in the division calculation and report the error to the user.

#include <iomanip> #include <ios> #include <iostream> #include <string> using std::cin; using std::cout; using std::endl; using std::setprecision; using std::string; using std::streamsize; int main() { // ask for and read the student's name cout << "Please enter your first name: "; string name; cin >> name; cout << "Hello, " << name << "!" << endl; // ask for and read the midterm and final grades cout << "Please enter your midterm and final exam grades: "; double midterm, final; cin >> midterm >> final; // ask for the homework grades cout << "Enter all your homework grades, " "followed by end-of-file: "; // the number and sum of grades read so far int count = 0; double sum = 0; // a variable into which to read double x; // invariant: // we have read count grades so far, and // sum is the sum of the first count grades while (cin >> x) { ++count; sum += x; } if (count == 0) { cout << endl << "No grades were entered. Your final grade cannot be calculated." << endl; return 1; } // write the result streamsize prec = cout.precision(); cout << "Your final grade is " << setprecision(3) << 0.2 * midterm + 0.4 * final + 0.4 * sum / count << setprecision(prec) << endl; return 0; }

When executed and no grades are entered the output is as follows:

Please enter your first name: Paul Hello, Paul! Please enter your midterm and final exam grades: ^Z Enter all your homework grades, followed by end-of-file: No grades were entered. Your final grade cannot be calculated.