Merge Sort: Create a sorted file by merging two sorted files

The Problem

Suppose that we have two data files with numbers. The numbers are stored in ascending order. Create a program that merges the two files, i.e. generates a third file where all the numbers from both of the files are stored in ascending order.

Problem Analysis

The algorithm is relatively simple:

  1. Start from the beginning of the two files
  2. While there are numbers in both files:
    1. Select the bigger number and move the pointer of that file
    2. Write the number to the output

The only complication to above algorithm comes when the one of the files is exhausted before the other. In this case we should take the next number from that file.

Further analysis of the algorithm identifies three major responsibilities:

  • sequential number input from a file
  • compare numbers and build output
  • store the output

If we encapsulate the number input into a NumberReader class, the class must provide the following services to the client:

  • hasValue() - tells the client that the file hasn't been exhausted and there is a valid value ready to be used.
  • getValue() - gets the current value.
  • fetchNext() - retrieve the next value from the underlying file.

As one might notice the NumberReader class is similar to the Iterator pattern, but because we might need to query one and the same value many times until it is actually used, the class doesn't fetch automatically the next value.

The constructor of the NumberReader class will take one argument - the file name. It will open the file and intialize the instance by fetching the first number.

The responsibility to compare numbers and build output might be encapsulated in a separate class MergeSort. It might be a little overhead for the problem space, but will add possibility for reuse. During the intialization, the MergeSort should receive number input and result output services.

The third service can be implemented directly by using one of the Java's IO classes. We will use Adaptor pattern and will encapsulate this functionality into a NumberWriter class. This class should provide only one service: writeValue().

The Solution

Here is the code for the NumberReader class. We added a requirement the class to be able to read numbers separated by whitespaces, i.e. new lines, tabs and spaces. We also want that the reader automatically closes the file when end of file is reached.

/**
 * Class that implements a number queue by reading
 * a file in a sequential manner.
 * @author Ivan Georgiev
 */
import java.io.*;
import java.util.StringTokenizer;
class NumberReader {
	private BufferedReader reader;
	private StringTokenizer tokens = null;
	private double value;
	private boolean isValid;
 
	public NumberReader(String fileName) 
		throws FileNotFoundException, IOException
	{
		reader = new BufferedReader(new FileReader(fileName));
		isValid = true;
		fetchNext();
	}
 
	/**
	 * Checks the queue status.
	 * @return boolean Returns true if the current value is valid or false 
	 * if the queue is empty and current value is not valid.
	 */
	public boolean hasValue() {
		return isValid;
	}
 
	/**
	 * Get the current value, retrieved from the queue.
	 * @return double Returns the current value.
	 */
	public double getValue() {
		return value;
	}
 
	/**
	 * Fetch the next value from the queue and make it 
	 * available through the getValue() method.
	 * @return boolean Returns true if next value is available 
	 * (see hasValue() method.)
	 */
	public boolean fetchNext() 
		throws IOException, NumberFormatException
	{
		if (isValid) {
			isValid = false;
			if (tokens == null || ! tokens.hasMoreTokens()) {
				String sLine = reader.readLine();
				if (sLine == null) {
					tokens = null;
				} else {
					tokens = new StringTokenizer(sLine);
				}
			}
			if (tokens != null) {
				value = Double.parseDouble(tokens.nextToken());
				isValid = true;
			}
			if (!isValid) {
				reader.close();
			}
		}
		return isValid;
	}
}

The code for the NumberWriter class is relatively simple.

/**
 * Class to encapsulate number output.
 * @author Ivan Georgiev
 */
import java.io.*;
class NumberWriter {
	private PrintWriter printer;
 
	public NumberWriter(String fileName) 
		throws IOException
	{
		printer = new PrintWriter(new BufferedWriter(new FileWriter(fileName)));
	}
 
	public void writeValue(double value) {
		printer.println(value);
	}
 
	public void close() {
		printer.close();
	}
}

Having implemented the NumberReader and NumberWriter and delegated the IO tasks to them, we can focus on the core of the MergeSort class.

/**
 * Class to implement merge sort.
 * @author Ivan Georgiev
 */
class MergeSort {
	private NumberReader source1;
	private NumberReader source2;
	private NumberWriter writer;
 
	public MergeSort(NumberReader source1, NumberReader source2, NumberWriter writer) {
		this.source1 = source1;
		this.source2 = source2;
		this.writer = writer;
	}
 
	public void run() 
		throws IOException, NumberFormatException
	{
    	NumberReader usedReader;
    	while (source1.hasValue() || source2.hasValue()) {
    		if (!source1.hasValue()) {
    			usedReader = source2;
    		} else if (!source2.hasValue()) {
    			usedReader = source1;
    		} else if (source1.getValue() < source2.getValue()) {
    			usedReader = source1;
    		} else {
    			usedReader = source2;
    		}
    		writer.writeValue(usedReader.getValue());
    		usedReader.fetchNext();
    	}
    	writer.close();
	}
}

Testing the Solution

Here is a simple program to test the application.

/**
 * The problem:
 * Suppose that we have two data files with numbers.
 * The numbers are stored in ascending order. Create
 * a program that merges the two files, i.e. generates 
 * a third file where all the numbers from both of 
 * the files are stored in ascending order.
 *
 * @author Ivan Georgiev
 * @version 1.00 2008/9/8
 */
 
import java.io.*; 
import java.util.StringTokenizer;
 
public class FileMergeApplication {
    private static final String sourceName1 = "../data/source1.txt";
    private static final String sourceName2 = "../data/source2.txt";
    private static final String targetName = "../data/output.txt";
 
    public static void main(String[] args) {
    	try {
	    	NumberReader reader1 = new NumberReader(sourceName1);
	    	NumberReader reader2 = new NumberReader(sourceName2);
	    	NumberWriter writer = new NumberWriter(targetName);
 
    		MergeSort sort = new MergeSort(reader1, reader2, writer);
    		sort.run();	    	
    	} catch (Exception e) {
    		e.printStackTrace();
    	}
    }
}

Test Run

source1.txt

5
15
35
80

source2.txt

10 20 30
40
50
60

output.txt

5.0
10.0
15.0
20.0
30.0
35.0
40.0
50.0
60.0
80.0

The source2.txt has some numbers separated a space and a tab character to test that NumberReader supports the additional requirement for any whitespace separator.

 
java/excercises/filesortmerge.txt · Last modified: 2009/10/31 23:39 (external edit)
 
Recent changes RSS feed Creative Commons License Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki