Java Program – Compare Text Files to Check for Duplicate Content

Introduction

This program reads multiple text files from the file system and compares their contents to find duplicate records. It is a sample program written in Java. The algorithm is deliberately simple so that anyone can understand it and run it on a local system. The steps used to check for duplicate records are:

  • Read each text file from the file system and load its numbers into a List.
  • While reading, add each number to a shared Set; a number that is already in the Set is added to the list of duplicates.
  • Call a method that checks each file's List against the duplicates and prepares the result object to display to the user.
  • Print the result.
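
The first two steps can be tried on in-memory data before any files are involved. The following is a minimal sketch, not the program itself: the class name DuplicateSketch, the method collect and the sample numbers are assumptions made for illustration. It relies on Set.add returning false when the value is already present.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateSketch {

    /**
     * Adds every value to 'seen'. A value that was already present is
     * appended to 'duplicates', once per repeated occurrence.
     */
    public static void collect(List<Integer> values, Set<Integer> seen,
            List<Integer> duplicates) {
        for (Integer value : values) {
            // Set.add returns false when the value is already in the set
            if (!seen.add(value)) {
                duplicates.add(value);
            }
        }
    }

    public static void main(String[] args) {
        Set<Integer> seen = new HashSet<Integer>();
        List<Integer> duplicates = new ArrayList<Integer>();

        // Hypothetical contents standing in for two text files
        collect(Arrays.asList(10, 20, 30), seen, duplicates);
        collect(Arrays.asList(30, 40, 10), seen, duplicates);

        System.out.println("unique : " + seen.size());    // unique : 4
        System.out.println("duplicates : " + duplicates); // duplicates : [30, 10]
    }
}
```

Because the Set and the duplicate list are shared between calls, a number counts as a duplicate whether it repeats inside one file or across two files.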

Program Code

package com.kw.sample;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;

/**
 * This class finds the numbers that are repeated across multiple text files
 * and reports the files in which each repeated number occurs.
 * 
 * It uses 11 files (A.txt to K.txt) to check for duplicate numbers and their
 * repetitions.
 * 
 * @author dsahu1
 * 
 */
public class ReadDataFromMultipleTextFiles {

	/**
	 * Reads every input file, collects the numbers seen more than once and
	 * prints the files in which each duplicated number occurs.
	 * 
	 * @param args
	 *            not used
	 */
	public static void main(String[] args) {
		// Files path
		String path = "C:/Files/";

		// Input files A.txt to K.txt, identified by their base names
		String[] fileNames = { "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K" };

		// List to contain duplicate records
		List<Integer> dublicateData = new ArrayList<Integer>();

		// Set to contain unique records
		Set<Integer> set = new HashSet<Integer>();

		// Numbers read from each file, in the same order as fileNames
		List<List<Integer>> lists = new ArrayList<List<Integer>>();

		for (String fileName : fileNames) {
			// Actual File Name
			String fileWithPath = path + fileName + ".txt";
			// Call method to read number from text file and generate List
			List<Integer> list = readDataFromFile(set, dublicateData, fileWithPath);
			lists.add(list);
			System.out.println("list" + fileName + ".size() : " + list.size());
			System.out.println("set.size() : " + set.size());
			System.out.println("dublicateData.size() : " + dublicateData.size());
		}

		// Construct result map object to contain the result
		Map<Integer, Map<String, Integer>> result = new HashMap<Integer, Map<String, Integer>>();

		// Call method to validate the duplicate records of each file and
		// prepare the result object
		for (int i = 0; i < fileNames.length; i++) {
			validateDoublicateWithCountAndFile(result, dublicateData, lists.get(i), fileNames[i]);
		}

		System.out.println("result.size() : " + result.size());
		// Call method to print the result object map
		printResults(result);
	}

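	/**
	 * This method prints the result in a readable format: one line per
	 * duplicated number, followed by the files that contain it. The value
	 * printed after "Total Repetitions" is the number of files in which the
	 * number occurs.
	 * 
	 * @param result
	 *            duplicated number mapped to the files that contain it
	 */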
	public static void printResults(Map<Integer, Map<String, Integer>> result) {

		Set<Entry<Integer, Map<String, Integer>>> set = result.entrySet();

		Iterator<Entry<Integer, Map<String, Integer>>> itr = set.iterator();

		while (itr.hasNext()) {

			Entry<Integer, Map<String, Integer>> entry = itr.next();

			Map<String, Integer> map = entry.getValue();

			System.out.println(entry.getKey() + " ::: " + map
					+ " - Total Repetitions - " + map.size());

		}

	}

	/**
	 * This method checks every number of the given file against the duplicate
	 * list and records the file name in the result object for each match.
	 * 
	 * @param result
	 *            duplicated number mapped to the files that contain it
	 * @param dublicateData
	 *            numbers that were seen more than once while reading
	 * @param list
	 *            numbers read from the file
	 * @param fileName
	 *            label of the file, for example "A"
	 */
	public static void validateDoublicateWithCountAndFile(
			Map<Integer, Map<String, Integer>> result,
			List<Integer> dublicateData, List<Integer> list, String fileName) {

		for (Integer number : list) {

			if (dublicateData.contains(number)) {

				Map<String, Integer> map = result.get(number);
				if (map == null) {
					// First file found for this number
					map = new HashMap<String, Integer>();
					result.put(number, map);
				}
				// The value is always 1; it only marks that this file
				// contains the number
				map.put(fileName, 1);

			}
		}

	}

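	/**
	 * This method reads the records from the text file, one number per line,
	 * and constructs a List object. Numbers already present in the Set are
	 * also added to the duplicate list.
	 * 
	 * @param set
	 *            unique numbers seen so far across all files
	 * @param dublicateData
	 *            numbers seen more than once
	 * @param fileWithPath
	 *            full path of the file to read
	 * @return List of the numbers read from the file
	 */
	public static List<Integer> readDataFromFile(Set<Integer> set,
			List<Integer> dublicateData, String fileWithPath) {

		File file = new File(fileWithPath);
		String line = null;
		List<Integer> list = new ArrayList<Integer>();
		System.out.println("Batch ID are Picked From --- " + fileWithPath);
		// FileReader reads text files in the default encoding. Wrapping it
		// in a BufferedReader inside try-with-resources closes both readers
		// even when an exception is thrown.
		try (BufferedReader bufferedReader = new BufferedReader(new FileReader(file))) {

			// Read Files
			while ((line = bufferedReader.readLine()) != null) {
				line = line.trim();
				if (line.isEmpty()) {
					// Skip blank lines instead of failing on them
					continue;
				}
				Integer data = Integer.valueOf(line);
				list.add(data);
				if (set.contains(data)) {
					dublicateData.add(data);
				} else {
					set.add(data);
				}
			}
		} catch (IOException | NumberFormatException e) {
			// A missing file or a non-numeric line stops reading this file;
			// the numbers read so far are still returned
			System.err.println("**** Exception Occurred *****"
					+ e.getLocalizedMessage());
		}
		return list;
	}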

}

Output

Text File(s) Overview

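We tested this program using three text files: A.txt, B.txt and C.txt. Each file contains one number per line. The runs below show the output for these three files only.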

Run 1: Without any duplicate records in the files.

Batch ID are Picked From --- C:/Files/A.txt
listA.size() : 32
set.size() : 32
dublicateData.size() : 0
Batch ID are Picked From --- C:/Files/B.txt
listB.size() : 120
set.size() : 152
dublicateData.size() : 0
Batch ID are Picked From --- C:/Files/C.txt
listC.size() : 353
set.size() : 505
dublicateData.size() : 0
result.size() : 0

--------------------------------------------------------------------
Run 2: With some duplicate records in the files.

Batch ID are Picked From --- C:/Files/A.txt
listA.size() : 36
set.size() : 36
dublicateData.size() : 0
Batch ID are Picked From --- C:/Files/B.txt
listB.size() : 125
set.size() : 161
dublicateData.size() : 0
Batch ID are Picked From --- C:/Files/C.txt
listC.size() : 353
set.size() : 505
dublicateData.size() : 9
result.size() : 9
864 ::: {B=1, C=1} - Total Repetitions - 2
774 ::: {B=1, C=1} - Total Repetitions - 2
615 ::: {B=1, C=1} - Total Repetitions - 2
215 ::: {A=1, C=1} - Total Repetitions - 2
63 ::: {A=1, C=1} - Total Repetitions - 2
47 ::: {B=1, C=1} - Total Repetitions - 2
28 ::: {B=1, C=1} - Total Repetitions - 2
607 ::: {A=1, C=1} - Total Repetitions - 2
362 ::: {A=1, C=1} - Total Repetitions - 2
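
Each result line above maps a duplicated number to the files that contain it, and the trailing count is the number of those files. The sketch below reproduces two such lines from in-memory data. It is only an illustration under stated assumptions: the class name ResultMapSketch, the method record and the sample lists are hypothetical, and TreeMap is used instead of the program's HashMap so that the printed order is predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ResultMapSketch {

    /** Marks 'fileName' for every number of 'values' that is a known duplicate. */
    public static void record(Map<Integer, Map<String, Integer>> result,
            List<Integer> duplicates, List<Integer> values, String fileName) {
        for (Integer number : values) {
            if (duplicates.contains(number)) {
                Map<String, Integer> files = result.get(number);
                if (files == null) {
                    // First file found for this number
                    files = new TreeMap<String, Integer>();
                    result.put(number, files);
                }
                // 1 is only a marker that the file contains the number
                files.put(fileName, 1);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical duplicate list and file contents
        List<Integer> duplicates = Arrays.asList(215, 864);
        Map<Integer, Map<String, Integer>> result =
                new TreeMap<Integer, Map<String, Integer>>();

        record(result, duplicates, Arrays.asList(215, 100), "A");
        record(result, duplicates, Arrays.asList(864, 200), "B");
        record(result, duplicates, Arrays.asList(215, 864), "C");

        for (Map.Entry<Integer, Map<String, Integer>> entry : result.entrySet()) {
            System.out.println(entry.getKey() + " ::: " + entry.getValue()
                    + " - Total Repetitions - " + entry.getValue().size());
        }
        // 215 ::: {A=1, C=1} - Total Repetitions - 2
        // 864 ::: {B=1, C=1} - Total Repetitions - 2
    }
}
```

With a HashMap, as in the program, the same entries are produced but their order is not guaranteed, which is why the numbers in Run 2 are not sorted.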
