Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

Comparator - Data Sources Comparer

0.00/5 (No votes)
20 Mar 2018 3  
A simple program for comparing table data from two sources - SQL databases, Excel, CSV or XML-files
In this article, you will find a simple program that can be used to compare table data from two sources.

Introduction

This application may be useful whenever you want to compare table data from two sources, that is:

  • Identify whether the rows from the determined sources match using key fields (search for pairs through keys)
  • Compare contents of specific fields in the found pairs of rows
  • Identify rows which don’t have a match in the other source
  • Identify rows which have more than one pair in the other source (non-unique key)
  • Analyze the results
  • Conduct regular compare tasks based on preconfigured settings (profiles) and send out the results via email or save them in common formats

No installation required - just run Comparator.exe.

Operating Procedure

First…

Create compare settings from scratch or based on loaded profile:

Or load an existing profile from file or recently opened list:

Then…

  1. Define the two sources
  2. Configure settings for data and results processing
  3. Set source fields for data compare and display, and define conditions for pair selection
  4. Define options of compare startup from command line when required

And Finally…

Run the compare process by pressing "Compare" button and check the results in a new window:

Besides

Data Source Setup

Source name will be displayed in compare results, profile name and mailing subject, suggested by default.

Depending on source type: database, excel, text (CSV) or XML file, you will have to configure various settings.

Database

Use any source that can be accessed via OleDB or ODBC to request data using SQL code.

  1. Open the ‘connect and run query’ configuration form
  2. Setup connection settings. To make it easier, the main connection settings via OleDB for MSSQL, Sybase and Oracle are shown in separate menu. For common cases (OleDB or ODBC) connection string is required.
  3. Input SQL code script
  4. Run the script. In order to have the up-to-date fields list from the source, SQL script must be run before you setup the fields to compare.

Note: If the script returns several datasets, only the first one will be considered. Selected parts of the script can be run for debugging purposes.

Excel

  1. Open Excel file or select one of the currently open files. You can indicate file path based on the application path.
  2. Select sheet.
  3. Select field names to be filled from first row values of the selected range (or the entire sheet if the range is not set). Otherwise, field name will be displayed as the column letter.
  4. Open/bring on top selected excel book.
  5. Set range either manually or select it directly on the sheet. If any of the range boundaries are not defined, the boundaries of the “used range” will be considered.
  6. Review data that will be received in accordance with the settings configured.

XML

To compare, XML data must be first converted into table. To configure conversion process:

  1. Open XML configuration settings form
  2. Select XML file (you can indicate file path based on the application path)
  3. Set code page number (e.g. 1251, 866) when required or common page name (e.g., (UTF8, cp866). When not set, encoding will be defined by default by ‘encoding’ attribute in XML file header.
  4. Select an option of XSLT pre-use before conversion:
    • Not use – not used
    • From Script – XSLT Script button enables an editor where you need to input XSL script for prior conversion of the XML script into an XML which will be then converted into table view
    • From File – select XSL file for the same purposes (you can indicate file path based on the application path)
  5. Set path (in XPath format) for tag which will be used as basis for the table data row, that will be created as a result of the conversion.

  6. Select defining method of table fields that will be created as a result of the conversion:
    • Data from tags – fields will be defined by child tags of the tags selected according to XPath, set in (5).
    • Data from attributes – fields will be defined by attributes of the tags selected according to XPath, set in (5).
    • Use fields map – this option will disable all previous fields defining options, and enable path setup (in XPath format) for tags or attributes to define fields. These paths must be set based on the tag defined in (5). In this case, pressing «Fields Map» button will enable a list in which name and path must be indicated, and also, when required, default value for occurrences when there’s no data found in the indicated path. If conversion result has already been received, you can add fields from it into the list by selecting respective menu item.

  7. To verify XML conversion process, press «Convert». Then selected XML file will be read and pre-converted via XSLT when needed. To view the results, press «Prepared XML».

    Based on settings configured in (5) and (6), data table will be created from the prepared XML file. To view this table, press «Result».

Delimited Text

Define fields delimiter for selected file (‘tab’ will be used by default) and encoding: code page number (e.g., 1251, 866) or common page name (e.g., (UTF8, cp866). When not set, encoding will be defined by default as current ANSI code page of the Operating System. Field names will be defined by first row names, similarly to Excel file source. To view delimited data in a separate window, press "View data". You can indicate file path based on the application path.

Fields Configuration Settings

After source setup is complete, define data fields to compare:

  1. Open field list (source A will be shown on the left, source B on the right).
  2. Check fields to select them for compare. Correspondence (pairs) will be set in order downwards. To define a specific pair change fields order using blue arrows. The quantity of selected fields in the two sources can be unequal, in this case, unpaired fields won’t be compared but will be displayed in the results.
  3. Define type of selected fields – keys, compare or display only (None).
  4. Clear previous selection when needed.
  5. Add selected fields to the list.

    Fields list can be edited when needed:

  6. Delete selected rows.
  7. Define pairs by row order, ignoring keys.
  8. Match all the fields.
  9. Change fields order in the results.
  10. Define fields processing type:
  • Key – key fields, when they match, it will define row pairs from the sources
  • Match – this data will be compared in the found row pairs

Processing Settings

  • Show differences only – result does not include rows that match
  • Show records only in Source A – result includes unpaired fields from source A
  • Show records only in Source В – result includes unpaired fields from source B
  • Check repeating rows – search for all pairs for each key fields set; if more than one pair found, it will be marked as repeating. This setting may slow down the process. When not checked, the search will cease after finding first pair.
  • Case sensitive – case sensitive in compare process
  • Null as empty stringnull value will be processed as empty char
  • Try to convert text to date or number – will try to convert text fields into date or number format, e.g., ‘20151231’ and ’31.12.2015’ will be matched as equal values.

View Results

Key points of comparison results view (see on picture below):

  1. Results are shown in a separate window where various results set will be available, depending on compare settings. You can switch between the sets:
    • Differences – row pairs with differences
    • Identicals – row pairs with no differences
    • Only in Source A – rows in Source A for which a pair couldn’t be found in Source B
    • Only in Source В – rows in Source B for which a pair couldn’t be found in Source A
  2. Differences are marked red; key fields are marked blue; matched fields are marked green.
  3. Headers and rows in the result sets "Differences" and "Identicals" are shown in pairs: first row from source A, second row from source B (the latter is marked with colour).
  4. Use ‘Search’ field to search the entire result table or a specific column, or use it to filter data in the current column.
  5. Use ‘Difference’ button to jump to the next difference, ‘Key’ button – to the next key field, ‘Duplicate’ button – to the next ‘repeating’ row. There are hints on all buttons for the hot keys definition.
  6. If one row was included in more than one pair, it will be marked as ‘repeating’ with letter ‘R’, and have a hint showing its number in the source (starting from 0).
  7. The results can be opened in Excel or HTML, a file ‘results.xls’ or ‘results.htm’ will be created in folder defined in application settings. The results can also be saved as HTML.

For example - we have two sources and their comparison result in window and then - in Excel or HTML file:

Launch Modes

Windows Mode (Using Common Application Settings)

> comparator [-profile:"[path]filename"]

  • [-profile:"[path]filename"] – profile will be loaded.

When profile not defined, application will be launched with new empty profile or template profile (as configured in application settings).

Console Mode (Using Console Mode Application Settings)

> comparator -batch -profile:"[path]filename" [-path:"path"] [-log:"[path]filename"] [-type:excel|html] [-sendto:"address[;address...]"] [-file:"filename"] [-open]

Required settings:

  • -batch – defines console mode – console mode of the specified profile will be used (see further)
  • -profile:"pathfilename" – file containing the profile

Optional settings (ignored if -batch is not defined), have higher priority than respective settings from the application profile and settings

  • -path:"path" – path where the results will be saved (do not use ‘/’ in the end)
  • -log:"[path]filename" – log file
  • -type:excel|html – result file type (Excel or HTML)
  • -sendto:"address[;address...]" – mailing list; when specified but the list is empty, the results will not be sent
  • -file:"filename" – result file name
  • -open – when specified, the result file will be opened

After launching in console mode, the following will happen: profile data check, search and connect to the sources, compare and create result file, mail and/or open the result file.

The process will be written in the log file:

  • [08.02.2016 20:37:07.776] >>> Start process for "testprofile.xml"

  • [08.02.2016 20:37:08.192] Profile loaded

  • [08.02.2016 20:37:09.977] Compared

  • [08.02.2016 20:37:10.002] Save result in "C:\Comparator\results\testprofile__20160208_203709.xls"

Application Settings

Console Mode Settings

  • Result type – result file type (Excel or HTML)
  • Result path – result file path (application settings will be used by default)
  • Result file – result file name (profile name plus file extension depending on the result type will be used by default)
  • Timestamp in file name – when checked, date and time of the compare launch will be added to the file name
  • Send result to – when checked - specify mailing list (separated by semicolon) and choose sending option of the result file in Send result as:
    • Attachment
    • ZIP – file attached as Gzip archive (filename.gz) – optimal for large files
    • Text – sent as HTML text in message body
    • Link – link to the result file, no attachment

    Optional: You can define message subject in Mail Subject (“Compare ‘Source A Name’ and ‘Source B Name’” by default)

    When Send result to not checked or mailing list is empty, the results will not be sent.

For other mailing options, such as connect SMTP server, see common settings.

Common Settings

For global settings, use Comparator.xml file which should be stored in application folder. If it doesn’t exist, the file will be created at first launch.

Example:

<?xml version="1.0" encoding="windows-1251"?>
<Config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <optionsList>
    <Options Host="BATCH">
      <ProfileFolder>C:\Comparator\profiles</ProfileFolder>
      <ResultFolder>C:\Comparator\results</ResultFolder>
      <LogFile>C:\Comparator\Comparator.log</LogFile>
      <HtmlStylesFile>C:\Comparator\styles.css</HtmlStylesFile>
      <SendOptions>
        <Server>smtp.server.ru</Server>
        <Port />
        <Ssl>false</Ssl>
        <User />
        <Pwd />
        <PwdEncr />
        <From>Comparator@server.ru</From>
        <FromAlias>Comparator</FromAlias>
      </SendOptions>
    </Options>
    <Options Host="HOST1">
      <RecentFiles>
        <file>C:\Comparator\profiles\SomeProfile.xml</file>
        <file>C:\Comparator\profiles\test.xml</file>
      </RecentFiles>
      <ProfileFolder>C:\Comparator\profiles</ProfileFolder>
      <ResultFolder>C:\Comparator\results</ResultFolder>
      <HtmlStylesFile>C:\Comparator\styles.css</HtmlStylesFile>
      <PatternFile>C:\Comparator\pattern.xml</PatternFile>
    </Options>
    <Options Host="HOST2">
      ..............
    </Options>
  </optionsList>
</Config>

Console mode settings are in the tag <Options> with attribute Host="BATCH".

Settings for normal (windows) mode are defined for each host in the tag <Options> with attribute Host="Host_name".

On first launch, the host will set default settings, which can be changed by editing the file manually.

The following tags can be used to store settings:

  • <ProfileFolder> - folder where the profiles are stored (‘profile’ in the application folder by default)
  • <ResultFolder> - folder where the result files will be stored (‘result’ in the application folder by default)
  • <LogFile> - log file name (comparator.log in the application folder by default)
  • <HtmlStylesFile> - CSS file for results style design (style.css in the application folder by default)
  • <PatternFile> - default pattern profile
  • <SendOptions> - mailing options
  • <Server> - SMTP server
  • <Port> - Port (25 by default)
  • <Ssl> - use SSL (true/false)
  • <User> - login for SMTP server
    • <Pwd> - password for SMTP server, specified here and then encrypted on first launch, stored in <PwdEncr> and erased from <Pwd>;
    • <PwdEncr> - encrypted password for SMTP server
    • <From> - sender for mailing list
    • <FromAlias> - alias of the sender
    • </SendOptions>
    • <RecentFiles> - recent profiles:
    • <file> path/name of the latest opened profile
    • <file> path/name of the previous opened profile and so on
    • </RecentFiles>

Using the Code

The solution consists of five projects and is located here. Let's briefly discuss each of the projects.

Project “Comparator“

Class Master (Master.cs)

Оbject of class Master is created at application startup and controls the operation of the application in one of the modes - window or batch.

This object:

  • manages application settings - creates / reads / saves (via serialization) an object of Options class (see Options.cs in project “Common”);

  • manages profile settings - creates / reads / saves (via serialization to XML-file) an object of Profile class and check it;

  • in window mode:

    • subscribes to events of objects that implements IView and IViewSource interfaces (see Views.cs in project “Common”). In this application winform-class, FormView implements IView (see FormView.cs) and user-control SourcePanel implements IViewSource (see project “Sources”).

    • defines objects for transferring Profile data to IView and IviewSource

    • controls the process of select fields pairs

  • starts processes of preparing data from sources, invoking methods for receiving content data (see descendants of abstract SourceContent class in project “Sources”);

  • starts comparison of the received data by calling method of DSComparer static class (project “DSComparer”) and receiving comparison result as object of CompareResult class

  • controls the progress of comparison process and its interruption

  • controls the output of comparison result - to a form (window mode) or to a file (batch mode)

  • controls the sending of comparison result (in batch mode) via static Mailer class (see SendMail.cs in project “Common”)

  • supplies the necessary messages using an object that implements ILoger interface (see Common.cs in project “Common”). In batch mode, this is an object of Loger class that uses methods of static class Log to write to the log file. In window mode, the output of messages is made through FormView class that implements ILoger interface.

Class FormView (FormView.cs)

Represents the application's user interface.
It implements IView interface for creating, modifying, verifying and saving Profile data, for starting comparison process and for viewing the latest comparison results.
Contains two UserControls that implement the IViewSource interface.
To select pairs of fields for comparison, opens FormSelectPair form.
When the comparison starts, it opens FormCompare form to display the progress of data receiving and comparison with the possibility of process interruption.

Project “Sources“

Class Source (Source.cs)

Describes the data source for comparison. Objects of this class are created with object of Profile class (in project Comparator) for each of the two sources.
Object of Source class contains an property for object of descendant of abstract SourceContent class. It may be DbContent, ExcelContent, CsvContent or XmlContent classes that provide an implementation of methods for receiving data from a source for a particular type.
Some features:

  • class ExcelContent - uses methods of static class ExcelProc (see ExcelProc.cs in project “Common”) for access to opened Excel books:
  • class DbContent - implements the interface ISqlModel, which describes the connection and query, uses SqlController class to receive data (see SqlController.cs in project “SqlSource”)
  • class XmlContent – uses XmlController static class to work with XML data - reading, xsl-transformation and conversion to a table

Class SourcePanel (SourcePanel.cs)

This is UserControl, which displays the settings of the selected source.
For DB and XML sources, it opens forms with detailed settings and the ability to work with source data - SqlView (see SqlView.cs in project “SqlSource”) and FormXml.
For Excel and CSV sources, it opens a form for quick data viewing - FormFlatData (see FormFlatData.cs in project “Common”)

Project “SqlSource”

Class SqlController (SqlController.cs)

Creates or receives at creation an object that implements ISqlModel - connection and query parameters.
Creates or receives at creation an object that implements ISqlView (see Views.cs in project “Common”) - an interface for configuring the connection and executing query. In this application winform-class SqlView implements ISqlView.

Sends data from ISqlModel to ISqlView, subscribes to ISqlView events.
Controls the progress of the data receiving process and its interruption.

Project “DSComparer”

Class DSComparer (DSComparer.cs)

Contains method "Compare", which compares two DataTables according to the specified settings and returns a CompareResult object with comparison results.

Class CompareResult (CompareResult.cs)

Contains the comparison results and methods for their output to form (FormResult) and to html / excel (using css-styles).

Project “Common”

Contains general procedures and definitions for the projects described above.

NOTE: Some classes and projects have no dependencies of the other parts of the application and can be used separately. For example – project “SqlSource”, project “DSComparer”, FormXml with XmlController, static classes from project “Common”.

Points of Interest

Special thanks to Pavel Torgashov for his excellent FastColoredTextBox component that help us to input SQL and XSLT code! :)

History

  • May 2017: Initial version

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here