Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database

The Data Tune System

4.98/5 (39 votes)
15 Feb 2013CPOL18 min read 44.6K  
The DataTune system was one of the earliest data cleansing systems.

Introduction

In 1998 I developed the first version of the DataTune software which was part of the “TargetData” venture.

Further reading

Image 1

The purpose was different and the focus was on displaying statistic and demographic data about the state of Israel.

During 2001-2003 I have developed DataTune, a data cleansing system, under the brand name of TargetData, which is part of Target Eye Limited.

Among our clients were government institutes, large corporations including The Standards Institution of Israel, The Israel Export and International Cooperation Institute (IEICI), Microsoft Israel, Bezeq Call, and People and Computers. Here is an historic view of the TargetData web site (2002):

Image 2

The target data services include

  • Data enhancement
  • Filtering redundancies (duplicates)
  • Data conversion
  • Unification and filtering of records (including between various databases)
  • Location of potential customers (for Database marketing).
  • Integration of various information systems
  • Generation of automatic reports
  • System analysis
  • Development of efficient tools for normal and correct entry of new data
  • Location of updated information about private individuals and companies (including investigations, sending people into the field, etc., that is, in case there is no updated information)
  • Development of customized software for execution of current activities on the customer’s premises, without the need for external help.
  • Improvement of information systems performance
  • Development of software and tools based on Excel
  • Training and assimilation

Information systems and the importance of maintaining their serviceability

One of the most important assets of the organization, if not the most important, are the information systems. Most of the organization’s information is currently stored in its computer network. In most cases use is made of a number of information systems, that are frequently different from one another, thus creating the need for mutual synchronization and updating of items of information between the various databases.

Furthermore, since a significant proportion of the information is entered manually, a large number of errors arise, causing duplication and loss of information. As a result it becomes difficult to locate items of information at a later stage, overloading is produced, mail is returned, and malfunctions related to customers or suppliers occur.

Major importance is attached to the current updating of parts of the information without affecting the information systems in general. Most of the information items comprising the information systems require updating at some given frequency.

In fact, the frequency of updating, which differs from one item to another, also represents part of the problem. Some of the information items are related to the organization’s work:

  • Products
  • Prices
  • Quotations
  • Invoices
  • Customer databank.

An additional part, over which there is even less control, includes updating of:

  • Addresses. For example: a street name that has been changed.
  • Phone numbers. For example: changes in Area Codes or changes in the first few digits of the phone number.

All these subjects require complex processes for the current enhancement of the information systems. In most cases it is necessary to occasionally or periodically refresh all the information systems, and to formulate procedures for maintaining the information systems constantly serviceable.

Here is a screenshot of DataTune's main screen:

Image 3

General data enhancement

The concept of data enhancement refers to a series of operations intended to significantly improve the quality and efficiency of the organization’s information systems. The following is a description of these operations:

  • Separation of fields
  • In some information systems, a number of fields are kept as single fields. This makes it difficult to perform updating, retrieval, and current maintenance.
    For example, saving a customer’s address as a single field (i.e. “1020 Main St. Appt #5”), will make it difficult to locate all the customers living in a suitable street (i.e. Main St), or alternatively, will make it difficult to update the street name, if replaced, in the entire records of all customers living in this street.

    Target Data has developed a unique method for splitting fields into their components. This method comprises separating full addresses, street addresses, names of people and names of companies. In addition, this method may be used to perform further manipulations of data structures of all kinds, at the customer’s request.

  • Updating addresses
  • From time to time street names are changed. Furthermore, most information systems contain different versions of streets and towns names, as well as different syntax for address format.

    Using Target Data’s exclusive method, a process has been developed to attach to each address a unique identifier, based on the coding system employed by the Ministry of the Interior in each country, giving each town and each street it’s unique numeric code.

    The process consists of interpretation of the address in the information system and its conversion into town code, street code, house number, entrance number, apartment number, and floor number.

    These data permit immediate attachment of the precise, official address: town name, street name, Zip Code, etc. whenever the street name is changed in any town. This process permits easy and effective attachment of the postal code to every address

  • Location of addresses containing the names of well-known places instead of the street name
  • In many cases the address field contains the name of a well-known place instead of the street name. For example, DATATECH, a firm situated in the Central Bus Station might mistakenly give its official address as follows: DATATECH, Central Bus Station, floor 5.

    We shall identify a wide variety of places in major European towns, such as bus and railway stations, airports, cultural centers, squares, bathing beaches (which in certain cases form the address of hotels), as well as sites which do not belong to a town or a local authority (such as Highway projects). We can also identify shopping malls, commercial centers, industrial zones, etc.

  • Interpretation of house numbers
  • In cases of an un-separated Street Address, the system is capable of interpreting and locating house numbers in addresses, using a variety of methods. In general, an address may contain a house number in several ways: house number/ apartment number, apartment number/ house number, and other combinations, such as floor number and entrance number. The most important of the data (an error in which may cause the mail to be returned) is the house number. We cross-check this number in the known range of numbers for that street. For example, Barnauer St. in Berlin should not contain house numbers greater than 200. In addition we check the apartment number and verify that it is logical, by comparing it with the number of floors usual in that district. For example, it is impossible to find an address on the tenth floor in a neighborhood of cottages.

  • Data enhancement using an Error Bank
  • The Error Bank was a concept used by TargetData to collect common errors and misprints in a databases, built up over the years we do this business. This bank permits DataTune to identify common misprints and relate them to the correct name, which may be:

    • Name of a street
    • Name of a company
    • Name of a town

    This is all based on the assumption that errors frequently repeat themselves.

  • Data enhancement using Soundex-based algorithms
  • In order to identify unfamiliar misprints, we make use of Soundex-based algorithms that help DataTune to identify misprints according to the language root of the names. According to this method, the words Meerdevoor street in Den Haag can be misprinted as Merdevur, Mardevoor, etc. and yet receive an identical root. This method helps to locate and correct difficult and strange misprints of names of companies, streets, and towns, as well as of people’s names, and to enhance the data in an automatic, rapid and quality manner.

  • Attaching postal (zip) codes and other codes
  • DataTune attaches / verify the updated postal code to every address in the customer’s database in accordance with the updated table of postal codes of the postal authority of each country. DataTune also attaches the codes for streets and towns in accordance with the updated table of codes of the Ministry of the Interior in each country. This enables DataTune to update the changes to street names published by the Ministry of the Interior from time to time.

  • Enhancing names of companies and/or private individuals
  • This process is based on the technique of separating fields, developed by Target Data. At the end of the process an updated database is created permitting access to each element in the customer’s name (individual or company). This process permits locating duplications, locating family relations from the customer database and, if necessary, combining households. The process permits creating a single record for each company, and combining all the contact people with whom the organization is in contact into the main record. In this way the need is removed for updating separately details of information common to a company (address, fax number, Internet website), while at the same time access is retained to details of information specific to each contact person (extension number, or direct phone number, email address, etc.).

  • Locating updated addresses and phone numbers of companies and private individuals
  • In our experience the best way to locate the most up-to-date address and phone number of the customer is to use Bezeq’s records. The file of records of residents is actually updated only after the citizen informs the Ministry of the Interior, and is not always up-to-date. In contrast, since most people own a telephone today (even if they are renting their homes), because of the reduced prices of phone lines, during projects done mostly for Israeli companies and institutes, we have found that Bezeq’s (the Israeli local telephone company) records are more up-to-date.

    DataTune can update and verify addresses and phone numbers of private individuals and companies using this method, by querying several sources of information until DataTune locates the most up-to-date address and phone number.

  • Filtering redundancies (duplicates)
  • As part of the data enhancement services Target Data offered facilities of marking, followed by filtering of redundant records. This service is implemented after data enhancement since only at this stage it is possible to identify duplicate records which originally contained apparently different data because of errors. Only the enhancement process will permit identification and correction of redundancies at a later stage.

    The following source code taken from the DataTune project filters duplicates found in a given table of the Database:

    VB
    Private Sub FilterDuplicates_Click()
        Dim rr As Recordset
        Dim Ignore() As Long
        Dim IgnoreCount As Long
        If Combo1.Text = "Choose" Then
            MsgBox "Please select " + Label1
            Exit Sub
        End If
        If Combo2.Text = "Choose" Then
            MsgBox "Please select " + Label2
            Exit Sub
        End If
        Open Text1.Text For Output As #1
        Print #1, "Target Data Duplicates Report"
        Print #1, Date$
        Print #1, ""
        Print #1, ""
        Set CustRS = CustDB.OpenRecordset("select * from  " + CurrentTable)
        CustRS.MoveLast
        CustRS.MoveFirst
        RecsTotal = CustRS.RecordCount
        ProgressBar1.Max = CustRS.RecordCount
        ReDim Ignore(1)
        While Not CustRS.EOF
            RecsRead = Val(RecsRead) + 1
            ProgressBar1.Value = Val(RecsRead)
            
            DoEvents
            If IsNull(CustRS(Combo1.Text)) Or IsNull(CustRS(Combo1.Text)) Then GoTo 2
            For i = 1 To IgnoreCount
                If CustRS(Combo2.Text) = Ignore(i) Then GoTo 2
            Next
            q$ = "select * from " + CurrentTable + " where " + Combo1.Text + " = " + _
                 GetRightDelim(CustRS(Combo1.Text)) + " and not(" + Combo2.Text + _
                 " = " + GetRightDelim(CustRS(Combo2.Text)) + ")"
            Set rr = CustDB.OpenRecordset(q$)
            If rr.RecordCount > 0 Then
                RecsFound = Val(RecsFound + 1)
                rr.MoveLast
                rr.MoveFirst
                ReDim Preserve Ignore(UBound(Ignore) + rr.RecordCount)
                While Not rr.EOF
                    IgnoreCount = IgnoreCount + 1
                    Ignore(IgnoreCount) = rr(Combo2.Text)
                    rr.MoveNext
                Wend
                RecsUpdated = Val(RecsUpdated) + rr.RecordCount + 1
                Print #1, "Customer: "; CustRS(Combo2.Text)
                Print #1, Chr(9); CustRS(Combo1.Text)
                Print #1, Chr(9); "Appears "; rr.RecordCount + 1; " times in your database"
                rr.MoveFirst
                Print #1, "Records are:"
                While Not rr.EOF
                    Print #1, rr(Combo2.Text),
                    rr.MoveNext
                Wend
                Print #1, ""
                Print #1, " ----------------------- "
                Print #1, ""
                'Debug.Print "Found "; rr.RecordCount; " "; CustRS(Combo1.Text)
            End If
    2:
            CustRS.MoveNext
        Wend
        Close #1
    End Sub
    Private Sub Form_Load()
        Combo1.Text = "Choose"
        Combo2.Text = "Choose"
        For i = 0 To CustRS.Fields.Count - 1
            Combo1.AddItem CustRS(i).Name
            Combo2.AddItem CustRS(i).Name
        Next
    End Sub

    For example, the firm of “NewCo” appears in the database as “New Co” because of a mistake in data entry, and also under its correct name. Only after enhancement of the company’s name, i.e. after correcting “New Co” to “NewCo” will the redundancy be identified.

    After identification of redundant records, the report is sent to the customer, containing several options: unification of the information in each of the redundant records into a single record (i.e. sending details of the information appearing in separate redundant records, in order not to lose important information).

    Another options is to ignore and delete one or more of the redundant appearances of the record.

    The third option is to leave the records unchanged. (This option is intended for customers who prefer to delete these records by themselves at a later stage.) It should be noted that, even then, data enhancement permits fast and easy identification of redundancies by the customer himself, by making simple queries.

  • Location and enhancement of names of company branches
  • One of the most frequently encountered problems in data enhancement is that of enhancing records containing names and addresses of company branches. In most cases the official publications and most of the databases contain a record of the company management and/or the head office only. On the other hand, a record giving details of one of the branches may appear in the customer’s databank.

    For example, under the heading: Bank of America, there will actually appear one of the branches since this branch is the company’s customer. However, if another branch is added to the database of customers, we do not want this addition to be identified as a redundancy but as an additional record, preferably with a link to Bank Leumi as a major name for each of the branch records.

    DataTune is prepared to handle data related to the branches of:

    • Banks
    • DataTune can attach the number of the bank and of the branch in each European country, the address, phone number and fax number. DataTune will also give a unique Sort Code to each branch.

    • Schools
    • DataTune will attach the school network name (e.g., Ort), the classification (e.g., primary school), the address, phone number, and fax number.

    • Health fund branches
    • DataTune will attach the name of the fund, the branch name, the manager’s name, address, phone number, and fax number.

    • Income tax branches
    • DataTune will attach the name of the branch, address, phone number, and fax number.

    • National Insurance branches
    • DataTune will attach the name of the branch, address, phone number, and fax number.

    • Government ministries
    • DataTune will attach most of the details for branches of government ministries, such as the Ministry of the Interior, the Licensing Offices, etc.

    • Large companies
    • In the case of records containing information about one or more branches of a large company, DataTune will attach the company’s official name to every record, but will store the information for every branch.

    • Marketing chains
    • DataTune will identify the branch and the chain to which it belongs.

  • Improvement of information system performance
  • Target Data has unique methods for improving information system performance. DataTune can improve the processes of retrieval, generation of reports, queries, and transfer of data from one database to another. All these processes may be made more efficient and rapid. This is done by analyzing the most and least frequent processes in the organization, and by specifying priorities by making trade-off between the following criteria:

    • Storage area and speed. (This is done by constructing search keys as required or deleting keys no longer required.)
    • Saving copies of items of information separately, in order to provide fast access to them, as compared to saving pointers to these items of information, in order to save storage space at the expense of the response time required to retrieve them.

    The Target Data venture specialized in providing support for companies and organizations that are just starting, or are in the middle of the process of, assimilation of a CRM system.

  • Data conversion
  • It is frequently necessary to once-only or currently convert the contents of one database to the format of another. In many cases the company receives a file of data from a customer or from another source and it is necessary to import these data to the organization’s database. In other cases use is made of a number of applications or database and it is necessary to import or export between different formats.

    Target Data offers one-time services for data conversion between different systems, providing support for most information systems including databases, off-the-shelf applications such as Access, Excel, Office, dedicated applications such as WinFax, and applications supporting organizers, such as Palm, Visor, etc.

  • Constant synchronization and updating
  • In addition DataTune offers organizations an application for current synchronization between different systems and platforms. This is a more complex process and involves an initial investment of resources, but as a result the need is removed in the future for conversion, and import and export processes. A synchronization system such as the MultiSyncTM system of Target Data is installed in the company’s computers and constantly checks which data have been updated and in which information system. If necessary automatic updating and conversion is performed on the relevant data in the required information systems or software products. For example, in a computer in which the MultiSync system is installed, linking Palm and WinFax applications, the system will identify the addition of a new customer to the WinFax address book, and will automatically update the Palm address book accordingly, and vice versa.

  • Excel applications
  • A unique type of applications in which the TargetData venture specialized in, were addons and tools for MS Excel. These are tools used to automatically execute a variety of operations, such as conversion of accounts between various formats (e.g., a customer requires an application to convert accounts submitted to customers in a new format recently specified by one of his major customers). As a result, the need arises to convert hundreds of accounts from the old to the new format. This tools aids in performing automatic conversion of formats.

    The following source code is the Excel class used by DataTune

    VB.NET
    VERSION 1.0 CLASS
    BEGIN
      MultiUse = -1  'True
      Persistable = 0  'NotPersistable
      DataBindingBehavior = 0  'vbNone
      DataSourceBehavior  = 0  'vbNone
      MTSTransactionMode  = 0  'NotAnMTSObject
    END
    Attribute VB_Name = "cExcel"
    Attribute VB_GlobalNameSpace = False
    Attribute VB_Creatable = True
    Attribute VB_PredeclaredId = False
    Attribute VB_Exposed = False
    ' Class       : cExcel
    ' Description : Class for working with Microsoft Excel through Automation
    ' Usage       : Instantiate an object of the class, use the class methods and
                  ': vairables to access the object of excel
                
    
    'Set m_objExcel = new Excel.Application
    'In the StartExcel Method
    Private m_objExcel As New Excel.Application
    Private m_objWorkbook As Excel.Workbook
    Public Property Get AppExcel() As Excel.Application
      ' Returns: A handle to the current instance of Excel
      Set AppExcel = m_objExcel
    
    End Property
    Public Property Get CurWorkbook() As Excel.Workbook
      ' Returns: A handle to the currently open workbook
    
      Set CurWorkbook = m_objWorkbook
    
    End Property
    Public Sub CloseExcel()
      ' Comments  : Closes Excel
      ' Parameters: None
      ' Returns   : Nothing
    '
      On Error GoTo PROC_ERR
    
      m_objExcel.Quit
    
      Set m_objExcel = Nothing
    
    PROC_EXIT:
      Exit Sub
    
    PROC_ERR:
      MsgBox "Error: " & Err.Number & ". " & Err.Description, , _
        "CloseExcel"
      Resume PROC_EXIT
    
    End Sub
    Public Sub CloseWorkbook( _
      fSave As Boolean)
      ' Comments  : Closes the current workbook
      ' Parameters: fSave - True to save changes, False to discard changes
      ' Returns   : Nothing
    '
      On Error GoTo PROC_ERR
    
      m_objWorkbook.Close SaveChanges:=fSave
    
    PROC_EXIT:
      Exit Sub
    
    PROC_ERR:
      MsgBox "Error: " & Err.Number & ". " & Err.Description, , _
        "CloseWorkbook"
      Resume PROC_EXIT
    
    End Sub
    Public Sub CreateTableFromAccess( _
      strDatabase As String, _
      strDataSource As String, _
      fFieldNames As Boolean, _
      Optional varMaxRecs As Variant)
      ' Comments  : Gets the contents of an Access table or query into
      '             the current document
      ' Parameters: strDatabse - full path and name of the Access database
      '             you want to read from
      '             strDataSource - name of a table or query in the database
      '             to read records from
      '             fFieldNames - True to put the field names in the first
      '             row, false otherwise.
      '             varMaxRecs - optional: set to the maximum number of
      '             records you want to retrieve. To include all records,
      '             don't specify this argument.
      ' Returns   : Nothing
    '
      Dim dbs As DAO.Database
      Dim rst As DAO.Recordset
      Dim intCounter As Integer
      Dim intFieldCount As Integer
      Dim lngRowCount As Long
      Dim varField As Variant
      Dim intRow As Integer
      Dim intCol As Integer
    
      On Error GoTo PROC_ERR
    
      ' Open the database objects
      Set dbs = DAO.DBEngine.OpenDatabase(strDatabase)
      Set rst = dbs.OpenRecordset(strDataSource)
      intFieldCount = rst.Fields.Count
    
      intRow = 1
      intCol = 1
    
      ' Add the field names if specified
      If fFieldNames Then
        For intCounter = 1 To intFieldCount
          m_objWorkbook.ActiveSheet.Cells(1, intCounter).Value = _
            rst.Fields(intCounter - 1).Name
        Next intCounter
      End If
    
      ' Start inserting data on the second row of the table
      lngRowCount = 2
    
      With rst
        ' Loop through all records
        Do Until .EOF
      
          For intCounter = 1 To intFieldCount
            ' Add each fields value
            varField = .Fields(intCounter - 1).Value
          
            ' Handle null field values
            If IsNull(varField) Then
              varField = "<null>"
            End If
          
            m_objWorkbook.ActiveSheet.Cells(lngRowCount, intCounter).Value = _
              varField
            
          Next intCounter
    
          lngRowCount = lngRowCount + 1
        
          ' See if we are still in range
          If Not IsMissing(varMaxRecs) Then
            If lngRowCount > varMaxRecs Then
              Exit Do
            End If
          End If
        
          ' Move to the next record
          .MoveNext
    
        Loop
      End With
    
      ' Cleanup
      rst.Close
      dbs.Close
    
    PROC_EXIT:
      Exit Sub
    
    PROC_ERR:
      MsgBox "Error: " & Err.Number & ". " & Err.Description, , _
        "CreateTableFromAccess"
      Resume PROC_EXIT
    
    End Sub
    Public Sub CreateWorkbook( _
      strName As String, _
      fSave As Boolean)
      ' Comments  : Creates a new workbook and saves it
      ' Parameters: strName - name for the new workbook
      '             fSave - True to save, False to leave unsaved
      ' Returns   : Nothing
    '
      On Error GoTo PROC_ERR
    
      Set m_objWorkbook = m_objExcel.Workbooks.Add
      m_objWorkbook.SaveAs FileName:=strName
    
    PROC_EXIT:
      Exit Sub
    
    PROC_ERR:
      MsgBox "Error: " & Err.Number & ". " & Err.Description, , _
        "CreateWorkbook"
      Resume PROC_EXIT
    
    End Sub
    Public Sub InsertValue( _
      strRange As String, _
      varValue As Variant)
      ' Comments  : Inserts values into cells
      ' Parameters: strRange - string defining the range to insert into
      '             varValue - value to insert
      ' Returns   : Nothing
    '
      On Error GoTo PROC_ERR
    
      m_objWorkbook.ActiveSheet.Range(strRange).Value = varValue
    
    PROC_EXIT:
      Exit Sub
    
    PROC_ERR:
      MsgBox "Error: " & Err.Number & ". " & Err.Description, , _
        "InsertValue"
      Resume PROC_EXIT
    
    End Sub
    Public Function OpenWorkbook( _
      strFileName As String, _
      fReadOnly As Boolean, _
      Optional varPassword As Variant) As Excel.Worksheet
      ' Comments  : Opens the named file and associates it with the class
      ' Parameters: strFileName - full path and name of the file to open
      '             fReadOnly - True to open readonly
      '             varPassword - Optional: specify the password if the
      '             workbook file is password protected.
      ' Returns   : Nothing
    '
      On Error GoTo PROC_ERR
    
      If Not IsMissing(varPassword) Then
        Set m_objWorkbook = m_objExcel.Workbooks.Open( _
          strFileName, _
          , _
          fReadOnly, _
          , _
          varPassword)
      Else
        Set m_objWorkbook = m_objExcel.Workbooks.Open( _
          strFileName, _
          , _
          fReadOnly)
      End If
    
    PROC_EXIT:
      Exit Function
    
    PROC_ERR:
      MsgBox "Error: " & Err.Number & ". " & Err.Description, , _
        "OpenWorkbook"
      Resume PROC_EXIT
    
    End Function
    Public Sub OpenWorkbookFromLib( _
      strFileName As String, _
      fReadOnly As Boolean, _
      Optional varPassword As Variant)
      ' Comments  : Opens the named file and associates it with the class.
      '             This version looks in the Excel library folder.
      ' Parameters: strFileName - name of the file to open
      '             fReadOnly - True to open readonly
      '             varPassword - Optional: specify the password if the
      '             workbook file is password protected.
      ' Returns   : Nothing
    '
      Dim strLibPath As String
    
      On Error GoTo PROC_ERR
    
      strLibPath = m_objExcel.LibraryPath & _
        m_objExcel.PathSeparator & _
        strFileName
      
      If Not IsMissing(varPassword) Then
        Set m_objWorkbook = m_objExcel.Workbooks.Open( _
          strLibPath, _
          , _
          fReadOnly, _
          , _
          varPassword)
      Else
        Set m_objWorkbook = m_objExcel.Workbooks.Open( _
          strLibPath, _
          , _
          fReadOnly)
      End If
    
    PROC_EXIT:
      Exit Sub
    
    PROC_ERR:
      MsgBox "Error: " & Err.Number & ". " & Err.Description, , _
        "OpenWorkbookFromLib"
      Resume PROC_EXIT
    
    End Sub
    Public Sub PrintSheet( _
      intFrom As Integer, _
      intTo As Integer, _
      intCopies As Integer, _
      fPreview As Boolean, _
      fPrintToFile As Boolean, _
      fCollate As Boolean)
      ' Comments  : Prints the active workbook
      ' Parameters: intFrom - starting page number
      '             intTo - ending page number
      '             intCopies - number of copies
      '             fPreview - True for print preview
      '             fPrintToFile - True to print to a file. Excel will prompt
      '             for the filename when this is set to True.
      '             fCollate - True to collate copies
      ' Returns   : Nothing
    '
      On Error GoTo PROC_ERR
    
      m_objWorkbook.PrintOut _
        intFrom, _
        intTo, _
        intCopies, _
        fPreview, _
        , _
        fPrintToFile, _
        fCollate
    
    PROC_EXIT:
      Exit Sub
    
    PROC_ERR:
      MsgBox "Error: " & Err.Number & ". " & Err.Description, , _
        "PrintSheet"
      Resume PROC_EXIT
      
    End Sub
    Public Sub SortRange( _
      strRange As String, _
      strKey As String, _
      Optional fAscending As Boolean = False)
      ' Comments  : Sorts the specified range
      ' Parameters: strRange - range to sort
      '             strKey - range to use as the key for sorting
      '             fAscending - True for ascending, False for descending
      ' Returns   : Nothing
    '
      Dim lngSort As Integer
    
      If fAscending Then
        lngSort = xlAscending
      Else
        lngSort = xlDescending
      End If
    
      m_objWorkbook.ActiveSheet.Range(strRange).Sort _
        Key1:=ActiveSheet.Range(strKey), order1:=lngSort
    
    PROC_EXIT:
      Exit Sub
    
    PROC_ERR:
      MsgBox "Error: " & Err.Number & ". " & Err.Description, , _
        "SortRange"
      Resume PROC_EXIT
      
    End Sub
    Public Function ReadRange(strRange As String) As Variant
      ' Parameters: strRange - range to sort
      '             strKey - range to use as the key for sorting
      '             fAscending - True for ascending, False for descending
      ' Returns   : Nothing
    '
      Dim lngSort As Integer
    
      ReadRange = m_objWorkbook.ActiveSheet.Range(strRange)
    
    PROC_EXIT:
      Exit Function
    
    PROC_ERR:
      MsgBox "Error: " & Err.Number & ". " & Err.Description, , _
        "SortRange"
      Resume PROC_EXIT
      
    End Function
    Public Sub StartExcel(fVisible As Boolean)
      ' Comments  : Starts an instance of Excel
      ' Parameters: fVisible - True to make Excel visible
      ' Returns   : Nothing
      On Error GoTo PROC_ERR
    
     'SEE COMMENTS IN DECLARATIONS FOR REASONS THIS IS COMMENTED OUT
     'AND WHY YOU MIGHT WANT TO UNCOMMENT IT
    
     'Set m_objExcel = New Excel.Application
      m_objExcel.Visible = fVisible
    
    PROC_EXIT:
      Exit Sub
    
    PROC_ERR:
      MsgBox "Error: " & Err.Number & ". " & Err.Description, , _
        "StartExcel"
      Resume PROC_EXIT
    End Sub

Special projects

Image 4

Projects like the one in the screenshot were created using the DataTune technology as turn-key solutions for specific customers. In the image: a project for Del Engineering.

Attaching details about companies

It is possible to attach to the names of companies a broad range of details, including private company number, number of workers, date of establishment, sales turnover, names of contact persons, area of activities.

Location of potential customers

The TargetData venture offered the option of locating potential customers amongst companies and authorized traders, based on criteria such as number of workers, sales turnover, geographic region, where the proposed companies database includes only those companies that do not already exist in your organization’s database. The TargetData venture offered this service as an addition to the project, to be executed on completion of the project, so that the data regarding potential customers will be integrated into the existing data only after enhancement. In other words, if the company name is incorrect it should be enhanced first of all in order to identify it as a recognized name in the database of potential customers, so that it can be filtered from this database.

  • Development of a user interface and search keys

DataTune allows defining its outputs of its work in any format the customer chooses. However, it is possible to include in the work output an Access or Excel application including the following:

  • A user-friendly interface containing buttons and menus for data retrieval, report generation, and updating of data.
  • Search keys permitting extremely rapid data retrieval.
  • Construction of ready-made queries for performing retrieval operating known in advance (e.g. retrieval of all the data about customers whose enlistment date lies in a range of given dates).
  • Issue of reports based on user’s characteristics.
  • Control of several databases from a single window.

Systems analysis

The project may include comprehensive systems analysis. This is required when:

  • It is necessary to convert an existing system.
  • It is necessary to develop a system which will interface with an existing system.
  • It is necessary to develop customized software.

Systems analysis permits the TargetData staff to thoroughly learn about the work processes currently existing, in order to develop a system which will adopt the work processes and the logic behind them so that they can be assimilated in the computerized system. The systems analysis will take about three weeks and will contain the following elements:

  • A written questionnaire: At this stage you will be sent a list of questions about various subjects related to the existing work processes, professionals, roles, time durations and products.
  • An oral questionnaire: In this stage the TargetData staff interviews the key personnel in the organization in order to acquire in-depth understanding of the work process, locate bottlenecks, and construct a flow chart of the work process. Analysis of the information systems includes the analysis of current and future systems.

The Project Manager assigned by TargetData to work with the customer makes an in-depth investigation of which items of information are required by the customer, which items are collected, which items are collected but not saved, which items may be retrieved at a later stage (for example, for purposes of marketing additional projects to customers), which items are unnecessary, etc.

Technology and infrastructure

Image 5

Methodology

Target Data makes use of unique tools that it has developed for automatic data processing on a tremendous scale.

At the end of the automatic process a list is produced of the data requiring manual examination and enhancement.

Image 6

Manual enhancement

Development tools

Image 7

The unique system for enhancement of addresses

DataTune was developed under the following platforms:

  • Microsoft Visual C++ 6.0
  • Microsoft Visual Basic 6.0
  • Microsoft Access
  • VBA (Visual Basic for Application) — automation and macro programs for Office and other Microsoft tools.

The DataTune Scrambler

DataTune has a scrambling feature which can be used for providing a "pilot" prior to starting a project. The advantage of a "pilot" is the ability to randomly select a number of records (usually 500) and perform data cleansing on these records. TargetData has provided such pilot free of charge. The part in the DataTune application that does that is this subroutine:

VB
Private Sub DoScramble()
    Dim rs1, rs2 As Recordset
    Dim NumRecs As Long
  
    Randomize Timer
    Status = "Create temp table"
    DoEvents
    CreateTDTable "td_temp1"
    Status = "Create Table " + Text2.Text
    DoEvents
    CreateTDTable Text2.Text
    Status = "Copy data to temp table"
    DoEvents
    CopyAllData CurrentTable, "td_temp1"
    Status = "Start scrambling"
    DoEvents
    Set rs2 = CustDB.OpenRecordset(Text2.Text)
    Set rs1 = CustDB.OpenRecordset("td_temp1")
    rs1.MoveLast
    rs1.MoveFirst
    NumRecs = rs1.RecordCount
    ProgressBar1.Max = Val(Text1.Text)
    For k = 1 To Val(Text1.Text)
        ProgressBar1.Value = k
        DoEvents
        jump = Int(Rnd(1) * (NumRecs - 1))
        Status = "Sampling record #" + Trim(Str(jump))
        rs1.Move jump
        rs2.AddNew
        For i = 0 To rs1.Fields.Count - 1
            rs2(i) = rs1(i)
        Next
        rs2.Update
        rs1.Delete
        Set rs1 = CustDB.OpenRecordset("td_temp1")
        rs1.MoveLast
        rs1.MoveFirst
        NumRecs = rs1.RecordCount
    Next
    rs1.Close
    CustDB.TableDefs.Delete "td_temp1"
    Status = "Completed"
    rTables.FindFirst "TableName = '" + Text2.Text + "'"
    If rTables.NoMatch Then
        rTables.AddNew
        rTables("TABLENAME") = Text2.Text
        rTables("PROJECTID") = ProjectID
        rTables.Update
    End If
    DoEvents
End Sub

Test and optimization tools

  • Numega — Bounds Check was used for source code testing.

Platforms supported

DataTune supports the following Operating Systems:

  • Windows
  • DOS
  • UNIX
  • Mainframe systems
  • AS 400
  • Mac OS
  • LINUX

Image 8

The unique mechanism for learning the structure of unfamiliar files

The output of DataTune may be supplied as an EXE file. In certain cases the source code may be purchased from TargetData.

The work process Target Data offered to clients was generally implemented as follows:

No. Activity Time required
0 Meeting in the customer’s offices. Alternatively a pilot scheme may be initiated.
1 Executing a pilot scheme of 500 records (if the customer is interested in this). Generally within 4 working days
2Preparing a proposal. Generally within 3 working days
3 Executing the work and installing the products. Generally within 10 working days
4Performing training in the customer’s offices, or by phone, if necessary. When installing the work products, or at a different time at the customer’s request
5Assimilation and accompaniment. During one year from the date of completion of the project
6Receiving feedback and introduction improvements to the software as required. Up to one year from the date of completion of the project 
Michael Haephrati CodeProject MVP 2013  

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)