Cinchoo ETL - Deserialize Selective XML Nodes from Large XML File

7 Nov 2021 | CPOL | 2 min read
Use Cinchoo ETL to deserialize selected XML nodes from a large XML file
In this tip, you will learn how to deserialize selected XML nodes from a large XML file using the Cinchoo ETL framework. It takes only a few lines of code. Because the process is stream based, it is fast, keeps a low memory footprint, and can handle large files.

1. Introduction

ChoETL is an open source ETL (extract, transform and load) framework for .NET. It is a code-based library for extracting data from multiple sources, transforming it, and loading it into your own data warehouse in a .NET environment.

This article shows how to deserialize selected XML nodes from a large XML file using the Cinchoo ETL framework. Because the reader is stream based, only a few lines of code are needed and memory use stays low even for large files.

2. Requirement

This framework library is written in C# and targets .NET Framework 4.5 / .NET Core 3.x.

3. How to Use

3.1 Sample Data

Let's begin by looking at the sample XML file below. Assume the real file is large. The Address nodes repeat, and we want to deserialize each of them into an object model.

Listing 3.1.1. XML file (sample.xml)
XML
<Sites>
    <PostCodeValidatedSite>
        <Address>
            <ALK>A00067262524</ALK>
            <BuildingName>1 The Pavilions</BuildingName>
            <CSSDistrictCode>CM</CSSDistrictCode>
            <ExchangeCode>SOL</ExchangeCode>
            <IsPostCodeValid>true</IsPostCodeValid>
            <Locality>Shirley</Locality>
            <PostCode>B90 4SB</PostCode>
            <PostTown>Solihull</PostTown>
            <Qualifier>Gold</Qualifier>
            <Street>Cranmore Drive</Street>
            <Technologies>
                <Technology>
                    <IsAssociated>true</IsAssociated>
                    <IsRestricted>false</IsRestricted>
                    <Name>Copper</Name>
                </Technology>
                <Technology>
                    <IsAssociated>true</IsAssociated>
                    <IsRestricted>false</IsRestricted>
                    <Name>PointToPointFibre</Name>
                </Technology>
                <Technology>
                    <IsAssociated>false</IsAssociated>
                    <IsRestricted>false</IsRestricted>
                    <Name>FTTPBrownfield</Name>
                </Technology>
                <Technology>
                    <IsAssociated>false</IsAssociated>
                    <IsRestricted>false</IsRestricted>
                    <Name>FTTPGreenfield</Name>
                </Technology>
            </Technologies>
        </Address>
        <Coordinates>
            <Easting>413358</Easting>
            <Latitude>52.39657</Latitude>
            <Longitude>-1.79875</Longitude>
            <Northing>278082</Northing>
        </Coordinates>
    </PostCodeValidatedSite>
    <PostCodeValidatedSite>
        <Address>
            <ALK>A15100427347</ALK>
            <BuildingName>1 The Pavilions</BuildingName>
            <CSSDistrictCode>CM</CSSDistrictCode>
            <ExchangeCode>SOL</ExchangeCode>
            <IsPostCodeValid>true</IsPostCodeValid>
            <Locality>Shirley</Locality>
            <PostCode>B90 4SB</PostCode>
            <PostTown>Solihull</PostTown>
            <Qualifier>Gold</Qualifier>
            <Street>Cranmore Drive</Street>
            <SubBuilding>Floor 001-Room Comm</SubBuilding>
            <Technologies>
                <Technology>
                    <IsAssociated>false</IsAssociated>
                    <IsRestricted>false</IsRestricted>
                    <Name>Copper</Name>
                </Technology>
                <Technology>
                    <IsAssociated>true</IsAssociated>
                    <IsRestricted>false</IsRestricted>
                    <Name>PointToPointFibre</Name>
                </Technology>
                <Technology>
                    <IsAssociated>false</IsAssociated>
                    <IsRestricted>false</IsRestricted>
                    <Name>FTTPBrownfield</Name>
                </Technology>
                <Technology>
                    <IsAssociated>false</IsAssociated>
                    <IsRestricted>false</IsRestricted>
                    <Name>FTTPGreenfield</Name>
                </Technology>
            </Technologies>
        </Address>
        <Coordinates>
            <Easting>413358</Easting>
            <Latitude>52.39657</Latitude>
            <Longitude>-1.79875</Longitude>
            <Northing>278082</Northing>
        </Coordinates>
    </PostCodeValidatedSite>
</Sites>

The first thing to do is install the ChoETL or ChoETL.NETStandard NuGet package. To do this, run the matching command below in the Package Manager Console.

.NET Framework

Install-Package ChoETL

.NET Core

Install-Package ChoETL.NETStandard

Now add the ChoETL namespace to the program.

using ChoETL;

3.2 Deserialization Operation

Because the XML file may be large, the Address nodes should be deserialized in a streaming fashion rather than by loading the entire file into memory, which avoids memory pressure.
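To make the idea of stream-based reading concrete, here is a minimal sketch that uses only the standard System.Xml classes. It is not how Cinchoo ETL is implemented internally; it only illustrates the general technique of visiting one Address element at a time instead of building the whole document in memory. The names StreamingSketch and StreamAddresses, and the tiny inline XML string, are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingSketch
{
    // Yields each <Address> element as a small XElement, one at a time.
    // Only the current element is materialized, not the whole document.
    public static IEnumerable<XElement> StreamAddresses(TextReader input)
    {
        using (var reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Address")
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node that follows it, so no extra Read() is needed.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    public static void Main()
    {
        // Illustrative input; a real file would be opened with a StreamReader.
        const string xml =
            "<Sites><PostCodeValidatedSite>" +
            "<Address><ALK>A1</ALK><PostCode>B90 4SB</PostCode></Address>" +
            "<Coordinates><Easting>1</Easting></Coordinates>" +
            "</PostCodeValidatedSite></Sites>";

        foreach (var address in StreamAddresses(new StringReader(xml)))
            Console.WriteLine((string)address.Element("PostCode"));
    }
}
```

Writing this loop by hand for every node type and then mapping elements to typed objects is the repetitive part that a library such as Cinchoo ETL is meant to take over.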

Define an Address object model that matches the Address XML node.

Listing 3.2.1. Address class model
C#
using System;
using System.Xml.Serialization;

public class Address
{
    [XmlElement("ALK")]
    public string ALK { get; set; }

    [XmlElement("BuildingName")]
    public string BuildingName { get; set; }

    [XmlElement("CSSDistrictCode")]
    public string CSSDistrictCode { get; set; }

    [XmlElement("IsPostCodeValid")]
    public Boolean IsPostCodeValid { get; set; }

    [XmlElement("Locality")]
    public string Locality { get; set; }

    [XmlElement("PostCode")]
    public string PostCode { get; set; }

    [XmlElement("PostTown")]
    public string PostTown { get; set; }

    [XmlElement("Qualifier")]
    public string Qualifier { get; set; }

    [XmlElement("Street")]
    public string Street { get; set; }
}

Then use the Cinchoo ETL library to extract the Address nodes with the WithXPath() method and deserialize them into the Address object model, as shown below.

Listing 3.2.2. Parse XML file
C#
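public static void Main()
{
    // Select every Address element in sample.xml via XPath.
    using (var r = new ChoXmlReader<Address>("sample.xml").WithXPath("//Address"))
    {
        // Write each deserialized Address record to the console.
        r.Print();
    }
}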
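If you want to work with each record rather than print it, the reader can be enumerated in a foreach loop. The sketch below is an illustration under stated assumptions, not part of the original sample: it assumes ChoXmlReader<T> is enumerable like the other Cinchoo readers, it uses a trimmed copy of the Address model on the assumption that unmapped elements are simply skipped, and ReadPostCodes and AddressDemo are hypothetical names introduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;
using ChoETL;

// Trimmed copy of the Address model, kept short for this sketch.
public class Address
{
    [XmlElement("ALK")]
    public string ALK { get; set; }

    [XmlElement("PostCode")]
    public string PostCode { get; set; }
}

public static class AddressDemo
{
    // Hypothetical helper: collects the PostCode of every Address node.
    public static List<string> ReadPostCodes(string path)
    {
        var postCodes = new List<string>();
        using (var r = new ChoXmlReader<Address>(path).WithXPath("//Address"))
        {
            // Each iteration hands back one deserialized Address record.
            foreach (Address address in r)
                postCodes.Add(address.PostCode);
        }
        return postCodes;
    }

    public static void Main()
    {
        foreach (var postCode in ReadPostCodes("sample.xml"))
            Console.WriteLine(postCode);
    }
}
```

Collecting values into a list is only for demonstration. For a genuinely large file, it is usually better to process each record inside the loop so that nothing accumulates in memory.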

Sample fiddle: https://dotnetfiddle.net/Jg1lUv

For more information about Cinchoo ETL, please see the other CodeProject articles on the framework.

History

  • 7th November, 2021: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)