Flattening out the complexity in flat file schemas in BizTalk 2004 - Part 1

Naveen Karamchetti

27 Feb 2006

An article explaining how to write various flat file schemas in BizTalk Server 2004.

Download source files - 87.9 Kb

Introduction - First things first!

This article is the first in a series on writing flat file schemas in BizTalk Server 2004. Flat file schemas have a reputation for being complex and cryptic; this article aims to take the fear out of writing one.

Flat file structure

Unlike an XML file, a flat file has no visible, inherent structure. Its structure is implied by how the file is used, and some domain knowledge is needed to interpret it. A flat file can be structured in one of several ways:

Delimited flat file.
Positional flat file.
A flat file with a combination of delimited and positional records.

Let's look at some example flat files and their structure.

Example 1 - A "TAB" delimited flat file

JOHN DOE    1964-10-05    CLAYTON
ROBERT B    1978-11-10    EDWARD STREET
JOHN LENON    1927-02-30    WORTHING
EDMOND DANTES    1910-09-12    COVENTRY
SIR CHAMBERS    1934-05-18    HARRODS

In the above example, there are 5 lines, and each line represents a "record" of information.
A "record" consists of several "fields". In the example, the "fields" are "name", "date of birth" and "place of origin".
The "fields" are separated from one another by a "TAB" character, which acts as the "delimiter".
The "lines" are separated by a "CRLF" (a carriage return followed by a line feed). The "CRLF" combination is the convention on Windows-based systems; UNIX-based systems use only a "line feed" (LF).
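The record and field structure described above can be sketched outside BizTalk in a few lines of Python. This is only an illustration of the idea, not how the BizTalk flat file disassembler is implemented; the function name `parse_delimited` and the shortened sample data are assumptions made for this sketch.

```python
# Three records from Example 1; fields are joined with TAB, records with CRLF.
SAMPLE = (
    "JOHN DOE\t1964-10-05\tCLAYTON\r\n"
    "ROBERT B\t1978-11-10\tEDWARD STREET\r\n"
    "JOHN LENON\t1927-02-30\tWORTHING"
)


def parse_delimited(text, record_delimiter="\r\n", field_delimiter="\t"):
    """Split text into records, then split each record into fields."""
    records = []
    for line in text.split(record_delimiter):
        if line:  # skip the empty piece left by a trailing record delimiter
            records.append(line.split(field_delimiter))
    return records


records = parse_delimited(SAMPLE)
for name, dob, place in records:
    print(name, dob, place, sep=" | ")
```

Passing `record_delimiter="\n"` would handle a file produced on a UNIX-based system.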

Example 2 - A comma (,) separated flat file

JOHN DOE, 1964-10-05, CLAYTON
ROBERT B, 1978-11-10, EDWARD STREET
JOHN LENON, 1927-02-30, WORTHING
EDMOND DANTES, 1910-09-12, COVENTRY
PETER JAMES,  , GATWICK
SIR CHAMBERS, 1934-05-18, HARRODS

This structure is very similar to the one in Example 1. The difference is that the fields are separated by a comma (,), which acts as the delimiter.
The fifth line does not have a value for the "date of birth" field, yet both commas are still in place. This is known as a placeholder: the delimiters remain to indicate that the field exists but has no value.
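To see why the placeholder matters, here is a minimal Python sketch (not BizTalk code) that splits the record with the missing date of birth. The helper name `split_csv_line` is hypothetical, and the sketch assumes that no field contains a comma of its own.

```python
def split_csv_line(line):
    """Split on commas and trim the padding spaces around each field."""
    return [field.strip() for field in line.split(",")]


complete = split_csv_line("JOHN DOE, 1964-10-05, CLAYTON")
missing_dob = split_csv_line("PETER JAMES,  , GATWICK")

# Both records yield three fields; the empty string marks the missing value,
# so "GATWICK" still lands in the third position.
print(complete)
print(missing_dob)
```

Without the placeholder commas, the place of origin would shift into the date of birth position and the record could no longer be interpreted reliably.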

Example 3 - A positional flat file

12345678901234567890123456789012345678901234567890
JOHN DOE            1964-10-05CLAYTON             
ROBERT B            1978-11-10EDWARD STREET       
JOHN LENON          1927-02-30WORTHING            
EDMOND DANTES       1910-09-12COVENTRY            
SIR CHAMBERS        1934-05-18HARRODS

A positional flat file is one whose fields are placed in positions (columns), and the field lengths are of fixed size. In the above example, the "name" field has a fixed size of 20 characters, the "date-of-birth" field has a fixed size of 10 characters and the "place of origin" has a fixed size of 20 characters.
In BizTalk, a positional record must always be a child of a delimited record. The delimiter character specified for the parent delimited record must not appear in the data of the child positional record, because there is no way to escape it there.
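The fixed field sizes of 20, 10 and 20 characters translate directly into slicing by offset. The following Python sketch illustrates the idea only; `FIELD_WIDTHS` and `parse_positional` are names invented for this example, and the sample lines are rebuilt with `ljust` so that the padding is explicit.

```python
FIELD_WIDTHS = (20, 10, 20)  # name, date of birth, place of origin


def parse_positional(line, widths=FIELD_WIDTHS):
    """Cut a fixed-width line into fields and strip the trailing padding."""
    fields = []
    start = 0
    for width in widths:
        fields.append(line[start:start + width].rstrip())
        start += width
    return fields


lines = [
    "JOHN DOE".ljust(20) + "1964-10-05" + "CLAYTON".ljust(20),
    # As in the last sample line, the final field is not padded to full width.
    "SIR CHAMBERS".ljust(20) + "1934-05-18" + "HARRODS",
]
for line in lines:
    print(parse_positional(line))
```

Because the fields are located by position alone, no delimiter is needed inside a record; only the record separator (CRLF here) remains, which is why the positional record sits under a delimited parent.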

Creating the BizTalk flat file schema solution

Create a new BizTalk Server Solution in Visual Studio.

Step 1: In the Visual Studio .NET menu, select File -> New -> "Blank Solution" and type the name "FFSchemas".

Step 2: In the Solution Explorer, right click on the solution name "FFSchemas" and select Add -> New Project. In the "Add Project" dialog box, for the type of project, select "BizTalk Projects". Select the template "Empty BizTalk Project" and create a project named "FlatFileSchema".

Building the schemas - Example 1

We shall now create a schema for Example 1.

Step 1: Right-click on the project in the Solution Explorer and select the "Add New Item" option. Then, select the item "Schema" and name it "FFSchema_TAB". When the schema shows up, rename the "Root" element to "TSV".

Step 2: Select the item "Schema", right-click and select Properties. Change the property "Schema Editor Extensions" to Flat File Extension.

Step 3: Select the item "TSV", right-click and select Properties. Set the properties as shown in the table below:

TSV "Root Node" properties
Property Name	Property Value
Child Delimiter Type	Hexadecimal
Child Delimiter	0x0D 0x0A
Structure	Delimited

Step 4: Select the item "TSV" and right-click -> Insert Schema Node -> Child Record. Name the new record "Record" and create three child field elements under it: "Name", "DOB" and "Address".

Step 5: Select the item "Record", right-click and select Properties. Set the properties as shown in the table below. The hexadecimal value "0x09" represents the TAB character. The child order is set to "Infix" because the tab appears between the fields in each record. We need to support multiple records, so we set Max Occurs to "*" or "unbounded":

"Record" Node properties
Property Name	Property Value
Child Delimiter Type	Hexadecimal
Child Delimiter	0x09
Child Order	Infix
Min Occurs	1
Max Occurs	unbounded
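The meaning of the "Infix" child order can be made concrete with a short Python sketch. It assumes that the other child order values are "Prefix" (delimiter before each child) and "Postfix" (delimiter after each child); the function `split_children` is a hypothetical illustration and not BizTalk's own parsing logic.

```python
def split_children(data, delimiter, child_order="infix"):
    """Split record data into children according to the delimiter placement."""
    if child_order == "prefix":
        # The delimiter precedes every child, so one leads the data.
        if not data.startswith(delimiter):
            raise ValueError("prefix order expects a leading delimiter")
        data = data[len(delimiter):]
    elif child_order == "postfix":
        # The delimiter follows every child, so one trails the data.
        if not data.endswith(delimiter):
            raise ValueError("postfix order expects a trailing delimiter")
        data = data[:-len(delimiter)]
    elif child_order != "infix":
        raise ValueError("unknown child order: " + child_order)
    # With infix order the delimiter appears only between children.
    return data.split(delimiter)


fields = split_children("JOHN DOE\t1964-10-05\tCLAYTON", "\t", "infix")
print(fields)
```

With infix order, three fields are separated by exactly two tabs, which matches the sample records in Example 1.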

Step 6: In the Solution Explorer, select the schema "FFSchema_TAB.xsd" and right-click -> select "Properties". In the property pages screen, select the properties as shown in the image. For "Input Instance File Name", choose the path to the input file (*.txt).

Validating and testing the schema created

Once we have finished writing the schema, we need to validate and test the schema.

Step 1: In the Solution Explorer, select the schema "FFSchema_TAB.xsd" and right-click -> select "Validate Schema". Check the output window; you should see a message starting with "Validate Schema succeeded for file...".

Step 2: In the Solution Explorer, select the schema "FFSchema_TAB.xsd" and right-click -> select "Validate Instance". Check the output window; you should see a message starting with "Validate Instance succeeded for schema FFSchema_TAB.xsd...". Now click on the link in the message that starts with "Validation generated XML output...".

The XML output file would look like this:
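As a rough approximation of the shape of that output, the Python sketch below builds an XML document using the node names defined in the schema ("TSV", "Record", "Name", "DOB" and "Address"). It is an assumption-laden stand-in, not actual BizTalk output: the real file will typically carry the schema's target namespace on the root element and may differ in other details. The function `records_to_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records, root_name="TSV"):
    """Build <root><Record><Name/><DOB/><Address/></Record>...</root>."""
    root = ET.Element(root_name)
    for name, dob, address in records:
        record = ET.SubElement(root, "Record")
        ET.SubElement(record, "Name").text = name
        ET.SubElement(record, "DOB").text = dob
        ET.SubElement(record, "Address").text = address
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


xml_text = records_to_xml([
    ("JOHN DOE", "1964-10-05", "CLAYTON"),
    ("ROBERT B", "1978-11-10", "EDWARD STREET"),
])
print(xml_text)
```

Each line of the flat file becomes one "Record" element, and each delimited field becomes a child element of that record.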

Building the schemas - Example 2

We shall now create a schema for Example 2.

Step 1: Right-click on the project in the Solution Explorer and select the "Add New Item" option. Then, select the item "Schema" and name it "FFSchema_CSV". When the schema shows up, rename the "Root" element to "CSV".

Step 2: Select the item "Schema", right-click and select Properties. Change the property "Schema Editor Extensions" to Flat File Extension.

Step 3: Select the item "CSV", right-click and select Properties. Set the properties as shown in the table below:

CSV "Root Node" properties
Property Name	Property Value
Child Delimiter Type	Hexadecimal
Child Delimiter	0x0D 0x0A
Structure	Delimited

Step 4: Select the item "CSV" and right-click -> Insert Schema Node -> Child Record. Name the new record "Record" and create three child field elements under it: "Name", "DOB" and "Address".

Step 5: Select the item "Record", right-click and select Properties. Set the properties as shown in the table below. The child delimiter type is "Character" because the comma is a printable character and can be entered directly. The child order is set to "Infix" because the comma appears between the fields in each record. We need to support multiple records, so we set Max Occurs to "*" or "unbounded":

"Record" Node properties
Property Name	Property Value
Child Delimiter Type	Character
Child Delimiter	,
Child Order	Infix
Min Occurs	1
Max Occurs	unbounded

Step 6: In the Solution Explorer, select the schema "FFSchema_CSV.xsd" and right-click -> select "Properties". In the property pages screen, select the properties as shown in the image. For "Input Instance File Name", choose the path to the input file (*.txt).

Validating and testing the schema created

Once we have finished writing the schema, we need to validate and test the schema.

Step 1: In the Solution Explorer, select the schema "FFSchema_CSV.xsd" and right-click -> select "Validate Schema". Check the output window; you should see a message starting with "Validate Schema succeeded for file...".

Step 2: In the Solution Explorer, select the schema "FFSchema_CSV.xsd" and right-click -> select "Validate Instance". Check the output window; you should see a message starting with "Validate Instance succeeded for schema FFSchema_CSV.xsd...". Now click on the link in the message that starts with "Validation generated XML output...".

The XML output file would look like this:

Quick takeaways

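Set the "Schema Editor Extensions" property to Flat File Extension before you start building a flat file schema.
For non-printable delimiters such as TAB or CRLF, set the "Child Delimiter Type" property to Hexadecimal to avoid character ambiguity; a printable delimiter such as a comma can use the Character type.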

Part 2

The next part of this series will discuss positional flat files. Until then, happy schema programming.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.