GitHub: https://github.com/skywithin/Ebcdic2Unicode
Introduction
The purpose of the Ebcdic2Ascii converter is to provide a simple API for converting mainframe dump files encoded in EBCDIC into ASCII.
Background
Our daily operation depends heavily on multiple feed files received from a mainframe. Most of these files are EBCDIC encoded and contain packed numbers (COMP-3). The "old team" used Easytrieve software to write COBOL-like scripts that parse the source files and convert them into something readable, but these scripts are extremely difficult to maintain and can only run on 32-bit machines. I've combined multiple sources from the net to create a single API for converting EBCDIC encoded files into ASCII encoded CSV files.
One of the common problems with converting EBCDIC encoded files is packed numbers (also called "Computational-3", "Packed Decimal", or "Packed"), because they are not characters and there is no byte-to-byte conversion for these fields. Technically, packed numbers are binary fields that put two digits into each byte in order to halve the storage requirement compared to character digits. If your file does not contain packed numbers, you are lucky and you won't require this tutorial. Otherwise, you need to know the template of the file format beforehand: the length of each "line", and the position, length and format of each "field" within that line.
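To make the layout concrete, here is a minimal sketch of decoding a packed field by hand. It is not the library's Unpack() method (shown later); the class name PackedDecimalSketch and its members are made up for this illustration. It assumes the sign sits in the low nibble of the last byte (0xC or 0xF for positive, 0xD for negative) and that the value fits into a System.Decimal.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the library.
public static class PackedDecimalSketch
{
    // Decodes a COMP-3 field. Every nibble (half byte) holds one decimal digit,
    // except the low nibble of the last byte, which holds the sign:
    // 0xC or 0xF is treated as positive, 0xD as negative.
    public static decimal Unpack(byte[] packed, int decimalPlaces)
    {
        if (packed == null || packed.Length == 0)
        {
            throw new ArgumentException("Packed field is empty", nameof(packed));
        }

        decimal value = 0m;
        for (int i = 0; i < packed.Length; i++)
        {
            int highNibble = packed[i] >> 4;
            int lowNibble = packed[i] & 0x0F;

            value = value * 10 + CheckDigit(highNibble);
            if (i < packed.Length - 1)
            {
                // Not the last byte: the low nibble is a digit as well
                value = value * 10 + CheckDigit(lowNibble);
            }
            else if (lowNibble == 0x0D)
            {
                value = -value;
            }
            else if (lowNibble != 0x0C && lowNibble != 0x0F)
            {
                throw new FormatException("Unexpected sign nibble: " + lowNibble);
            }
        }

        // Apply the implied decimal point, e.g. 12345 with 2 places is 123.45
        for (int i = 0; i < decimalPlaces; i++)
        {
            value /= 10m;
        }
        return value;
    }

    private static int CheckDigit(int nibble)
    {
        if (nibble > 9)
        {
            throw new FormatException("Not a decimal digit: " + nibble);
        }
        return nibble;
    }
}
```

For example, the three bytes 0x12 0x34 0x5C hold the digits 1, 2, 3, 4, 5 followed by a positive sign, so PackedDecimalSketch.Unpack(bytes, 2) yields 123.45. Five digits and a sign fit into three bytes, where character encoding would need at least five.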
Using the Code
I thought that a concept of a line template, containing one or more field templates, helps a programmer to visualize the structure of the file and makes it easier to extend functionality for more complex tasks.
The parser takes an instance of a LineTemplate and can process either the whole file at once or line by line, which allows you to swap different templates for files with a complex data structure. If you parse the whole file, you can call the CreateCsvFile() method to dump the parsed file in CSV format. Alternatively, you can iterate through an array of parsed lines. Each ParsedLine object contains a dictionary of ParsedField objects. All of the parsing logic is concentrated in the ParsedField class.
The library contains a handful of classes listed below. First of all, you will need to create a LineTemplate for the file you intend to parse. When you make an instance of the LineTemplate class, you'll have to provide the line length (how many bytes per line) and a name (anything that makes sense to you).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;
namespace Ebcdic2Ascii
{
public class LineTemplate
{
public Dictionary<string, FieldTemplate> FieldTemplateDictionary { get; private set; }
public string LineTemplateName { get; private set; }
public int LineSize { get; private set; }
public int FieldsCount
{
get
{
return this.FieldTemplateDictionary.Count;
}
}
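public LineTemplate(int lineSize, string templateName)
{
if (lineSize <= 0)
{
throw new ArgumentOutOfRangeException("lineSize", "Line length must be greater than zero");
}
// The dictionary must exist before any field template can be added
this.FieldTemplateDictionary = new Dictionary<string, FieldTemplate>();
this.LineTemplateName = templateName;
this.LineSize = lineSize;
}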
public void AddFieldTemplate(FieldTemplate fieldTemplate)
{
if ((fieldTemplate.StartPosition + fieldTemplate.FieldSize) > this.LineSize)
{
throw new Exception(String.Format("Field \"{0}\" exceeds line boundary", fieldTemplate.FieldName));
}
this.FieldTemplateDictionary.Add(fieldTemplate.FieldName, fieldTemplate);
}
public string GetFieldNamesCSV(bool addQuotes)
{
StringBuilder sb = new StringBuilder();
int count = 0;
foreach (FieldTemplate fieldTemplate in this.FieldTemplateDictionary.Values)
{
sb.Append(addQuotes ? "\"" : "");
sb.Append(fieldTemplate.FieldName);
sb.Append(addQuotes ? "\"" : "");
// Append a separator after every field name except the last one
sb.Append(count < this.FieldTemplateDictionary.Count - 1 ? "," : "");
count++;
}
return sb.ToString();
}
public XElement GetLineTemplateXml(string templateName)
{
XElement lineXml = new XElement("line",
new XAttribute("templateName", this.LineTemplateName),
new XAttribute("lineSize", this.LineSize),
new XAttribute("fieldCount", this.FieldsCount)
);
foreach (FieldTemplate field in FieldTemplateDictionary.Values)
{
XElement fieldXml = field.GetFieldTemplateXml();
lineXml.Add(fieldXml);
}
return lineXml;
}
public string GetLineTemplateXmlString()
{
XElement xml = this.GetLineTemplateXml(this.LineTemplateName);
return xml.ToString();
}
}
}
In addition, a line template contains a dictionary of FieldTemplate objects. A line template must contain at least one field template for the parser to work. Each field must have a unique name, but field positions may overlap if required.
The FieldType enumeration provides the most common data types for conversion:
- AlphaNum - Regular alphanumeric text encoded in EBCDIC
- Packed - Packed number (COMP-3 / packed decimal)
- Binary - Binary Int16 or Int32
- Numeric - EBCDIC encoded signed or unsigned number
- Date - EBCDIC encoded number representing a date in yyMMdd format
- PackedDate - Packed number representing a date in yyMMdd format
- SourceBytesInHex - Source bytes in hex (raw data)
- SourceBytesInDec - Source bytes in decimal (raw data)
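The signed variant of the Numeric type deserves a short illustration. Once such a field is converted to text, its last character may carry both the final digit and the sign: "{" and "A" to "I" stand for a positive 0 to 9, while "}" and "J" to "R" stand for a negative 0 to 9. The sketch below shows that mapping in isolation. The class ZonedDecimalSketch is hypothetical and only mirrors the idea behind ConvertNumericEbcdic() further down; it is not part of the library.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; not part of the library.
public static class ZonedDecimalSketch
{
    // The index of a character within each string is the digit it represents
    private const string PositiveSigns = "{ABCDEFGHI";
    private const string NegativeSigns = "}JKLMNOPQR";

    // Parses text that has already been converted from EBCDIC,
    // for example "12C" (+123) or "12L" (-123).
    public static long Parse(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            throw new FormatException("Numeric field is empty");
        }

        char last = text[text.Length - 1];
        string leadingDigits = text.Substring(0, text.Length - 1);
        bool isNegative = false;
        int lastDigit;

        if (last >= '0' && last <= '9')
        {
            lastDigit = last - '0'; // unsigned number, no sign character
        }
        else if (PositiveSigns.IndexOf(last) >= 0)
        {
            lastDigit = PositiveSigns.IndexOf(last);
        }
        else if (NegativeSigns.IndexOf(last) >= 0)
        {
            lastDigit = NegativeSigns.IndexOf(last);
            isNegative = true;
        }
        else
        {
            throw new FormatException("Unexpected sign character: " + last);
        }

        // NumberStyles.None accepts digits only, so stray characters are rejected
        long magnitude = long.Parse(leadingDigits + lastDigit,
            NumberStyles.None, CultureInfo.InvariantCulture);
        return isNegative ? -magnitude : magnitude;
    }
}
```

With this mapping, ZonedDecimalSketch.Parse("12C") gives 123 and ZonedDecimalSketch.Parse("12L") gives -123, while a plain "42" is returned unchanged as 42.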
The FieldTemplate class has two constructors. The overloaded constructor takes an extra parameter, int decimalPlaces, used by packed and numeric field types. By default, decimalPlaces equals zero.
using System;
using System.Xml.Linq;
namespace Ebcdic2Ascii
{
public enum FieldType { AlphaNum, Packed, Binary, Numeric, Date,
PackedDate, SourceBytesInHex, SourceBytesInDec }
public class FieldTemplate
{
public string FieldName { get; private set; }
public FieldType Type { get; private set; }
public int StartPosition { get; private set; }
public int FieldSize { get; private set; }
public int DecimalPlaces { get; private set; }
public FieldTemplate(string fieldName, FieldType fieldType,
int startPosition, int fieldSize, int decimalPlaces)
{
this.ValidateInputParameters(fieldName, fieldType, startPosition, fieldSize, decimalPlaces);
this.FieldName = fieldName.Trim();
this.Type = fieldType;
this.StartPosition = startPosition;
this.FieldSize = fieldSize;
this.DecimalPlaces = decimalPlaces;
}
public FieldTemplate(string fieldName, FieldType fieldType, int startPosition, int fieldSize)
: this(fieldName, fieldType, startPosition, fieldSize, 0)
{
}
private void ValidateInputParameters(string fieldName,
FieldType fieldType, int startPosition, int fieldSize, int decimalPlaces)
{
if (fieldName == null || fieldName.Trim().Length == 0)
{
throw new ArgumentNullException("Field name is required for a template");
}
if (startPosition < 0)
{
throw new ArgumentOutOfRangeException(String.Format(
"Start position cannot be negative for a field template \"{0}\"", fieldName));
}
if (fieldSize <= 0)
{
throw new ArgumentOutOfRangeException(String.Format(
"Field size must be greater than zero for a field template \"{0}\"", fieldName));
}
if (fieldType == FieldType.Binary)
{
if (fieldSize != 2 && fieldSize != 4)
{
throw new Exception(String.Format(
"Incorrect number of bytes provided for a binary field template \"{0}\": {1}",
fieldName, fieldSize));
}
}
if (decimalPlaces < 0)
{
throw new ArgumentOutOfRangeException(String.Format(
"Number of decimal places cannot be negative for a field template \"{0}\"", fieldName));
}
if (decimalPlaces > 6)
{
throw new ArgumentOutOfRangeException(String.Format(
"Number of decimal places exceeds limit for a field template \"{0}\"", fieldName));
}
}
public XElement GetFieldTemplateXml()
{
XElement fieldXml = new XElement("field", new XAttribute("name", this.FieldName));
fieldXml.Add(new XElement("type", this.Type, new XAttribute("code", (int)Type)));
fieldXml.Add(new XElement("position", this.StartPosition));
fieldXml.Add(new XElement("length", this.FieldSize));
fieldXml.Add(new XElement("decimalPlaces", this.DecimalPlaces));
return fieldXml;
}
}
}
The static ParserUtilities class provides reusable functionality.
using System;
using System.IO;
namespace Ebcdic2Ascii
{
public static class ParserUtilities
{
public static void PrintError(string errMsg)
{
Console.ForegroundColor = ConsoleColor.Red;
Console.WriteLine(errMsg);
Console.ForegroundColor = ConsoleColor.Gray;
}
public static void ConvertLineArrayToCsv(ParsedLine[] lines,
string outputFilePath, bool includeColumnNames, bool addQuotes)
{
Console.WriteLine("{0}: Writing output file...", DateTime.Now);
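// Uri.IsWellFormedUriString() validates URIs rather than file paths and
// rejected plain relative file names, so only check that a path is provided
if (String.IsNullOrWhiteSpace(outputFilePath))
{
throw new ArgumentException("Output file path is not valid");
}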
if (lines == null || lines.Length == 0)
{
PrintError("Nothing to write");
return;
}
try
{
if (File.Exists(outputFilePath))
{
File.Delete(outputFilePath);
}
using (TextWriter tw = new StreamWriter(outputFilePath, true))
{
if (includeColumnNames && lines.Length > 0)
{
tw.WriteLine(lines[0].Line_Template.GetFieldNamesCSV(addQuotes));
}
foreach (ParsedLine line in lines)
{
tw.WriteLine(line.GetParsedFieldValuesCSV(addQuotes));
}
}
Console.WriteLine("{1}: Output file created {0}",
Path.GetFileName(outputFilePath), DateTime.Now);
}
catch (Exception ex)
{
PrintError(ex.Message);
}
}
public static string ConvertBytesToDec(byte[] bytes)
{
string result = "";
foreach (byte b in bytes)
{
result += (int)b + " ";
}
return result.Trim();
}
public static byte[] ConvertHexStringToBytes(string hexStr)
{
if ((hexStr.Length + 1) % 3 != 0)
{
throw new Exception("Invalid hex string");
}
String[] strArray = hexStr.Split('-');
byte[] byteArray = new byte[strArray.Length];
for (int i = 0; i < strArray.Length; i++)
{
byteArray[i] = Convert.ToByte(strArray[i], 16);
}
return byteArray;
}
public static string RemoveNonAsciiChars(string text)
{
char[] chars = text.ToCharArray();
for (int i = 0; i < chars.Length; i++)
{
if ((int)chars[i] < 32 || (int)chars[i] > 126)
{
chars[i] = ' ';
}
else if (chars[i] == '"' || chars[i] == '^')
{
chars[i] = ' ';
}
}
return new String(chars).Trim();
}
public static byte[] ReadBytesRange(byte[] sourceBytes,
int startPosition, int length, bool throwExceptionIfSourceArrayIsTooShort)
{
byte[] resultBytes;
if (length <= 0)
{
throw new Exception("Invalid array length: " + length);
}
if (startPosition < 0)
{
throw new Exception("Invalid start position: " + startPosition);
}
if (sourceBytes.Length < startPosition)
{
throw new Exception("Start position is outside of array bounds");
}
if (sourceBytes.Length - startPosition - length < 0)
{
if (throwExceptionIfSourceArrayIsTooShort)
{
throw new Exception("End position is outside of array bounds");
}
else
{
length = sourceBytes.Length - startPosition;
}
}
resultBytes = new byte[length];
Array.Copy(sourceBytes, startPosition, resultBytes, 0, length);
return resultBytes;
}
public static byte[] ReadBytesRange(byte[] sourceBytes, int startPosition, int length)
{
bool throwExceptionIfSourceArrayIsTooShort = true;
byte[] resultBytes = ReadBytesRange(sourceBytes,
startPosition, length, throwExceptionIfSourceArrayIsTooShort);
return resultBytes;
}
}
}
The ParsedField class encapsulates the main functionality for converting EBCDIC bytes to ASCII. I need to cite the original author of the Unpack() method, but I could not find the original source. Please let me know if you find out where this code came from.
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;
namespace Ebcdic2Ascii
{
public class ParsedField
{
public FieldTemplate Field_Template { get; private set; }
public string Value { get; private set; }
public byte[] OriginalBytes { get; private set; }
public string OriginalBytesInHex
{
get
{
return BitConverter.ToString(this.OriginalBytes);
}
}
public string OriginalBytesInDec
{
get
{
return ParserUtilities.ConvertBytesToDec(this.OriginalBytes);
}
}
public bool ParsedSuccessfully { get; private set; }
public ParsedField(byte[] lineBytes, FieldTemplate fieldTemplate)
{
this.ParsedSuccessfully = true;
this.Field_Template = fieldTemplate;
this.Value = ParseField(lineBytes, fieldTemplate);
}
private string ParseField(byte[] lineBytes, FieldTemplate template)
{
if (lineBytes == null || lineBytes.Length == 0)
{
ParserUtilities.PrintError("Line bytes is null or empty");
this.ParsedSuccessfully = false;
return null;
}
if (lineBytes.Length < (template.StartPosition + template.FieldSize))
{
this.ParsedSuccessfully = false;
throw new Exception(String.Format(
"Field \"{0}\" length falls outside the line length", template.FieldName));
}
byte[] fieldBytes = new byte[template.FieldSize];
Array.Copy(lineBytes, template.StartPosition, fieldBytes, 0, template.FieldSize);
this.OriginalBytes = fieldBytes;
if (this.Field_Template.Type == FieldType.AlphaNum)
{
return this.ConvertAlphaNumEbcdic(fieldBytes);
}
else if (this.Field_Template.Type == FieldType.Numeric)
{
return this.ConvertNumericEbcdic(fieldBytes, template.DecimalPlaces);
}
else if (this.Field_Template.Type == FieldType.Packed)
{
return this.Unpack(fieldBytes, template.DecimalPlaces);
}
else if (this.Field_Template.Type == FieldType.Binary)
{
return ConvertBinaryEbcdic(fieldBytes, template.DecimalPlaces);
}
else if (this.Field_Template.Type == FieldType.Date)
{
return ConvertDateStrEbcdic(fieldBytes);
}
else if (this.Field_Template.Type == FieldType.PackedDate)
{
return ConvertPackedDateStrEbcdic(fieldBytes);
}
else if (this.Field_Template.Type == FieldType.SourceBytesInHex)
{
return this.OriginalBytesInHex;
}
else if (this.Field_Template.Type == FieldType.SourceBytesInDec)
{
return this.OriginalBytesInDec;
}
else
{
this.ParsedSuccessfully = false;
throw new Exception(String.Format(
"Unable to parse field \"{0}\". Unknown field type: {1}",
template.FieldName, template.Type.ToString()));
}
}
private string ConvertAlphaNumEbcdic(byte[] ebcdicBytes)
{
if (this.ByteArrayIsFullOf_0xFF(ebcdicBytes))
{
return "";
}
Encoding ebcdicEnc = Encoding.GetEncoding("IBM037");
string result = ebcdicEnc.GetString(ebcdicBytes);
return result;
}
private string ConvertNumericEbcdic(byte[] ebcdicBytes, int decimalPlaces)
{
string tempNumStr = this.ConvertAlphaNumEbcdic(ebcdicBytes).Trim();
if (tempNumStr == null || tempNumStr.Length == 0)
{
return "";
}
if (Regex.IsMatch(tempNumStr, @"^\d+$"))
{
string result = this.AdjustDecimalValues(Int64.Parse(tempNumStr), decimalPlaces);
return result;
}
else if (Regex.IsMatch(tempNumStr, @"^\d+[A-R{}]$"))
{
string lastChar = tempNumStr.Substring(tempNumStr.Length - 1);
switch (lastChar)
{
case "{":
tempNumStr = tempNumStr.Replace("{", "0");
break;
case "A":
tempNumStr = tempNumStr.Replace("A", "1");
break;
case "B":
tempNumStr = tempNumStr.Replace("B", "2");
break;
case "C":
tempNumStr = tempNumStr.Replace("C", "3");
break;
case "D":
tempNumStr = tempNumStr.Replace("D", "4");
break;
case "E":
tempNumStr = tempNumStr.Replace("E", "5");
break;
case "F":
tempNumStr = tempNumStr.Replace("F", "6");
break;
case "G":
tempNumStr = tempNumStr.Replace("G", "7");
break;
case "H":
tempNumStr = tempNumStr.Replace("H", "8");
break;
case "I":
tempNumStr = tempNumStr.Replace("I", "9");
break;
case "}":
tempNumStr = "-" + tempNumStr.Replace("}", "0");
break;
case "J":
tempNumStr = "-" + tempNumStr.Replace("J", "1");
break;
case "K":
tempNumStr = "-" + tempNumStr.Replace("K", "2");
break;
case "L":
tempNumStr = "-" + tempNumStr.Replace("L", "3");
break;
case "M":
tempNumStr = "-" + tempNumStr.Replace("M", "4");
break;
case "N":
tempNumStr = "-" + tempNumStr.Replace("N", "5");
break;
case "O":
tempNumStr = "-" + tempNumStr.Replace("O", "6");
break;
case "P":
tempNumStr = "-" + tempNumStr.Replace("P", "7");
break;
case "Q":
tempNumStr = "-" + tempNumStr.Replace("Q", "8");
break;
case "R":
tempNumStr = "-" + tempNumStr.Replace("R", "9");
break;
}
string result = this.AdjustDecimalValues(Int64.Parse(tempNumStr), decimalPlaces);
return result;
}
else
{
this.ParsedSuccessfully = false;
return tempNumStr;
}
}
private string ConvertBinaryEbcdic(byte[] ebcdicBytes, int decimalPlaces)
{
if (this.ByteArrayIsFullOf_0xFF(ebcdicBytes))
{
return "";
}
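// Work on a copy so that OriginalBytes keeps the source byte order
ebcdicBytes = (byte[])ebcdicBytes.Clone();
// Mainframe integers are big-endian; reverse them on little-endian hosts
if (BitConverter.IsLittleEndian)
{
Array.Reverse(ebcdicBytes);
}
long tempNum;
if (ebcdicBytes.Length == 2)
{
tempNum = BitConverter.ToUInt16(ebcdicBytes, 0);
}
else if (ebcdicBytes.Length == 4)
{
tempNum = BitConverter.ToInt32(ebcdicBytes, 0);
}
else
{
throw new Exception(String.Format(
"Incorrect number of bytes provided for a binary field: {0}", ebcdicBytes.Length));
}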
string result = this.AdjustDecimalValues(tempNum, decimalPlaces);
return result;
}
private string AdjustDecimalValues(long numericValue, int decimalPlaces)
{
if (decimalPlaces == 0)
{
return numericValue.ToString();
}
double result = numericValue / Math.Pow(10, decimalPlaces);
return result.ToString();
}
private string ConvertDateStrEbcdic(byte[] ebcdicBytes)
{
string dateStr = this.ConvertAlphaNumEbcdic(ebcdicBytes).Trim();
string result = this.ConvertDateStr(dateStr);
return result;
}
private string ConvertPackedDateStrEbcdic(byte[] ebcdicBytes)
{
string dateStr = this.Unpack(ebcdicBytes, 0);
string result = this.ConvertDateStr(dateStr);
return result;
}
private string ConvertDateStr(string dateStr)
{
dateStr = dateStr.Trim();
if (dateStr.Trim() == "" || dateStr == "0" ||
dateStr == "0000000" || dateStr == "9999999")
{
return "";
}
if (Regex.IsMatch(dateStr, @"^\d{3,5}$"))
{
dateStr = dateStr.PadLeft(6, '0');
}
Match match = Regex.Match(dateStr, @"^(?<Year>\d{3})(?<Month>\d{2})(?<Day>\d{2})$");
if (match.Success)
{
int year = Int32.Parse(match.Groups["Year"].Value) + 1900;
int month = Int32.Parse(match.Groups["Month"].Value);
int day = Int32.Parse(match.Groups["Day"].Value);
try
{
DateTime tempDate = new DateTime(year, month, day);
return tempDate.ToString("yyyy-MM-dd");
}
catch { }
}
if (Regex.IsMatch(dateStr, @"^\d{6}$"))
{
DateTime tempDate;
if (DateTime.TryParseExact(dateStr, "yyMMdd",
CultureInfo.InvariantCulture, DateTimeStyles.None, out tempDate))
{
return tempDate.ToString("yyyy-MM-dd");
}
}
this.ParsedSuccessfully = false;
return dateStr;
}
private string Unpack(byte[] ebcdicBytes, int decimalPlaces)
{
if (ByteArrayIsFullOf_0xFF(ebcdicBytes))
{
return "";
}
long lo = 0;
long mid = 0;
long hi = 0;
bool isNegative;
switch (Nibble(ebcdicBytes, 0))
{
case 0x0D:
isNegative = true;
break;
case 0x0F:
case 0x0C:
isNegative = false;
break;
default:
this.ParsedSuccessfully = false;
return this.ConvertAlphaNumEbcdic(ebcdicBytes);
}
long intermediate;
long carry;
long digit;
for (int j = ebcdicBytes.Length * 2 - 1; j > 0; j--)
{
intermediate = lo * 10;
lo = intermediate & 0xffffffff;
carry = intermediate >> 32;
intermediate = mid * 10 + carry;
mid = intermediate & 0xffffffff;
carry = intermediate >> 32;
intermediate = hi * 10 + carry;
hi = intermediate & 0xffffffff;
carry = intermediate >> 32;
digit = Nibble(ebcdicBytes, j);
if (digit > 9)
{
this.ParsedSuccessfully = false;
return this.ConvertAlphaNumEbcdic(ebcdicBytes);
}
intermediate = lo + digit;
lo = intermediate & 0xffffffff;
carry = intermediate >> 32;
if (carry > 0)
{
intermediate = mid + carry;
mid = intermediate & 0xffffffff;
carry = intermediate >> 32;
if (carry > 0)
{
intermediate = hi + carry;
hi = intermediate & 0xffffffff;
carry = intermediate >> 32;
}
}
}
decimal result = new Decimal((int)lo, (int)mid, (int)hi, isNegative, (byte)decimalPlaces);
return result.ToString();
}
private int Nibble(byte[] ebcdicBytes, int nibbleNo)
{
int b = ebcdicBytes[ebcdicBytes.Length - 1 - nibbleNo / 2];
return (nibbleNo % 2 == 0) ? (b & 0x0000000F) : (b >> 4);
}
private bool ByteArrayIsFullOf_0xFF(byte[] ebcdicBytes)
{
if (ebcdicBytes == null || ebcdicBytes.Length == 0)
{
return false;
}
foreach (byte b in ebcdicBytes)
{
if (b != 0xFF)
{
return false;
}
}
return true;
}
}
}
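The character conversion in ConvertAlphaNumEbcdic() comes down to a single framework call, which can be tried in isolation. The sketch below assumes .NET Core 3.0 or later (including .NET 5+), where code page encodings such as IBM037 have to be registered before use. On the classic .NET Framework, the encoding is available without the registration line. The class name EbcdicTextSketch is made up for this example.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of the library.
public static class EbcdicTextSketch
{
    static EbcdicTextSketch()
    {
        // Assumption: running on .NET Core 3.0+ / .NET 5+, where code page
        // encodings are only available after the provider is registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes EBCDIC (code page 037) bytes into a .NET string
    public static string Decode(byte[] ebcdicBytes)
    {
        if (ebcdicBytes == null)
        {
            throw new ArgumentNullException(nameof(ebcdicBytes));
        }
        Encoding ebcdic = Encoding.GetEncoding("IBM037");
        return ebcdic.GetString(ebcdicBytes);
    }
}
```

In code page 037, the bytes 0xC8 0xC5 0xD3 0xD3 0xD6 decode to "HELLO" and the bytes 0xF1 0xF2 0xF3 decode to "123". Other mainframes may use a different EBCDIC code page, in which case the encoding name would need to change accordingly.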
ParsedLine contains a dictionary of parsed fields as well as a means of accessing them.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Ebcdic2Ascii
{
public class ParsedLine
{
public LineTemplate Line_Template { get; private set; }
public Dictionary<string, ParsedField> FieldDictionary
{ get; private set; }
public string this[string fieldName]
{
get
{
return this.FieldDictionary[fieldName].Value.Trim();
}
}
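public ParsedLine(LineTemplate lineTemplate, byte[] lineBytes)
{
this.Line_Template = lineTemplate;
// The dictionary must exist before ParseLine() adds parsed fields to it
this.FieldDictionary = new Dictionary<string, ParsedField>();
this.ParseLine(lineBytes, lineTemplate);
}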
private void ParseLine(byte[] lineBytes, LineTemplate lineTemplate)
{
this.ValidateInputParameters(lineBytes, lineTemplate);
foreach (var fieldTemplate in lineTemplate.FieldTemplateDictionary)
{
FieldDictionary.Add(fieldTemplate.Key,
new ParsedField(lineBytes, lineTemplate.FieldTemplateDictionary[fieldTemplate.Key]));
}
}
private void ValidateInputParameters(byte[] lineBytes, LineTemplate template)
{
if (lineBytes == null)
{
throw new ArgumentNullException("lineBytes", "Line bytes required");
}
// Check the template for null before reading any of its properties
if (template == null)
{
throw new ArgumentNullException("template", "Line template is required");
}
if (lineBytes.Length < template.LineSize)
{
throw new Exception(String.Format(
"Bytes provided: {0}, line size: {1}", lineBytes.Length, template.LineSize));
}
if (template.FieldsCount == 0)
{
throw new Exception("Field templates have not been defined in the line template");
}
}
public string GetParsedFieldValuesCSV(bool addQuotes)
{
StringBuilder sb = new StringBuilder();
int count = 0;
foreach (ParsedField parsedField in this.FieldDictionary.Values)
{
sb.Append(addQuotes ? "\"" : "");
sb.Append(parsedField.Value);
sb.Append(addQuotes ? "\"" : "");
// Append a separator after every value except the last one
sb.Append(count < this.FieldDictionary.Count - 1 ? "," : "");
count++;
}
return sb.ToString();
}
}
}
EbcdicParser is the manager class which takes care of applying a template to a given EBCDIC file. It has only a few public methods and a single public property called Lines, which gives you an array of type ParsedLine.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
namespace Ebcdic2Ascii
{
public class EbcdicParser
{
public ParsedLine[] Lines { get; private set; }
#region Constructors
public EbcdicParser()
{
}
public EbcdicParser(byte[] allBytes, LineTemplate lineTemplate)
{
double expectedRows = (double)allBytes.Length / lineTemplate.LineSize;
Console.WriteLine("{0}: Parsing started", DateTime.Now);
Console.WriteLine("{1}: Line count est {0:#,###.00}", expectedRows, DateTime.Now);
this.Lines = this.ParseAllLines(lineTemplate, allBytes);
GC.Collect();
GC.WaitForPendingFinalizers();
Console.WriteLine("{1}: {0} line(s) have been parsed", this.Lines.Count(), DateTime.Now);
}
public EbcdicParser(string sourceFilePath, LineTemplate lineTemplate)
: this(File.ReadAllBytes(sourceFilePath), lineTemplate)
{
}
#endregion
public ParsedLine[] ParseAllLines(LineTemplate lineTemplate, byte[] allBytes)
{
bool isSingleLine = false;
this.ValidateInputParameters(lineTemplate, allBytes, isSingleLine);
List<ParsedLine> parsedLines = new List<ParsedLine>();
byte[] lineBytes = new byte[lineTemplate.LineSize];
ParsedLine parsedLine;
for (int i = 0; i < allBytes.Length; i += lineTemplate.LineSize)
{
if (i % 1000 == 0)
{
Console.Write(i + "\r");
}
Array.Copy(allBytes, i, lineBytes, 0, lineTemplate.LineSize);
parsedLine = this.ParseSingleLine(lineTemplate, lineBytes);
parsedLines.Add(parsedLine);
}
return parsedLines.ToArray();
}
public ParsedLine[] ParseAllLines(LineTemplate lineTemplate, string sourceFilePath)
{
return this.ParseAllLines(lineTemplate, File.ReadAllBytes(sourceFilePath));
}
public ParsedLine ParseSingleLine(LineTemplate lineTemplate, byte[] lineBytes)
{
bool isSingleLine = true;
this.ValidateInputParameters(lineTemplate, lineBytes, isSingleLine);
ParsedLine parsedLine = new ParsedLine(lineTemplate, lineBytes);
return parsedLine;
}
private bool ValidateInputParameters(LineTemplate lineTemplate, byte[] allBytes, bool isSingleLine)
{
if (allBytes == null)
{
throw new ArgumentNullException("Ebcdic data is not provided");
}
if (lineTemplate == null)
{
throw new ArgumentNullException("Line template is not provided");
}
if (lineTemplate.FieldsCount == 0)
{
throw new Exception("Line template must contain at least one field");
}
if (allBytes.Length < lineTemplate.LineSize)
{
throw new Exception("Data length is shorter than the line size");
}
if (isSingleLine && allBytes.Length != lineTemplate.LineSize)
{
throw new Exception("Bytes count doesn't equal to line size");
}
double expectedRows = (double)allBytes.Length / lineTemplate.LineSize;
if (expectedRows % 1 != 0)
{
throw new Exception("Expected number of rows is not a whole number. Check line template.");
}
return true;
}
public void CreateCsvFile(string outputFilePath, bool includeColumnNames, bool addQuotes)
{
if (this.Lines == null || this.Lines.Length == 0)
{
throw new Exception("No lines have been parsed");
}
ParserUtilities.ConvertLineArrayToCsv(this.Lines, outputFilePath, includeColumnNames, addQuotes);
}
}
}
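The core of ParseAllLines() is cutting a flat byte array into fixed-length records, since mainframe dump files of this kind have no line terminators. The following stand-alone sketch shows only that step, under the assumption that the file length is an exact multiple of the record length. The class FixedLengthRecordSketch is hypothetical and not part of the library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the library.
public static class FixedLengthRecordSketch
{
    // Splits a byte array into consecutive records of lineSize bytes each
    public static List<byte[]> Split(byte[] allBytes, int lineSize)
    {
        if (allBytes == null)
        {
            throw new ArgumentNullException(nameof(allBytes));
        }
        if (lineSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(lineSize));
        }
        if (allBytes.Length % lineSize != 0)
        {
            // A remainder usually means the line size in the template is wrong
            throw new ArgumentException("Data length is not a multiple of the line size");
        }

        var records = new List<byte[]>(allBytes.Length / lineSize);
        for (int offset = 0; offset < allBytes.Length; offset += lineSize)
        {
            // Each record gets its own array, so later edits cannot affect others
            byte[] record = new byte[lineSize];
            Array.Copy(allBytes, offset, record, 0, lineSize);
            records.Add(record);
        }
        return records;
    }
}
```

Splitting six bytes with a line size of two gives three records. Each record could then be handed to ParseSingleLine() together with whichever LineTemplate matches that record, which is how files with mixed record layouts can be handled.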
Finally, to run the program, we create an instance of the LineTemplate class, add several FieldTemplate objects to it and pass it on to EbcdicParser. Then, we can access the array of ParsedLine objects via the Lines property and read every field within each line.
class Program
{
static void Main(string[] args)
{
LineTemplate lineTemplate = new LineTemplate(200, "Accounts_SourceFileTemplate");
lineTemplate.AddFieldTemplate(new FieldTemplate("RecordType", FieldType.AlphaNum, 0, 2));
lineTemplate.AddFieldTemplate(new FieldTemplate("CustomerNumber", FieldType.Numeric, 2, 4));
lineTemplate.AddFieldTemplate(new FieldTemplate("FirstName", FieldType.AlphaNum, 6, 30));
lineTemplate.AddFieldTemplate(new FieldTemplate("LastName", FieldType.AlphaNum, 36, 30));
lineTemplate.AddFieldTemplate(new FieldTemplate("DateOfBirth", FieldType.PackedDate, 66, 2));
lineTemplate.AddFieldTemplate(new FieldTemplate("BalanceOutstanding", FieldType.Packed, 68, 2));
lineTemplate.AddFieldTemplate(new FieldTemplate("SomeStrangeData", FieldType.AlphaNum, 70, 35));
lineTemplate.AddFieldTemplate(new FieldTemplate("FileRunDate", FieldType.Date, 105, 7));
lineTemplate.AddFieldTemplate(new FieldTemplate("RentalDays", FieldType.Packed, 112, 2));
lineTemplate.AddFieldTemplate(new FieldTemplate("TheWholeLine",
FieldType.AlphaNum, 0, 200));
EbcdicParser parser = new EbcdicParser(@"C:\temp\sourceFile.dat", lineTemplate);
foreach (ParsedLine line in parser.Lines)
{
Console.WriteLine("{0} {1} {2}", line["CustomerNumber"],
line["FirstName"], line["DateOfBirth"]);
}
}
}
I hope you'll find it useful.
Points of Interest
The parser is rather quick: it takes a few seconds to parse source files that may be hundreds of megabytes in size.
History
- 2013-10-28: Uploaded source code, added more "background" and "using the code" information
- 2015-03-12: Source code has been refactored. Added IBM935 custom decoder for simplified Chinese characters. Added ability to export template to XML and use XML files to initialize templates.