Overview
Dynamsoft’s OCR SDK is an add-on for Dynamic .NET TWAIN, an image
acquisition SDK optimized for .NET applications. The OCR SDK allows you to
convert scanned images to searchable PDF/text files.
OCR is widely recognized as a useful feature, but it is not easy to
implement. Getting good results involves many complicating factors, such as
recognition accuracy and image format. OCR performance is another important
factor that affects the efficiency of the whole process.
Dynamsoft’s OCR SDK, which is built on and optimized from the mature
open source Tesseract OCR engine, relieves you of these burdens. By
integrating it with Dynamic .NET TWAIN, you can create a robust image
acquisition and processing solution in a few lines of source code.
Key Features
- Supports more than 40 languages, including Arabic and various Asian languages.
- High OCR performance by supporting multi-thread processing.
- Accurate recognition with font identification.
- Easy integration with the image acquisition SDK, Dynamic .NET TWAIN.
The following sections show you how to integrate the OCR add-on into your
WinForm application and convert scanned images to searchable PDF/text files.
Source Code
1. Embed Dynamic .NET TWAIN in your WinForm or WPF app.
We will take WinForm as an example.
This assumes you’ve already downloaded and installed the .NET
component on your development machine. If not, please download
the 30-day free trial from Dynamsoft’s website.
Open your WinForm app or create a new one in Visual Studio. From
the Tools menu, select Choose Toolbox Items. In the dialog box that
appears, click Browse and select DynamicDotNetTWAIN.dll, which can be
found in the installation folder of Dynamic .NET TWAIN. Click OK to
close the dialog box.
Drag and drop the component onto the form.
2. Scan images from scanners or webcams, or load them from local folders.
Dynamic .NET TWAIN supports getting images from various
sources, including scanners, webcams and other TWAIN/WIA/UVC compatible
devices. In this article, I’ll show you how to load an existing image from your
local disk.
SetViewMode: defines the view mode of the control.
LoadImage: loads existing local images. Supported image formats include
BMP, PNG, JPEG, TIFF (both single and multi-page) and PDF (both single and
multi-page).
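// Define the view mode of the control.
this.dynamicDotNetTwain1.SetViewMode(1, 1);
OpenFileDialog filedlg = new OpenFileDialog();
// Allow several files to be picked so the loop below can load them all.
filedlg.Multiselect = true;
if (filedlg.ShowDialog() == DialogResult.OK)
{
    foreach (string strfilename in filedlg.FileNames)
    {
        // Load each selected file into the component's image buffer.
        this.dynamicDotNetTwain1.LoadImage(strfilename);
    }
}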
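Since LoadImage only accepts the formats listed above, it can help to restrict the file dialog to them. The following is a minimal sketch, not part of Dynamic .NET TWAIN: the ImageFilterSketch class is a hypothetical helper, and the exact set of file extensions is my assumption based on the format names in this article.

```csharp
using System;

// Hypothetical helper (not part of the SDK): builds a WinForms file-dialog
// filter string for the image formats named in this article.
public static class ImageFilterSketch
{
    // Assumed extensions for BMP, PNG, JPEG, TIFF and PDF.
    private static readonly string[] Patterns =
        { "*.bmp", "*.png", "*.jpg", "*.jpeg", "*.tif", "*.tiff", "*.pdf" };

    public static string BuildFilter()
    {
        string joined = string.Join(";", Patterns);
        // Filter syntax is "description|pattern1;pattern2".
        return "Supported images (" + joined + ")|" + joined;
    }
}
```

To use it, assign the result before showing the dialog: filedlg.Filter = ImageFilterSketch.BuildFilter();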
3. Initialize the OCR add-on and choose the language package.
1) Choose the language package and define the path of the package by using
the OCRTessDataPath property.
Dynamsoft’s OCR SDK supports more than 40 languages, including English,
Spanish and Arabic. The sample code below chooses English as the default
language. Other language packages can be downloaded from Dynamsoft’s
website: OCR SDK Language Packages
string languageFolder = Application.StartupPath;
this.dynamicDotNetTwain1.OCRTessDataPath = languageFolder;
this.dynamicDotNetTwain1.OCRLanguage = "eng";
2) Set the path of DynamicOCR.dll or DynamicOCRx64.dll to initialize the OCR add-on.
this.dynamicDotNetTwain1.OCRDllPath = "";
3) Choose the OCR result file format and save the result. Supported file
formats include Text, PDF Plain Text and PDF Image over Text. When the
format is set to PDF Image over Text, the detailed image/text positions and
formatting, such as font names, font sizes and line widths, are preserved
as in the original.
// The combo box order is expected to match the ResultFormat enum values.
this.dynamicDotNetTwain1.OCRResultFormat = (Dynamsoft.DotNet.TWAIN.OCR.ResultFormat)this.ddlResultFormat.SelectedIndex;
// Run OCR on the images currently selected in the buffer.
byte[] sbytes = this.dynamicDotNetTwain1.OCR(this.dynamicDotNetTwain1.CurrentSelectedImageIndicesInBuffer);
if (sbytes != null)
{
    SaveFileDialog filedlg = new SaveFileDialog();
    // Index 0 is plain text; the other formats are PDF.
    if (this.ddlResultFormat.SelectedIndex != 0)
    {
        filedlg.Filter = "PDF File(*.pdf)|*.pdf";
    }
    else
    {
        filedlg.Filter = "Text File(*.txt)|*.txt";
    }
    if (filedlg.ShowDialog() == DialogResult.OK)
    {
        // WriteAllBytes creates the file or overwrites it completely.
        File.WriteAllBytes(filedlg.FileName, sbytes);
    }
}
else
{
    // OCR failed; show the error reported by the component.
    MessageBox.Show(this.dynamicDotNetTwain1.ErrorString);
}
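The sample casts the combo box index straight to the result format, which goes wrong if nothing is selected (SelectedIndex is -1) or if the list order drifts from the enum. The sketch below shows one way to guard that cast. SketchResultFormat is a hypothetical stand-in for the SDK's ResultFormat enum: its member names and values are my assumptions based on the formats listed above, not the SDK's actual definition.

```csharp
using System;

// Hypothetical stand-in for the SDK's ResultFormat enum; names and values
// are assumptions derived from the formats listed in this article.
public enum SketchResultFormat
{
    Text = 0,
    PdfPlainText = 1,
    PdfImageOverText = 2
}

public static class ResultFormatSketch
{
    // Converts a combo-box index to a format, rejecting -1 (no selection)
    // and any index that has no matching enum member.
    public static bool TryFromIndex(int selectedIndex, out SketchResultFormat format)
    {
        format = SketchResultFormat.Text;
        if (!Enum.IsDefined(typeof(SketchResultFormat), selectedIndex))
        {
            return false;
        }
        format = (SketchResultFormat)selectedIndex;
        return true;
    }

    // Picks the save-dialog filter the same way the sample does:
    // plain text gets .txt, both PDF variants get .pdf.
    public static string SaveFilterFor(SketchResultFormat format)
    {
        return format == SketchResultFormat.Text
            ? "Text File(*.txt)|*.txt"
            : "PDF File(*.pdf)|*.pdf";
    }
}
```

In a real form you would apply the same Enum.IsDefined check against the SDK's own enum type before assigning OCRResultFormat, and skip the OCR call when the check fails.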
Distribution
To distribute the application to end users, copy the following files to
the client machine along with the EXE file:
- The language package
- DynamicOCR.dll (for 32-bit Windows OS) and/or DynamicOCRx64.dll (for 64-bit Windows OS)
- DynamicDotNetTwain.dll
Xcopy deployment is also supported.
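A missing file on the client machine usually only shows up when OCR is first called, so a startup check can save some support time. Here is a minimal sketch of such a check. DeploymentCheckSketch is a hypothetical helper, not part of the SDK. It assumes all files sit in one folder next to the EXE, and that the OCR DLL has to match the bitness of the running process, as native libraries on Windows generally must. The language package file names are passed in by the caller because this article does not list them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of the SDK): reports which of the files
// listed in the Distribution section are missing from a folder.
public static class DeploymentCheckSketch
{
    // Name of the OCR add-on DLL for the given process bitness.
    public static string OcrDllName(bool is64BitProcess)
    {
        return is64BitProcess ? "DynamicOCRx64.dll" : "DynamicOCR.dll";
    }

    // Returns the required file names that are not found in appFolder.
    // languageFiles holds the file names of your language package; the
    // caller supplies them, this sketch does not guess them.
    public static List<string> FindMissingFiles(
        string appFolder, bool is64BitProcess, IEnumerable<string> languageFiles)
    {
        var required = new List<string>
        {
            "DynamicDotNetTwain.dll",
            OcrDllName(is64BitProcess)
        };
        required.AddRange(languageFiles);

        var missing = new List<string>();
        foreach (string name in required)
        {
            if (!File.Exists(Path.Combine(appFolder, name)))
            {
                missing.Add(name);
            }
        }
        return missing;
    }
}
```

In a WinForm app you could call FindMissingFiles(Application.StartupPath, Environment.Is64BitProcess, yourLanguageFiles) at startup and show the returned names in a message box if the list is not empty. Environment.Is64BitProcess requires .NET Framework 4.0 or later.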
Resources
The complete source code of the OCR sample can be downloaded from the
article. To test and/or customize the code, you can download the trial version
of Dynamic .NET TWAIN from Dynamsoft’s website.
Download Dynamic .NET TWAIN 30-Day Free Trial
Other demos/samples of .NET image acquisition and processing
can be found here: Dynamic .NET TWAIN Demos
If you have any questions, you can contact our support team at
nettwain@dynamsoft.com.