Here we build on the app we’ve created by adding a page that lets the user upload a scanned image of a form. Then, we demonstrate how to take the data returned by the API and save it to the database.
Synopsis
This three-article series demonstrates how to use Azure Form Recognizer to build a realistic end-to-end form recognition app using Java and Spring Boot.
- Part 1 — App Creation and Deployment via Azure App Service
- Part 2 — Adding Image Upload to the Spring Boot App and Processing via Form Recognizer
- Part 3 — Making Practical Use of Data Returned by Form Recognizer
The complete code repository is available on GitHub.
Introduction
We created our project and infrastructure in the first article in this series.
Here in the second article, we’ll modify the app to handle image uploads. In our case, the images contain receipts that will be sent to the Azure Form Recognizer for processing. We’ll then store three of the recognized fields (MerchantName, TransactionDate, and Total) in our PostgreSQL database, which we prepared in Part 1.
After completing the steps in this article, our app will be able to recognize receipts and output the selected fields to the terminal.
Our application will be capable of applying machine learning to solve a real-world business problem. Usually, business travelers need to provide scans of their receipts to receive reimbursement. Often, these scans are not processed automatically. Instead, accountants need to enter the data into another system. We shorten this process by automating receipt recognition and data ingestion. Let’s continue our app development to achieve this.
Setting up the Form Recognizer
First, we need to provision Form Recognizer in Azure. You can do this using the Azure CLI or the Azure Portal; we’ll use the Azure Portal. Use the search box to look up Form Recognizers, and then click the Create form recognizer button in the center of the Applied AI Services view. This opens another wizard:
In the Create Form Recognizer form, follow these steps:
- Select DB-General as your Subscription.
- Select recognition as your Resource group.
- Choose an Azure Region for your instance. We set our region to East US.
- Enter the globally unique name you set for your app when deploying to the Azure App Service in Part 1. We set our app name to db-receipt-recognizer-82.
- Select a pricing tier. We set our tier to Free F0.
The Free F0 pricing tier enables us to use Azure Form Recognizer for free with a restriction of 500 pages per month. This is more than enough for our proof of concept. When setting up the Form Recognizer, ensure that you can access it from any network.
After the service is provisioned, navigate to Keys and Endpoint under Resource Management.
Then, open application.properties and add the last two lines of code in the following example:
logging.level.org.springframework.jdbc.core=DEBUG
spring.datasource.url=<YOUR_POSTGRES_URL>
spring.datasource.username=<YOUR_USERNAME>
spring.datasource.password=<YOUR_PASSWORD>
spring.jpa.hibernate.ddl-auto=update
server.port=80
azure.form.recognizer.key=<YOUR_FORM_RECOGNIZER_KEY>
azure.form.recognizer.endpoint=<YOUR_FORM_RECOGNIZER_ENDPOINT>
This stores your endpoint and one of your two keys.
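A placeholder left in application.properties tends to surface only later, as an authentication error from the service. As a minimal sketch, assuming a hypothetical helper class named RecognizerConfigCheck (it is not part of the companion code), the check below uses only java.util.Properties to report which of the two settings are missing, blank, or still hold a <...> placeholder:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of the companion code.
final class RecognizerConfigCheck {
    static final String KEY = "azure.form.recognizer.key";
    static final String ENDPOINT = "azure.form.recognizer.endpoint";

    // Returns the names of settings that are absent, blank, or still a <PLACEHOLDER>.
    static List<String> missingSettings(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String name : new String[] {KEY, ENDPOINT}) {
            String value = props.getProperty(name, "").trim();
            boolean placeholder = value.startsWith("<") && value.endsWith(">");
            if (value.isEmpty() || placeholder) {
                missing.add(name);
            }
        }
        return missing;
    }

    // Parses properties-file text; convenient for trying the check without a file.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read properties", e);
        }
        return props;
    }
}
```

An empty list means both settings look usable. A non-empty list names the properties to fix before starting the app.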
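Finally, add the azure-sdk-bom and azure-ai-formrecognizer packages to your pom.xml file. Maven only honors the import scope inside a dependencyManagement section, so place the azure-sdk-bom entry there and add azure-ai-formrecognizer to the regular dependencies group: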
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-sdk-bom</artifactId>
    <version>1.1.1</version>
    <type>pom</type>
    <scope>import</scope>
</dependency>
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-ai-formrecognizer</artifactId>
    <version>3.1.8</version>
</dependency>
Image Upload View
Let’s now modify the upload.html view.
In resources/templates/upload.html, add the code between the <!-- comment --> tags in the following example to create a form that enables users to upload their images:
<body class="w3-black">
  <header class="w3-container w3-padding-32 w3-center w3-black">
    <h1 class="w3-jumbo">Form recognizer</h1>
    <p>Upload a new file</p>
  </header>
  <!-- Upload form: start -->
  <form class="w3-container w3-padding-32"
        method="POST"
        enctype="multipart/form-data"
        action="/upload"
        style="margin:auto;width:40%">
    <div class="w3-container">
      <input type="file" name="file" />
    </div>
    <div class="w3-container">
      <input type="submit" value="Upload"/>
    </div>
  </form>
  <!-- Upload form: end -->
</body>
</html>
After rendering, the upload view looks like this:
Here, we only have one form element, which allows us to upload an image. In a real-world scenario, we might supplement the form with elements like a field to submit the user’s name, or add proper validation to limit the image size, image format, and so on.
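The validation mentioned above could be sketched on the server side as follows. This is only an illustration: UploadValidator is a hypothetical class, and both the 4 MB cap and the list of accepted content types are assumptions chosen for the example, not limits taken from this tutorial.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper; the size limit and type list are assumptions for illustration.
final class UploadValidator {
    static final long MAX_BYTES = 4L * 1024 * 1024; // assumed 4 MB cap
    static final Set<String> ALLOWED_TYPES =
        new HashSet<>(Arrays.asList("image/jpeg", "image/png", "application/pdf"));

    // Returns an error message for a rejected upload, or empty if it is acceptable.
    static Optional<String> validate(String contentType, long sizeBytes) {
        if (sizeBytes <= 0) {
            return Optional.of("The uploaded file is empty.");
        }
        if (sizeBytes > MAX_BYTES) {
            return Optional.of("The uploaded file exceeds " + MAX_BYTES + " bytes.");
        }
        if (contentType == null
                || !ALLOWED_TYPES.contains(contentType.toLowerCase(Locale.ROOT))) {
            return Optional.of("Unsupported file type: " + contentType);
        }
        return Optional.empty();
    }
}
```

In a Spring controller, the two arguments could come from getContentType() and getSize() on the uploaded MultipartFile, and a present error message could be shown to the user instead of sending the file for recognition.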
Handling Image Upload
Next, we need to implement the logic that handles the image upload on the controller end. This requires us to send the image for recognition after authorization within the Azure Form Recognizer service.
To make our code more generic, let's use Azure Form Recognizer’s key and endpoint from application.properties. To obtain these values at runtime, we’ll modify the FormRecognitionController by adding two fields. You can refer to the companion code to see the full class.
In the FormRecognitionController class, add the following code:
@Value("${azure.form.recognizer.key}")
private String key;
@Value("${azure.form.recognizer.endpoint}")
private String endpoint;
These two fields use the @Value annotation to retrieve the corresponding values from application.properties. Note that we only need to provide the property name, wrapped in a ${...} placeholder, as the argument of the @Value annotation.
Then, we need to implement the controller’s method, handleFileUpload, which will be invoked whenever the user submits the image upload form.
Add the following code to FormRecognitionController:
@PostMapping("/upload")
public String handleFileUpload(@RequestParam("file") MultipartFile file) {
    // Build a client authenticated with the key and endpoint from application.properties.
    FormRecognizerClient formRecognizerClient = new FormRecognizerClientBuilder()
        .credential(new AzureKeyCredential(key))
        .endpoint(endpoint)
        .buildClient();

    try (InputStream receiptImage = file.getInputStream()) {
        // Start receipt recognition, then block until the final result is available.
        SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> syncPoller =
            formRecognizerClient.beginRecognizeReceipts(receiptImage, file.getSize());
        List<RecognizedForm> recognizedForms = syncPoller.getFinalResult();

        // Process the first recognized form, if there is one.
        if (recognizedForms.size() >= 1) {
            final RecognizedForm recognizedForm = recognizedForms.get(0);
            RecognitionResult recognitionResult = ExtractFormFields(file, recognizedForm);
            resultsRepository.save(recognitionResult);
            System.out.println("\n\n--== Recognition result ==--\n\n"
                + recognitionResult.toString());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return "index";
}
First, the method creates an instance of the FormRecognizerClient using the FormRecognizerClientBuilder. The client builder requires our Azure Form Recognizer credentials, which we already have available to retrieve from the key and endpoint fields.
Then, the handleFileUpload method starts an asynchronous receipt recognition. To that end, we call beginRecognizeReceipts on the FormRecognizerClient instance. The beginRecognizeReceipts method takes two arguments: the input stream containing the uploaded image, and the file size.
On the back end, beginRecognizeReceipts sends our image to Azure Form Recognizer for processing. The underlying process uses the default, pre-trained machine learning model, which recognizes specific elements of receipt images. In this case, we only pass one image, but you can send multiple images at once. When beginRecognizeReceipts completes, we need to retrieve and interpret the recognition results.
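The begin prefix reflects a long-running operation: the call starts the work on the service, and the returned SyncPoller is then asked for the final result, which blocks until recognition finishes. The following is only a conceptual sketch of that start-then-poll pattern. PollUntilDone and its Status enum are hypothetical names for illustration and are not the SDK's own types.

```java
import java.util.function.Supplier;

// Conceptual stand-in for a long-running operation; these names are hypothetical
// and are not the Azure SDK's types.
final class PollUntilDone {
    enum Status { RUNNING, SUCCEEDED, FAILED }

    // Polls statusCheck until it reports a terminal status or maxPolls is exhausted.
    static Status waitForCompletion(Supplier<Status> statusCheck, int maxPolls, long delayMillis) {
        for (int attempt = 0; attempt < maxPolls; attempt++) {
            Status status = statusCheck.get();
            if (status != Status.RUNNING) {
                return status;
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while polling", e);
            }
        }
        throw new IllegalStateException("Operation did not finish after " + maxPolls + " polls");
    }
}
```

The bounded number of polls is a design choice of this sketch: it keeps a stalled operation from blocking the request thread indefinitely.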
Retrieving Recognition Results
The poller returned by the beginRecognizeReceipts method yields, as its final result, a collection of instances of the RecognizedForm class. The RecognizedForm class has several properties, but we are mostly interested in the fields member:
private final Map<String, FormField> fields;
This member stores the names of recognized elements and their values as instances of the FormField class. The FormField class has several members used to interpret the recognized element of our receipt:
public final class FormField {
    private final float confidence;
    private final FieldData labelData;
    private final String name;
    private final FieldValue value;
    private final FieldData valueData;
}
As we can see, we can optionally retrieve the field name, value, and even the prediction confidence. This is particularly important if the receipt image is low quality and causes poor or unreliable predictions. In a case like this, we might want to reject recognitions that have confidences below a certain threshold and then inform our submitter to re-upload the image.
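The threshold check described above could be sketched as follows. The ScoredField type is a hypothetical stand-in so that the example runs without the SDK, and the 0.8 threshold used in the usage note is an arbitrary assumption. In the real app, the score would come from the confidence member of FormField.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types; with the SDK, the score would come from FormField.
final class ConfidenceFilter {
    static final class ScoredField {
        final String value;
        final float confidence;

        ScoredField(String value, float confidence) {
            this.value = value;
            this.confidence = confidence;
        }
    }

    // Returns the names of fields whose confidence is below the threshold.
    static List<String> belowThreshold(Map<String, ScoredField> fields, float threshold) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, ScoredField> entry : fields.entrySet()) {
            if (entry.getValue().confidence < threshold) {
                names.add(entry.getKey());
            }
        }
        return names;
    }
}
```

For example, calling belowThreshold(fields, 0.8f) and getting a non-empty list back could be the signal to skip saving the result and ask the submitter for a clearer scan.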
To extract the actual values of the recognized fields, you can proceed in one of two ways.
The first approach is to use the strongly typed Receipt class. Its constructor takes an instance of RecognizedForm and creates an object with properties that are mapped to the corresponding elements in the receipt image:
public final class Receipt {
    private List<ReceiptItem> receiptItems;
    private ReceiptType receiptType;
    private TypedFormField<String> merchantName;
}
The other approach is to parse the form fields manually:
Map<String, FormField> recognizedFields = recognizedForm.getFields();
FormField totalField = recognizedFields.get("Total");
if (totalField != null) {
    if (FieldValueType.FLOAT == totalField.getValue().getValueType()) {
        recognitionResult.setTotal(totalField.getValue().asFloat());
    }
}
For this tutorial, we’ll combine both approaches in handleFileUpload by using the ExtractFormFields method. To do this, we need to supplement the project with Receipt.java. We use this class to extract the MerchantName and TransactionDate fields, and we parse the form fields manually to extract the Total field.
The ExtractFormFields method looks like this:
private RecognitionResult ExtractFormFields(MultipartFile file,
        final RecognizedForm recognizedForm) {
    RecognitionResult recognitionResult = new RecognitionResult();

    // First approach: read MerchantName and TransactionDate through the typed Receipt class.
    Receipt receipt = new Receipt(recognizedForm);
    recognitionResult.setReceiptFileName(file.getOriginalFilename());
    recognitionResult.setMerchantName(receipt.getMerchantName().getValue());
    recognitionResult.setTransactionDate(receipt.getTransactionDate().getValue());

    // Second approach: look up Total in the raw field map and check its type.
    Map<String, FormField> recognizedFields = recognizedForm.getFields();
    FormField totalField = recognizedFields.get("Total");
    if (totalField != null) {
        if (FieldValueType.FLOAT == totalField.getValue().getValueType()) {
            recognitionResult.setTotal(totalField.getValue().asFloat());
        }
    }
    return recognitionResult;
}
The ExtractFormFields helper method returns an instance of the RecognitionResult class, which we implemented in Part 1 of this tutorial. Once we have an instance of RecognitionResult, we store it in the database. You can refer to the handleFileUpload method in the FormRecognitionController in the companion code.
In handleFileUpload, the following line stores the result:
resultsRepository.save(recognitionResult);
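One caveat about ExtractFormFields: it calls getValue() directly on the results of getMerchantName() and getTransactionDate(). If Receipt.java leaves a property unset when the service does not recognize that field (an assumption about the companion class, not something confirmed here), those calls would throw a NullPointerException. A minimal sketch of a null-safe accessor, using a hypothetical TypedField stand-in rather than the real typed field class:

```java
import java.util.Optional;

// Hypothetical stand-in for a typed field wrapper; not the companion code's class.
final class SafeFieldAccess {
    static final class TypedField<T> {
        private final T value;

        TypedField(T value) {
            this.value = value;
        }

        T getValue() {
            return value;
        }
    }

    // Returns the wrapped value, or empty if the field or its value is null.
    static <T> Optional<T> valueOf(TypedField<T> field) {
        return Optional.ofNullable(field).map(TypedField::getValue);
    }
}
```

Adapted to the real types, the same pattern would let the method fall back to a default, or skip the setter, when a field is missing from the receipt.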
To test the solution, compile your app using mvn clean install. Then, run your app using mvn spring-boot:run.
Next, go to the upload view to recognize a receipt image. We are using the example image in the images/ folder in the companion GitHub repository. The recognition result appears in the output window of your IDE.
Summary
In this part of the tutorial, we learned how to extend the Spring Boot app with image uploads. We added the form and controller method we need for this functionality.
Additionally, we used two different approaches to extract fields from the receipt images, which were sent for recognition to an instance of Azure Form Recognizer. The recognition results were stored in the PostgreSQL database deployed to Azure.
In the final part of this tutorial, we will display the recognition results.
To learn more about the easiest ways to deliver Java code to Azure and other clouds, check out the Azure webinar series Delivering Java to the Cloud with Azure and GitHub.