As developers, we spend a lot of time just wiring up the various components of our applications, which is not fun. That is especially true for me, which is annoying because my forte is working with and analyzing results, not shuffling data around. But in my current application I can’t avoid this work, and the resulting code has been a kludge: unstable and built on bad logic. So I’ve been looking for new resources.
I had the opportunity to test out Warewolf and see if it would do the trick. Warewolf is a promising offering to help nimble front-end applications perform heavy backend tasks. This writeup is on my experience with the product.
First, I want to be clear about what Warewolf is and is not. Their site talks about microservices, but I don't really see the tool as a way to build microservices. Microservices imply a very specific way to plan and build your application, with other elements such as infrastructure and containers to consider. Warewolf also has elements of an Enterprise Service Bus (ESB), an API-building tool, and job creation.
My final conclusion was that Warewolf fits into the category of workflow or node-based programming. Basically, you have a canvas and various node types, each with its own configuration. You connect the nodes, create logic between them, and the result is a runnable application. There are similar tools today, both open source and commercial, but their primary focus is on IoT and mobile application development, not enterprise application development.
This approach is great because a GUI makes things easier, and each canvas is a way to visualize logic holistically instead of sifting through code to understand what is going on.
The Install
<img height="241px" src="1116165/image001.png" width="483px" />
The install of Warewolf was easy. At first I was excited to test out the Azure instance in the marketplace. (But that attempt did not work.)
<img height="437px" src="1116165/image002.png" width="496px" />
The minimum requirements for Warewolf include an A2 Basic instance in Azure, which is roughly $200 a month. I'm cheap, so I could not stop myself from trying an A0 (0.25 core) instance instead. That did not work. Microsoft should give vendors a way to say which instance types will and will not work, and hide those that won’t.
No problem, though. I moved to a local VM instead, downloaded the bits from the fully functional trial, and did a quick install. After install, all the services started for the server, and the console launched immediately. Then it was time to get my bearings.
Right off the bat, I noticed something I cannot thank the Warewolf team enough for: examples! They have a TON of ready-to-run examples; enough to demonstrate the basic configuration of every node type and all the common scenarios. I’m a tinkerer who hates reading manuals, so of course this helps a lot.
<img height="736px" src="1116165/image003.png" width="376px" />
Example Selection
I spent some time going through the examples, and finally found one that was a good start for my application called File and Folder - MOVE. There was also a more advanced backup example which would work, but I wasn’t ready for it just yet. Because I could not find a way to duplicate the "File and Folder" project, I copied and pasted everything on the canvas to a new one, and started customizing.
My Application
The application I’m building is all about document transformation, indexing, and matching. Basically, it takes documents of varying types and correlates them based on content similarities in order to build a relationship graph. The application accepts many input file types: documents, images, audio, and video. The matching logic, however, works on text, so every file type needs to be converted. The media files are the tricky ones. They need additional transformation to make them useful, while the documents are a simple conversion. So for the media files, I need to use recognition technology in order to get enough metadata for analysis.
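To make the matching step concrete, here is a minimal sketch of the kind of correlation I have in mind: compare the extracted text of each pair of documents and link the ones that are similar enough. The TF-IDF approach, threshold, and function names are just my illustration, not Warewolf functionality or my production code.
<pre lang="Python">
# Illustrative only: link documents whose extracted text is sufficiently
# similar, using TF-IDF vectors and cosine similarity. The threshold and
# the edge structure are examples, not part of Warewolf or my real code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_relationship_graph(texts, threshold=0.3):
    """texts: dict of {doc_id: extracted_text}. Returns a list of edges."""
    ids = list(texts)
    vectors = TfidfVectorizer(stop_words="english").fit_transform(
        [texts[i] for i in ids])
    scores = cosine_similarity(vectors)
    edges = []
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            if scores[a, b] >= threshold:
                edges.append((ids[a], ids[b], float(scores[a, b])))
    return edges

# Hypothetical usage with made-up snippets of extracted text.
print(build_relationship_graph({
    "doc1": "quarterly revenue report for the sales team",
    "doc2": "sales team revenue numbers for the quarter",
    "doc3": "holiday schedule for the office",
}))
</pre>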
If you are familiar with optical character recognition (OCR) and speech recognition technologies, you know that they need a server with a lot of power. They suck up CPU and can sometimes take a long time to process a single file. While my application is currently working with recognition enabled, the setup is not great.
Currently, all files get copied to a shared location on the recognition server. They are scheduled for conversion twice a day. Once converted, the results are serialized to Azure Blob storage and indexed by Azure Search, and the original is deleted. There are a lot of things wrong with this:
- The user of the web application has to wait for the scheduled run (6-12 hours).
- Because I’m running all file processing as a batch, if something goes wrong with one file, it can kill the whole batch.
- I have to create special logic in a separate Azure WebJob just to keep track of when to record results.
- Due to unpredictable processing times, it is possible to miss results from a previous batch, which means two times the wait.
- There is no logic based on file type, which means I have to test the file type via the recognition server itself. If an audio file is uploaded, the OCR service tries first, and upon rejection sends it to the directory for the speech recognition service. (Big time waste.)
- Documents have the same delay as media files because they are in the same batch, but they should be processed immediately.
One option would be to have more of the application running on the recognition server, but this would cause latency for the web user, and is not scalable.
This is where Warewolf comes in. My idea: Use the tool to run continuously as files are uploaded to the recognition server. When a file is uploaded, I want a decision on what to do based on file type. When the results are produced, I want a POST back to my web application, which is listening for results.
So that’s what I built.
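To give a feel for the receiving side, here is a minimal sketch of the kind of callback endpoint the web application listens on. The framework (Flask), route, and field names are placeholders of my own, not something Warewolf dictates.
<pre lang="Python">
# Illustrative only: a tiny endpoint that accepts the POST Warewolf makes when
# a file has finished converting. Route and field names are hypothetical.
from flask import Flask, request, jsonify

app = Flask(__name__)

def queue_for_indexing(file_name, result_url):
    # Placeholder: in my setup, something like this is what the Azure WebJob
    # eventually picks up. Here it just logs the hand-off.
    print(f"queued {file_name} -> {result_url}")

@app.route("/api/recognition-results", methods=["POST"])
def recognition_results():
    payload = request.get_json(force=True)
    file_name = payload.get("FileName")     # hypothetical field name
    result_url = payload.get("ResultUrl")   # where the converted text landed
    queue_for_indexing(file_name, result_url)
    return jsonify({"status": "accepted"}), 202

if __name__ == "__main__":
    app.run(port=5000)
</pre>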
The nodes I used were pretty straightforward. I only got stuck when it came time to understand how RecordSets and assigning data sources work. Ultimately I worked around them, because I had a lot of trouble keeping track of the set data. I also wanted to use the "For Each" node, but I could not figure out how to get a "Decision" to be part of the loop’s logic. That was okay: file frequency is not high, and because the flow runs regularly, the delay before a file is picked up is no longer than 15 minutes. Here is an example of one of the "Decision" node’s logic, looking at full file paths that contain ".jpg"; if true, the file is sent down the OCR route.
<img height="350px" src="1116165/image004.png" width="439px" />
The four node types I used were "Decision", "POST", "Copy", and "Read Folder". I have a feeling there is a better way to do this; I actually ended up creating two flows where I probably could have had just one. I created the In Flow, for all documents uploaded through the web app, and the Out Flow, for all documents that have finally been converted. Here is what the In Flow’s final logic looked like:
<img height="487px" src="1116165/image005.png" width="628px" />
And here are the JSON results of the Out Flow run:
<img height="100px" src="1116165/image006.png" width="628px" />
Now in the web application, the user sees their files in a queue, and the results come in asynchronously when an individual file is done—not all at once.
vNext
There is a lot more I want to do with these flows. So far, I still need an Azure WebJob, but this WebJob does not need to interact with the server; it only takes a result notification and indexes the file.
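For illustration, this is the shape of that indexing step: take the notification and push the converted text into Azure Search over its generic REST endpoint for indexing documents. The service name, index name, key, and field names are placeholders, and nothing here is Warewolf-specific.
<pre lang="Python">
# Illustrative only: take a result notification and index the converted text
# into Azure Search via its REST API. Service, index, key, and field names
# below are placeholders.
import requests

SEARCH_SERVICE = "my-search-service"   # placeholder
INDEX_NAME = "documents"               # placeholder
API_KEY = "placeholder-admin-key"      # placeholder
API_VERSION = "2020-06-30"

def index_result(doc_id, file_name, text):
    url = (f"https://{SEARCH_SERVICE}.search.windows.net/"
           f"indexes/{INDEX_NAME}/docs/index?api-version={API_VERSION}")
    body = {"value": [{
        "@search.action": "mergeOrUpload",
        "id": doc_id,
        "fileName": file_name,
        "content": text,
    }]}
    resp = requests.post(url, json=body, headers={"api-key": API_KEY})
    resp.raise_for_status()
</pre>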
The Shared Resource Server feature comes next. It centralizes resources across flows, so if I wanted to, I could pass variables between the two flows and avoid keeping any state in my web application. I could also switch from a local file share to Dropbox, which is a built-in connector in Warewolf. If I chose to do this, I could leverage some features of the Dropbox API, avoid potentially large VMs, and should get faster data transfer speeds.
I could also leverage the feature for pulling system information. In the case of recognition technologies (which tend to suck up resources) I could use this feature to create advanced distribution logic for my processing. It could also give me greater processing feedback to present to the user.
I noticed some additional functionality that I really wanted to test, but did not have a use case for (yet). I also noticed a lot of integrations with SharePoint, which piques my interest, as I’ve done a lot of SharePoint dev over the years. In fact, a recent application I wrote for automatic file sync and MMS population with SharePoint could have been built much faster with Warewolf. (I had to kill that application due to the pain of maintenance.)
Running
Running the application is simple. You hit the play button next to the project name, and you can either run in debug or just in the browser. I found that running these applications in Microsoft Edge does not work so well; I kept getting security issues, even when running on my local box, so I did all my testing in Chrome. Also, it is good to note that while each project provides an application URL at the top of each canvas, this is not very useful for most projects, as it will not pass the output or input variables, so always use the play button.
I tried to do most of my testing outside of the Warewolf server to make sure I could hit the service from the web. Because Warewolf is running on the same machine as the recognition services, all I had to do was schedule the flows locally and tell the web app to listen for results. In a full-blown setup, you would want the Warewolf server separated but on the same LAN, and you would want at least two Warewolf servers: one for dev and one to deploy the production flows to.
A good way to test your running flows is to leverage a tool like Postman. It’s fast, and it lets you test calls coming from the web instead of just internally.
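If you prefer scripting the same check, a plain HTTP call against the workflow’s service URL does the job. The host, port, workflow name, and parameter below are placeholders for whatever your own service URL looks like.
<pre lang="Python">
# Illustrative only: hit a running Warewolf workflow over HTTP, the same way
# Postman would. Host, port, workflow name, and parameter are placeholders.
import requests

url = "http://warewolf-server:3142/secure/InFlow.json"   # placeholder URL
resp = requests.get(url, params={"FileName": "scan-001.jpg"})
print(resp.status_code)
print(resp.text)   # the JSON the flow returns
</pre>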
Up and Running
It took me about two days to complete the setup and the integration with my web application. I would still call it a beta, but I am already seeing better results. Processing for an individual file now takes less than an hour, versus waiting up to 12 hours for the next scheduled batch. My core application’s logic is much less complex, and the processing of files is much more stable.
I think the application could use some UI/UX improvements. I also think that working with data sources could be streamlined; RecordSets, for example, took me a long time to understand. But nothing prevented me from getting my application built, and they actually have a strong set of documentation in their knowledge base.
One feature that took me a while to find, but that I REALLY wanted, was versioning. You can version your flows so that you can roll back as you make changes. An important feature.
After I completed my application, I realized that my use case is probably not the best fit. The better scenarios are when you need to create a collection of web services without building your own API from scratch, or when you do not need a full-blown API.
The Power of This Approach
Warewolf is very useful for quickly creating background jobs, especially for line-of-business or process-heavy applications. In the little time I spent working with the product, I was able to improve the logic of my application and offload some really boring parts of the process. The benefits of taking this approach are:
- It’s easier to manage code outside the core application.
- It’s easier to manage job logic.
- It’s easier to upgrade my logic.
- It’s easier to share logic with the team: No code reviews, just flow reviews.
- Great debugging: You can’t get debugging like this for scripted jobs.
<img height="322px" src="1116165/image007.png" width="290px" />
Beyond the flow examples, I have to commend the Warewolf team and community (did I mention it’s open source?) for not cluttering the product with a lot of specialized languages or features. (Some tools require classes, and you have to specialize in the tool before you can even begin.) Everything in Warewolf is enjoyably standard and obvious.
I don't know whether Warewolf or flow-based programming is useful for all applications. But for background jobs in existing applications, or for applications that are mostly logic-based with minimal user interaction (not to be confused with inputs), Warewolf is a great solution.
Things for Warewolf to Consider
- "Decision" only adds new criteria when I close and re-open.
- In the toolbox, you can only drag the object that is already selected, even if it is different from the one you are trying to drag. It would be nice not to have to select it first each time.
- When I click on an example, it should open right away, instead of needing to click on an eyeball.
- When I choose OpsSystem for the System information example, Warewolf crashes.
- It was not clear how to name a document.
- I would like to be able to duplicate projects.
- If I change the Decision display text but then add the first value, the display text gets renamed.
- A snap grid on the canvas would be nice.
- Would be nice to use the Delete key on nodes.
- Did not understand why you can't put a "Decision" in a "For Each" block.
- Would be nice to have a run button in the canvas.
From Warewolf, in response to the Warewolf review published on Code Project.
Thank you for the write-up on Warewolf!
We consulted our technical team and would like to offer some input on the key issues raised in this review.
<img height="121px" src="1116165/image001a.jpg" width="602px" />
- Firstly, we felt it’s pretty important to mention that the URL at the top of the design surface does pass inputs and outputs as variables.
- Each service takes inputs and produces outputs if set up that way in the variable list.
- Each workflow is a service that can be called by another workflow (just drag and drop it on as a node) or call it from the web.
- The debug (play button) is for debugging.
In response to the points raised:
Things for Warewolf to Consider
- "Decision" only adds new criteria when I close and re-open.
- Thanks, this is a bug. Will be addressed for next release.
- In the toolbox, you can only drag the object that is already selected, even if it is different from the one you are trying to drag. It would be nice not to have to select it first each time.
- Agreed. Fixed for the next release.
- When I click on an example, it should open right away, instead of needing to click on an eyeball.
- Double clicking an item will open it. We like the single click idea and will put it in the backlog.
- When I choose OpsSystem for the System information example, Warewolf crashes.
- Wow. We can’t replicate this at all. Please send the crash report when it comes up!
- It was not clear how to name a document.
- Provide a name when saving. Potentially we should do this at creation. We will add to the backlog.
- I would like to be able to duplicate projects.
- Agreed. In the backlog already.
- If I change the Decision display text but then add the first value, the display text gets renamed.
- A snap grid on the canvas would be nice.
- We like the easy positioning of the current design surface. Perhaps an option for either.
- Would be nice to use the Delete key on nodes.
- If the node is selected, pressing Delete will remove it from the service.
- Did not understand why you can't put a "Decision" in a "For Each" block.
- This is by design. A Decision alters the flow of traffic and therefore cannot be executed multiple times in a single execution. To do this, create a service with the decision in it and drop the service into the “For Each”.
- Would be nice to have a run button in the canvas.
- You can also make use of shortcut keys to run and debug your flow. F6 debugs in the Studio (with the previous input values), F7 runs the workflow service from your browser.
We’re busy working on a much more user-friendly set of videos for our users to help decrease the learning curve even more. In the meantime, we have two really comprehensive user guides with step-by-step exercises so you can learn as you use Warewolf. Start here: https://warewolf.io/knowledge-base/warewolf-user-guide-1/
We really appreciate the input on some of the finer details – we love getting feedback! We’d love to invite all developers to participate in sharing their experience with Warewolf via our Community Forum.