Introduction
Cloud service providers like Microsoft® Windows® Azure, Amazon®
Web Services, Heroku®, Rackspace® and Google® App Engine make it fast and easy
to deploy and run applications in the cloud. However, running applications in
the cloud comes with a potential loss of visibility and control. While logs and
metrics can provide some insight into how your application is doing, this data
is often scattered across different places in different formats, making it
difficult to gain full visibility into your operations.
How can you best troubleshoot, monitor and proactively manage
your applications in the cloud?
The Solution
Splunk Storm provides a technology-agnostic approach to
monitoring and managing cloud applications. Harness data from every tier of
your application, trace transactions across multiple hops, correlate
application events with infrastructure or user-experience issues, and
proactively prevent outages from impacting the business. Splunk Storm,
delivering the industry-leading Splunk software as a service, indexes and
stores machine data in real time from virtually any source, format, platform,
or cloud provider without the need for custom parsers or connectors. Whether
your application is written in Ruby, Java, Python, PHP, .NET, Node.js or any
other language or framework, you can send data to Splunk Storm for indexing and
searching over network streams such as syslog, through a REST API, or via the
Splunk Universal Forwarder.
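Of the ingestion paths listed above, syslog over the network is the easiest to sketch without any Splunk-specific library. The Python below is a minimal sketch, assuming your project has a UDP network input enabled and that a classic RFC 3164-style syslog header is acceptable; the host, port, hostname, and tag values are placeholders, not real Splunk Storm endpoints, and the actual address would come from your project's input settings. The REST API and Universal Forwarder paths are not shown.

```python
import socket
import time

# BSD syslog codes (RFC 3164): facility "local0" is 16, severity "informational" is 6.
FACILITY_LOCAL0 = 16
SEVERITY_INFO = 6


def format_syslog(message, hostname, tag,
                  facility=FACILITY_LOCAL0, severity=SEVERITY_INFO,
                  timestamp=None):
    """Build an RFC 3164-style line: <PRI>Mmm dd hh:mm:ss HOSTNAME TAG: MESSAGE."""
    pri = facility * 8 + severity  # priority value = facility * 8 + severity
    if timestamp is None:
        timestamp = time.localtime()
    # RFC 3164 pads single-digit days with a space ("Mar  5"), not a zero.
    stamp = "%s %2d %s" % (time.strftime("%b", timestamp),
                           timestamp.tm_mday,
                           time.strftime("%H:%M:%S", timestamp))
    return "<%d>%s %s %s: %s" % (pri, stamp, hostname, tag, message)


def send_udp(line, host, port):
    """Send one syslog line as a single UDP datagram (fire-and-forget, no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Placeholder endpoint (hypothetical): substitute the host and port of your project's network input.
    line = format_syslog("checkout failed", "web01", "flowershop")
    send_udp(line, "127.0.0.1", 5140)
```

UDP keeps the sketch short but silently drops messages if the receiver is unreachable; a TCP input trades that for a connection you have to manage.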
Let’s Walk Through the Example
We’ll put some sample Apache access and MySQL logs into
Splunk Storm and see how quickly we can troubleshoot issues. First, a little
setup:
-
Sign-up for Splunk Storm at https://www.splunkstorm.com/
-
Create a new project
-
Now add data to your new project. While you can also send data
to Splunk Storm projects over TCP/UDP (including syslog), via a REST API or via
a Splunk Universal Forwarder, we’ll manually upload some sample data for this
exercise. The first step is to download the sample data (Apache web server
logs and MySQL database logs from a hypothetical online flower shop) from sampledata.zip.
-
Once you download and unzip sampledata.zip,
you'll see three folders, each containing an Apache "access_combined" log
file, and one MySQL folder containing a MySQL log file.
-
Now, back in the browser, you’ll see the Inputs page.
-
Click on Files
-
Click the "Upload" button
-
Browse to the access_combined.log file in the apache1.splunk.com folder and
choose it.
-
For each of the log files, choose a source type. Specifying a
source type tells Storm how to parse your data and lets you group all the
data of a certain type together when searching. When you add your own data to
Storm, you'll want to specify the right source type so that Storm extracts
timestamps and breaks your data into events correctly.
-
For the Apache access logs, choose Apache web access logs
and click the "Upload" button.
-
For the MySQL log file, choose Generic single-line data
and click the "Upload" button.
-
Repeat the upload process until all of the sample data is in your
project.
- Once the data is added to your project, the "Explore data"
button becomes enabled. Click it!
- Now that you have the data in a Splunk Storm project, let’s see how
quickly you can troubleshoot your applications. Say you receive a call
from a customer who keeps hitting a server error when he tries to complete a
purchase on your company’s online flower shop. He gives you his IP address:
10.2.1.44.
-
Everything in Splunk Storm is searchable, so you just type "10.2.1.44"
into the search bar, hit enter, and you will see all of this customer's traffic
to your shop.
-
So, you see a lot of 200 response codes, but you're only
interested in errors. Filter out the 200 success responses by adding
"NOT 200" to the search, narrowing down the list of events.
-
Notice that each of the events appears on the timeline below the
search bar. Double-clicking a bar re-runs the search over a smaller
time range, so by drilling down a bit you can get to the one-minute window in
which one of the errors occurred.
-
Knowing the time window of one of the errors, you broaden the
search (for example, by removing the IP address and "NOT 200" terms) to include
everything that was happening in the application around the same time... and
you get a handful of database errors, which are pretty good leads to the root
cause of the server error the customer is seeing.
-
Finding the root cause of issues in production logs saves the long
hours you would otherwise spend trying to reproduce bugs in a development
environment, so you can fix issues quickly and keep your users happy.
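The search workflow in the steps above (find one client's traffic, drop the successful requests, then widen to everything around the time of an error) can be approximated offline in plain Python. This is a minimal sketch for illustration, not how Splunk Storm works internally. It assumes Apache "combined" log lines and approximates "NOT 200" as "status code is not 200" (a bare NOT 200 in a Splunk search excludes any event containing the term 200, not only those with that status). The Event type and the function names are hypothetical, and because no particular MySQL log layout is assumed, database events are passed in already parsed.

```python
import re
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional, Tuple

# Apache "combined" format: ip ident user [time] "request" status bytes "referer" "agent".
# Only the leading fields are captured; the referer and user agent are ignored.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
)


class Event(NamedTuple):
    time: datetime  # timezone-aware timestamp taken from the event
    source: str     # illustrative label, e.g. "access_combined" or "mysql"
    text: str       # the raw line


def parse_access(line: str) -> Optional[Tuple[Event, str, int]]:
    """Parse one access_combined line into (event, client IP, status), or None if it doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    # Apache timestamps look like 12/Mar/2012:14:05:40 -0700
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return Event(ts, "access_combined", line), m.group("ip"), int(m.group("status"))


def errors_for_ip(lines: List[str], ip: str) -> List[Event]:
    """Rough analogue of searching '<ip> NOT 200': that client's requests whose status is not 200."""
    hits = []
    for line in lines:
        parsed = parse_access(line)
        if parsed is not None:
            event, client, status = parsed
            if client == ip and status != 200:
                hits.append(event)
    return hits


def events_near(events: List[Event], moment: datetime,
                window: timedelta = timedelta(seconds=30)) -> List[Event]:
    """Rough analogue of the broadened search: every event within +/- window of moment, oldest first."""
    nearby = [e for e in events if abs(e.time - moment) <= window]
    return sorted(nearby, key=lambda e: e.time)
```

Here errors_for_ip(lines, "10.2.1.44") plays the role of the first two searches, and passing one error's time to events_near mirrors widening the search to everything in that window across all sources. In Storm itself the indexer extracts the timestamps for you, which is why choosing the right source type at upload time matters.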