Tying together multiple systems, APIs, and third-party services can be quite difficult. Recently, we faced this exact problem in-house when we wanted to get data from Segment into MongoDB so we could take advantage of MongoDB's native analytics capabilities and rich query language. Using some clever tools, we were able to make this happen in under an hour, the first time around.
While this post is detailed, the actual implementation should take only around 20 minutes. I'll start by introducing our cast of characters (the tools we used) and then walk through how we went about it.
The Characters
To collect data from a variety of sources including mobile, web, cloud apps, and servers, developers have been turning to Segment since 2011. Segment consolidates all the events generated by multiple data sources into a single clickstream. You can then route the data to over 200 integrations, all at the click of a button. Companies like DigitalOcean, New Relic, InVision, and Instacart all rely on Segment for different parts of their growth strategies.
To store the data generated by Segment, we turn to MongoDB Atlas – MongoDB's database as a service. Atlas offers the best of MongoDB:
- A straightforward query language that makes it easy to work with your data
- Native replication and sharding to ensure data can live where it needs to
- A flexible data model that allows you to easily ingest data from a variety of sources without needing to know precisely how the data will be structured (its shape)
All this is wrapped up in a fully managed service, engineered and run by the same team that builds the database, which means that as a developer you actually can have your cake and eat it too.
The final character is MongoDB Stitch, MongoDB's serverless platform. Stitch streamlines application development and deployment with simple, secure access to data and services, getting your apps to market faster while reducing operational costs. Stitch allows us to implement server-side logic that connects third-party tools like Segment with MongoDB, while ensuring everything from security to performance is optimized.
Order of Operations
We are going to go through the following steps. If you have completed any of these already, feel free to cherry-pick the items you need help with:
- Setting up a Segment workspace
- Adding Segment's JavaScript library to your frontend application – I've also built a ridiculously simple HTML page that you can use for testing
- Sending an event to Segment when a user clicks a button
- Signing up for MongoDB Atlas
- Creating a cluster, so your data has somewhere to live
- Creating a MongoDB Stitch app that accepts data from Segment and saves it to your MongoDB Atlas cluster
While this blog focuses on integrating Segment with MongoDB, the process we outline below will work with other APIs and web services. Join the community Slack and ask questions if you are trying to follow along with a different service.
Each time Segment sees new data, a webhook fires an HTTP POST request to Stitch. A Stitch function then handles the authentication of the request and, without performing any data manipulation, saves the body of the request directly to the database, ready for further analysis.
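To make the exchange concrete, here is a minimal Node.js sketch of the sending side. It assumes, consistent with the Stitch verification code later in this post, that Segment signs the raw JSON body with HMAC-SHA1 using the shared secret and sends the hex digest in an X-Signature header. The function name buildSignedRequest and the sample event are hypothetical, and Segment's real request carries more fields and headers than shown here.

```javascript
const crypto = require("crypto");

// Hypothetical sketch of the sending side: serialize the event, sign the exact
// body string with HMAC-SHA1 and the shared secret, and send the hex digest.
function buildSignedRequest(event, sharedSecret) {
  const body = JSON.stringify(event);
  const signature = crypto.createHmac("sha1", sharedSecret).update(body, "utf8").digest("hex");
  return {
    method: "POST",
    headers: { "Content-Type": "application/json", "X-Signature": signature },
    body,
  };
}

const req = buildSignedRequest(
  { type: "track", event: "Favorited", properties: { itemId: "item-1", itemName: "Blue Mug" } },
  "my-private-secret"
);
console.log(req.headers["X-Signature"]); // a 40-character hex string (SHA-1 is 20 bytes)
```

The receiving side only has to repeat the HMAC over the body it received and compare the result with the header.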
Setting up a Workspace in Segment
Head over to Segment.com and sign up for an account. Once complete, Segment will automatically create a Workspace for you. Workspaces allow you to collaborate with team members, control permissions, and share data sources across your whole team. Click through to the Workspace that you've just created.
To start collecting data in your Workspace, we need to add a source. In this case, I'm going to collect data from a website, so I'll select that option, and on the next screen, Segment will have added a JavaScript source to my workspace. Any data that comes from our website will be attributed to this source. There is a blue toggle link I can click within the source that will give me the code I need to add to my website so it can send data to Segment. Take note of this as we will need it shortly.
Adding Segment to your Website
I mentioned a simple sample page I created in case you want to test this implementation outside of your own code. You can grab it from this GitHub repo.
In my sample page, you'll see I've copied and pasted the Segment code and dropped it in between my page's <head> tags. You'll need to do the equivalent in whatever code or language you are working with.
If you open that page in a browser, it should automatically start sending data to Segment. The easiest way to see this is by opening Segment in another window and clicking through to the debugger.
Clicking on the debugger button in the Segment UI takes you to a live stream of events sent by your application.
Customizing the events you send to Segment
The Segment library enables you to get as granular as you like with the data you send from your application.
As your application grows, you'll likely want to expand the scope of what you track. Best practice is to put some thought into how you name events and what data you send. Otherwise, different developers will name events differently and send them at different times; read this post for more on the topic.
To get us started, I'm going to assume that we want to track every time someone clicks a favorite button on a web page. We are going to use some simple JavaScript to call Segment's analytics tracking code and send a "track" event to the Segment API. That way, each time someone clicks our favorite button, we'll know about it.
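One lightweight way to keep names consistent is to check them before they reach Segment. The sketch below assumes a hypothetical team convention (one or more capitalized words separated by spaces, such as Favorited or Item Favorited) and wraps Segment's analytics.track call so that a non-conforming name throws. isValidEventName and trackChecked are illustrative helpers, not part of Segment's library.

```javascript
// Hypothetical convention: one or more capitalized words separated by single
// spaces, e.g. "Favorited" or "Item Favorited".
const EVENT_NAME_PATTERN = /^[A-Z][a-z]*( [A-Z][a-z]*)*$/;

function isValidEventName(name) {
  return typeof name === "string" && EVENT_NAME_PATTERN.test(name);
}

// Wraps analytics.track and throws instead of sending a badly named event.
function trackChecked(analytics, name, properties) {
  if (!isValidEventName(name)) {
    throw new Error(`Event name "${name}" does not follow the naming convention`);
  }
  analytics.track(name, properties);
}

// Usage with a stand-in object that records calls instead of sending them.
const recorded = [];
const fakeAnalytics = { track: (name, props) => recorded.push({ name, props }) };
trackChecked(fakeAnalytics, "Favorited", { itemId: "item-1", itemName: "Blue Mug" });
```

In a real page you would pass Segment's global analytics object instead of the stand-in.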
You'll see at the bottom of my web page that there is a jQuery function attached to the .btn class. Let's add the following after the alert() call.
analytics.track("Favorited", {
  itemId: this.id,
  itemName: itemName
});
Now, refresh the page in your browser and click on one of the favorite buttons. You should see an alert box come up. If you head over to your debugger window in Segment, you'll see the track event streaming in as well. Pretty cool, right?
You probably noticed that the analytics code above passes the data you want to send as a JSON-style object. You can add fields with more specific information anytime you like. Traditionally, this data would get sent to some sort of tabular data store, like MySQL or PostgreSQL, but then each time new information was added you would have to perform a migration to add a new column to your table. On top of that, you would likely have to update the object-relational mapping code that's responsible for saving the event in your database. MongoDB is a flexible data store, which means no migrations or translations are needed: we will store the data in the exact form you send it in.
Getting Started with MongoDB Atlas and Stitch
As mentioned, we'll be using two different services from MongoDB. The first, MongoDB Atlas, is a database as a service. It's where all the data generated by Segment will live long-term. The second, MongoDB Stitch, is going to play the part of our backend. We are going to use Stitch to set up an endpoint where Segment can send data. Once a request is received, Stitch validates that it was sent by Segment and then coordinates all the logic to save the data into MongoDB Atlas for later analysis and other activities.
First Time Using MongoDB Atlas?
Click here to set up an account in MongoDB Atlas.
Once you've created an account, we are going to use Atlas's Cluster Builder to set up our first cluster (every MongoDB Atlas deployment is made up of multiple nodes that help with high availability, which is why we call it a cluster). For this demonstration, we can get away with an M0 instance: it's free forever and great for sandboxing. It's not on dedicated infrastructure, so for any production workloads, it's worth investigating other instance sizes.
When the Cluster Builder appears on screen, the default cloud provider is AWS, and the selected region is North Virginia. Leave these as is. Scroll down and click on the Cluster Tier section, and this will expand to show our different sizing options. Select M0 at the top of the list.
You can also customize your cluster's name, by clicking on the Cluster Name section.
Once complete, click Create Cluster. It takes anywhere from 7 to 10 minutes to set up your cluster, so maybe go grab a drink, stretch your legs, and come back… When you're ready, read on.
Creating a Stitch Application
While the cluster is building, click Stitch Apps in the left-hand menu. You will be taken to the Stitch Applications page, where you can click Create New Application.
Give your application a name (in this case, I call it "SegmentIntegration") and link it to the correct cluster. Click Create.
Once the application is ready, you'll be taken to the Stitch welcome page. In this case, we can leave anonymous authentication off.
We do need to enable access to a MongoDB collection to store our data from Segment. For the database name I use "segment", and for the collection, I use "events". Click Add Collection.
Next, we will need to add a service. In this case, we will be manually configuring an HTTP service that can communicate over the web with Segment's service. Scroll down and click Add Service.
You'll be taken to another page, where you should see a big sign saying, "This application has no services"… not for long. Click Add a Service… again.
From the options now visible, select HTTP and then give the service a name. I'll use "SegmentHTTP". Click Add Service.
Next, we need to add an Incoming Webhook. A Webhook is an HTTP endpoint that will continuously listen for incoming calls from Segment, and when called, it will trigger a function in Stitch to run.
Click Add Incoming Webhook, leave the default name as is, and change the following fields:
- Turn on Respond with Result as this will return the result of our insert operation
- Change Request Validation to "Require Secret as Query Param"
- Add a secret code to the last field on the page. Important Note: We will refer to this as our "public secret" as it is NOT protected from the outside world, it's more of a simple validation that Stitch can use before running the Function we will create. Shortly, we will also define a "private secret" that will not be visible outside of Stitch and Segment.
Finally, click "Save".
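Conceptually, the Require Secret as Query Param option means the webhook only runs when the request URL carries the expected public secret. The Node.js sketch below shows that check. It illustrates the idea only and is not how Stitch implements it; hasValidQuerySecret is a hypothetical name.

```javascript
// Returns true only when the request URL's ?secret= parameter equals the expected value.
function hasValidQuerySecret(requestUrl, expectedSecret) {
  // A base is required because HTTP request lines usually carry only path + query.
  const url = new URL(requestUrl, "https://placeholder.invalid");
  return url.searchParams.get("secret") === expectedSecret;
}

console.log(hasValidQuerySecret("/webhook?secret=abc123", "abc123")); // true
console.log(hasValidQuerySecret("/webhook?secret=wrong", "abc123"));  // false
console.log(hasValidQuerySecret("/webhook", "abc123"));               // false
```

Because this secret travels in the URL, it is only a first filter; the HMAC signature check described below is what actually authenticates the request.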
Define Request Handling Logic with Functions in Stitch
We define custom behavior in Stitch using functions, simple JavaScript (ES6) that can be used to implement logic and work with all the different services integrated with Stitch.
Thankfully, we don't need to do too much work here. Stitch already has the basics set up for us. We need to define logic that does the following things:
- Grabs the request signature from the HTTP headers
- Uses the signature to validate the request's authenticity (i.e., that it came from Segment)
- Writes the request to our segment.events collection in MongoDB Atlas
Getting an HTTP Header and Generating an HMAC Signature
Add the following on line 8, after the closing curly brace } of the if (payload.body) block.
const signature = payload.headers['X-Signature'];
And then use Stitch's built-in Crypto library to generate a digest that we will compare with the signature.
const digest = utils.crypto.hmac(payload.body.text(), context.values.get("segment_shared_secret"), "sha1", "hex");
A lot is happening here, so I'll step through each part. Segment signs each request with a signature computed from the HTTP body and a shared secret. We can generate an identical signature with the utils.crypto.hmac function if we know the body of the request, the shared secret, the hash function Segment uses to create its signatures (SHA-1), and the output format (hex). If our digest matches the value in Segment's X-Signature header, we consider the request authenticated.
Note: This will be using a private secret, not the public secret we defined in the Settings page when we created the webhook. This secret should never be publicly visible. Stitch allows us to define values that we can use for storing variables like API keys and secrets. We will do this shortly.
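If you want to test the signature logic outside Stitch, this Node.js sketch computes the same kind of digest with the built-in crypto module. computeDigest is a hypothetical helper. The detail it illustrates is that the HMAC must be computed over the raw body text (payload.body.text() in Stitch), not over a re-serialized copy of the parsed JSON, because any change in whitespace or key order changes the digest.

```javascript
const crypto = require("crypto");

// Node.js counterpart of the Stitch call:
//   utils.crypto.hmac(payload.body.text(), secret, "sha1", "hex")
// HMAC-SHA1 over the raw request body text, returned as lowercase hex.
function computeDigest(rawBody, sharedSecret) {
  return crypto.createHmac("sha1", sharedSecret).update(rawBody, "utf8").digest("hex");
}

// The digest depends on the exact bytes, so hash the body text as received.
const raw = '{"event": "Favorited"}';
const reserialized = JSON.stringify(JSON.parse(raw)); // '{"event":"Favorited"}'
console.log(computeDigest(raw, "secret") === computeDigest(reserialized, "secret")); // false
```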
Validating that the Request is Authentic and Writing to MongoDB Atlas
To validate the request, we simply need to compare the digest and the signature. If they're equivalent, we will write to the database. Add the following code directly after the line that generates the digest.
if (digest == signature) {
  // Authentic request: the database write goes here
} else {
  console.log("Request is invalid");
}
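The == comparison above is fine for this walkthrough. If you port the check to Node.js, a common hardening step is to compare signatures in constant time, so that response timing reveals nothing about how many leading characters matched. A minimal sketch using crypto.timingSafeEqual follows; signaturesMatch is a hypothetical name, and this post does not cover whether Stitch offers an equivalent.

```javascript
const crypto = require("crypto");

// Constant-time comparison of a computed hex digest against the X-Signature value.
// crypto.timingSafeEqual throws when buffer lengths differ, so lengths are checked first.
function signaturesMatch(digestHex, signatureHeader) {
  if (typeof digestHex !== "string" || typeof signatureHeader !== "string") {
    return false;
  }
  const expected = Buffer.from(digestHex, "utf8");
  const received = Buffer.from(signatureHeader, "utf8");
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}

console.log(signaturesMatch("3f2a9c", "3f2a9c")); // true
console.log(signaturesMatch("3f2a9c", "3f2a9d")); // false
```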
Finally, we will augment the if statement with the appropriate behavior needed to save our data. On the first line of the if statement, we will get our "mongodb-atlas" service. Add the following code:
let mongodb = context.services.get("mongodb-atlas");
Next, we will get our database collection so that we can write data to it.
let events = mongodb.db("segment").collection("events");
And finally, we write the data. Returning the result of the insert means that, with Respond with Result turned on, the webhook sends it back in its response.
return events.insertOne(body);
Click the Save button on the top left-hand side of the code editor. At the end of this, our entire function should look something like this:
exports = function(payload) {
  var body = {};
  if (payload.body) {
    body = JSON.parse(payload.body.text());
  }
  const signature = payload.headers['X-Signature'];
  const digest = utils.crypto.hmac(payload.body.text(),
    context.values.get("segment_shared_secret"), "sha1", "hex");
  if (digest == signature) {
    let mongodb = context.services.get("mongodb-atlas");
    let events = mongodb.db("segment").collection("events");
    // Return the insert result so "Respond with Result" sends it back to the caller
    return events.insertOne(body);
  } else {
    console.log("Request is invalid");
  }
};
Defining Rules for a MongoDB Atlas Collection
Next, we will need to update our rules that allow Stitch to write to our database collection. To do this, in the left-hand menu, click on "mongodb-atlas".
Select the collection we created earlier, called "segment.events". This will display the Field Rules for our Top-Level Document. We can use these rules to define what conditions must be met for our Stitch function to be able to read from or write to the collection.
We will leave the read rules as is for now, as we will not be reading directly from our Stitch application. We will, however, change the write rule to "evaluate" so our function can write to the database.
Change the contents of the "Write" box:
- Specify an empty JSON document {} as the write rule at the document level.
- Set Allow All Other Fields to Enabled, if it is not already set.
Click Save at the top of the editor.
Adding a Secret Value in MongoDB Stitch
As is common practice, API keys and passwords are stored as variables rather than in code, so they are never committed to a code repository and their visibility is reduced. Stitch allows us to create private variables (values) that may be accessed only by incoming webhooks, rules, and named functions.
We do this by clicking Values on the Stitch menu, clicking Create New Value, and giving our value a name, in this case segment_shared_secret (we will refer to this as our private secret). We enter the secret's value in the large text box. Make sure to click Save once you're done.
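Outside Stitch, the same idea usually takes the form of environment variables. Below is a Node.js analog of context.values.get, assuming a hypothetical SEGMENT_SHARED_SECRET variable; getRequiredSecret is likewise a hypothetical helper. Failing loudly when the secret is missing is safer than computing digests with an undefined or empty key.

```javascript
// Node.js analog of context.values.get("segment_shared_secret"): read the secret from
// the environment instead of hard-coding it. The variable name is hypothetical.
function getRequiredSecret(name) {
  const value = process.env[name];
  if (!value) {
    // Fail loudly rather than computing digests with a missing or empty key
    throw new Error(`Missing required secret: ${name}`);
  }
  return value;
}

// Example (run with SEGMENT_SHARED_SECRET set in the environment):
// const sharedSecret = getRequiredSecret("SEGMENT_SHARED_SECRET");
```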
Getting Our Webhook URL
To copy the webhook URL across to Segment from Stitch, navigate using the Control menu: Services > SegmentHTTP > webhook0 > Settings (at the top of the page). Now copy the "Webhook URL".
In our case, the webhook URL looks something like this:
https:
Adding the Webhook URL to Segment
Head over to Segment and log in to your workspace. Under Destinations, click Add Destination.
Search for Webhook in the destinations catalog and click Webhooks. On the next page, click Configure Webhooks. Then select the sources you want to send data from and click Confirm Source.
Next, we will find ourselves on the destination settings page. We will need to configure our connection settings. Click the box that says Webhooks (max 5).
Copy your webhook URL from Stitch, and make sure you append your public secret to the end of it using the following syntax:
Initial URL:
https:
Add the following to the end: ?secret=<YOUR_PUBLIC_SECRET_HERE>
Final URL:
https:
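If you script this step, letting the URL API add the parameter avoids encoding mistakes when the secret contains characters such as & or =. Here is a small Node.js sketch; withPublicSecret is a hypothetical helper, and the base URL is a placeholder rather than a real Stitch webhook URL.

```javascript
// Append the public secret as a ?secret= query parameter. URLSearchParams
// percent-encodes the value, so secrets containing & or = survive intact.
function withPublicSecret(webhookUrl, publicSecret) {
  const url = new URL(webhookUrl);
  url.searchParams.set("secret", publicSecret);
  return url.toString();
}

console.log(withPublicSecret("https://example.com/webhook", "abc123"));
// → https://example.com/webhook?secret=abc123
```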
Click Save
We also need to tell Segment what our private secret is so it can create a signature that we can verify within Stitch. Do this by clicking on the Shared Secret field and entering the same value you used for segment_shared_secret. Click Save.
Finally, all we need to do is activate the webhook by clicking the switch at the top of the Destination Settings page.
Generate Events, and See Your Data in MongoDB
Now, all we need to do is use our test HTML page to generate a few events that get sent to Segment – we can use Segment's debugger to ensure they are coming in. Once we see them flowing, they will also be going across to MongoDB Stitch, which will be writing the events to MongoDB Atlas.
We'll take a quick look using MongoDB Compass to ensure our data is visible. Once we connect to our cluster, we should see a database called "segment". Click on segment and you'll see our collection called "events". Click into it and you'll see a sample of the data generated by our frontend!
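Once events are arriving, MongoDB's aggregation framework can answer questions about them directly. As a sketch, assuming Segment's track payloads store the event name in a top-level event field and the custom data in properties (check a stored document in Compass to confirm), the pipeline below counts Favorited events per item. The plain-JavaScript function beside it roughly mirrors the pipeline so the logic can be checked without a cluster; it is a hypothetical helper, not a MongoDB API.

```javascript
// Aggregation pipeline: keep Favorited events, count them per itemId, most popular first.
const favoritesPerItem = [
  { $match: { event: "Favorited" } },
  { $group: { _id: "$properties.itemId", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
];

// In-memory equivalent over an array of documents (missing itemId groups under null,
// as $group does for a missing field).
function countFavoritesPerItem(docs) {
  const counts = new Map();
  for (const doc of docs) {
    if (doc.event !== "Favorited") continue;
    const id = doc.properties && doc.properties.itemId !== undefined ? doc.properties.itemId : null;
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return [...counts]
    .map(([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count);
}
```

In the mongo shell, connected to the segment database, you would run db.events.aggregate(favoritesPerItem).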
The End
Thanks for reading through; hopefully you found this helpful. If you're building new things with MongoDB Stitch, we'd love to hear about it. Join the community Slack and ask questions in the #stitch channel!