Natural Language Toolkit (NLTK) is a powerful Python library for natural language processing (NLP) and machine learning. Popular cloud services offer alternative NLP tools built on the same underlying concepts as NLTK.
The goal of this series on Sentiment Analysis is to use Python and the open-source Natural Language Toolkit (NLTK) to build a library that scans replies to Reddit posts and detects if posters are using negative, hostile or otherwise unfriendly language.
If you don’t want to build your own custom sentiment classifier, there are some alternatives to VADER.
Here’s an overview of what some popular cloud platforms offer as alternatives to NLTK.
If you’ve followed along with the NLP sentiment analysis articles we started in Introducing NLTK for Natural Language Processing, you’ve seen one established approach. The overviews below show what the interface and response look like for sentiment analysis on these cloud services. In many cases it’s very similar to NLTK, just using the horsepower of someone else’s computers.
Amazon Comprehend
Amazon Web Services (AWS) provides an Amazon Comprehend NLP service that includes a range of features analogous to some of what you’ll find in NLTK.
Similar to NLTK’s pos_tag, the AWS service can identify parts of speech (POS) and tag entities such as proper names, places, and locations. It supports language detection for 100 languages in unstructured text, and includes text summarization capabilities that identify and extract the key phrases contributing to the overall meaning of a given piece of text.
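For example, here’s a minimal sketch of what calls to those other Comprehend features might look like with boto3. This is our own illustration, not part of the series so far; it assumes your AWS credentials are already configured, and the sample text is just a placeholder:
```
import boto3

client = boto3.client('comprehend')
text = 'Reddit was founded in San Francisco.'  # placeholder text for illustration

languages = client.detect_dominant_language(Text=text)             # language detection
syntax = client.detect_syntax(Text=text, LanguageCode='en')        # POS tagging, akin to pos_tag
entities = client.detect_entities(Text=text, LanguageCode='en')    # people, places, organizations
phrases = client.detect_key_phrases(Text=text, LanguageCode='en')  # key phrase extraction
```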
Here’s an example of what the client code for sentiment analysis looks like:
```
import boto3

# Create a Comprehend client; detect_sentiment needs the text and a language code
client = boto3.client('comprehend')
client.detect_sentiment(Text='This is cool!', LanguageCode='en')
```
And the response returned:
```
{
    "SentimentScore": {
        "Mixed": 0.03058,
        "Positive": 0.94992,
        "Neutral": 0.01415,
        "Negative": 0.00893
    },
    "Sentiment": "POSITIVE",
    "LanguageCode": "en"
}
```
This should look familiar. The way the values are calculated by the service may be different, but the response follows a model similar to the VADER sentiment analysis we reviewed earlier, which means you can interpret the results in much the same way.
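For instance, here’s a rough sketch of how you might flag an unfriendly Reddit reply from the Comprehend response, much as we did with VADER’s scores. The 0.5 cutoff below is our own assumption, not a recommendation from the service:
```
# Reusing the boto3 Comprehend client created above.
# Treat a reply as unfriendly if the label is NEGATIVE or the Negative score is high;
# the 0.5 threshold is an assumed starting point you would tune for your community.
result = client.detect_sentiment(Text='This is cool!', LanguageCode='en')
if result['Sentiment'] == 'NEGATIVE' or result['SentimentScore']['Negative'] > 0.5:
    print('Flag this reply for review')
```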
Microsoft Azure Text Analytics
Among Microsoft Azure’s cognitive services is a Text Analytics API. Similar to Amazon Comprehend, it provides NLP features like keyphrase extraction, language detection, and named entity recognition.
Microsoft provides an Azure Python SDK client library that simplifies the calls for sending text for sentiment analysis. In this example, we import the TextAnalyticsClient class from the SDK (azure.ai.textanalytics), along with AzureKeyCredential for authentication:
```
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# The endpoint and key come from your Text Analytics resource in the Azure portal
ta_credential = AzureKeyCredential("<your-api-key>")
text_analytics_client = TextAnalyticsClient(
    endpoint="<your-endpoint>", credential=ta_credential)
documents = ["I'm using Azure. This is cool!"]
response = text_analytics_client.analyze_sentiment(documents=documents)[0]
```
The client library makes it easy to perform the analysis with a single call to analyze_sentiment. One difference with this client is that the response scores not only the full document but also each sentence individually.
```
{
    "documents": [
        {
            "id": "1",
            "sentiment": "positive",
            "documentScores": {
                "positive": 0.98570585250854492,
                "neutral": 0.0001625834556762,
                "negative": 0.0141316400840878
            },
            "sentences": [
                {
                    "sentiment": "neutral",
                    "sentenceScores": {
                        "positive": 0.0785155147314072,
                        "neutral": 0.89702343940734863,
                        "negative": 0.0244610067456961
                    },
                    "offset": 0,
                    "length": 12
                },
                {
                    "sentiment": "positive",
                    "sentenceScores": {
                        "positive": 0.98570585250854492,
                        "neutral": 0.0001625834556762,
                        "negative": 0.0141316400840878
                    },
                    "offset": 13,
                    "length": 36
                }
            ]
        }
    ]
}
```
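If you use the SDK rather than calling the REST API directly, you don’t have to parse this JSON yourself; the result comes back as an object you can walk. Here’s a brief sketch using the response variable from the example above (attribute names follow recent versions of azure.ai.textanalytics and may differ in older previews):
```
# Document-level sentiment label and confidence scores
print(response.sentiment, response.confidence_scores.positive)

# Per-sentence results, mirroring the "sentences" array in the JSON above
for sentence in response.sentences:
    print(sentence.text, sentence.sentiment, sentence.confidence_scores.negative)
```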
There are useful samples on GitHub for using the Azure Python SDK and Cognitive Services APIs, including Text Analytics.
Google Cloud Natural Language
Google also provides a Cloud Natural Language service. In addition to similar features such as entity analysis, content classification, multi-language support, and syntax analysis, it offers AutoML tools for training custom machine learning models of your own. For custom sentiment analysis, this is an area of differentiation from some of the other providers.
Here’s a client example using the Google Cloud Client library for Python:
```
from google.cloud import language_v1
from google.cloud.language_v1 import enums  # the enums module is part of the 1.x client library

text_content = 'This is cool!'
client = language_v1.LanguageServiceClient()
document = {"content": text_content,
            "type": enums.Document.Type.PLAIN_TEXT,
            "language": "en"}
response = client.analyze_sentiment(document,
                                    encoding_type=enums.EncodingType.UTF8)
```
The response includes a sentiment score along with a magnitude, not only for the document as a whole but for each sentence as well.
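As a quick sketch, reading those values out of the response from the example above looks something like this:
```
# Overall document score and magnitude
print(response.document_sentiment.score, response.document_sentiment.magnitude)

# Score and magnitude for each individual sentence
for sentence in response.sentences:
    print(sentence.text.content, sentence.sentiment.score, sentence.sentiment.magnitude)
```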
Review
How you apply sentiment analysis in your project may come down to business objectives.
Perfect can be the enemy of good enough, so many projects could deliver quick wins by starting with a VADER sentiment analysis and stopping there. You could also get a lot of mileage using a cloud solution from Amazon, Microsoft, Google, or IBM.
If you want to fine-tune the results to achieve the best possible accuracy for your use case, a machine learning-based custom sentiment analysis workflow, like the one discussed in NLTK and Machine Learning for Sentiment Analysis, may be the right fit. It’s a more complex process with a fair amount of data wrangling, but may give you insights you hadn’t considered.
If you’re trying to manage a community around your content, products, or services, it might be better to look for an intensity of engagement that gives you actionable insights, rather than an aggregate sentiment analysis.
If you missed any of our articles covering NLP with NLTK and Python, start at the beginning with Introducing NLTK for Natural Language Processing.