About the CData JDBC Driver for Twitter
The CData JDBC Driver for Twitter allows users to work with data from Twitter using standard SQL statements. With the Twitter Driver, you can easily connect to Twitter in Java applications, as well as BI, reporting, and ETL tools that support the JDBC standard. The CData Driver supports working with Tweets, direct messages, followers, replies, lists, and more.
Download and Install the JDBC Driver for Twitter
First, download the Twitter JDBC Driver: (click here for a 30-day free trial). Follow the instructions in the installer to complete the installation.
Connect to Twitter with JDBC in a Java IDE
Choose your favorite Java IDE (Eclipse, IntelliJ, NetBeans, etc.) or Java tool that supports JDBC and use the native JDBC support to create a connection to Twitter. For this article, we use NetBeans.
- Register a new JDBC driver.
- Driver: The driver JAR file (typically found in C:\Program Files\CData\CData JDBC Driver for Twitter\lib)
- Driver Class: cdata.jdbc.twitter.TwitterDriver
- Name: Any (we use CDataTwitter)
- Click "OK."
- Create a new connection. Select the newly registered driver and enter the authentication parameters in the JDBC URL.
For example: jdbc:twitter:InitiateOAuth=GETANDREFRESH;...;
The JDBC Driver for Twitter uses OAuth for authentication. To learn more, refer to the Twitter JDBC Driver Help Documentation.
- After testing the connection, click "Finish" to establish the connection and begin querying Twitter.
Query Twitter
To query Twitter for the list of users we want, we need to create a SQL query to represent the question we have asked. We will break down our question into separate parts first and combine the separate queries to find the users we want.
Find Users Who Are Not Following You
The JDBC Driver for Twitter exposes a Follower view, which is a list of Twitter accounts that are following your account. There is a Following
column which is a Boolean value that can be used to find a list of accounts that you have followed but are not following you back:
SELECT ID, Screen_Name, Following FROM Followers WHERE Following = 'false';
Find Tweets About Specific Content
In the Tweets table, you can get Tweets as records. Use a pseudo-column called SearchTerms
to get Tweets that contain a specific string. Since all Tweets are searched, it is important to extract only Tweets containing the target character string (in order to avoid hitting the Twitter API rate limits).
SELECT From_User_ID, From_User_Name, Text FROM Tweets WHERE SearchTerms = 'JDBC'
If your query still exceeds the rate limit, add LIMIT 10000
to the SQL statement to reduce the number of results.
Combine the SQL Queries
Once we have parsed the separate parts of the question, we can combine them into a single query. Below, we use a sub-select query to find all of the Tweets from our followers that contain the string "JDBC." From the results, we select those accounts that are not following our account.
SELECT * FROM
(
SELECT
Tweets.From_User_ID,
Tweets.From_User_Name,
Tweets.From_User_Screen_Name,
Followers.Following,
Tweets.Text
FROM
Tweets
LEFT OUTER JOIN
Followers
ON
Tweets.From_User_ID = Followers.ID
WHERE Tweets.SearchTerms = 'JDBC'
)
WHERE Following = false;
Additional Restrictions
As mentioned above, SearchTerms
in the Tweets table is useful because it allows you to extract only a specific character string from many Tweets. If you put a LIKE
condition in the Text
column (the main text of a given Tweet), the API will query the entire timeline, hitting the API rate limit immediately. The same is true for the Hashtags
column. As a workaround, write a LIMIT in the query and further filter by the Text
and Hashtags
columns.
A limitation of the SearchTerms
pseudo-column is that you can only get the most recent Tweets. This is a limitation of the Twitter API. By default, Twitter supports returning only Tweets from searches of up to 7 days. However, customers with premium accounts can retrieve older Tweets by running a 30-day or full-archive search.
To help limit your results, search for content that is relevant for a short time period, such as event-specific hashtags or phrases. Otherwise, set up a program to query Twitter every 7 days to get complete results.
NOTE: SearchTerms
cannot be queried in combination with columns such as User_Id
. To work around this restriction, use an OUTER JOIN
to combine the result sets.
Further Querying
If you want to see distinct results based on follower ID, you can use the following query:
SELECT * FROM
(
SELECT
Tweets.From_User_ID,
Tweets.From_User_Name,
Tweets.From_User_Screen_Name,
Followers.Following
FROM
Tweets
LEFT OUTER JOIN
Followers
ON
Tweets.From_User_ID = Followers.ID
WHERE
Tweets.SearchTerms = 'JDBC'
GROUP BY
Tweets.From_User_ID
)
WHERE Following = false ;
Summary
With the CData JDBC Driver for Twitter, you can use SQL to work with Twitter data. Download a free, 30-day trial and start working with your Twitter data today. For more on using SQL to access 150+ other SaaS, Big Data, and NoSQL data sources, explore our JDBC Drivers.