
Twitter is a wonderful service, but until now you have had to subscribe to third-party websites to be alerted when a selected word (maybe your trademark) is tweeted. We’ll try to develop a service that filters the Twitter API, stores the interesting tweets in our database, and shows them in the browser in real time.
If you want to try it, watch it in action, grab the code or read on…
We’ll take Twitter real-time results for a given word (or words) and display them in a browser window, much like monitter.com, but on our own servers and somewhat more automated. This is going to be useful for getting our own alerts and working with them.
We are going to use the Twitter stream API to get the tweets containing a given word. Since the results are given in JSON format, we’ll need to filter the data, take the interesting bits and store them. On the other side, we’re going to write a bit of JavaScript to read the data from the server into a web page using Ajax. We could use other technologies, like Comet, to push the data to the browser, but while that would be a much cleaner implementation, I don’t think I’m ready to write about it yet (check this space). We need to decouple reading the stream from Twitter and serving our clients, because we can only have a single active instance of the Twitter streaming API at a given time, and we’re going to leave the listening process running for a long time using a daemon/service approach. On top of that, we want to store the tweets in a database for later perusal and data mining.
We are trying to make a useful system. To do this, we’ll need some tools:
Twitter has published a Streaming API that’s described as “The Twitter Streaming API allows near-realtime access to various subsets of Twitter public statuses”. In fact, this is just what we need. You can read all the documentation at https://twitterapi.pbworks.com/Streaming-API-Documentation, but I’ll try to pick out the parts that matter for this project so you don’t have to read it all yet.
We are going to use just one method (statuses/filter) to get results that include one or several words. This method can return a stream of data in XML or JSON format, has to be called using POST, and accepts several parameters. If you have access to a Unix-like system, you can call it from the command line like this:
curl -d 'track=google' http://stream.twitter.com/1/statuses/filter.json -uuser:password
Here, user and password are your Twitter credentials.
It should return a continuous stream of JSON-encoded tweets, one per line, until you exit it (with CTRL+C). This is a JSON stream and can be read and parsed in several ways. In fact, it’s eval-uable JavaScript code that we could read from the browser. But right now, we’re going to use a server-side language to read it and work with the interesting parts.
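Because every line of the stream is one JSON-encoded tweet, a reader has to cope with lines that arrive split across two network reads. Here is a minimal JavaScript sketch of that idea, assuming the data arrives as text chunks; the `makeLineParser` name is hypothetical, and it uses `JSON.parse` rather than `eval`, which is the safer choice even though the stream happens to be valid JavaScript.

```javascript
// Hypothetical line-delimited JSON reader: feed it raw text chunks and it
// calls onTweet once for every complete line. Malformed lines make JSON.parse throw.
function makeLineParser(onTweet) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    // The last element is '' (chunk ended with \n) or an incomplete line;
    // keep it for the next call.
    buffer = lines.pop();
    for (const raw of lines) {
      const line = raw.trim(); // also drops a trailing \r
      if (line === '') continue; // skip blank lines (e.g. keep-alive newlines)
      onTweet(JSON.parse(line));
    }
  };
}

// Usage: a tweet split across two chunks is only emitted once it is complete.
const tweets = [];
const feed = makeLineParser((t) => tweets.push(t));
feed('{"id":1,"text":"hello goo');
feed('gle"}\r\n{"id":2,"text":"bye"}\n');
console.log(tweets.length); // 2
```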
As I told you in the tools section, we are going to use PHP as our server-side language. For the first part, reading the Twitter stream, we don’t even need a web server, since we can run it from the command line. And if we run it from the command line, we can turn it into a kind of daemon/service and leave it running for a long time. But first, some code:
This is (almost) the simplest code that delivers what we want: a PHP-formatted stream of tweets that include our chosen word. If you run it from the command line, it will keep printing the decoded tweets as they arrive.
Let’s look at the code:
We create an $opts array (in fact an array of arrays) that contains the parameters. In this particular case, we’re using two: the POST method and the search term (track=google). Then we can treat the Twitter stream as a file, using stream_context_create and fopen, and just start reading lines. Each line is a JSON-encoded tweet, similar to what we saw when we called the API from the command line. Since we want to use the contents as easily as possible, we use the json_decode function to parse each line into a PHP object, print it, and call flush just in case we’re running the script from a browser.
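The $opts array and the curl call carry the same information: a POST request with a form-encoded `track` value and, in the curl case, HTTP Basic credentials (`-u`). As a minimal sketch, assuming an environment that provides `URLSearchParams` and `btoa` (modern browsers, Node 16 or later), that request can be written out as a plain object. The `buildFilterRequest` name is hypothetical; the endpoint is the one from the curl example, and words are joined with commas because the track parameter accepts several comma-separated words.

```javascript
// Hypothetical helper: describes the same POST request the curl example sends.
const FILTER_URL = 'http://stream.twitter.com/1/statuses/filter.json';

function buildFilterRequest(words, user, password) {
  // Several words go in one comma-separated track value;
  // form encoding turns the comma into %2C.
  const body = new URLSearchParams({ track: words.join(',') }).toString();
  return {
    url: FILTER_URL,
    method: 'POST', // statuses/filter has to be called with POST
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      // Equivalent of curl's -u user:password (HTTP Basic authentication)
      Authorization: 'Basic ' + btoa(user + ':' + password),
    },
    body,
  };
}

const req = buildFilterRequest(['google'], 'user', 'password');
console.log(req.body); // track=google
```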
The best way to store the results for later perusal is a database. I’m using MySQL but any other database should be OK. To be able to store the data, we need to create a database with a single table.
We are only going to store the following data:
We could store several other fields, and a complete solution should probably take into account that there are several different tweet types, but for the time being, these four fields should suffice.
To create the table, we can run the following SQL script from the server:
After that, we will have a single table database waiting for us to fill it with tweets… Let’s go:
storing_tweets_in_the_database.php
It’s as ugly as sin but it works. If you run it from the command line, it should start storing tweets in your database and keep on until you stop it. So, our database is starting to fill with tweets concerning our desired word. Now we need to be able to navigate them…
Now we need to publish the tweets in our browser. To do that, we need a small PHP script that returns the tweets when called. If we call it with a start parameter, it returns all the tweets with an id greater than that value; otherwise, it returns the last ten tweets stored in our database. So we use two different queries: the first returns the last ten results, and the second returns all results since the given id. We use subqueries (SELECT from SELECT) to get the results in the order we want.
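The two queries boil down to one rule: with a start id, return every newer tweet; without it, return the ten newest. In the subquery version, the inner SELECT picks the ten highest ids and the outer SELECT puts them back in ascending order. The JavaScript sketch below reproduces that selection over an in-memory array, purely to illustrate the logic. The `selectTweets` name and the assumption that the final order is ascending (oldest first, so new tweets can be appended at the bottom of the page) are illustrative, not taken from the original script.

```javascript
// Illustration only: mimics queries of the shape
//   SELECT * FROM (SELECT ... ORDER BY id DESC LIMIT 10) AS t ORDER BY id ASC
//   SELECT ... WHERE id > start ORDER BY id ASC
// over an array of {id, ...} rows.
function selectTweets(rows, start) {
  if (start !== undefined && start !== null) {
    // All tweets newer than the given id, oldest first.
    return rows.filter((r) => r.id > start).sort((a, b) => a.id - b.id);
  }
  // Inner query: the ten highest ids (newest tweets)...
  const newest = [...rows].sort((a, b) => b.id - a.id).slice(0, 10);
  // ...outer query: put them back in ascending order.
  return newest.sort((a, b) => a.id - b.id);
}
```

For rows with ids 1 to 15, calling it without a start value returns ids 6 through 15; with start 12 it returns 13, 14 and 15.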
We are going to poll this script from JavaScript every ten seconds and refresh the tweet list to show the most recent ones; the output format will be JSON.
Since, as Larry Wall said, one of the cardinal virtues of a programmer is laziness, we are going to use jQuery to build the interface and the business logic. And we’re not even serving it ourselves; we link to it from the Google CDN, as Dave Ward posted in his wonderful blog.
So, let’s start with the HTML:
Just some styles, one script link and a body containing one div, two links and a header. Simple, eh?
Now we need to add some JavaScript inside the page to connect to the server:
Let’s see… This is probably the most complex part of the article, so, I’ll try to go slow and explain every function:
This function calls the server using the jQuery getJSON method. Then it takes each item in the response and calls addNew with it. If we call it with an id parameter, it asks the server for all tweets with ids greater than that. Otherwise, it grabs the last ten tweets.
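The request logic is small enough to sketch without jQuery. Assuming the server script is called `tweets.php` (a hypothetical name), that it takes the `start` parameter described above, and that each returned tweet carries its database `id`, the client only needs to build the right URL and remember the highest id it has seen, so the next poll asks only for newer tweets. Both function names below are hypothetical.

```javascript
// Hypothetical name for the PHP script that returns tweets as JSON.
const TWEETS_URL = 'tweets.php';

// Without an id we ask for the last ten tweets; with one, for everything newer.
function buildTweetsUrl(lastId) {
  if (lastId === undefined || lastId === null) return TWEETS_URL;
  return TWEETS_URL + '?start=' + encodeURIComponent(String(lastId));
}

// Calls addNew for each tweet in the response and returns the id to use next time.
function handleTweets(items, lastId, addNew) {
  let maxId = lastId;
  for (const item of items) {
    addNew(item);
    if (maxId === undefined || maxId === null || item.id > maxId) maxId = item.id;
  }
  return maxId;
}
```

With jQuery, the two pieces could fit together as `$.getJSON(buildTweetsUrl(lastId), (items) => { lastId = handleTweets(items, lastId, addNew); })`.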
It takes an item (a tweet) as input. It hides the first tweet, removes its ‘tweet’ class, appends a new tweet at the bottom and shows it. It calls the renderTweet function to get the tweet in HTML format.
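Hiding the first tweet every time a new one is appended keeps the number of visible tweets constant: a fixed-size sliding window. A DOM-free sketch of that effect, with a hypothetical `addToWindow` function and an assumed window size of ten (matching the initial batch of ten tweets), could look like this:

```javascript
// Appends a tweet at the bottom and drops the oldest ones once more than
// `size` are visible, mirroring "hide the first tweet, append the new one".
function addToWindow(visible, item, size = 10) {
  visible.push(item);
  while (visible.length > size) visible.shift();
  return visible;
}
```

Unlike the jQuery version, this only drops a tweet once the window is full, so a page that starts with fewer than ten tweets fills up first.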
Just one line to call getImportanceColor() and a return with the HTML code. It’s a bit long but that’s because we’re adding a couple of links to the tweet to be able to visit the original one.
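Since tweet text comes from strangers, it is worth escaping it before it goes into the page. The sketch below assumes each item has `id`, `screen_name` and `text` fields, that `id` is the Twitter status id, and that the color has already been computed; those field names, the `renderTweetHtml` name and the `http://twitter.com/<user>/status/<id>` permalink format are assumptions for illustration, not necessarily what the original script uses.

```javascript
// Escapes the characters that matter inside HTML text and attribute values.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical renderer: assumes item.id is the Twitter status id and
// item.screen_name is the author's user name.
function renderTweetHtml(item, color) {
  const user = escapeHtml(item.screen_name);
  const userUrl = 'http://twitter.com/' + encodeURIComponent(item.screen_name);
  const statusUrl = userUrl + '/status/' + encodeURIComponent(String(item.id));
  return (
    '<div class="tweet" style="color:' + escapeHtml(color) + '">' +
    '<a href="' + userUrl + '">' + user + '</a>: ' +
    escapeHtml(item.text) +
    ' <a href="' + statusUrl + '">view</a>' +
    '</div>'
  );
}
```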
It takes a number of followers and returns an RGB color somewhere between total black, for people without followers, and total red, for Ashton Kutcher. It uses logarithms to scale between the two extremes, because follower counts span about six orders of magnitude. We will use it to paint (it) black the users with few followers and red the Twitter stars.
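A minimal sketch of that mapping, assuming a linear ramp on the red channel from 0 to 255 on a log scale and treating one million followers (six orders of magnitude) as full red; the exact formula in the original script may differ. It uses `Math.log10(followers + 1)` so that zero followers maps to 0 instead of minus infinity.

```javascript
// Maps a follower count to a color between black (no followers) and red (about 1M or more).
function getImportanceColor(followers) {
  const MAX_ORDERS = 6; // around 10^6 followers and above are painted pure red
  const n = Math.max(0, Number(followers) || 0); // treat junk or negatives as 0
  // log10(n + 1): 0 followers -> 0, 9 -> 1, 999999 -> 6; scaled to 0..1 and capped at 1
  const level = Math.min(1, Math.log10(n + 1) / MAX_ORDERS);
  const red = Math.round(255 * level);
  return 'rgb(' + red + ',0,0)';
}
```

For example, 1,000 followers lands roughly halfway, at `rgb(128,0,0)`.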
This is the timeout function that calls itself every 200ms and gets the new tweets.
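Here is a sketch of a timeout that re-arms itself, using a hypothetical `startPolling` function. It waits for each poll to report completion before scheduling the next one, so a slow server never has two requests in flight. The scheduler functions are parameters (defaulting to `setTimeout` and `clearTimeout`) so they can be swapped out, and the interval is left to the caller.

```javascript
// Runs poll(done) now, and again `interval` ms after each poll calls done().
// Returns a stop() function that prevents any further rescheduling.
function startPolling(poll, interval, schedule = setTimeout, cancel = clearTimeout) {
  let timer = null;
  let stopped = false;
  function tick() {
    timer = null;
    let finished = false;
    poll(function done() {
      if (finished || stopped) return; // ignore repeated done() calls and late ones
      finished = true;
      timer = schedule(tick, interval);
    });
  }
  tick();
  return function stop() {
    stopped = true;
    if (timer !== null) cancel(timer);
  };
}
```

With jQuery 1.6 or later, `poll` could be something like `(done) => $.getJSON(url, onTweets).always(done)`, so polling continues even after a failed request.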
The last block just starts the polling as soon as the document is loaded.
This is a small screen capture of my browser visiting the HTML/JavaScript page while storing_tweets_in_the_database.php is running. It’s watching the word ‘twitter’ and, as you can see, it’s running too fast for the human eye (at least for mine), but since we are keeping all the data in our database, nothing is lost forever.
Right now, because of the Twitter API limits, only one instance of the watching process can run at a time. However, you can give it several words, separated by commas, and Twitter will return results for all of them. This code should not be used in production, since it has almost no security checks to prevent misuse. If you want to use it on a machine open to the public, you should check every input for misbehaviour, twice.
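The main input the page sends back to the server is the `start` id, so that is the first thing worth checking. A minimal JavaScript sketch of a strict check (digits only, positive, within the safe integer range) might look like this; the `parseStartParam` name is hypothetical, and in the PHP script the same rule could be applied with something like `ctype_digit()` before the value ever reaches a query.

```javascript
// Accepts only a plain positive integer for the "start" parameter;
// anything else (SQL fragments, negative numbers, junk) is rejected with null.
function parseStartParam(raw) {
  if (typeof raw !== 'string' || !/^[0-9]{1,18}$/.test(raw)) return null;
  const id = Number(raw);
  return Number.isSafeInteger(id) && id > 0 ? id : null;
}
```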
Obviously, this is just a sample. It could look much better, and we could even analyze the tweets and tweet back a response to any questions about our keywords. The watch module should be daemonized or converted to a service so it can be left unattended. The HTML page could filter between two dates, and so on. Keep watching; we’ll try to keep posting this kind of content.
I’m part of Corunet, a web agency in Spain that can deliver consistently good results in all kinds of Internet projects. You can visit our website http://coru.net/ or contact me at david@corunet.com if you have any special needs.
You can follow me on twitter as @dei_biz
6 Responses to Twitter alerts: using twitter streaming API for fun and profit
plotti
October 29th, 2009 at 21:32
Very nice example, I like how you explained things simply and easy.
Do you know of a way to do twitter searches that go back more than 7 days?
David Pardo
October 30th, 2009 at 11:17
Thank you, plotti. Twitter has stated that they’re keeping the tweets, but they’re not available yet over the API. I’ll try to keep you informed if that changes.
Magda
November 30th, 2009 at 06:58
Wow. This is possibly the clearest explanation of streaming API for Twitter on the web.
I’m trying to decipher how to do this in real-time in Flash, but I might just do it your way.
Thank you.
Jed Herzog
January 26th, 2010 at 19:07
This is the best tutorial on the Twitter Streaming API for PHP out there. Awesome job. I was a little disappointed to get to the end, have working code, and then find out that you don’t think this is ready for production. What do you think it would take to get this code ready for production? What are the issues? Have you looked at phirehose before?
I am trying to use the twitter streaming API for a personal project. If you think you could help me for a reasonable amount of compensation please contact me.
David Pardo
January 26th, 2010 at 19:14
Hi Jed,
I don’t think it’s ready for production since it doesn’t handle disconnections and reconnections, nor can it update the stream with new filter words. Anyway, I’ve already used it with minor modifications for some customers. I’ve been trying phirehose lately and it looks great, but I can’t vouch for it yet.
If you want me to help you with a project, drop me a line to david@corunet.com with your idea and I will try to send you a budget.
Dan Goodwin
February 28th, 2010 at 22:24
That is a very helpful, very well written article. Thanks for taking the time to write it up and share.