Twitter is a wonderful service but, until now, you have had to subscribe to third-party websites to be alerted when a selected word (perhaps your trademark) is tweeted. We’ll develop a service that filters the Twitter API, stores the interesting tweets in our database, and shows them in the browser in real time.
NOTE: You now have to use https:// instead of http:// for this to work.
We’ll take Twitter’s real-time results for a given word (or words) and display them in a browser window, much like monitter.com does, but on our own servers and with a bit more automation. This lets us get our own alerts and work with them.
To build this system, we’ll need some tools:
- HTTP server: I’m currently using Apache, but any web server will do.
- Server-side programming language: this time, PHP. While probably not as elegant or fast as Python, or as cool as Ruby, it gets the job done and it’s available in almost every web hosting plan. Its documentation is also quite extensive.
- A database to store all the info.
- A Twitter account: your own is fine, but you may want to create a new one for this kind of task. Write down your username and password; we’ll need them soon.
- The Twitter API: the folks at Twitter have kindly built a RESTful API that anyone can use for free. And it’s sub-zero cool.
The Twitter streaming API
Twitter has published a Streaming API, described as “The Twitter Streaming API allows near-realtime access to various subsets of Twitter public statuses”. This is just what we need. You can read the full documentation at https://twitterapi.pbworks.com/Streaming-API-Documentation, but I’ll summarize the parts that matter for this project so you don’t have to read it yet.
We are going to use just one method, statuses/filter, to get the tweets that include one or more words. This method returns a stream of data in XML or JSON format, must be called using POST, and accepts a few parameters. If you have access to a Unix-like shell, you can call it from the command line like this:
curl -d 'track=google' https://stream.twitter.com/1/statuses/filter.json -uuser:password
Here, user and password are your Twitter credentials.
It should return something like:
Reading the Twitter stream
As mentioned in the tools section, we are going to use PHP as our server-side language. For the first part, reading the Twitter stream, we don’t even need a web server, because we can run the script from the command line. Running it from the command line also means we can turn it into a daemon or service and leave it running for a long time. But first, some code:
This is (almost) the simplest code that delivers what we want: a stream of tweets containing our chosen word, printed as PHP objects. If you run it from your command line, it’ll show something like:
Let’s look at the code:
We create an $opts array (in fact, an array of arrays) that contains the parameters. In this case we use two: the POST method and the search term (track=google). Then we can treat the Twitter stream as a file, opening it with stream_context_create and fopen, and simply start reading lines. Each line is a JSON-encoded tweet, like the ones we saw when we called the API from the command line. To work with the contents as easily as possible, we use json_decode to parse each line into a PHP object, print it, and call flush in case the script is being run from a browser.
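The PHP loop gets exactly one tweet per line because it reads the stream line by line. If you consume the stream in raw chunks instead, a single tweet can be split across two reads, so you have to buffer the text yourself. Here is a minimal sketch of that idea in JavaScript (the language of our front end), not the article’s PHP. It assumes tweets are separated by newlines (\r\n) and that blank lines, which the stream may send as keep-alives, should be skipped. drainTweets and onChunk are hypothetical names.

```javascript
// Parse a growing buffer of stream text into tweets.
// Each complete line is one JSON-encoded tweet; blank lines (keep-alives)
// are skipped; an unfinished last line is returned in `rest` so that the
// next chunk of data can complete it.
function drainTweets(buffer) {
  const lines = buffer.split(/\r?\n/);
  const rest = lines.pop(); // text after the last newline (may be "")
  const tweets = lines
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  return { tweets, rest };
}

// Usage: prepend whatever was left over from the previous chunk.
let pending = "";
function onChunk(chunk) {
  const { tweets, rest } = drainTweets(pending + chunk);
  pending = rest;
  tweets.forEach((tweet) => console.log(tweet.text));
}
```

Keeping the leftover text between calls is what makes this work. A chunk that ends in the middle of a tweet waits in pending until the rest of that tweet arrives.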
Storing the results
A database is the best way to store the results for later perusal. I’m using MySQL, but any other database should work. To store the data, we need to create a database with a single table.
We are only going to store the following data:
- Text: the text of the tweet, 140 characters at most.
- User screen name: the screen name of the poster, needed to build the link back to Twitter.
- Id: a unique id for the tweet. Ids grow over time, so we can sort the tweets by id, combine the id with the user screen name to build a link back to Twitter, and use it as our primary key.
- Followers count: the number of people who will receive the tweet in their timelines. We use it to style the real-time viewer. Since my main goal is to watch a trademark, I care about how many people see each message.
- Time of the tweet: mainly for filtering. We store our own server’s time to avoid lengthy date conversions.
We could store several other fields, and a complete solution should probably account for the different types of tweets. For now, though, these five fields should suffice.
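To make the field list concrete, here is a sketch of mapping one decoded tweet onto these five fields. It is written in JavaScript, like the other examples here. The JSON keys text, user.screen_name, user.followers_count and id_str are Twitter’s. The output property names are hypothetical column names. Reading id_str assumes your payload includes it; Twitter provides it because 64-bit ids no longer fit exactly in a JavaScript number. Returning null for anything without text and a user is one simple way to skip other message types, such as delete notices.

```javascript
// Map one decoded tweet onto the five stored fields.
// Keys read from `tweet` are Twitter's; the returned property names are
// hypothetical column names for our table.
function toRow(tweet, now = new Date()) {
  // Not every message in the stream is a status (e.g. delete notices),
  // so skip anything that lacks text or a user.
  if (typeof tweet.text !== "string" || !tweet.user) return null;
  return {
    id: tweet.id_str, // kept as a string: 64-bit ids overflow JS numbers
    screen_name: tweet.user.screen_name,
    text: tweet.text,
    followers: tweet.user.followers_count,
    stored_at: now, // our server's clock, not tweet.created_at
  };
}
```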
To create the table, we can run the following SQL script from the server:
After that, we’ll have a single-table database waiting to be filled with tweets… Let’s go:
It’s as ugly as sin, but it works. If you run it from the command line, it will start storing tweets in your database and keep going until you stop it. Our database is now filling up with tweets about our chosen word, so next we need a way to browse them…
Serving the tweets from the server side
Now we need to show the tweets in the browser. For that, we need a small PHP script that returns the tweets when called. If we call it with a start parameter, it returns all the tweets with an id greater than that value. Otherwise, it returns the last ten tweets stored in the database. This takes two queries: one returns the last ten results, and the other returns every result after the given id. We use subqueries (SELECT from SELECT) to get the results in the order we want. For the last ten, for example, the inner query picks the newest rows (ORDER BY id DESC LIMIT 10), and the outer query sorts them back into ascending order.
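The selection logic of that script can be sketched independently of PHP and MySQL. Below, an in-memory array stands in for a hypothetical tweets table, and each branch is commented with the shape of the SQL it mirrors. selectTweets and compareIds are illustrative names, not the article’s code. Ids are compared as BigInt because they are 64-bit numbers stored as strings.

```javascript
// Order two rows by their 64-bit ids (stored as decimal strings).
function compareIds(a, b) {
  const x = BigInt(a.id);
  const y = BigInt(b.id);
  return x < y ? -1 : x > y ? 1 : 0;
}

// rows: [{ id: "123", ... }] standing in for a hypothetical `tweets` table.
function selectTweets(rows, start) {
  if (start !== undefined) {
    // Like: SELECT * FROM tweets WHERE id > ? ORDER BY id ASC
    const since = BigInt(start);
    return rows.filter((row) => BigInt(row.id) > since).sort(compareIds);
  }
  // Like: SELECT * FROM (SELECT * FROM tweets ORDER BY id DESC LIMIT 10) AS t
  //       ORDER BY id ASC
  // i.e. the ten newest rows, returned oldest first.
  return [...rows].sort(compareIds).slice(-10);
}
```

Returning rows oldest first is what lets the page simply append each new tweet at the bottom.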
Writing a front-end
Since, as Larry Wall said, one of the cardinal virtues of a programmer is laziness, we are going to use jQuery to build the interface and the business logic. We’re not even serving it ourselves: we link to it from Google’s CDN, as Dave Ward explained on his wonderful blog.
So, let’s start with the HTML:
Just some styles, one script link and a body containing one div, two links and a header. Simple, eh?
Let’s see… This is probably the most complex part of the article, so I’ll go slowly and explain each function:
This function calls the server using jQuery’s getJSON method, then calls addNew for each tweet in the response. If we call it with an id parameter, it asks the server for all tweets with ids greater than that one. Otherwise, it grabs the last ten tweets.
addNew takes an item (a tweet) as input. It hides the first tweet, removes its ‘tweet’ class, appends the new tweet at the bottom and shows it. It calls renderTweet to get the tweet as HTML.
renderTweet is just one line that calls getImportanceColor() plus a return with the HTML code. The return is a bit long because we add a couple of links to the tweet so we can visit the original.
getImportanceColor takes a number of followers and returns an RGB color ranging from pure black, for people without followers, to pure red, for Ashton Kutcher. It uses a logarithmic scale because the two extremes are six orders of magnitude apart. We use it to paint (it) black the users with few followers and red the Twitter stars.
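Here is a sketch of this function. It assumes the server script is called tweets.php (a hypothetical name) and takes the start parameter described earlier. To keep it testable outside a browser, the getJSON(url, callback) function is passed in rather than hard-coded.

```javascript
// Ask the server for tweets newer than lastId (or the latest ten when
// lastId is undefined) and hand each one to addNew, in server order.
// getJSON(url, callback) is injected; "tweets.php" is a hypothetical name.
function getTweets(getJSON, addNew, lastId) {
  const url =
    lastId === undefined
      ? "tweets.php"
      : "tweets.php?start=" + encodeURIComponent(lastId);
  getJSON(url, (tweets) => {
    tweets.forEach((tweet) => addNew(tweet));
  });
}
```

In the page you would pass a thin wrapper around jQuery: getTweets((url, cb) => $.getJSON(url, cb), addNew, lastId);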
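Here is a sketch of renderTweet, under two assumptions. First, the item has hypothetical id, screenName and text fields. Second, the color is passed in so the function can be tried on its own; in the article it comes from getImportanceColor(). The links follow Twitter’s https://twitter.com/<screen_name>/status/<id> URL pattern. Tweet text is written by strangers, so it is HTML-escaped before it goes into the page. This closes an obvious cross-site scripting hole.

```javascript
// Escape the characters that are special in HTML so that tweet text
// is shown literally instead of being interpreted as markup.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// item: { id, screenName, text } (hypothetical field names)
// color: any CSS color string, e.g. the result of getImportanceColor()
function renderTweet(item, color) {
  const profileUrl = "https://twitter.com/" + encodeURIComponent(item.screenName);
  const statusUrl = profileUrl + "/status/" + encodeURIComponent(item.id);
  return (
    '<div class="tweet" style="color: ' + escapeHtml(color) + '">' +
    '<a href="' + profileUrl + '">@' + escapeHtml(item.screenName) + "</a> " +
    escapeHtml(item.text) +
    ' <a href="' + statusUrl + '">(original)</a>' +
    "</div>"
  );
}
```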
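The article doesn’t show the exact formula, so this is one way to implement the idea. It assumes a log10 scale on which 0 followers maps to black and 10^6 or more followers maps to full red. Adding 1 before taking the logarithm avoids log(0), so people without followers stay black.

```javascript
// Map a follower count to a color from black (0 followers) to pure red
// (a million followers or more), on a logarithmic scale.
function getImportanceColor(followers) {
  const MAX_DECADES = 6; // 10^6 followers: the "Ashton Kutcher" end of the scale
  // log10(n + 1) is 0 for n = 0; clamp so huge counts stay at full red.
  const level = Math.min(Math.log10(Math.max(followers, 0) + 1) / MAX_DECADES, 1);
  const red = Math.round(255 * level);
  return "rgb(" + red + ",0,0)";
}
```

On this scale each factor of ten in followers adds the same amount of red, roughly 42 steps out of 255.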
This is the timeout function that calls itself every 200ms and gets the new tweets.
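Here is a sketch of the polling loop. It differs deliberately from a plain 200 ms timer: the next call is scheduled only after the current request reports that it is done, so slow responses cannot pile up. poll(done) is a hypothetical callback-style function you supply, and startPolling returns a function that stops the loop.

```javascript
// Call poll(done) now, and again intervalMs after each call reports done.
// Returns a function that stops the loop.
function startPolling(poll, intervalMs = 200) {
  let timer = null;
  let stopped = false;

  function tick() {
    if (stopped) return;
    poll(() => {
      // Schedule the next round only once this one has finished.
      if (!stopped) timer = setTimeout(tick, intervalMs);
    });
  }

  tick();
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}
```

With jQuery 1.6 or later, poll could call done from the request’s .always() handler. That way a failed request does not stop the loop.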
The last block just starts the polling as soon as the document is loaded.
Right now, because of the Twitter API limits, only one instance of the watching process can run at a time. However, you can track several words, separated by commas, and Twitter will return results for all of them.
This code should not be used in production, since it has almost no security checks against misuse. If you want to run it on a machine open to the public, check every input, twice, for misbehavior.
- Download the code and unzip it into a folder in your local webserver
- Edit config.php to add your Twitter login data and the words you want to watch
- Create the database and the table with the SQL code above
- Run watch.php and leave it running for as long as you wish.
- Visit http://localhost/thefolderwhereyouunzippedthecode/ and watch the tweets coming.
Obviously, this is just a sample. It could look much better, and we could even analyze the tweets and reply to any questions about our keywords. The watch module should be daemonized or turned into a service so it can be left unattended. The HTML page could filter between two dates, and so on. Keep watching: we’ll keep posting this kind of content.
I’m part of Corunet, a web agency in Spain that delivers consistently good results in all kinds of internet projects. You can visit our website http://coru.net/ or contact me at firstname.lastname@example.org if you have any special needs 🙂
You can follow me on twitter as @dei_biz