Twitter is a wonderful service but, until now, you had to subscribe to third-party websites to be alerted when a chosen word (maybe your trademark) was tweeted. We'll try to develop a service that filters the Twitter API stream, stores the interesting tweets in our database, and shows them in the browser in real time.
NOTE: Now you have to use https:// instead of http:// for it to work.
We'll take Twitter's real-time results for a given word (or words) and visualize them in a browser window, like monitter.com does, but on our own servers and a bit more automated. This will be useful for getting our own alerts and working with them.
We are trying to make a useful system. To do this, we’ll need some tools:
Twitter has published a Streaming API, described in its own documentation like this: “The Twitter Streaming API allows near-realtime access to various subsets of Twitter public statuses”. In fact, this is just what we need. You can read all the documentation at https://twitterapi.pbworks.com/Streaming-API-Documentation, but I'll try to extract the parts that are interesting for this project so you don't have to yet.
We are going to use just one method (statuses/filter) to get results including one or several words. This method returns a stream of data in XML or JSON format, has to be called using POST, and accepts some parameters. You can use it from the command line, if you have access to some kind of unix, in the following way:
curl -d 'track=google' https://stream.twitter.com/1/statuses/filter.json -uuser:password
Where user and password are your twitter credentials.
It should return something like:
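The output is one JSON object per line. The original sample is not reproduced here, but each line looks roughly like this abridged, hypothetical example (the text, user name and numbers are made up; the field names are the real ones used later in this article):

```json
{"text":"Just tried the new google search...","id":12345,"user":{"screen_name":"example_user","followers_count":42}}
```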
As I told you in the tools section, we are going to use PHP as our server-side language. For the first part, reading the Twitter stream, we don't even need a web server, since we can run it from the command line. And if we run it from the command line, we can turn it into a kind of daemon/service and leave it running for a long time. But first, some code:
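The original listing is not reproduced here; a minimal sketch of what is described below (the $opts array, stream_context_create, fopen and json_decode) might look like this. USER, PASSWORD and the tracked word are placeholders you should replace with your own:

```php
<?php
// Sketch of a Streaming API reader. Needs your Twitter credentials.
$opts = array(
    'http' => array(
        'method'  => 'POST',         // statuses/filter must be POSTed
        'content' => 'track=google', // the word(s) we want to watch
    ),
);
$context = stream_context_create($opts);
// Basic-auth credentials go in the URL; remember https is required now.
$stream = fopen('https://USER:PASSWORD@stream.twitter.com/1/statuses/filter.json',
                'r', false, $context);
while ($line = fgets($stream)) {
    $tweet = json_decode($line); // one JSON-encoded tweet per line
    if ($tweet) {
        print_r($tweet);
        flush(); // just in case we're calling the script from a browser
    }
}
fclose($stream);
```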
This is (almost) the simplest code that delivers what we want: a stream of tweets, parsed into PHP objects, that include our tracked word. If you run it from your command line, it'll show something like:
Let’s look at the code:
We create an $opts array (in fact, an array of arrays) that contains the parameters; in this particular case we're using two: the POST method and the search line (track=google). Then we can treat the Twitter stream as a file, using stream_context_create and fopen, and just start reading lines. Each line is going to be a JSON-encoded tweet, similar to what we saw when we called the API from the command line. Since we want to use the contents as easily as possible, we use the json_decode function to parse each line into a PHP object, print it, and call flush just in case we're calling the script from a browser.
The best way to store the results for later perusal is a database. I’m using MySQL but any other database should be OK. To be able to store the data, we need to create a database with a single table.
We are only going to store the following data:
We could store several other fields, and a complete solution should probably take into account that you can have some different tweet types, but for the time being, these four fields should suffice.
To create the table, we can run the following SQL script from the server:
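The original script is not shown here; a sketch of a single table holding the four fields the rest of the article uses (the tweet's id, its author's screen name, the text, and the author's follower count) could look like this. The table and column names are assumptions, not the original ones:

```sql
CREATE TABLE tweets (
    id        BIGINT UNSIGNED NOT NULL PRIMARY KEY, -- the tweet's own id
    username  VARCHAR(20)     NOT NULL,             -- author's screen name
    text      VARCHAR(140)    NOT NULL,             -- the tweet itself
    followers INT UNSIGNED    NOT NULL              -- author's follower count
);
```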
After that, we will have a single table database waiting for us to fill it with tweets… Let’s go:
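The storing script is essentially the reader from before with an INSERT for each decoded tweet. A sketch, assuming the table above and with placeholder credentials throughout:

```php
<?php
// Sketch: read the stream and insert each tweet into MySQL.
// Database credentials, Twitter credentials and column names are placeholders.
$db = mysqli_connect('localhost', 'dbuser', 'dbpass', 'twitter');

$opts = array('http' => array('method' => 'POST', 'content' => 'track=google'));
$context = stream_context_create($opts);
$stream = fopen('https://USER:PASSWORD@stream.twitter.com/1/statuses/filter.json',
                'r', false, $context);

while ($line = fgets($stream)) {
    $tweet = json_decode($line);
    if (!$tweet || !isset($tweet->user)) {
        continue; // skip keep-alive lines and non-status messages
    }
    $sql = sprintf(
        "INSERT INTO tweets (id, username, text, followers) VALUES (%d, '%s', '%s', %d)",
        $tweet->id,
        mysqli_real_escape_string($db, $tweet->user->screen_name),
        mysqli_real_escape_string($db, $tweet->text),
        $tweet->user->followers_count
    );
    mysqli_query($db, $sql);
}
```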
It’s as ugly as sin but it works. If you run it from the command line, it should start storing tweets in your database and keep on until you stop it. So, our database is starting to fill with tweets concerning our desired word. Now we need to be able to navigate them…
Now we need to publish the tweets in our browser. To do that, we need a small PHP script that returns the tweets when called. If we call it with a start parameter, it'll return all the tweets with an id bigger than that; otherwise, it'll return the last ten tweets stored in our database. To do that, we will use two different queries: the first one to return the last ten results, and the second one to return all results since the given id. We use subqueries (SELECT from SELECT) to get the results in our desired order.
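A sketch of that endpoint, assuming the table described earlier; the script name, credentials and parameter handling are illustrative, and the int cast on start is the bare minimum of input sanitising:

```php
<?php
// Returns tweets as JSON: the last ten with no parameter,
// everything newer than ?start=<id> otherwise.
header('Content-Type: application/json');
$db = mysqli_connect('localhost', 'dbuser', 'dbpass', 'twitter');

if (isset($_GET['start'])) {
    $start = (int) $_GET['start']; // cast to int: minimal input checking
    $sql = "SELECT * FROM (SELECT * FROM tweets WHERE id > $start
            ORDER BY id DESC) AS t ORDER BY id ASC";
} else {
    // Subquery grabs the newest ten; the outer SELECT re-sorts them
    // oldest-first, which is the order the page wants to append them in.
    $sql = "SELECT * FROM (SELECT * FROM tweets ORDER BY id DESC LIMIT 10)
            AS t ORDER BY id ASC";
}

$result = mysqli_query($db, $sql);
$tweets = array();
while ($row = mysqli_fetch_assoc($result)) {
    $tweets[] = $row;
}
echo json_encode($tweets);
```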
Since, as Larry Wall said, one of the cardinal virtues of a programmer is laziness, we are going to use jQuery to construct the interface and the business logic. And we're not even serving it, but linking it from the Google CDN, as Dave Ward posted in his wonderful blog.
So, let’s start with the HTML:
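The original markup isn't reproduced here; a sketch matching the description below, with the title, styles, ids and link targets as assumptions, and jQuery linked from the Google CDN:

```html
<html>
<head>
<title>Tweet watcher</title>
<style>
    .tweet { padding: 5px; border-bottom: 1px solid #eee; }
</style>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
</head>
<body>
<h1>Watching: google</h1>
<div id="tweets"></div>
<a href="http://twitter.com/">Twitter</a>
<a href="http://coru.net/">Corunet</a>
</body>
</html>
```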
Just some styles, one script link and a body containing one div, two links and a header. Simple, eh?
Let’s see… This is probably the most complex part of the article, so, I’ll try to go slow and explain every function:
This function calls the server using the getJSON jQuery method. Then, it takes each response line and calls addNew with it. If we call it with an id parameter, it’ll ask the server for all tweets with ids greater than that. Otherwise, it will grab the last ten tweets.
It takes an item (a tweet) as input. It hides the first tweet, removes its ‘tweet’ class, appends a new tweet at the bottom and shows it. It calls the renderTweet function to get the tweet in HTML format.
Just one line to call getImportanceColor() and a return with the HTML code. It’s a bit long but that’s because we’re adding a couple of links to the tweet to be able to visit the original one.
It takes a number of followers and returns an rgb color that will sit between total black, for people without followers, and total red, for Ashton Kutcher. It uses logarithms to scale between the two extremes, because there are six orders of magnitude between them. We will use it to paint black the twitterers with few followers and red the twitter stars.
This is the timeout function that calls itself every 200ms and gets the new tweets.
The last block just starts the polling as soon as the document is loaded.
Right now, because of the Twitter API limits, just one instance of the watching process can be run at a time. Anyhow, you can write several words, separated by commas, and Twitter will return results for all of them. This code should not be used in production, since there are almost no security checks to avoid misuse. If you want to use it on a machine open to the public, you should check -twice- every input for misbehaviour.
Obviously, this is just a sample. It could be made much better looking, and we could even analyze the tweets and tweet back a response to any questions concerning our keywords. The watch module should be daemonized or converted to a service so it can be left unattended. The HTML page could be made to filter between two dates, and so on. Keep on watching. We'll try to keep on posting this kind of content.
I'm part of Corunet, a web agency in Spain that can deliver consistently good results in all kinds of internet projects. You can visit our website http://coru.net/ or contact me at firstname.lastname@example.org if you have any special needs.
You can follow me on twitter as @dei_biz