Monday, October 27, 2008

The Tracker Demystified - Part 3: Reality

Okay, it's time to get back to the coding. On Tuesday, I developed a script that would play by the rules assuming everybody else did too. It was awesome: 29 lines, with a maximum of two database queries. And I wouldn't give it five minutes in the real world.

Why? Well, first, no input data was validated leaving the script open to SQL injection. Hopefully I don't need to say much more about the problems presented there, aside from that it would be a very bad thing were we not to address this issue. If you haven't heard of injection, go take a few minutes and read up on the implications. Since infohashes and peer ids are binary, any value containing 0x27 (apostrophe) would invalidate the SQL even without any malicious intent.

Secondly, even if we assume that the user is always benevolent, the script fails to anticipate that they may not be as true to the protocol as we are. The most glaring issue is that if the stopped message is never received, that particular row will remain in the database forever. This happens in the real world all the time; some poorly-designed clients will even neglect to send stopped when closed.

This time around, I'm just going to link the finished code and then go over some of the highlights. It'll be faster than just detailing every change in minute detail.

In my previous code, there wasn't even an error function. I've corrected that now, using the failure reason response provided by the protocol spec. I like my code to fail gracefully, since it cuts down on errors for users to come running to you with, but occasionally that's just not possible. Using failure reason with the message "Invalid port number", the user will see something like this in their client:

failure reason error in Transmission
Obviously, you don't have a lot of room to work with, but you can get some pertinent information across. In this case, it is used to inform the user that their client is fucked. (Side note: this is an artificially-generated error. Transmission would never do anything bad.)

I've added a new field to the database, a timestamp field called last_connect. It has ON UPDATE CURRENT_TIMESTAMP set, so whenever we make a change to a row, the timestamp will be updated to match. This will be handy later on.

A bunch of validation has been added to the top of the script, which should be pretty self-explanatory. The BitTorrent spec describes the optional ip value as being "generally used for the origin if [the client] is on the same machine as the tracker", so the script makes an exception for 127.0.0.1, ignoring the value otherwise and simply going on the server's record.

I've shuffled the core about a bit so that they make more sense to the computer and less to a human. Since the maximum announce interval has been set to 30 minutes, if a row hasn't been updated in an hour, the script will assume it to be trash and delete it. In order to update the timestamp, a dummy query (SET uploader = $uploader) is run. Well, it's not entirely useless – this way, if the completed event is never received, the script won't even notice. In fact, it doesn't even check for completed anymore. Both it and empty are now treated in exactly the same way. 

If you rewind a bit to my previous entry on the subject, you'll see that empty was never dealt with. That's because it doesn't involve any change in state, which the tracker would need to record. Well, now the state is simply an assertion of its continued existence, which the tracker will need to keep track of.

It's not necessary to clear old entries with every connection; in fact, it is quite wasteful to do so. However, since this is a one-script application, we can't rely on a cron job or other external function to do our dirty work for us.

This is where I will leave you. There are a ton of features I could add, including ratio tracking, IP bans, client identification, protocol extensions (some trackers include swarm info on announce), defending against browser attacks, and so forth. However, I promised to deliver a script that functioned as a basic barebones open tracker in the real world, and that's exactly what I did. I hope you'll be able to use it as a framework to work from in writing your own code.

On the subject of working as a framework, you should now see why I left the script half-finished last time. It's a lot easier to understand the early version than the completed version, and hopefully helped a bit to clarify the basic structure we were working with. I'm sorry for any bugs or security holes that may have crept in here. This is a last-minute job as always, and it's entirely possible that I'm being sloppy.

That's all for now, folks.

No comments:

Clicky Web Analytics