Monday, October 20, 2008

The Tracker Demystified – Part 1: Building the Database

You may have noticed a post on this subject a while back that was unintentionally released before its time. Well, if you have already read that one, re-read this anyway. It's finished, for one thing.

Anyway, this week is going to be devoted to building a simple tracker from scratch. This tracker will do no more than accept and share IP addresses, with no front-end for uploading and downloading metainfo (.torrent) files. However, if you've worked with PHP much, you can probably already work out how to do file uploads and downloads and build pretty, or at least usable, interfaces. The tracker logic itself is the big obstacle, and also the bit that has the potential to be interesting for the non-coder, if I can manage to write clearly enough to keep them engaged and slightly comprehending.

Hopefully, over the course of the week, I'll dispel any notions of BitTorrent as a complex or incomprehensible protocol. Once you get your head around it, it's actually quite easy to understand and use. The peer-to-peer bit is a little less straightforward, but happily, we don't have to deal with that. We're writing a tracker, not a client.

My resource in all of this is going to be the official BitTorrent spec, which I know almost by heart. I'll be referring to the spec from time to time, so read it over and my posts might start to make some kind of sense. My MySQL abilities are a little more touch-and-go, so optimization may not be as fantastic as it could be, and I'd welcome any constructive criticism on that front.

On that note, we'll start today by outlining the basic structure of the announce through the creation of a database table for the peers. This is the only table we'll need, which is handy. Basically, we need to store the pertinent bits of the data received by the announce: just enough to provide a coherent response. There are eight variables the client can pass to the tracker: info_hash, peer_id, ip, port, uploaded, downloaded, left, and event (ip and event are optional). Official or unofficial extensions may add extra values, but we're writing a barebones tracker, so we can safely ignore them. The spec does a perfectly good job of clearly outlining the purpose of each of these variables, so I won't repeat it here.
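To make the announce a little more concrete, here is a minimal sketch of pulling those eight variables out of a request's query string. It's written in Python rather than PHP, purely so it stays self-contained, and the function name `parse_announce` and the sample request are made up for illustration. Values are kept as raw bytes because info_hash and peer_id are 20-byte binary strings, not text.

```python
from urllib.parse import unquote_to_bytes

# The eight announce variables named in the spec.
ANNOUNCE_KEYS = ("info_hash", "peer_id", "ip", "port",
                 "uploaded", "downloaded", "left", "event")


def parse_announce(query_string):
    """Split a raw announce query string into a dict of the known keys.

    Hypothetical helper: unknown keys (extensions) are ignored, and every
    value is percent-decoded to bytes.
    """
    params = {}
    for pair in query_string.split("&"):
        key, sep, value = pair.partition("=")
        if sep and key in ANNOUNCE_KEYS:
            params[key] = unquote_to_bytes(value)
    return params


# Made-up request from a client that still has 1024 bytes to download.
example = ("info_hash=%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"
           "&peer_id=-XX0001-123456789012&port=6881"
           "&uploaded=0&downloaded=0&left=1024&event=started"
           "&some_extension=1")
request = parse_announce(example)
```

Note that the sample request sends no ip at all, so `request` simply has no "ip" key, and the unrecognized `some_extension` value is dropped.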

Now, all of the provided variables are pretty important for various things, but again, this isn't a full-featured tracker, so we'll ignore some of them. We'll save info_hash so we can connect peers on the same torrent to one another, peer_id so we can distinguish one peer from another, and ip and port so we can share the user's address with others on request. Also, while we have no need to track ratios here, we do need to know who is seeding and who is leeching so that we don't waste time sharing seeds' IPs with other seeds. We'll call this variable uploader. It's a single bit: 1 if left == 0 (the peer has nothing left to download, so it's a seed) and 0 otherwise.
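As a rough illustration of that selection, the sketch below boils a set of announce values down to the five we keep. It's in Python rather than PHP to stay self-contained, and the names `peer_row`, `params`, and `remote_addr` are hypothetical. Falling back to the connection's address when the client sends no ip is an assumption added here, not something covered above.

```python
def peer_row(params, remote_addr):
    """Reduce announce values to the five worth storing.

    `params` is a dict of already-decoded request values; `remote_addr`
    is the address the request came from, used only when no ip was sent
    (an assumption of this sketch).
    """
    left = int(params["left"])
    return {
        "info_hash": params["info_hash"],
        "peer_id": params["peer_id"],
        "ip": params.get("ip", remote_addr),
        "port": int(params["port"]),
        # uploader is 1 for a seed (nothing left to download), else 0.
        "uploader": 1 if left == 0 else 0,
    }


# Two made-up peers on the same torrent: one seed, one leech.
seed = peer_row({"info_hash": b"\x00" * 20, "peer_id": b"S" * 20,
                 "port": "6881", "uploaded": "0", "downloaded": "0",
                 "left": "0"}, "10.0.0.1")
leech = peer_row({"info_hash": b"\x00" * 20, "peer_id": b"L" * 20,
                  "ip": "10.0.0.2", "port": "6882", "uploaded": "0",
                  "downloaded": "0", "left": "1024"}, "192.0.2.7")
```

The uploaded, downloaded, and event values are deliberately left out of the result, since nothing in this barebones tracker uses them.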

At this point, our database looks like this:

[Image: database structure]

If you're familiar with phpMyAdmin, you'll see that I've set a primary index on id, which is standard practice. I've also set an index on info_hash, since we'll be running a lot of queries for it.
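For readers without phpMyAdmin to hand, a rough text equivalent of such a table is sketched below. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The table name `peers`, the index name, and all of the column types are assumptions; only the column names, the primary key on id, and the index on info_hash come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE peers (
    id        INTEGER PRIMARY KEY,   -- primary index
    info_hash BLOB    NOT NULL,      -- 20 raw bytes identifying the torrent
    peer_id   BLOB    NOT NULL,      -- 20 raw bytes identifying the client
    ip        TEXT    NOT NULL,
    port      INTEGER NOT NULL,
    uploader  INTEGER NOT NULL DEFAULT 0  -- 1 = seed, 0 = leech
);
-- Nearly every lookup filters on info_hash, hence the index.
CREATE INDEX idx_peers_info_hash ON peers (info_hash);
""")

torrent = bytes(20)  # placeholder info_hash
conn.executemany(
    "INSERT INTO peers (info_hash, peer_id, ip, port, uploader) "
    "VALUES (?, ?, ?, ?, ?)",
    [(torrent, b"A" * 20, "10.0.0.1", 6881, 1),
     (torrent, b"B" * 20, "10.0.0.2", 6882, 0)],
)

# The kind of query the index is there for: when a seed announces,
# only the leeches on the same torrent are worth handing back.
leeches = conn.execute(
    "SELECT ip, port FROM peers WHERE info_hash = ? AND uploader = 0",
    (torrent,),
).fetchall()
```

In MySQL the types would differ (a fixed-length binary column for the two 20-byte values, for instance), but the shape of the table and the index stay the same.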

Now, before the more knowledgeable members start tearing holes in my post, I want to point out that I'm working with an ideal model here. No information is lost, all clients follow the protocol to the letter (particularly in always cleanly closing connections), and there are no clients that spoof information for personal gain. Obviously, none of these assumptions holds in the real world, but since I'm aiming to explain an implementation of the BitTorrent protocol, I'm going to work with them for the time being, much as friction is often ignored in introductory physics courses.
