Tuesday, October 21, 2008

The Tracker Demystified – Part 2: Structure

Now it's time to get started on the coding itself. For debugging purposes, we're going to need a lot more than the single-line tracker status field in normal BitTorrent clients. Happily, since the connectivity is simply via HTTP GET, you could just type the announce URL into your browser's address bar to test the effect of various values. In fact, this is the way I did my testing last time I took it upon myself to write a tracker.

This time, I'm going a little more high class. I've made a simple form with fields for each of the variables (info_hash, peer_id and so forth) that is submitted to announce.php using GET. It's much easier to play around with different values this way. Since forms are pretty simple and not the subject of discussion, I won't go into any more depth here.
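For readers who would rather script their test requests than type them, here is a minimal sketch of what such a GET URL looks like. It is written in Python rather than the PHP the tracker uses, and the host name, the peer_id value, and the zeroed counters are placeholders assumed for illustration, not values from the form described above.

```python
from urllib.parse import urlencode

# Hypothetical location; substitute wherever your announce.php is served.
ANNOUNCE_URL = "http://localhost/announce.php"


def build_announce_url(info_hash, peer_id, port, event=""):
    """Return a GET URL carrying the announce parameters named in the spec."""
    params = {
        "info_hash": info_hash,  # 20 raw bytes: SHA-1 of the torrent's info dictionary
        "peer_id": peer_id,      # 20 bytes chosen by the client
        "port": port,
        "uploaded": 0,           # placeholder counters for a test request
        "downloaded": 0,
        "left": 0,
    }
    if event:
        params["event"] = event
    # urlencode percent-escapes raw bytes, which is what info_hash needs.
    return ANNOUNCE_URL + "?" + urlencode(params)


url = build_announce_url(b"\x12" * 20, b"-XX0000-abcdefghijkl", 6881, "started")
print(url)
```

Pasting the printed URL into a browser has the same effect as submitting the form.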

First, even before connecting to the database, I'll load the passed variables and perform a minimum of validation. Like yesterday, I'll be referring to the protocol spec to determine the expected values and ranges of particular variables. I don't like throwing a ton of errors at my user, so I tend to design for graceful failure. For example, if event is anything other than started, completed, stopped, or empty, it will simply default to empty.
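The graceful-failure idea can be sketched like this, in Python rather than PHP. The function names are hypothetical, and the fallback of 0 for malformed counters is an assumption for illustration; only the event rule comes from the paragraph above.

```python
VALID_EVENTS = {"started", "completed", "stopped"}


def clean_event(raw):
    """Return the event if it is one the spec defines, otherwise empty."""
    return raw if raw in VALID_EVENTS else ""


def clean_count(raw):
    """Return a non-negative integer, falling back to 0 on bad input.

    Assumed rule for fields such as uploaded, downloaded and left.
    """
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return 0
    return value if value >= 0 else 0


print(repr(clean_event("paused")))  # unrecognised, so it becomes ''
print(clean_count("1024"))
```

Nothing here raises an error back at the client; bad input quietly collapses to a safe default.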

Speaking of event, it is going to form the basis of our code structure. The heart of the script consists of a series of conditionals based on its four possible values. Actually, since empty merely denotes that the client is requesting more peers, we only need the three that require updates on the tracker's end: started, completed, and stopped. These three events map cleanly to three MySQL statements: INSERT, UPDATE, and DELETE, respectively. When a torrent is started, the peer needs a new row in the database, which should be updated when the download has finished and removed when the connection is closed.
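A rough sketch of that event-to-statement mapping follows. It uses Python with an in-memory SQLite database as a stand-in for PHP and MySQL, and the table layout and column names are assumptions, since the real peers table isn't spelled out here. Unlike the barebones script, it uses parameterized queries.

```python
import sqlite3

# Hypothetical schema, assumed for illustration only.
SCHEMA = """
CREATE TABLE peers (
    info_hash TEXT NOT NULL,
    peer_id   TEXT NOT NULL,
    ip        TEXT NOT NULL,
    port      INTEGER NOT NULL,
    complete  INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (info_hash, peer_id)
)
"""


def handle_event(db, event, info_hash, peer_id, ip, port):
    """Apply the one statement that matches the announced event."""
    if event == "started":
        # New peer: add a row. OR REPLACE (SQLite syntax) tolerates a repeat start.
        db.execute(
            "INSERT OR REPLACE INTO peers (info_hash, peer_id, ip, port) "
            "VALUES (?, ?, ?, ?)",
            (info_hash, peer_id, ip, port),
        )
    elif event == "completed":
        # Download finished: mark the existing row as a seeder.
        db.execute(
            "UPDATE peers SET complete = 1 WHERE info_hash = ? AND peer_id = ?",
            (info_hash, peer_id),
        )
    elif event == "stopped":
        # Client is going away: remove the row.
        db.execute(
            "DELETE FROM peers WHERE info_hash = ? AND peer_id = ?",
            (info_hash, peer_id),
        )
    # An empty event changes nothing on the tracker's end.
    db.commit()
```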

Now, stopped only takes input – there's no point in making output when nobody is going to use it. For everything else, we'll use BitTorrent's bencode encoding (detailed in the spec) to output the proper response. Again, I encourage you to go back and peruse the spec. Even if you don't muck around in the backend much, it's nice to be able to pop the hood of any metainfo (.torrent) file you may come across and get a general idea of what's going on in there. The bencoded data may look intimidating, but it's perfectly readable if you take a few minutes to understand how it works.
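To make the format less intimidating, here is a small bencode encoder sketched in Python. It is not code from the announce script, and the function name is just a label for this example. It covers the four types the spec defines: integers, byte strings, lists, and dictionaries.

```python
def bencode(value):
    """Encode ints, strings/bytes, lists and dicts as bencoded bytes."""
    if isinstance(value, int):
        return b"i%de" % value                  # i<number>e
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)   # <length>:<bytes>
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


print(bencode(["spam", 42]))  # b'l4:spami42ee'
```

Read left to right, the output is a list (l ... e) holding the 4-byte string spam and the integer 42.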

So, back to the announce. The proper response I mentioned above is a dictionary consisting of two key/value pairs: interval, to indicate the number of seconds until the tracker suggests that you announce again, and peers, to provide a list of the peer id, ip, and port of each provided peer. There's one bit in the official specification that I feel the need to quote for all the private tracker elitists out there:
Note that downloaders may rerequest on nonscheduled times if an event happens or they need more peers.
Announcing more often than this is perfectly acceptable practice. I do hate to hold it up for ridicule, but BitMe has ironically banned the Mainline (BitTorrent) client originally developed by Bram Cohen himself, developer of the protocol, because it "does not honor the protocol requirements of private trackers". I'm not going to knock on BitMe too much because they're more open than many private trackers in that regard, and they're the only one that I know of that lists banned clients and justifications for banning. However, allow me to take this opportunity to say... what the fuck?

That's a little off-topic. Anyway, now we just have to pull the data from the database. In practice, clients can and do indicate the number of peers they're looking for with the numwant variable, but that's not part of the spec, so I'm ignoring it for now. Instead, I've specified 25 as an arbitrary maximum, and used ORDER BY RAND() in my MySQL query to ensure that all peers get a statistically equal showing no matter where they sit in the database.
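The random selection can be sketched as follows, again in Python with SQLite standing in for MySQL. SQLite spells the function RANDOM() where MySQL uses RAND(), and the table layout is assumed for illustration.

```python
import sqlite3

MAX_PEERS = 25  # the arbitrary maximum chosen above


def pick_peers(db, info_hash, limit=MAX_PEERS):
    """Return up to `limit` randomly chosen peers for one torrent."""
    rows = db.execute(
        "SELECT peer_id, ip, port FROM peers "
        "WHERE info_hash = ? ORDER BY RANDOM() LIMIT ?",
        (info_hash, limit),
    )
    return rows.fetchall()


# Demo with a hypothetical table holding 40 peers for one torrent.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE peers (info_hash TEXT, peer_id TEXT, ip TEXT, port INTEGER)")
db.executemany(
    "INSERT INTO peers VALUES (?, ?, ?, ?)",
    [("aa", "peer%02d" % i, "10.0.0.%d" % i, 6881) for i in range(40)],
)
print(len(pick_peers(db, "aa")))  # 25
```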

Now that we've got the pertinent data, it has to be output, so we'll bencode it according to the structure laid out in the protocol, and shove it out the door. I might talk about writing full functions to read and write bencoded data at some point, but it's a lot more work. The implementation in the announce is simple because we only have to account for the variation between different volumes of data, which is nevertheless being output in a rigid format. Super easy.
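Because the layout never changes, the response can be assembled by plain concatenation. The sketch below shows that approach in Python rather than PHP. The helper names and the 1800-second interval are assumptions for illustration, not values taken from the script.

```python
def bstr(data):
    """Bencode one byte string as <length>:<bytes>."""
    return str(len(data)).encode("ascii") + b":" + data


def announce_response(interval, peers):
    """Build the bencoded reply for (peer_id, ip, port) tuples.

    The skeleton is fixed; only the number of peer entries varies.
    """
    out = b"d8:intervali" + str(interval).encode("ascii") + b"e5:peersl"
    for peer_id, ip, port in peers:
        out += (
            b"d2:ip" + bstr(ip.encode("ascii"))
            + b"7:peer id" + bstr(peer_id)
            + b"4:porti" + str(port).encode("ascii") + b"e"
            + b"e"  # closes this peer's dictionary
        )
    return out + b"ee"  # closes the peers list, then the outer dictionary


print(announce_response(1800, []))  # b'd8:intervali1800e5:peerslee'
```

The keys inside each peer dictionary are written in sorted order (ip, peer id, port), as bencoded dictionaries require.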

Now, while I may not have done a particularly satisfying job of describing the last bit of the announce, it is in fact done. The finished script is only 29 lines, including whitespace. I had it heavily commented, but many of the comments repeated what I'm saying here, and I want to emphasize how simple an announce can really be, so I've left it blank and hopefully the minimalism will speak for itself.

At the moment, I'm running three peers: Transmission, Azureus, and my own web browser masquerading as a client. Transmission has successfully seeded the file to Azureus, and both are trying to connect to my web browser to share with it too. Obviously, since it isn't actually a torrent client at all, that's not going to happen.

[Image: peers table entries]

Now, I see the seasoned coders cringing already, so I'll repeat this once again: this is a barebones demo script. It is designed to provide a framework for you to work off of in forming a more complex announce. The idea is to illustrate the way the protocol works, and the way the tracker interfaces with the protocol. As it stands now, the script doesn't even sanitize input, so it is vulnerable to SQL injection as well as just about every other damn thing.

Tomorrow, I hope to elaborate a bit on these issues, refining this code to create a perfectly usable announce. Ratio tracking is still outside the scope of the project, but we can certainly tighten things up and maybe lay the necessary groundwork for you to go on and add ratios and such frivolity. I say "I hope" because I'm burdened with a lot of work tomorrow and may not get time for a lot of coding. I'll make sure you get an article, and hopefully part 3 of this series, but no promises.

And yes, you're welcome to take the code I wrote above and use it without attribution for whatever purposes you want. However, if you're going to do that, I really suggest that you fix the holes first, or wait until I have a chance to do it for you.

5 comments:

Denney said...

I really enjoy these sorts of things. I wish more websites did these sorts of tutorials/explanations.

Keep the awesome work coming guys. It's a great read.

Nick said...

Talk about the seasoned coders cringing. That shit is riddled with SQL injection vulnerabilities (not to mention the general lack of structure or good design).

CurlyFries said...

That's the point. Like I said, the next article is when I go through validation and so forth, turning a working proof-of-concept into a secure and stable announce. It was just too much to cover today.

OnionRings said...

@nick,
Pastebin your script if you think it's better. Just saying that it's this or that doesn't help anyone.

Nick said...

Sorry, my comment didn't quite come across as I intended it.

I understand that it's a simple proof-of-concept and I applaud you for stepping up and encouraging people to learn and experiment with this sort of stuff.

That said, I believe security is fundamental to all software, particularly web design. It is my opinion that even a proof of concept should demonstrate good coding and security practices. This means input validation, some form of logical structure and some simple abstractions.

Whilst the seasoned coder understands the need for these things and is able to infer them from your proof-of-concept, those who are new to coding may not. Furthermore, it is going to be the novice programmers who take the most away from your discussion here, and, being the impressionable folk that they are, they will benefit heavily from you demonstrating good practices from the word go.

-Nick
