Performance issues

As you may have noticed from time to time, the site can be slow or throw errors. Needless to say, I’m aware of these issues, but unfortunately they’re not easy to solve right away. Some of them stem from the fact that I didn’t plan far enough ahead when I wrote the site scripts about a year ago. There have been about 1.8 million file uploads since the relaunch, and certain scripts are straining under that many files.

Take the log parsing module, for example: to track and disable files that use too much bandwidth, a script runs every hour and parses the log files. Parsing itself is pretty fast, but the gathered statistics are then stored in a MySQL database. That was no problem either, until the number of unique files being hit every hour grew large. At roughly 100K row updates per hour, the server’s hard drive barely keeps up, and it takes minutes to store statistics that took seconds to parse. Meanwhile, the hard drive activity severely slows everything else down too. I’ve got an idea for handling this better, but until I have time to implement it, log parsing is temporarily offline. This won’t affect you much, of course, unless somebody uploads a file that uses so much bandwidth it hoses the server.
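The write bottleneck described above is a common one. A generic way to ease it (shown purely as an illustration, not necessarily the approach hinted at above) is to total hits and bytes per file in memory while parsing, then write those totals in a few large multi-row statements instead of one UPDATE per file. This sketch assumes Common or Combined Log Format lines and a hypothetical `file_stats` table with a unique key on `path`; the names `aggregate` and `build_batches` are made up for the example, and it only builds the SQL rather than talking to a database.

```python
import re
from collections import defaultdict

# Matches the request path, status code and response size in a
# Common/Combined Log Format line, e.g.
# 1.2.3.4 - - [10/Oct/2007:13:55:36 +0000] "GET /b/abc.jpg HTTP/1.1" 200 2326
LOG_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*" (\d{3}) (\d+|-)')


def aggregate(lines):
    """Sum hits and bytes per path in memory (one dict entry per unique file)."""
    totals = defaultdict(lambda: [0, 0])  # path -> [hits, bytes]
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue  # skip lines that don't look like a request
        path, status, size = m.groups()
        if status != "200":
            continue  # only count status 200 responses
        entry = totals[path]
        entry[0] += 1
        entry[1] += 0 if size == "-" else int(size)
    return dict(totals)


def build_batches(totals, batch_size=1000):
    """Turn the totals into multi-row upserts on a hypothetical file_stats table.

    Each statement covers up to batch_size files, so 100K unique files
    become 100 statements instead of 100K individual UPDATEs.
    """
    items = sorted(totals.items())
    batches = []
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        placeholders = ", ".join(["(%s, %s, %s)"] * len(chunk))
        sql = (
            "INSERT INTO file_stats (path, hits, bytes) VALUES " + placeholders +
            " ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits),"
            " bytes = bytes + VALUES(bytes)"
        )
        params = [v for path, (hits, nbytes) in chunk for v in (path, hits, nbytes)]
        batches.append((sql, params))
    return batches


if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [10/Oct/2007:13:55:36 +0000] "GET /b/abc.jpg HTTP/1.1" 200 2326',
        '5.6.7.8 - - [10/Oct/2007:13:55:37 +0000] "GET /b/abc.jpg HTTP/1.1" 200 2326',
        '5.6.7.8 - - [10/Oct/2007:13:55:38 +0000] "GET /c/gone.png HTTP/1.1" 404 512',
    ]
    totals = aggregate(sample)
    print(totals)  # {'/b/abc.jpg': [2, 4652]}
    for sql, params in build_batches(totals):
        print(params)  # ['/b/abc.jpg', 2, 4652]
```

Each `(sql, params)` pair could then be handed to a DB-API cursor’s `execute()` (MySQLdb uses the `%s` placeholder style). `VALUES()` inside `ON DUPLICATE KEY UPDATE` works in MySQL 5.x; MySQL 8.0.20 and later deprecate it in favour of a row alias.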

Another one of these “wtf” moments was 404 (file not found) handling. To display an intelligent message depending on why the image was missing (wrong URL, or deleted because of bandwidth limits or TOS violations), the image was served by a PHP script. That, however, isn’t smart if, say, a small image was deleted for high bandwidth usage: instead of causing a few dozen static file hits per second, it was causing a few dozen PHP (dynamic) hits per second. In other words, deleting the image made the load on the server worse, not better. Doh! On the new server (see below) I’ve implemented a trick for 404 handling: after the first hit (which is still handled by a PHP script), all subsequent hits for the next few seconds are served from memcached by the webserver (nginx), which is very fast.
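The effect of that trick can be modelled in a few lines. In this simplified sketch (an illustration under assumptions, not the actual setup), `TTLCache` stands in for memcached, `handle_missing` plays the webserver checking the cache first, and the `render` callback plays the PHP script that builds the 404 page and stores it for a few seconds. In the real arrangement the lookup would happen inside nginx (for instance through its memcached module, keyed on the request URI) and only the PHP side would write to memcached; the 5-second TTL is a hypothetical value.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for memcached: entries expire after ttl seconds."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # injectable so expiry can be simulated

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)


def handle_missing(cache, uri, render, ttl=5):
    """Serve the body for a missing file.

    Fast path: the page is already cached (the webserver reading memcached).
    Slow path: call render (the dynamic script) and cache its output for ttl
    seconds, so repeated hits on the same URI skip the dynamic handler.
    """
    body = cache.get(uri)
    if body is None:
        body = render(uri)
        cache.set(uri, body, ttl)
    return body


if __name__ == "__main__":
    calls = []

    def render(uri):
        calls.append(uri)  # record each time the "PHP" path runs
        return "404: " + uri + " has been removed"

    now = [0.0]
    cache = TTLCache(clock=lambda: now[0])
    for _ in range(30):  # a burst of hits on a deleted image
        handle_missing(cache, "/b/deleted.jpg", render)
    print(len(calls))  # 1: only the first hit reached the dynamic handler
```

Once the TTL lapses, the next hit runs the dynamic handler again and refreshes the cached page, so a popular deleted image costs one dynamic request every few seconds instead of dozens per second.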

Possibly the biggest cause of slow performance is that the current servers aren’t quite up to the task. The main server (www.imagehost.org & b.imagehost.org) is a dual-CPU Xeon with merely 1 GB of RAM, and the other (c.imagehost.org) has only 512 MB. The server that hosts a.imagehost.org is probably fast enough, but the network it’s connected to is not (which is why most uploads go to b & c). Anyway, I’ve ordered a new server: a Core 2 Duo E6600 with 4 GB of RAM, which I expect to help a lot. The only issue is that it’s configured differently, and adapting and testing the scripts is taking time. Hopefully it’ll be ready for new uploads very soon, though.

I realize this post has been far more technical than some of you may be able to follow, but I hope you appreciate me explaining more about what goes on behind the scenes.
