Issues of speed and scalability become paramount as your marketing, SEO and other promotional efforts pay off and your site climbs the ladder on the Alexa rankings. Staying in front of this tidal wave of growth with sound decisions early in your growth curve will let you spend more time enjoying the success of your hard work and less time putting out fires related to your newfound success.
Over the past 12 years I have experienced firsthand a number of issues related to site speed and infrastructure scaling. During the past six years with Video Secrets, a live webcam operation with three datacenters, hundreds of servers and bandwidth measured in Gbps (gigabits per second), we have battled, and will no doubt continue to battle, issues of scaling on a daily basis. The list that follows is a collection of concepts, software and third-party applications which, taken individually or in combination, can help increase your capacity per server and help your site load faster. While they lean toward the technical side, the ideas herein can serve as good discussion items between those of you in managerial positions and the technical teams who maintain your sites.
Clustering
The first weapon in your arsenal against growing traffic is often to scale horizontally by adding more servers, creating a cluster. A cluster is two or more servers designed to serve the same application, distributing the load across all cluster members. A load-balancing device (most often hardware, sometimes DNS) sits in front of the cluster to distribute the requests amongst the cluster members. In addition to load distribution, clusters also provide redundancy in the event that anything happens to one of your servers. Speak with your hosting provider about this option before it becomes an issue. It is best to know your options and the cost associated with them before your site outgrows your single-server environment.
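To make the idea of load distribution concrete, here is a minimal Python sketch of round-robin balancing with failover. It is an illustration only, not how any particular hardware load balancer works; the class name and the hostnames (web1, web2, web3) are made up for the example.

```python
import itertools


class RoundRobinBalancer:
    """Hands out cluster members in rotation, skipping any marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.down = set()
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        # Called when a health check fails (health checking itself is not shown).
        self.down.add(server)

    def mark_up(self, server):
        self.down.discard(server)

    def next_server(self):
        # Try each server at most once per call so we never loop forever.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server not in self.down:
                return server
        raise RuntimeError("no healthy servers in the cluster")


# Hypothetical hostnames, for illustration only.
balancer = RoundRobinBalancer(["web1", "web2", "web3"])
first_pass = [balancer.next_server() for _ in range(4)]

# Simulate losing one server: requests keep flowing to the remaining members.
balancer.mark_down("web2")
after_failure = [balancer.next_server() for _ in range(4)]
```

The second list never contains web2, which is the redundancy benefit described above: the cluster keeps serving requests while one member is out.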
Memcached
Memcached is a key/value store that keeps data in memory (RAM) so it can be retrieved quickly later. It is extremely fast and can be used to store everything from database query results to entire HTML page output. Client APIs exist for most popular programming languages (PHP, Perl, Ruby, Java, Python, etc.). Let’s say the main page of your site gets hit heavily, and the content only updates every hour; you could store the entire HTML source in memcached with a one-hour expiry and dramatically speed up page-load times. Memcached is used by some of the largest sites on the Internet, including Facebook.com, which touts a cluster of over 800 memcached servers supplying over 28 terabytes of memory (facebook.com/eblog).
mod_gzip and mod_deflate
These two Apache modules compress data before it is returned to the end-user. PHP output, CSS, JavaScript and other text-based content are prime candidates for compression. The compressed data is then uncompressed by the client’s browser and displayed normally. This is safe for all browsers because of how it is implemented: the browser must send a request header (Accept-Encoding) telling the server it supports compression. Browsers that do not send this header receive uncompressed data. Speak with your hosting provider about which solution is correct for your version of Apache (mod_gzip was written for Apache 1.3, while mod_deflate ships with Apache 2.x).
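The header negotiation described above can be sketched in a few lines of Python. This is not how the Apache modules are implemented; it is a simplified illustration, and the function names are made up. It also ignores the "*" wildcard that the Accept-Encoding header permits.

```python
import gzip


def client_accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding value lists gzip with a non-zero q."""
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        if token.strip().lower() == "gzip":
            params = params.strip().lower()
            if params.startswith("q="):
                try:
                    return float(params[2:]) > 0
                except ValueError:
                    return False
            return True
    return False


def build_response(body_text, request_headers):
    """Compress the body only when the client has said it can handle gzip."""
    body = body_text.encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        # Tell caches that the response differs by Accept-Encoding.
        "Vary": "Accept-Encoding",
    }
    if client_accepts_gzip(request_headers.get("Accept-Encoding", "")):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body


page = "<p>Hello, compression!</p>" * 200

# A browser that advertises gzip support gets a compressed body.
gz_headers, gz_body = build_response(page, {"Accept-Encoding": "gzip, deflate"})

# A client that sends no Accept-Encoding header gets the plain body.
plain_headers, plain_body = build_response(page, {})
```

Repetitive text such as HTML compresses very well, so the compressed body here is a small fraction of the original size.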
PHP Accelerators
Remember, PHP is a scripting language. Each time your server handles a request for a PHP script, the PHP binary must parse the source code and compile it into bytecode (opcodes) before executing it. Opcode caches store this bytecode in RAM so the compile step is skipped on later requests, dramatically reducing the CPU cycles and time associated with handling PHP requests. Since several accelerators exist, speak with your hosting company about whether it already offers this feature or if it has any recommendations. You might find that one accelerator works better with your site (software, plug-ins, etc.) than another.
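The same principle can be shown in Python, which also compiles source to bytecode before running it. The sketch below is an analogy rather than a description of how any PHP accelerator is built; the BytecodeCache class and the script name are invented for the example.

```python
class BytecodeCache:
    """Keeps compiled code objects in memory so source is compiled only once."""

    def __init__(self):
        self._cache = {}
        self.compile_count = 0

    def load(self, name, source):
        # Keyed on name and source text, so an edited script is recompiled.
        key = (name, source)
        code = self._cache.get(key)
        if code is None:
            code = compile(source, name, "exec")  # the expensive step
            self.compile_count += 1
            self._cache[key] = code
        return code

    def run(self, name, source):
        # Execute the cached bytecode in a fresh namespace, like a new request.
        namespace = {}
        exec(self.load(name, source), namespace)
        return namespace


script = "result = sum(range(10))"  # stand-in for a page script
opcode_cache = BytecodeCache()

# Three "requests" for the same script: compiled once, executed three times.
for _ in range(3):
    request_namespace = opcode_cache.run("page.py", script)
```

After the loop the script has run three times but was compiled only once, which is where the CPU savings come from.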
Content Delivery Networks
These are services designed to serve static files such as images and video efficiently. Due to advanced architectures and economies of scale, the cost of serving data from a CDN is often less than using the bandwidth from your main hosting provider. Also, offloading the work of serving images and videos to a CDN frees your main servers to handle dynamic requests. Most CDN providers offer POPs (points of presence) around the world for faster load times to users across the globe. A prudent solution for a large site is to balance between multiple CDN providers, giving yourself much needed redundancy for times when a provider is down (it happens!).
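One way to balance between providers is to choose a CDN hostname when you generate each asset URL. The Python sketch below is one possible approach under stated assumptions: the two hostnames are hypothetical, and the set of healthy providers is assumed to come from your own monitoring, which is not shown.

```python
import zlib

# Hypothetical CDN hostnames, for illustration only.
CDN_HOSTS = ["cdn-a.example.com", "cdn-b.example.net"]


def cdn_url(path, healthy_hosts, all_hosts=CDN_HOSTS):
    """Build an asset URL on a healthy CDN, spreading assets across providers."""
    # A stable hash keeps each asset on the same provider while it is healthy,
    # so that provider's edge caches stay warm for the asset.
    start = zlib.crc32(path.encode("utf-8")) % len(all_hosts)
    for offset in range(len(all_hosts)):
        host = all_hosts[(start + offset) % len(all_hosts)]
        if host in healthy_hosts:
            return "https://%s/%s" % (host, path.lstrip("/"))
    raise RuntimeError("no healthy CDN provider available")


# Normal operation: both providers are up.
normal = cdn_url("/img/logo.png", healthy_hosts=set(CDN_HOSTS))

# Provider A is down: every asset falls over to provider B.
failover = cdn_url("/img/logo.png", healthy_hosts={"cdn-b.example.net"})
```

When a provider goes down, only the set of healthy hosts changes; page templates keep calling the same function.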
Query Optimization
Dynamic sites love data, and the hammer of choice for most webmasters is MySQL. Entire articles and books have been written on database optimization; for those who do not want to become a DBA, a few tricks can help you get more mileage out of your database and speed up your site in the process.
Make sure your database queries are using table indexes for maximum speed. Use of the EXPLAIN tool in MySQL can help you understand if a query is using an indexed field: EXPLAIN SELECT COUNT(*) FROM table1 WHERE foo = 'bar';
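To see what an index changes, the sketch below runs the same query before and after adding one. It uses Python's built-in SQLite module rather than MySQL so that it runs anywhere, and SQLite's EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN, so the output format differs from what MySQL prints. The table contents and the index name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, foo TEXT)")
conn.executemany(
    "INSERT INTO table1 (foo) VALUES (?)",
    [("bar",), ("baz",)] * 50,  # 100 sample rows
)

QUERY = "SELECT COUNT(*) FROM table1 WHERE foo = 'bar'"


def query_plan(connection, sql):
    """Return SQLite's description of how it would execute the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each row.
    return " | ".join(row[-1] for row in rows)


# Without an index the engine has to scan the whole table.
plan_before = query_plan(conn, QUERY)

# Index the column used in the WHERE clause and look at the plan again.
conn.execute("CREATE INDEX idx_table1_foo ON table1 (foo)")
plan_after = query_plan(conn, QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan describes a scan of the table, while the second names the new index. The same before-and-after comparison works with EXPLAIN in MySQL.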
In MySQL, the choice of storage engine is also very important. The InnoDB storage engine features row-level locking, non-locking reads and transactional support. While no storage engine is correct for every problem, I found it compelling that a lead engineer from Flickr.com said that use of InnoDB is one of their engineering rules; this is food for thought for anyone always using MyISAM simply out of habit.
ob_gzhandler()
This PHP function is similar to the mod_gzip Apache module in that it handles sending compressed data to the user. Performing compression via ob_gzhandler() has the benefit of giving you control over which pages produce compressed output. Remember, though, that since this is a PHP-specific utility it only compresses output generated by PHP; it will not compress static JavaScript, CSS or HTML files the way mod_gzip will.
Move Heavy Lifting Offline
Anything that takes a long time to compute should be moved offline to run in the background. The cron daemon in Unix provides an easy way to schedule and automate recurring tasks. For instance, a long database query could be run every few minutes by cron and the data could be stored in a text file, memcache or another database table for quick access by your frontend scripts.
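The cron example can be sketched as a small Python job. The function expensive_report is a hypothetical stand-in for your slow query, the figure it returns is a placeholder, and the file name is made up. The job writes to a temporary file and then renames it, so frontend scripts never read a half-written file.

```python
import json
import os
import tempfile
import time


def expensive_report():
    """Hypothetical stand-in for a slow database query; values are placeholders."""
    return {"active_users": 1234, "generated_at": int(time.time())}


def write_report(path):
    """Background job: compute the data and publish it atomically."""
    data = expensive_report()
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(data, handle)
        os.replace(tmp_path, path)  # readers see the old or the new file
    except Exception:
        os.unlink(tmp_path)
        raise
    return data


def read_report(path):
    """Frontend side: a cheap file read instead of the slow query."""
    with open(path) as handle:
        return json.load(handle)


# Demonstration in a throwaway directory.
report_path = os.path.join(tempfile.mkdtemp(), "report.json")
written = write_report(report_path)
loaded = read_report(report_path)
```

A crontab entry along the lines of */5 * * * * /usr/bin/python3 /path/to/build_report.py (the paths are placeholders) would rerun the job every five minutes.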
The goal of building and maintaining highly scalable and fast websites is never-ending, but even implementing one of these techniques can help your sites load faster and help your servers handle more traffic.
Brad Estes is the Operations Manager for Video Secrets; he oversees the business development and technical direction of the company’s award-winning live video chat network of sites.