NOTE:
This is not a full tutorial. This is simply a description of the project with a few config files and links to resources.

This project took me two days, and I am not about to give away everything I worked so hard to learn. However, I am willing to point you in the right direction if possible. If you have any questions related to this post, please leave a comment and I will try my best to answer.

As most of you know, about 4 months ago I switched jobs. I went from being a Computer Centre Manager at a private library to being an L1 Support Agent/L2 Sysadmin for a server hosting company. The learning curve has been immense. I went from working with 3 servers to over 1000 servers.

One of the first things I did was start a personal project. The more things you can do in the world of servers, the more valuable an asset you are to the company. I don’t want to stay at L1/L2; I want to move up, become L3 and work on Complex Solutions. So I took on the task of learning how to load balance.

What is Load Balancing?

According to Wikipedia:

In computing, load balancing distributes workloads across multiple computing resources, such as computers, a computer cluster, network links, central processing units or disk drives. Load balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overload of any single resource.

In hosting, load balancing mostly applies to Web (Apache, Nginx, Lighttpd, etc.) and Database (MySQL, MSSQL, PostgreSQL, etc.) utilization, especially in the case of my project.

The Project

My Setup

The concept was simple. Using 3 cloud servers (via DigitalOcean and Vultr), create a load balanced system that would perform the following:

  • Split the load of connections evenly between servers
  • Replicate the web data between servers
  • Replicate the databases between servers
     

Going into this setup I knew it had limitations, the largest of which is that there is still a single point of failure: the load balancer. In a true production load balanced system, you need redundancy. My project didn’t have this. That is not to say what I learned in this project isn’t valuable, but it’s just not a typical setup for a real life environment. The other known limitation is that I am load balancing connections in general. That is to say, I have not separated my Web server from my SQL server; they reside on the same system and therefore compete for the same resources. Again, this would not necessarily be used in a real life scenario, except maybe by a blogger like myself.

Each server had the same virtual hardware and OS setup:

  • 1 CPU
  • 1 GB RAM
  • 1 GB Swap
  • 30 GB HDD (20GB on Vultr)
  • CentOS 6.6 (except the one Zen Load Balancer server)
     

As mentioned, there were some exceptions. I tested two different types of load balancers. The first used Nginx, which I had never heard of despite it being available to the public since 2004. Nginx (pronounced Engine X) is a web server like the popular Apache, but it also has a built-in ability to load balance HTTP requests (Web). The second test used Zen Load Balancer, which is a Debian-based open source TCP/UDP load balancer. This means that it can load balance not only HTTP requests but also MySQL requests (for when you split the Web and MySQL servers apart). In order to use Zen, I had to move my setup over to Vultr because DigitalOcean does not allow custom ISO images.

Replicating Data (the /var/www/html directory)

Ok, the heading is a bit misleading… It was actually the /usr/share/nginx/html directory. The web server on the two nodes could be anything, though, and most people would use Apache, whose files on recent versions of Linux are stored by default in /var/www/html (Ubuntu 14.04-14.10 and CentOS 7, among many distros). I used Nginx only because it’s great for single domain use and easy to set up.

For the site I used WordPress as the engine, which again is quite popular. I wanted to keep it as close to a real situation as possible, despite my limitations. To replicate the data from one server to another I used Lsyncd, which can be installed fairly easily using yum or apt-get. I used CentOS 6.6, so I installed it with yum on both servers (although for safety you really only need to install rsync on the second server).
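A minimal sketch of that install step, not the exact command I ran. It assumes Lsyncd comes from the EPEL repository on CentOS 6 and that the package ships an init script named lsyncd. The run helper is purely illustrative: it prints the commands by default so you can review them first.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Run with DRY_RUN=0 (as root) on the server to actually install.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Assumption: lsyncd is not in the CentOS 6 base repos and comes from EPEL.
run yum install -y epel-release
run yum install -y lsyncd rsync

# Assumption: the package provides an init script called "lsyncd".
# Only the server that pushes the files needs the daemon enabled at boot.
run chkconfig lsyncd on
```

On the second (receiving) server, installing rsync alone should be enough.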

Then I set up the Lsyncd config on the first server (/etc/lsyncd.conf).
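Here is a rough sketch of what such a config can look like, assuming Lsyncd 2.1 syntax and passwordless SSH keys from the first server to the second. The IP address, log paths and delay value are placeholders, not my actual values. The snippet writes to a local example file; copy it to /etc/lsyncd.conf once you have adjusted it.

```shell
#!/bin/sh
# Write an example lsyncd config to a local file for review.
CONF="${LSYNCD_CONF:-./lsyncd.conf.example}"

cat > "$CONF" <<'EOF'
-- Assumes Lsyncd 2.1+ (older 2.0.x uses "settings = { ... }" instead).
settings {
    logfile    = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd.status",
}

sync {
    default.rsyncssh,                       -- rsync over SSH
    source    = "/usr/share/nginx/html",    -- web root on the first server
    host      = "10.0.0.3",                 -- hypothetical IP of the second server
    targetdir = "/usr/share/nginx/html",    -- web root on the second server
    delay     = 1,                          -- seconds to collect changes before syncing
    rsync     = { archive = true, compress = true },
}
EOF

echo "wrote $CONF"
```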

This used SSH to transfer the files over, which I found to be the most efficient way to transfer them. It also transfers them almost instantly. (You can increase the delay setting if you find that uploads take a while; that way changes are batched instead of streamed over as they happen, which takes up resources.)

Since replication is only one way, I needed to set up the load balancer so that when I access the Admin section, I always hit the first server and never the second. My Nginx config (/etc/nginx/conf.d/default.conf on the LB server) has a section stating that /wp-admin goes only to that one server.
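A minimal sketch of that idea, not my exact file. The upstream names, IP addresses and server_name are made up for illustration, and sending /wp-login.php to the first server as well is my assumption. The snippet writes to a local example file that you would adapt and copy to /etc/nginx/conf.d/default.conf on the LB server.

```shell
#!/bin/sh
# Write an example nginx load balancer config to a local file for review.
CONF="${NGINX_LB_CONF:-./nginx-lb.conf.example}"

cat > "$CONF" <<'EOF'
# Both web servers; nginx uses round-robin by default.
upstream wordpress_all {
    server 10.0.0.2;   # first server (Lsyncd source), hypothetical IP
    server 10.0.0.3;   # second server (replica), hypothetical IP
}

# Admin traffic goes only to the first server, because replication is one way.
upstream wordpress_admin {
    server 10.0.0.2;
}

server {
    listen 80;
    server_name example.com;

    # Pass the original host and client address to the backends.
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    location /wp-admin {
        proxy_pass http://wordpress_admin;
    }

    location = /wp-login.php {
        proxy_pass http://wordpress_admin;
    }

    location / {
        proxy_pass http://wordpress_all;
    }
}
EOF

echo "wrote $CONF"
```

Media uploads in WordPress go through scripts under /wp-admin, so they should land on the first server and be pushed to the second by Lsyncd.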

Setting up the Admin section forwarder is similar in Zen Load Balancer, but it’s all done through the GUI by creating a separate service within the farm. (Hint: this is only available in an HTTP farm, so when you create your farm, make sure you are not leaving it as the default TCP type.)

Replicating the Database (MySQL)

For the database I used a MySQL-based one, again one of the top databases used on the web. MySQL does have its own clustering software, but documentation on the latest setup is hard to find. I used Percona XtraDB Cluster, which is a drop-in replacement for MySQL, meaning it acts exactly like MySQL at the application level. This way WordPress can still use it.

Like Lsyncd, you can install Percona from the yum/apt-get CLI (command line interface).
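A sketch of the yum route, assuming the Percona yum repository has already been added by following Percona’s own install instructions, and assuming the 5.6 series package name. As before, the run helper is only illustrative and prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Assumption: the Percona repository is already configured on this server.
# Assumption: the 5.6 series; the package name differs for other versions.
# Note: a stock mysql-libs package may conflict and need removing first.
run yum install -y Percona-XtraDB-Cluster-56
```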

For a great tutorial on how to set up Percona, you can visit their website here. They do their setup using 3 nodes, but for my project I simply removed the 3rd node.
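To give an idea of what the two-node version looks like, here is a sketch of the cluster section of /etc/my.cnf for the first node. The IPs, cluster name, SST user and password are placeholders, and the provider library path is an assumption that varies by package version. The snippet writes to a local example file rather than touching the real config.

```shell
#!/bin/sh
# Write an example Percona XtraDB Cluster config (node 1) to a local file.
CONF="${PXC_CONF:-./my.cnf.node1.example}"

cat > "$CONF" <<'EOF'
[mysqld]
datadir=/var/lib/mysql
user=mysql

# Galera provider library (assumed path, check where your package put it)
wsrep_provider=/usr/lib64/libgalera_smm.so

# Both cluster members (hypothetical internal IPs)
wsrep_cluster_address=gcomm://10.0.0.2,10.0.0.3
wsrep_cluster_name=wp_cluster

# This node's own address; use 10.0.0.3 in the second node's file
wsrep_node_address=10.0.0.2

# Settings the cluster requires
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2

# State transfer between nodes (placeholder credentials)
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth="sstuser:changeme"
EOF

echo "wrote $CONF"
```

The first node has to be bootstrapped to form the cluster (on CentOS 6 the Percona init script has a bootstrap-pxc action for this), and the second node is then started normally so it can join. Keep in mind that with only two nodes the cluster cannot keep quorum if one of them drops off uncleanly, which is one reason the tutorial uses three.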

Conclusion

This was a fun project. It took me only a weekend. In a real life situation, I would use the same setup but separate the databases from the website. I would then use Zen Load Balancer to direct traffic to the HTTP servers, and create another farm to send traffic to a MySQL cluster of 3 servers. In most cases MySQL is the resource hog, so having 2 HTTP servers and 3 MySQL servers is not uncommon. I have seen configurations for Lsyncd that replicate to multiple servers, but I haven’t tried them, as the rest of the content in those tutorials was out of date and I could not find any updated documentation using more than 2 servers. Something to try out later on. (If anyone figures this out, by all means leave it in the comments.)
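For the multi-server Lsyncd case, one approach I have not tested myself: since the Lsyncd config file is plain Lua, it should be possible to call sync once per target in a loop. The sketch below assumes Lsyncd 2.1 syntax, and the IPs are placeholders.

```shell
#!/bin/sh
# Write an untested example lsyncd config with several targets to a local file.
CONF="${LSYNCD_MULTI_CONF:-./lsyncd-multi.conf.example}"

cat > "$CONF" <<'EOF'
-- Untested sketch: one sync block per target server.
settings {
    logfile    = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd.status",
}

-- Hypothetical internal IPs of the servers that receive the files
targets = { "10.0.0.3", "10.0.0.4" }

for _, host in ipairs(targets) do
    sync {
        default.rsyncssh,
        source    = "/usr/share/nginx/html",
        host      = host,
        targetdir = "/usr/share/nginx/html",
    }
end
EOF

echo "wrote $CONF"
```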

I used all internal IPs for my project. In real life, however, you could have servers all over the world, so using public IPs is also possible.