Speaking to members of the Linux community at the Melbourne BarCamp, I got into a discussion about using Gentoo as a viable option for a reliable server.
One of the issues raised was that, in order to run a Gentoo server, you need to compile your packages on the live server, which increases the server's load. Most of the community was unaware of compile farms and how they not only take that load off the Gentoo server, but also give you a test phase to see how the newly compiled applications behave within your environment.
A compile farm is normally made up of a group of computers (but can also be just one computer) that, as the name suggests, compiles all the code needed for a specific application. This is done using distcc, a utility consisting of a client and a daemon.
The distcc client takes the application that you want to compile and breaks the build up into small, manageable chunks (individual source files). These chunks are then distributed to the servers running the distcc daemon, which compile each chunk and return the result to the client. The client then assembles all the compiled chunks into the complete application.
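Outside of Portage, the same client and daemon flow can be tried by hand. The following is only a rough sketch: the host names are placeholders, build_with_distcc is a hypothetical helper name, and it assumes the project's Makefile honours the CC variable.

```shell
# Placeholder host names; each is assumed to be running the distcc daemon
export DISTCC_HOSTS="server1 server2 server3 server4"

# Hypothetical helper: run make with the compiler wrapped by the distcc client.
# -j5 lets make hand out several compile jobs at once (4 servers + 1).
build_with_distcc() {
    make -j5 CC="distcc gcc" "$@"
}
```

Calling build_with_distcc inside a source tree would then send the individual compile jobs to the listed hosts, while linking still happens on the machine that started the build.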
Luckily for us, Gentoo's package manager (Portage) comes with a useful feature set, including the ability to run your emerges in distcc mode. To explain how this works, I am going to discuss the client and the server daemon separately.
Setting up the compiling servers
For each of the servers that are going to do your compiling, you need to install the distccd daemon. This is done by issuing the commands:
-
#USE='-gtk -gnome' emerge distcc (This installs the distccd daemon. The options I have added to the USE variable tell Portage not to install the GTK interface. If we didn't add them, XOrg and GTK would be installed.)
-
#vi /etc/conf.d/distccd (Change DISTCCD_OPTS to include the list of clients allowed to use this server to compile)
-
#rc-update add distccd default (This tells the distccd daemon to start when the server boots)
-
#/etc/init.d/distccd start (This starts the distccd daemon)
If you run nmap -p 3632 127.0.0.1, you should now see distccd listening on port 3632.
Now that you have a running compile server, you might also want to look into setting the --listen option in /etc/conf.d/distccd to ensure that distccd only listens for traffic on the network interface you want.
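As a rough illustration, the relevant part of /etc/conf.d/distccd might end up looking something like the sketch below. The subnet and the listen address are assumptions made up for this example, so substitute the values for your own network.

```shell
# /etc/conf.d/distccd (sketch; all addresses below are placeholders)
DISTCCD_OPTS="--port 3632 --log-level notice"

# Only accept compile jobs from clients on this (assumed) subnet
DISTCCD_OPTS="${DISTCCD_OPTS} --allow 192.168.0.0/24"

# Only listen on the (assumed) address of the internal interface
DISTCCD_OPTS="${DISTCCD_OPTS} --listen 192.168.0.10"
```

After changing the file, restart the daemon with /etc/init.d/distccd restart so the new options take effect.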
Setting up the clients
Setting up the distcc client works in much the same way:
-
#USE='-gtk -gnome' emerge distcc (As mentioned before, we don't want to install X just to have a graphical representation of the compile process. If you would like, use distccmon-text to view information about the compile process.)
-
#vi /etc/make.conf (We need to make two changes in this file. First we need to modify the FEATURES variable to include distcc, then we need to modify the MAKEOPTS variable to tell it how many parallel makes to run at the same time. This is normally set to the number of CPUs + 1. E.g. let's say we have 4 compile servers with one CPU each: we would set MAKEOPTS to -j5.)
-
Lastly, we want to tell the client which compile servers to use. This is done by issuing the command distcc-config --set-hosts "server1 server2 server3 server4". Please note that I leave localhost out because I don't want the client doing any compiling.
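To make the arithmetic in the client steps concrete, here is a small sketch of a helper that prints the settings for a given list of compile servers. The SERVERS list and the count_words function are hypothetical, and the calculation assumes each compile server contributes one CPU.

```shell
# Placeholder host names; localhost is left out on purpose
SERVERS="server1 server2 server3 server4"

# Hypothetical helper: count the whitespace-separated words passed to it
count_words() { echo "$#"; }

# Rule of thumb from the steps above: parallel jobs = number of CPUs + 1
# (assumption: one CPU per compile server)
JOBS=$(( $(count_words $SERVERS) + 1 ))

# Lines to place in /etc/make.conf on the client
echo "FEATURES=\"distcc\""
echo "MAKEOPTS=\"-j${JOBS}\""

# Command to run once on the client
echo "distcc-config --set-hosts \"${SERVERS}\""
```

With the four placeholder servers this prints MAKEOPTS="-j5". If your make.conf already sets FEATURES, add distcc to the existing value rather than replacing it.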
Every time we compile (or emerge) an application from within the client, all the code will now be compiled on the compile servers. You might want to do some research into using the ccache option to cache compiled chunks for quicker compilations at a later stage. Now we move on to testing and distribution.
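For the ccache suggestion above, a starting point might look like the following make.conf fragment. This is only a sketch: it assumes the ccache package is already installed on the client, and the cache size is an arbitrary example value.

```shell
# /etc/make.conf on the client (sketch)
FEATURES="distcc ccache"

# Assumed cache size; choose whatever fits your disk
CCACHE_SIZE="2G"
```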
Creating test binaries and distributing them
So far we have solved the problem of the compilation process using up valuable resources to compile the applications needed to run our live servers. From this point on we will look at how we first test the newly compiled package and then distribute it throughout our servers. This will once again be divided into two sections: the clients (our live servers) and our server (the test server).
The process works by first installing our new application or version on the test server. Once we are happy that everything is working correctly and we don't have any unexpected side effects, we convert the files into a binary package. We then tell the live servers to fetch the binaries from our test server instead of downloading the source files and compiling them.
Creating our test server
In order to have the best environment for testing, we should have an installation that clones our live servers, plus an FTP server. I will leave these up to you to sort out. The only modification we need is to create an FTP user whose home directory is set to /usr/portage/packages. Once again, I will leave this up to you to do.
Start by emerging the application you want. I am going to use distcc as an example: #USE='-gtk -gnome' emerge distcc. Once the package has been built on the test server, run any tests you would like to do. (I generally put the test option in the FEATURES variable in /etc/make.conf to get the package to test itself, but you will still need to run your own tests.) Once you are happy that the application has no problems, run quickpkg to make the binary, e.g. #quickpkg =distcc-2.18.3-r10. This places a tbz2 file called distcc-2.18.3-r10.tbz2 in /usr/portage/packages/All/.
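The packaging step can be wrapped in a small script so you can confirm the binary actually landed where the FTP user will look for it. This is a sketch: binpkg_path and package_and_check are hypothetical helper names, and PKGDIR is assumed to be the /usr/portage/packages location used in this article.

```shell
# Assumption: binary packages are written under this directory
PKGDIR="/usr/portage/packages"

# Hypothetical helper: turn an atom such as "=distcc-2.18.3-r10"
# into the path of the tbz2 file quickpkg is expected to create
binpkg_path() {
    atom="${1#=}"                        # strip the leading "="
    echo "${PKGDIR}/All/${atom}.tbz2"
}

# Hypothetical helper: build the binary package, then list the result
package_and_check() {
    quickpkg "$1" && ls -l "$(binpkg_path "$1")"
}

binpkg_path "=distcc-2.18.3-r10"
```

Run as is, the script only prints /usr/portage/packages/All/distcc-2.18.3-r10.tbz2. On the test server you would call package_and_check "=distcc-2.18.3-r10" once your tests have passed.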
Getting our Live servers to use the compiled packages
In order for our live servers to use our newly created binary packages, we need to tell them where to look. This is done by editing /etc/make.conf and adding PORTAGE_BINHOST="ftp://username:pass@test_server/All/". The user name and password are the ones you chose when setting up your FTP user, and test_server should be set to the IP address or domain name of the test server.
Whenever you would like to use the binaries from the test server, you just add --usepkg and --getbinpkg when you emerge, e.g. #emerge --usepkg --getbinpkg distcc.
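For a one-off trial before touching make.conf, the same setting can be sketched as a shell snippet. This assumes Portage picks PORTAGE_BINHOST up from the environment, and emerge_binary is a hypothetical wrapper name. The credentials and host are the placeholders used above.

```shell
# Placeholder credentials and host; replace with your FTP user and test server
export PORTAGE_BINHOST="ftp://username:pass@test_server/All/"

# Hypothetical wrapper: prefer binary packages fetched from the binhost
emerge_binary() {
    emerge --usepkg --getbinpkg "$@"
}

# Example (not run here): emerge_binary distcc
```

Keeping the two flags in a wrapper makes it harder to forget one of them and accidentally trigger a local compile on a live server.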
Network Topology
In general, I would set up three networks. Network 1 would contain my compile farm and test server. Network 2 would contain my test server, backup servers, monitoring clients and live servers. Network 3 would link the front-facing services on my live servers to the outside world. If, however, you don't need that much separation, or are using public compile farms like Bytemark, then just make sure you secure all communications between your servers.
Conclusion
I can only hope that this will change the minds of the Linux administrators who believe that Gentoo's Portage system places too much strain on their live servers and, because of this, exclude Gentoo from their choice of distribution.