Gentoo compile farms
Monday, August 6th, 2007
Speaking to members of the Linux community at the Melbourne BarCamp, I got into a discussion about whether Gentoo is a viable choice for a reliable server.
One of the issues brought up was that, in order to run a Gentoo server, you need to compile your packages on the live server, which increases its load. Most of the community was unaware of compile farms and how they not only take that load off the Gentoo server, but also give you a test phase in which to see how the newly compiled applications behave in your environment.
A compile farm is normally a group of computers (though it can be just one) that, as the name suggests, compiles all the code needed for a specific application. This is done with distcc, a utility made up of a client and a daemon.
The distcc client takes the build of the application you want to compile and breaks it into small, manageable chunks, one per source file. These chunks are distributed to the servers running the distcc daemon, each of which compiles the chunk it receives and returns the result to the client. The client then links the results into the complete application.
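To make that concrete, here is a minimal sketch of a single compile done by hand through distcc. The host names server1 and server2, the temporary directory and hello.c are placeholders of mine, and the sketch assumes gcc is the compiler. It skips the build entirely if distcc or gcc is not installed.

```shell
# Placeholder host names; distcc reads its server list from DISTCC_HOSTS.
export DISTCC_HOSTS="server1 server2"

# A throwaway source file to compile.
workdir=$(mktemp -d)
printf 'int main(void) { return 0; }\n' > "$workdir/hello.c"

if command -v distcc >/dev/null 2>&1 && command -v gcc >/dev/null 2>&1; then
    # Compile step: prefixing the compiler with distcc lets it send the job to a server.
    distcc gcc -c "$workdir/hello.c" -o "$workdir/hello.o"
    # Link step: runs on the client, using the object file that came back.
    gcc "$workdir/hello.o" -o "$workdir/hello"
else
    echo "distcc or gcc not found; nothing compiled"
fi
```

Portage does the equivalent of the "distcc gcc" prefix for you once distcc is enabled, so you would not normally type this yourself.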
Luckily for us, Gentoo’s package manager (Portage) comes with a useful feature set, including the ability to run your emerges in distcc mode. To explain how this works I am going to discuss the client and the server daemon separately.
Setting up the compiling servers
For each of the servers that are going to do your compiling, you need to install the distccd daemon. This is done by issuing the commands:
- #USE='-gtk -gnome' emerge distcc (This installs the distccd daemon. The flags I have added to the USE variable tell Portage not to build the GTK interface. Without them, X.Org and GTK would be installed as well.)
- #vi /etc/conf.d/distccd (Change DISTCCD_OPTS to include the list of clients allowed to use this server to compile. This is set with the --allow option.)
- #rc-update add distccd default (This tells the distccd daemon to start when the server boots)
- #/etc/init.d/distccd start (This starts the distccd daemon)
If you run "nmap -p 3632 127.0.0.1", it should now report port 3632 as open, which means distccd is listening.
Now that you have a running compile server, you might also want to look into setting the --listen option in /etc/conf.d/distccd so that distccd only listens on the network interface you want.
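For illustration, the relevant line in /etc/conf.d/distccd might look like the sketch below. The addresses are invented for the example: 192.168.0.0/24 stands in for the network your clients live on, and 192.168.0.10 for the address of the interface distccd should bind to.

```shell
# /etc/conf.d/distccd (the file is read as shell variable assignments)
# --port:   3632, the port checked with nmap above
# --allow:  clients permitted to send compile jobs (example subnet, an assumption)
# --listen: address to bind to (example address, an assumption)
DISTCCD_OPTS="--port 3632 --allow 192.168.0.0/24 --listen 192.168.0.10"
```

After editing the file, restart the daemon with "/etc/init.d/distccd restart" so the new options take effect. Note that once --listen points at a specific address, the nmap check needs to target that address rather than 127.0.0.1.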
Setting up the clients
Setting up the distcc client works in much the same way:
- #USE='-gtk -gnome' emerge distcc (As mentioned before, we don't want to install X just to get a graphical view of the compile process. If you would like to watch the compile process, use "distccmon-text".)
- #vi /etc/make.conf (We need to make two changes in this file. First, modify the "FEATURES" variable to include distcc. Second, modify the "MAKEOPTS" variable to say how many makes to run in parallel. This is normally set to the number of CPUs + 1. E.g. let's say we have 4 compile servers with one CPU each: we would set MAKEOPTS to "-j5".)
- Lastly, we want to tell the client which compile servers to use. This is done by issuing the command 'distcc-config --set-hosts "server1 server2 server3 server4"'. Please note that I leave "localhost" out because I don't want the client doing any compiling.
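The MAKEOPTS rule of thumb is easy to script. The helper below is my own illustration, not something shipped with Portage or distcc: makeopts_for_hosts is a made-up name, and it simply counts the host names it is given and adds one, which assumes one CPU per compile server.

```shell
# Hypothetical helper: print a -jN value for a list of compile servers,
# using the "number of CPUs + 1" rule of thumb with one CPU per server.
makeopts_for_hosts() {
    echo "-j$(( $# + 1 ))"
}

# Print the two make.conf lines instead of editing the file directly.
# If FEATURES is already set in your make.conf, add distcc to the existing list.
echo 'FEATURES="distcc"'
echo "MAKEOPTS=\"$(makeopts_for_hosts server1 server2 server3 server4)\""
```

With the four example servers this prints MAKEOPTS="-j5", matching the value given above. If your servers have more than one CPU each, count CPUs rather than hosts.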
Every time we compile (or emerge) an application on the client, the code will now be compiled on the compile servers. You might want to do some research into the ccache option, which caches compilation results so that later compilations are quicker. Now we move on to testing and distribution.
Creating test binaries and distributing them
So far we have solved the problem of the compilation process using up valuable resources on our live servers. From this point on we will look at how to first test the newly compiled package and then distribute it to our servers. This will once again be divided into two sections: the clients (our live servers) and the server (our test server).
The process works by first installing the new application or version on the test server. Once we are happy that everything is working correctly and there are no unexpected side effects, we convert the installed files into a binary package. We then tell the live servers to fetch the binaries from our test server instead of downloading the source files and compiling them.
Creating our test server
For the best testing environment we should have an installation that clones our live servers, plus an FTP server. I will leave these up to you to sort out. The only modification we need is an FTP user whose home directory is set to "/usr/portage/packages"; once again, I will leave this up to you.
Start by emerging the application you want. I am going to use "distcc" as an example: "#USE='-gtk -gnome' emerge distcc". Once the package has been built on the test server, run any tests you would like. (I generally put the "test" option in the "FEATURES" variable in "/etc/make.conf" to get the package to test itself, but you will still need to run your own tests.) Once you are happy that the application has no problems, run "quickpkg" to make the binary package, e.g. "#quickpkg =distcc-2.18.3-r10". This places a tbz2 file called distcc-2.18.3-r10.tbz2 in "/usr/portage/packages/All/".
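The build and package steps can be strung together in a small wrapper. This is only a sketch: the names run and package_for_binhost are hypothetical, and by default it prints the commands instead of running them, so it is safe to try on a machine without Portage. Set DRY_RUN=0 on the test server to run them for real.

```shell
# Print the command when DRY_RUN is 1 (the default here), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: build a package with the USE flags used above,
# then turn the installed files into a binary package.
# $1 is the name to emerge, $2 the exact version atom to hand to quickpkg.
package_for_binhost() {
    run env USE="-gtk -gnome" emerge "$1" || return 1
    # Run your own tests here before packaging.
    run quickpkg "$2"
}

package_for_binhost distcc "=distcc-2.18.3-r10"
```

The wrapper stops before quickpkg if the emerge fails, so a broken build never ends up in the directory the live servers fetch from.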
Getting our Live servers to use the compiled packages
In order for our live servers to use our newly created binary packages we need to tell them where to look. This is done by editing "/etc/make.conf" and adding 'PORTAGE_BINHOST="ftp://username:pass@test_server/All/"'. The username and password are the ones you chose when setting up your FTP user, and test_server should be set to the IP address or domain name of the test server.
Whenever you would like to use the binaries from the test server, just add "--usepkg" and "--getbinpkg" when you emerge, e.g. "#emerge --usepkg --getbinpkg distcc".
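Put together, the live server side might look like the sketch below. The username, password and test_server are the same placeholders used above; substitute your own values.

```shell
# /etc/make.conf on a live server (read as shell-style variable assignments)
# Placeholder credentials and host name; replace them with your own.
PORTAGE_BINHOST="ftp://username:pass@test_server/All/"

# Then install from the binary host rather than compiling:
#   emerge --usepkg --getbinpkg distcc
# The short forms of the two options are -k and -g:
#   emerge -kg distcc
```

Keep in mind that the password sits in make.conf in plain text, so it is worth giving the FTP user read-only access to the packages directory and nothing else.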
Network Topology
In general I would set up 3 networks. Network 1 would contain my compile farm and test server. Network 2 would contain my test server, backup servers, monitoring clients and my live servers. Network 3 would link the front-facing services on my live servers to the outside world. If, however, you don't need that much separation, or are using public compile farms like Bytemark, then just make sure you secure all communications between your servers.
Conclusion
I can only hope that this will change the minds of those Linux administrators who believe that Gentoo’s Portage system places too much strain on their live servers and who, because of this, exclude Gentoo from their choice of distribution.