Introduction: Build a Linux Cluster - Submitted by BayLab for the Instructables Sponsorship Program

Ever wanted to have a Linux cluster to crunch numbers? This guide will get you started on setting up a cluster that automatically balances load across its nodes, and it will help you learn how supercomputers work. This image is what my room looked like when I set my cluster up and had 14 nodes running.

Step 1: OpenMosix

OpenMosix is a now-discontinued Linux kernel extension that automatically balances load across computers on a network. There is a single head node and many slave nodes (called drone nodes later in this guide) that take instructions from the head node. The computers talk to each other over the local network.

You can install OpenMosix from here, but I suggest using a LiveCD like CHAOS, because a lot is already set up for you and you don't have to worry about it. For this guide, I'm going to assume OpenMosix is already correctly installed.

Step 2: Setting Up the Network

This image shows how the network should be built. I used a cheap switch I bought off eBay to connect every computer I could find. Note that the speed of your cluster depends on the speed of the network, so the choice between 10/100 and gigabit Ethernet will make a difference. Also keep in mind that since the cluster load balances, nodes that can't contribute much won't get used much.

Step 3: Configuring the Network

This is the part that took me a while to figure out. Below I've reproduced the instructions I wrote for myself.

INSTRUCTIONS FOR INITIATING CLUSTER
v.2.0

********************************************************************************
* Note that 192.168.0.xx refers to the IP of a particular node and that the    *
* same number must be used in every step for that node. The head node is       *
* 192.168.0.2, so the drone nodes are 192.168.0.3, 192.168.0.4, etc.           *
* You also need to know the IP of the router. For my silver Belkin router it   *
* is 192.168.0.1. It is usually either 192.168.0.1 or 192.168.1.1; you can     *
* check by pinging each address.                                               *
********************************************************************************

Editing files:

There are 2 ways to edit system files in Linux. If you are in the GUI, you can simply navigate to the desired folder and open the file with something like kwrite. If you are on a command line, use the command "vi filename". This opens the file in the terminal and lets you edit it. To begin, press i (or the Insert key) and use the arrow keys to get to where you need to insert text. Type it in and then press the Escape key. Then hold down Shift and press z twice ("ZZ"). This saves your work and exits. Note that "/" is the root (top-level) directory, so "/etc/hosts" is the file "hosts" inside the directory "etc", which sits directly under "/".

Notes:

Anytime you see <this>, type exactly what is between the < and > (without the angle brackets themselves) and hit Enter.


HEAD NODE
----------

**************************************************
* Before you initiate the networking, do this:   *
* <cd />                                         *
* <mkdir mfs>                                    *
* <vi /etc/fstab>                                *
* then add this line somewhere, no quotes:       *
* "cluster /mfs mfs dfsa=1 0 0"                  *
* then continue to step 1                        *
**************************************************


1. <ifconfig eth0 192.168.0.2>
2. <route add -net 0.0.0.0 gw 192.168.0.1>
3. </etc/init.d/openmosix start>

DRONE NODES
------------

1. <ifconfig eth0 192.168.0.x>
2. <route add -net 0.0.0.0 gw 192.168.0.1>
3. </etc/init.d/openmosix start>
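
If you have a lot of nodes, the three steps above can be sketched as a small shell helper. This is only a sketch under my own assumptions: node_setup_cmds is a made-up name, the router is assumed to be 192.168.0.1 unless you pass another address, and node N is assumed to get 192.168.0.(N+1), which matches the numbering used in this guide. It only prints the commands, so you can check them before running them as root.

```shell
#!/bin/sh
# node_setup_cmds is a hypothetical helper (not part of OpenMosix).
# It prints, but does not run, the bring-up commands for one node.
# Usage: node_setup_cmds NODE_NUMBER [ROUTER_IP]
node_setup_cmds() (
    node=$1                       # 1 = head node, 2 and up = drone nodes
    gateway=${2:-192.168.0.1}     # assumed router IP, override if yours differs
    ip="192.168.0.$((node + 1))"  # assumption: node N uses 192.168.0.(N+1)
    printf 'ifconfig eth0 %s\n' "$ip"
    printf 'route add -net 0.0.0.0 gw %s\n' "$gateway"
    printf '/etc/init.d/openmosix start\n'
)

# Example: show the commands for node 3
node_setup_cmds 3
```

Called with 3 as the node number, it prints the ifconfig, route and openmosix lines for 192.168.0.4.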


CORRECT /etc/openmosix.map FILE CONFIG
---------------------------------
This should be exactly the same on every node.

1      192.168.0.2 1
2      192.168.0.3 1
3      192.168.0.x 1


Continue adding one line per additional node, using the IP that was assigned in step 1 of the drone node config.
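
Typing the map by hand gets tedious with many nodes. Here is a minimal sketch of a generator, assuming the nodes are numbered from 1 and use consecutive IPs starting at 192.168.0.2 as in the listing above. gen_map is a hypothetical name, not something that ships with OpenMosix.

```shell
#!/bin/sh
# gen_map is a hypothetical helper: print openmosix.map lines for COUNT nodes,
# one line per node in the same "<node number> <IP> 1" layout used above.
# Usage: gen_map COUNT
gen_map() (
    count=$1
    n=1
    while [ "$n" -le "$count" ]; do
        # assumption: node N uses 192.168.0.(N+1)
        printf '%s      192.168.0.%s 1\n' "$n" "$((n + 1))"
        n=$((n + 1))
    done
)

# Example: map for a 4 node cluster
gen_map 4
```

You could redirect the output to a file, check it, and then copy it to /etc/openmosix.map on every node.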


CORRECT /etc/hosts FILE CONFIG
-------------------------------
This file should also be modified on every node.

192.168.0.x node2 localhost
127.0.0.1 localhost

Note here that 192.168.0.x should be replaced with that node's own IP and node2 with that node's own name, both of which should correspond to the list below:



machine number   node number   machine name   assigned ip
1                1             node1          192.168.0.2
2                2             node2          192.168.0.3
3                3             node3          192.168.0.4
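
The two /etc/hosts lines for any node follow directly from this list, so they can be sketched with a small helper. hosts_lines is a hypothetical name, and the sketch assumes the same scheme as the list: node N is called nodeN and uses 192.168.0.(N+1).

```shell
#!/bin/sh
# hosts_lines is a hypothetical helper: print the two /etc/hosts lines
# for one node, in the same layout as the example hosts file above.
# Usage: hosts_lines NODE_NUMBER
hosts_lines() (
    node=$1
    # assumption: node N is named nodeN and uses 192.168.0.(N+1)
    printf '192.168.0.%s node%s localhost\n' "$((node + 1))" "$node"
    printf '127.0.0.1 localhost\n'
)

# Example: lines for node2
hosts_lines 2
```

Compare the output with the list before pasting it into /etc/hosts on that node.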



To test whether it worked, type <mosmon> and then press t. This will give you a graph of cluster activity and the number of available CPUs.

Step 4: OpenMosix UI

The first image is what you should see when you run "openmosixview". The second image is from "openmosixmigmon" and the third is from "mtop". If it's working, you should see each ID turn green. Once a load is issued, you should also see it balance across the nodes.

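Remember that OpenMosix balances load by moving whole processes between nodes, so a single process started on the head node will only ever keep one CPU busy. To see the cluster balance, the work must be split across several processes, for example by starting the same program several times. The following code from IBM can be used to test your cluster: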

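// testapp.c  Program for testing load-balancing clusters

#include <stdio.h>
#include <stdlib.h>   // exit()
#include <unistd.h>   // fork()

int main(void) {
   // volatile keeps an optimizing compiler from deleting the busy loops
   volatile unsigned int o = 0;
   volatile unsigned int i = 0;

   unsigned int max = 255u * 255u * 255u * 128u;

   // daemonize code (flogged from thttpd)
   switch ( fork() ) {
      case 0:
         // child: carry on and do the work in the background
         break;
      case -1:
         // fork failed
         // syslog( 1, "fork - %m" );
         exit( 1 );
      default:
         // parent: exit at once so the shell prompt returns
         exit( 0 );
   }

   // incrementing counters is like walking to the moon
   // its slow, and if you don't stop, you'll crash.
   while (o < max) {
      o++;
      i = 0;
      while (i < max) {
         i++;
      }
   }

   return 0;
}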


Compile with:
gcc testapp.c -o testapp
and then run
./testapp
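
A single copy of testapp only keeps one CPU busy, so to watch the cluster balance you will probably want several copies running at once. Here is a rough sketch of how that could be done. launch_copies is a hypothetical helper name, and how many copies you start is up to you.

```shell
#!/bin/sh
# launch_copies is a hypothetical helper: start COUNT copies of a command.
# Usage: launch_copies COUNT COMMAND [ARGS...]
launch_copies() (
    count=$1
    shift
    n=0
    while [ "$n" -lt "$count" ]; do
        "$@" &          # start one copy in the background
        n=$((n + 1))
    done
    # Wait for the launched commands themselves to return. testapp returns
    # at once because it forks and leaves its child running in the background.
    wait
)
```

For example, "launch_copies 8 ./testapp" starts eight workers. Since the test program runs for a very long time, "killall testapp" should stop them again when you are done.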


Step 5: Other Resources

Since OpenMosix is a dead project, it can be hard to find help. A good knowledge of Linux can be very useful, as well as these links:
http://www.ibm.com/developerworks/linux/library/l-clustknop/index.html
http://c.mills.ctru.auckland.ac.nz/OpenMosix/