Introduction: Build Your Own Supercomputer


I needed to count the number of duplicate values in an integer array to implement this method. While the size of the array remained under 5,000, the values in the array grew exponentially after each cycle, so I decided to use more than one computer.

(Note: I am really disappointed that after selling 400 million copies of Windows 7, Microsoft no longer supports Windows XP and has begun to downgrade its functionality in bits and pieces, making it no longer suitable for this project. Since doing the project under Windows 7 is even more difficult than doing it under Linux, and Linux offers many more ultimate advantages than Windows, I'm making the transition to Linux and will update this project accordingly as I learn. I am also considering making this an open-source collaboration under Linux, so let me know if you would like to collaborate.)

Since the values in the array exceeded the limits of a Microsoft long integer variable, I had to use single- or double-precision variables in the search for duplicate values, which compares each value in the array against an index ranging from the minimum to the maximum array value. As a result, the available memory and time were often exceeded.

With an array of 265 elements whose integer values increase exponentially each cycle, and with potentially no limit on the number of cycles or on the values in the array, the time and memory required quickly exceeded the capability of my single personal computer.


An array and its set of values and a compare index example:

  1    2622695870048
  -    -
  -    -
  5    6645178834471
  6    63491409181716
  7    83861832294247
  8    2622695870048
  9    45029704436816
 10    18117222372627
 11    6659485477833
 12    50600552165484
 13    37292816675177
 14    36784339702878
 15    40751538724887
 16    41576195825074
 17    37319353740877
 18    15405643023342
 19    77102983094426
  -    -
  -    -
265    41576195825074

...with a compare index from 0 to 28239790275938 for this particular array.

I considered rewriting the program in ASP or opening an Amazon, Google, or Microsoft cloud account, but decided to tinker with a small Beowulf cluster first after reading several articles about them.

Since it was possible to duplicate the array and copy it to another computer, it was also possible to split the index in half and count duplicates in the lower range of values on one computer and the upper range on another:

28239790275938 / 2 = 14119895137969

lower range, processed on one computer: 0 to 14119895137969 - 1
upper range, processed on another computer: 14119895137969 to 28239790275938

Adding more and more computers could then potentially provide a solution to the limits of a single computer.
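The splitting idea can be sketched in a few lines. This is a minimal illustration, not the original program (which was written in Visual Basic 6): the function names and the two-machine split are my own assumptions, and a plain Python list stands in for the integer array.

```python
# Sketch: divide the compare index into one sub-range per machine,
# count duplicates in each sub-range, then sum the partial counts.
# Function names and the machine count are illustrative assumptions.

def split_range(max_index, machines):
    """Divide the compare index 0..max_index into one sub-range per machine."""
    step = max_index // machines + 1
    ranges = []
    lo = 0
    while lo <= max_index:
        hi = min(lo + step - 1, max_index)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

def count_duplicates(values, lo, hi):
    """Count extra occurrences among elements whose value lies in [lo, hi]."""
    seen = {}
    for v in values:
        if lo <= v <= hi:
            seen[v] = seen.get(v, 0) + 1
    return sum(n - 1 for n in seen.values() if n > 1)

# Each machine runs count_duplicates on its own sub-range;
# the client sums the partial counts.
values = [4, 9, 4, 2, 9, 9, 7]
parts = split_range(9, 2)           # two machines: (0, 4) and (5, 9)
total = sum(count_duplicates(values, lo, hi) for lo, hi in parts)
```

Because each sub-range is independent, the partial counts can be computed on different computers at the same time and simply added together at the end.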

If you have a problem that can be divided in a similar manner, running one half on one computer and the other half on another, you can build a supercomputer simply by dividing the problem and adding more computers.

Of course, you might want to use an Ethernet link to transfer data instead of using floppies, and you might want to avoid computers that run at speeds under 1 GHz, but the idea does not change.

Because my application does not demand much server memory or processing time to count the duplicates, I can usually run several instances of the program on each computer in the background without too much slowdown or memory loss for other programs. With dedicated servers I can usually run many more instances.

The system I have up and running can now handle an array of 265 values requiring an undivided index from 0 to 1,280,000,000,000 (1.28 trillion) on a single computer. Divided into 38 segments, the index range per segment shrinks to about 33,684,210,526 (33.7 billion), and 38 instances can run on 7 computers without issues.

Although it might take overnight or longer to complete this task, and be slow by industrial standards, it shows that with enough computers the possibilities are virtually unlimited.





Step 1: What Components Did I Use?


After a power supply failed on the computer I was using to run seven hard drives, I decided to limit the number of hard drives to four per computer. To do this I needed to crank up some much older computers and add them to the network. In addition, I decided to add a new but inexpensive system to take some of the hard drive load.

To keep costs down I recruited an old case, bought a new Intel D525 motherboard with an embedded processor and a total of four logic units, and added 4 GB of laptop memory. The total cost for this repair and addition, including the new power supplies, was around $250.

Now the network has six computers and seventeen hard drives. Since I only need one hard drive per computer, I will not need to buy hard drives for some time. All that is really needed to boot the D525 and handle the data and programming is a flash drive.

I started with a standard Ethernet configuration of network peers: four PCs that were already running some version of Microsoft Windows, connected by a four-port wireless Westell Ultraline Series 3 Model 9200EM router that came with my Verizon FIOS installation. This particular router allowed one of the cables to be made from one hundred feet of four-conductor copper telephone line, which ran past CRT monitors, fans, and fluorescent lights, and up and down walls, without any detectable interference, data loss, or slowdown.

The oldest PC in the network is an Intel D865GLC board that lost its on-board audio after lightning hit my amplifier. I still use it on occasion for word processing, email, and exploring the web while my other computers are busy. Instead of throwing it away after I lost the audio, I set it up to monitor security cameras. I also decided to use it to explore oil immersion cooling, which completely eliminated system noise but made replacing components a pain. Although I could upgrade it to 2 GB, I decided not to remove it from the oil bath but to leave it at 1.5 GB. It runs Windows XP 32-bit.

The next oldest system has an Asus A8V-X motherboard I purchased to replace a far better Foxconn board that also got zapped through the audio port but, unlike the other board, did not survive to tell the story. I bought the A8V-X to keep the AMD X2 4400 CPU, for which I paid $450 new when they first came out. The A8V-X is far inferior to the Foxconn, but it allowed me to keep the X2 processor in service. To think I could have two Intel i7s now for the same price no longer irks me, but it did for a while. Memory is maxed out at 4 GB of 200 MHz (DDR400) PC3200. It also runs Windows XP 32-bit.

The next is a BioStar G31D-M7 with a dual-core Intel P6 CPU running at 2.7 GHz and 4 GB of Adata DDR2-800 (PC2-6400). I will have to replace this system soon, as it has a tendency to lock up and turn off when several programs run at the same time; it may be exceeding CPU or memory capacity, or the board may simply not be designed to handle the load. It runs Windows XP 64-bit, which may actually be the problem.

The fourth computer is an Acer A0751h netbook with an Atom Z520 CPU running at 1.33 GHz and 2 GB of internal memory. It has a 260 GB hard drive and an 11.6-inch screen, but I got it at a discount because it came with Vista. Vista Home Basic presents a number of hardware and software bottlenecks and challenges, but once persuaded to grant access and run application programs, it is as fast as the desktops. I use it as the client, and any processing I share over the whole network gets done in better than one-fourth the time, since the servers only process things that can run at the same time, like counting each occurrence of a first name in an array that would otherwise exceed available memory or time, or the maximum value of a long integer variable type (2,147,483,647).

If your setup is close to this one, then you can use it to develop a Beowulf network. Although you can use older and slower computers running under 1.0 GHz, it's probably a good idea to adjust the workload accordingly. I tried using an old laptop that runs at 300 MHz and has a 5.5 GB hard drive, but eventually had to limit it to only 5 locations, or nodes, for the whole computer.

All tasks that use this setup benefit from much shorter completion times, without overtaxing each node, by running in an interleaved mode controlled by the operating system. With this setup I can explore other ways of avoiding the requirements of excess memory and time while remaining in a novice-friendly programming language like Microsoft Visual Basic 6.

Any improvements you can make to the software or hardware of your network to exceed the capacity of this setup will help, but any similar configuration will be sufficient to get started. You can experiment with upgrading or adding older hardware and operating systems after your network is established. Although I have added older computers running Windows 98SE at around 300 MHz, their slower speed causes a noticeable and perhaps unacceptable delay. Identical systems at the highest affordable speed are the best way to go.

Step 2: What Task Did It Speed Up?

I was running my version of the optimal classification program described here on a single PC. The program ran into a limit when I tried to optimize a database with 265 elements, 39 states, and 42 characteristics, or a group size of 6.67903155 × 10^66. The target size was a whopping 3.5 billion after reaching just 6 characteristics and attaining 99.97% separation. Achieving 100% separation was outside any possibility of completion within a reasonable length of time, or within the memory capacity, of a single PC. I needed a way to increase speed and memory without spending an arm and a leg.

Enter the Beowulf network solution...

In the optimal classification process there is a need to count duplicates in each multiset. For problems like the one I was trying to solve, the multiset size was outrageous. Part of one multiset that had to be counted is shown below.

Because the segment size can be much larger than the portion shown below, a double-precision index variable is usually required, in addition to the array itself being double-precision. Incrementing the array index is then what takes so much time, while the array itself is responsible for the need to maximize memory.

The further the task is divided into smaller segments assigned to locations, or nodes, on other computers, the faster the process of comparison and duplicate counting can be.

A portion of one segment of values to be compared, with duplicates counted, is shown below.

  1116642534732
  600272059440
  738555848598
  1324220299720
  1282170772552
  406100332736
  1443470060825
  1127113699913
  242098093032
  1587858021259
  1371440052986
  120608508586
  1885859489440
  1614337503910
  1848669946193
  92056035995
  574605286121
  1301334977149
  777374744415
  406100332736
  868090093135
...

Once the segment is processed, only the results of the compare-and-count need to be stored in the same folder, location, or node for retrieval by the Beowulf client program.
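Counting the duplicates within one such segment can be sketched as follows. This is an illustration in Python rather than the original Visual Basic 6, and it assumes the segment arrives as a text file with one value per line (the filename is hypothetical). Note that Python integers have no fixed size, so values beyond a 32-bit long (2,147,483,647) need no special double-precision handling here.

```python
# Sketch: count duplicate values in one segment file.
# The file format (one integer per line) is an assumption.

from collections import Counter

def count_segment_duplicates(path):
    """Return {value: occurrences} for every value that appears
    more than once in the segment file at 'path'."""
    with open(path) as f:
        counts = Counter(int(line) for line in f if line.strip())
    return {value: n for value, n in counts.items() if n > 1}
```

For the listing above, the value 406100332736 appears twice, so only that value and its count would need to be written to the node's result file for the client to retrieve.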

With a total of 199 nodes (limited only by the number of files that can be open simultaneously under Visual Basic 6), segment size can be substantially reduced and each segment processed in under half a second.

This was the prototype task that the Beowulf network could solve but a single personal computer could not.

Step 3: Programming the Network

After communication was established between client and servers through the network, the client program interacted with the programs running on the servers at each location, or node, by updating the contents of status and data files, while the server programs updated the status and result files.

A very interesting aspect of this method is that programs and files can be duplicated and placed in additional folders on each server, to be run almost in parallel by the operating system allocating slices of available time to each one.

By allowing the operating system to control execution, the results can usually follow so closely behind the incoming data that a general flow develops between the client and the servers. This flow can be improved further by adding higher-speed Ethernet to facilitate a higher rate of data transfer.

This setup greatly improves execution speed and memory allocation, especially in multi-core, multi-processor, and multi-disk environments, using only standard Ethernet, old computers, and outdated operating systems and application programs.

The Windows Task Scheduler starts the server programs whenever the servers are rebooted. Once running, their first task is to monitor their status file. When a server program finds, for instance, the word "start" in the status file, it drops out of the monitoring phase and starts processing the data found in the data file previously sent there by the client.

When the server is done processing the data it changes the word in the status file to "finished" and returns to monitoring the status file until the client updates it with the word "start."

The client also monitors the status file and retrieves the results when the status word is changed to "finished."
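The server side of this handshake can be sketched in a few lines. Again this is Python rather than the original Visual Basic 6, and the filenames ("status.txt", "data.txt", "result.txt") and the polling interval are my own assumptions about how the node folder might be laid out.

```python
# Sketch: one cycle of the server loop described above.
# File names and polling interval are illustrative assumptions.

import time
from pathlib import Path

def serve_once(node):
    """Wait for 'start' in the node's status file, process the data file,
    write the result file, then mark the status file 'finished'."""
    node = Path(node)
    status = node / "status.txt"
    # Monitoring phase: poll until the client writes "start".
    while status.read_text().strip() != "start":
        time.sleep(0.5)
    # Processing phase: count duplicates in the data the client sent.
    values = [int(x) for x in (node / "data.txt").read_text().split()]
    duplicates = len(values) - len(set(values))
    (node / "result.txt").write_text(str(duplicates))
    # Hand the result back and return to monitoring.
    status.write_text("finished")
```

A real server would wrap this in an endless loop, one instance per node folder; the client mirrors it by writing the data file, setting the status to "start", and polling for "finished" before reading the result.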

While this basic setup is simple, it can be greatly expanded and also reconfigured to emulate a swarm. My Beowulf network can blow most state-of-the-art personal computers out of the water for certain tasks. Even though it is not a dedicated system like IBM's Watson, when it comes to any similar task it still has the power to go beyond the performance of any personal computer operating alone.

I no longer expect to need to replace my desktop computer every couple of years, but rather to find more things I can get it to do in a single computer's place.