Class PiDistributed

public class PiDistributed extends PiParallel
Calculates pi using a cluster of servers. The servers should be running OperationServer. The names and ports of the cluster nodes are read from a property file, or from a ResourceBundle named "cluster". The property file uses the following format:
Server addresses are specified as hostname:port. Optionally, weights can be assigned to nodes to indicate the relative performance of each node, so that each node receives an amount of work suited to its performance. For example, weight2 is the relative performance of server2, and so on. The weights must be integers in the range 1 to 1000.
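The following is a minimal sketch of how such a configuration could be read with java.util.Properties and how work could then be divided in proportion to the weights. It is not the actual PiDistributed implementation. The class and method names (ClusterConfigSketch, parse, share) are hypothetical. The sketch assumes that the keys are numbered consecutively as server1, server2, ... and weight1, weight2, ... (inferred from the weight2/server2 example above), and that a missing weight defaults to 1. That default is an assumption, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class ClusterConfigSketch {
    // One cluster node: host name, port and relative weight.
    record Node(String host, int port, int weight) {}

    // Reads server1, server2, ... until a key is missing; weightN is optional.
    // ASSUMPTION: a missing weightN defaults to 1 (hypothetical, for illustration only).
    static List<Node> parse(Properties props) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 1; props.getProperty("server" + i) != null; i++) {
            String address = props.getProperty("server" + i).trim();
            int colon = address.lastIndexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Expected hostname:port, got " + address);
            }
            String host = address.substring(0, colon);
            int port = Integer.parseInt(address.substring(colon + 1));
            String w = props.getProperty("weight" + i);
            int weight = (w == null) ? 1 : Integer.parseInt(w.trim());
            if (weight < 1 || weight > 1000) {
                throw new IllegalArgumentException("weight" + i + " must be in 1..1000: " + weight);
            }
            nodes.add(new Node(host, port, weight));
        }
        return nodes;
    }

    // Splits totalWork units among the nodes in proportion to their weights.
    static long[] share(List<Node> nodes, long totalWork) {
        if (nodes.isEmpty()) {
            return new long[0];
        }
        long weightSum = 0;
        for (Node n : nodes) {
            weightSum += n.weight();
        }
        long[] shares = new long[nodes.size()];
        long assigned = 0;
        for (int i = 0; i < nodes.size(); i++) {
            shares[i] = totalWork * nodes.get(i).weight() / weightSum;
            assigned += shares[i];
        }
        // Integer division may leave a remainder; give it to the last node.
        shares[shares.length - 1] += totalWork - assigned;
        return shares;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("server1", "node1.example.com:1234");
        props.setProperty("server2", "node2.example.com:2345");
        props.setProperty("weight1", "100");
        props.setProperty("weight2", "300");
        List<Node> nodes = parse(props);
        long[] shares = share(nodes, 1000);
        for (int i = 0; i < nodes.size(); i++) {
            System.out.println(nodes.get(i) + " -> " + shares[i]);
        }
    }
}
```

With weights 100 and 300, the second node receives three times as much work as the first (250 and 750 of 1000 units).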

Guidelines for configuring the servers:

  • If the machines are not identical, assign appropriate weights to every machine. This can improve performance greatly.
  • If the machines are somewhat similar (e.g. same processor but different clock frequency), you can calculate the weight roughly as clockFrequency * numberOfProcessors, scaled if necessary to fit the range 1 to 1000. For example, a machine with two 1600MHz processors is roughly four times as fast as a machine with one 800MHz processor.
  • If the machines are very heterogeneous, you can benchmark their performance by running e.g. PiParallel with one million digits. Remember to specify the correct number of CPUs on each machine.
  • Different JVMs can perform differently. For example, Sun's Java client VM achieves roughly two thirds of the performance of the server VM when running this application.
  • When running OperationServer on the cluster nodes, set the number of worker threads on each server equal to the number of CPUs on that machine.
  • Additionally, specify the correct number of processors for each cluster server in the property file.
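The clock-frequency rule of thumb from the guidelines can be turned into valid weights with a small calculation. The following sketch is one possible approach and is not part of the library. The names WeightSketch, rawScore and toWeights are hypothetical. It assumes that the fastest machine should receive weight 1000 and that the others are scaled proportionally and clamped to the allowed range 1 to 1000.

```java
import java.util.Arrays;

public class WeightSketch {
    // Rough relative performance: clock frequency (MHz) times number of CPUs.
    static long rawScore(int clockMHz, int cpus) {
        return (long) clockMHz * cpus;
    }

    // Scales raw scores so the fastest machine gets 1000, clamping every weight to 1..1000.
    static int[] toWeights(long[] rawScores) {
        long max = Arrays.stream(rawScores).max().orElse(1);
        int[] weights = new int[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            long w = Math.round(1000.0 * rawScores[i] / max);
            weights[i] = (int) Math.max(1, Math.min(1000, w));
        }
        return weights;
    }

    public static void main(String[] args) {
        long[] raw = { rawScore(1600, 2), rawScore(800, 1) }; // 3200 and 800
        System.out.println(Arrays.toString(toWeights(raw)));  // prints [1000, 250]
    }
}
```

The resulting weights keep the 4:1 ratio of the example (1000 and 250) while staying within the permitted range.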

As with PiParallel, if some nodes have multiple CPUs, the JVM must be executing native threads to get any performance gain from running many threads in parallel. If the JVM is running in green threads mode, there is no advantage to having multiple threads, because the JVM will in fact execute just one thread and divide its time among multiple simulated threads.
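To illustrate the point about threads, the following sketch splits a simple computation (summing 1..n, a stand-in for real work) into one chunk per worker thread. It sizes the pool with Runtime.availableProcessors(), which mirrors the advice to use as many worker threads as CPUs. The class ParallelSumSketch is hypothetical and unrelated to the actual OperationServer code. With native threads, the chunks can run on different CPUs at the same time. Under green threads, they would simply be time-sliced on one CPU.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSumSketch {
    // Sums 1..n by splitting the range into one chunk per worker thread.
    static long parallelSum(long n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long chunk = n / threads;
            for (int t = 0; t < threads; t++) {
                long from = t * chunk + 1;
                // The last chunk also takes any remainder of n / threads.
                long to = (t == threads - 1) ? n : (t + 1) * chunk;
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (long k = from; k <= to; k++) {
                        s += k;
                    }
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) {
                total += f.get(); // waits for each worker to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(parallelSum(10_000_000L, cpus)); // prints 50000005000000
    }
}
```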

Mikko Tommila