Doing Parallel Distributed Compiles

The following instructions for doing parallel distributed compiles with SML/NJ are adapted from the Nov 1, 1999 SML/NJ 110.24 release README:

Parallel and Distributed compilation

Firing up on host machine

To perform parallel compilations, you must attach "compile servers" to CM. This is done using function CM.server_start with the following signature:
      val server_start :
        { name: string,
          pathtrans: (string -> string) option,
          cmd: string * string list,
          pref: int } -> bool
    
name is a string uniquely identifying the server and cmd is a value suitable as argument to Unix.execute. (Since you are asking: no, there is only a dummy implementation of this for non-Unix systems.)

If the path to the sml executable is /path/to/smlnj/bin/sml, then a server process on the local machine could be started by:

      CM.server_start
        { name = "A",
          pathtrans = NONE,
          cmd = ("/path/to/smlnj/bin/sml", ["@CMslave"]),
          pref = 0
        };
    
The command line argument @CMslave puts sml into "slave mode". pref is a numeric rating used when choosing between idle slaves: servers with higher values of pref are given greater preference.

Firing up on remote machines

To run a process on a remote machine, e.g., thatmachine, as a compute server, use something like rsh. (You must specify the full path to rsh in the command because that's what Unix.execute wants. I.e., no PATH search. :-( ) The remote machine must share the file system with the local machine via something like NFS.

      CM.server_start
        { name = "thatmachine",
          pathtrans = NONE,
          cmd = ("/usr/ucb/rsh",
                 ["thatmachine", "/path/to/smlnj/bin/sml", "@CMslave"]),
          pref = 0
        };
    
You can start as many servers as you want, but they all must have different names. If you attach any servers at all, then you should attach at least two (unless you want to attach one that runs on a machine vastly more powerful than your local one). Local servers make sense on multi-CPU machines: start as many servers as there are CPUs.
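
As a rough sketch of the "one server per CPU" advice, the argument records can be built with a small helper so that every server gets a distinct name. The names localServer, remoteServer, smlPath and numCpus are made up for this illustration, and the CPU count of 4 is an assumption. The record fields follow the server_start signature given above.

```sml
(* Hypothetical values: adjust the path and CPU count for your machine. *)
val smlPath = "/path/to/smlnj/bin/sml"
val numCpus = 4

(* Argument record for the i-th local server.
   Names must be distinct, so the index is appended. *)
fun localServer (i : int) =
    { name = "local" ^ Int.toString i,
      pathtrans = NONE : (string -> string) option,
      cmd = (smlPath, ["@CMslave"]),
      pref = 0 }

(* Argument record for a server reached through rsh on the given host.
   The rsh path is the one used in the example above. *)
fun remoteServer (host : string, pref : int) =
    { name = host,
      pathtrans = NONE : (string -> string) option,
      cmd = ("/usr/ucb/rsh", [host, smlPath, "@CMslave"]),
      pref = pref }

(* One record per CPU: local0, local1, ... *)
val servers = List.tabulate (numCpus, localServer)

(* In an interactive session with CM available, attach them with:
     val results = map CM.server_start servers
   Each element of results is the bool returned by CM.server_start. *)
```

The records are plain values, so they can be inspected at the prompt before any process is started.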

File paths

For local servers, pathtrans can be safely left at NONE. If you connect to a remote server, you can use pathtrans to specify a function for translating local absolute pathnames to remote absolute pathnames. This can be a bit tricky to get right, especially if the machines use automounters and such. Here is an area that definitely is only "alpha".
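
A minimal sketch of a path translation function, assuming the simplest possible case: both machines see the same tree, but under different mount points. The mount points /home/ and /net/home/ and the name translate are hypothetical. Real setups with automounters may need something more elaborate.

```sml
(* Hypothetical mount points: the local machine sees the shared tree
   under /home/, the remote machine sees the same tree under /net/home/. *)
val localRoot  = "/home/"
val remoteRoot = "/net/home/"

(* Rewrite a local absolute pathname into the remote one by swapping
   the leading prefix; paths outside localRoot pass through unchanged. *)
fun translate (path : string) : string =
    if String.isPrefix localRoot path
    then remoteRoot ^ String.extract (path, String.size localRoot, NONE)
    else path

(* The value to pass in the pathtrans field of the server record. *)
val pathtrans : (string -> string) option = SOME translate
```

Because translate is an ordinary string function, it can be tried on a few sample paths at the prompt before any server is attached.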

Go for it

After you have attached your servers, you should be able to do CM.recomp, CM.make, CM.stabilize, CMB.make, CMB.deliver, etc. as usual. Cross-compilers ([Arch][OS]CMB.make ...) also work. If you are using CM.xxx functions, all attached servers must have the same architecture as your local one. For CMB.xxx functions this requirement does not exist because there are cross-compilers available on the slave side, too. (I have not tested a mixed-architecture setup, though.)

Verbose

For fun (or trouble-shooting), you can watch the master-slave protocol by setting CM.debug to true (#set CM.debug true;).

Races

You might experience strange things in case of compile errors or interrupts. Please report bugs. Attached servers should go away if you quit the controlling sml session. Warning: be careful, though, because this feature is a bit fragile. If servers don't go away, they tend to spin rapidly and suck up large quantities of CPU cycles.

PervEnv

Since the protocol is fairly simple and brain-dead, it cannot handle complicated things like setting up the initial (pervasive) environment in the case of CMB.make. Therefore, this will be done locally by the "master" process.

Control-C

When pressing ^C, you will experience a certain delay if servers are currently attached and busy. This is because the interrupt-handling code will wait for the servers to finish what they are currently doing and bring them back to an "idle" state.
Cynbe ru Taren
Last modified: Mon Jan 31 17:35:43 CST 2005