[Xplor-nih] Parallel computing details in Xplor-nih

Charles at Schwieters.org
Thu Dec 13 22:58:00 EST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Hello Tyler--

> 
> 1) If I terminate a running Xplor-nih calculation with ctrl+c in unix,
> neither the local nor the server machines actually stop performing
> Xplor-nih calculations on any of their processors. I have to manually
> kill -9 each instance of Xplor-nih on each processor. If the
> calculations terminate normally (the script is allowed to finish) then
> Xplor-nih does stop on all processors of all machines.
> 

Running

  killall xplor

on all hosts will probably do the trick. The limitation is actually in ssh -
there's an open bug report for this. If you use rsh, signals should be
propagated correctly. The problem could be worked around by having the
top-level xplor script catch the signals and then clean up, but it
hasn't been a priority.
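
To illustrate the signal-catching workaround mentioned above, here is a minimal Python sketch. It is not Xplor-NIH code: the function names launch, terminate_all and install_cleanup are hypothetical, and it only cleans up child processes started on the local machine. Jobs started on other hosts through ssh would still need separate handling, such as the killall command above.

```python
import signal
import subprocess
import sys

def launch(commands):
    """Start one child process per command (each a list of argv strings)."""
    return [subprocess.Popen(cmd) for cmd in commands]

def terminate_all(children, timeout=5.0):
    """Ask every live child to exit, then force-kill any that linger."""
    for child in children:
        if child.poll() is None:
            child.terminate()          # sends SIGTERM
    for child in children:
        try:
            child.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            child.kill()               # sends SIGKILL as a last resort
            child.wait()

def install_cleanup(children):
    """Route SIGINT (ctrl+c) and SIGTERM to terminate_all, then exit."""
    def handler(signum, frame):
        terminate_all(children)
        sys.exit(128 + signum)
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, handler)
```

A wrapper would call launch, then install_cleanup on the returned list, and then wait on the children as usual; ctrl+c would then stop the local children instead of leaving them running.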

> 2) The way the calculations are distributed between processors is not
> weighted to the speed of the processors so some computing time is
> lost. For example, the local machine (a 4-CPU power mac) finished the
> first 50 calculations at 7:20 pm, while the two 2-CPU imacs finished
> their respective 25 calculations at 8:04 pm. I don't suppose there's
> an easy way to increase the efficiency of this workload distribution
> (i.e. send some work back to the faster CPU system when it's done with
> the first 50).

A round-robin scheduler would take care of this problem, and would also
be useful for the case of structure calculations having wildly different
computation times. I've outlined an implementation, but have not found
anyone (or the time myself) to write it up. Interested in working on
this?
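
The outlined implementation is not shown in this post, so the following is only a sketch of one possible approach, not the author's design. It uses a shared work queue from which each worker pulls the next structure index as soon as it is idle, so faster machines naturally end up computing more structures. The function name run_dynamic and the worker names are hypothetical, threads stand in for remote processes, and time.sleep stands in for the actual structure calculation.

```python
import queue
import threading
import time

def run_dynamic(num_structures, worker_delays):
    """Hand out structure indices one at a time to whichever worker is idle.

    worker_delays maps a hypothetical worker name to the seconds it takes
    per structure; time.sleep is a placeholder for the real calculation.
    Returns {worker name: [indices it computed]}.
    """
    todo = queue.Queue()
    for index in range(num_structures):
        todo.put(index)

    done = {name: [] for name in worker_delays}

    def work(name, delay):
        while True:
            try:
                index = todo.get_nowait()   # pull the next unclaimed structure
            except queue.Empty:
                return                      # nothing left: this worker stops
            time.sleep(delay)               # placeholder for the calculation
            done[name].append(index)

    threads = [threading.Thread(target=work, args=item)
               for item in worker_delays.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

For example, run_dynamic(100, {"powermac": 0.001, "imac": 0.002}) assigns every index from 0 to 99 to exactly one worker, and no worker sits idle while unclaimed structures remain.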

> 3) In order to get the network parallel processing to work, all
> machines needed an identical directory structure, and the resulting
> structure files get stored on the individual machines rather than just
> on the local machine. I'm not sure there's an easy way to avoid the
> directory structure issue, but maybe it's possible to get the python
> script to output structures back over the network to the local machine
> to avoid having to scp the files after the calculations are all done
> (not really a big deal, just thought I'd ask though).
> 

Actually, for the scripts to really work properly, you need shared
directories, using NFS, AFS or the like. One can imagine ways around
this, but I think it's fair to just ask that a shared filesystem be
used.

If you have any suggestions or contributions, they would be appreciated.

best regards--
Charles
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8+ <http://mailcrypt.sourceforge.net/>

iD8DBQFHYf9IPK2zrJwS/lYRAvToAJ9WEBYN3AcgjPmGU5rL9IgtkOQRBwCgg/sA
2OXOpHoVk3QEyjkL/2xVo/0=
=ZYVG
-----END PGP SIGNATURE-----

