Helping distributed computers pull their weight
Shared-memory computing applications have never taken particularly well to running on distributed-memory systems, at least until now. Researchers have come up with a possible solution that is of particular interest to NASA and IBM, and it is being tested on their distributed computing systems. The work involves the most powerful number-cruncher in Europe today, the MareNostrum supercomputer in Spain.
POP goes beyond ‘grid computing’, in which many processors are linked over relatively low-speed networks; it targets tightly coupled multi-core chips connected by a high-speed network. This would allow new power-hungry scientific and engineering problems to be addressed.
Developed by a consortium of computer vendors to enable the easy creation of portable, high-level shared-memory applications in Fortran, C and C++, OpenMP seemed to promise easier distributed computing. But it has failed to live up to expectations so far.
The problem with OpenMP applications is that, because they are designed for shared-memory systems, where all processors access a single common pool of memory, they do not function efficiently on distributed-memory machines, where memory is spread across different computers on a network. To make them run, programmers have traditionally had to spend large amounts of time fiddling with code, often using the Message Passing Interface (MPI), which requires individual tuning for each machine.
“What we have done is to adapt OpenMP to make it more flexible, so programs can run on shared or distributed-memory systems without having to be retuned in each individual case,” Jesús Labarta, the coordinator of the POP project at the Technical University of Catalonia, told IST Results. “The end goal is to allow OpenMP applications to run anywhere, reducing the time and costs of reprogramming.”
This is important for the future uses of OpenMP – principally for powerful numerical and simulation applications – given the increased use of distributed computing systems. The POP adaptations to OpenMP will allow programmers to concentrate on writing good programs and not on worrying whether they can run on a variety of distributed processors – hence the interest of global number-crunchers such as NASA and IBM.
Heading for a big impact
“The layer of software we have developed automates the underlying message passing activity, detecting and satisfying the communication needs at run time. By dynamically detecting the application structure it can then be modified, without any user intervention, to minimise the total amount of communication needed by the system,” Labarta explains.
The POP technique is also being tested on IBM's MareNostrum supercomputer in Barcelona, the most powerful supercomputer in Europe today, built entirely from commercially available components and using a Linux operating environment. On MareNostrum, OpenMP applications are being run over various processor nodes in a distributed environment. “Without the work of POP, OpenMP wouldn't run on MareNostrum,” Labarta notes.
Although the IST contract concluded in February this year, the POP team is continuing its work with a view to putting their OpenMP environment into widespread use. “The POP research is ongoing, it is a really long-term activity that started in 1999 with Nanos, an LTR project. I think within three years our environment could be widely used and will have a substantial impact in the programming world,” says Labarta.

Jesús Labarta, Technical University of Catalonia, European Centre for Parallelism and IST Results