High-Performance Scheduling
a chapter by Fran Berman
from
"The Grid -- Blueprint for a New Computing Infrastructure",
edited by Ian Foster and Carl Kesselman.
This chapter is part of a text covering the basic issues involved in developing applications, middleware, and infrastructure for a metacomputing platform ("computational grid"). The metaphor of the "computational grid" is based on the electrical power grid, which provides ubiquitous electrical power. In the same way, the metacomputer has the potential to develop into a computational grid that provides ubiquitous computing power.
The chapter deals with the problem of achieving application performance via scheduling on such a platform. Application scheduling on a metacomputer is compared with application scheduling on MPPs. Current work in the area of metacomputer application scheduling is reviewed. Finally, trends and challenges in metacomputer application scheduling, as well as system support for such "high-performance" schedulers, are discussed.
In this paper we examine the performance sensitivity of a master/slave parallel ray-tracing application when running on non-dedicated clusters of workstations. Our experiments show that two different scheduling strategies, one static and one dynamic, exhibit different performance sensitivities to variabilities in resource capabilities and workload distributions. We demonstrate for our example application that neither scheduling strategy by itself consistently induces the best application performance (minimal execution time) when running on the same resources under normally experienced production operating conditions. These results support the idea that dynamic selection of appropriate scheduling strategies to match run-time conditions could be a promising approach to achieving better application performance for master/slave applications on heterogeneous platforms.
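For illustration only, a minimal sketch of the two strategy families compared in this paper, under assumptions of my own: a static strategy that partitions the tiles among slaves up front in proportion to assumed relative speeds, and a dynamic self-scheduling strategy in which slaves pull one tile at a time from a shared queue. The names and the threading model are hypothetical, not the paper's implementation.

    # Sketch: static vs. dynamic assignment of ray-tracing tiles to slaves.
    # Hypothetical illustration only -- not the implementation evaluated in the paper.

    from queue import Queue, Empty
    from threading import Thread

    def static_assignment(tiles, relative_speeds):
        """Partition the tile list among slaves up front, in proportion to assumed speeds."""
        total = sum(relative_speeds)
        assignments, start = [], 0
        for speed in relative_speeds:
            count = int(round(len(tiles) * speed / total))
            assignments.append(list(tiles[start:start + count]))
            start += count
        if start < len(tiles):
            assignments[-1].extend(tiles[start:])   # leftover tiles go to the last slave
        return assignments

    def dynamic_assignment(tiles, num_slaves, render_tile):
        """Self-scheduling: each slave repeatedly pulls the next tile from a shared queue."""
        work = Queue()
        for t in tiles:
            work.put(t)

        def slave():
            while True:
                try:
                    tile = work.get_nowait()
                except Empty:
                    return                          # queue exhausted: this slave is done
                render_tile(tile)

        threads = [Thread(target=slave) for _ in range(num_slaves)]
        for th in threads:
            th.start()
        for th in threads:
            th.join()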
This paper investigates the efficacy of Application-Level Scheduling (AppLeS) for a parallel gene sequence library comparison application in production metacomputing settings. We compare an AppLeS-enhanced version of the application to an original implementation designed and tuned to use the native scheduling mechanisms of Mentat -- a metacomputing software infrastructure. The experimental data shows that the AppLeS versions outperform the best Mentat versions over a range of problem sizes and computational settings.
The structure of the AppLeS we have defined for this application does not depend on the scheduling algorithms that it uses. Instead, the AppLeS scheduler considers the uncertainty associated with the information it uses in its scheduling decisions to choose between the static placement of computation, and the dynamic assignment of computation during execution. We propose that this framework is general enough to represent the class of metacomputing applications that are organized as a master and set of parallel slaves, in which the master distributes uncomputed work.
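A minimal sketch of this kind of uncertainty-driven choice, under the assumption (mine, not necessarily the paper's) that forecast variability is summarized by a coefficient of variation and compared against a tunable threshold:

    # Sketch: choose a work-distribution strategy from the uncertainty of the
    # performance information available at scheduling time (hypothetical heuristic).

    import statistics

    def choose_strategy(cpu_forecasts, cv_threshold=0.25):
        """cpu_forecasts: {host: recent availability samples in [0, 1]}.

        If every host's availability forecast is stable (low coefficient of
        variation), static placement based on the mean forecasts is acceptable;
        otherwise fall back to dynamic assignment of uncomputed work at run time.
        """
        for host, samples in cpu_forecasts.items():
            mean = statistics.mean(samples)
            if mean == 0:
                return "dynamic"
            cv = statistics.pstdev(samples) / mean
            if cv > cv_threshold:
                return "dynamic"
        return "static"

    # Example: one noisy workstation pushes the decision toward dynamic assignment.
    forecasts = {"hostA": [0.9, 0.88, 0.91], "hostB": [0.7, 0.2, 0.95]}
    print(choose_strategy(forecasts))   # -> "dynamic"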
This paper describes the state-of-the-art of the AppLeS Project as of Summer, 1997. Application-level scheduling, the AppLeS architecture, ongoing work in developing application AppLeS, and AppLeS coordinated subprojects are described. References are given to related work both within and outside the AppLeS group.
We focus on the problems of scheduling applications on metacomputing systems. We introduce the concept of application-centric scheduling, in which everything about the system is evaluated in terms of its impact on the application. Application-centric scheduling is used by virtually all metacomputer programmers to achieve performance on metacomputing systems. We describe two successful metacomputing applications to illustrate this approach, and describe AppLeS scheduling agents, which generalize the application-centric scheduling approach. Finally, we show preliminary results which compare AppLeS-derived schedules with conventional strip and blocked schedules for a two-dimensional Jacobi code.
In this paper, we define a set of principles underlying application-level scheduling and describe our work-in-progress building AppLeS (application-level scheduling) agents. We illustrate the application-level scheduling approach with a detailed description and results for a distributed 2D Jacobi application on 2 heterogeneous platforms.
While running on parallel distributed resources, schedulers may find it advantageous to redistribute elements of a computation in response to changing conditions. In this paper, we focus on the development of dynamically parametrizable models to determine the cost (in terms of execution delay) of performing redistribution. We illustrate our approach by examining in detail the modeling of redistribution costs for a 2D Jacobi application running in a cluster of workstations environment.
Computational Grids, composed of distributed, high-performance computing resources, are quickly becoming the platform-of-choice for performance-challenged applications. In this paper, we describe the AppLeS (Application-Level Scheduling) approach to achieving application performance via scheduling on dynamic multiple-user Computational Grids and clusters. We illustrate the AppLeS methodology by describing an AppLeS developed for JPL's Synthetic Aperture Radar Atlas (SARA) -- a visualization tool for satellite radar images. We demonstrate the effectiveness of application scheduling by providing a performance-efficient strategy for retrieving SARA data files in everyday, multiple-user Grid environments.
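As a hedged illustration of the flavor of such a retrieval strategy (the actual AppLeS logic is more involved), the sketch below picks, for a given file, the replica site whose bandwidth and latency forecasts give the smallest predicted transfer time; the site names, numbers, and cost model are made up.

    # Sketch: pick the replica site with the smallest predicted transfer time.
    # Site names, forecasts, and the simple cost model are illustrative only.

    def predicted_transfer_time(file_bytes, bandwidth_bytes_per_s, latency_s):
        return latency_s + file_bytes / bandwidth_bytes_per_s

    def best_site(file_bytes, forecasts):
        """forecasts: {site: (bandwidth in bytes/s, latency in s)} from a monitor such as NWS."""
        return min(forecasts,
                   key=lambda s: predicted_transfer_time(file_bytes, *forecasts[s]))

    forecasts = {
        "siteA": (2.0e6, 0.020),   # 2 MB/s, 20 ms
        "siteB": (0.8e6, 0.015),
        "siteC": (1.2e6, 0.060),
    }
    print(best_site(50e6, forecasts))  # 50 MB file -> "siteA"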
In this paper, we show that distributed parallel application performance in clusters of workstations benefits from knowledge about network topology in cases where LAN resources can become potential performance bottlenecks. These performance bottlenecks often appear in common networking technologies that employ highly shared resources, such as ethernets and token-rings. A method to discover network configuration as it relates to application performance, called Effective Network Views (ENV), is presented, and experimental results with an example application are given to show the value of including ENV information in the scheduling process when good performance is required.
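A rough sketch of the intuition behind effective network views, under an assumption of my own that bandwidth probes to a reference host are used to group hosts that appear to sit behind the same shared segment; the real ENV method is more sophisticated.

    # Sketch: group hosts into "effective" segments from bandwidth probes to a
    # reference (scheduling) host. A crude stand-in for ENV, for illustration only.

    def group_by_effective_bandwidth(bw_to_master, tolerance=0.25):
        """bw_to_master: {host: measured bandwidth to the scheduling host, Mb/s}.
        Hosts are sorted by bandwidth and merged with the previous host's group when
        the relative difference is within `tolerance`; similar effective bandwidth
        suggests a shared link such as the same ethernet segment."""
        groups = []
        for host, bw in sorted(bw_to_master.items(), key=lambda kv: kv[1]):
            if groups and abs(bw - groups[-1][-1][1]) / groups[-1][-1][1] <= tolerance:
                groups[-1].append((host, bw))
            else:
                groups.append([(host, bw)])
        return [[h for h, _ in g] for g in groups]

    probes = {"ws1": 85.0, "ws2": 80.0, "ws3": 9.5, "ws4": 8.7}
    print(group_by_effective_bandwidth(probes))
    # -> [['ws4', 'ws3'], ['ws2', 'ws1']]  (a shared 10 Mb segment vs. a switched one)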
One of the compelling reasons for developing the Information Power Grid (IPG) is to provide a platform for more rapid development and execution of simulations and other resource-intensive applications. However, the IPG will ultimately not be successful unless users and application developers can achieve execution performance for their codes. In this paper, we describe a performance-efficient approach to scheduling applications in dynamic multiple-user distributed environments such as the IPG. This approach provides the basis for application scheduling agents called AppLeS. We describe the AppLeS methodology and discuss the lessons learned from the development of AppLeS for a variety of distributed applications. In addition, we describe an AppLeS-in-progress currently being developed for NASA's INS2D code, a distributed "parameter sweep" application.
The computational grid is becoming the platform of choice for large-scale distributed data-intensive applications. Accurately predicting the transfer times of remote data files, a fundamental component of such applications, is critical to achieving application performance. In this paper, we introduce a performance prediction method, AdRM (Adaptive Regression Modeling), to determine file transfer times for network-bound distributed data-intensive applications. We demonstrate the effectiveness of the AdRM method on two distributed data applications, SARA (Synthetic Aperture Radar Atlas) and SRB (Storage Resource Broker), and discuss how it can be used for application scheduling. Our experiments use the Network Weather Service, a resource performance measurement and forecasting facility, as a basis for the performance prediction model. Our initial findings indicate that the AdRM method can be effective in accurately predicting data transfer times in wide-area multi-user grid environments.
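The following sketch conveys the flavor of regression-based transfer-time prediction, not the AdRM formulation itself: fit a linear model relating observed transfer times to file size divided by the bandwidth forecast, then predict new transfers from the current forecast. All numbers are illustrative.

    # Sketch: regression-based prediction of file transfer time from NWS-style
    # bandwidth forecasts. Illustrative only; not the AdRM model from the paper.

    import numpy as np

    def fit_transfer_model(file_sizes_mb, bw_forecasts_mbps, observed_times_s):
        """Least-squares fit of  time ~ a + b * (size / forecast_bandwidth)."""
        x = np.asarray(file_sizes_mb) / np.asarray(bw_forecasts_mbps)
        A = np.column_stack([np.ones_like(x), x])
        (a, b), *_ = np.linalg.lstsq(A, np.asarray(observed_times_s), rcond=None)
        return a, b

    def predict_transfer_time(a, b, file_size_mb, bw_forecast_mbps):
        return a + b * (file_size_mb / bw_forecast_mbps)

    # Past transfers (sizes in MB, forecasts in MB/s, observed times in s) -- made up.
    a, b = fit_transfer_model([10, 40, 80], [2.0, 2.5, 1.6], [6.1, 17.8, 51.5])
    print(predict_transfer_time(a, b, 25, 2.2))   # predicted seconds for a new transfer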
Computational Grids are becoming an increasingly important and powerful platform for the execution of large-scale, resource-intensive applications. However, it remains a challenge for applications to tap the potential of Grid resources in order to achieve performance. In this paper, we illustrate how applications can leverage Grids to achieve performance through coallocation. We describe our experiences developing a scheduling strategy for a real-life parallel tomography application targeted to Grids which contain both workstations and parallel supercomputers. Our strategy uses dynamic information exported by a supercomputer's batch scheduler to simultaneously schedule on workstations and immediately available supercomputer nodes. This strategy is of great practical interest because it combines resources available to the typical research lab: time-shared workstations and CPU time in remote space-shared supercomputers. We show that this strategy improves the performance of the parallel tomography application compared to traditional scheduling strategies, which target the application to either type of resource alone.
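A very small sketch of the coallocation idea (not the paper's scheduler): query how many supercomputer nodes are immediately available, then divide the tomography slices between those nodes and the lab's workstations in proportion to their predicted per-slice rates. The node counts and rates are invented.

    # Sketch: co-allocate work across immediately available supercomputer nodes and
    # time-shared workstations, in proportion to predicted per-slice processing rates.
    # Hypothetical numbers; not the scheduling strategy described in the paper.

    def coallocate(num_slices, ws_rates, free_nodes, node_rate):
        """ws_rates: predicted slices/s for each workstation (dynamic, load-dependent).
        free_nodes: nodes the batch scheduler reports as immediately available.
        node_rate: predicted slices/s of one dedicated supercomputer node."""
        rates = list(ws_rates) + [node_rate] * free_nodes
        total = sum(rates)
        shares = [int(num_slices * r / total) for r in rates]
        shares[rates.index(max(rates))] += num_slices - sum(shares)  # assign remainder
        return shares[:len(ws_rates)], shares[len(ws_rates):]

    ws_share, node_share = coallocate(240, ws_rates=[0.4, 0.7], free_nodes=4, node_rate=1.5)
    print(ws_share, node_share)   # slices per workstation, slices per supercomputer node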
Also visit main AppLeS page for more info.
The goal of the Network Weather Service is to provide accurate forecasts of dynamically changing performance characteristics from a distributed set of metacomputing resources. Providing a ubiquitous service that can both track dynamic performance changes and remain stable in spite of them requires adaptive programming techniques, an architectural design that supports extensibility, and internal abstractions that can be implemented efficiently and portably. In this paper, we describe the current implementation of the NWS for Unix and TCP/IP sockets and provide examples of its performance monitoring and forecasting capabilities.
The Network Weather Service is a generalizable and extensible facility designed to provide dynamic resource performance forecasts in metacomputing environments. In this paper, we outline its design and detail the predictive performance of the forecasts it generates. While the forecasting methods are general, we focus on their ability to predict the TCP/IP end-to-end throughput and latency that is attainable by an application using systems located at different sites. Such network forecasts are needed both to support scheduling, and by the metacomputing software infrastructure to develop quality-of-service guarantees.
In this paper, we focus on the problem of making short and medium term forecasts of CPU availability on time-shared Unix systems. We evaluate the accuracy with which availability can be measured using Unix load average, the Unix utility vmstat, and the Network Weather Service CPU sensor that uses both. We also examine the autocorrelation between successive CPU measurements to determine their degree of self-similarity. While our observations show a long-range autocorrelation dependence, we demonstrate how this dependence manifests itself in the short and medium term predictability of the CPU resources in our study.
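As a hedged aside, the autocorrelation referred to above can be computed from a measurement trace as in the sketch below; the lag-k autocorrelation of CPU-availability samples (a made-up series here) indicates how much one measurement reveals about measurements k steps later.

    # Sketch: lag-k autocorrelation of a CPU-availability trace (illustrative data).

    import statistics

    def autocorrelation(samples, lag):
        n = len(samples)
        mean = statistics.mean(samples)
        var = sum((s - mean) ** 2 for s in samples)
        cov = sum((samples[i] - mean) * (samples[i + lag] - mean) for i in range(n - lag))
        return cov / var

    availability = [0.92, 0.90, 0.88, 0.55, 0.52, 0.50, 0.85, 0.87, 0.86, 0.84]
    for lag in (1, 2, 5):
        print(lag, round(autocorrelation(availability, lag), 3))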
We describe the architecture of the Network Weather Service and implementations that we have developed and are currently deploying for the Legion and Globus/Nexus metacomputing infrastructures. We also detail NWS forecasts of resource performance using both the Legion and Globus/Nexus implementations. Our results show that simple forecasting techniques substantially outperform measurements of current conditions (commonly used to gauge resource availability and load) in terms of prediction accuracy.
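For intuition, a toy sketch (not the NWS forecaster itself): the last measurement is only one of several candidate predictors, and even a running mean or median often tracks a noisy series better. The code compares predictors by mean absolute error over a measurement history.

    # Toy sketch: compare a "current conditions" predictor (last value) against
    # simple forecasters (running mean, running median) on a measurement series.
    # Not the NWS forecaster; for intuition only.

    import statistics

    def evaluate(history, predictor, window=10):
        errors = []
        for i in range(1, len(history)):
            recent = history[max(0, i - window):i]
            errors.append(abs(predictor(recent) - history[i]))
        return statistics.mean(errors)

    last_value     = lambda recent: recent[-1]
    running_mean   = lambda recent: statistics.mean(recent)
    running_median = lambda recent: statistics.median(recent)

    bandwidth = [5.2, 5.0, 1.1, 5.3, 5.1, 4.9, 0.9, 5.2, 5.0, 5.1]   # Mb/s, made up
    for name, p in [("last", last_value), ("mean", running_mean), ("median", running_median)]:
        print(name, round(evaluate(bandwidth, p), 2))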
Also visit NWS website for more info.
Accurate performance predictions are difficult to achieve for parallel applications executing on production distributed systems. Conventional point-valued performance parameters and prediction models are often inaccurate since they can only represent one point in a range of possible behaviors. We address this problem by allowing characteristic application and system data to be represented by a set of possible values and their probabilities, which we call stochastic values. In this paper, we give a practical methodology for using stochastic values as parameters to adaptable performance prediction models. We demonstrate their usefulness for a distributed SOR application, showing stochastic values to be more effective than single (point) values in predicting the range of application behavior that can occur during execution in production environments.
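A minimal sketch of the stochastic-value idea under an assumption of my own: a stochastic parameter is carried as (value, probability) pairs and propagated through a simple prediction model by sampling, yielding a distribution of predicted times rather than a single number. The representation and model are illustrative, not the paper's exact methodology.

    # Sketch: propagate stochastic (value, probability) parameters through a simple
    # execution-time model by sampling.

    import random

    def sample(stochastic_value):
        values, probs = zip(*stochastic_value)
        return random.choices(values, weights=probs)[0]

    def predict_iteration_time(cpu_avail, bandwidth_mbps, work_units=100, msg_mb=2.0):
        return work_units / cpu_avail + msg_mb / bandwidth_mbps

    # Stochastic parameters: possible values with their probabilities (made up).
    cpu = [(0.9, 0.6), (0.5, 0.3), (0.2, 0.1)]
    bw  = [(8.0, 0.7), (2.0, 0.3)]

    predictions = sorted(predict_iteration_time(sample(cpu), sample(bw)) for _ in range(1000))
    print("median:", predictions[500], "  90th percentile:", predictions[900])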
Accurate performance predictions are difficult to achieve for applications on production distributed parallel systems. We address this problem by allowing characteristic application and system data to be represented as an interval of possible values. We use intervals as parameters to a structural modeling technique, and predict the range of application behavior that will occur. We test this approach on a production system using a distributed SOR code, and demonstrate the resulting additional information and accuracy resulting from the interval values.
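A tiny sketch of interval-valued prediction under the simple model time = compute + communicate, which is my simplification rather than the structural model used in the paper: propagating [low, high] intervals for the parameters yields an interval of possible execution times.

    # Sketch: interval-valued performance prediction for time = comp + comm.
    # Intervals are (low, high) pairs; the operations below are standard interval
    # arithmetic for positive quantities. All numbers are illustrative.

    def div(a, b):
        return (a[0] / b[1], a[1] / b[0])

    def add(a, b):
        return (a[0] + b[0], a[1] + b[1])

    work_flops = (4.0e9, 4.0e9)        # fixed amount of work per iteration
    cpu_rate   = (0.8e9, 2.0e9)        # available flop/s varies with load
    msg_bytes  = (8.0e6, 8.0e6)
    bandwidth  = (0.5e6, 1.2e6)        # bytes/s

    predicted = add(div(work_flops, cpu_rate), div(msg_bytes, bandwidth))
    print("predicted time in [%.1f, %.1f] seconds" % predicted)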
We present a structural performance model that uses application profiles and component models to predict an application's performance on a set of distributed resources. We decompose application performance in accordance with the structure of the application: that is, into interacting component models that correspond to component tasks. Then, using the application profile and available information as guides, we select models for each component appropriately. As a proof of concept, we have implemented this approach for two distributed applications, a master-slave genetic algorithm code and a red-black stencil successive over-relaxation code. Our predictions are within 10% of actual time.
Also visit Structural Modeling and Performance Prediction Engineering pages for more info.
In this paper, we present a model that provides an estimate of the slowdown imposed by competing load on applications targeted to high-performance clusters and networks of workstations. The model provides a basis for predicting realistic communication and computation costs and is shown to achieve good accuracy for a set of scientific benchmarks commonly found in high-performance applications.
Data-parallel applications executing in clustered environments share resources with other applications. Since system load can vary dramatically, it is critical to provide an accurate model of the effects of contention on application performance in order to provide realistic assessments of application behavior. In this paper, we present a model to predict contention effects in clustered environments. This model provides a basis for predicting realistic execution times for applications on clusters of workstations and is parameterized by the data allocation policy employed by the targeted application, the local slowdown present in each node of the cluster, and the relative weight associated with each node in the cluster.
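A rough sketch of the general shape of such a model (my own simplification, not the model developed in the paper): predicted time on each node scales the node's share of the work by its local slowdown and dedicated speed, and a data-parallel step ends when the slowest node finishes.

    # Sketch: predict execution time on a shared cluster from per-node data allocation,
    # local slowdown, and relative node weight (dedicated speed). Illustrative only.

    def predict_time(allocation, slowdown, weight, total_work=1000.0):
        """allocation: fraction of the data assigned to each node (sums to 1).
        slowdown: per-node multiplicative slowdown due to competing load (>= 1).
        weight: relative dedicated speed of each node (work units per second)."""
        per_node = [total_work * a * s / w for a, s, w in zip(allocation, slowdown, weight)]
        return max(per_node)             # a data-parallel step ends with the slowest node

    print(predict_time(allocation=[0.5, 0.3, 0.2],
                       slowdown=[1.0, 2.5, 1.2],
                       weight=[10.0, 8.0, 5.0]))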
We present a model for predicting the effects of contention on application behavior in a two-machine heterogeneous system. Our model provides a characterization of contention in a heterogeneous system, and predicts the effect of contention on application communication and computation costs. We describe the model for the Sun/CM2 and Sun/Paragon coupled heterogeneous systems. We also present experiments which show the predicted communication and computation costs to be within 15% on average of actual costs. These experiments were performed on production systems with emulated contention.
In this paper, we propose a mapping strategy for applications formed by multiple tasks targeted to heterogeneous platforms. We first define a mapping model, the match-tree, which reflects the data movement and conversion costs of distributed algorithms and allows for alternative implementations of individual tasks on different machines. We then define the find-mapping and split-partition algorithms, based on the match-tree model, to determine the best allocation of tasks to resources in heterogeneous systems. We illustrate the use of these algorithms with a sample distributed application.
We describe Zoom, a hierarchical representation in which heterogeneous applications can be described. The goal of Zoom is to provide an abstraction that computer and computational scientists can use to describe heterogeneous applications, and to provide a foundation from which program development tools for heterogeneous network computing can be built. Three levels (structure, implementation and data) of the Zoom hierarchy are described and are used to illustrate two heterogeneous applications. Extensions to Zoom to include additional resource parameters required by program development tools are also discussed.
We couple the Zoom representation designed to facilitate development of heterogeneous applications, and the HeNCE graphical language and tool, designed as a representation for and an executional model of heterogeneous programs targeted to PVM. The combination of Zoom and HeNCE provides a hierarchical representation which exposes performance issues and a means of automatically translating that representation into code executable on heterogeneous networks of computers.
Program speedup is an important measure of the performance of an algorithm on a parallel machine. Of particular importance is the near linear or superlinear speedup exhibited by the most performance-efficient algorithms for a given system. We describe network and program models for heterogeneous networks, define notions of speedup and superlinear speedup, and observe that speedup consists of both heterogeneous and parallel components. We also consider the case of linear tasks, give a lower bound for speedup, and show that there is no theoretical upper limit on heterogeneous speedup.
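For orientation, one conventional way to write down these quantities (the paper's own definitions distinguish heterogeneous from parallel components and may differ in detail): if T_i is the execution time of the program on machine i alone and T_net is its execution time on the heterogeneous network of p machines, then

    S_{\mathrm{het}} \;=\; \frac{\min_i T_i}{T_{\mathrm{net}}}, \qquad \text{superlinear when } S_{\mathrm{het}} > p .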
The LeMans FunRun is a heterogeneous competition which was held at Supercomputing '93 in Portland, Oregon. The participants raced to "solve" a puzzle in a fixed amount of time on a network of machines comprised of a CM-5, Paragon, MasPar, Cray, and Silicon Graphics workstations. This paper describes the design and implementation of the LeMans FunRun and discusses its potential for use as a heterogeneous education tool.
This report explores the feasibility of using market mechanisms for efficient, automated resource allocation within a software system.
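As a hedged illustration only (the report itself should be consulted for the mechanisms it actually studies), one of the simplest market mechanisms is a first-price sealed-bid auction in which each task bids for resources and each resource goes to its highest bidder:

    # Sketch: first-price sealed-bid allocation of resources to bidding tasks.
    # Purely illustrative; not necessarily a mechanism analyzed in the report.

    def allocate(bids):
        """bids: {task: {resource: bid}}. Each resource goes to its highest bidder."""
        winners = {}
        for task, offers in bids.items():
            for resource, price in offers.items():
                if resource not in winners or price > winners[resource][1]:
                    winners[resource] = (task, price)
        return {res: task for res, (task, price) in winners.items()}

    print(allocate({"simA": {"cpu1": 5, "cpu2": 3}, "simB": {"cpu1": 4, "cpu2": 6}}))
    # -> {'cpu1': 'simA', 'cpu2': 'simB'}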