Many-task computing
Many-task computing (MTC)[1][2][3][4][5][6][7] in computational science is an approach to parallel computing that aims to bridge the gap between two computing paradigms, high throughput computing (HTC)[8] and high-performance computing (HPC).
Definition
MTC is reminiscent of HTC, but it differs in its emphasis on using many computing resources over short periods of time to accomplish many computational tasks (both dependent and independent), where the primary metrics are measured in seconds (e.g. FLOPS, tasks/s, MB/s I/O rates), as opposed to operations (e.g. jobs) per month. MTC denotes high-performance computations comprising multiple distinct activities, coupled via file system operations. Tasks may be small or large, uniprocessor or multiprocessor, compute-intensive or data-intensive. The set of tasks may be static or dynamic, homogeneous or heterogeneous, loosely coupled or tightly coupled. The aggregate number of tasks, quantity of computing, and volumes of data may be extremely large. MTC includes loosely coupled applications that are generally communication-intensive but not naturally expressed using the standard Message Passing Interface (MPI) commonly found in HPC, drawing attention to the many computations that are heterogeneous but not "happily" parallel.
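The tasks-per-second metric mentioned above can be illustrated with a minimal Python sketch. It is not taken from any MTC system: the names `short_task` and `run_many_tasks` are hypothetical, the task body is an arbitrary stand-in for a short computation, and a local thread pool is assumed in place of a real cluster scheduler.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def short_task(i):
    """Hypothetical stand-in for one short task: a small arithmetic job."""
    return sum(k * k for k in range(i % 100))


def run_many_tasks(n_tasks, workers=8):
    """Run n_tasks short tasks on a pool and report throughput in tasks/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results[i] belongs to task i.
        results = list(pool.map(short_task, range(n_tasks)))
    elapsed = time.perf_counter() - start
    return results, n_tasks / elapsed


if __name__ == "__main__":
    results, rate = run_many_tasks(10_000)
    print(f"{len(results)} tasks completed at {rate:.0f} tasks/s")
```

The sketch only shows how the metric is computed. In CPython, threads do not speed up CPU-bound work, and the measured rate depends heavily on the machine and on per-task dispatch overhead, which is the kind of cost that MTC-oriented systems aim to keep small.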
There is more to HPC than tightly coupled MPI, and more to HTC than embarrassingly parallel, long-running jobs. Like science itself, applications are becoming increasingly complex, which opens opportunities to apply HPC in new ways. Some applications consist of so many simple tasks that managing them is hard. Applications that operate on or produce large amounts of data need sophisticated data management in order to scale. Some applications involve many tasks, each composed of tightly coupled MPI tasks. Loosely coupled applications often have dependencies among tasks, and typically use files for inter-process communication. Efficient support for these sorts of applications on existing large-scale systems involves substantial technical challenges and could have a significant impact on science.
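The pattern of loosely coupled tasks with dependencies that communicate through files can be sketched in Python as follows. This is an illustration under stated assumptions rather than the mechanism of any particular MTC system: `run_task`, `run_workflow`, the `.out` file naming, and the toy "sum of inputs plus one" computation are all hypothetical, and the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later) stands in for a real workflow engine.

```python
import tempfile
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from pathlib import Path


def run_task(name, deps, workdir):
    """Read each dependency's output file, add 1 to the sum, write own file."""
    total = sum(int((workdir / f"{d}.out").read_text()) for d in deps)
    (workdir / f"{name}.out").write_text(str(total + 1))
    return name


def run_workflow(graph, workdir, workers=4):
    """Run tasks in dependency order; tasks that are ready run concurrently.

    graph maps each task name to the set of task names it depends on.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    pending = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies have all finished.
            for name in sorter.get_ready():
                deps = graph.get(name, ())
                pending.add(pool.submit(run_task, name, deps, workdir))
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                sorter.done(future.result())
    return {name: int((workdir / f"{name}.out").read_text()) for name in graph}


if __name__ == "__main__":
    # "a" and "b" are independent; "c" needs both; "d" needs "c".
    graph = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}
    with tempfile.TemporaryDirectory() as tmp:
        print(run_workflow(graph, Path(tmp)))
```

Tasks never exchange messages directly: each one reads the files written by its predecessors and writes a file of its own, so the file system carries all inter-task communication and the scheduler only needs to track which outputs exist.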
Related areas
Some related areas are multiple program multiple data (MPMD), high throughput computing (HTC), workflows, capacity computing, and embarrassingly parallel computing. Some projects that could support MTC workloads are Condor,[9] MapReduce,[10] Hadoop,[11] BOINC,[12] Cobalt HTC-mode,[13] Falkon,[14] and Swift.[15][16]
References
- IEEE Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS08), 2008, http://datasys.cs.iit.edu/events/MTAGS08/
- ACM Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS09), 2009, http://datasys.cs.iit.edu/events/MTAGS09/
- IEEE Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS10), 2010, http://datasys.cs.iit.edu/events/MTAGS10/
- ACM Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS11), 2011, http://datasys.cs.iit.edu/events/MTAGS11/
- IEEE Transactions on Parallel and Distributed Systems, Special Issue on Many-Task Computing, June 2011, http://datasys.cs.iit.edu/events/TPDS_MTC/
- I. Raicu, I. Foster, Y. Zhao. "Many-Task Computing for Grids and Supercomputers", IEEE Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS08), 2008
- "Many Task Computing: Bridging the performance-throughput gap", International Science Grid This Week (iSGTW), January 28th, 2009, http://www.isgtw.org/?pid=1001602
- M. Livny, J. Basney, R. Raman, T. Tannenbaum. "Mechanisms for High Throughput Computing", SPEEDUP Journal 1(1), 1997
- D. Thain, T. Tannenbaum, M. Livny. "Distributed Computing in Practice: The Condor Experience", Concurrency and Computation: Practice and Experience 17(2-4), pp. 323-356, 2005
- J. Dean, S. Ghemawat. "MapReduce: Simplified data processing on large clusters", OSDI, 2004
- A. Bialecki, M. Cafarella, D. Cutting, O. O'Malley. "Hadoop: A Framework for Running Applications on Large Clusters Built of Commodity Hardware", http://lucene.apache.org/hadoop/, 2005
- D.P. Anderson. "BOINC: A System for Public-Resource Computing and Storage", IEEE/ACM International Workshop on Grid Computing, 2004
- IBM Corporation. "High-Throughput Computing (HTC) Paradigm", IBM System Blue Gene Solution: Blue Gene/P Application Development, IBM RedBooks, 2008
- I. Raicu, Y. Zhao, C. Dumitrescu, I. Foster, M. Wilde. "Falkon: A Fast and Lightweight Task Execution Framework", IEEE/ACM SC, 2007
- Y. Zhao, M. Hategan, B. Clifford, I. Foster, G. Laszewski, I. Raicu, T. Stef-Praun, M. Wilde. "Swift: Fast, Reliable, Loosely Coupled Parallel Computation", IEEE SWF, 2007
- M. Wilde, M. Hategan, J. M. Wozniak, B. Clifford, D. S. Katz, I. Foster. "Swift: A language for distributed parallel scripting", Parallel Computing 37, pp. 633–652, 2011