FORMALIZED OBTAINING OF THE COMMUNICATION OPERATIONS OF GRAINED PARALLEL ALGORITHMS
https://doi.org/10.29235/1561-2430-2018-54-1-50-61
Abstract
Algorithms intended for parallel computers with distributed memory consist of computational macro operations (computation grains) and communication operations that specify the exchange of data arrays between computing nodes. The main difficulty is finding an efficient way to organize this data exchange. To solve the problem, one must first identify the information dependences between macro operations and then generate the communication operations induced by these dependences. Formalizing the communication operations automates and simplifies code generation. Such a formalization is known for the case of homogeneous information dependences, where the dependences between computation grains are represented by vectors of global dependences. Another known approach makes it possible to derive the exchange of data arrays, but it relies on tools for manipulating polyhedra and does not formalize the communication operations themselves. This article presents a method of formalization and a method of including communication operations (receiving and sending data arrays) into the structure of a parallel algorithm with affine dependences. Using functions that define the relations between macro operations, explicit representations of the communication operations are obtained. The work generalizes the formalization of data-sending operations in parallel algorithms whose operations are not divided into macro operations, as well as some aspects of a known method for obtaining communication operations.
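To illustrate the kind of information such a formalization has to capture, the following minimal sketch, written under assumptions of our own (a square two-dimensional iteration domain, a block distribution of iterations over grains, and a single affine dependence given by a matrix A and a vector b), enumerates the pairs of grains that must exchange data. It is not the method proposed in the article; all names and parameters (N, TILE, A, b) are illustrative.

# A minimal illustrative sketch (not the article's formalization): for a
# two-dimensional iteration domain split into square grains (tiles) and an
# affine dependence source(j) = A*j + b, list the pairs of grains between
# which data must be sent.

from itertools import product

N = 8        # iteration domain {0..N-1} x {0..N-1} (assumed)
TILE = 4     # grain (tile) size; one grain per computing node (assumed)

A = [[1, 0], [0, 1]]   # affine dependence: the value used at iteration j
b = [-1, 0]            # was produced at iteration A*j + b, here (i-1, j)

def source(j):
    """Iteration that produces the value consumed at iteration j."""
    return tuple(A[r][0] * j[0] + A[r][1] * j[1] + b[r] for r in range(2))

def grain(j):
    """Grain (computing node) owning iteration j under a block distribution."""
    return (j[0] // TILE, j[1] // TILE)

comm = set()   # pairs (sending grain, receiving grain)
for j in product(range(N), repeat=2):
    s = source(j)
    if all(0 <= s[r] < N for r in range(2)) and grain(s) != grain(j):
        comm.add((grain(s), grain(j)))   # dependence crosses a grain boundary

for sender, receiver in sorted(comm):
    print("send data from grain", sender, "to grain", receiver)

In the setting of the article, the dependences between macro operations are given by affine functions, and the send and receive operations are obtained in explicit form rather than by such enumeration; the sketch only shows which grain pairs a single affine dependence forces to communicate.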
About the Authors
N. A. Likhoded
Belarus
A. A. Tolstsikau
Belarus