Series Foreword |
|
xiii | (2) |
Preface |
|
xv | |
|
|
1 | (18) |
|
1.1 Why a New Infrastructure? |
|
|
1 | (1) |
|
1.2 Natural Description of Linear Algebra Algorithms |
|
|
2 | (1) |
|
1.3 Physically Based Matrix Distribution |
|
|
3 | (5) |
|
1.4 Redistributing and Duplicating Matrices and Vectors |
|
|
8 | (4) |
|
1.4.1 Redistributing vectors, matrix rows, and matrix columns |
|
|
8 | (2) |
|
1.4.2 Spreading vectors, matrix rows, and matrix columns |
|
|
10 | (1) |
|
1.4.3 Reducing vectors, matrix rows, and matrix columns |
|
|
11 | (1) |
|
|
12 | (1) |
|
1.5 Implementation of Basic Matrix-Vector Operations (Preview) |
|
|
12 | (3) |
|
1.5.1 Matrix-vector multiplication |
|
|
12 | (1) |
|
|
13 | (2) |
|
1.6 Implementation of Basic Matrix-Matrix Operations (Preview) |
|
|
15 | (1) |
|
1.6.1 Matrix-matrix multiplication |
|
|
15 | (1) |
|
1.6.2 Attaining better performance |
|
|
16 | (1) |
|
1.7 Basic Linear Algebra Subprograms |
|
|
16 | (1) |
|
1.8 Message-Passing Interface |
|
|
17 | (1) |
|
1.9 Parallel Sparse Linear Algebra |
|
|
17 | (1) |
|
|
17 | (1) |
|
|
18 | (1) |
|
2 Templates and Linear Algebra Objects |
|
|
19 | (24) |
|
|
19 | (1) |
|
2.2 Distribution Templates |
|
|
20 | (7) |
|
|
22 | (1) |
|
2.2.2 Template destruction |
|
|
23 | (2) |
|
2.2.3 Template inquiry routines |
|
|
25 | (2) |
|
2.3 Linear Algebra Objects |
|
|
27 | (12) |
|
2.3.1 Linear algebra object creation |
|
|
27 | (4) |
|
2.3.2 Linear algebra object destruction |
|
|
31 | (1) |
|
2.3.3 Linear algebra object inquiry routines |
|
|
31 | (6) |
|
2.3.4 Extracting and setting local data |
|
|
37 | (1) |
|
2.3.5 Initializing object data |
|
|
38 | (1) |
|
|
39 | (1) |
|
|
39 | (1) |
|
2.6 More Operations and Information |
|
|
39 | (4) |
|
3 Advanced Linear Algebra Object Manipulation |
|
|
43 | (16) |
|
3.1 Creating Views into Objects |
|
|
43 | (3) |
|
3.2 Splitting of Linear Algebra Objects |
|
|
46 | (4) |
|
3.2.1 Splitting into four quadrants |
|
|
46 | (2) |
|
3.2.2 Splitting into two parts |
|
|
48 | (2) |
|
3.3 Shifting of Linear Algebra Objects |
|
|
50 | (1) |
|
3.4 Determining Where to Split |
|
|
51 | (1) |
|
3.5 Creating Objects "Conformal to" Other Objects |
|
|
52 | (3) |
|
3.6 Annotating Object Orientation |
|
|
55 | (1) |
|
|
56 | (1) |
|
3.8 More Operations and Information |
|
|
57 | (2) |
|
4 Application Program Interface |
|
|
59 | (16) |
|
|
59 | (1) |
|
|
60 | (1) |
|
4.3 Opening and Closing an Object |
|
|
61 | (1) |
|
|
61 | (4) |
|
|
65 | (2) |
|
4.6 Completion and Synchronization |
|
|
67 | (1) |
|
|
67 | (6) |
|
4.8 More Operations and Information |
|
|
73 | (2) |
|
5 Data Duplication and Consolidation |
|
|
75 | (14) |
|
|
75 | (3) |
|
5.1.1 Copy involving vectors |
|
|
75 | (1) |
|
5.1.2 Copy involving multivectors |
|
|
75 | (1) |
|
5.1.3 Copy involving matrices |
|
|
76 | (2) |
|
5.1.4 Copy involving multiscalars |
|
|
78 | (1) |
|
|
78 | (1) |
|
|
78 | (2) |
|
5.3 Pipelining Computation and Communication |
|
|
80 | (1) |
|
5.4 A Building Block Approach to Implementing Copy and Reduce |
|
|
81 | (5) |
|
5.4.1 Collective communication operations |
|
|
81 | (1) |
|
5.4.2 Efficient implementation of collective communication |
|
|
82 | (1) |
|
5.4.3 Implementation of the copy |
|
|
83 | (3) |
|
5.4.4 Implementation of the reduce |
|
|
86 | (1) |
|
5.5 More Operations and Information |
|
|
86 | (3) |
|
6 Vector-Vector Operations |
|
|
89 | (16) |
|
|
89 | (3) |
|
6.1.1 Standard FORTRAN call |
|
|
89 | (1) |
|
6.1.2 PLAPACK FORTRAN-C interface |
|
|
89 | (1) |
|
|
90 | (2) |
|
|
92 | (1) |
|
6.2.1 Standard FORTRAN call |
|
|
92 | (1) |
|
6.2.2 PLAPACK FORTRAN-C interface |
|
|
92 | (1) |
|
|
92 | (1) |
|
6.3 Scaling a Vector (Object) |
|
|
93 | (1) |
|
6.3.1 Standard FORTRAN call |
|
|
93 | (1) |
|
6.3.2 PLAPACK FORTRAN-C interface |
|
|
93 | (1) |
|
|
94 | (1) |
|
6.4 Scaled Vector (Object) Addition |
|
|
94 | (2) |
|
6.4.1 Standard FORTRAN call |
|
|
95 | (1) |
|
6.4.2 PLAPACK FORTRAN-C interface |
|
|
95 | (1) |
|
|
95 | (1) |
|
6.5 Inner Product of Vectors |
|
|
96 | (1) |
|
6.5.1 Standard FORTRAN call |
|
|
96 | (1) |
|
6.5.2 PLAPACK FORTRAN-C interface |
|
|
96 | (1) |
|
|
96 | (1) |
|
|
97 | (2) |
|
6.6.1 Standard FORTRAN call |
|
|
97 | (1) |
|
6.6.2 PLAPACK FORTRAN-C interface |
|
|
98 | (1) |
|
|
98 | (1) |
|
6.7 Maximum Absolute Value in Vector |
|
|
99 | (1) |
|
6.7.1 Standard FORTRAN call |
|
|
99 | (1) |
|
6.7.2 PLAPACK FORTRAN-C interface |
|
|
99 | (1) |
|
|
99 | (1) |
|
6.8 Example: Parallelizing Inner Product |
|
|
100 | (1) |
|
6.9 Example: Parallelizing "axpy" for Vector Objects |
|
|
101 | (2) |
|
6.10 More Operations and Information |
|
|
103 | (2) |
|
7 Matrix-Vector Operations |
|
|
105 | (20) |
|
7.1 General Matrix-Vector Multiplication |
|
|
105 | (3) |
|
7.1.1 Standard FORTRAN call |
|
|
105 | (1) |
|
7.1.2 PLAPACK FORTRAN-C interface |
|
|
105 | (1) |
|
|
106 | (2) |
|
7.2 Symmetric Matrix-Vector Multiplication |
|
|
108 | (2) |
|
7.2.1 Standard FORTRAN call |
|
|
108 | (1) |
|
7.2.2 PLAPACK FORTRAN-C interface |
|
|
109 | (1) |
|
|
109 | (1) |
|
7.3 Triangular Matrix-Vector Multiplication |
|
|
110 | (2) |
|
7.3.1 Standard FORTRAN call |
|
|
110 | (1) |
|
7.3.2 PLAPACK FORTRAN-C interface |
|
|
110 | (1) |
|
|
111 | (1) |
|
|
112 | (1) |
|
7.4.1 Standard FORTRAN call |
|
|
112 | (1) |
|
7.4.2 PLAPACK FORTRAN-C interface |
|
|
112 | (1) |
|
|
113 | (1) |
|
|
113 | (2) |
|
7.5.1 Standard FORTRAN call |
|
|
113 | (1) |
|
7.5.2 PLAPACK FORTRAN-C interface |
|
|
114 | (1) |
|
|
114 | (1) |
|
7.6 Symmetric Rank-1 Update |
|
|
115 | (1) |
|
7.6.1 Standard FORTRAN call |
|
|
115 | (1) |
|
7.6.2 PLAPACK FORTRAN-C interface |
|
|
115 | (1) |
|
|
115 | (1) |
|
7.7 Symmetric Rank-2 Update |
|
|
116 | (2) |
|
7.7.1 Standard FORTRAN call |
|
|
116 | (1) |
|
7.7.2 PLAPACK FORTRAN-C interface |
|
|
117 | (1) |
|
|
117 | (1) |
|
7.8 Example: Parallelizing Matrix-Vector Multiplication |
|
|
118 | (3) |
|
7.8.1 Simple implementation |
|
|
118 | (1) |
|
7.8.2 General implementation |
|
|
119 | (2) |
|
7.9 Example: Parallelizing Rank-1 Update |
|
|
121 | (1) |
|
7.9.1 Simple implementation |
|
|
121 | (1) |
|
7.9.2 General implementation |
|
|
121 | (1) |
|
7.10 More Operations and Information |
|
|
122 | (3) |
|
8 Matrix-Matrix Operations |
|
|
125 | (36) |
|
8.1 General Matrix-Matrix Multiplication |
|
|
125 | (2) |
|
8.1.1 Standard FORTRAN call |
|
|
125 | (1) |
|
8.1.2 PLAPACK FORTRAN-C interface |
|
|
126 | (1) |
|
|
126 | (1) |
|
8.2 Symmetric Matrix-Matrix Multiplication |
|
|
127 | (4) |
|
8.2.1 Standard FORTRAN call |
|
|
129 | (1) |
|
8.2.2 PLAPACK FORTRAN-C interface |
|
|
129 | (1) |
|
|
129 | (2) |
|
8.3 Symmetric Rank-k Update |
|
|
131 | (1) |
|
8.3.1 Standard FORTRAN call |
|
|
131 | (1) |
|
8.3.2 PLAPACK FORTRAN-C interface |
|
|
131 | (1) |
|
|
131 | (1) |
|
8.4 Symmetric Rank-2k Update |
|
|
132 | (3) |
|
8.4.1 Standard FORTRAN call |
|
|
133 | (1) |
|
8.4.2 PLAPACK FORTRAN-C interface |
|
|
133 | (1) |
|
|
133 | (2) |
|
8.5 Triangular Matrix-Matrix Multiplication |
|
|
135 | (2) |
|
8.5.1 Standard FORTRAN call |
|
|
135 | (1) |
|
8.5.2 PLAPACK FORTRAN-C interface |
|
|
135 | (1) |
|
|
136 | (1) |
|
8.6 Triangular Solve with Multiple Right-Hand-Sides |
|
|
137 | (2) |
|
8.6.1 Standard FORTRAN call |
|
|
137 | (1) |
|
8.6.2 PLAPACK FORTRAN-C interface |
|
|
137 | (1) |
|
|
138 | (1) |
|
8.7 Example: Parallelizing Matrix-Matrix Multiplication |
|
|
139 | (19) |
|
8.7.1 Forming C = αAB + βC |
|
|
139 | (3) |
|
8.7.2 Forming C = αAB^T + βC |
|
|
142 | (3) |
|
8.7.3 Forming C = αA^TB + βC |
|
|
145 | (2) |
|
8.7.4 Forming C = αA^TB^T + βC |
|
|
147 | (1) |
|
8.7.5 A more general approach |
|
|
147 | (2) |
|
8.7.6 Performance results |
|
|
149 | (9) |
|
8.8 Querying Algorithmic Blocking Size |
|
|
158 | (1) |
|
8.9 More Operations and Information |
|
|
159 | (2) |
|
9 Application of the Infrastructure |
|
|
161 | (12) |
|
9.1 Cholesky Factorization |
|
|
161 | (1) |
|
9.2 Right-Looking Variant |
|
|
161 | (4) |
|
9.2.1 Level-2 BLAS implementation |
|
|
161 | (2) |
|
9.2.2 Level-3 BLAS implementation |
|
|
163 | (2) |
|
|
165 | (7) |
|
9.3.1 Level-2 BLAS implementation |
|
|
165 | (1) |
|
9.3.2 Level-3 BLAS implementation |
|
|
166 | (2) |
|
9.3.3 Towards further performance improvements |
|
|
168 | (4) |
|
9.4 More Operations and Information |
|
|
172 | (1) |
A Summary of PLAPACK Routines and Their Arguments |
|
173 | (8) |
B Summary of BLAS Related Routines |
|
181 | (4) |
Bibliography |
|
185 | (2) |
Index |
|
187 | (4) |
Constants Index |
|
191 | (2) |
Function Index |
|
193 | |