CHANGELOG
5.2 | bf23ef4 | 20190809
------------------------
o Lots of build system updates, including:
- Bring aspects of build system up-to-date, especially relative to
that of BLIS.
- Support out-of-tree builds.
- Abide by GNU conventions for shared library names.
- Abide by GNU conventions for prefix, exec_prefix, libdir, and
include_dir variables and their recognition within the top-level
Makefile.
- Flatten various headers.
- ARG_MAX hack fixes/cleanups.
- Prepend preexisting contents of CFLAGS to CFLAGS.
- Support V=1 in top-level Makefile.
- Unconditionally define _POSIX_C_SOURCE=200809L.
o Added missing I/O-related libf2c files.
o Initialize global scalar constants (e.g. FLA_ZERO) via
static initializers rather than at runtime.
o Integrated netlib LAPACK test suite.
o Disabled lapack2flamec layer's xblas.
o Prepend license notice to most .c and .h files.
o Various bug fixes.
5.1 | bf23ef4 | 20140318
------------------------
o Initial import after moving from subversion to git.
5.0 | r4648 | 20101028
----------------------
o Implemented new operations:
- Reduction of a Hermitian-definite generalized eigenproblem to
standardized form. Sequential and hierarchical interfaces are available
via FLA_Eig_gest() and FLASH_Eig_gest(), respectively.
- Reduction to upper Hessenberg form, via FLA_Hess_UT().
- Reduction to tridiagonal form, via FLA_Tridiag_UT(). (lower triangular
storage only)
- Reduction to bidiagonal form, via FLA_Bidiag_UT(). (upper bidiagonal
case only, i.e., m >= n)
o Changed UT Householder transform functions so that alpha is no longer
constrained to be in the real domain, allowing tau to be complex. This
allows the transform to retain the property of being a reflector in the
complex domain.
o Various improvements to SuperMatrix runtime system, including support for
GPU execution if GPUs are available and the operation consists of sub-tasks
that are supported via CUBLAS.
o Various improvements to control tree implementation, including better
error checking to prevent the user from trying to run with control trees
that execute non-existent variants.
o Fixed some bugs with beta scaling in gemm, hemm, symm, trmm, and trsm when
beta is non-unit.
o Various improvements to the BLIS.
- Implemented the following new operations:
- bli_trmvsx (trmv to different vector)
- bli_trsvsx (trsv to different vector)
- bli_fnorm (Frobenius norm)
- bli_setv (set vector to scalar)
- bli_setm (set matrix to scalar)
- bli_setmr (set triangular matrix to scalar)
- bli_setdiag (set diagonal to scalar)
- Implemented missing row-major support for syr2k and her2k.
- Added various wrapper routines to map calls to Hermitian operations to
symmetric equivalents for real datatypes.
- Rewrote BLIS macros to look and act more like functions.
- Changed semantics of axpymt, axpysmt, copymt, swapmt to match BLAS/LAPACK.
- Fixed a subtle scaling bug in herk and her2k.
- Various bugfixes.
o Many minor interface and implementation improvements.
o Many other bug fixes and cleanups.
o Added configure-time switches to allow the building of static and/or dynamic
libraries. (GNU/Linux only)
o Added the ability to disable Fortran-77-based underscoring and LDFLAGS
queries at configure-time, which allows one to build libflame in an
environment that has absolutely no Fortran compiler.
o Added a configure-time option to enable SuperMatrix visualization, which
outputs DAG information which may be fed into graphviz.
o Added the ability for the user to specify a supplementary set of compiler
FLAGS. These flags get added to those that are automatically determined by
configure.
4.0 | r3815 | 20100213
----------------------
o Implemented blocked algorithm and algorithm-by-blocks for new
up-and-downdating operation, which updates the Cholesky/QR factor used in
solving a linear least-squares system.
o Implemented algorithm-by-blocks for LU with partial pivoting.
o Added advanced scheduling options to SuperMatrix runtime system.
o Implemented new BLAS-like interfaces (BLIS) layer.
o Implemented row-major support in BLIS layer for all operations except her2k
and syr2k.
o Added optimized unblocked routines written in C for the following operations
using the new BLIS layer to avoid LAPACK/Fortran dependence.
- lu_nopiv
- lu_piv
- qrut
- qr2ut
- lqut
- uddateut
- chol (l;u)
- ttmm (l;u)
- trinv (ln;lu;un;uu)
- sylv (nn;nh;hn;hh)
- aphudut (lh)
- aph2ut (lh;rn;rh)
- accumtut (fc;fr)
o Implemented row-major support in all LAPACK-level operations, leveraging
the new row-major support in the BLIS.
o Removed all dependencies on LAPACK.
o Removed all dependencies on Fortran.
o Added _solve() routines for the following operations:
- chol
- lu_piv
- lu_incpiv
- qrut
- lqut
- qrutinc
o Renamed the following operations:
- QR_UT_UD -> QR2_UT
- Apply_Q_UT_UD -> Apply_Q2_UT
- Apply_H_UT -> Apply_H2_UT
o Modified hemv, hemvc, her, herc, her2, her2c, hemm, herk, her2k wrappers so
that their real symmetric brethren are invoked if the arguments are real.
o Reimplemented blocked QR and LQ algorithmic variants in terms of Apply_Q_UT
operation, resulting in greatly simplified implementations.
o Added algorithmic variants for:
- axpyt
- copyt
- gemv
- trsv
o Created new global scalar constants:
- FLA_ONE_HALF
- FLA_MINUS_ONE_HALF
- FLA_MINUS_TWO
o Added routines to recover tau vector from T matrix stored by QR_UT and LQ_UT
functions. This allows interoperability with LAPACK's QR/LQ interfaces and
is used in the new lapack2flame layer.
o Rewrote lapack2flame files in C and adjusted build system to bundle
the object files with libflame rather than put them in a separate library.
o Changed many signed ints that did not need to be signed to unsigned ints,
mostly in the form of a new type, dim_t.
o Added safeguards and error checking into partition/repartition routines.
o Adjusted safeguards and error checking for several other routines.
o Defined and implemented three error checking levels (full, minimal, and none)
where the runtime default can be set via a new configure option.
o Allow the user to enable/disable the memory leak counter at runtime.
o Provide options for disabling/enabling:
- lapack2flame compatibility layer
- use of external LAPACK implementations via external wrappers
- use of external LAPACK implementations for subproblems within blocked
algorithms and algorithms-by-blocks
via new configure options, as well as checks that prevent undefined and
unsupported combinations of these options.
o Added revision number to libflame book cover.
o Many miscellaneous bug fixes.
3.0 | r2991 | 20090501
----------------------
o Added LaTeX source for libflame reference guide.
o Added a build system for use with Microsoft Windows operating systems.
o Improved SuperMatrix abstractions and performance.
o Changed control trees and internal back-ends to support arbitrary
depth hierarchical matrices for many operations.
o Changed build system to use revision number instead of version number
when naming build products.
o Implemented much more robust parameter checking for most routines.
o Added support for new BLAS-like operations:
- axpys
- axpyt
- copyr
- copyt
- dotc
- dots
- dotcs
- dot2s
- dot2cs
- inv_scalc
- scalc
- scalr
- swapt
- gemvc
- gerc
- hemvc
- herc
- her2c
- trmvsx
- trsvsx
- trmmsx
- trsmsx
o Added front-end wrappers for all external wrappers to level-1 and
level-2 BLAS routines.
o Added complex datatype support to qrut and lqut operations.
o Added algorithm-by-blocks implementations for LAPACK-level operations:
- luincpiv: LU factorization with incremental pivoting.
- qrutinc: incremental QR factorization via the UT transform.
- apqutinc: Apply Q^T (or Q^H) obtained via qrutinc to a rhs matrix.
o Added helper routines to aid the user in creating the block Householder
transform matrix T for qrut, lqut, and qrutinc.
o Changed many function names and interfaces to be more clear.
o Added many new query/utility functions.
o Re-implemented many utility functions.
o Removed isolated instances where libflame relied on C99 behavior to allow
compatibility with C89 compilers.
o Many miscellaneous bug fixes.
o Removed reduction to upper Hessenberg form implementation (due to IP
concerns).
o Removed implementations for classic QR and LQ factorizations based on
LAPACK-style Householder transforms.
o Removed plapack2e code.
o Removed experimental (and non-functional) Cell support.
2.0 | r1754 | 20080401
----------------------
o Integrated FLASH and SuperMatrix with FLAME/C variants via control trees.
o Improved SuperMatrix abstractions and performance.
o Added POSIX threads support in SuperMatrix.
o New and expanded interfaces for FLASH along with improved, easier-to-read
implementations.
o Added new and/or extended support for the following operations:
- lu_nopiv: LU factorization without pivoting
- chol: Cholesky factorization
- ttmm: triangular transpose matrix multiply
- trinv: triangular matrix inversion
- spdinv: symmetric/Hermitian positive definite matrix inversion
- sylv: triangular Sylvester equation solver
o Implemented remaining unblocked FLAME/C variants for level-3 BLAS
operations.
o Added blocked in-place transpose implementation.
o New error- and parameter-checking implementations.
o Modified and improved various utility scripts.
o Fixed a prominent 32-bit integer bug, allowing the code to malloc()
regions of memory greater than 2GB (for example, double-precision matrices
larger than 16384-by-16384).
o Many other bugfixes and cleanups.
o Added the option of disabling non-critical FLAME code.
o Added the option of compiling and including into libflame various netlib
implementations of the files that are needed for external wrappers to
LAPACK-level operations. This is mostly useful because many FLAME
implementations invoke wrappers to unblocked codes for the smaller
subproblems. This applies to most LAPACK-level operations, such as
Cholesky, LU, LQ, QR, triangular inversion, and the triangular Sylvester
equation solver.
o Added the option of disabling control trees in level-3 BLAS front-end.
o Added the option of aligning memory to arbitrary base-2 boundaries via
posix_memalign().
o Added the option of aligning each column of a matrix to base-2 boundaries.
o Added the option of interfacing to CBLAS in the external wrappers.
o Removed lots of outdated and unused build system cruft.
o Renamed "parameter checking" option to "internal error checking".
o Updated configure to use gfortran over g77, if present.
o Combined libflame-base.a, libflame-blas.a, and libflame-lapack.a into a
single library archive, libflame.a. (liblapack2flame.a is still a separate
library due to the potential for linker symbol conflicts.)
o Produce build products in completely separate directories, allowing
builds for multiple architectures to be maintained simultaneously with the
same source tree.
o Updated config.guess and config.sub scripts (previously circa 2004).
1.0 | r1307 | 20070401
----------------------
o Additions, improvements, and cleanups to QR, LQ, QR_UT, LQ_UT, Triangular
Inversion, and Hessenberg Reduction implementations.
o More complex support throughout libflame, including within utility routines.
o Improved error checking for some operations.
o Improved leaf-level test drivers, providing easier datatype switching.
o New test drivers for front-end interfaces in src/blas/3 and src/lapack.
o Support for interfacing into internal libgoto blocksize symbols.
o Added non-double datatype lapack2flamec wrappers for Cholesky.
o Added non-double datatype lapack2flamec wrappers for LU with pivoting.
o Added single precision lapack2flamec wrappers for QR and LQ.
o Added full datatype support for ?trtri() lapack2flame wrappers.
o Added several netlib LAPACK files that depend on operations for which we have
lapack2flamec wrappers.
o Experimental SuperMatrix code is now included.
o Further PLAPACK2e code development.
o Improved misc. utility scripts, build tools, and configure script.
o Added a basic doxygen config file capable of mapping out the source tree and
providing output conveniently in HTML format.
o Many bugfixes and cleanups.
0.9 | r897 | 20060901
----------------------
o Initial release.