Team Fortress 2 Source Code as on 22/4/2020

/*
     File:       vBLAS.h

     Contains:   Header for the Basic Linear Algebra Subprograms, with Apple extensions.

     Version:    QuickTime 7.3

     Copyright:  (c) 2007 (c) 2000-2001 by Apple Computer, Inc., all rights reserved.

     Bugs?:      For bug reports, consult the following page on
                 the World Wide Web:

                     http://developer.apple.com/bugreporter/
*/
/* ==========================================================================================================================*/

/*
   =================================================================================================
   Definitions of the Basic Linear Algebra Subprograms (BLAS) as provided by Apple Computer.  At
   present this is a subset of the "legacy" FORTRAN and C interfaces.  Only single precision forms
   are provided, and only the most useful routines.  For example only the general matrix forms are
   provided, not the symmetric, Hermitian, or triangular forms.  A few additional functions, unique
   to Mac OS, have also been provided.  These are clearly documented as Apple extensions.

   Documentation on the BLAS standard, including reference implementations, can be found on the web
   starting from the BLAS FAQ page at these URLs (at least as of August 2000):
        http://www.netlib.org/blas/faq.html
        http://www.netlib.org/blas/blast-forum/blast-forum.html
   =================================================================================================
*/
/*
   =================================================================================================
   Matrix shape and storage
   ========================
   Keeping the various matrix shape and storage parameters straight can be difficult.  The BLAS
   documentation generally makes a distinction between the conceptual "matrix" and the physical
   "array".  However, there are a number of places where this becomes fuzzy because of the overall
   bias towards FORTRAN's column major storage.  The confusion is made worse by style differences
   between the level 2 and level 3 functions.  It is amplified further by the explicit choice of row
   or column major storage in the C interface.

   The storage order does not affect the actual computation that is performed.  That is, it does not
   affect the results other than where they appear in memory.  It does affect the values passed
   for so-called "leading dimension" parameters, such as lda in sgemv.  These are always the major
   stride in storage, allowing operations on rectangular subsets of larger matrices.  For row major
   storage this is the number of columns in the parent matrix, and for column major storage this is
   the number of rows in the parent matrix.

   For the level 2 functions, which deal with only a single matrix, the matrix shape parameters are
   always M and N.  These are the logical shape of the matrix, M rows by N columns.  The transpose
   parameter, such as transA in sgemv, defines whether the regular matrix or its transpose is used
   in the operation.  This affects the implicit length of the input and output vectors.  For example,
   if the regular matrix A is used in sgemv, the input vector X has length N, the number of columns
   of A, and the output vector Y has length M, the number of rows of A.  The length of the input and
   output vectors is not affected by the storage order of the matrix.

   The level 3 functions deal with two input matrices and one output matrix, and the matrix shape
   parameters are M, N, and K.  The logical shape of the output matrix is always M by N, while K is
   the common dimension of the input matrices.  Like level 2, the transpose parameters, such as
   transA and transB in sgemm, define whether the regular input or its transpose is used in the
   operation.  However, unlike level 2, in level 3 the transpose parameters affect the implicit shape
   of the input matrix.  Consider sgemm, which computes "C = (alpha * A * B) + (beta * C)", where A
   and B might be regular or transposed.  The logical shape of C is always M rows by N columns.  The
   physical shape depends on the storage order parameter.  Using column major storage the declaration
   of C (the array) in C (the language) would be something like "float C[N][M]".  The logical shape
   of A without transposition is M by K, and B is K by N.  The one storage order parameter affects
   all three matrices.

   For those readers still wondering about the style differences between level 2 and level 3, they
   involve whether the input or output shapes are explicit.  For level 2, the input matrix shape is
   always M by N.  The input and output vector lengths are implicit and vary according to the
   transpose parameter.  For level 3, the output matrix shape is always M by N.  The input matrix
   shapes are implicit and vary according to the transpose parameters.
   =================================================================================================
*/
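/*
   Illustrative sketch, not part of the original header.  The leading dimension and vector length
   rules described above can be checked with a small reference routine.  The names ex_order,
   ex_offset, and ex_gemv_notrans are hypothetical and are not vecLib functions; the enum values
   mirror CblasRowMajor and CblasColMajor only so that the sketch compiles on its own.  The routine
   handles just the non-transposed case with alpha = 1 and beta = 0.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-ins for CblasRowMajor and CblasColMajor.
enum ex_order { EX_ROW_MAJOR = 101, EX_COL_MAJOR = 102 };

// Offset of logical element (row, col) inside the array, where ld is the
// leading dimension (the major stride of the parent matrix).
static size_t ex_offset(enum ex_order order, int row, int col, int ld)
{
    assert(row >= 0 && col >= 0 && ld > 0);
    return order == EX_ROW_MAJOR ? (size_t)row * (size_t)ld + (size_t)col
                                 : (size_t)col * (size_t)ld + (size_t)row;
}

// Reference y = A * x for the regular (non-transposed) M by N matrix A:
// x is read as N entries and y is written as M entries, whatever the order.
static void ex_gemv_notrans(enum ex_order order, int M, int N,
                            const float *A, int lda,
                            const float *x, float *y)
{
    for (int i = 0; i < M; ++i) {
        float sum = 0.0f;
        for (int j = 0; j < N; ++j)
            sum += A[ex_offset(order, i, j, lda)] * x[j];
        y[i] = sum;
    }
}
```

   Under these assumptions, a 2 by 2 block that starts at row 1, column 1 of a 3 by 4 row major
   parent would be passed as A = parent + 5 with M = 2, N = 2, and lda = 4, the column count of
   the parent rather than of the block.
*/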
/* ==========================================================================================================================*/

#ifndef __VBLAS__
#define __VBLAS__

#ifndef __CONDITIONALMACROS__
#include <ConditionalMacros.h>
#endif

#if PRAGMA_ONCE
#pragma once
#endif

#ifdef __cplusplus
extern "C" {
#endif

#if PRAGMA_IMPORT
#pragma import on
#endif

#if PRAGMA_STRUCT_ALIGN
    #pragma options align=power
#elif PRAGMA_STRUCT_PACKPUSH
    #pragma pack(push, 2)
#elif PRAGMA_STRUCT_PACK
    #pragma pack(2)
#endif

#if PRAGMA_ENUM_ALWAYSINT
    #if defined(__fourbyteints__) && !__fourbyteints__
        #define __VBLAS__RESTORE_TWOBYTEINTS
        #pragma fourbyteints on
    #endif
    #pragma enumsalwaysint on
#elif PRAGMA_ENUM_OPTIONS
    #pragma option enum=int
#elif PRAGMA_ENUM_PACK
    #if __option(pack_enums)
        #define __VBLAS__RESTORE_PACKED_ENUMS
        #pragma options(!pack_enums)
    #endif
#endif

/*
   ==========================================================================================================================
   Types and constants
   ===================
*/

enum CBLAS_ORDER {
  CblasRowMajor                 = 101,
  CblasColMajor                 = 102
};
typedef enum CBLAS_ORDER CBLAS_ORDER;

enum CBLAS_TRANSPOSE {
  CblasNoTrans                  = 111,
  CblasTrans                    = 112,
  CblasConjTrans                = 113
};
typedef enum CBLAS_TRANSPOSE CBLAS_TRANSPOSE;

enum CBLAS_UPLO {
  CblasUpper                    = 121,
  CblasLower                    = 122
};
typedef enum CBLAS_UPLO CBLAS_UPLO;

enum CBLAS_DIAG {
  CblasNonUnit                  = 131,
  CblasUnit                     = 132
};
typedef enum CBLAS_DIAG CBLAS_DIAG;

enum CBLAS_SIDE {
  CblasLeft                     = 141,
  CblasRight                    = 142
};
typedef enum CBLAS_SIDE CBLAS_SIDE;

/*
   ------------------------------------------------------------------------------------------------------------------
   IsAlignedCount   - True if an integer is positive and a multiple of 4.  Negative strides are considered unaligned.
   IsAlignedAddr    - True if an address is a multiple of 16.
*/

#define IsAlignedCount(n)   ( ((n) > 0) && (((n) & 3L) == 0) )
#define IsAlignedAddr(a)    ( ((long)(a) & 15L) == 0 )
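
/*
   Illustrative sketch, not part of the original header.  One way a caller might combine the two
   alignment tests before choosing an aligned vector code path.  All ex_ names are hypothetical.
   The function forms below restate the tests with uintptr_t; that choice is an assumption of
   this sketch, since the macros themselves cast to long.

```c
#include <stdint.h>

// True if n is positive and a multiple of 4 (same rule as IsAlignedCount).
static int ex_is_aligned_count(int n)
{
    return n > 0 && (n & 3) == 0;
}

// True if p is a multiple of 16 (same rule as IsAlignedAddr).
static int ex_is_aligned_addr(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

// Hypothetical dispatch test: both buffers must be 16-byte aligned and the
// element count must be a positive multiple of 4.
static int ex_can_use_aligned_path(const float *x, const float *y, int n)
{
    return ex_is_aligned_count(n)
        && ex_is_aligned_addr(x)
        && ex_is_aligned_addr(y);
}
```

   A caller that gets a false result here would fall back to a general code path.
*/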

/*
   ==========================================================================================================================
   ==========================================================================================================================
   Legacy BLAS Functions
   ==========================================================================================================================
   ==========================================================================================================================
*/

/*
   ==========================================================================================================================
   Level 1 Single Precision Functions
   ==================================
*/

/*
 *  cblas_sdot()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( float )
cblas_sdot(
  int            N,
  const float *  X,
  int            incX,
  const float *  Y,
  int            incY);

/*
 *  cblas_snrm2()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( float )
cblas_snrm2(
  int            N,
  const float *  X,
  int            incX);

/*
 *  cblas_sasum()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( float )
cblas_sasum(
  int            N,
  const float *  X,
  int            incX);

/*
 *  cblas_isamax()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( int )
cblas_isamax(
  int            N,
  const float *  X,
  int            incX);

/*
 *  cblas_sswap()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
cblas_sswap(
  int      N,
  float *  X,
  int      incX,
  float *  Y,
  int      incY);

/*
 *  cblas_scopy()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
cblas_scopy(
  int            N,
  const float *  X,
  int            incX,
  float *        Y,
  int            incY);

/*
 *  cblas_saxpy()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
cblas_saxpy(
  int            N,
  float          alpha,
  const float *  X,
  int            incX,
  float *        Y,
  int            incY);

/*
 *  cblas_srot()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
cblas_srot(
  int      N,
  float *  X,
  int      incX,
  float *  Y,
  int      incY,
  float    c,
  float    s);

/*
 *  cblas_sscal()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
cblas_sscal(
  int      N,
  float    alpha,
  float *  X,
  int      incX);

/*
   ==========================================================================================================================
   Level 1 Double Precision Functions
   ==================================
*/

/* *** TBD ****/

/*
   ==========================================================================================================================
   Level 1 Complex Single Precision Functions
   ==========================================
*/

/* *** TBD ****/

/*
   ==========================================================================================================================
   Level 2 Single Precision Functions
   ==================================
*/

/*
 *  cblas_sgemv()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
cblas_sgemv(
  CBLAS_ORDER       order,
  CBLAS_TRANSPOSE   transA,
  int               M,
  int               N,
  float             alpha,
  const float *     A,
  int               lda,
  const float *     X,
  int               incX,
  float             beta,
  float *           Y,
  int               incY);

/*
   ==========================================================================================================================
   Level 2 Double Precision Functions
   ==================================
*/

/* *** TBD ****/

/*
   ==========================================================================================================================
   Level 2 Complex Single Precision Functions
   ==========================================
*/

/* *** TBD ****/

/*
   ==========================================================================================================================
   Level 3 Single Precision Functions
   ==================================
*/

/*
 *  cblas_sgemm()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
cblas_sgemm(
  CBLAS_ORDER       order,
  CBLAS_TRANSPOSE   transA,
  CBLAS_TRANSPOSE   transB,
  int               M,
  int               N,
  int               K,
  float             alpha,
  const float *     A,
  int               lda,
  const float *     B,
  int               ldb,
  float             beta,
  float *           C,
  int               ldc);
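
/*
   Illustrative sketch, not part of the original header.  An unoptimized reference for the
   operation that the cblas_sgemm parameters describe, C = alpha * op(A) * op(B) + beta * C,
   showing how M, N, K, the transpose flags, and the leading dimensions fit together.  All ex_
   names are hypothetical, and the enum values only mirror CBLAS_ORDER and CBLAS_TRANSPOSE so
   the sketch compiles on its own.  It always reads C, which a real implementation may avoid
   when beta is zero.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-ins for the CBLAS_ORDER and CBLAS_TRANSPOSE values.
enum ex_order { EX_ROW_MAJOR = 101, EX_COL_MAJOR = 102 };
enum ex_trans { EX_NO_TRANS = 111, EX_TRANS = 112 };

// Offset of logical element (row, col) given the leading dimension ld.
static size_t ex_offset(enum ex_order order, int row, int col, int ld)
{
    return order == EX_ROW_MAJOR ? (size_t)row * (size_t)ld + (size_t)col
                                 : (size_t)col * (size_t)ld + (size_t)row;
}

// Element (i, j) of op(X): X itself, or its transpose when trans is EX_TRANS.
static float ex_op_at(enum ex_order order, enum ex_trans trans,
                      const float *X, int ld, int i, int j)
{
    return trans == EX_TRANS ? X[ex_offset(order, j, i, ld)]
                             : X[ex_offset(order, i, j, ld)];
}

// op(A) is M by K, op(B) is K by N, and C is always M by N.
static void ex_sgemm_ref(enum ex_order order,
                         enum ex_trans transA, enum ex_trans transB,
                         int M, int N, int K, float alpha,
                         const float *A, int lda,
                         const float *B, int ldb,
                         float beta, float *C, int ldc)
{
    assert(M >= 0 && N >= 0 && K >= 0);
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < K; ++k)
                sum += ex_op_at(order, transA, A, lda, i, k)
                     * ex_op_at(order, transB, B, ldb, k, j);
            float *c = &C[ex_offset(order, i, j, ldc)];
            *c = alpha * sum + beta * (*c);
        }
    }
}
```

   With row major storage, a 2 by 3 matrix A passed without transposition uses lda = 3, while
   the same logical matrix supplied as its 3 by 2 transpose with the transpose flag set uses
   lda = 2.  M, N, and K are the same in both calls.
*/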

/*
   ==========================================================================================================================
   Level 3 Double Precision Functions
   ==================================
*/

/* *** TBD ****/

/*
   ==========================================================================================================================
   Level 3 Complex Single Precision Functions
   ==========================================
*/

/* *** TBD ****/

/*
   ==========================================================================================================================
   ==========================================================================================================================
   Latest Standard BLAS Functions
   ==========================================================================================================================
   ==========================================================================================================================
*/

/* *** TBD ****/

/*
   ==========================================================================================================================
   ==========================================================================================================================
   Additional Functions from Apple
   ==========================================================================================================================
   ==========================================================================================================================
*/

/*
   -------------------------------------------------------------------------------------------------
   These routines provide optimized, AltiVec-only support for common small matrix multiplications.
   They do not check for the availability of AltiVec instructions or parameter errors.  They just do
   the multiplication as fast as possible.  Matrices are presumed to use row major storage.  Because
   these are all square, column major matrices can be multiplied by simply reversing the parameters.
*/
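
/*
   Illustrative sketch, not part of the original header.  The remark about reversing the
   parameters follows from (A * B)^T = B^T * A^T: a square column major array read as row major
   holds the transpose, so a row major multiply with its two inputs swapped leaves the column
   major product in the output.  The scalar routines below are hypothetical stand-ins for the
   AltiVec functions and take "reversing" to mean swapping the two inputs.

```c
// Plain row major product C = A * B for n by n matrices (reference only).
static void ex_matmul_row_major(int n, const float *A, const float *B, float *C)
{
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

// Column major product C = A * B, where A, B, and C are all column major:
// reuse the row major routine with the inputs in the opposite order.
static void ex_matmul_col_major(int n, const float *A, const float *B, float *C)
{
    ex_matmul_row_major(n, B, A, C);
}
```

   The same reasoning suggests that a column major matrix times vector product corresponds to
   the row major vector times matrix form.
*/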

#ifdef __VEC__
typedef vector float                    ConstVectorFloat;

/*
 *  vMultVecMat_4x4()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
vMultVecMat_4x4(
  ConstVectorFloat   X[1],
  ConstVectorFloat   A[4][1],
  vector float       Y[1]);

/*
 *  vMultMatVec_4x4()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
vMultMatVec_4x4(
  ConstVectorFloat   A[4][1],
  ConstVectorFloat   X[1],
  vector float       Y[1]);

/*
 *  vMultMatMat_4x4()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
vMultMatMat_4x4(
  ConstVectorFloat   A[4][1],
  ConstVectorFloat   B[4][1],
  vector float       C[4][1]);

/*
 *  vMultVecMat_8x8()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
vMultVecMat_8x8(
  ConstVectorFloat   X[2],
  ConstVectorFloat   A[8][2],
  vector float       Y[2]);

/*
 *  vMultMatVec_8x8()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
vMultMatVec_8x8(
  ConstVectorFloat   A[8][2],
  ConstVectorFloat   X[2],
  vector float       Y[2]);

/*
 *  vMultMatMat_8x8()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
vMultMatMat_8x8(
  ConstVectorFloat   A[8][2],
  ConstVectorFloat   B[8][2],
  vector float       C[8][2]);

/*
 *  vMultVecMat_16x16()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
vMultVecMat_16x16(
  ConstVectorFloat   X[4],
  ConstVectorFloat   A[16][4],
  vector float       Y[4]);

/*
 *  vMultMatVec_16x16()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
vMultMatVec_16x16(
  ConstVectorFloat   A[16][4],
  ConstVectorFloat   X[4],
  vector float       Y[4]);

/*
 *  vMultMatMat_16x16()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
vMultMatMat_16x16(
  ConstVectorFloat   A[16][4],
  ConstVectorFloat   B[16][4],
  vector float       C[16][4]);

/*
 *  vMultVecMat_32x32()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
vMultVecMat_32x32(
  ConstVectorFloat   X[8],
  ConstVectorFloat   A[32][8],
  vector float       Y[8]);

/*
 *  vMultMatVec_32x32()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
vMultMatVec_32x32(
  ConstVectorFloat   A[32][8],
  ConstVectorFloat   X[8],
  vector float       Y[8]);

/*
 *  vMultMatMat_32x32()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
vMultMatMat_32x32(
  ConstVectorFloat   A[32][8],
  ConstVectorFloat   B[32][8],
  vector float       C[32][8]);

#endif  /* defined(__VEC__) */

/*
   ==========================================================================================================================
   Error handling
   ==============
*/

/*
   -------------------------------------------------------------------------------------------------
   The BLAS standard requires that parameter errors be reported and cause the program to terminate.
   The default behavior for the Mac OS implementation of the BLAS is to print a message in English
   to stdout using printf and call exit with EXIT_FAILURE as the status.  If this is adequate, then
   you need do nothing more and need not worry about error handling.

   The BLAS standard also mentions a function, cblas_xerbla, suggesting that a program provide its
   own implementation to override the default error handling.  This will not work in the shared
   library environment of Mac OS 9.  Instead, the Mac OS implementation provides a means to install
   an error handler.  There can only be one active error handler; installing a new one causes any
   previous handler to be forgotten.  Passing a null function pointer installs the default handler.
   The default handler is automatically installed at startup and implements the default behavior
   defined above.

   An error handler may return; it need not abort the program.  If the error handler returns, the
   BLAS routine also returns immediately without performing any processing.  Level 1 functions that
   return a numeric value return zero if the error handler returns.
*/

typedef CALLBACK_API_C( void , BLASParamErrorProc )(const char *funcName, const char *paramName, const int *paramPos, const int *paramValue);

/*
 *  SetBLASParamErrorProc()
 *
 *  Availability:
 *    Non-Carbon CFM:   in vecLib 1.0.2 and later
 *    CarbonLib:        not in Carbon, but vecLib is compatible with CarbonLib
 *    Mac OS X:         in version 10.0 and later
 */
EXTERN_API_C( void )
SetBLASParamErrorProc(BLASParamErrorProc ErrorProc);
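
/*
   Illustrative sketch, not part of the original header.  A handler with the BLASParamErrorProc
   parameter list that records the error and returns instead of exiting.  The ex_ names are
   hypothetical.  The header does not say whether the pointer arguments can be null, so the
   sketch guards against that as an assumption.

```c
#include <stdio.h>

static int  ex_error_count = 0;
static char ex_last_error[160];

// Records the most recent parameter error.  Because it returns, the BLAS
// routine that reported the error returns without doing any processing.
static void ex_record_error(const char *funcName, const char *paramName,
                            const int *paramPos, const int *paramValue)
{
    ++ex_error_count;
    snprintf(ex_last_error, sizeof ex_last_error,
             "%s: bad parameter %d (%s), value %d",
             funcName ? funcName : "?",
             paramPos ? *paramPos : -1,
             paramName ? paramName : "?",
             paramValue ? *paramValue : 0);
}
```

   In a program that includes this header, the handler would be installed with
   SetBLASParamErrorProc(ex_record_error), and passing a null pointer later would restore the
   default handler.  The caller can then inspect ex_error_count after each BLAS call.
*/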

/* ==========================================================================================================================*/

#if PRAGMA_ENUM_ALWAYSINT
    #pragma enumsalwaysint reset
    #ifdef __VBLAS__RESTORE_TWOBYTEINTS
        #pragma fourbyteints off
    #endif
#elif PRAGMA_ENUM_OPTIONS
    #pragma option enum=reset
#elif defined(__VBLAS__RESTORE_PACKED_ENUMS)
    #pragma options(pack_enums)
#endif

#if PRAGMA_STRUCT_ALIGN
    #pragma options align=reset
#elif PRAGMA_STRUCT_PACKPUSH
    #pragma pack(pop)
#elif PRAGMA_STRUCT_PACK
    #pragma pack()
#endif

#ifdef PRAGMA_IMPORT_OFF
#pragma import off
#elif PRAGMA_IMPORT
#pragma import reset
#endif

#ifdef __cplusplus
}
#endif

#endif /* __VBLAS__ */