Leaked source code of windows server 2003
  1. ;/* *************************************************************************
  2. ;** INTEL Corporation Proprietary Information
  3. ;**
  4. ;** This listing is supplied under the terms of a license
  5. ;** agreement with INTEL Corporation and may not be copied
  6. ;** nor disclosed except in accordance with the terms of
  7. ;** that agreement.
  8. ;**
  9. ;** Copyright (c) 1995 Intel Corporation.
  10. ;** All Rights Reserved.
  11. ;**
  12. ;** *************************************************************************
  13. ;*/
  14. ;////////////////////////////////////////////////////////////////////////////
  15. ;//
  16. ;// $Header: R:\h26x\h26x\src\enc\ex5me.asv 1.17 24 Sep 1996 11:27:00 BNICKERS $
  17. ;//
  18. ;// $Log: R:\h26x\h26x\src\enc\ex5me.asv $
  19. ;//
  20. ;// Rev 1.17 24 Sep 1996 11:27:00 BNICKERS
  21. ;//
  22. ;// Fix register collision.
  23. ;//
  24. ;// Rev 1.16 24 Sep 1996 10:40:32 BNICKERS
  25. ;// For H261, zero out motion vectors when classifying MB as Intra.
  26. ;//
  27. ;// Rev 1.13 19 Aug 1996 13:48:26 BNICKERS
  28. ;// Provide threshold and differential variables for spatial filtering.
  29. ;//
  30. ;// Rev 1.12 17 Jun 1996 15:19:34 BNICKERS
  31. ;// Fix recording of block and MB SWDs for Spatial Loop Filtering case in H261.
  32. ;//
  33. ;// Rev 1.11 30 May 1996 16:40:14 BNICKERS
  34. ;// Fix order of arguments.
  35. ;//
  36. ;// Rev 1.10 30 May 1996 15:08:36 BNICKERS
  37. ;// Fixed minor error in recent IA ME speed improvements.
  38. ;//
  39. ;// Rev 1.9 29 May 1996 15:37:58 BNICKERS
  40. ;// Acceleration of IA version of ME.
  41. ;//
  42. ;// Rev 1.8 15 Apr 1996 10:48:48 AKASAI
  43. ;// Fixed bug in Spatial loop filter code. Code had been unrolled and
  44. ;// the second case had not been updated when the fix was put in place
  45. ;// for the first case. Basically an ebx was used instead of bl, which
  46. ;// caused an overflow from 7F to 3F.
  47. ;//
  48. ;// Rev 1.7 15 Feb 1996 15:39:26 BNICKERS
  49. ;// No change.
  50. ;//
  51. ;// Rev 1.6 15 Feb 1996 14:39:00 BNICKERS
  52. ;// Fix bug wherein access to area outside stack frame was occurring.
  53. ;//
  54. ;// Rev 1.5 15 Jan 1996 14:31:40 BNICKERS
  55. ;// Fix decrement of ref area addr when half pel upward is best in block ME.
  56. ;// Broadcast macroblock level MV when block gets classified as Intra.
  57. ;//
  58. ;// Rev 1.4 12 Jan 1996 13:16:08 BNICKERS
  59. ;// Fix SLF so that three 7F pels don't overflow and result in 3F instead of 7F.
  60. ;//
  61. ;// Rev 1.3 27 Dec 1995 15:32:46 RMCKENZX
  62. ;// Added copyright notice
  63. ;//
  64. ;// Rev 1.2 19 Dec 1995 17:11:16 RMCKENZX
  65. ;// fixed 2 bugs:
  66. ;// 1. do +-15 pel search if central and NOT 4 mv / macroblock
  67. ;// (was doing when central AND 4 mv / macroblock)
  68. ;// 2. correctly compute motion vectors when doing 4 motion
  69. ;// vectors per block.
  70. ;//
  71. ;// Rev 1.1 28 Nov 1995 15:25:48 AKASAI
  72. ;// Added white space so that the file will compile with the long lines.
  73. ;//
  74. ;// Rev 1.0 28 Nov 1995 14:37:00 BECHOLS
  75. ;// Initial revision.
  76. ;//
  77. ;//
  78. ;// Rev 1.13 22 Nov 1995 15:32:42 DBRUCKS
  79. ;// Brian made this change on my system.
  80. ;// Increased a value to simplify debugging
  81. ;//
  82. ;//
  83. ;//
  84. ;// Rev 1.12 17 Nov 1995 10:43:58 BNICKERS
  85. ;// Fix problems with B-Frame ME.
  86. ;//
  87. ;//
  88. ;//
  89. ;// Rev 1.11 31 Oct 1995 11:44:26 BNICKERS
  90. ;// Save/restore ebx.
  91. ;//
  92. ;////////////////////////////////////////////////////////////////////////////
  93. ;
  94. ; MotionEstimation -- This function performs motion estimation for the macroblocks identified
  95. ; in the input list.
  96. ; Conditional assembly selects either the H263 or H261 version.
  97. ;
  98. ; Input Arguments:
  99. ;
  100. ; MBlockActionStream
  101. ;
  102. ; The list of macroblocks for which we need to perform motion estimation.
  103. ;
  104. ; Upon input, the following fields must be defined:
  105. ;
  106. ; CodedBlocks -- Bit 6 must be set for the last macroblock to be processed.
  107. ;
  108. ; FirstMEState -- must be 0 for macroblocks that are forced to be Intracoded. An
  109. ; IntraSWD will be calculated.
  110. ; Other macroblocks must have the following values:
  111. ; 1: upper left, without advanced prediction. (Advanced prediction
  112. ; only applies to H263.)
  113. ; 2: upper edge, without advanced prediction.
  114. ; 3: upper right, without advanced prediction.
  115. ; 4: left edge, without advanced prediction.
  116. ; 5: central block, or any block if advanced prediction is being done.
  117. ; 6: right edge, without advanced prediction.
  118. ; 7: lower left, without advanced prediction.
  119. ; 8: lower edge, without advanced prediction.
  120. ; 9: lower right, without advanced prediction.
  121. ; If vertical motion is NOT allowed:
  122. ; 10: left edge, without advanced prediction.
  123. ; 11: central block, or any block if advanced prediction is being done.
  124. ; 12: right edge, without advanced prediction.
  125. ; *** Note that with advanced prediction, only initial states 0, 5, or
  126. ; 11 can be specified. Doing block level motion vectors mandates
  127. ; advanced prediction, but in that case, only initial
  128. ; states 0 and 5 are allowed.
  129. ;
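; A minimal sketch (Python, not part of this module) of how a caller might derive
; FirstMEState from a macroblock's position when advanced prediction is off. The
; function name and all of its parameters are hypothetical; only the state numbers
; come from the list above.

```python
def first_me_state(mb_x, mb_y, mbs_wide, mbs_high,
                   forced_intra=False, vertical_motion_allowed=True):
    """Return the FirstMEState value for the macroblock at (mb_x, mb_y).

    mbs_wide / mbs_high are the picture dimensions in macroblocks
    (hypothetical parameters, assumed to be at least 2 each).
    """
    if forced_intra:
        return 0                      # Intra by decree: only an IntraSWD is computed.

    # Column class: 0 = left edge, 1 = interior, 2 = right edge.
    col = 0 if mb_x == 0 else (2 if mb_x == mbs_wide - 1 else 1)

    if not vertical_motion_allowed:
        return 10 + col               # States 10, 11, 12.

    # Row class: 0 = upper edge, 1 = interior, 2 = lower edge.
    row = 0 if mb_y == 0 else (2 if mb_y == mbs_high - 1 else 1)
    return 1 + 3 * row + col          # States 1 through 9.
```

; For a CIF picture (22 by 18 macroblocks), first_me_state(0, 0, 22, 18) selects
; state 1 (upper left) and first_me_state(5, 5, 22, 18) selects state 5 (central).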
  130. ; BlkOffset -- must be defined for each of the blocks in the macroblocks.
  131. ;
  132. ; TargetFrameBaseAddress -- Address of upper left viewable pel in the target Y plane.
  133. ;
  134. ; PreviousFrameBaseAddress -- Address of upper left viewable pel in the previous Y plane. Whether this is the
  135. ; reconstructed previous frame, or the original, is up to the caller to decide.
  136. ;
  137. ; FilteredFrameBaseAddress -- Address of upper left viewable pel in the scratch area in which this function can record
  138. ; the spatially filtered prediction for each block, so that frame differencing can
  139. ; utilize it rather than have to recompute it. (H261 only)
  140. ;
  141. ; DoRadius15Search -- TRUE if central macroblocks should search a distance of 15 from center. Else searches 7 out.
  142. ;
  143. ; DoHalfPelEstimation -- TRUE if we should do ME to half pel resolution. This is only applicable for H263 and must
  144. ; be FALSE for H261. (Note: TRUE must be 1; FALSE must be 0).
  145. ;
  146. ; DoBlockLevelVectors -- TRUE if we should do ME at block level. This is only applicable for H263 and must be FALSE
  147. ; for H261. (Note: TRUE must be 1; FALSE must be 0).
  148. ; DoSpatialFiltering -- TRUE if we should determine if spatially filtering the prediction reduces the SWD. Only
  149. ; applicable for H261 and must be FALSE for H263. (Note: TRUE must be 1; FALSE must be 0).
  150. ;
  151. ; ZeroVectorThreshold -- If the SWD for a macroblock is less than this threshold, we do not bother searching for a
  152. ; better motion vector. Compute as follows, where D is the average tolerable pel difference
  153. ; to satisfy this threshold. (Initial recommendation: D=2 ==> ZVT=384)
  154. ; ZVT = (128 * ((int)((D**1.6)+.5)))
  155. ;
  156. ; NonZeroDifferential -- After searching for the best motion vector (or individual block motion vectors, if enabled),
  157. ; if the macroblock's SWD is not better than it was for the zero vector -- not better by at
  158. ; least this amount -- then we revert to the zero vector. We are comparing two macroblock
  159. ; SWDs, both calculated as follows: (Initial recommendation: NZD=128)
  160. ; For each of 128 match points, where D is its Abs Diff, accumulate ((int)((D**1.6)+.5))
  161. ;
  162. ; BlockMVDifferential -- The amount by which the sum of four block level SWDs must be better than a single macroblock
  163. ; level SWD to cause us to choose block level motion vectors. See NonZeroDifferential for
  164. ; how the SWDs are calculated. Only applicable for H263. (Initial recommendation: BMVD=128)
  165. ;
  166. ; EmptyThreshold -- If the SWD for a block is less than this, the block is forced empty. Compute as follows, where D
  167. ; is the average tolerable pel diff to satisfy threshold. (Initial recommendation: D=3 ==> ET=96)
  168. ; ET = (32 * ((int)((D**1.6)+.5)))
  169. ;
  170. ; InterCodingThreshold -- If any of the blocks are forced empty, we can simply skip calculating the INTRASWD for the
  171. ; macroblock. If none of the blocks are forced empty, we will compare the macroblock's SWD
  172. ; against this threshold. If below the threshold, we will likewise skip calculating the
  173. ; INTRASWD. Otherwise, we will calculate the INTRASWD, and if it is less than the [Inter]SWD,
  174. ; we will classify the block as INTRA-coded. Compute as follows, where D is the average
  175. ; tolerable pel difference to satisfy threshold. (Initial recommendation: D=4 ==> ICT=1152)
  176. ; ICT = (128 * ((int)((D**1.6)+.5)))
  177. ;
  178. ; IntraCodingDifferential -- For INTRA coding to occur, the INTRASWD must be better than the INTERSWD by at least
  179. ; this amount.
  180. ;
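; The threshold formulas above can be evaluated with a few lines of Python. This is
; only an illustrative sketch: the function names are hypothetical, and the formulas
; are taken literally from the ZVT, ET and ICT descriptions.

```python
def swd_weight(d):
    """Weighted difference for an absolute pel difference d: (int)((d**1.6)+.5)."""
    return int(d ** 1.6 + 0.5)

def zero_vector_threshold(d):
    """ZVT = 128 * weight(D): 128 match points per macroblock."""
    return 128 * swd_weight(d)

def empty_threshold(d):
    """ET = 32 * weight(D): 32 match points per block."""
    return 32 * swd_weight(d)

def inter_coding_threshold(d):
    """ICT = 128 * weight(D)."""
    return 128 * swd_weight(d)
```

; zero_vector_threshold(2) gives 384 and inter_coding_threshold(4) gives 1152, matching
; the recommendations above. Evaluated literally, the ET formula gives 96 at D=2 and
; 192 at D=3.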
  181. ; Output Arguments
  182. ;
  183. ; MBlockActionStream
  184. ;
  185. ; These fields are defined as follows upon return:
  186. ;
  187. ; BlockType -- Set to INTRA, INTER1MV, or (H263 only) INTER4MV.
  188. ;
  189. ; PHMV and PVMV -- The horizontal and vertical motion vectors, in units of a half pel.
  190. ;
  191. ; BHMV and BVMV -- These fields get clobbered.
  192. ;
  193. ; PastRef -- If BlockType != INTRA, set to the address of the reference block.
  194. ;
  195. ; If Horizontal MV indicates a half pel position, the prediction for the upper left pel of the block
  196. ; is the average of the pel at PastRef and the one at PastRef+1.
  197. ;
  198. ; If Vertical MV indicates a half pel position, the prediction for the upper left pel of the block
  199. ; is the average of the pel at PastRef and the one at PastRef+PITCH.
  200. ;
  201. ; If both MVs indicate half pel positions, the prediction for the upper left pel of the block is the
  202. ; average of the pels at PastRef, PastRef+1, PastRef+PITCH, and PastRef+PITCH+1.
  203. ;
  204. ; Indications of a half pel position can only happen for H263.
  205. ;
  206. ; In H261, when spatial filtering is done, the address will be in the SpatiallyFilteredFrame, where
  207. ; this function stashes the spatially filtered prediction for subsequent reuse by frame differencing.
  208. ;
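; A sketch of how a consumer of PastRef might form the prediction for one pel, based
; on the half pel rules above. The function name and the flat-buffer representation
; are hypothetical, and the rounding (+1 before halving, +2 before quartering) is an
; assumption borrowed from the usual H.263 bilinear interpolation; the text above
; only says "average".

```python
PITCH = 384  # Bytes between vertically adjacent pels, as assumed throughout this file.

def predict_pel(frame, past_ref, hmv, vmv):
    """Predict the pel at offset past_ref in a flat frame buffer.

    hmv and vmv are motion vector components in half pel units; an odd
    value indicates a half pel position.
    """
    half_h = hmv & 1
    half_v = vmv & 1
    a = frame[past_ref]
    if half_h and half_v:
        return (a + frame[past_ref + 1] + frame[past_ref + PITCH]
                + frame[past_ref + PITCH + 1] + 2) >> 2
    if half_h:
        return (a + frame[past_ref + 1] + 1) >> 1
    if half_v:
        return (a + frame[past_ref + PITCH] + 1) >> 1
    return a
```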
  209. ; CodedBlocks -- Bits 4 and 5 are turned on, indicating that the U and V blocks should be processed. (If the
  210. ; FDCT function finds them to quantize to empty, it will mark them as empty.)
  211. ;
  212. ; Bits 0 thru 3 are cleared for each of blocks 1 thru 4 that MotionEstimation forces empty;
  213. ; they are set otherwise.
  214. ;
  215. ; Bits 6 and 7 are left unchanged.
  216. ;
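; The CodedBlocks update described above amounts to a few bit operations. The sketch
; below is illustrative only; the constant and function names are hypothetical.

```python
LUMA_BITS = 0x0F      # Bits 0 thru 3: luma blocks 1 thru 4.
CHROMA_BITS = 0x30    # Bits 4 and 5: U and V blocks.
PRESERVED_BITS = 0xC0 # Bits 6 and 7: left unchanged.

def update_coded_blocks(coded_blocks, forced_empty):
    """Return the CodedBlocks value after motion estimation.

    forced_empty is a sequence of four booleans, one per luma block,
    True where the block was forced empty.
    """
    result = coded_blocks & PRESERVED_BITS
    result |= CHROMA_BITS                 # U and V are always marked for processing.
    for block, empty in enumerate(forced_empty):
        if not empty:
            result |= 1 << block          # Set the bit for each non-empty luma block.
    return result
```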
  217. ; SWD -- Set to the sum of the SWDs for the four luma blocks in the macroblock. The SWD for any block that is
  218. ; forced empty, is NOT included in the sum.
  219. ;
  220. ;
  221. ;
  222. ; IntraSWDTotal -- The sum of the block SWDs for all Intracoded macroblocks.
  223. ;
  224. ; IntraSWDBlocks -- The number of blocks that make up the IntraSWDTotal.
  225. ;
  226. ; InterSWDTotal -- The sum of the block SWDs for all Intercoded macroblocks.
  227. ; None of the blocks forced empty are included in this.
  228. ;
  229. ; InterSWDBlocks -- The number of blocks that make up the InterSWDTotal.
  230. ;
  231. ;
  232. ; Other assumptions:
  233. ;
  234. ; For performance reasons, it is assumed that the layout of current and previous frames (and spatially filtered
  235. ; frame for H261) rigorously conforms to the following guide.
  236. ;
  237. ; The spatially filtered frame (only present and applicable for H261) is an output frame into which MotionEstimation
  238. ; places spatially filtered macroblocks as it determines if filtering is good for a macroblock. If it determines
  239. ; such, frame differencing will be able to re-use the spatially filtered macroblock, rather than recomputing it.
  240. ;
  241. ; Cache
  242. ; Alignment
  243. ; Points: v v v v v v v v v v v v v
  244. ; 16 | 352 (narrower pictures are left justified) | 16
  245. ; +---+---------------------------------------------------------------------------------------+---+
  246. ; | D | Current Frame Y Plane | D |
  247. ; | u | | u |
  248. ; Frame | m | | m |
  249. ; Height | m | | m |
  250. ; Lines | y | | y |
  251. ; | | | |
  252. ; +---+---------------------------------------------------------------------------------------+---+
  253. ; | |
  254. ; | |
  255. ; | |
  256. ; 24 lines | Dummy Space (24 lines plus 8 bytes. Can be reduced to 8 bytes if unrestricted motion |
  257. ; | vectors is NOT selected.) |
  258. ; | |
  259. ; | 8 176 16 176 |8
  260. ; | +-+-------------------------------------------------------------------------------------------+-+
  261. ; +-+D| Current Frame U Plane | D | Current Frame V Plane |D|
  262. ; Frame |u| | u | |u|
  263. ; Height |m| | m | |m|
  264. ; Div By 2 |m| | m | |m|
  265. ; Lines |y| | y | |y|
  266. ; +-+-------------------------------------------+---+-------------------------------------------+-+
  267. ; 72 dummy bytes. I.e. enough dummy space to assure that MOD ((Previous_Frame - Current_Frame), 128) == 80
  268. ; +-----------------------------------------------------------------------------------------------+
  269. ; | |
  270. ; 16 lines | If Unrestricted Motion Vectors selected, 16 lines must appear above and below previous frame, |
  271. ; | and these lines plus the 16 columns to the left and 16 columns to the right of the previous |
  272. ; | frame must be initialized to the values at the edges and corners, propagated outward. If |
  273. ; | Unrestricted Motion Vectors is off, these lines don't have to be allocated. |
  274. ; | |
  275. ; | +---------------------------------------------------------------------------------------+ +
  276. ; Frame | | Previous Frame Y Plane | |
  277. ; Height | | | |
  278. ; Lines | | | |
  279. ; | | | |
  280. ; | | | |
  281. ; | +---------------------------------------------------------------------------------------+ +
  282. ; | |
  283. ; 16 lines | See comment above Previous Y Plane |
  284. ; | |
  285. ; |+--- 8 bytes of dummy space. Must be there, whether unrestricted MV or not. |
  286. ; || |
  287. ; |v+-----------------------------------------------+---------------------------------------------+-+
  288. ; +-+ | |
  289. ; | See comment above Previous Y Plane. | See comment above Previous Y Plane. |
  290. ; 8 lines | Same idea here, but 8 lines are needed above | Same idea here, but 8 lines are needed above |
  291. ; | and below U plane, and 8 columns on each side.| and below V plane, and 8 columns on each side.|
  292. ; | | |
  293. ; |8 176 8|8 176 8|
  294. ; | +-------------------------------------------+ | +-------------------------------------------+ |
  295. ; | | Previous Frame U Plane | | | Previous Frame V Plane | |
  296. ; Frame | | | | | | |
  297. ; Height | | | | | | |
  298. ; Div By 2 | | | | | | |
  299. ; Lines | | | | | | |
  300. ; | +-------------------------------------------+ | +-------------------------------------------+ |
  301. ; | | |
  302. ; 8 lines | See comment above Previous U Plane | See comment above Previous V Plane |
  303. ; | | |
  304. ; | | |
  305. ; | | |
  306. ; +-----------------------------------------------+---------------------------------------------+-+
  307. ; Enough dummy space to assure that MOD ((Spatial_Frame - Previous_Frame), 4096) == 2032
  308. ; +---+---------------------------------------------------------------------------------------+---+
  309. ; | D | Spatially Filtered Y Plane (present only for H261) | D |
  310. ; | u | | u |
  311. ; Frame | m | | m |
  312. ; Height | m | | m |
  313. ; Lines | y | | y |
  314. ; | | | |
  315. ; +---+---------------------------------------------------------------------------------------+---+
  316. ; | |
  317. ; | |
  318. ; | |
  319. ; 24 lines | Dummy Space (24 lines plus 8 bytes. Can be reduced to 8 bytes if unrestricted motion |
  320. ; | vectors is NOT selected, which is certainly the case for H261.) |
  321. ; | |
  322. ; | 8 176 16 176 |8
  323. ; | +-+-------------------------------------------------------------------------------------------+-+
  324. ; +-+D| Spatially Filtered U plane (H261 only) | D | Spatially Filtered V plane (H261 only) |D|
  325. ; Frame |u| | u | |u|
  326. ; Height |m| | m | |m|
  327. ; Div By 2 |m| | m | |m|
  328. ; Lines |y| | y | |y|
  329. ; +-+-------------------------------------------+---+-------------------------------------------+-+
  330. ;
  331. ; Cache layout of the target block and the full range for the reference area (as restricted to +/- 7 in vertical,
  332. ; and +/- 7 (expandable to +/- 15) in horizontal) is as shown here. Each box represents a cache line (32 bytes),
  333. ; increasing incrementally from left to right, and then to the next row (like reading a book). The 128 boxes taken
  334. ; as a whole represent 4Kbytes. The boxes are populated as follows:
  335. ;
  336. ; R -- Data from the reference area. Each box contains 23 of the pels belonging to a line of the reference area.
  337. ; The remaining 7 pels of the line are either in the box to the left (for reference areas used to provide
  338. ; predictions for target macroblocks that begin at an address 0-mod-32), or to the right (for target MBs that
  339. ; begin at an address 16-mod-32). There are 30 R's corresponding to the 30-line limit on the vertical distance
  340. ; we might search.
  341. ;
  342. ; T -- Data from the target macroblock. Each box contains a full line (16 pels) for each of two adjacent
  343. ; macroblocks. There are 16 T's corresponding to the 16 lines of the macroblocks.
  344. ;
  345. ; S -- Space for the spatially filtered macroblock (H261 only).
  346. ;
  347. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  348. ; | T | | R | | T | | R | | S | | R | |
  349. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  350. ; | T | | R | | T | | R | | S | | R | |
  351. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  352. ; | T | | R | | T | | R | | S | | R | |
  353. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  354. ; | T | | R | | T | | R | | S | | R | |
  355. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  356. ; | T | | R | | T | | R | | S | | R | |
  357. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  358. ; | T | | R | | S | | R | | S | | R | |
  359. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  360. ; | T | | R | | S | | R | | S | | R | |
  361. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  362. ; | T | | R | | S | | R | | S | | R | |
  363. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  364. ; | T | | R | | S | | R | | S | | R | |
  365. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  366. ; | T | | R | | S | | R | | S | | R | |
  367. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  368. ; | T | | R | | S | | R | |
  369. ; +---+---+---+---+---+---+---+---+
  370. ;
  371. ; Thus, in a logical sense, the above data fits into one of the 4K data cache pages, leaving the other for all other
  372. ; data. Care has been taken to assure that the tables and the stack space needed by this function fit nicely into
  373. ; the other data cache page. Only the MBlockActionStream remains to conflict with the above data structures. That
  374. ; is both unavoidable, and of minimal consequence.
  375. ; An algorithm has been selected that calculates fewer SWDs (Sum of Weighted Differences) than the typical log search.
  376. ; In the typical log search, a three level search is done, in which the SWDs are compared for the center point and a
  377. ; point at each 45 degrees, initially 4 pels away, then 2, then 1. This requires a total of 25 SWDs for each
  378. ; macroblock (except those near edges or corners).
  379. ;
  380. ; In this algorithm, six levels are performed, with each odd level being a horizontal search, and each even level being
  381. ; a vertical search. Each search compares the SWD for the center point with that of a point in each direction on the
  382. ; applicable axis. This requires 13 SWDs, and a much simpler control structure. Here is an example picture of a
  383. ; search, in which "0" represents the initial center point (the 0,0 motion vector), "A" and "a" represent the first
  384. ; search points, etc. In this example, the "winner" of each level of the search proceeds as follows: a, B, C, C, E, F,
  385. ; arriving at a motion vector of -1 horizontal, 5 vertical.
  386. ;
  387. ; ...............
  388. ; ...............
  389. ; ...............
  390. ; ...b...........
  391. ; ...............
  392. ; ...............
  393. ; ...............
  394. ; ...a...0...A...
  395. ; ...............
  396. ; .....d.........
  397. ; ......f........
  398. ; .c.BeCE........
  399. ; ......F........
  400. ; .....D.........
  401. ; ...............
  402. ;
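; The six level search described above can be sketched in Python as follows. This is
; a simplified model, not the state engine used by this function: edge handling,
; thresholds and the +/- 15 option are left out, and the names axis_search and swd
; are hypothetical. Ties are resolved in favor of the current center.

```python
LOG_SEARCH_SWDS = 1 + 3 * 8    # Center plus 8 neighbors at each of 3 levels = 25.

def axis_search(swd, steps=(4, 2, 1)):
    """Alternate horizontal and vertical 3-point searches at each step size.

    swd(h, v) returns the cost of motion vector (h, v) in whole pels.
    Returns the chosen (h, v) and the number of SWD evaluations made.
    """
    h, v = 0, 0
    best = swd(h, v)
    evaluations = 1
    for step in steps:
        for axis in ("h", "v"):           # Odd levels horizontal, even levels vertical.
            center_h, center_v = h, v
            for sign in (+1, -1):
                cand_h = center_h + sign * step if axis == "h" else center_h
                cand_v = center_v + sign * step if axis == "v" else center_v
                cost = swd(cand_h, cand_v)
                evaluations += 1
                if cost < best:
                    best, h, v = cost, cand_h, cand_v
    return (h, v), evaluations
```

; With a toy cost such as swd = lambda h, v: abs(h + 1) + abs(v - 5), the search ends
; at (-1, 5), the same vector as in the picture above (although via a different path),
; after 13 evaluations, compared with 25 for the three level log search.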
  403. ;
  404. ; A word about data cache performance. Conceptually, the tables and local variables used by this function are placed
  405. ; in memory such that they will fit in one 4K page of the on-chip data cache. For the Pentium (tm) microprocessor,
  406. ; this leaves the other 4K page for other purposes. The other data structures consist of:
  407. ;
  408. ; The current frame, from which we need to access the lines of the 16*16 macroblock. Since cache lines are 32 bytes
  409. ; wide, the cache fill operations that fetch one target macroblock will serve to fetch the macroblock to the right,
  410. ; so an average of 8 cache lines are fetched for each macroblock.
  411. ;
  412. ; The previous frame, from which we need to access a reference area of 30*30 pels. For each macroblock for which we
  413. ; need to search for a motion vector, we will typically need to access no more than about 25 of these, but in general
  414. ; these lines span the 30 lines of the search area. Since cache lines are 32 bytes wide, the cache fill operations
  415. ; that fetch reference data for one macroblock, will tend to fetch data that is useful as reference data for the
  416. ; macroblock to the right, so an average of about 15 (rounded up to be safe) cache lines are fetched for each
  417. ; macroblock.
  418. ;
  419. ; The MBlockActionStream, which controls the searching (since we don't need to motion estimate blocks that are
  420. ; legislated to be intra) will disrupt cache behaviour of the other data structures, but not to a significant degree.
  421. ;
  422. ; By setting the pitch to a constant of 384, and by allocating the frames as described above, the one available 4K page
  423. ; of data cache will be able to contain the 30 lines of the reference area, the 16 lines of the target area, and the
  424. ; 16 lines of the spatially filtered area (H261 only) without any collisions.
  425. ;
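; The effect of the 384 byte pitch on cache placement can be checked with a little
; arithmetic. The sketch below is illustrative only; it models a 4K page of 128 cache
; lines of 32 bytes each, and the function name cache_slots is hypothetical.

```python
PITCH = 384       # Bytes per frame line.
CACHE_LINE = 32   # Bytes per cache line.
PAGE = 4096       # Bytes per data cache page.

def cache_slots(base, lines):
    """Return the cache line slot (0..127) touched by the first byte of each of
    `lines` consecutive frame lines starting at byte address `base`."""
    return [((base + k * PITCH) % PAGE) // CACHE_LINE for k in range(lines)]
```

; Because 384 = 12 * 32 and the resulting slot numbers are 4 * (3k mod 32), any run of
; up to 32 consecutive lines maps to distinct slots. cache_slots(0, 16) yields
; 0, 12, ..., 120, 4, 16, ..., 52, which lines up with the T boxes in the cache
; diagram above when its boxes are numbered in reading order, 12 per row.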
  426. ;
  427. ; Here is a flowchart of the major sections of this function:
  428. ;
  429. ; +-- Execute once for Y part of each macroblock that is NOT Intra By Decree --+
  430. ; | |
  431. ; | +---------------------------------------------------------------+ |
  432. ; | | 1) Compute average value for target match points. | |
  433. ; | | 2) Prepare match points in target MB for easier matching. | |
  434. ; | | 3) Compute the SWD for (0,0) motion vector. | |
  435. ; | +---------------------------------------------------------------+ |
  436. ; | | |
  437. ; | v |
  438. ; | /---------------------------------\ Yes |
  439. ; | < 4) Is 0-motion SWD good enough? >-------------------------+ |
  440. ; | \---------------------------------/ | |
  441. ; | | | |
  442. ; | |No | |
  443. ; | v | |
  444. ; | +--- 5) While state engine has more motion vectors to check ---+ | |
  445. ; | | | | |
  446. ; | | | | |
  447. ; | | +---------------------------------------------------+ | | |
  448. ; | | | 5) Compute SWDs for 2 ref MBs and pick best of 3. |----->| | |
  449. ; | | +---------------------------------------------------+ | | |
  450. ; | | | | |
  451. ; | +--------------------------------------------------------------+ | |
  452. ; | | | |
  453. ; | v | |
  454. ; | /-----------------------------------------\ | |
  455. ; | < 6) Is best motion vector the 0-vector? > | |
  456. ; | \-----------------------------------------/ | |
  457. ; | | | | |
  458. ; | |No |Yes | |
  459. ; | v v | |
  460. ; | +-----------------+ +-------------------------------------------+ | |
  461. ; | | Mark all blocks | | 6) Identify as empty block any in which: |<-+ |
  462. ; | +--| non-empty. | | --> 0-motion SWD < EmptyThresh, and | |
  463. ; | | +-----------------+ +-------------------------------------------+ |
  464. ; | | | |
  465. ; | | v |
  466. ; | | /--------------------------------\ Yes +--------------------------+ |
  467. ; | | < 6) Are all blocks marked empty? >--->| 6) Classify FORCEDEMPTY |-->|
  468. ; | | \--------------------------------/ +--------------------------+ |
  469. ; | | | |
  470. ; | | |No |
  471. ; | | v |
  472. ; | | /--------------------------------------------\ |
  473. ; | | < 7) Are any non-phantom blocks marked empty? > |
  474. ; | | \--------------------------------------------/ |
  475. ; | | | | |
  476. ; | | |No |Yes |
  477. ; | v v v |
  478. ; | +---------------------+ +--------------------------------+ |
  479. ; | | 8) Compute IntraSWD | | Set IntraSWD artificially high | |
  480. ; | +---------------------+ +--------------------------------+ |
  481. ; | | | |
  482. ; | v v |
  483. ; | +-------------------------------+ |
  484. ; | | 10) Classify block as one of: | |
  485. ; | | INTRA |--------------------------------->|
  486. ; | | INTER | |
  487. ; | +-------------------------------+ |
  488. ; | |
  489. ; +----------------------------------------------------------------------------+
  490. ;
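; The classification steps of the flowchart, combined with the threshold descriptions
; in the Input Arguments section, can be approximated by the Python sketch below. It
; is an interpretation, not a transcription of this function: the names are
; hypothetical, "phantom" blocks are not modeled, and a block is treated as empty
; purely on the EmptyThreshold test.

```python
def classify_macroblock(block_swds, best_is_zero_vector, compute_intra_swd,
                        empty_threshold, inter_coding_threshold,
                        intra_coding_differential):
    """Return (classification, per-block empty flags) for one macroblock.

    block_swds holds the four luma block SWDs for the chosen motion vector;
    compute_intra_swd is a callable, so the IntraSWD is only computed if needed.
    """
    if best_is_zero_vector:
        empty = [swd < empty_threshold for swd in block_swds]
    else:
        empty = [False] * len(block_swds)       # Mark all blocks non-empty.

    if all(empty):
        return "FORCEDEMPTY", empty

    # Blocks forced empty are not included in the macroblock SWD.
    inter_swd = sum(s for s, e in zip(block_swds, empty) if not e)

    # IntraSWD is skipped (treated as artificially high) in these two cases.
    if any(empty) or inter_swd < inter_coding_threshold:
        return "INTER", empty

    intra_swd = compute_intra_swd()
    if intra_swd + intra_coding_differential <= inter_swd:
        return "INTRA", empty
    return "INTER", empty
```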
  491. ;
  492. OPTION PROLOGUE:None
  493. OPTION EPILOGUE:ReturnAndRelieveEpilogueMacro
  494. OPTION M510
  495. include e3inst.inc
  496. include e3mbad.inc
  497. .xlist
  498. include memmodel.inc
  499. .list
  500. .DATA
  501. ; Storage for tables and temps used by Motion Estimation function. Fit into
  502. ; 4Kbytes contiguous memory so that it uses one cache page, leaving other
  503. ; for reference area of previous frame and target macroblock of current frame.
  504. PickPoint DB 0,4,?,4,0,?,2,2 ; Map CF accum to new central pt selector.
  505. PickPoint_BLS DB 6,4,?,4,6,?,2,2 ; Same, for when doing block level search.
  506. OffsetToRef LABEL DWORD ; Linearized adjustments to affect horz/vert motion.
  507. DD ? ; This index used when zero-valued motion vector is good enough.
  508. DD 0 ; Best fit of 3 SWDs is previous center.
  509. DD 1 ; Best fit of 3 SWDs is the ref block 1 pel to the right.
  510. DD -1 ; Best fit of 3 SWDs is the ref block 1 pel to the left.
  511. DD 1*PITCH ; Best fit of 3 SWDs is the ref block 1 pel below.
  512. DD -1*PITCH ; Best fit of 3 SWDs is the ref block 1 pel above.
  513. DD 2 ; Best fit of 3 SWDs is the ref block 2 pels to the right.
  514. DD -2 ; Best fit of 3 SWDs is the ref block 2 pels to the left.
  515. DD 2*PITCH ; Best fit of 3 SWDs is the ref block 2 pels below.
  516. DD -2*PITCH ; Best fit of 3 SWDs is the ref block 2 pels above.
  517. DD 4 ; Best fit of 3 SWDs is the ref block 4 pels to the right.
  518. DD -4 ; Best fit of 3 SWDs is the ref block 4 pels to the left.
  519. DD 4*PITCH ; Best fit of 3 SWDs is the ref block 4 pels below.
  520. DD -4*PITCH ; Best fit of 3 SWDs is the ref block 4 pels above.
  521. DD 7 ; Best fit of 3 SWDs is the ref block 7 pels to the right.
  522. DD -7 ; Best fit of 3 SWDs is the ref block 7 pels to the left.
  523. DD 7*PITCH ; Best fit of 3 SWDs is the ref block 7 pels below.
  524. DD -7*PITCH ; Best fit of 3 SWDs is the ref block 7 pels above.
  525. M0 = 4 ; Define symbolic indices into OffsetToRef lookup table.
  526. MHP1 = 8
  527. MHN1 = 12
  528. MVP1 = 16
  529. MVN1 = 20
  530. MHP2 = 24
  531. MHN2 = 28
  532. MVP2 = 32
  533. MVN2 = 36
  534. MHP4 = 40
  535. MHN4 = 44
  536. MVP4 = 48
  537. MVN4 = 52
  538. MHP7 = 56
  539. MHN7 = 60
  540. MVP7 = 64
  541. MVN7 = 68
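; The symbolic indices above are byte offsets into the DWORD table OffsetToRef, so
; each one is 4 times the entry number. A small Python model of the table and its
; indices (illustrative only; the names OFFSET_TO_REF, M and offset_for are
; hypothetical stand-ins for the assembler symbols):

```python
PITCH = 384

# Entry 0 is the "zero vector good enough" slot (undefined in the table), entry 1 is
# the previous center, followed by right/left/+PITCH/-PITCH for steps 1, 2, 4 and 7.
OFFSET_TO_REF = [None, 0]
for step in (1, 2, 4, 7):
    OFFSET_TO_REF += [step, -step, step * PITCH, -step * PITCH]

# Symbolic byte indices: M0 = 4, MHP1 = 8, ..., MVN7 = 68.
names = ["M0"] + [f"M{axis}{sign}{step}"
                  for step in (1, 2, 4, 7)
                  for axis, sign in (("H", "P"), ("H", "N"), ("V", "P"), ("V", "N"))]
M = {name: 4 * entry for entry, name in enumerate(names, start=1)}

def offset_for(byte_index):
    """Return the linearized address adjustment for a symbolic byte index."""
    return OFFSET_TO_REF[byte_index // 4]
```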
  542. ; Map linearized motion vector to vertical part.
  543. ; (Mask bottom byte of linearized MV to zero, then use result
  544. ; as index into this array to get vertical MV.)
  545. IF PITCH-384
  546. *** error: The magic of this table assumes a pitch of 384.
  547. ENDIF
  548. DB -32, -32
  549. DB -30
  550. DB -28, -28
  551. DB -26
  552. DB -24, -24
  553. DB -22
  554. DB -20, -20
  555. DB -18
  556. DB -16, -16
  557. DB -14
  558. DB -12, -12
  559. DB -10
  560. DB -8, -8
  561. DB -6
  562. DB -4, -4
  563. DB -2
  564. DB 0
  565. UnlinearizedVertMV DB 0
  566. DB 2
  567. DB 4, 4
  568. DB 6
  569. DB 8, 8
  570. DB 10
  571. DB 12, 12
  572. DB 14
  573. DB 16, 16
  574. DB 18
  575. DB 20, 20
  576. DB 22
  577. DB 24, 24
  578. DB 26
  579. DB 28, 28
  580. DB 30
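; The table above can be regenerated from the pitch. The Python sketch below assumes
; that the lookup index is the linearized motion vector with its low byte dropped
; (an arithmetic shift right by 8), that the linearized vector is v*PITCH + h in
; whole pels, and that the table yields the vertical vector in half pel units. The
; function names are hypothetical.

```python
PITCH = 384  # The table only works for this pitch.

def build_vert_mv_table(v_min=-16, v_max=15, h_max=15):
    """Map (linearized MV >> 8) to the vertical MV in half pel units."""
    table = {}
    for v in range(v_min, v_max + 1):
        for h in range(-h_max, h_max + 1):
            index = (v * PITCH + h) >> 8          # Python's >> floors, like SAR.
            half_pel_v = 2 * v
            # Every (v, h) pair sharing an index must agree on the vertical MV.
            assert table.setdefault(index, half_pel_v) == half_pel_v
    return table

def unlinearize(linear_mv, table):
    """Split a linearized MV into (horizontal pels, vertical half pels)."""
    vert_half_pels = table[linear_mv >> 8]
    horz_pels = linear_mv - (vert_half_pels // 2) * PITCH
    return horz_pels, vert_half_pels
```

; Indices -25 thru -1 correspond to the entries before the UnlinearizedVertMV label,
; and indices 0 thru 22 to the entries from the label onward.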
  581. ; Map initial states to initializers for half pel search. Where search would
  582. ; illegally take us off edge of picture, set initializer artificially high.
  583. InitHalfPelSearchHorz LABEL DWORD
  584. DD 040000000H, 000000000H, 000004000H
  585. DD 040000000H, 000000000H, 000004000H
  586. DD 040000000H, 000000000H, 000004000H
  587. DD 040000000H, 000000000H, 000004000H
  588. InitHalfPelSearchVert LABEL DWORD
  589. DD 040000000H, 040000000H, 040000000H
  590. DD 000000000H, 000000000H, 000000000H
  591. DD 000004000H, 000004000H, 000004000H
  592. DD 040004000H, 040004000H, 040004000H
  593. SWDState LABEL BYTE ; Rules that govern state engine of motion estimator.
  594. DB 8 DUP (?) ; 0: not used.
  595. ; 1: Upper Left Corner. Explore 4 right and 4 down.
  596. DB 21, M0 ; (0,0)
  597. DB 22, MHP4 ; (0,4)
  598. DB 23, MVP4, ?, ? ; (4,0)
  599. ; 2: Upper Edge. Explore 4 left and 4 right.
  600. DB 22, M0 ; (0, 0)
  601. DB 22, MHN4 ; (0,-4)
  602. DB 22, MHP4, ?, ? ; (0, 4)
  603. ; 3: Upper Right Corner. Explore 4 left and 4 down.
  604. DB 31, M0 ; (0, 0)
  605. DB 22, MHN4 ; (0,-4)
  606. DB 32, MVP4, ?, ? ; (4, 0)
  607. ; 4: Left Edge. Explore 4 up and 4 down.
  608. DB 23, M0 ; ( 0,0)
  609. DB 23, MVN4 ; (-4,0)
  610. DB 23, MVP4, ?, ? ; ( 4,0)
  611. ; 5: Interior Macroblock. Explore 4 up and 4 down.
  612. DB 37, M0 ; ( 0,0)
  613. DB 37, MVN4 ; (-4,0)
  614. DB 37, MVP4, ?, ? ; ( 4,0)
  615. ; 6: Right Edge. Explore 4 up and 4 down.
  616. DB 32, M0 ; ( 0,0)
  617. DB 32, MVN4 ; (-4,0)
  618. DB 32, MVP4, ?, ? ; ( 4,0)
  619. ; 7: Lower Left Corner. Explore 4 up and 4 right.
  620. DB 38, M0 ; ( 0,0)
  621. DB 39, MHP4 ; ( 0,4)
  622. DB 23, MVN4, ?, ? ; (-4,0)
  623. ; 8: Lower Edge. Explore 4 left and 4 right.
  624. DB 39, M0 ; (0, 0)
  625. DB 39, MHN4 ; (0,-4)
  626. DB 39, MHP4, ?, ? ; (0, 4)
  627. ; 9: Lower Right Corner. Explore 4 up and 4 left.
  628. DB 44, M0 ; ( 0, 0)
  629. DB 39, MHN4 ; ( 0,-4)
  630. DB 32, MVN4, ?, ? ; (-4, 0)
  631. ; 10: Left Edge, No Vertical Motion Allowed.
  632. DB 46, M0 ; (0,0)
  633. DB 48, MHP2 ; (0,2)
  634. DB 47, MHP4, ?, ? ; (0,4)
  635. ; 11: Interior Macroblock, No Vertical Motion Allowed.
  636. DB 47, M0 ; (0, 0)
  637. DB 47, MHN4 ; (0,-4)
  638. DB 47, MHP4, ?, ? ; (0, 4)
  639. ; 12: Right Edge, No Vertical Motion Allowed.
  640. DB 49, M0 ; (0, 0)
  641. DB 48, MHN2 ; (0,-2)
  642. DB 47, MHN4, ?, ? ; (0,-4)
  643. ; 13: Horz by 2, Vert by 2, Horz by 1, Vert by 1.
  644. DB 14, M0
  645. DB 14, MHP2
  646. DB 14, MHN2, ?, ?
  647. ; 14: Vert by 2, Horz by 1, Vert by 1.
  648. DB 15, M0
  649. DB 15, MVP2
  650. DB 15, MVN2, ?, ?
  651. ; 15: Horz by 1, Vert by 1.
  652. DB 16, M0
  653. DB 16, MHP1
  654. DB 16, MHN1, ?, ?
  655. ; 16: Vert by 1.
  656. DB 0, M0
  657. DB 0, MVP1
  658. DB 0, MVN1, ?, ?
  659. ; 17: Vert by 2, Horz by 2, Vert by 1, Horz by 1.
  660. DB 18, M0
  661. DB 18, MVP2
  662. DB 18, MVN2, ?, ?
  663. ; 18: Horz by 2, Vert by 1, Horz by 1.
  664. DB 19, M0
  665. DB 19, MHP2
  666. DB 19, MHN2, ?, ?
  667. ; 19: Vert by 1, Horz by 1.
  668. DB 20, M0
  669. DB 20, MVP1
  670. DB 20, MVN1, ?, ?
  671. ; 20: Horz by 1.
  672. DB 0, M0
  673. DB 0, MHP1
  674. DB 0, MHN1, ?, ?
  675. ; 21: From 1A. Upper Left. Try 2 right and 2 down.
  676. DB 24, M0 ; (0, 0)
  677. DB 25, MHP2 ; (0, 2)
  678. DB 26, MVP2, ?, ? ; (2, 0)
  679. ; 22: From 1B.
  680. ; From 2 center point would be (0,-4/0/4).
  681. ; From 3B center point would be (0,-4).
  682. DB 27, M0 ; (0, 4)
  683. DB 18, MVP2 ; (2, 4) Next: Horz 2, Vert 1, Horz 1. (1:3,1:7)
  684. DB 13, MVP4, ?, ? ; (4, 4) Next: Horz 2, Vert 2, Horz 1, Vert 1. (1:7,1:7)
  685. ; 23: From 1C.
  686. ; From 4 center point would be (-4/0/4,0).
  687. ; From 7C center point would be (-4,0).
  688. DB 29, M0 ; (4, 0)
  689. DB 14, MHP2 ; (4, 2) Next: Vert 2, Horz 1, Vert 1. (1:7,1:3)
  690. DB 17, MHP4, ?, ? ; (4, 4) Next: Vert 2, Horz 2, Vert 1, Horz 1. (1:7,1:7)
  691. ; 24: From 21A. Upper Left. Try 1 right and 1 down.
  692. DB 0, M0 ; (0, 0)
  693. DB 0, MHP1 ; (0, 1)
  694. DB 0, MVP1, ?, ? ; (1, 0)
  695. ; 25: From 21B.
  696. ; From 31B center point would be (0,-2).
  697. DB 20, M0 ; (0, 2) Next: Horz 1 (0,1:3)
  698. DB 20, MVP1 ; (1, 2) Next: Horz 1 (1,1:3)
  699. DB 15, MVP2, ?, ? ; (2, 2) Next: Horz 1, Vert 1 (1:3,1:3)
  700. ; 26: From 21C.
  701. ; From 38C center point would be (-2,0).
  702. DB 16, M0 ; (2, 0) Next: Vert 1 (1:3,0)
  703. DB 16, MHP1 ; (2, 1) Next: Vert 1 (1:3,1)
  704. DB 19, MHP2, ?, ? ; (2, 2) Next: Vert 1, Horz 1 (1:3,1:3)
  705. ; 27: From 22A.
  706. DB 28, M0 ; (0, 4)
  707. DB 28, MHN2 ; (0, 2)
  708. DB 28, MHP2, ?, ? ; (0, 6)
  709. ; 28: From 27.
  710. DB 20, M0 ; (0, 2/4/6) Next: Horz 1. (0,1:7)
  711. DB 20, MVP1 ; (1, 2/4/6) Next: Horz 1. (1,1:7)
  712. DB 20, MVP2, ?, ? ; (2, 2/4/6) Next: Horz 1. (2,1:7)
  713. ; 29: From 23A.
  714. DB 30, M0 ; (4, 0)
  715. DB 30, MVN2 ; (2, 0)
  716. DB 30, MVP2, ?, ? ; (6, 0)
  717. ; 30: From 29.
  718. DB 16, M0 ; (2/4/6, 0) Next: Vert 1. (1:7,0)
  719. DB 16, MHP1 ; (2/4/6, 1) Next: Vert 1. (1:7,1)
  720. DB 16, MHP2, ?, ? ; (2/4/6, 2) Next: Vert 1. (1:7,2)
  721. ; 31: From 3A. Upper Right. Try 2 left and 2 down.
  722. DB 33, M0 ; (0, 0)
  723. DB 25, MHN2 ; (0,-2)
  724. DB 34, MVP2, ?, ? ; (2, 0)
  725. ; 32: From 3C.
  726. ; From 6 center point would be (-4/0/4, 0)
  727. ; From 9C center point would be (-4, 0)
  728. DB 35, M0 ; (4, 0)
  729. DB 14, MHN2 ; (4,-2) Next: Vert2,Horz1,Vert1. (1:7,-1:-3)
  730. DB 17, MHN4, ?, ? ; (4,-4) Next: Vert2,Horz2,Vert1,Horz1. (1:7,-1:-7)
  731. ; 33: From 31A. Upper Right. Try 1 left and 1 down.
  732. DB 0, M0 ; (0, 0)
  733. DB 0, MHN1 ; (0,-1)
  734. DB 0, MVP1, ?, ? ; (1, 0)
  735. ; 34: From 31C.
  736. ; From 44C center point would be (-2, 0)
  737. DB 16, M0 ; (2, 0) Next: Vert 1 (1:3, 0)
  738. DB 16, MHN1 ; (2,-1) Next: Vert 1 (1:3,-1)
  739. DB 19, MHN2, ?, ? ; (2,-2) Next: Vert 1, Horz 1 (1:3,-1:-3)
  740. ; 35: From 32A.
  741. DB 36, M0 ; (4, 0)
  742. DB 36, MVN2 ; (2, 0)
  743. DB 36, MVP2, ?, ? ; (6, 0)
  744. ; 36: From 35.
  745. DB 16, M0 ; (2/4/6, 0) Next: Vert 1. (1:7, 0)
  746. DB 16, MHN1 ; (2/4/6,-1) Next: Vert 1. (1:7,-1)
  747. DB 16, MHN2, ?, ? ; (2/4/6,-2) Next: Vert 1. (1:7,-2)
  748. ; 37: From 5.
  749. DB 17, M0 ; (-4/0/4, 0) Next: Vert2,Horz2,Vert1,Horz1 (-7:7,-3: 3)
  750. DB 17, MHP4 ; (-4/0/4, 4) Next: Vert2,Horz2,Vert1,Horz1 (-7:7, 1: 7)
  751. DB 17, MHN4, ?, ? ; (-4/0/4,-4) Next: Vert2,Horz2,Vert1,Horz1 (-7:7,-7:-1)
  752. ; 38: From 7A. Lower Left. Try 2 right and 2 up.
  753. DB 42, M0 ; ( 0,0)
  754. DB 43, MHP2 ; ( 0,2)
  755. DB 26, MVN2, ?, ? ; (-2,0)
  756. ; 39: From 7B.
  757. ; From 8 center point would be (0,-4/0/4)
  758. ; From 9B center point would be (0,-4)
  759. DB 40, M0 ; ( 0,4)
  760. DB 18, MVN2 ; (-2,4) Next: Horz2,Vert1,Horz1. (-3:-1,1:7)
  761. DB 13, MVN4, ?, ? ; (-4,4) Next: Horz2,Vert2,Horz1,Vert1. (-7:-1,1:7)
  762. ; 40: From 39A.
  763. DB 41, M0 ; (0, 4)
  764. DB 41, MHN2 ; (0, 2)
  765. DB 41, MHP2, ?, ? ; (0, 6)
  766. ; 41: From 40.
  767. DB 20, M0 ; ( 0,2/4/6) Next: Horz 1. ( 0,1:7)
  768. DB 20, MVN1 ; (-1,2/4/6) Next: Horz 1. (-1,1:7)
  769. DB 20, MVN2, ?, ? ; (-2,2/4/6) Next: Horz 1. (-2,1:7)
  770. ; 42: From 38A. Lower Left. Try 1 right and 1 up.
  771. DB 0, M0 ; ( 0,0)
  772. DB 0, MHP1 ; ( 0,1)
  773. DB 0, MVN1, ?, ? ; (-1,0)
  774. ; 43: From 38B.
  775. ; From 44B center point would be (0,-2)
  776. DB 20, M0 ; ( 0,2) Next: Horz 1 ( 0,1:3)
  777. DB 20, MVN1 ; (-1,2) Next: Horz 1 (-1,1:3)
  778. DB 15, MVN2, ?, ? ; (-2,2) Next: Horz 1, Vert 1 (-1:-3,1:3)
  779. ; 44: From 9A. Lower Right. Try 2 left and 2 up.
  780. DB 45, M0 ; ( 0, 0)
  781. DB 43, MHN2 ; ( 0,-2)
  782. DB 34, MVN2, ?, ? ; (-2, 0)
  783. ; 45: From 44A. Lower Right. Try 1 left and 1 up.
  784. DB 0, M0 ; ( 0, 0)
  785. DB 0, MHN1 ; ( 0,-1)
  786. DB 0, MVN1, ?, ? ; (-1, 0)
  787. ; 46: From 10A.
  788. DB 0, M0 ; (0,0)
  789. DB 0, MHP1 ; (0,1)
  790. DB 0, MHP1, ?, ? ; (0,1)
  791. ; 47: From 10C.
  792. ; From 11 center point would be (0,4/0/-4)
  793. ; From 12C center point would be (0,-4)
  794. DB 48, M0 ; (0,4)
  795. DB 48, MHN2 ; (0,2)
  796. DB 48, MHP2, ?, ? ; (0,6)
  797. ; 48: From 10B.
  798. ; From 47 center point would be (0,2/4/6)
  799. ; From 12B center point would be (0,-2)
  800. DB 0, M0 ; (0,2)
  801. DB 0, MHN1 ; (0,1)
  802. DB 0, MHP1, ?, ? ; (0,3)
  803. ; 49: From 12A.
  804. DB 0, M0 ; (0, 0)
  805. DB 0, MHN1 ; (0,-1)
  806. DB 0, MHN1, ?, ? ; (0,-1)
  807. ; 50: Interior Macroblock. Explore 7 up and 7 down.
  808. DB 51, M0 ; ( 0,0)
  809. DB 51, MVN7 ; (-7,0)
  810. DB 51, MVP7, ?, ? ; ( 7,0)
  811. ; 51: Explore 7 left and 7 right.
  812. DB 5, M0 ; (-7|0|7, 0)
  813. DB 5, MHN7 ; (-7|0|7,-7)
  814. DB 5, MHP7, ?, ? ; (-7|0|7, 7)
  815. MulByNeg8 LABEL DWORD
  816. CNT = 0
  817. REPEAT 128
  818. DD WeightedDiff+CNT
  819. CNT = CNT - 8
  820. ENDM
  821. ; The following treachery puts the numbers into byte 2 of each aligned DWORD.
  822. DB 0, 0
  823. DD 193 DUP (255)
  824. DD 250,243,237,231,225,219,213,207,201,195,189,184,178,172,167,162,156
  825. DD 151,146,141,135,130,126,121,116,111,107,102, 97, 93, 89, 84, 80, 76
  826. DD 72, 68, 64, 61, 57, 53, 50, 46, 43, 40, 37, 34, 31, 28, 25, 22, 20
  827. DD 18, 15, 13, 11, 9, 7, 6, 4, 3, 2, 1
  828. DB 0, 0
  829. WeightedDiff LABEL DWORD
  830. DB 0, 0
  831. DD 0, 0, 1, 2, 3, 4, 6, 7, 9, 11, 13, 15, 18
  832. DD 20, 22, 25, 28, 31, 34, 37, 40, 43, 46, 50, 53, 57, 61, 64, 68, 72
  833. DD 76, 80, 84, 89, 93, 97,102,107,111,116,121,126,130,135,141,146,151
  834. DD 156,162,167,172,178,184,189,195,201,207,213,219,225,231,237,243,250
  835. DD 191 DUP (255)
  836. DB 255, 0
  837. MotionOffsets DD 1*PITCH,0,?,?
  838. RemnantOfCacheLine DB 8 DUP (?)
  839. LocalStorage LABEL DWORD ; Local storage goes on the stack at addresses
  840. ; whose lower 12 bits match this address.
  841. .CODE
  842. ASSUME cs : FLAT
  843. ASSUME ds : FLAT
  844. ASSUME es : FLAT
  845. ASSUME fs : FLAT
  846. ASSUME gs : FLAT
  847. ASSUME ss : FLAT
  848. MOTIONESTIMATION proc C AMBAS: DWORD,
  849. ATargFrmBase: DWORD,
  850. APrevFrmBase: DWORD,
  851. AFiltFrmBase: DWORD,
  852. ADo15Search: DWORD,
  853. ADoHalfPelEst: DWORD,
  854. ADoBlkLvlVec: DWORD,
  855. ADoSpatialFilt: DWORD,
  856. AZeroVectorThresh: DWORD,
  857. ANonZeroMVDiff: DWORD,
  858. ABlockMVDiff: DWORD,
  859. AEmptyThresh: DWORD,
  860. AInterCodThresh: DWORD,
  861. AIntraCodDiff: DWORD,
  862. ASpatialFiltThresh: DWORD,
  863. ASpatialFiltDiff: DWORD,
  864. AIntraSWDTot: DWORD,
  865. AIntraSWDBlks: DWORD,
  866. AInterSWDTot: DWORD,
  867. AInterSWDBlks: DWORD
  868. LocalFrameSize = 128 + 168*4 + 32 ; 128 for locals; 168*4 for blocks; 32 for dummy block.
  869. RegStoSize = 16
  870. ; Arguments:
  871. MBlockActionStream_arg = RegStoSize + 4
  872. TargetFrameBaseAddress_arg = RegStoSize + 8
  873. PreviousFrameBaseAddress_arg = RegStoSize + 12
  874. FilteredFrameBaseAddress_arg = RegStoSize + 16
  875. DoRadius15Search_arg = RegStoSize + 20
  876. DoHalfPelEstimation_arg = RegStoSize + 24
  877. DoBlockLevelVectors_arg = RegStoSize + 28
  878. DoSpatialFiltering_arg = RegStoSize + 32
  879. ZeroVectorThreshold_arg = RegStoSize + 36
  880. NonZeroMVDifferential_arg = RegStoSize + 40
  881. BlockMVDifferential_arg = RegStoSize + 44
  882. EmptyThreshold_arg = RegStoSize + 48
  883. InterCodingThreshold_arg = RegStoSize + 52
  884. IntraCodingDifferential_arg = RegStoSize + 56
  885. SpatialFiltThreshold_arg = RegStoSize + 60
  886. SpatialFiltDifferential_arg = RegStoSize + 64
  887. IntraSWDTotal_arg = RegStoSize + 68
  888. IntraSWDBlocks_arg = RegStoSize + 72
  889. InterSWDTotal_arg = RegStoSize + 76
  890. InterSWDBlocks_arg = RegStoSize + 80
  891. EndOfArgList = RegStoSize + 84
  892. ; Locals (on local stack frame)
  893. MBlockActionStream EQU [esp+ 0]
  894. CurrSWDState EQU [esp+ 4]
  895. MotionOffsetsCursor EQU CurrSWDState
  896. HalfPelHorzSavings EQU CurrSWDState
  897. VertFilterDoneAddr EQU CurrSWDState
  898. IntraSWDTotal EQU [esp+ 8]
  899. IntraSWDBlocks EQU [esp+ 12]
  900. InterSWDTotal EQU [esp+ 16]
  901. InterSWDBlocks EQU [esp+ 20]
  902. MBCentralInterSWD EQU [esp+ 24]
  903. MBRef1InterSWD EQU [esp+ 28]
  904. MBRef2InterSWD EQU [esp+ 32]
  905. MBCentralInterSWD_BLS EQU [esp+ 36]
  906. MB0MVInterSWD EQU [esp+ 40]
  907. MBAddrCentralPoint EQU [esp+ 44]
  908. MBMotionVectors EQU [esp+ 48]
  909. DoHalfPelEstimation EQU [esp+ 52]
  910. DoBlockLevelVectors EQU [esp+ 56]
  911. DoSpatialFiltering EQU [esp+ 60]
  912. ZeroVectorThreshold EQU [esp+ 64]
  913. NonZeroMVDifferential EQU [esp+ 68]
  914. BlockMVDifferential EQU [esp+ 72]
  915. EmptyThreshold EQU [esp+ 76]
  916. InterCodingThreshold EQU [esp+ 80]
  917. IntraCodingDifferential EQU [esp+ 84]
  918. SpatialFiltThreshold EQU [esp+ 88]
  919. SpatialFiltDifferential EQU [esp+ 92]
  920. TargetMBAddr EQU [esp+ 96]
  921. TargetFrameBaseAddress EQU [esp+ 100]
  922. PreviousFrameBaseAddress EQU [esp+ 104]
  923. TargToRef EQU [esp+ 108]
  924. TargToSLF EQU [esp+ 112]
  925. DoRadius15Search EQU [esp+ 116]
  926. StashESP EQU [esp+ 120]
  927. BlockLen EQU 168
  928. Block1 EQU [esp+ 128+40] ; "128" is for locals. "40" is so offsets range from -40 to 124.
  929. Block2 EQU Block1 + BlockLen
  930. Block3 EQU Block2 + BlockLen
  931. Block4 EQU Block3 + BlockLen
  932. BlockN EQU Block4 + BlockLen
  933. BlockNM1 EQU Block4
  934. BlockNM2 EQU Block3
  935. BlockNP1 EQU Block4 + BlockLen + BlockLen
  936. DummyBlock EQU Block4 + BlockLen
  937. Ref1Addr EQU -40
  938. Ref2Addr EQU -36
  939. AddrCentralPoint EQU -32
  940. CentralInterSWD EQU -28
  941. Ref1InterSWD EQU -24
  942. Ref2InterSWD EQU -20
  943. CentralInterSWD_BLS EQU -16 ; CentralInterSWD, when doing blk level search.
  944. CentralInterSWD_SLF EQU -16 ; CentralInterSWD, when doing spatial filter.
  945. HalfPelSavings EQU Ref2Addr
  946. ZeroMVInterSWD EQU -12
  947. BlkHMV EQU -8
  948. BlkVMV EQU -7
  949. BlkMVs EQU -8
  950. AccumTargetPels EQU -4
  951. ; Offsets for Negated Octupled Target Pels (target pel * -8, plus WeightedDiff base):
  952. N8T00 EQU 0
  953. N8T04 EQU 4
  954. N8T02 EQU 8
  955. N8T06 EQU 12
  956. N8T20 EQU 16
  957. N8T24 EQU 20
  958. N8T22 EQU 24
  959. N8T26 EQU 28
  960. N8T40 EQU 32
  961. N8T44 EQU 36
  962. N8T42 EQU 40
  963. N8T46 EQU 44
  964. N8T60 EQU 48
  965. N8T64 EQU 52
  966. N8T62 EQU 56
  967. N8T66 EQU 60
  968. N8T11 EQU 64
  969. N8T15 EQU 68
  970. N8T13 EQU 72
  971. N8T17 EQU 76
  972. N8T31 EQU 80
  973. N8T35 EQU 84
  974. N8T33 EQU 88
  975. N8T37 EQU 92
  976. N8T51 EQU 96
  977. N8T55 EQU 100
  978. N8T53 EQU 104
  979. N8T57 EQU 108
  980. N8T71 EQU 112
  981. N8T75 EQU 116
  982. N8T73 EQU 120
  983. N8T77 EQU 124
  984. push esi
  985. push edi
  986. push ebp
  987. push ebx
  988. ; Adjust stack ptr so that local frame fits nicely in cache w.r.t. other data.
  989. mov esi,esp
  990. sub esp,000001000H
  991. mov eax,[esp] ; Cause system to commit page.
  992. sub esp,000001000H
  993. and esp,0FFFFF000H
  994. mov ebx,OFFSET LocalStorage+31
  995. and ebx,000000FE0H
  996. mov edx,PD [esi+MBlockActionStream_arg]
  997. or esp,ebx
  998. mov eax,PD [esi+TargetFrameBaseAddress_arg]
  999. mov TargetFrameBaseAddress,eax
  1000. mov ebx,PD [esi+PreviousFrameBaseAddress_arg]
  1001. mov PreviousFrameBaseAddress,ebx
  1002. sub ebx,eax
  1003. mov ecx,PD [esi+FilteredFrameBaseAddress_arg]
  1004. sub ecx,eax
  1005. mov TargToRef,ebx
  1006. mov TargToSLF,ecx
  1007. mov eax,PD [esi+EmptyThreshold_arg]
  1008. mov EmptyThreshold,eax
  1009. mov eax,PD [esi+DoHalfPelEstimation_arg]
  1010. mov DoHalfPelEstimation,eax
  1011. mov eax,PD [esi+DoBlockLevelVectors_arg]
  1012. mov DoBlockLevelVectors,eax
  1013. mov eax,PD [esi+DoRadius15Search_arg]
  1014. mov DoRadius15Search,eax
  1015. mov eax,PD [esi+DoSpatialFiltering_arg]
  1016. mov DoSpatialFiltering,eax
  1017. mov eax,PD [esi+ZeroVectorThreshold_arg]
  1018. mov ZeroVectorThreshold,eax
  1019. mov eax,PD [esi+NonZeroMVDifferential_arg]
  1020. mov NonZeroMVDifferential,eax
  1021. mov eax,PD [esi+BlockMVDifferential_arg]
  1022. mov BlockMVDifferential,eax
  1023. mov eax,PD [esi+InterCodingThreshold_arg]
  1024. mov InterCodingThreshold,eax
  1025. mov eax,PD [esi+IntraCodingDifferential_arg]
  1026. mov IntraCodingDifferential,eax
  1027. mov eax,PD [esi+SpatialFiltThreshold_arg]
  1028. mov SpatialFiltThreshold,eax
  1029. mov eax,PD [esi+SpatialFiltDifferential_arg]
  1030. mov SpatialFiltDifferential,eax
  1031. xor ebx,ebx
  1032. mov IntraSWDBlocks,ebx
  1033. mov InterSWDBlocks,ebx
  1034. mov IntraSWDTotal,ebx
  1035. mov InterSWDTotal,ebx
  1036. mov Block1.BlkMVs,ebx
  1037. mov Block2.BlkMVs,ebx
  1038. mov Block3.BlkMVs,ebx
  1039. mov Block4.BlkMVs,ebx
  1040. mov DummyBlock.Ref1Addr,esp
  1041. mov DummyBlock.Ref2Addr,esp
  1042. mov StashESP,esi
  1043. jmp FirstMacroBlock
  1044. ; Activity Details for this section of code (refer to flow diagram above):
  1045. ;
  1046. ; 1) To calculate an average value for the target match points of each
  1047. ; block, we sum the 32 match points. The totals for the 4 blocks
  1048. ; are output separately.
  1049. ;
  1050. ; 2) Define each prepared match point in the target macroblock as the
  1051. ; real match point times negative 8, with the base address of the
  1052. ; WeightedDiff lookup table added. I.e.
  1053. ;
  1054. ; for (i = 0; i < 16; i += 2)
  1055. ; for (j = 0; j < 16; j += 2)
  1056. ; N8T[i][j] = ( -8 * Target[i][j]) + ((U32) WeightedDiff);
  1057. ;
  1058. ; Both the multiply and the add of the WeightedDiff array base are
  1059. ; effected by a table lookup into the array MulByNeg8.
  1060. ;
  1061. ; Then the SWD of a reference macroblock can be calculated as follows:
  1062. ;
  1063. ; SWD = 0;
  1064. ; for each match point (i,j)
  1065. ; SWD += *((U32 *) (N8T[i][j] + 8 * Ref[i][j]));
  1066. ;
  1067. ; In assembly, the fetch of WeightedDiff array element amounts to this:
  1068. ;
  1069. ; mov edi,DWORD PTR N8T[i][j] ; Fetch N8T[i][j]
  1070. ; mov dl,BYTE PTR Ref[i][j] ; Fetch Ref[i][j]
  1071. ; mov edi,DWORD PTR[edi+edx*8] ; Fetch WeightedDiff of target & ref.
  1072. ;
  1073. ; 3) We calculate the 0-motion SWD, as described just above. We use 32
  1074. ; match points per block, and write the result separately for each
  1075. ; block. The result is accumulated into the high half of ebp.
  1076. ;
  1077. ; 4) If the SWD for the 0-motion vector is below a threshold, we don't
  1078. ; bother searching for other possibly better motion vectors. Presently,
  1079. ; this threshold is set such that an average difference of less than
  1080. ; three per match point causes the 0-motion vector to be accepted.
  1081. ;
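; Worked example for item 2 (a sketch; the pel values are made up, and
; pels are assumed to lie in 0..127, as the 128-entry MulByNeg8 table
; suggests):
;
;   Target pel T = 100:  N8T = MulByNeg8[100] = WeightedDiff - 800
;   Ref pel    R =  97:  N8T + 8*R = WeightedDiff - 800 + 776
;                                  = WeightedDiff - 24
;                                  = WeightedDiff + 8*(R - T)
;
; So the fetched table entry depends only on the difference R - T, and
; both signs of the difference index the same table without a branch.
;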
  1082. ; Register usage for this section:
  1083. ;
  1084. ; Input of this section:
  1085. ;
  1086. ; edx -- MBlockActionStream
  1087. ;
  1088. ; Predominant usage for body of this section:
  1089. ;
  1090. ; esi -- Target block address.
  1091. ; edi -- 0-motion reference block address.
  1092. ; ebp[ 0:12] -- Accumulator for target pels.
  1093. ; ebp[13:15] -- Loop control
  1094. ; ebp[16:31] -- Accumulator for weighted diff between target and 0-MV ref.
  1095. ; edx -- Address at which to store -8 times pels.
  1096. ; ecx -- A reference pel.
  1097. ; ebx -- A target pel.
  1098. ; eax -- A target pel times -8; and a weighted difference.
  1099. ;
  1100. ; Expected Pentium (tm) microprocessor performance for section:
  1101. ;
  1102. ; Executed once per macroblock.
  1103. ;
  1104. ; 520 clocks for instruction execution
  1105. ; 8 clocks for bank conflicts (64 dual mem ops with 1/8 chance of conflict)
  1106. ; 80 clocks generously estimated for an average of 8 cache line fills for
  1107. ; the target macroblock and 8 cache line fills for the reference area.
  1108. ; ----
  1109. ; 608 clocks total time for this section.
  1110. ;
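; Illustration of the packed use of ebp listed under register usage (a
; sketch traced from the code that follows, not an authoritative
; description; it assumes the two accumulators never carry into the
; loop control bits):
;
;   sum of target pels = ebp AND 000001FFFH         (bits  0:12)
;   loop control       = ebp AND 000006000H         (bits 13:14 kept)
;   0-MV SWD           = (ebp + last diff) SHR 16   (bits 16:31)
;
; Each block adds 000004000H to the loop control field, and CF after the
; SHR is bit 15 of that sum:
;
;   block 1:  00000H + 04000H = 04000H   CF = 0   -> next block
;   block 2:  04000H + 04000H = 08000H   CF = 1   -> field becomes 0;
;             XOR 02000H = 02000H (non-zero)      -> advance to block 3
;   block 3:  02000H + 04000H = 06000H   CF = 0   -> next block
;   block 4:  06000H + 04000H = 0A000H   CF = 1   -> field becomes 02000H;
;             XOR 02000H = 0                      -> fall out of the loop
;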
  1111. NextMacroBlock:
  1112. mov bl,[edx].CodedBlocks
  1113. add edx,SIZEOF T_MacroBlockActionDescr
  1114. and ebx,000000040H ; Check for end-of-stream
  1115. jne Done
  1116. FirstMacroBlock:
  1117. mov cl,[edx].CodedBlocks ; Init CBP for macroblock.
  1118. mov ebp,TargetFrameBaseAddress
  1119. mov bl,[edx].FirstMEState ; First State
  1120. mov eax,DoRadius15Search ; Searching 15 full pels out, or just 7?
  1121. neg al ; radius-15 search on (arg == 1) => al=-1, off => al=0
  1122. or cl,03FH ; Indicate all 6 blocks are coded.
  1123. and al,bl
  1124. mov esi,[edx].BlkY1.BlkOffset ; Get address of next macroblock to do.
  1125. cmp al,5
  1126. jne @f
  1127. mov bl,50 ; Cause us to search +/- 15 if central
  1128. ; ; block and willing to go that far.
  1129. @@:
  1130. mov edi,TargToRef
  1131. add esi,ebp
  1132. mov CurrSWDState,ebx ; Stash First State Number as current.
  1133. add edi,esi
  1134. xor ebp,ebp
  1135. mov TargetMBAddr,esi ; Stash address of target macroblock.
  1136. mov MBlockActionStream,edx ; Stash list ptr.
  1137. mov [edx].CodedBlocks,cl
  1138. mov ecx,INTER1MV ; Speculate INTER-coding, 1 motion vector.
  1139. mov [edx].BlockType,cl
  1140. lea edx,Block1
  1141. PrepMatchPointsNextBlock:
  1142. mov bl,PB [esi+6] ; 06A -- Target Pel 06.
  1143. add ebp,ebx ; 06B -- Accumulate target pels.
  1144. mov cl,PB [edi+6] ; 06C -- Reference Pel 06.
  1145. mov eax,MulByNeg8[ebx*4] ; 06D -- Target Pel 06 * -8.
  1146. mov bl,PB [esi+4] ; 04A
  1147. mov [edx].N8T06,eax ; 06E -- Store Pel 06 times -8 (plus table base).
  1148. add ebp,ebx ; 04B
  1149. mov eax,PD [eax+ecx*8] ; 06F -- Weighted difference for Pel 06.
  1150. mov cl,PB [edi+4] ; 04C
  1151. add ebp,eax ; 06G -- Accumulate weighted difference.
  1152. mov eax,MulByNeg8[ebx*4] ; 04D
  1153. mov bl,PB [esi+2] ; 02A
  1154. mov [edx].N8T04,eax ; 04E
  1155. add ebp,ebx ; 02B
  1156. mov eax,PD [eax+ecx*8] ; 04F
  1157. mov cl,PB [edi+2] ; 02C
  1158. add ebp,eax ; 04G
  1159. mov eax,MulByNeg8[ebx*4] ; 02D
  1160. mov bl,PB [esi] ; 00A
  1161. mov [edx].N8T02,eax ; 02E
  1162. add ebp,ebx ; 00B
  1163. mov eax,PD [eax+ecx*8] ; 02F
  1164. add esi,PITCH+1
  1165. mov cl,PB [edi] ; 00C
  1166. add edi,PITCH+1
  1167. lea ebp,[ebp+eax+000004000H] ; 02G (plus loop control)
  1168. mov eax,MulByNeg8[ebx*4] ; 00D
  1169. mov bl,PB [esi+6] ; 17A
  1170. mov [edx].N8T00,eax ; 00E
  1171. add ebp,ebx ; 17B
  1172. mov eax,PD [eax+ecx*8] ; 00F
  1173. mov cl,PB [edi+6] ; 17C
  1174. add ebp,eax ; 00G
  1175. mov eax,MulByNeg8[ebx*4] ; 17D
  1176. mov bl,PB [esi+4] ; 15A
  1177. mov [edx].N8T17,eax ; 17E
  1178. add ebp,ebx ; 15B
  1179. mov eax,PD [eax+ecx*8] ; 17F
  1180. mov cl,PB [edi+4] ; 15C
  1181. add ebp,eax ; 17G
  1182. mov eax,MulByNeg8[ebx*4] ; 15D
  1183. mov bl,PB [esi+2] ; 13A
  1184. mov [edx].N8T15,eax ; 15E
  1185. add ebp,ebx ; 13B
  1186. mov eax,PD [eax+ecx*8] ; 15F
  1187. mov cl,PB [edi+2] ; 13C
  1188. add ebp,eax ; 15G
  1189. mov eax,MulByNeg8[ebx*4] ; 13D
  1190. mov bl,PB [esi] ; 11A
  1191. mov [edx].N8T13,eax ; 13E
  1192. add ebp,ebx ; 11B
  1193. mov eax,PD [eax+ecx*8] ; 13F
  1194. add esi,PITCH-1
  1195. mov cl,PB [edi] ; 11C
  1196. add edi,PITCH-1
  1197. add ebp,eax ; 13G
  1198. mov eax,MulByNeg8[ebx*4] ; 11D
  1199. mov bl,PB [esi+6] ; 26A
  1200. mov [edx].N8T11,eax ; 11E
  1201. add ebp,ebx ; 26B
  1202. mov eax,PD [eax+ecx*8] ; 11F
  1203. mov cl,PB [edi+6] ; 26C
  1204. add ebp,eax ; 11G
  1205. mov eax,MulByNeg8[ebx*4] ; 26D
  1206. mov bl,PB [esi+4] ; 24A
  1207. mov [edx].N8T26,eax ; 26E
  1208. add ebp,ebx ; 24B
  1209. mov eax,PD [eax+ecx*8] ; 26F
  1210. mov cl,PB [edi+4] ; 24C
  1211. add ebp,eax ; 26G
  1212. mov eax,MulByNeg8[ebx*4] ; 24D
  1213. mov bl,PB [esi+2] ; 22A
  1214. mov [edx].N8T24,eax ; 24E
  1215. add ebp,ebx ; 22B
  1216. mov eax,PD [eax+ecx*8] ; 24F
  1217. mov cl,PB [edi+2] ; 22C
  1218. add ebp,eax ; 24G
  1219. mov eax,MulByNeg8[ebx*4] ; 22D
  1220. mov bl,PB [esi] ; 20A
  1221. mov [edx].N8T22,eax ; 22E
  1222. add ebp,ebx ; 20B
  1223. mov eax,PD [eax+ecx*8] ; 22F
  1224. add esi,PITCH+1
  1225. mov cl,PB [edi] ; 20C
  1226. add edi,PITCH+1
  1227. add ebp,eax ; 22G
  1228. mov eax,MulByNeg8[ebx*4] ; 20D
  1229. mov bl,PB [esi+6] ; 37A
  1230. mov [edx].N8T20,eax ; 20E
  1231. add ebp,ebx ; 37B
  1232. mov eax,PD [eax+ecx*8] ; 20F
  1233. mov cl,PB [edi+6] ; 37C
  1234. add ebp,eax ; 20G
  1235. mov eax,MulByNeg8[ebx*4] ; 37D
  1236. mov bl,PB [esi+4] ; 35A
  1237. mov [edx].N8T37,eax ; 37E
  1238. add ebp,ebx ; 35B
  1239. mov eax,PD [eax+ecx*8] ; 37F
  1240. mov cl,PB [edi+4] ; 35C
  1241. add ebp,eax ; 37G
  1242. mov eax,MulByNeg8[ebx*4] ; 35D
  1243. mov bl,PB [esi+2] ; 33A
  1244. mov [edx].N8T35,eax ; 35E
  1245. add ebp,ebx ; 33B
  1246. mov eax,PD [eax+ecx*8] ; 35F
  1247. mov cl,PB [edi+2] ; 33C
  1248. add ebp,eax ; 35G
  1249. mov eax,MulByNeg8[ebx*4] ; 33D
  1250. mov bl,PB [esi] ; 31A
  1251. mov [edx].N8T33,eax ; 33E
  1252. add ebp,ebx ; 31B
  1253. mov eax,PD [eax+ecx*8] ; 33F
  1254. add esi,PITCH-1
  1255. mov cl,PB [edi] ; 31C
  1256. add edi,PITCH-1
  1257. add ebp,eax ; 33G
  1258. mov eax,MulByNeg8[ebx*4] ; 31D
  1259. mov bl,PB [esi+6] ; 46A
  1260. mov [edx].N8T31,eax ; 31E
  1261. add ebp,ebx ; 46B
  1262. mov eax,PD [eax+ecx*8] ; 31F
  1263. mov cl,PB [edi+6] ; 46C
  1264. add ebp,eax ; 31G
  1265. mov eax,MulByNeg8[ebx*4] ; 46D
  1266. mov bl,PB [esi+4] ; 44A
  1267. mov [edx].N8T46,eax ; 46E
  1268. add ebp,ebx ; 44B
  1269. mov eax,PD [eax+ecx*8] ; 46F
  1270. mov cl,PB [edi+4] ; 44C
  1271. add ebp,eax ; 46G
  1272. mov eax,MulByNeg8[ebx*4] ; 44D
  1273. mov bl,PB [esi+2] ; 42A
  1274. mov [edx].N8T44,eax ; 44E
  1275. add ebp,ebx ; 42B
  1276. mov eax,PD [eax+ecx*8] ; 44F
  1277. mov cl,PB [edi+2] ; 42C
  1278. add ebp,eax ; 44G
  1279. mov eax,MulByNeg8[ebx*4] ; 42D
  1280. mov bl,PB [esi] ; 40A
  1281. mov [edx].N8T42,eax ; 42E
  1282. add ebp,ebx ; 40B
  1283. mov eax,PD [eax+ecx*8] ; 42F
  1284. add esi,PITCH+1
  1285. mov cl,PB [edi] ; 40C
  1286. add edi,PITCH+1
  1287. add ebp,eax ; 42G
  1288. mov eax,MulByNeg8[ebx*4] ; 40D
  1289. mov bl,PB [esi+6] ; 57A
  1290. mov [edx].N8T40,eax ; 40E
  1291. add ebp,ebx ; 57B
  1292. mov eax,PD [eax+ecx*8] ; 40F
  1293. mov cl,PB [edi+6] ; 57C
  1294. add ebp,eax ; 40G
  1295. mov eax,MulByNeg8[ebx*4] ; 57D
  1296. mov bl,PB [esi+4] ; 55A
  1297. mov [edx].N8T57,eax ; 57E
  1298. add ebp,ebx ; 55B
  1299. mov eax,PD [eax+ecx*8] ; 57F
  1300. mov cl,PB [edi+4] ; 55C
  1301. add ebp,eax ; 57G
  1302. mov eax,MulByNeg8[ebx*4] ; 55D
  1303. mov bl,PB [esi+2] ; 53A
  1304. mov [edx].N8T55,eax ; 55E
  1305. add ebp,ebx ; 53B
  1306. mov eax,PD [eax+ecx*8] ; 55F
  1307. mov cl,PB [edi+2] ; 53C
  1308. add ebp,eax ; 55G
  1309. mov eax,MulByNeg8[ebx*4] ; 53D
  1310. mov bl,PB [esi] ; 51A
  1311. mov [edx].N8T53,eax ; 53E
  1312. add ebp,ebx ; 51B
  1313. mov eax,PD [eax+ecx*8] ; 53F
  1314. add esi,PITCH-1
  1315. mov cl,PB [edi] ; 51C
  1316. add edi,PITCH-1
  1317. add ebp,eax ; 53G
  1318. mov eax,MulByNeg8[ebx*4] ; 51D
  1319. mov bl,PB [esi+6] ; 66A
  1320. mov [edx].N8T51,eax ; 51E
  1321. add ebp,ebx ; 66B
  1322. mov eax,PD [eax+ecx*8] ; 51F
  1323. mov cl,PB [edi+6] ; 66C
  1324. add ebp,eax ; 51G
  1325. mov eax,MulByNeg8[ebx*4] ; 66D
  1326. mov bl,PB [esi+4] ; 64A
  1327. mov [edx].N8T66,eax ; 66E
  1328. add ebp,ebx ; 64B
  1329. mov eax,PD [eax+ecx*8] ; 66F
  1330. mov cl,PB [edi+4] ; 64C
  1331. add ebp,eax ; 66G
  1332. mov eax,MulByNeg8[ebx*4] ; 64D
  1333. mov bl,PB [esi+2] ; 62A
  1334. mov [edx].N8T64,eax ; 64E
  1335. add ebp,ebx ; 62B
  1336. mov eax,PD [eax+ecx*8] ; 64F
  1337. mov cl,PB [edi+2] ; 62C
  1338. add ebp,eax ; 64G
  1339. mov eax,MulByNeg8[ebx*4] ; 62D
  1340. mov bl,PB [esi] ; 60A
  1341. mov [edx].N8T62,eax ; 62E
  1342. add ebp,ebx ; 60B
  1343. mov eax,PD [eax+ecx*8] ; 62F
  1344. add esi,PITCH+1
  1345. mov cl,PB [edi] ; 60C
  1346. add edi,PITCH+1
  1347. add ebp,eax ; 62G
  1348. mov eax,MulByNeg8[ebx*4] ; 60D
  1349. mov bl,PB [esi+6] ; 77A
  1350. mov [edx].N8T60,eax ; 60E
  1351. add ebp,ebx ; 77B
  1352. mov eax,PD [eax+ecx*8] ; 60F
  1353. mov cl,PB [edi+6] ; 77C
  1354. add ebp,eax ; 60G
  1355. mov eax,MulByNeg8[ebx*4] ; 77D
  1356. mov bl,PB [esi+4] ; 75A
  1357. mov [edx].N8T77,eax ; 77E
  1358. add ebp,ebx ; 75B
  1359. mov eax,PD [eax+ecx*8] ; 77F
  1360. mov cl,PB [edi+4] ; 75C
  1361. add ebp,eax ; 77G
  1362. mov eax,MulByNeg8[ebx*4] ; 75D
  1363. mov bl,PB [esi+2] ; 73A
  1364. mov [edx].N8T75,eax ; 75E
  1365. add ebp,ebx ; 73B
  1366. mov eax,PD [eax+ecx*8] ; 75F
  1367. mov cl,PB [edi+2] ; 73C
  1368. add ebp,eax ; 75G
  1369. mov eax,MulByNeg8[ebx*4] ; 73D
  1370. mov bl,PB [esi] ; 71A
  1371. mov [edx].N8T73,eax ; 73E
  1372. add ebp,ebx ; 71B
  1373. mov eax,PD [eax+ecx*8] ; 73F
  1374. mov cl,PB [edi] ; 71C
  1375. add esi,PITCH-1-PITCH*8+8
  1376. add edi,PITCH-1-PITCH*8+8
  1377. add ebp,eax ; 73G
  1378. mov eax,MulByNeg8[ebx*4] ; 71D
  1379. mov ebx,ebp
  1380. mov [edx].N8T71,eax ; 71E
  1381. and ebx,000001FFFH ; Extract sum of target pels.
  1382. add edx,BlockLen ; Move to next output block
  1383. mov eax,PD [eax+ecx*8] ; 71F
  1384. mov [edx-BlockLen].AccumTargetPels,ebx ; Store acc of target pels for block.
  1385. add eax,ebp ; 71G
  1386. and ebp,000006000H ; Extract loop control
  1387. shr eax,16 ; Extract SWD; CF == 1 every second iter.
  1388. mov ebx,ecx
  1389. mov [edx-BlockLen].CentralInterSWD,eax ; Store SWD for 0-motion vector.
  1390. jnc PrepMatchPointsNextBlock
  1391. add esi,PITCH*8-16 ; Advance to block 3, or off end.
  1392. add edi,PITCH*8-16 ; Advance to block 3, or off end.
  1393. xor ebp,000002000H
  1394. jne PrepMatchPointsNextBlock ; Jump if advancing to block 3.
  1395. mov ebx,CurrSWDState ; Fetch First State Number for engine.
  1396. mov edi,Block1.CentralInterSWD
  1397. test bl,bl ; Test for INTRA-BY-DECREE.
  1398. je IntraByDecree
  1399. add eax,Block2.CentralInterSWD
  1400. add edi,Block3.CentralInterSWD
  1401. add eax,edi
  1402. mov edx,ZeroVectorThreshold
  1403. cmp eax,edx ; Compare 0-MV against ZeroVectorThresh
  1404. jle BelowZeroThresh ; Jump if 0-MV is good enough.
  1405. mov cl,PB SWDState[ebx*8+3] ; cl == Index of inc to apply to central
  1406. ; ; point to get to ref1.
  1407. mov bl,PB SWDState[ebx*8+5] ; bl == Same as cl, but for ref2.
  1408. mov edx,TargToRef
  1409. mov MB0MVInterSWD,eax ; Stash SWD for zero motion vector.
  1410. mov edi,PD OffsetToRef[ebx] ; Get inc to apply to ctr to get to ref2.
  1411. mov ebp,PD OffsetToRef[ecx] ; Get inc to apply to ctr to get to ref1.
  1412. lea esi,[esi+edx-PITCH*16] ; Calculate address of 0-MV ref block.
  1413. ;
  1414. mov MBAddrCentralPoint,esi ; Set central point to 0-MV.
  1415. mov MBCentralInterSWD,eax
  1416. mov eax,Block1.CentralInterSWD ; Stash Zero MV SWD, in case we decide
  1417. mov edx,Block2.CentralInterSWD ; the best non-zero MV isn't enough
  1418. mov Block1.ZeroMVInterSWD,eax ; better than the zero MV.
  1419. mov Block2.ZeroMVInterSWD,edx
  1420. mov eax,Block3.CentralInterSWD
  1421. mov edx,Block4.CentralInterSWD
  1422. mov Block3.ZeroMVInterSWD,eax
  1423. mov Block4.ZeroMVInterSWD,edx
  1424. ; Activity Details for this section of code (refer to flow diagram above):
  1425. ;
  1426. ; 5) The SWD for two different reference macroblocks is calculated; ref1
  1427. ; into the high order 16 bits of ebp, and ref2 into the low 16 bits.
  1428. ; This is performed for each iteration of the state engine. A normal,
  1429. ; internal macroblock will perform 6 iterations, searching +/- 4
  1430. ; vertically, then +/- 4 horizontally, then +/- 2 vertically, then
  1431. ; +/- 2 horizontally, then +/- 1 vertically, then +/- 1 horizontally.
  1432. ;
  1433. ; Register usage for this section:
  1434. ;
  1435. ; Input:
  1436. ;
  1437. ; esi -- Addr of 0-motion macroblock in ref frame.
  1438. ; ebp -- Increment to apply to get to first ref1 macroblock.
  1439. ; edi -- Increment to apply to get to first ref2 macroblock.
  1440. ; ebx, ecx -- High order 24 bits are zero.
  1441. ;
  1442. ; Output:
  1443. ;
  1444. ; ebp -- SWD for the best-fit reference macroblock.
  1445. ; ebx -- Index of increment to apply to get to best-fit reference MB.
  1446. ; MBAddrCentralPoint -- the best-fit of the previous iteration; it is the
  1447. ; value to which OffsetToRef[ebx] must be added.
  1448. ;
  1449. ;
  1450. ; Expected performance for SWDLoop code:
  1451. ;
  1452. ; Execution frequency: Six times per block for which motion analysis is done
  1453. ; beyond the 0-motion vector.
  1454. ;
  1455. ; Pentium (tm) microprocessor times per six iterations:
  1456. ; 180 clocks for instruction execution setup to DoSWDLoop
  1457. ; 2520 clocks for DoSWDLoop procedure, instruction execution.
  1458. ; 192 clocks for bank conflicts in DoSWDLoop
  1459. ; 30 clocks generously estimated for an average of 6 cache line fills for
  1460. ; the reference area.
  1461. ; ----
  1462. ; 2922 clocks total time for this section.
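;
; Sketch of how the loop below reads SWDState (hedged: inferred from the
; table layout and the indexing in the code; the C names are hypothetical
; and do not appear in this source):
;
;   typedef struct {             /* one 8-byte state record          */
;       struct {
;           unsigned char next;  /* next state if this point wins    */
;           unsigned char inc;   /* index used with OffsetToRef      */
;       } pt[3];                 /* central point, ref1, ref2        */
;       unsigned char pad[2];    /* the two "?" bytes                */
;   } SWDStateRec;
;
;   /* pick == 0 for central, 1 for ref1, 2 for ref2 */
;   central += offset_to_ref(state[cur].pt[pick].inc);
;   cur      = state[cur].pt[pick].next;      /* 0 ends the search */
;
; In the assembly, dl holds pick*2 (0, 2 or 4), so SWDState[edx+edi*8] is
; the "next" byte and SWDState[edx+edi*8+1] is the "inc" byte.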
  1463. MBFullPelMotionSearchLoop:
  1464. lea edi,[esi+edi+PITCH*8+8]
  1465. lea esi,[esi+ebp+PITCH*8+8]
  1466. mov Block4.Ref1Addr,esi
  1467. mov Block4.Ref2Addr,edi
  1468. sub esi,8
  1469. sub edi,8
  1470. mov Block3.Ref1Addr,esi
  1471. mov Block3.Ref2Addr,edi
  1472. sub esi,PITCH*8-8
  1473. sub edi,PITCH*8-8
  1474. mov Block2.Ref1Addr,esi
  1475. mov Block2.Ref2Addr,edi
  1476. sub esi,8
  1477. sub edi,8
  1478. mov Block1.Ref1Addr,esi
  1479. mov Block1.Ref2Addr,edi
  1480. ; esi -- Points to ref1
  1481. ; edi -- Points to ref2
  1482. ; ecx -- Upper 24 bits zero
  1483. ; ebx -- Upper 24 bits zero
  1484. call DoSWDLoop
  1485. ; ebp -- Ref1 SWD for block 4
  1486. ; edx -- Ref2 SWD for block 4
  1487. ; ecx -- Upper 24 bits zero
  1488. ; ebx -- Upper 24 bits zero
  1489. mov esi,MBCentralInterSWD ; Get SWD for central point of these 3 refs
  1490. xor eax,eax
  1491. add ebp,Block1.Ref1InterSWD
  1492. add edx,Block1.Ref2InterSWD
  1493. add ebp,Block2.Ref1InterSWD
  1494. add edx,Block2.Ref2InterSWD
  1495. add ebp,Block3.Ref1InterSWD
  1496. add edx,Block3.Ref2InterSWD
  1497. cmp ebp,edx ; Carry flag == 1 iff ref1 SWD < ref2 SWD.
  1498. mov edi,CurrSWDState ; Restore current state number.
  1499. adc eax,eax ; eax == 1 iff ref1 SWD < ref2 SWD.
  1500. cmp ebp,esi ; Carry flag == 1 iff ref1 SWD < central SWD.
  1501. adc eax,eax ;
  1502. cmp edx,esi ; Carry flag == 1 iff ref2 SWD < central SWD.
  1503. adc eax,eax ; 0 --> Pick central point.
  1504. ; ; 1 --> Pick ref2.
  1505. ; ; 2 --> Not possible.
  1506. ; ; 3 --> Pick ref2.
  1507. ; ; 4 --> Pick central point.
  1508. ; ; 5 --> Not possible.
  1509. ; ; 6 --> Pick ref1.
  1510. ; ; 7 --> Pick ref1.
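; Note (editorial sketch, not part of the original listing): the three
; cmp/adc pairs above build a 3-bit index from unsigned comparisons. In
; C-like pseudocode, with r1, r2 and c as illustrative names for the ref1,
; ref2 and central SWDs:
;
;   idx = ((r1 < r2) << 2) | ((r1 < c) << 1) | (r2 < c);
;
; Indices 2 and 5 cannot occur: 2 would require r2 <= r1 < c <= r2, and
; 5 would require r1 < r2 < c <= r1.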
  1511. mov MBRef2InterSWD,edx
  1512. mov MBRef1InterSWD,ebp
  1513. xor edx,edx
  1514. mov dl,PB PickPoint[eax] ; dl == 0: central pt; 2: ref1; 4: ref2
  1515. mov esi,MBAddrCentralPoint ; Reload address of central ref block.
  1516. ;
  1517. ;
  1518. mov ebp,Block1.CentralInterSWD[edx*2] ; Get SWD for each block, picked pt.
  1519. mov al,PB SWDState[edx+edi*8+1] ; al == Index of inc to apply to old central
  1520. ; ; point to get new central point.
  1521. mov Block1.CentralInterSWD,ebp ; Stash SWD for new central point.
  1522. mov ebp,Block2.CentralInterSWD[edx*2]
  1523. mov Block2.CentralInterSWD,ebp
  1524. mov ebp,Block3.CentralInterSWD[edx*2]
  1525. mov Block3.CentralInterSWD,ebp
  1526. mov ebp,Block4.CentralInterSWD[edx*2]
  1527. mov Block4.CentralInterSWD,ebp
  1528. mov ebp,MBCentralInterSWD[edx*2]; Get the SWD for the point we picked.
  1529. mov dl,PB SWDState[edx+edi*8] ; dl == New state number.
  1530. mov MBCentralInterSWD,ebp ; Stash SWD for new central point.
  1531. mov edi,PD OffsetToRef[eax] ; Get inc to apply to get to new central pt.
  1532. mov CurrSWDState,edx ; Stash current state number.
  1533. mov bl,PB SWDState[edx*8+3] ; bl == Index of inc to apply to central
  1534. ; ; point to get to next ref1.
  1535. mov cl,PB SWDState[edx*8+5] ; cl == Same as bl, but for ref2.
  1536. add esi,edi ; Move to new central point.
  1537. test dl,dl
  1538. mov ebp,PD OffsetToRef[ebx] ; Get inc to apply to ctr to get to ref1.
  1539. mov edi,PD OffsetToRef[ecx] ; Get inc to apply to ctr to get to ref2.
  1540. mov MBAddrCentralPoint,esi ; Stash address of new central ref block.
  1541. jne MBFullPelMotionSearchLoop ; Jump if not done searching.
  1542. ;Done searching for integer motion vector for full macroblock
  1543. IF PITCH-384
  1544. *** Error: The magic leaks out of the following code if PITCH isn't 384.
  1545. ENDIF
  1546. mov ecx,TargToRef ; To Linearize MV for winning ref blk.
  1547. mov eax,esi ; Copy of ref macroblock addr.
  1548. sub eax,ecx ; To Linearize MV for winning ref blk.
  1549. mov ecx,TargetMBAddr
  1550. sub eax,ecx
  1551. mov edx,MBlockActionStream ; Fetch list ptr.
  1552. mov ebx,eax
  1553. mov ebp,DoHalfPelEstimation ; Are we doing half pel motion estimation?
  1554. shl eax,25 ; Extract horz motion component.
  1555. mov [edx].BlkY1.PastRef,esi ; Save address of reference MB selected.
  1556. sar ebx,8 ; Hi 24 bits of linearized MV lookup vert MV.
  1557. mov ecx,MBCentralInterSWD
  1558. sar eax,24 ; Finish extract horz motion component.
  1559. test ebp,ebp
  1560. mov bl,PB UnlinearizedVertMV[ebx] ; Look up proper vert motion vector.
  1561. mov [edx].BlkY1.PHMV,al ; Save winning horz motion vector.
  1562. mov [edx].BlkY1.PVMV,bl ; Save winning vert motion vector.
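; Note (editorial sketch; this reading of the PITCH == 384 dependence is an
; inference from the code above, not a statement by the original authors).
; Here lin, dx and dy are illustrative names: lin = dy*PITCH + dx is the
; linearized motion vector, with dx assumed to lie in -64..63. Because 384
; is a multiple of 128, the low 7 bits of lin depend only on dx, so:
;
;   hmv = sign_extend_7(lin & 0x7F) * 2   ; shl 25 / sar 24, half-pel units
;   vmv = UnlinearizedVertMV[lin >> 8]    ; arithmetic shift, table lookup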
  1563. IFDEF H261
  1564. ELSE
  1565. je SkipHalfPelSearch_1MV
  1566. ;Search for half pel motion vector for full macroblock.
  1567. mov Block1.AddrCentralPoint,esi
  1568. lea ebp,[esi+8]
  1569. mov Block2.AddrCentralPoint,ebp
  1570. add ebp,PITCH*8-8
  1571. mov Block3.AddrCentralPoint,ebp
  1572. xor ecx,ecx
  1573. mov cl,[edx].FirstMEState
  1574. add ebp,8
  1575. mov edi,esi
  1576. mov Block4.AddrCentralPoint,ebp
  1577. mov ebp,InitHalfPelSearchHorz[ecx*4-4]
  1578. ; ebp -- Initialized to 0, except when can't search off left or right edge.
  1579. ; edi -- Ref addr for block 1. Ref1 is .5 pel to left. Ref2 is .5 to right.
  1580. call DoSWDHalfPelHorzLoop
  1581. ; ebp, ebx -- Zero
  1582. ; ecx -- Ref1 SWD for block 4
  1583. ; edx -- Ref2 SWD for block 4
  1584. mov esi,MBlockActionStream
  1585. xor eax,eax ; Keep pairing happy
  1586. add ecx,Block1.Ref1InterSWD
  1587. add edx,Block1.Ref2InterSWD
  1588. add ecx,Block2.Ref1InterSWD
  1589. add edx,Block2.Ref2InterSWD
  1590. add ecx,Block3.Ref1InterSWD
  1591. add edx,Block3.Ref2InterSWD
  1592. mov bl,[esi].FirstMEState
  1593. mov edi,Block1.AddrCentralPoint
  1594. cmp ecx,edx
  1595. jl MBHorz_Ref1LTRef2
  1596. mov ebp,MBCentralInterSWD
  1597. mov esi,MBlockActionStream
  1598. sub ebp,edx
  1599. jle MBHorz_CenterBest
  1600. mov al,[esi].BlkY1.PHMV ; Half pel to the right is best.
  1601. mov ecx,Block1.Ref2InterSWD
  1602. mov Block1.CentralInterSWD_BLS,ecx
  1603. mov ecx,Block3.Ref2InterSWD
  1604. mov Block3.CentralInterSWD_BLS,ecx
  1605. mov ecx,Block2.Ref2InterSWD
  1606. mov Block2.CentralInterSWD_BLS,ecx
  1607. mov ecx,Block4.Ref2InterSWD
  1608. mov Block4.CentralInterSWD_BLS,ecx
  1609. inc al
  1610. mov [esi].BlkY1.PHMV,al
  1611. jmp MBHorz_Done
  1612. MBHorz_CenterBest:
  1613. mov ecx,Block1.CentralInterSWD
  1614. xor ebp,ebp
  1615. mov Block1.CentralInterSWD_BLS,ecx
  1616. mov ecx,Block2.CentralInterSWD
  1617. mov Block2.CentralInterSWD_BLS,ecx
  1618. mov ecx,Block3.CentralInterSWD
  1619. mov Block3.CentralInterSWD_BLS,ecx
  1620. mov ecx,Block4.CentralInterSWD
  1621. mov Block4.CentralInterSWD_BLS,ecx
  1622. jmp MBHorz_Done
  1623. MBHorz_Ref1LTRef2:
  1624. mov ebp,MBCentralInterSWD
  1625. mov esi,MBlockActionStream
  1626. sub ebp,ecx
  1627. jle MBHorz_CenterBest
  1628. mov al,[esi].BlkY1.PHMV ; Half pel to the left is best.
  1629. mov edx,[esi].BlkY1.PastRef
  1630. dec al
  1631. mov ecx,Block1.Ref1InterSWD
  1632. mov Block1.CentralInterSWD_BLS,ecx
  1633. mov ecx,Block3.Ref1InterSWD
  1634. mov Block3.CentralInterSWD_BLS,ecx
  1635. mov ecx,Block2.Ref1InterSWD
  1636. mov Block2.CentralInterSWD_BLS,ecx
  1637. mov ecx,Block4.Ref1InterSWD
  1638. mov Block4.CentralInterSWD_BLS,ecx
  1639. dec edx
  1640. mov [esi].BlkY1.PHMV,al
  1641. mov [esi].BlkY1.PastRef,edx
  1642. MBHorz_Done:
  1643. mov HalfPelHorzSavings,ebp
  1644. mov ebp,InitHalfPelSearchVert[ebx*4-4]
  1645. ; ebp -- Initialized to 0, except when can't search off top or bottom edge.
  1646. ; edi -- Ref addr for block 1. Ref1 is .5 pel above. Ref2 is .5 below.
  1647. call DoSWDHalfPelVertLoop
  1648. ; ebp, ebx -- Zero
  1649. ; ecx -- Ref1 SWD for block 4
  1650. ; edx -- Ref2 SWD for block 4
  1651. add ecx,Block1.Ref1InterSWD
  1652. add edx,Block1.Ref2InterSWD
  1653. add ecx,Block2.Ref1InterSWD
  1654. add edx,Block2.Ref2InterSWD
  1655. add ecx,Block3.Ref1InterSWD
  1656. add edx,Block3.Ref2InterSWD
  1657. cmp ecx,edx
  1658. jl MBVert_Ref1LTRef2
  1659. mov ebp,MBCentralInterSWD
  1660. mov esi,MBlockActionStream
  1661. sub ebp,edx
  1662. jle MBVert_CenterBest
  1663. mov ecx,Block1.CentralInterSWD
  1664. mov edx,Block1.Ref2InterSWD
  1665. sub ecx,edx
  1666. mov edx,Block1.CentralInterSWD_BLS
  1667. sub edx,ecx
  1668. mov al,[esi].BlkY1.PVMV ; Half pel below is best.
  1669. mov Block1.CentralInterSWD,edx
  1670. inc al
  1671. mov ecx,Block3.CentralInterSWD
  1672. mov edx,Block3.Ref2InterSWD
  1673. sub ecx,edx
  1674. mov edx,Block3.CentralInterSWD_BLS
  1675. sub edx,ecx
  1676. mov ecx,Block2.CentralInterSWD
  1677. mov Block3.CentralInterSWD,edx
  1678. mov edx,Block2.Ref2InterSWD
  1679. sub ecx,edx
  1680. mov edx,Block2.CentralInterSWD_BLS
  1681. sub edx,ecx
  1682. mov ecx,Block4.CentralInterSWD
  1683. mov Block2.CentralInterSWD,edx
  1684. mov edx,Block4.Ref2InterSWD
  1685. sub ecx,edx
  1686. mov edx,Block4.CentralInterSWD_BLS
  1687. sub edx,ecx
  1688. mov [esi].BlkY1.PVMV,al
  1689. mov Block4.CentralInterSWD,edx
  1690. jmp MBVert_Done
  1691. MBVert_CenterBest:
  1692. mov ecx,Block1.CentralInterSWD_BLS
  1693. xor ebp,ebp
  1694. mov Block1.CentralInterSWD,ecx
  1695. mov ecx,Block2.CentralInterSWD_BLS
  1696. mov Block2.CentralInterSWD,ecx
  1697. mov ecx,Block3.CentralInterSWD_BLS
  1698. mov Block3.CentralInterSWD,ecx
  1699. mov ecx,Block4.CentralInterSWD_BLS
  1700. mov Block4.CentralInterSWD,ecx
  1701. jmp MBVert_Done
  1702. MBVert_Ref1LTRef2:
  1703. mov ebp,MBCentralInterSWD
  1704. mov esi,MBlockActionStream
  1705. sub ebp,ecx
  1706. jle MBVert_CenterBest
  1707. mov ecx,Block1.CentralInterSWD
  1708. mov edx,Block1.Ref1InterSWD
  1709. sub ecx,edx
  1710. mov edx,Block1.CentralInterSWD_BLS
  1711. sub edx,ecx
  1712. mov al,[esi].BlkY1.PVMV ; Half pel above is best.
  1713. mov Block1.CentralInterSWD,edx
  1714. dec al
  1715. mov ecx,Block3.CentralInterSWD
  1716. mov edx,Block3.Ref1InterSWD
  1717. sub ecx,edx
  1718. mov edx,Block3.CentralInterSWD_BLS
  1719. sub edx,ecx
  1720. mov ecx,Block2.CentralInterSWD
  1721. mov Block3.CentralInterSWD,edx
  1722. mov edx,Block2.Ref1InterSWD
  1723. sub ecx,edx
  1724. mov edx,Block2.CentralInterSWD_BLS
  1725. sub edx,ecx
  1726. mov ecx,Block4.CentralInterSWD
  1727. mov Block2.CentralInterSWD,edx
  1728. mov edx,Block4.Ref1InterSWD
  1729. sub ecx,edx
  1730. mov edx,Block4.CentralInterSWD_BLS
  1731. sub edx,ecx
  1732. mov ecx,[esi].BlkY1.PastRef
  1733. mov Block4.CentralInterSWD,edx
  1734. sub ecx,PITCH
  1735. mov [esi].BlkY1.PVMV,al
  1736. mov [esi].BlkY1.PastRef,ecx
  1737. MBVert_Done:
  1738. mov ecx,HalfPelHorzSavings
  1739. mov edx,esi
  1740. add ebp,ecx ; Savings for horz and vert half pel motion.
  1741. mov ecx,MBCentralInterSWD ; Reload SWD for new central point.
  1742. sub ecx,ebp ; Approx SWD for prescribed half pel motion.
  1743. mov esi,[edx].BlkY1.PastRef ; Reload address of reference MB selected.
  1744. mov MBCentralInterSWD,ecx
  1745. SkipHalfPelSearch_1MV:
  1746. ENDIF ; H263
  1747. mov ebp,[edx].BlkY1.MVs ; Load Motion Vectors
  1748. add esi,8
  1749. mov [edx].BlkY2.PastRef,esi
  1750. mov [edx].BlkY2.MVs,ebp
  1751. lea edi,[esi+PITCH*8]
  1752. add esi,PITCH*8-8
  1753. mov [edx].BlkY3.PastRef,esi
  1754. mov [edx].BlkY3.MVs,ebp
  1755. mov [edx].BlkY4.PastRef,edi
  1756. mov [edx].BlkY4.MVs,ebp
  1757. IFDEF H261
  1758. ELSE ; H263
  1759. mov MBMotionVectors,ebp ; Stash macroblock level motion vectors.
  1760. mov ebp,640 ; ??? BlockMVDifferential
  1761. cmp ecx,ebp
  1762. jl NoBlockMotionVectors
  1763. mov ecx,DoBlockLevelVectors
  1764. test ecx,ecx ; Are we doing block level motion vectors?
  1765. je NoBlockMotionVectors
  1766. ; Activity Details for this section of code (refer to flow diagram above):
  1767. ;
  1768. ; The following search is done similarly to the searches done above, except
  1769. ; these are block searches, instead of macroblock searches.
  1770. ;
  1771. ; Expected performance:
  1772. ;
  1773. ; Execution frequency: Six times per block for which motion analysis is done
  1774. ; beyond the 0-motion vector.
  1775. ;
  1776. ; Pentium (tm) microprocessor times per six iterations:
  1777. ; 180 clocks for instruction execution setup to DoSWDLoop
  1778. ; 2520 clocks for DoSWDLoop procedure, instruction execution.
  1779. ; 192 clocks for bank conflicts in DoSWDLoop
  1780. ; 30 clocks generously estimated for an average of 6 cache line fills for
  1781. ; the reference area.
  1782. ; ----
  1783. ; 2922 clocks total time for this section.
  1784. ;
  1785. ; Set up for the "BlkFullPelSWDLoop_4blks" loop to follow.
  1786. ; - Store the SWD values for blocks 4, 3, 2, 1.
  1787. ; - Compute and store the address of the central reference
  1788. ; point for blocks 1, 2, 3, 4.
  1789. ; - Compute and store the first address for ref 1 (minus 4
  1790. ; pels horizontally) and ref 2 (plus 4 pels horizontally)
  1791. ; for blocks 4, 3, 2, 1 (in that order).
  1792. ; - Initialize MotionOffsetsCursor
  1793. ; - On exit:
  1794. ; esi = ref 1 address for block 1
  1795. ; edi = ref 2 address for block 1
  1796. ;
  1797. mov esi,Block4.CentralInterSWD
  1798. mov edi,Block3.CentralInterSWD
  1799. mov Block4.CentralInterSWD_BLS,esi
  1800. mov Block3.CentralInterSWD_BLS,edi
  1801. mov esi,Block2.CentralInterSWD
  1802. mov edi,Block1.CentralInterSWD
  1803. mov Block2.CentralInterSWD_BLS,esi
  1804. mov eax,MBAddrCentralPoint ; Reload addr of central, integer pel ref MB.
  1805. mov Block1.CentralInterSWD_BLS,edi
  1806. mov Block1.AddrCentralPoint,eax
  1807. lea edi,[eax+PITCH*8+8+1]
  1808. lea esi,[eax+PITCH*8+8-1]
  1809. mov Block4.Ref1Addr,esi
  1810. mov Block4.Ref2Addr,edi
  1811. sub esi,8
  1812. add eax,8
  1813. mov Block2.AddrCentralPoint,eax
  1814. add eax,PITCH*8-8
  1815. mov Block3.AddrCentralPoint,eax
  1816. add eax,8
  1817. mov Block4.AddrCentralPoint,eax
  1818. sub edi,8
  1819. mov Block3.Ref1Addr,esi
  1820. mov Block3.Ref2Addr,edi
  1821. sub esi,PITCH*8-8
  1822. sub edi,PITCH*8-8
  1823. mov Block2.Ref1Addr,esi
  1824. mov Block2.Ref2Addr,edi
  1825. sub esi,8
  1826. mov eax,OFFSET MotionOffsets
  1827. mov MotionOffsetsCursor,eax
  1828. sub edi,8
  1829. mov Block1.Ref1Addr,esi
  1830. mov Block1.Ref2Addr,edi
  1831. ;
  1832. ; This loop will execute 6 times:
  1833. ; +- 4 pels horizontally
  1834. ; +- 4 pels vertically
  1835. ; +- 2 pels horizontally
  1836. ; +- 2 pels vertically
  1837. ; +- 1 pel horizontally
  1838. ; +- 1 pel vertically
  1839. ; It terminates when ref1 = ref2. This simple termination
  1840. ; condition is what forces unrestricted motion vectors (UMV)
  1841. ; to be ON when advanced prediction (4MV) is ON. Otherwise
  1842. ; we would need a state engine as above to distinguish edge
  1843. ; pels.
  1844. ;
  1845. BlkFullPelSWDLoop_4blks:
  1846. ; esi -- Points to ref1
  1847. ; edi -- Points to ref2
  1848. ; ecx -- Upper 24 bits zero
  1849. ; ebx -- Upper 24 bits zero
  1850. call DoSWDLoop
  1851. ; ebp -- Ref1 SWD for block 4
  1852. ; edx -- Ref2 SWD for block 4
  1853. ; ecx -- Upper 24 bits zero
  1854. ; ebx -- Upper 24 bits zero
  1855. mov eax,MotionOffsetsCursor
  1856. BlkFullPelSWDLoop_1blk:
  1857. xor esi,esi
  1858. cmp ebp,edx ; CF == 1 iff ref1 SWD < ref2 SWD.
  1859. mov edi,BlockNM1.CentralInterSWD_BLS; Get SWD for central pt of these 3 refs
  1860. adc esi,esi ; esi == 1 iff ref1 SWD < ref2 SWD.
  1861. cmp ebp,edi ; CF == 1 iff ref1 SWD < central SWD.
  1862. mov ebp,BlockNM2.Ref1InterSWD ; Fetch next block's Ref1 SWD.
  1863. adc esi,esi
  1864. cmp edx,edi ; CF == 1 iff ref2 SWD < central SWD.
  1865. adc esi,esi ; 0 --> Pick central point.
  1866. ; ; 1 --> Pick ref2.
  1867. ; ; 2 --> Not possible.
  1868. ; ; 3 --> Pick ref2.
  1869. ; ; 4 --> Pick central point.
  1870. ; ; 5 --> Not possible.
  1871. ; ; 6 --> Pick ref1.
  1872. ; ; 7 --> Pick ref1.
  1873. mov edx,BlockNM2.Ref2InterSWD ; Fetch next block's Ref2 SWD.
  1874. sub esp,BlockLen ; Move ahead to next block.
  1875. mov edi,[eax] ; Next ref2 motion vector offset.
  1876. mov cl,PickPoint_BLS[esi] ; cl == 6: central pt; 2: ref1; 4: ref2
  1877. mov ebx,esp ; For testing completion.
  1878. ;
  1879. ;
  1880. mov esi,BlockN.AddrCentralPoint[ecx*2-12] ; Get the addr for pt we picked.
  1881. mov ecx,BlockN.CentralInterSWD[ecx*2] ; Get the SWD for point we picked.
  1882. mov BlockN.AddrCentralPoint,esi ; Stash addr for new central point.
  1883. sub esi,edi ; Compute next ref1 addr.
  1884. mov BlockN.Ref1Addr,esi ; Stash next ref1 addr.
  1885. mov BlockN.CentralInterSWD_BLS,ecx ; Stash the SWD for central point.
  1886. lea edi,[esi+edi*2] ; Compute next ref2 addr.
  1887. xor ecx,ecx
  1888. mov BlockN.Ref2Addr,edi ; Stash next ref2 addr.
  1889. and ebx,00000001FH ; Done when esp at 32-byte bound.
  1890. jne BlkFullPelSWDLoop_1blk
  1891. add esp,BlockLen*4
  1892. add eax,4 ; Advance MotionOffsets pointer.
  1893. mov MotionOffsetsCursor,eax
  1894. cmp esi,edi
  1895. jne BlkFullPelSWDLoop_4blks
  1896. IF PITCH-384
  1897. *** Error: The magic leaks out of the following code if PITCH isn't 384.
  1898. ENDIF
  1899. ;
  1900. ; The following code has been modified to decode the motion vectors correctly.
  1901. ; The previous code simply subtracted the target frame base address
  1902. ; from the chosen (central) reference block address.
  1903. ; Now the address of the beginning reference macroblock is computed in ebp
  1904. ; and subtracted from the chosen (central) reference block address.
  1905. ; Then, for blocks 2, 3, and 4, the distance from block 1 to that block
  1906. ; is subtracted. Care was taken to preserve the original pairing.
  1907. ;
  1908. mov esi,Block1.AddrCentralPoint ; B1a Reload address of central ref block.
  1909. mov ebp,TargetMBAddr ; **** CHANGE **** addr. of target MB
  1910. mov edi,Block2.AddrCentralPoint ; B2a
  1911. add ebp,TargToRef ; **** CHANGE **** add Reference - Target
  1912. ; mov ebp,PreviousFrameBaseAddress **** CHANGE **** DELETED
  1913. mov Block1.Ref1Addr,esi ; B1b Stash addr central ref block.
  1914. sub esi,ebp ; B1c Addr of ref blk, but in target frame.
  1915. mov Block2.Ref1Addr,edi ; B2b
  1916. sub edi,ebp ; B2c
  1917. sub edi,8 ; **** CHANGE **** Correct for block 2
  1918. mov eax,esi ; B1e Copy linearized MV.
  1919. sar esi,8 ; B1f High 24 bits of lin MV lookup vert MV.
  1920. mov ebx,edi ; B2e
  1921. sar edi,8 ; B2f
  1922. add eax,eax ; B1g Sign extend HMV; *2 (# of half pels).
  1923. mov Block1.BlkHMV,al ; B1h Save winning horz motion vector.
  1924. add ebx,ebx ; B2g
  1925. mov Block2.BlkHMV,bl ; B2h
  1926. mov al,UnlinearizedVertMV[esi] ; B1i Look up proper vert motion vector.
  1927. mov Block1.BlkVMV,al ; B1j Save winning vert motion vector.
  1928. mov al,UnlinearizedVertMV[edi] ; B2i
  1929. mov esi,Block3.AddrCentralPoint ; B3a
  1930. mov edi,Block4.AddrCentralPoint ; B4a
  1931. mov Block3.Ref1Addr,esi ; B3b
  1932. mov Block4.Ref1Addr,edi ; B4b
  1933. mov Block2.BlkVMV,al ; B2j
  1934. sub esi,ebp ; B3c
  1935. sub esi,8*PITCH ; **** CHANGE **** Correct for block 3
  1936. sub edi,ebp ; B4c
  1937. sub edi,8*PITCH+8 ; **** CHANGE **** Correct for block 4
  1938. mov eax,esi ; B3e
  1939. sar esi,8 ; B3f
  1940. mov ebx,edi ; B4e
  1941. sar edi,8 ; B4f
  1942. add eax,eax ; B3g
  1943. mov Block3.BlkHMV,al ; B3h
  1944. add ebx,ebx ; B4g
  1945. mov Block4.BlkHMV,bl ; B4h
  1946. mov al,UnlinearizedVertMV[esi] ; B3i
  1947. mov Block3.BlkVMV,al ; B3j
  1948. mov al,UnlinearizedVertMV[edi] ; B4i
  1949. mov ebp,Block1.CentralInterSWD_BLS
  1950. mov ebx,Block2.CentralInterSWD_BLS
  1951. add ebp,Block3.CentralInterSWD_BLS
  1952. add ebx,Block4.CentralInterSWD_BLS
  1953. add ebx,ebp
  1954. mov Block4.BlkVMV,al ; B4j
  1955. mov ecx,DoHalfPelEstimation
  1956. mov MBCentralInterSWD_BLS,ebx
  1957. test ecx,ecx
  1958. je NoHalfPelBlockLevelMVs
  1959. HalfPelBlockLevelMotionSearch:
  1960. mov edi,Block1.AddrCentralPoint
  1961. xor ebp,ebp
  1962. ; ebp -- Initialized to 0, implying can search both left and right.
  1963. ; edi -- Ref addr for block 1. Ref1 is .5 pel to left. Ref2 is .5 to right.
  1964. call DoSWDHalfPelHorzLoop
  1965. ; ebp, ebx -- Zero
  1966. ; ecx -- Ref1 SWD for block 4
  1967. ; edx -- Ref2 SWD for block 4
  1968. NextBlkHorz:
  1969. mov ebx,BlockNM1.CentralInterSWD_BLS
  1970. cmp ecx,edx
  1971. mov BlockNM1.HalfPelSavings,ebp
  1972. jl BlkHorz_Ref1LTRef2
  1973. mov al,BlockNM1.BlkHMV
  1974. sub esp,BlockLen
  1975. sub ebx,edx
  1976. jle BlkHorz_CenterBest
  1977. inc al
  1978. mov BlockN.HalfPelSavings,ebx
  1979. mov BlockN.BlkHMV,al
  1980. jmp BlkHorz_Done
  1981. BlkHorz_Ref1LTRef2:
  1982. mov al,BlockNM1.BlkHMV
  1983. sub esp,BlockLen
  1984. sub ebx,ecx
  1985. jle BlkHorz_CenterBest
  1986. mov ecx,BlockN.Ref1Addr
  1987. dec al
  1988. mov BlockN.HalfPelSavings,ebx
  1989. dec ecx
  1990. mov BlockN.BlkHMV,al
  1991. mov BlockN.Ref1Addr,ecx
  1992. BlkHorz_CenterBest:
  1993. BlkHorz_Done:
  1994. mov ecx,BlockNM1.Ref1InterSWD
  1995. mov edx,BlockNM1.Ref2InterSWD
  1996. test esp,000000018H
  1997. jne NextBlkHorz
  1998. mov edi,BlockN.AddrCentralPoint
  1999. add esp,BlockLen*4
  2000. ; ebp -- Initialized to 0, implying search both up and down is okay.
  2001. ; edi -- Ref addr for block 1. Ref1 is .5 pel above. Ref2 is .5 below.
  2002. call DoSWDHalfPelVertLoop
  2003. ; ebp, ebx -- Zero
  2004. ; ecx -- Ref1 SWD for block 4
  2005. ; edx -- Ref2 SWD for block 4
  2006. NextBlkVert:
  2007. mov ebx,BlockNM1.CentralInterSWD_BLS
  2008. cmp ecx,edx
  2009. mov edi,BlockNM1.HalfPelSavings
  2010. jl BlkVert_Ref1LTRef2
  2011. mov al,BlockNM1.BlkVMV
  2012. sub esp,BlockLen
  2013. sub edx,ebx
  2014. jge BlkVert_CenterBest
  2015. inc al
  2016. sub edi,edx
  2017. mov BlockN.BlkVMV,al
  2018. jmp BlkVert_Done
  2019. BlkVert_Ref1LTRef2:
  2020. mov al,BlockNM1.BlkVMV
  2021. sub esp,BlockLen
  2022. sub ecx,ebx
  2023. jge BlkVert_CenterBest
  2024. sub edi,ecx
  2025. mov ecx,BlockN.Ref1Addr
  2026. dec al
  2027. sub ecx,PITCH
  2028. mov BlockN.BlkVMV,al
  2029. mov BlockN.Ref1Addr,ecx
  2030. BlkVert_CenterBest:
  2031. BlkVert_Done:
  2032. mov ecx,BlockNM1.Ref1InterSWD
  2033. sub ebx,edi
  2034. mov BlockN.CentralInterSWD_BLS,ebx
  2035. mov edx,BlockNM1.Ref2InterSWD
  2036. test esp,000000018H
  2037. lea ebp,[ebp+edi]
  2038. jne NextBlkVert
  2039. mov ebx,MBCentralInterSWD_BLS+BlockLen*4
  2040. add esp,BlockLen*4
  2041. sub ebx,ebp
  2042. xor eax,eax ; ??? Keep pairing happy
  2043. NoHalfPelBlockLevelMVs:
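; Note (editorial sketch of the comparisons that follow, in C-like
; pseudocode). The names are illustrative: blk is the sum of the four block
; SWDs held in ebx, mb is MBCentralInterSWD, zero is MB0MVInterSWD. All
; comparisons are signed:
;
;   if (mb - blk > BlockMVDifferential)
;       pick (zero - blk > NonZeroMVDifferential) ? four block MVs : zero MV
;   else
;       pick (zero - mb > NonZeroMVDifferential) ? macroblock MV : zero MV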
  2044. mov eax,MBCentralInterSWD
  2045. mov ecx,BlockMVDifferential
  2046. sub eax,ebx
  2047. mov edi,MB0MVInterSWD
  2048. cmp eax,ecx
  2049. jle BlockMVNotBigEnoughGain
  2050. sub edi,ebx
  2051. mov ecx,NonZeroMVDifferential
  2052. cmp edi,ecx
  2053. jle NonZeroMVNotBigEnoughGain
  2054. ; Block motion vectors are best.
  2055. mov MBCentralInterSWD,ebx ; Set MBlock's SWD to sum of 4 blocks.
  2056. mov edx,MBlockActionStream
  2057. mov eax,Block1.CentralInterSWD_BLS ; Set each block's SWD.
  2058. mov ebx,Block2.CentralInterSWD_BLS
  2059. mov Block1.CentralInterSWD,eax
  2060. mov Block2.CentralInterSWD,ebx
  2061. mov eax,Block3.CentralInterSWD_BLS
  2062. mov ebx,Block4.CentralInterSWD_BLS
  2063. mov Block3.CentralInterSWD,eax
  2064. mov Block4.CentralInterSWD,ebx
  2065. mov eax,Block1.BlkMVs ; Set each block's motion vector.
  2066. mov ebx,Block2.BlkMVs
  2067. mov [edx].BlkY1.MVs,eax
  2068. mov [edx].BlkY2.MVs,ebx
  2069. mov eax,Block3.BlkMVs
  2070. mov ebx,Block4.BlkMVs
  2071. mov [edx].BlkY3.MVs,eax
  2072. mov [edx].BlkY4.MVs,ebx
  2073. mov eax,Block1.Ref1Addr ; Set each block's reference blk addr.
  2074. mov ebx,Block2.Ref1Addr
  2075. mov [edx].BlkY1.PastRef,eax
  2076. mov [edx].BlkY2.PastRef,ebx
  2077. mov eax,Block3.Ref1Addr
  2078. mov ebx,Block4.Ref1Addr
  2079. mov [edx].BlkY3.PastRef,eax
  2080. mov eax,INTER4MV ; Set type for MB to INTER-coded, 4 MVs.
  2081. mov [edx].BlkY4.PastRef,ebx
  2082. mov [edx].BlockType,al
  2083. jmp MotionVectorSettled
  2084. NoBlockMotionVectors:
  2085. ENDIF ; H263
  2086. mov edi,MB0MVInterSWD
  2087. BlockMVNotBigEnoughGain: ; Try MB-level motion vector.
  2088. mov eax,MBCentralInterSWD
  2089. mov ecx,NonZeroMVDifferential
  2090. sub edi,eax
  2091. mov edx,MBlockActionStream
  2092. cmp edi,ecx
  2093. jg MotionVectorSettled
  2094. NonZeroMVNotBigEnoughGain: ; Settle on zero MV.
  2095. mov eax,Block1.ZeroMVInterSWD ; Restore Zero MV SWD.
  2096. mov edx,Block2.ZeroMVInterSWD
  2097. mov Block1.CentralInterSWD,eax
  2098. mov Block2.CentralInterSWD,edx
  2099. mov eax,Block3.ZeroMVInterSWD
  2100. mov edx,Block4.ZeroMVInterSWD
  2101. mov Block3.CentralInterSWD,eax
  2102. mov Block4.CentralInterSWD,edx
  2103. mov eax,MB0MVInterSWD ; Restore SWD for zero motion vector.
  2104. BelowZeroThresh:
  2105. mov edx,MBlockActionStream
  2106. mov ebx,TargetMBAddr ; Get address of this target macroblock.
  2107. mov MBCentralInterSWD,eax ; Save SWD.
  2108. xor ebp,ebp
  2109. add ebx,TargToRef
  2110. mov [edx].BlkY1.MVs,ebp ; Set horz and vert MVs to 0 in all blks.
  2111. mov [edx].BlkY1.PastRef,ebx ; Save address of ref block, all blks.
  2112. add ebx,8
  2113. mov [edx].BlkY2.PastRef,ebx
  2114. mov [edx].BlkY2.MVs,ebp
  2115. lea ecx,[ebx+PITCH*8]
  2116. add ebx,PITCH*8-8
  2117. mov [edx].BlkY3.PastRef,ebx
  2118. mov [edx].BlkY3.MVs,ebp
  2119. mov [edx].BlkY4.PastRef,ecx
  2120. mov [edx].BlkY4.MVs,ebp
  2121. ; Activity Details for this section of code (refer to flow diagram above):
  2122. ;
  2123. ; 6) We've settled on the motion vector that will be used if we do indeed
  2124. ; code the macroblock with inter-coding. We need to determine if some
  2125. ; or all of the blocks can be forced as empty (copy) blocks.
  2126. ; If all the blocks can be forced empty, we force the whole
  2127. ; macroblock to be empty.
  2128. ;
  2129. ; Expected Pentium (tm) microprocessor performance for this section:
  2130. ;
  2131. ; Execution frequency: Once per macroblock.
  2132. ;
  2133. ; 23 clocks.
  2134. ;
  2135. MotionVectorSettled:
  2136. IFDEF H261
  2137. mov edi,MBCentralInterSWD
  2138. mov eax,DoSpatialFiltering ; Are we doing spatial filtering?
  2139. mov edi,TargetMBAddr
  2140. test eax,eax
  2141. je SkipSpatialFiltering
  2142. mov ebx,MBCentralInterSWD
  2143. mov esi,SpatialFiltThreshold
  2144. cmp ebx,esi
  2145. jle SkipSpatialFiltering
  2146. add edi,TargToSLF ; Compute addr at which to put SLF prediction.
  2147. xor ebx,ebx
  2148. mov esi,[edx].BlkY1.PastRef
  2149. xor edx,edx
  2150. mov ebp,16
  2151. xor ecx,ecx
  2152. SpatialFilterHorzLoop:
  2153. mov dl,[edi] ; Pre-load cache line for output.
  2154. mov bl,[esi+6] ; p6
  2155. mov al,[esi+7] ; p7
  2156. inc bl ; p6+1
  2157. mov cl,[esi+5] ; p5
  2158. mov [edi+7],al ; p7' = p7
  2159. add al,bl ; p7 + p6 + 1
  2160. add bl,cl ; p6 + p5 + 1
  2161. mov dl,[esi+4] ; p4
  2162. add eax,ebx ; p7 + 2p6 + p5 + 2
  2163. shr eax,2 ; p6' = (p7 + 2p6 + p5 + 2) / 4
  2164. inc dl ; p4 + 1
  2165. add cl,dl ; p5 + p4 + 1
  2166. mov [edi+6],al ; p6'
  2167. mov al,[esi+3] ; p3
  2168. add ebx,ecx ; p6 + 2p5 + p4 + 2
  2169. shr ebx,2 ; p5' = (p6 + 2p5 + p4 + 2) / 4
  2170. add dl,al ; p4 + p3 + 1
  2171. mov [edi+5],bl ; p5'
  2172. mov bl,[esi+2] ; p2
  2173. add ecx,edx ; p5 + 2p4 + p3 + 2
  2174. inc bl ; p2 + 1
  2175. shr ecx,2 ; p4' = (p5 + 2p4 + p3 + 2) / 4
  2176. add al,bl ; p3 + p2 + 1
  2177. mov [edi+4],cl ; p4'
  2178. add edx,eax ; p4 + 2p3 + p2 + 2
  2179. shr edx,2 ; p3' = (p4 + 2p3 + p2 + 2) / 4
  2180. mov cl,[esi+1] ; p1
  2181. add bl,cl ; p2 + p1 + 1
  2182. mov [edi+3],dl ; p3'
  2183. add eax,ebx ; p3 + 2p2 + p1 + 2
  2184. mov dl,[esi] ; p0
  2185. shr eax,2 ; p2' = (p3 + 2p2 + p1 + 2) / 4
  2186. inc ebx ; p2 + p1 + 2
  2187. mov [edi+2],al ; p2'
  2188. add ebx,ecx ; p2 + 2p1 + 2
  2189. mov [edi],dl ; p0' = p0
  2190. add ebx,edx ; p2 + 2p1 + p0 + 2
  2191. shr ebx,2 ; p1' = (p2 + 2p1 + p0 + 2) / 4
  2192. mov al,[esi+7+8]
  2193. mov [edi+1],bl ; p1'
  2194. mov bl,[esi+6+8]
  2195. inc bl
  2196. mov cl,[esi+5+8]
  2197. mov [edi+7+8],al
  2198. add al,bl
  2199. add bl,cl
  2200. mov dl,[esi+4+8]
  2201. add eax,ebx
  2202. ;
  2203. shr eax,2
  2204. inc dl
  2205. add cl,dl
  2206. mov [edi+6+8],al
  2207. mov al,[esi+3+8]
  2208. add ebx,ecx
  2209. shr ebx,2
  2210. add dl,al
  2211. mov [edi+5+8],bl
  2212. mov bl,[esi+2+8]
  2213. add ecx,edx
  2214. inc bl
  2215. shr ecx,2
  2216. add al,bl
  2217. mov [edi+4+8],cl
  2218. add edx,eax
  2219. shr edx,2
  2220. mov cl,[esi+1+8]
  2221. add bl,cl
  2222. mov [edi+3+8],dl
  2223. add eax,ebx
  2224. mov dl,[esi+8]
  2225. shr eax,2
  2226. inc ebx
  2227. mov [edi+2+8],al
  2228. add ebx,ecx
  2229. mov [edi+8],dl
  2230. add ebx,edx
  2231. shr ebx,2
  2232. add esi,PITCH
  2233. mov [edi+1+8],bl
  2234. add edi,PITCH
  2235. dec ebp ; Done?
  2236. jne SpatialFilterHorzLoop
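; Note (editorial summary, not part of the original listing): each pass of
; the loop above filters one 16-pel row as two independent 8-pel groups.
; Within a group the end pels are copied and the six interior pels get a
; 1-2-1 kernel. With in[] and out[] as illustrative names for one group:
;
;   out[0] = in[0];  out[7] = in[7];
;   out[i] = (in[i-1] + 2*in[i] + in[i+1] + 2) >> 2,   i = 1..6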
  2237. mov VertFilterDoneAddr,edi
  2238. sub edi,PITCH*16
  2239. SpatialFilterVertLoop:
  2240. mov eax,[edi] ; p0
  2241. ; ; Bank conflict for sure.
  2242. ;
  2243. mov ebx,[edi+PITCH] ; p1
  2244. add eax,ebx ; p0+p1
  2245. mov ecx,[edi+PITCH*2] ; p2
  2246. add ebx,ecx ; p1+p2
  2247. mov edx,[edi+PITCH*3] ; p3
  2248. shr eax,1 ; (p0+p1)/2 dirty
  2249. mov esi,[edi+PITCH*4] ; p4
  2250. add ecx,edx ; p2+p3
  2251. mov ebp,[edi+PITCH*5] ; p5
  2252. shr ebx,1 ; (p1+p2)/2 dirty
  2253. add edx,esi ; p3+p4
  2254. and eax,07F7F7F7FH ; (p0+p1)/2 clean
  2255. and ebx,07F7F7F7FH ; (p1+p2)/2 clean
  2256. and ecx,0FEFEFEFEH ; p2+p3 pre-cleaned
  2257. and edx,0FEFEFEFEH ; p3+p4 pre-cleaned
  2258. shr ecx,1 ; (p2+p3)/2 clean
  2259. add esi,ebp ; p4+p5
  2260. shr edx,1 ; (p3+p4)/2 clean
  2261. lea eax,[eax+ebx+001010101H] ; (p0+p1)/2+(p1+p2)/2+1
  2262. shr esi,1 ; (p4+p5)/2 dirty
  2263. ;
  2264. and esi,07F7F7F7FH ; (p4+p5)/2 clean
  2265. lea ebx,[ebx+ecx+001010101H] ; (p1+p2)/2+(p2+p3)/2+1
  2266. shr eax,1 ; p1' = ((p0+p1)/2+(p1+p2)/2+1)/2 dirty
  2267. lea ecx,[ecx+edx+001010101H] ; (p2+p3)/2+(p3+p4)/2+1
  2268. shr ebx,1 ; p2' = ((p1+p2)/2+(p2+p3)/2+1)/2 dirty
  2269. lea edx,[edx+esi+001010101H] ; (p3+p4)/2+(p4+p5)/2+1
  2270. and eax,07F7F7F7FH ; p1' clean
  2271. and ebx,07F7F7F7FH ; p2' clean
  2272. shr ecx,1 ; p3' = ((p2+p3)/2+(p3+p4)/2+1)/2 dirty
  2273. mov [edi+PITCH],eax ; p1'
  2274. shr edx,1 ; p4' = ((p3+p4)/2+(p4+p5)/2+1)/2 dirty
  2275. mov eax,[edi+PITCH*6] ; p6
  2276. and ecx,07F7F7F7FH ; p3' clean
  2277. and edx,07F7F7F7FH ; p4' clean
  2278. mov [edi+PITCH*2],ebx ; p2'
  2279. add ebp,eax ; p5+p6
  2280. shr ebp,1 ; (p5+p6)/2 dirty
  2281. mov ebx,[edi+PITCH*7] ; p7
  2282. add eax,ebx ; p6+p7
  2283. and ebp,07F7F7F7FH ; (p5+p6)/2 clean
  2284. mov [edi+PITCH*3],ecx ; p3'
  2285. and eax,0FEFEFEFEH ; p6+p7 pre-cleaned
  2286. shr eax,1 ; (p6+p7)/2 clean
  2287. lea esi,[esi+ebp+001010101H] ; (p4+p5)/2+(p5+p6)/2+1
  2288. shr esi,1 ; p5' = ((p4+p5)/2+(p5+p6)/2+1)/2 dirty
  2289. mov [edi+PITCH*4],edx ; p4'
  2290. lea ebp,[ebp+eax+001010101H] ; (p5+p6)/2+(p6+p7)/2+1
  2291. and esi,07F7F7F7FH ; p5' clean
  2292. shr ebp,1 ; p6' = ((p5+p6)/2+(p6+p7)/2+1)/2 dirty
  2293. mov [edi+PITCH*5],esi ; p5'
  2294. and ebp,07F7F7F7FH ; p6' clean
  2295. add edi,4
  2296. test edi,00000000FH
  2297. mov [edi+PITCH*6-4],ebp ; p6'
  2298. jne SpatialFilterVertLoop
  2299. add edi,PITCH*8-16
  2300. mov eax,VertFilterDoneAddr
  2301. cmp eax,edi
  2302. jne SpatialFilterVertLoop
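; Note (editorial sketch; that pel values fit in 7 bits is an assumption
; inferred from the masks, not stated in this listing). The loop above
; filters four pels per dword. Rows 0 and 7 of each 8-row group are not
; rewritten, and rows 1..6 are computed per byte lane as follows, with h,
; in[] and out[] as illustrative names:
;
;   h(a,b) = (a + b) >> 1
;   out[i] = (h(in[i-1], in[i]) + h(in[i], in[i+1]) + 1) >> 1,   i = 1..6
;
; After a dword shift right by one, bit 0 of each byte lands in bit 7 of
; the byte below it. The 07F7F7F7FH mask removes that bit afterwards; the
; 0FEFEFEFEH mask clears it beforehand.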
  2303. ; Activity Details for this section of code (refer to flow diagram above):
  2304. ;
  2305. ; 9) The SAD for the spatially filtered reference macroblock is calculated
  2306. ; with half the pel differences accumulating into the low order half
  2307. ; of ebp, and the other half into the high order half.
  2308. ;
  2309. ; Register usage for this section:
  2310. ;
  2311. ; Input of this section:
  2312. ;
  2313. ; edi -- Address of pel 0,0 of spatially filtered reference macroblock.
  2314. ;
  2315. ; Predominant usage for body of this section:
  2316. ;
  2317. ; edi -- Address of pel 0,0 of spatially filtered reference macroblock.
  2318. ; esi, eax -- -8 times pel values from target macroblock.
  2319. ; ebp[ 0:15] -- SAD Accumulator for half of the match points.
  2320. ; ebp[16:31] -- SAD Accumulator for other half of the match points.
  2321. ; edx[ 0: 7] -- Weighted difference for one pel.
  2322. ; edx[ 8:15] -- Zero.
  2323. ; edx[16:23] -- Weighted difference for another pel.
  2324. ; edx[24:31] -- Zero.
  2325. ; bl, cl -- Pel values from the spatially filtered reference macroblock.
  2326. ;
  2327. ; Expected Pentium (tm) microprocessor performance for this section:
  2328. ;
  2329. ; Execution frequency: Once per block for which motion analysis is done
  2330. ; beyond the 0-motion vector.
  2331. ;
  2332. ; 146 clocks instruction execution (typically).
  2333. ; 6 clocks for bank conflicts (1/8 chance with 48 dual mem ops).
  2334. ; 0 clocks for new cache line fills.
  2335. ; ----
  2336. ; 152 clocks total time for this section.
  2337. ;
  2338. SpatialFilterDone:
  2339. sub edi,PITCH*8-8 ; Get to block 4.
  2340. xor ebp,ebp
  2341. xor ebx,ebx
  2342. xor ecx,ecx
  2343. SLFSWDLoop:
  2344. mov eax,BlockNM1.N8T00 ; Get -8 times target Pel00.
  2345. mov bl,[edi] ; Get Pel00 in spatially filtered reference.
  2346. mov esi,BlockNM1.N8T04
  2347. mov cl,[edi+4]
  2348. mov edx,[eax+ebx*8] ; Get abs diff for spatial filtered ref pel00.
  2349. mov eax,BlockNM1.N8T02
  2350. mov dl,[esi+ecx*8+2] ; Get abs diff for spatial filtered ref pel04.
  2351. mov bl,[edi+2]
  2352. mov esi,BlockNM1.N8T06
  2353. mov cl,[edi+6]
  2354. mov ebp,edx
  2355. mov edx,[eax+ebx*8]
  2356. mov eax,BlockNM1.N8T11
  2357. mov dl,[esi+ecx*8+2]
  2358. mov bl,[edi+PITCH*1+1]
  2359. mov cl,[edi+PITCH*1+5]
  2360. mov esi,BlockNM1.N8T15
  2361. add ebp,edx
  2362. mov edx,[eax+ebx*8]
  2363. mov eax,BlockNM1.N8T13
  2364. mov dl,[esi+ecx*8+2]
  2365. mov bl,[edi+PITCH*1+3]
  2366. mov cl,[edi+PITCH*1+7]
  2367. mov esi,BlockNM1.N8T17
  2368. add ebp,edx
  2369. mov edx,[eax+ebx*8]
  2370. mov eax,BlockNM1.N8T20
  2371. mov dl,[esi+ecx*8+2]
  2372. mov bl,[edi+PITCH*2+0]
  2373. mov cl,[edi+PITCH*2+4]
  2374. mov esi,BlockNM1.N8T24
  2375. add ebp,edx
  2376. mov edx,[eax+ebx*8]
  2377. mov eax,BlockNM1.N8T22
  2378. mov dl,[esi+ecx*8+2]
  2379. mov bl,[edi+PITCH*2+2]
  2380. mov cl,[edi+PITCH*2+6]
  2381. mov esi,BlockNM1.N8T26
  2382. add ebp,edx
  2383. mov edx,[eax+ebx*8]
  2384. mov eax,BlockNM1.N8T31
  2385. mov dl,[esi+ecx*8+2]
  2386. mov bl,[edi+PITCH*3+1]
  2387. mov cl,[edi+PITCH*3+5]
  2388. mov esi,BlockNM1.N8T35
  2389. add ebp,edx
  2390. mov edx,[eax+ebx*8]
  2391. mov eax,BlockNM1.N8T33
  2392. mov dl,[esi+ecx*8+2]
  2393. mov bl,[edi+PITCH*3+3]
  2394. mov cl,[edi+PITCH*3+7]
  2395. mov esi,BlockNM1.N8T37
  2396. add ebp,edx
  2397. mov edx,[eax+ebx*8]
  2398. mov eax,BlockNM1.N8T40
  2399. mov dl,[esi+ecx*8+2]
  2400. mov bl,[edi+PITCH*4+0]
  2401. mov cl,[edi+PITCH*4+4]
  2402. mov esi,BlockNM1.N8T44
  2403. add ebp,edx
  2404. mov edx,[eax+ebx*8]
  2405. mov eax,BlockNM1.N8T42
  2406. mov dl,[esi+ecx*8+2]
  2407. mov bl,[edi+PITCH*4+2]
  2408. mov cl,[edi+PITCH*4+6]
  2409. mov esi,BlockNM1.N8T46
  2410. add ebp,edx
  2411. mov edx,[eax+ebx*8]
  2412. mov eax,BlockNM1.N8T51
  2413. mov dl,[esi+ecx*8+2]
  2414. mov bl,[edi+PITCH*5+1]
  2415. mov cl,[edi+PITCH*5+5]
  2416. mov esi,BlockNM1.N8T55
  2417. add ebp,edx
  2418. mov edx,[eax+ebx*8]
  2419. mov eax,BlockNM1.N8T53
  2420. mov dl,[esi+ecx*8+2]
  2421. mov bl,[edi+PITCH*5+3]
  2422. mov cl,[edi+PITCH*5+7]
  2423. mov esi,BlockNM1.N8T57
  2424. add ebp,edx
  2425. mov edx,[eax+ebx*8]
  2426. mov eax,BlockNM1.N8T60
  2427. mov dl,[esi+ecx*8+2]
  2428. mov bl,[edi+PITCH*6+0]
  2429. mov cl,[edi+PITCH*6+4]
  2430. mov esi,BlockNM1.N8T64
  2431. add ebp,edx
  2432. mov edx,[eax+ebx*8]
  2433. mov eax,BlockNM1.N8T62
  2434. mov dl,[esi+ecx*8+2]
  2435. mov bl,[edi+PITCH*6+2]
  2436. mov cl,[edi+PITCH*6+6]
  2437. mov esi,BlockNM1.N8T66
  2438. add ebp,edx
  2439. mov edx,[eax+ebx*8]
  2440. mov eax,BlockNM1.N8T71
  2441. mov dl,[esi+ecx*8+2]
  2442. mov bl,[edi+PITCH*7+1]
  2443. mov cl,[edi+PITCH*7+5]
  2444. mov esi,BlockNM1.N8T75
  2445. add ebp,edx
  2446. mov edx,[eax+ebx*8]
  2447. mov eax,BlockNM1.N8T73
  2448. mov dl,[esi+ecx*8+2]
  2449. mov bl,[edi+PITCH*7+3]
  2450. mov cl,[edi+PITCH*7+7]
  2451. mov esi,BlockNM1.N8T77
  2452. add ebp,edx
  2453. mov edx,[eax+ebx*8]
  2454. add edx,ebp
  2455. mov cl,[esi+ecx*8+2]
  2456. shr edx,16
  2457. add ebp,ecx
  2458. and ebp,0FFFFH
  2459. sub esp,BlockLen
  2460. add ebp,edx
  2461. sub edi,8
  2462. test esp,000000008H
  2463. mov BlockN.CentralInterSWD_SLF,ebp
  2464. jne SLFSWDLoop
  2465. test esp,000000010H
  2466. lea edi,[edi-PITCH*8+16]
  2467. jne SLFSWDLoop
  2468. mov eax,Block2.CentralInterSWD_SLF+BlockLen*4
  2469. mov ebx,Block3.CentralInterSWD_SLF+BlockLen*4
  2470. mov ecx,Block4.CentralInterSWD_SLF+BlockLen*4
  2471. add esp,BlockLen*4
  2472. add ebp,ecx
  2473. lea edx,[eax+ebx]
  2474. add ebp,edx
  2475. mov edx,SpatialFiltDifferential
  2476. lea esi,[edi+PITCH*8-8]
  2477. mov edi,MBCentralInterSWD
  2478. sub edi,edx
  2479. mov edx,MBlockActionStream
  2480. cmp ebp,edi
  2481. jge SpatialFilterNotAsGood
  2482. mov MBCentralInterSWD,ebp ; Spatial filter was better. Stash
  2483. mov ebp,Block1.CentralInterSWD_SLF ; pertinent calculations.
  2484. mov Block2.CentralInterSWD,eax
  2485. mov Block3.CentralInterSWD,ebx
  2486. mov Block4.CentralInterSWD,ecx
  2487. mov Block1.CentralInterSWD,ebp
  2488. mov [edx].BlkY1.PastRef,esi
  2489. mov al,INTERSLF
  2490. mov [edx].BlockType,al
  2491. SkipSpatialFiltering:
  2492. SpatialFilterNotAsGood:
  2493. ENDIF ; H261
  2494. mov al,[edx].CodedBlocks ; Fetch coded block pattern.
  2495. mov edi,EmptyThreshold ; Get threshold for forcing block empty.
  2496. mov ebp,MBCentralInterSWD
  2497. mov esi,InterSWDBlocks
  2498. mov ebx,Block4.CentralInterSWD ; Is SWD > threshold?
  2499. cmp ebx,edi
  2500. jg @f
  2501. and al,0F7H ; If not, indicate block 4 is NOT coded.
  2502. dec esi
  2503. sub ebp,ebx
  2504. @@:
  2505. mov ebx,Block3.CentralInterSWD
  2506. cmp ebx,edi
  2507. jg @f
  2508. and al,0FBH
  2509. dec esi
  2510. sub ebp,ebx
  2511. @@:
  2512. mov ebx,Block2.CentralInterSWD
  2513. cmp ebx,edi
  2514. jg @f
  2515. and al,0FDH
  2516. dec esi
  2517. sub ebp,ebx
  2518. @@:
  2519. mov ebx,Block1.CentralInterSWD
  2520. cmp ebx,edi
  2521. jg @f
  2522. and al,0FEH
  2523. dec esi
  2524. sub ebp,ebx
  2525. @@:
  2526. mov [edx].CodedBlocks,al ; Store coded block pattern.
  2527. add esi,4
  2528. mov InterSWDBlocks,esi
  2529. xor ebx,ebx
  2530. and eax,00FH
  2531. mov MBCentralInterSWD,ebp
  2532. cmp al,00FH ; Are any blocks marked empty?
  2533. jne InterBest ; If some blocks are empty, can't code as Intra
  2534. cmp ebp,InterCodingThreshold ; Is InterSWD below inter-coding threshold?
  2535. lea esi,Block1+128
  2536. mov ebp,0
  2537. jae CalculateIntraSWD
  2538. InterBest:
  2539. mov ecx,InterSWDTotal
  2540. mov ebp,MBCentralInterSWD
  2541. add ecx,ebp ; Add to total for this macroblock class.
  2542. mov PD [edx].SWD,ebp
  2543. mov InterSWDTotal,ecx
  2544. jmp NextMacroBlock
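; The empty-block test above can be sketched in Python as follows. This is an
; illustration under stated assumptions, not the encoder's code: the function
; and parameter names are hypothetical, and the macroblock SWD is passed in
; rather than read from MBCentralInterSWD.

```python
def prune_empty_blocks(coded_blocks, mb_swd, block_swds, empty_threshold):
    """Clear the coded bit of each luma block whose SWD is not above threshold.

    coded_blocks: coded block pattern; bits 0..3 stand for blocks 1..4.
    mb_swd:       macroblock SWD before pruning.
    block_swds:   SWDs of blocks 1..4, in that order.
    Returns (new_pattern, new_mb_swd, number_of_blocks_still_coded).
    """
    coded_count = 4
    for index, swd in enumerate(block_swds):
        if swd <= empty_threshold:            # mirrors falling through "jg @f"
            coded_blocks &= ~(1 << index)     # 0FEH, 0FDH, 0FBH, 0F7H masks
            coded_count -= 1
            mb_swd -= swd                     # an empty block adds no SWD
    return coded_blocks & 0xFF, mb_swd, coded_count
```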
  2545. ; Activity Details for this section of code (refer to flow diagram above):
  2546. ;
  2547. ; 11) The IntraSWD is calculated as two partial sums, one in the low order
  2548. ; 16 bits of ebp and one in the high order 16 bits. An average pel
  2549. ; value for each block will be calculated to the nearest half.
  2550. ;
  2551. ; Register usage for this section:
  2552. ;
  2553. ; Input of this section:
  2554. ;
  2555. ; None
  2556. ;
  2557. ; Predominant usage for body of this section:
  2558. ;
  2559. ; esi -- Address of target block 1 (3), plus 128.
  2560. ; ebp[ 0:15] -- IntraSWD Accumulator for block 1 (3).
  2561. ; ebp[16:31] -- IntraSWD Accumulator for block 2 (4).
  2562. ; edi -- Block 2 (4) target pel, times -8, and with WeightedDiff added.
  2563. ; edx -- Block 1 (3) target pel, times -8, and with WeightedDiff added.
  2564. ; ecx[ 0: 7] -- Weighted difference for one pel in block 2 (4).
  2565. ; ecx[ 8:15] -- Zero.
  2566. ; ecx[16:23] -- Weighted difference for one pel in block 1 (3).
  2567. ; ecx[24:31] -- Zero.
  2568. ; ebx -- Average block 2 (4) target pel to nearest .5.
  2569. ; eax -- Average block 1 (3) target pel to nearest .5.
  2570. ;
  2571. ; Output of this section:
  2572. ;
  2573. ; edi -- Scratch.
  2574. ; ebp[ 0:15] -- IntraSWD. (Also written to MBlockActionStream.)
  2575. ; ebp[16:31] -- garbage.
  2576. ; ebx -- Zero.
  2577. ; eax -- MBlockActionStream.
  2578. ;
  2579. ; Expected Pentium (tm) microprocessor performance for this section:
  2580. ;
  2581. ; Executed once per macroblock (except for those for which one or more blocks
  2582. ; are marked empty, or where the InterSWD is less than a threshold).
  2583. ;
  2584. ; 183 clocks for instruction execution
  2585. ; 12 clocks for bank conflicts (94 dual mem ops with 1/8 chance of conflict)
  2586. ; ----
  2587. ; 195 clocks total time for this section.
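; A small Python sketch of the "average to the nearest half" step in item 11.
; It assumes that AccumTargetPels holds the sum of the 32 sampled pels of a
; block, which this section does not state outright, and it uses unweighted
; absolute differences in place of the weighted-difference table. Both
; function names are hypothetical.

```python
def average_in_half_pels(accum_target_pels):
    """Return twice the average pel, rounded: (sum + 8) >> 4 for 32 pels."""
    return (accum_target_pels + 8) >> 4


def intra_sad_half_units(pels):
    """Sum of |pel - average| over one block, measured in half-pel units."""
    if len(pels) != 32:
        raise ValueError("sketch assumes 32 sampled pels per block")
    avg2 = average_in_half_pels(sum(pels))
    # 2 * p expresses each pel in the same half-pel units as avg2.
    return sum(abs(2 * p - avg2) for p in pels)
```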
  2588. IntraByDecree:
  2589. mov eax,InterSWDBlocks ; Inc by 4, because we will undo it below.
  2590. xor ebp,ebp
  2591. mov MBMotionVectors,ebp ; Stash zero for MB level motion vectors.
  2592. mov ebp,040000000H ; Set Inter SWD artificially high.
  2593. lea esi,Block1+128
  2594. add eax,4
  2595. mov MBCentralInterSWD,ebp
  2596. mov InterSWDBlocks,eax
  2597. CalculateIntraSWD:
  2598. CalculateIntraSWDLoop:
  2599. mov eax,[esi-128].AccumTargetPels ; Fetch acc of target pels for 1st block.
  2600. mov edx,[esi-128].N8T00
  2601. add eax,8
  2602. mov ebx,[esi-128+BlockLen].AccumTargetPels
  2603. shr eax,4 ; Average block 1 target pel rounded to nearest .5.
  2604. add ebx,8
  2605. shr ebx,4
  2606. mov edi,[esi-128+BlockLen].N8T00
  2607. mov ecx,PD [edx+eax*4]
  2608. mov edx,[esi-128].N8T02
  2609. mov cl,PB [edi+ebx*4+2]
  2610. mov edi,[esi-128+BlockLen].N8T02
  2611. add ebp,ecx
  2612. mov ecx,PD [edx+eax*4]
  2613. mov edx,[esi-128].N8T04
  2614. mov cl,PB [edi+ebx*4+2]
  2615. mov edi,[esi-128+BlockLen].N8T04
  2616. add ebp,ecx
  2617. mov ecx,PD [edx+eax*4]
  2618. mov edx,[esi-128].N8T06
  2619. mov cl,PB [edi+ebx*4+2]
  2620. mov edi,[esi-128+BlockLen].N8T06
  2621. add ebp,ecx
  2622. mov ecx,PD [edx+eax*4]
  2623. mov edx,[esi-128].N8T11
  2624. mov cl,PB [edi+ebx*4+2]
  2625. mov edi,[esi-128+BlockLen].N8T11
  2626. add ebp,ecx
  2627. mov ecx,PD [edx+eax*4]
  2628. mov edx,[esi-128].N8T13
  2629. mov cl,PB [edi+ebx*4+2]
  2630. mov edi,[esi-128+BlockLen].N8T13
  2631. add ebp,ecx
  2632. mov ecx,PD [edx+eax*4]
  2633. mov edx,[esi-128].N8T15
  2634. mov cl,PB [edi+ebx*4+2]
  2635. mov edi,[esi-128+BlockLen].N8T15
  2636. add ebp,ecx
  2637. mov ecx,PD [edx+eax*4]
  2638. mov edx,[esi-128].N8T17
  2639. mov cl,PB [edi+ebx*4+2]
  2640. mov edi,[esi-128+BlockLen].N8T17
  2641. add ebp,ecx
  2642. mov ecx,PD [edx+eax*4]
  2643. mov edx,[esi-128].N8T20
  2644. mov cl,PB [edi+ebx*4+2]
  2645. mov edi,[esi-128+BlockLen].N8T20
  2646. add ebp,ecx
  2647. mov ecx,PD [edx+eax*4]
  2648. mov edx,[esi-128].N8T22
  2649. mov cl,PB [edi+ebx*4+2]
  2650. mov edi,[esi-128+BlockLen].N8T22
  2651. add ebp,ecx
  2652. mov ecx,PD [edx+eax*4]
  2653. mov edx,[esi-128].N8T24
  2654. mov cl,PB [edi+ebx*4+2]
  2655. mov edi,[esi-128+BlockLen].N8T24
  2656. add ebp,ecx
  2657. mov ecx,PD [edx+eax*4]
  2658. mov edx,[esi-128].N8T26
  2659. mov cl,PB [edi+ebx*4+2]
  2660. mov edi,[esi-128+BlockLen].N8T26
  2661. add ebp,ecx
  2662. mov ecx,PD [edx+eax*4]
  2663. mov edx,[esi-128].N8T31
  2664. mov cl,PB [edi+ebx*4+2]
  2665. mov edi,[esi-128+BlockLen].N8T31
  2666. add ebp,ecx
  2667. mov ecx,PD [edx+eax*4]
  2668. mov edx,[esi-128].N8T33
  2669. mov cl,PB [edi+ebx*4+2]
  2670. mov edi,[esi-128+BlockLen].N8T33
  2671. add ebp,ecx
  2672. mov ecx,PD [edx+eax*4]
  2673. mov edx,[esi-128].N8T35
  2674. mov cl,PB [edi+ebx*4+2]
  2675. mov edi,[esi-128+BlockLen].N8T35
  2676. add ebp,ecx
  2677. mov ecx,PD [edx+eax*4]
  2678. mov edx,[esi-128].N8T37
  2679. mov cl,PB [edi+ebx*4+2]
  2680. mov edi,[esi-128+BlockLen].N8T37
  2681. add ebp,ecx
  2682. mov ecx,PD [edx+eax*4]
  2683. mov edx,[esi-128].N8T40
  2684. mov cl,PB [edi+ebx*4+2]
  2685. mov edi,[esi-128+BlockLen].N8T40
  2686. add ebp,ecx
  2687. mov ecx,PD [edx+eax*4]
  2688. mov edx,[esi-128].N8T42
  2689. mov cl,PB [edi+ebx*4+2]
  2690. mov edi,[esi-128+BlockLen].N8T42
  2691. add ebp,ecx
  2692. mov ecx,PD [edx+eax*4]
  2693. mov edx,[esi-128].N8T44
  2694. mov cl,PB [edi+ebx*4+2]
  2695. mov edi,[esi-128+BlockLen].N8T44
  2696. add ebp,ecx
  2697. mov ecx,PD [edx+eax*4]
  2698. mov edx,[esi-128].N8T46
  2699. mov cl,PB [edi+ebx*4+2]
  2700. mov edi,[esi-128+BlockLen].N8T46
  2701. add ebp,ecx
  2702. mov ecx,PD [edx+eax*4]
  2703. mov edx,[esi-128].N8T51
  2704. mov cl,PB [edi+ebx*4+2]
  2705. mov edi,[esi-128+BlockLen].N8T51
  2706. add ebp,ecx
  2707. mov ecx,PD [edx+eax*4]
  2708. mov edx,[esi-128].N8T53
  2709. mov cl,PB [edi+ebx*4+2]
  2710. mov edi,[esi-128+BlockLen].N8T53
  2711. add ebp,ecx
  2712. mov ecx,PD [edx+eax*4]
  2713. mov edx,[esi-128].N8T55
  2714. mov cl,PB [edi+ebx*4+2]
  2715. mov edi,[esi-128+BlockLen].N8T55
  2716. add ebp,ecx
  2717. mov ecx,PD [edx+eax*4]
  2718. mov edx,[esi-128].N8T57
  2719. mov cl,PB [edi+ebx*4+2]
  2720. mov edi,[esi-128+BlockLen].N8T57
  2721. add ebp,ecx
  2722. mov ecx,PD [edx+eax*4]
  2723. mov edx,[esi-128].N8T60
  2724. mov cl,PB [edi+ebx*4+2]
  2725. mov edi,[esi-128+BlockLen].N8T60
  2726. add ebp,ecx
  2727. mov ecx,PD [edx+eax*4]
  2728. mov edx,[esi-128].N8T62
  2729. mov cl,PB [edi+ebx*4+2]
  2730. mov edi,[esi-128+BlockLen].N8T62
  2731. add ebp,ecx
  2732. mov ecx,PD [edx+eax*4]
  2733. mov edx,[esi-128].N8T64
  2734. mov cl,PB [edi+ebx*4+2]
  2735. mov edi,[esi-128+BlockLen].N8T64
  2736. add ebp,ecx
  2737. mov ecx,PD [edx+eax*4]
  2738. mov edx,[esi-128].N8T66
  2739. mov cl,PB [edi+ebx*4+2]
  2740. mov edi,[esi-128+BlockLen].N8T66
  2741. add ebp,ecx
  2742. mov ecx,PD [edx+eax*4]
  2743. mov edx,[esi-128].N8T71
  2744. mov cl,PB [edi+ebx*4+2]
  2745. mov edi,[esi-128+BlockLen].N8T71
  2746. add ebp,ecx
  2747. mov ecx,PD [edx+eax*4]
  2748. mov edx,[esi-128].N8T73
  2749. mov cl,PB [edi+ebx*4+2]
  2750. mov edi,[esi-128+BlockLen].N8T73
  2751. add ebp,ecx
  2752. mov ecx,PD [edx+eax*4]
  2753. mov edx,[esi-128].N8T75
  2754. mov cl,PB [edi+ebx*4+2]
  2755. mov edi,[esi-128+BlockLen].N8T75
  2756. add ebp,ecx
  2757. mov ecx,PD [edx+eax*4]
  2758. mov edx,[esi-128].N8T77
  2759. mov cl,PB [edi+ebx*4+2]
  2760. mov edi,[esi-128+BlockLen].N8T77
  2761. add ebp,ecx
  2762. mov ecx,PD [edx+eax*4]
  2763. mov cl,PB [edi+ebx*4+2]
  2764. mov eax,000007FFFH
  2765. add ebp,ecx
  2766. add esi,BlockLen*2
  2767. and eax,ebp
  2768. mov ecx,MBCentralInterSWD
  2769. shr ebp,16
  2770. sub ecx,IntraCodingDifferential
  2771. add ebp,eax
  2772. mov edx,MBlockActionStream ; Reload list ptr.
  2773. cmp ecx,ebp ; Is IntraSWD > InterSWD - differential?
  2774. jl InterBest
  2775. lea ecx,Block1+128+BlockLen*2
  2776. cmp ecx,esi
  2777. je CalculateIntraSWDLoop
  2778. ; ebp -- IntraSWD
  2779. ; edx -- MBlockActionStream
  2780. DoneCalcIntraSWD:
  2781. IntraBest:
  2782. mov ecx,IntraSWDTotal
  2783. mov edi,IntraSWDBlocks
  2784. add ecx,ebp ; Add to total for this macroblock class.
  2785. add edi,4 ; Accumulate # of blocks for this type.
  2786. mov IntraSWDBlocks,edi
  2787. mov edi,InterSWDBlocks
  2788. sub edi,4
  2789. mov IntraSWDTotal,ecx
  2790. mov InterSWDBlocks,edi
  2791. mov bl,INTRA
  2792. mov PB [edx].BlockType,bl ; Indicate macroblock handling decision.
  2793. IFDEF H261
  2794. xor ebx,ebx
  2795. ELSE ; H263
  2796. mov ebx,MBMotionVectors ; Set MVs to best MB level motion vectors.
  2797. ENDIF
  2798. mov PD [edx].BlkY1.MVs,ebx
  2799. mov PD [edx].BlkY2.MVs,ebx
  2800. mov PD [edx].BlkY3.MVs,ebx
  2801. mov PD [edx].BlkY4.MVs,ebx
  2802. xor ebx,ebx
  2803. mov PD [edx].SWD,ebp
  2804. jmp NextMacroBlock
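; The Inter/Intra choice made by the code above, sketched in Python. Names
; are hypothetical. The Intra SWD arrives as two partial sums (blocks 1 and 2,
; then blocks 3 and 4), so the comparison can give up after the first pair,
; as the loop above does.

```python
def choose_mb_type(coded_blocks, inter_swd, intra_pair_swds,
                   inter_coding_threshold, intra_coding_differential):
    """Return "INTER" or "INTRA" for one macroblock."""
    if coded_blocks & 0x0F != 0x0F:
        return "INTER"                 # some luma block is marked empty
    if inter_swd < inter_coding_threshold:
        return "INTER"                 # Inter match is already good enough
    intra_swd = 0
    for pair_swd in intra_pair_swds:
        intra_swd += pair_swd
        if inter_swd - intra_coding_differential < intra_swd:
            return "INTER"             # Intra cannot win any more
    return "INTRA"
```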
  2805. ;==============================================================================
  2806. ; Internal functions
  2807. ;==============================================================================
  2808. DoSWDLoop:
  2809. ; Upon entry:
  2810. ; esi -- Points to ref1
  2811. ; edi -- Points to ref2
  2812. ; ecx -- Upper 24 bits zero
  2813. ; ebx -- Upper 24 bits zero
  2814. mov bl,PB [esi] ; 00A -- Get Pel 00 in reference ref1.
  2815. mov eax,Block1.N8T00+4 ; 00B -- Get -8 times target pel 00.
  2816. mov cl,PB [edi] ; 00C -- Get Pel 00 in reference ref2.
  2817. sub esp,BlockLen*4+28
  2818. SWDLoop:
  2819. mov edx,PD [eax+ebx*8] ; 00D -- Get weighted diff for ref1 pel 00.
  2820. mov bl,PB [esi+2] ; 02A
  2821. mov dl,PB [eax+ecx*8+2] ; 00E -- Get weighted diff for ref2 pel 00.
  2822. mov eax,BlockN.N8T02+32 ; 02B
  2823. mov ebp,edx ; 00F -- Accum weighted diffs for pel 00.
  2824. mov cl,PB [edi+2] ; 02C
  2825. mov edx,PD [eax+ebx*8] ; 02D
  2826. mov bl,PB [esi+4] ; 04A
  2827. mov dl,PB [eax+ecx*8+2] ; 02E
  2828. mov eax,BlockN.N8T04+32 ; 04B
  2829. mov cl,PB [edi+4] ; 04C
  2830. add ebp,edx ; 02F
  2831. mov edx,PD [eax+ebx*8] ; 04D
  2832. mov bl,PB [esi+6]
  2833. mov dl,PB [eax+ecx*8+2] ; 04E
  2834. mov eax,BlockN.N8T06+32
  2835. mov cl,PB [edi+6]
  2836. add ebp,edx ; 04F
  2837. mov edx,PD [eax+ebx*8]
  2838. mov bl,PB [esi+PITCH*1+1]
  2839. mov dl,PB [eax+ecx*8+2]
  2840. mov eax,BlockN.N8T11+32
  2841. mov cl,PB [edi+PITCH*1+1]
  2842. add ebp,edx
  2843. mov edx,PD [eax+ebx*8]
  2844. mov bl,PB [esi+PITCH*1+3]
  2845. mov dl,PB [eax+ecx*8+2]
  2846. mov eax,BlockN.N8T13+32
  2847. mov cl,PB [edi+PITCH*1+3]
  2848. add ebp,edx
  2849. mov edx,PD [eax+ebx*8]
  2850. mov bl,PB [esi+PITCH*1+5]
  2851. mov dl,PB [eax+ecx*8+2]
  2852. mov eax,BlockN.N8T15+32
  2853. mov cl,PB [edi+PITCH*1+5]
  2854. add ebp,edx
  2855. mov edx,PD [eax+ebx*8]
  2856. mov bl,PB [esi+PITCH*1+7]
  2857. mov dl,PB [eax+ecx*8+2]
  2858. mov eax,BlockN.N8T17+32
  2859. mov cl,PB [edi+PITCH*1+7]
  2860. add ebp,edx
  2861. mov edx,PD [eax+ebx*8]
  2862. mov bl,PB [esi+PITCH*2+0]
  2863. mov dl,PB [eax+ecx*8+2]
  2864. mov eax,BlockN.N8T20+32
  2865. mov cl,PB [edi+PITCH*2+0]
  2866. add ebp,edx
  2867. mov edx,PD [eax+ebx*8]
  2868. mov bl,PB [esi+PITCH*2+2]
  2869. mov dl,PB [eax+ecx*8+2]
  2870. mov eax,BlockN.N8T22+32
  2871. mov cl,PB [edi+PITCH*2+2]
  2872. add ebp,edx
  2873. mov edx,PD [eax+ebx*8]
  2874. mov bl,PB [esi+PITCH*2+4]
  2875. mov dl,PB [eax+ecx*8+2]
  2876. mov eax,BlockN.N8T24+32
  2877. mov cl,PB [edi+PITCH*2+4]
  2878. add ebp,edx
  2879. mov edx,PD [eax+ebx*8]
  2880. mov bl,PB [esi+PITCH*2+6]
  2881. mov dl,PB [eax+ecx*8+2]
  2882. mov eax,BlockN.N8T26+32
  2883. mov cl,PB [edi+PITCH*2+6]
  2884. add ebp,edx
  2885. mov edx,PD [eax+ebx*8]
  2886. mov bl,PB [esi+PITCH*3+1]
  2887. mov dl,PB [eax+ecx*8+2]
  2888. mov eax,BlockN.N8T31+32
  2889. mov cl,PB [edi+PITCH*3+1]
  2890. add ebp,edx
  2891. mov edx,PD [eax+ebx*8]
  2892. mov bl,PB [esi+PITCH*3+3]
  2893. mov dl,PB [eax+ecx*8+2]
  2894. mov eax,BlockN.N8T33+32
  2895. mov cl,PB [edi+PITCH*3+3]
  2896. add ebp,edx
  2897. mov edx,PD [eax+ebx*8]
  2898. mov bl,PB [esi+PITCH*3+5]
  2899. mov dl,PB [eax+ecx*8+2]
  2900. mov eax,BlockN.N8T35+32
  2901. mov cl,PB [edi+PITCH*3+5]
  2902. add ebp,edx
  2903. mov edx,PD [eax+ebx*8]
  2904. mov bl,PB [esi+PITCH*3+7]
  2905. mov dl,PB [eax+ecx*8+2]
  2906. mov eax,BlockN.N8T37+32
  2907. mov cl,PB [edi+PITCH*3+7]
  2908. add ebp,edx
  2909. mov edx,PD [eax+ebx*8]
  2910. mov bl,PB [esi+PITCH*4+0]
  2911. mov dl,PB [eax+ecx*8+2]
  2912. mov eax,BlockN.N8T40+32
  2913. mov cl,PB [edi+PITCH*4+0]
  2914. add ebp,edx
  2915. mov edx,PD [eax+ebx*8]
  2916. mov bl,PB [esi+PITCH*4+2]
  2917. mov dl,PB [eax+ecx*8+2]
  2918. mov eax,BlockN.N8T42+32
  2919. mov cl,PB [edi+PITCH*4+2]
  2920. add ebp,edx
  2921. mov edx,PD [eax+ebx*8]
  2922. mov bl,PB [esi+PITCH*4+4]
  2923. mov dl,PB [eax+ecx*8+2]
  2924. mov eax,BlockN.N8T44+32
  2925. mov cl,PB [edi+PITCH*4+4]
  2926. add ebp,edx
  2927. mov edx,PD [eax+ebx*8]
  2928. mov bl,PB [esi+PITCH*4+6]
  2929. mov dl,PB [eax+ecx*8+2]
  2930. mov eax,BlockN.N8T46+32
  2931. mov cl,PB [edi+PITCH*4+6]
  2932. add ebp,edx
  2933. mov edx,PD [eax+ebx*8]
  2934. mov bl,PB [esi+PITCH*5+1]
  2935. mov dl,PB [eax+ecx*8+2]
  2936. mov eax,BlockN.N8T51+32
  2937. mov cl,PB [edi+PITCH*5+1]
  2938. add ebp,edx
  2939. mov edx,PD [eax+ebx*8]
  2940. mov bl,PB [esi+PITCH*5+3]
  2941. mov dl,PB [eax+ecx*8+2]
  2942. mov eax,BlockN.N8T53+32
  2943. mov cl,PB [edi+PITCH*5+3]
  2944. add ebp,edx
  2945. mov edx,PD [eax+ebx*8]
  2946. mov bl,PB [esi+PITCH*5+5]
  2947. mov dl,PB [eax+ecx*8+2]
  2948. mov eax,BlockN.N8T55+32
  2949. mov cl,PB [edi+PITCH*5+5]
  2950. add ebp,edx
  2951. mov edx,PD [eax+ebx*8]
  2952. mov bl,PB [esi+PITCH*5+7]
  2953. mov dl,PB [eax+ecx*8+2]
  2954. mov eax,BlockN.N8T57+32
  2955. mov cl,PB [edi+PITCH*5+7]
  2956. add ebp,edx
  2957. mov edx,PD [eax+ebx*8]
  2958. mov bl,PB [esi+PITCH*6+0]
  2959. mov dl,PB [eax+ecx*8+2]
  2960. mov eax,BlockN.N8T60+32
  2961. mov cl,PB [edi+PITCH*6+0]
  2962. add ebp,edx
  2963. mov edx,PD [eax+ebx*8]
  2964. mov bl,PB [esi+PITCH*6+2]
  2965. mov dl,PB [eax+ecx*8+2]
  2966. mov eax,BlockN.N8T62+32
  2967. mov cl,PB [edi+PITCH*6+2]
  2968. add ebp,edx
  2969. mov edx,PD [eax+ebx*8]
  2970. mov bl,PB [esi+PITCH*6+4]
  2971. mov dl,PB [eax+ecx*8+2]
  2972. mov eax,BlockN.N8T64+32
  2973. mov cl,PB [edi+PITCH*6+4]
  2974. add ebp,edx
  2975. mov edx,PD [eax+ebx*8]
  2976. mov bl,PB [esi+PITCH*6+6]
  2977. mov dl,PB [eax+ecx*8+2]
  2978. mov eax,BlockN.N8T66+32
  2979. mov cl,PB [edi+PITCH*6+6]
  2980. add ebp,edx
  2981. mov edx,PD [eax+ebx*8]
  2982. mov bl,PB [esi+PITCH*7+1]
  2983. mov dl,PB [eax+ecx*8+2]
  2984. mov eax,BlockN.N8T71+32
  2985. mov cl,PB [edi+PITCH*7+1]
  2986. add ebp,edx
  2987. mov edx,PD [eax+ebx*8]
  2988. mov bl,PB [esi+PITCH*7+3]
  2989. mov dl,PB [eax+ecx*8+2]
  2990. mov eax,BlockN.N8T73+32
  2991. mov cl,PB [edi+PITCH*7+3]
  2992. add ebp,edx
  2993. mov edx,PD [eax+ebx*8]
  2994. mov bl,PB [esi+PITCH*7+5]
  2995. mov dl,PB [eax+ecx*8+2]
  2996. mov eax,BlockN.N8T75+32
  2997. mov cl,PB [edi+PITCH*7+5]
  2998. add ebp,edx
  2999. mov edx,PD [eax+ebx*8]
  3000. mov bl,PB [esi+PITCH*7+7]
  3001. mov dl,PB [eax+ecx*8+2]
  3002. mov eax,BlockN.N8T77+32
  3003. mov cl,PB [edi+PITCH*7+7]
  3004. add ebp,edx
  3005. mov edx,PD [eax+ebx*8]
  3006. add esp,BlockLen
  3007. mov dl,PB [eax+ecx*8+2]
  3008. mov eax,ebp
  3009. add ebp,edx
  3010. add edx,eax
  3011. shr ebp,16 ; Extract SWD for ref1.
  3012. and edx,00000FFFFH ; Extract SWD for ref2.
  3013. mov esi,BlockN.Ref1Addr+32 ; Get address of next ref1 block.
  3014. mov edi,BlockN.Ref2Addr+32 ; Get address of next ref2 block.
  3015. mov BlockNM1.Ref1InterSWD+32,ebp ; Store SWD for ref1.
  3016. mov BlockNM1.Ref2InterSWD+32,edx ; Store SWD for ref2.
  3017. mov bl,PB [esi] ; 00A -- Get Pel 00 in reference ref1.
  3018. mov eax,BlockN.N8T00+32 ; 00B -- Get -8 times target pel 00.
  3019. test esp,000000018H ; Done when esp is 32-byte aligned.
  3020. mov cl,PB [edi] ; 00C -- Get Pel 00 in reference ref2.
  3021. jne SWDLoop
  3022. ; Output:
  3023. ; ebp -- Ref1 SWD for block 4
  3024. ; edx -- Ref2 SWD for block 4
  3025. ; ecx -- Upper 24 bits zero
  3026. ; ebx -- Upper 24 bits zero
  3027. add esp,28
  3028. ret
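; Two details of DoSWDLoop sketched in Python: splitting the packed
; accumulator into the ref1 and ref2 SWDs, and using the low bits of the
; stack pointer as the block counter. BLOCK_LEN below is a hypothetical
; value; the real BlockLen is defined elsewhere, and the sketch only assumes
; it is congruent to 8 modulo 32 and that the loop ends on a 32-byte-aligned
; address.

```python
BLOCK_LEN = 136  # hypothetical stand-in for BlockLen (136 % 32 == 8)


def split_packed_swds(acc):
    """Return (ref1_swd, ref2_swd) from a 32-bit packed accumulator."""
    return (acc >> 16) & 0xFFFF, acc & 0xFFFF


def count_block_iterations(aligned_base):
    """Count loop passes when esp starts 4 blocks below a 32-byte boundary."""
    esp = aligned_base - 4 * BLOCK_LEN
    passes = 0
    while True:
        passes += 1
        esp += BLOCK_LEN          # "add esp,BlockLen" steps to the next block
        if esp & 0x18 == 0:       # "test esp,000000018H" ends the loop
            return passes
```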
  3029. IFDEF H261
  3030. ELSE ; H263
  3031. DoSWDHalfPelHorzLoop:
  3032. ; ebp -- Initialized to 0, except when can't search off left or right edge.
  3033. ; edi -- Ref addr for block 1. Ref1 is .5 pel to left. Ref2 is .5 to right.
  3034. xor ecx,ecx
  3035. sub esp,BlockLen*4+28
  3036. xor eax,eax
  3037. xor ebx,ebx
  3038. SWDHalfPelHorzLoop:
  3039. mov al,[edi] ; 00A -- Fetch center ref pel 00.
  3040. mov esi,BlockN.N8T00+32; 00B -- Target pel 00 (times -8).
  3041. mov bl,[edi+2] ; 02A -- Fetch center ref pel 02.
  3042. mov edx,BlockN.N8T02+32; 02B -- Target pel 02 (times -8).
  3043. lea esi,[esi+eax*4] ; 00C -- Combine target pel 00 and center ref pel 00.
  3044. mov al,[edi-1] ; 00D -- Get pel to left for match against pel 00.
  3045. lea edx,[edx+ebx*4] ; 02C -- Combine target pel 02 and center ref pel 02.
  3046. mov bl,[edi+1] ; 00E -- Get pel to right for match against pel 00,
  3047. ; ; 02D -- and pel to left for match against pel 02.
  3048. mov ecx,[esi+eax*4] ; 00F -- [16:23] weighted diff for left ref pel 00.
  3049. mov al,[edi+3] ; 02E -- Get pel to right for match against pel 02.
  3050. add ebp,ecx ; 00G -- Accumulate left ref pel 00.
  3051. mov ecx,[edx+ebx*4] ; 02F -- [16:23] weighted diff for left ref pel 02.
  3052. mov cl,[edx+eax*4+2] ; 02H -- [0:7] is weighted diff for right ref pel 02.
  3053. mov al,[edi+4] ; 04A
  3054. add ebp,ecx ; 02I -- Accumulate right ref pel 02,
  3055. ; ; 02G -- Accumulate left ref pel 02.
  3056. mov bl,[esi+ebx*4+2] ; 00H -- [0:7] is weighted diff for right ref pel 00.
  3057. add ebp,ebx ; 00I -- Accumulate right ref pel 00.
  3058. mov esi,BlockN.N8T04+32; 04B
  3059. mov bl,[edi+6] ; 06A
  3060. mov edx,BlockN.N8T06+32; 06B
  3061. lea esi,[esi+eax*4] ; 04C
  3062. mov al,[edi+3] ; 04D
  3063. lea edx,[edx+ebx*4] ; 06C
  3064. mov bl,[edi+5] ; 04E & 06D
  3065. mov ecx,[esi+eax*4] ; 04F
  3066. mov al,[edi+7] ; 06E
  3067. add ebp,ecx ; 04G
  3068. mov ecx,[edx+ebx*4] ; 06F
  3069. mov cl,[edx+eax*4+2] ; 06H
  3070. mov al,[edi+PITCH*1+1] ; 11A
  3071. add ebp,ecx ; 04I & 06G
  3072. mov bl,[esi+ebx*4+2] ; 04H
  3073. add ebp,ebx ; 04I
  3074. mov esi,BlockN.N8T11+32; 11B
  3075. mov bl,[edi+PITCH*1+3] ; 13A
  3076. mov edx,BlockN.N8T13+32; 13B
  3077. lea esi,[esi+eax*4] ; 11C
  3078. mov al,[edi+PITCH*1+0] ; 11D
  3079. lea edx,[edx+ebx*4] ; 13C
  3080. mov bl,[edi+PITCH*1+2] ; 11E & 13D
  3081. mov ecx,[esi+eax*4] ; 11F
  3082. mov al,[edi+PITCH*1+4] ; 13E
  3083. add ebp,ecx ; 11G
  3084. mov ecx,[edx+ebx*4] ; 13F
  3085. mov cl,[edx+eax*4+2] ; 13H
  3086. mov al,[edi+PITCH*1+5] ; 15A
  3087. add ebp,ecx ; 11I & 13G
  3088. mov bl,[esi+ebx*4+2] ; 11H
  3089. add ebp,ebx ; 11I
  3090. mov esi,BlockN.N8T15+32; 15B
  3091. mov bl,[edi+PITCH*1+7] ; 17A
  3092. mov edx,BlockN.N8T17+32; 17B
  3093. lea esi,[esi+eax*4] ; 15C
  3094. mov al,[edi+PITCH*1+4] ; 15D
  3095. lea edx,[edx+ebx*4] ; 17C
  3096. mov bl,[edi+PITCH*1+6] ; 15E & 17D
  3097. mov ecx,[esi+eax*4] ; 15F
  3098. mov al,[edi+PITCH*1+8] ; 17E
  3099. add ebp,ecx ; 15G
  3100. mov ecx,[edx+ebx*4] ; 17F
  3101. mov cl,[edx+eax*4+2] ; 17H
  3102. mov al,[edi+PITCH*2+0] ; 20A
  3103. add ebp,ecx ; 15I & 17G
  3104. mov bl,[esi+ebx*4+2] ; 15H
  3105. add ebp,ebx ; 15I
  3106. mov esi,BlockN.N8T20+32; 20B
  3107. mov bl,[edi+PITCH*2+2] ; 22A
  3108. mov edx,BlockN.N8T22+32; 22B
  3109. lea esi,[esi+eax*4] ; 20C
  3110. mov al,[edi+PITCH*2-1] ; 20D
  3111. lea edx,[edx+ebx*4] ; 22C
  3112. mov bl,[edi+PITCH*2+1] ; 20E & 22D
  3113. mov ecx,[esi+eax*4] ; 20F
  3114. mov al,[edi+PITCH*2+3] ; 22E
  3115. add ebp,ecx ; 20G
  3116. mov ecx,[edx+ebx*4] ; 22F
  3117. mov cl,[edx+eax*4+2] ; 22H
  3118. mov al,[edi+PITCH*2+4] ; 24A
  3119. add ebp,ecx ; 20I & 22G
  3120. mov bl,[esi+ebx*4+2] ; 20H
  3121. add ebp,ebx ; 20I
  3122. mov esi,BlockN.N8T24+32; 24B
  3123. mov bl,[edi+PITCH*2+6] ; 26A
  3124. mov edx,BlockN.N8T26+32; 26B
  3125. lea esi,[esi+eax*4] ; 24C
  3126. mov al,[edi+PITCH*2+3] ; 24D
  3127. lea edx,[edx+ebx*4] ; 26C
  3128. mov bl,[edi+PITCH*2+5] ; 24E & 26D
  3129. mov ecx,[esi+eax*4] ; 24F
  3130. mov al,[edi+PITCH*2+7] ; 26E
  3131. add ebp,ecx ; 24G
  3132. mov ecx,[edx+ebx*4] ; 26F
  3133. mov cl,[edx+eax*4+2] ; 26H
  3134. mov al,[edi+PITCH*3+1] ; 31A
  3135. add ebp,ecx ; 24I & 26G
  3136. mov bl,[esi+ebx*4+2] ; 24H
  3137. add ebp,ebx ; 24I
  3138. mov esi,BlockN.N8T31+32; 31B
  3139. mov bl,[edi+PITCH*3+3] ; 33A
  3140. mov edx,BlockN.N8T33+32; 33B
  3141. lea esi,[esi+eax*4] ; 31C
  3142. mov al,[edi+PITCH*3+0] ; 31D
  3143. lea edx,[edx+ebx*4] ; 33C
  3144. mov bl,[edi+PITCH*3+2] ; 31E & 33D
  3145. mov ecx,[esi+eax*4] ; 31F
  3146. mov al,[edi+PITCH*3+4] ; 33E
  3147. add ebp,ecx ; 31G
  3148. mov ecx,[edx+ebx*4] ; 33F
  3149. mov cl,[edx+eax*4+2] ; 33H
  3150. mov al,[edi+PITCH*3+5] ; 35A
  3151. add ebp,ecx ; 31I & 33G
  3152. mov bl,[esi+ebx*4+2] ; 31H
  3153. add ebp,ebx ; 31I
  3154. mov esi,BlockN.N8T35+32; 35B
  3155. mov bl,[edi+PITCH*3+7] ; 37A
  3156. mov edx,BlockN.N8T37+32; 37B
  3157. lea esi,[esi+eax*4] ; 35C
  3158. mov al,[edi+PITCH*3+4] ; 35D
  3159. lea edx,[edx+ebx*4] ; 37C
  3160. mov bl,[edi+PITCH*3+6] ; 35E & 37D
  3161. mov ecx,[esi+eax*4] ; 35F
  3162. mov al,[edi+PITCH*3+8] ; 37E
  3163. add ebp,ecx ; 35G
  3164. mov ecx,[edx+ebx*4] ; 37F
  3165. mov cl,[edx+eax*4+2] ; 37H
  3166. mov al,[edi+PITCH*4+0] ; 40A
  3167. add ebp,ecx ; 35I & 37G
  3168. mov bl,[esi+ebx*4+2] ; 35H
  3169. add ebp,ebx ; 35I
  3170. mov esi,BlockN.N8T40+32; 40B
  3171. mov bl,[edi+PITCH*4+2] ; 42A
  3172. mov edx,BlockN.N8T42+32; 42B
  3173. lea esi,[esi+eax*4] ; 40C
  3174. mov al,[edi+PITCH*4-1] ; 40D
  3175. lea edx,[edx+ebx*4] ; 42C
  3176. mov bl,[edi+PITCH*4+1] ; 40E & 42D
  3177. mov ecx,[esi+eax*4] ; 40F
  3178. mov al,[edi+PITCH*4+3] ; 42E
  3179. add ebp,ecx ; 40G
  3180. mov ecx,[edx+ebx*4] ; 42F
  3181. mov cl,[edx+eax*4+2] ; 42H
  3182. mov al,[edi+PITCH*4+4] ; 44A
  3183. add ebp,ecx ; 40I & 42G
  3184. mov bl,[esi+ebx*4+2] ; 40H
  3185. add ebp,ebx ; 40I
  3186. mov esi,BlockN.N8T44+32; 44B
  3187. mov bl,[edi+PITCH*4+6] ; 46A
  3188. mov edx,BlockN.N8T46+32; 46B
  3189. lea esi,[esi+eax*4] ; 44C
  3190. mov al,[edi+PITCH*4+3] ; 44D
  3191. lea edx,[edx+ebx*4] ; 46C
  3192. mov bl,[edi+PITCH*4+5] ; 44E & 46D
  3193. mov ecx,[esi+eax*4] ; 44F
  3194. mov al,[edi+PITCH*4+7] ; 46E
  3195. add ebp,ecx ; 44G
  3196. mov ecx,[edx+ebx*4] ; 46F
  3197. mov cl,[edx+eax*4+2] ; 46H
  3198. mov al,[edi+PITCH*5+1] ; 51A
  3199. add ebp,ecx ; 44I & 46G
  3200. mov bl,[esi+ebx*4+2] ; 44H
  3201. add ebp,ebx ; 44I
  3202. mov esi,BlockN.N8T51+32; 51B
  3203. mov bl,[edi+PITCH*5+3] ; 53A
  3204. mov edx,BlockN.N8T53+32; 53B
  3205. lea esi,[esi+eax*4] ; 51C
  3206. mov al,[edi+PITCH*5+0] ; 51D
  3207. lea edx,[edx+ebx*4] ; 53C
  3208. mov bl,[edi+PITCH*5+2] ; 51E & 53D
  3209. mov ecx,[esi+eax*4] ; 51F
  3210. mov al,[edi+PITCH*5+4] ; 53E
  3211. add ebp,ecx ; 51G
  3212. mov ecx,[edx+ebx*4] ; 53F
  3213. mov cl,[edx+eax*4+2] ; 53H
  3214. mov al,[edi+PITCH*5+5] ; 55A
  3215. add ebp,ecx ; 51I & 53G
  3216. mov bl,[esi+ebx*4+2] ; 51H
  3217. add ebp,ebx ; 51I
  3218. mov esi,BlockN.N8T55+32; 55B
  3219. mov bl,[edi+PITCH*5+7] ; 57A
  3220. mov edx,BlockN.N8T57+32; 57B
  3221. lea esi,[esi+eax*4] ; 55C
  3222. mov al,[edi+PITCH*5+4] ; 55D
  3223. lea edx,[edx+ebx*4] ; 57C
  3224. mov bl,[edi+PITCH*5+6] ; 55E & 57D
  3225. mov ecx,[esi+eax*4] ; 55F
  3226. mov al,[edi+PITCH*5+8] ; 57E
  3227. add ebp,ecx ; 55G
  3228. mov ecx,[edx+ebx*4] ; 57F
  3229. mov cl,[edx+eax*4+2] ; 57H
  3230. mov al,[edi+PITCH*6+0] ; 60A
  3231. add ebp,ecx ; 55I & 57G
  3232. mov bl,[esi+ebx*4+2] ; 55H
  3233. add ebp,ebx ; 55I
  3234. mov esi,BlockN.N8T60+32; 60B
  3235. mov bl,[edi+PITCH*6+2] ; 62A
  3236. mov edx,BlockN.N8T62+32; 62B
  3237. lea esi,[esi+eax*4] ; 60C
  3238. mov al,[edi+PITCH*6-1] ; 60D
  3239. lea edx,[edx+ebx*4] ; 62C
  3240. mov bl,[edi+PITCH*6+1] ; 60E & 62D
  3241. mov ecx,[esi+eax*4] ; 60F
  3242. mov al,[edi+PITCH*6+3] ; 62E
  3243. add ebp,ecx ; 60G
  3244. mov ecx,[edx+ebx*4] ; 62F
  3245. mov cl,[edx+eax*4+2] ; 62H
  3246. mov al,[edi+PITCH*6+4] ; 64A
  3247. add ebp,ecx ; 60I & 62G
  3248. mov bl,[esi+ebx*4+2] ; 60H
  3249. add ebp,ebx ; 60I
  3250. mov esi,BlockN.N8T64+32; 64B
  3251. mov bl,[edi+PITCH*6+6] ; 66A
  3252. mov edx,BlockN.N8T66+32; 66B
  3253. lea esi,[esi+eax*4] ; 64C
  3254. mov al,[edi+PITCH*6+3] ; 64D
  3255. lea edx,[edx+ebx*4] ; 66C
  3256. mov bl,[edi+PITCH*6+5] ; 64E & 66D
  3257. mov ecx,[esi+eax*4] ; 64F
  3258. mov al,[edi+PITCH*6+7] ; 66E
  3259. add ebp,ecx ; 64G
  3260. mov ecx,[edx+ebx*4] ; 66F
  3261. mov cl,[edx+eax*4+2] ; 66H
  3262. mov al,[edi+PITCH*7+1] ; 71A
  3263. add ebp,ecx ; 64I & 66G
  3264. mov bl,[esi+ebx*4+2] ; 64H
  3265. add ebp,ebx ; 64I
  3266. mov esi,BlockN.N8T71+32; 71B
  3267. mov bl,[edi+PITCH*7+3] ; 73A
  3268. mov edx,BlockN.N8T73+32; 73B
  3269. lea esi,[esi+eax*4] ; 71C
  3270. mov al,[edi+PITCH*7+0] ; 71D
  3271. lea edx,[edx+ebx*4] ; 73C
  3272. mov bl,[edi+PITCH*7+2] ; 71E & 73D
  3273. mov ecx,[esi+eax*4] ; 71F
  3274. mov al,[edi+PITCH*7+4] ; 73E
  3275. add ebp,ecx ; 71G
  3276. mov ecx,[edx+ebx*4] ; 73F
  3277. mov cl,[edx+eax*4+2] ; 73H
  3278. mov al,[edi+PITCH*7+5] ; 75A
  3279. add ebp,ecx ; 71I & 73G
  3280. mov bl,[esi+ebx*4+2] ; 71H
  3281. add ebp,ebx ; 71I
  3282. mov esi,BlockN.N8T75+32; 75B
  3283. mov bl,[edi+PITCH*7+7] ; 77A
  3284. mov edx,BlockN.N8T77+32; 77B
  3285. lea esi,[esi+eax*4] ; 75C
  3286. mov al,[edi+PITCH*7+4] ; 75D
  3287. lea edx,[edx+ebx*4] ; 77C
  3288. mov bl,[edi+PITCH*7+6] ; 75E & 77D
  3289. mov ecx,[esi+eax*4] ; 75F
  3290. mov al,[edi+PITCH*7+8] ; 77E
  3291. add ebp,ecx ; 75G
  3292. mov ecx,[edx+ebx*4] ; 77F
  3293. mov cl,[edx+eax*4+2] ; 77H
  3294. add esp,BlockLen
  3295. add ecx,ebp ; 75I & 77G
  3296. mov bl,[esi+ebx*4+2] ; 75H
  3297. add ebx,ecx ; 75I
  3298. mov edi,BlockN.AddrCentralPoint+32 ; Get address of next ref1 block.
  3299. shr ecx,16 ; Extract SWD for ref1.
  3300. and ebx,00000FFFFH ; Extract SWD for ref2.
  3301. mov BlockNM1.Ref1InterSWD+32,ecx ; Store SWD for ref1.
  3302. mov BlockNM1.Ref2InterSWD+32,ebx ; Store SWD for ref2.
  3303. xor ebp,ebp
  3304. mov edx,ebx
  3305. test esp,000000018H
  3306. mov ebx,ebp
  3307. jne SWDHalfPelHorzLoop
  3308. ; Output:
  3309. ; ebp, ebx -- Zero
  3310. ; ecx -- Ref1 SWD for block 4
  3311. ; edx -- Ref2 SWD for block 4
  3312. add esp,28
  3313. ret
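; Editorial note, not part of the original listing: a sketch of the packed
; accumulator that the loop tail above appears to rely on.  The names acc,
; Ref1SWD and Ref2SWD are illustrative only.
;
;   acc     = (Ref1SWD SHL 16) + Ref2SWD   ; one 32-bit running sum
;   Ref1SWD = acc SHR 16                   ; cf. "shr ecx,16"
;   Ref2SWD = acc AND 00000FFFFH           ; cf. "and ebx,00000FFFFH"
;
; With made-up values Ref1SWD = 00123H and Ref2SWD = 00456H, acc would be
; 001230456H, and the two extractions return 00123H and 00456H again.
; This presumes the low sum never carries into bit 16.  Assuming each
; per-pel term fits in a byte, as the byte loads suggest, 64 terms total
; at most 64 * 255 = 16320, which is well below 65536.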
  3314. DoSWDHalfPelVertLoop:
  3315. ; ebp -- Initialized to 0, except when can't search off left or right edge.
  3316. ; edi -- Ref addr for block 1. Ref1 is .5 pel up. Ref2 is .5 pel down.
  3317. xor ecx,ecx
  3318. sub esp,BlockLen*4+28
  3319. xor eax,eax
  3320. xor ebx,ebx
  3321. SWDHalfPelVertLoop:
  3322. mov al,[edi]
  3323. mov esi,BlockN.N8T00+32
  3324. mov bl,[edi+2*PITCH]
  3325. mov edx,BlockN.N8T20+32
  3326. lea esi,[esi+eax*4]
  3327. mov al,[edi-1*PITCH]
  3328. lea edx,[edx+ebx*4]
  3329. mov bl,[edi+1*PITCH]
  3330. mov ecx,[esi+eax*4]
  3331. mov al,[edi+3*PITCH]
  3332. add ebp,ecx
  3333. mov ecx,[edx+ebx*4]
  3334. mov cl,[edx+eax*4+2]
  3335. mov al,[edi+4*PITCH]
  3336. add ebp,ecx
  3337. mov bl,[esi+ebx*4+2]
  3338. add ebp,ebx
  3339. mov esi,BlockN.N8T40+32
  3340. mov bl,[edi+6*PITCH]
  3341. mov edx,BlockN.N8T60+32
  3342. lea esi,[esi+eax*4]
  3343. mov al,[edi+3*PITCH]
  3344. lea edx,[edx+ebx*4]
  3345. mov bl,[edi+5*PITCH]
  3346. mov ecx,[esi+eax*4]
  3347. mov al,[edi+7*PITCH]
  3348. add ebp,ecx
  3349. mov ecx,[edx+ebx*4]
  3350. mov cl,[edx+eax*4+2]
  3351. mov al,[edi+1+1*PITCH]
  3352. add ebp,ecx
  3353. mov bl,[esi+ebx*4+2]
  3354. add ebp,ebx
  3355. mov esi,BlockN.N8T11+32
  3356. mov bl,[edi+1+3*PITCH]
  3357. mov edx,BlockN.N8T31+32
  3358. lea esi,[esi+eax*4]
  3359. mov al,[edi+1+0*PITCH]
  3360. lea edx,[edx+ebx*4]
  3361. mov bl,[edi+1+2*PITCH]
  3362. mov ecx,[esi+eax*4]
  3363. mov al,[edi+1+4*PITCH]
  3364. add ebp,ecx
  3365. mov ecx,[edx+ebx*4]
  3366. mov cl,[edx+eax*4+2]
  3367. mov al,[edi+1+5*PITCH]
  3368. add ebp,ecx
  3369. mov bl,[esi+ebx*4+2]
  3370. add ebp,ebx
  3371. mov esi,BlockN.N8T51+32
  3372. mov bl,[edi+1+7*PITCH]
  3373. mov edx,BlockN.N8T71+32
  3374. lea esi,[esi+eax*4]
  3375. mov al,[edi+1+4*PITCH]
  3376. lea edx,[edx+ebx*4]
  3377. mov bl,[edi+1+6*PITCH]
  3378. mov ecx,[esi+eax*4]
  3379. mov al,[edi+1+8*PITCH]
  3380. add ebp,ecx
  3381. mov ecx,[edx+ebx*4]
  3382. mov cl,[edx+eax*4+2]
  3383. mov al,[edi+2+0*PITCH]
  3384. add ebp,ecx
  3385. mov bl,[esi+ebx*4+2]
  3386. add ebp,ebx
  3387. mov esi,BlockN.N8T02+32
  3388. mov bl,[edi+2+2*PITCH]
  3389. mov edx,BlockN.N8T22+32
  3390. lea esi,[esi+eax*4]
  3391. mov al,[edi+2-1*PITCH]
  3392. lea edx,[edx+ebx*4]
  3393. mov bl,[edi+2+1*PITCH]
  3394. mov ecx,[esi+eax*4]
  3395. mov al,[edi+2+3*PITCH]
  3396. add ebp,ecx
  3397. mov ecx,[edx+ebx*4]
  3398. mov cl,[edx+eax*4+2]
  3399. mov al,[edi+2+4*PITCH]
  3400. add ebp,ecx
  3401. mov bl,[esi+ebx*4+2]
  3402. add ebp,ebx
  3403. mov esi,BlockN.N8T42+32
  3404. mov bl,[edi+2+6*PITCH]
  3405. mov edx,BlockN.N8T62+32
  3406. lea esi,[esi+eax*4]
  3407. mov al,[edi+2+3*PITCH]
  3408. lea edx,[edx+ebx*4]
  3409. mov bl,[edi+2+5*PITCH]
  3410. mov ecx,[esi+eax*4]
  3411. mov al,[edi+2+7*PITCH]
  3412. add ebp,ecx
  3413. mov ecx,[edx+ebx*4]
  3414. mov cl,[edx+eax*4+2]
  3415. mov al,[edi+3+1*PITCH]
  3416. add ebp,ecx
  3417. mov bl,[esi+ebx*4+2]
  3418. add ebp,ebx
  3419. mov esi,BlockN.N8T13+32
  3420. mov bl,[edi+3+3*PITCH]
  3421. mov edx,BlockN.N8T33+32
  3422. lea esi,[esi+eax*4]
  3423. mov al,[edi+3+0*PITCH]
  3424. lea edx,[edx+ebx*4]
  3425. mov bl,[edi+3+2*PITCH]
  3426. mov ecx,[esi+eax*4]
  3427. mov al,[edi+3+4*PITCH]
  3428. add ebp,ecx
  3429. mov ecx,[edx+ebx*4]
  3430. mov cl,[edx+eax*4+2]
  3431. mov al,[edi+3+5*PITCH]
  3432. add ebp,ecx
  3433. mov bl,[esi+ebx*4+2]
  3434. add ebp,ebx
  3435. mov esi,BlockN.N8T53+32
  3436. mov bl,[edi+3+7*PITCH]
  3437. mov edx,BlockN.N8T73+32
  3438. lea esi,[esi+eax*4]
  3439. mov al,[edi+3+4*PITCH]
  3440. lea edx,[edx+ebx*4]
  3441. mov bl,[edi+3+6*PITCH]
  3442. mov ecx,[esi+eax*4]
  3443. mov al,[edi+3+8*PITCH]
  3444. add ebp,ecx
  3445. mov ecx,[edx+ebx*4]
  3446. mov cl,[edx+eax*4+2]
  3447. mov al,[edi+4+0*PITCH]
  3448. add ebp,ecx
  3449. mov bl,[esi+ebx*4+2]
  3450. add ebp,ebx
  3451. mov esi,BlockN.N8T04+32
  3452. mov bl,[edi+4+2*PITCH]
  3453. mov edx,BlockN.N8T24+32
  3454. lea esi,[esi+eax*4]
  3455. mov al,[edi+4-1*PITCH]
  3456. lea edx,[edx+ebx*4]
  3457. mov bl,[edi+4+1*PITCH]
  3458. mov ecx,[esi+eax*4]
  3459. mov al,[edi+4+3*PITCH]
  3460. add ebp,ecx
  3461. mov ecx,[edx+ebx*4]
  3462. mov cl,[edx+eax*4+2]
  3463. mov al,[edi+4+4*PITCH]
  3464. add ebp,ecx
  3465. mov bl,[esi+ebx*4+2]
  3466. add ebp,ebx
  3467. mov esi,BlockN.N8T44+32
  3468. mov bl,[edi+4+6*PITCH]
  3469. mov edx,BlockN.N8T64+32
  3470. lea esi,[esi+eax*4]
  3471. mov al,[edi+4+3*PITCH]
  3472. lea edx,[edx+ebx*4]
  3473. mov bl,[edi+4+5*PITCH]
  3474. mov ecx,[esi+eax*4]
  3475. mov al,[edi+4+7*PITCH]
  3476. add ebp,ecx
  3477. mov ecx,[edx+ebx*4]
  3478. mov cl,[edx+eax*4+2]
  3479. mov al,[edi+5+1*PITCH]
  3480. add ebp,ecx
  3481. mov bl,[esi+ebx*4+2]
  3482. add ebp,ebx
  3483. mov esi,BlockN.N8T15+32
  3484. mov bl,[edi+5+3*PITCH]
  3485. mov edx,BlockN.N8T35+32
  3486. lea esi,[esi+eax*4]
  3487. mov al,[edi+5+0*PITCH]
  3488. lea edx,[edx+ebx*4]
  3489. mov bl,[edi+5+2*PITCH]
  3490. mov ecx,[esi+eax*4]
  3491. mov al,[edi+5+4*PITCH]
  3492. add ebp,ecx
  3493. mov ecx,[edx+ebx*4]
  3494. mov cl,[edx+eax*4+2]
  3495. mov al,[edi+5+5*PITCH]
  3496. add ebp,ecx
  3497. mov bl,[esi+ebx*4+2]
  3498. add ebp,ebx
  3499. mov esi,BlockN.N8T55+32
  3500. mov bl,[edi+5+7*PITCH]
  3501. mov edx,BlockN.N8T75+32
  3502. lea esi,[esi+eax*4]
  3503. mov al,[edi+5+4*PITCH]
  3504. lea edx,[edx+ebx*4]
  3505. mov bl,[edi+5+6*PITCH]
  3506. mov ecx,[esi+eax*4]
  3507. mov al,[edi+5+8*PITCH]
  3508. add ebp,ecx
  3509. mov ecx,[edx+ebx*4]
  3510. mov cl,[edx+eax*4+2]
  3511. mov al,[edi+6+0*PITCH]
  3512. add ebp,ecx
  3513. mov bl,[esi+ebx*4+2]
  3514. add ebp,ebx
  3515. mov esi,BlockN.N8T06+32
  3516. mov bl,[edi+6+2*PITCH]
  3517. mov edx,BlockN.N8T26+32
  3518. lea esi,[esi+eax*4]
  3519. mov al,[edi+6-1*PITCH]
  3520. lea edx,[edx+ebx*4]
  3521. mov bl,[edi+6+1*PITCH]
  3522. mov ecx,[esi+eax*4]
  3523. mov al,[edi+6+3*PITCH]
  3524. add ebp,ecx
  3525. mov ecx,[edx+ebx*4]
  3526. mov cl,[edx+eax*4+2]
  3527. mov al,[edi+6+4*PITCH]
  3528. add ebp,ecx
  3529. mov bl,[esi+ebx*4+2]
  3530. add ebp,ebx
  3531. mov esi,BlockN.N8T46+32
  3532. mov bl,[edi+6+6*PITCH]
  3533. mov edx,BlockN.N8T66+32
  3534. lea esi,[esi+eax*4]
  3535. mov al,[edi+6+3*PITCH]
  3536. lea edx,[edx+ebx*4]
  3537. mov bl,[edi+6+5*PITCH]
  3538. mov ecx,[esi+eax*4]
  3539. mov al,[edi+6+7*PITCH]
  3540. add ebp,ecx
  3541. mov ecx,[edx+ebx*4]
  3542. mov cl,[edx+eax*4+2]
  3543. mov al,[edi+7+1*PITCH]
  3544. add ebp,ecx
  3545. mov bl,[esi+ebx*4+2]
  3546. add ebp,ebx
  3547. mov esi,BlockN.N8T17+32
  3548. mov bl,[edi+7+3*PITCH]
  3549. mov edx,BlockN.N8T37+32
  3550. lea esi,[esi+eax*4]
  3551. mov al,[edi+7+0*PITCH]
  3552. lea edx,[edx+ebx*4]
  3553. mov bl,[edi+7+2*PITCH]
  3554. mov ecx,[esi+eax*4]
  3555. mov al,[edi+7+4*PITCH]
  3556. add ebp,ecx
  3557. mov ecx,[edx+ebx*4]
  3558. mov cl,[edx+eax*4+2]
  3559. mov al,[edi+7+5*PITCH]
  3560. add ebp,ecx
  3561. mov bl,[esi+ebx*4+2]
  3562. add ebp,ebx
  3563. mov esi,BlockN.N8T57+32
  3564. mov bl,[edi+7+7*PITCH]
  3565. mov edx,BlockN.N8T77+32
  3566. lea esi,[esi+eax*4]
  3567. mov al,[edi+7+4*PITCH]
  3568. lea edx,[edx+ebx*4]
  3569. mov bl,[edi+7+6*PITCH]
  3570. mov ecx,[esi+eax*4]
  3571. mov al,[edi+7+8*PITCH]
  3572. add ebp,ecx
  3573. mov ecx,[edx+ebx*4]
  3574. mov cl,[edx+eax*4+2]
  3575. add esp,BlockLen
  3576. add ecx,ebp
  3577. mov bl,[esi+ebx*4+2]
  3578. add ebx,ecx
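  3579. mov edi,BlockN.AddrCentralPoint+32 ; Get address of next ref1 block.
  3580. shr ecx,16 ; Extract SWD for ref1.
  3581. and ebx,00000FFFFH ; Extract SWD for ref2.
  3582. mov BlockNM1.Ref1InterSWD+32,ecx ; Store SWD for ref1.
  3583. mov BlockNM1.Ref2InterSWD+32,ebx ; Store SWD for ref2.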
  3584. xor ebp,ebp
  3585. mov edx,ebx
  3586. test esp,000000018H
  3587. mov ebx,ebp
  3588. jne SWDHalfPelVertLoop
  3589. ; Output:
  3590. ; ebp, ebx -- Zero
  3591. ; ecx -- Ref1 SWD for block 4
  3592. ; edx -- Ref2 SWD for block 4
  3593. add esp,28
  3594. ret
  3595. ENDIF ; H263
  3596. ; Performance for common macroblocks:
  3597. ; 298 clocks: prepare target pels, compute avg target pel, compute 0-MV SWD.
  3598. ; 90 clocks: compute IntraSWD.
  3599. ; 1412 clocks: 6-level search for best SWD.
  3600. ; 16 clocks: record best fit.
  3601. ; 945 clocks: calculate spatial loop filtered prediction.
  3602. ; 152 clocks: calculate SWD for spatially filtered prediction and classify.
  3603. ; ----
  3604. ; 2913 clocks total
  3605. ;
  3606. ; Performance for macroblocks in which 0-motion vector is "good enough":
  3607. ; 298 clocks: prepare target pels, compute avg target pel, compute 0-MV SWD.
  3608. ; 90 clocks: compute IntraSWD.
  3609. ; 16 clocks: record best fit.
  3610. ; 58 clocks: extra cache fill burden on adjacent MB if SWD-search not done.
  3611. ; 945 clocks: calculate spatial loop filtered prediction.
  3612. ; 152 clocks: calculate SWD for spatially filtered prediction and classify.
  3613. ; ----
  3614. ; 1559 clocks total
  3615. ;
  3616. ; Performance for macroblocks marked as intrablock by decree of caller:
  3617. ; 298 clocks: prepare target pels, compute avg target pel, compute 0-MV SWD.
  3618. ; 90 clocks: compute IntraSWD.
  3619. ; 58 clocks: extra cache fill burden on adjacent MB if SWD-search not done.
  3620. ; 20 clocks: classify (just weight the SWD for # of match points).
  3621. ; ----
  3622. ; 466 clocks total
  3623. ;
  3624. ; 160*120 performance, generously estimated (assuming lots of motion):
  3625. ;
  3626. ; 2913 * 80 = 233000 clocks for luma.
  3627. ; 2913 * 12 = 35000 clocks for chroma.
  3628. ; 268000 clocks per frame * 15 = 4,020,000 clocks/sec.
  3629. ;
  3630. ; 160*120 performance, assuming typical motion:
  3631. ;
  3632. ; 2913 * 40 + 1559 * 40 = 179000 clocks for luma.
  3633. ; 2913 * 8 + 1559 * 4 = 30000 clocks for chroma.
  3634. ; 209000 clocks per frame * 15 = 3,135,000 clocks/sec.
  3635. ;
  3636. ; Add 10-20% to allow for initial cache-filling, and unfortunate cases where
  3637. ; cache-filling policy preempts areas of the tables that are not locally "hot",
  3638. ; instead of preempting macroblocks upon which the processing was just finished.
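; Editorial note, not part of the original listing: the 10-20% margin
; applied to the two per-second figures quoted above works out as follows.
;
;   Typical motion: 3,135,000 * 1.10 = 3,448,500 clocks/sec.
;                   3,135,000 * 1.20 = 3,762,000 clocks/sec.
;   Lots of motion: 4,020,000 * 1.10 = 4,422,000 clocks/sec.
;                   4,020,000 * 1.20 = 4,824,000 clocks/sec.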
  3639. Done:
  3640. mov eax,IntraSWDTotal
  3641. mov ebx,IntraSWDBlocks
  3642. mov ecx,InterSWDTotal
  3643. mov edx,InterSWDBlocks
  3644. mov esp,StashESP
  3645. mov edi,[esp+IntraSWDTotal_arg]
  3646. mov [edi],eax
  3647. mov edi,[esp+IntraSWDBlocks_arg]
  3648. mov [edi],ebx
  3649. mov edi,[esp+InterSWDTotal_arg]
  3650. mov [edi],ecx
  3651. mov edi,[esp+InterSWDBlocks_arg]
  3652. mov [edi],edx
  3653. pop ebx
  3654. pop ebp
  3655. pop edi
  3656. pop esi
  3657. rturn
  3658. MOTIONESTIMATION endp
  3659. END