Leaked source code of windows server 2003

  1. ;/* *************************************************************************
  2. ;** INTEL Corporation Proprietary Information
  3. ;**
  4. ;** This listing is supplied under the terms of a license
  5. ;** agreement with INTEL Corporation and may not be copied
  6. ;** nor disclosed except in accordance with the terms of
  7. ;** that agreement.
  8. ;**
  9. ;** Copyright (c) 1995 Intel Corporation.
  10. ;** All Rights Reserved.
  11. ;**
  12. ;** *************************************************************************
  13. ;*/
  14. ;////////////////////////////////////////////////////////////////////////////
  15. ;//
  16. ;// $Header: R:\h26x\h26x\src\enc\ex5me.asv 1.17 24 Sep 1996 11:27:00 BNICKERS $
  17. ;//
  18. ;// $Log: R:\h26x\h26x\src\enc\ex5me.asv $
  19. ;//
  20. ;// Rev 1.17 24 Sep 1996 11:27:00 BNICKERS
  21. ;//
  22. ;// Fix register collision.
  23. ;//
  24. ;// Rev 1.16 24 Sep 1996 10:40:32 BNICKERS
  25. ;// For H261, zero out motion vectors when classifying MB as Intra.
  26. ;//
  27. ;// Rev 1.13 19 Aug 1996 13:48:26 BNICKERS
  28. ;// Provide threshold and differential variables for spatial filtering.
  29. ;//
  30. ;// Rev 1.12 17 Jun 1996 15:19:34 BNICKERS
  31. ;// Fix recording of block and MB SWDs for Spatial Loop Filtering case in H261.
  32. ;//
  33. ;// Rev 1.11 30 May 1996 16:40:14 BNICKERS
  34. ;// Fix order of arguments.
  35. ;//
  36. ;// Rev 1.10 30 May 1996 15:08:36 BNICKERS
  37. ;// Fixed minor error in recent IA ME speed improvements.
  38. ;//
  39. ;// Rev 1.9 29 May 1996 15:37:58 BNICKERS
  40. ;// Acceleration of IA version of ME.
  41. ;//
  42. ;// Rev 1.8 15 Apr 1996 10:48:48 AKASAI
  43. ;// Fixed bug in Spatial loop filter code. Code had been unrolled and
  44. ;// the second case had not been updated in the fix put in place of
  45. ;// (for) the first case. Basically an ebx instead of bl that caused
  46. ;// an overflow from 7F to 3F.
  47. ;//
  48. ;// Rev 1.7 15 Feb 1996 15:39:26 BNICKERS
  49. ;// No change.
  50. ;//
  51. ;// Rev 1.6 15 Feb 1996 14:39:00 BNICKERS
  52. ;// Fix bug wherein access to area outside stack frame was occurring.
  53. ;//
  54. ;// Rev 1.5 15 Jan 1996 14:31:40 BNICKERS
  55. ;// Fix decrement of ref area addr when half pel upward is best in block ME.
  56. ;// Broadcast macroblock level MV when block gets classified as Intra.
  57. ;//
  58. ;// Rev 1.4 12 Jan 1996 13:16:08 BNICKERS
  59. ;// Fix SLF so that 3 7F pels doesn't overflow, and result in 3F instead of 7F.
  60. ;//
  61. ;// Rev 1.3 27 Dec 1995 15:32:46 RMCKENZX
  62. ;// Added copyright notice
  63. ;//
  64. ;// Rev 1.2 19 Dec 1995 17:11:16 RMCKENZX
  65. ;// fixed 2 bugs:
  66. ;// 1. do +-15 pel search if central and NOT 4 mv / macroblock
  67. ;// (was doing when central AND 4 mv / macroblock)
  68. ;// 2. correctly compute motion vectors when doing 4 motion
  69. ;// vectors per block.
  70. ;//
  71. ;// Rev 1.1 28 Nov 1995 15:25:48 AKASAI
  72. ;// Added white space so that it will compile with the long lines.
  73. ;//
  74. ;// Rev 1.0 28 Nov 1995 14:37:00 BECHOLS
  75. ;// Initial revision.
  76. ;//
  77. ;//
  78. ;// Rev 1.13 22 Nov 1995 15:32:42 DBRUCKS
  79. ;// Brian made this change on my system.
  80. ;// Increased a value to simplify debugging
  81. ;//
  82. ;//
  83. ;//
  84. ;// Rev 1.12 17 Nov 1995 10:43:58 BNICKERS
  85. ;// Fix problems with B-Frame ME.
  86. ;//
  87. ;//
  88. ;//
  89. ;// Rev 1.11 31 Oct 1995 11:44:26 BNICKERS
  90. ;// Save/restore ebx.
  91. ;//
  92. ;////////////////////////////////////////////////////////////////////////////
  93. ;
  94. ; MotionEstimation -- This function performs motion estimation for the macroblocks identified
  95. ; in the input list.
  96. ; Conditional assembly selects either the H263 or H261 version.
  97. ;
  98. ; Input Arguments:
  99. ;
  100. ; MBlockActionStream
  101. ;
  102. ; The list of macroblocks for which we need to perform motion estimation.
  103. ;
  104. ; Upon input, the following fields must be defined:
  105. ;
  106. ; CodedBlocks -- Bit 6 must be set for the last macroblock to be processed.
  107. ;
  108. ; FirstMEState -- must be 0 for macroblocks that are forced to be Intracoded. An
  109. ; IntraSWD will be calculated.
  110. ; Other macroblocks must have the following values:
  111. ; 1: upper left, without advanced prediction. (Advanced prediction
  112. ; only applies to H263.)
  113. ; 2: upper edge, without advanced prediction.
  114. ; 3: upper right, without advanced prediction.
  115. ; 4: left edge, without advanced prediction.
  116. ; 5: central block, or any block if advanced prediction is being done.
  117. ; 6: right edge, without advanced prediction.
  118. ; 7: lower left, without advanced prediction.
  119. ; 8: lower edge, without advanced prediction.
  120. ; 9: lower right, without advanced prediction.
  121. ; If vertical motion is NOT allowed:
  122. ; 10: left edge, without advanced prediction.
  123. ; 11: central block, or any block if advanced prediction is being done.
  124. ; 12: right edge, without advanced prediction.
  125. ; *** Note that with advanced prediction, only initial states 0, 5, or
  126. ; 11 can be specified. Doing block level motion vectors mandates
  127. ; advanced prediction, but in that case, only initial
  128. ; states 0 and 5 are allowed.
  129. ;
  130. ; BlkOffset -- must be defined for each of the blocks in the macroblocks.
  131. ;
  132. ; TargetFrameBaseAddress -- Address of upper left viewable pel in the target Y plane.
  133. ;
  134. ; PreviousFrameBaseAddress -- Address of upper left viewable pel in the previous Y plane. Whether this is the
  135. ; reconstructed previous frame, or the original, is up to the caller to decide.
  136. ;
  137. ; FilteredFrameBaseAddress -- Address of upper left viewable pel in the scratch area in which this function can record
  138. ; the spatially filtered prediction for each block, so that frame differencing can
  139. ; utilize it rather than have to recompute it. (H261 only)
  140. ;
  141. ; DoRadius15Search -- TRUE if central macroblocks should search a distance of 15 from center. Else searches 7 out.
  142. ;
  143. ; DoHalfPelEstimation -- TRUE if we should do ME to half pel resolution. This is only applicable for H263 and must
  144. ; be FALSE for H261. (Note: TRUE must be 1; FALSE must be 0).
  145. ;
  146. ; DoBlockLevelVectors -- TRUE if we should do ME at block level. This is only applicable for H263 and must be FALSE
  147. ; for H261. (Note: TRUE must be 1; FALSE must be 0).
  148. ; DoSpatialFiltering -- TRUE if we should determine if spatially filtering the prediction reduces the SWD. Only
  149. ; applicable for H261 and must be FALSE for H263. (Note: TRUE must be 1; FALSE must be 0).
  150. ;
  151. ; ZeroVectorThreshold -- If the SWD for a macroblock is less than this threshold, we do not bother searching for a
  152. ; better motion vector. Compute as follows, where D is the average tolerable pel difference
  153. ; to satisfy this threshold. (Initial recommendation: D=2 ==> ZVT=384)
  154. ; ZVT = (128 * ((int)((D**1.6)+.5)))
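;
; A worked example of the formula above. The arithmetic is illustrative; only D=2 ==> 384 is
; stated in the original notes. The per-point weight is (int)((D**1.6)+.5):
;   D=1: 1**1.6 = 1.00, +.5 = 1.50, (int) = 1  ==>  ZVT = 128 * 1 = 128
;   D=2: 2**1.6 = 3.03, +.5 = 3.53, (int) = 3  ==>  ZVT = 128 * 3 = 384
;   D=3: 3**1.6 = 5.80, +.5 = 6.30, (int) = 6  ==>  ZVT = 128 * 6 = 768
; The factor of 128 presumably reflects the 128 match points accumulated per macroblock.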
  155. ;
  156. ; NonZeroDifferential -- After searching for the best motion vector (or individual block motion vectors, if enabled),
  157. ; if the macroblock's SWD is not better than it was for the zero vector -- not better by at
  158. ; least this amount -- then we revert to the zero vector. We are comparing two macroblock
  159. ; SWDs, both calculated as follows: (Initial recommendation: NZD=128)
  160. ; For each of 128 match points, where D is its Abs Diff, accumulate ((int)((D**1.6)+.5))
  161. ;
  162. ; BlockMVDifferential -- The amount by which the sum of four block level SWDs must be better than a single macroblock
  163. ; level SWD to cause us to choose block level motion vectors. See NonZeroDifferential for
  164. ; how the SWDs are calculated. Only applicable for H263. (Initial recommendation: BMVD=128)
  165. ;
  166. ; EmptyThreshold -- If the SWD for a block is less than this, the block is forced empty. Compute as follows, where D
  167. ; is the average tolerable pel diff to satisfy threshold. (Initial recommendation: D=3 ==> ET=96)
  168. ; ET = (32 * ((int)((D**1.6)+.5)))
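;
; Working the formula through by hand (this arithmetic is an illustration, not part of the
; original notes):
;   D=2: 2**1.6 = 3.03, +.5 = 3.53, (int) = 3  ==>  ET = 32 * 3 = 96
;   D=3: 3**1.6 = 5.80, +.5 = 6.30, (int) = 6  ==>  ET = 32 * 6 = 192
; Under this formula the recommended value of 96 corresponds to D=2; D=3 would give 192.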
  169. ;
  170. ; InterCodingThreshold -- If any of the blocks are forced empty, we can simply skip calculating the INTRASWD for the
  171. ; macroblock. If none of the blocks are forced empty, we will compare the macroblock's SWD
  172. ; against this threshold. If below the threshold, we will likewise skip calculating the
  173. ; INTRASWD. Otherwise, we will calculate the INTRASWD, and if it is less than the [Inter]SWD,
  174. ; we will classify the block as INTRA-coded. Compute as follows, where D is the average
  175. ; tolerable pel difference to satisfy threshold. (Initial recommendation: D=4 ==> ICT=1152)
  176. ; ICT = (128 * ((int)((D**1.6)+.5)))
  177. ;
  178. ; IntraCodingDifferential -- For INTRA coding to occur, the INTRASWD must be better than the INTERSWD by at least
  179. ; this amount.
  180. ;
  181. ; Output Arguments
  182. ;
  183. ; MBlockActionStream
  184. ;
  185. ; These fields are defined as follows upon return:
  186. ;
  187. ; BlockType -- Set to INTRA, INTER1MV, or (H263 only) INTER4MV.
  188. ;
  189. ; PHMV and PVMV -- The horizontal and vertical motion vectors, in units of a half pel.
  190. ;
  191. ; BHMV and BVMV -- These fields get clobbered.
  192. ;
  193. ; PastRef -- If BlockType != INTRA, set to the address of the reference block.
  194. ;
  195. ; If Horizontal MV indicates a half pel position, the prediction for the upper left pel of the block
  196. ; is the average of the pel at PastRef and the one at PastRef+1.
  197. ;
  198. ; If Vertical MV indicates a half pel position, the prediction for the upper left pel of the block
  199. ; is the average of the pel at PastRef and the one at PastRef+PITCH.
  200. ;
  201. ; If both MVs indicate half pel positions, the prediction for the upper left pel of the block is the
  202. ; average of the pels at PastRef, PastRef+1, PastRef+PITCH, and PastRef+PITCH+1.
  203. ;
  204. ; Indications of a half pel position can only happen for H263.
  205. ;
  206. ; In H261, when spatial filtering is done, the address will be in the SpatiallyFilteredFrame, where
  207. ; this function stashes the spatially filtered prediction for subsequent reuse by frame differencing.
  208. ;
  209. ; CodedBlocks -- Bits 4 and 5 are turned on, indicating that the U and V blocks should be processed. (If the
  210. ; FDCT function finds them to quantize to empty, it will mark them as empty.)
  211. ;
  212. ; Bits 0 thru 3 are cleared for each of blocks 1 thru 4 that MotionEstimation forces empty;
  213. ; they are set otherwise.
  214. ;
  215. ; Bits 6 and 7 are left unchanged.
  216. ;
  217. ; SWD -- Set to the sum of the SWDs for the four luma blocks in the macroblock. The SWD for any block that is
  218. ; forced empty, is NOT included in the sum.
  219. ;
  220. ;
  221. ;
  222. ; IntraSWDTotal -- The sum of the block SWDs for all Intracoded macroblocks.
  223. ;
  224. ; IntraSWDBlocks -- The number of blocks that make up the IntraSWDTotal.
  225. ;
  226. ; InterSWDTotal -- The sum of the block SWDs for all Intercoded macroblocks.
  227. ; None of the blocks forced empty are included in this.
  228. ;
  229. ; InterSWDBlocks -- The number of blocks that make up the InterSWDTotal.
  230. ;
  231. ;
  232. ; Other assumptions:
  233. ;
  234. ; For performance reasons, it is assumed that the layout of current and previous frames (and spatially filtered
  235. ; frame for H261) rigorously conforms to the following guide.
  236. ;
  237. ; The spatially filtered frame (only present and applicable for H261) is an output frame into which MotionEstimation
  238. ; places spatially filtered macroblocks as it determines if filtering is good for a macroblock. If it determines
  239. ; such, frame differencing will be able to re-use the spatially filtered macroblock, rather than recomputing it.
  240. ;
  241. ; Cache
  242. ; Alignment
  243. ; Points: v v v v v v v v v v v v v
  244. ; 16 | 352 (narrower pictures are left justified) | 16
  245. ; +---+---------------------------------------------------------------------------------------+---+
  246. ; | D | Current Frame Y Plane | D |
  247. ; | u | | u |
  248. ; Frame | m | | m |
  249. ; Height | m | | m |
  250. ; Lines | y | | y |
  251. ; | | | |
  252. ; +---+---------------------------------------------------------------------------------------+---+
  253. ; | |
  254. ; | |
  255. ; | |
  256. ; 24 lines | Dummy Space (24 lines plus 8 bytes. Can be reduced to 8 bytes if unrestricted motion |
  257. ; | vectors is NOT selected.) |
  258. ; | |
  259. ; | 8 176 16 176 |8
  260. ; | +-+-------------------------------------------------------------------------------------------+-+
  261. ; +-+D| Current Frame U Plane | D | Current Frame V Plane |D|
  262. ; Frame |u| | u | |u|
  263. ; Height |m| | m | |m|
  264. ; Div By 2 |m| | m | |m|
  265. ; Lines |y| | y | |y|
  266. ; +-+-------------------------------------------+---+-------------------------------------------+-+
  267. ; 72 dummy bytes. I.e. enough dummy space to assure that MOD ((Previous_Frame - Current_Frame), 128) == 80
  268. ; +-----------------------------------------------------------------------------------------------+
  269. ; | |
  270. ; 16 lines | If Unrestricted Motion Vectors selected, 16 lines must appear above and below previous frame, |
  271. ; | and these lines plus the 16 columns to the left and 16 columns to the right of the previous |
  272. ; | frame must be initialized to the values at the edges and corners, propagated outward. If |
  273. ; | Unrestricted Motion Vectors is off, these lines don't have to be allocated. |
  274. ; | |
  275. ; | +---------------------------------------------------------------------------------------+ +
  276. ; Frame | | Previous Frame Y Plane | |
  277. ; Height | | | |
  278. ; Lines | | | |
  279. ; | | | |
  280. ; | | | |
  281. ; | +---------------------------------------------------------------------------------------+ +
  282. ; | |
  283. ; 16 lines | See comment above Previous Y Plane |
  284. ; | |
  285. ; |+--- 8 bytes of dummy space. Must be there, whether unrestricted MV or not. |
  286. ; || |
  287. ; |v+-----------------------------------------------+---------------------------------------------+-+
  288. ; +-+ | |
  289. ; | See comment above Previous Y Plane. | See comment above Previous Y Plane. |
  290. ; 8 lines | Same idea here, but 8 lines are needed above | Same idea here, but 8 lines are needed above |
  291. ; | and below U plane, and 8 columns on each side.| and below V plane, and 8 columns on each side.|
  292. ; | | |
  293. ; |8 176 8|8 176 8|
  294. ; | +-------------------------------------------+ | +-------------------------------------------+ |
  295. ; | | Previous Frame U Plane | | | Previous Frame V Plane | |
  296. ; Frame | | | | | | |
  297. ; Height | | | | | | |
  298. ; Div By 2 | | | | | | |
  299. ; Lines | | | | | | |
  300. ; | +-------------------------------------------+ | +-------------------------------------------+ |
  301. ; | | |
  302. ; 8 lines | See comment above Previous U Plane | See comment above Previous V Plane |
  303. ; | | |
  304. ; | | |
  305. ; | | |
  306. ; +-----------------------------------------------+---------------------------------------------+-+
  307. ; Enough dummy space to assure that MOD ((Spatial_Frame - Previous_Frame), 4096) == 2032
  308. ; +---+---------------------------------------------------------------------------------------+---+
  309. ; | D | Spatially Filtered Y Plane (present only for H261) | D |
  310. ; | u | | u |
  311. ; Frame | m | | m |
  312. ; Height | m | | m |
  313. ; Lines | y | | y |
  314. ; | | | |
  315. ; +---+---------------------------------------------------------------------------------------+---+
  316. ; | |
  317. ; | |
  318. ; | |
  319. ; 24 lines | Dummy Space (24 lines plus 8 bytes. Can be reduced to 8 bytes if unrestricted motion |
  320. ; | vectors is NOT selected, which is certainly the case for H261.) |
  321. ; | |
  322. ; | 8 176 16 176 |8
  323. ; | +-+-------------------------------------------------------------------------------------------+-+
  324. ; +-+D| Spatially Filtered U plane (H261 only) | D | Spatially Filtered V plane (H261 only) |D|
  325. ; Frame |u| | u | |u|
  326. ; Height |m| | m | |m|
  327. ; Div By 2 |m| | m | |m|
  328. ; Lines |y| | y | |y|
  329. ; +-+-------------------------------------------+---+-------------------------------------------+-+
  330. ;
  331. ; Cache layout of the target block and the full range for the reference area (as restricted to +/- 7 in vertical,
  332. ; and +/- 7 (expandable to +/- 15) in horizontal) is as shown here. Each box represents a cache line (32 bytes),
  333. ; increasing incrementally from left to right, and then to the next row (like reading a book). The 128 boxes taken
  334. ; as a whole represent 4Kbytes. The boxes are populated as follows:
  335. ;
  336. ; R -- Data from the reference area. Each box contains 23 of the pels belonging to a line of the reference area.
  337. ; The remaining 7 pels of the line are either in the box to the left (for reference areas used to provide
  338. ; predictions for target macroblocks that begin at an address 0-mod-32), or to the right (for target MBs that
  339. ; begin at an address 16-mod-32). There are 30 R's corresponding to the 30-line limit on the vertical distance
  340. ; we might search.
  341. ;
  342. ; T -- Data from the target macroblock. Each box contains a full line (16 pels) for each of two adjacent
  343. ; macroblocks. There are 16 T's corresponding to the 16 lines of the macroblocks.
  344. ;
  345. ; S -- Space for the spatially filtered macroblock (H261 only).
  346. ;
  347. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  348. ; | T | | R | | T | | R | | S | | R | |
  349. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  350. ; | T | | R | | T | | R | | S | | R | |
  351. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  352. ; | T | | R | | T | | R | | S | | R | |
  353. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  354. ; | T | | R | | T | | R | | S | | R | |
  355. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  356. ; | T | | R | | T | | R | | S | | R | |
  357. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  358. ; | T | | R | | S | | R | | S | | R | |
  359. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  360. ; | T | | R | | S | | R | | S | | R | |
  361. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  362. ; | T | | R | | S | | R | | S | | R | |
  363. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  364. ; | T | | R | | S | | R | | S | | R | |
  365. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  366. ; | T | | R | | S | | R | | S | | R | |
  367. ; +---+---+---+---+---+---+---+---+---+---+---+---+
  368. ; | T | | R | | S | | R | |
  369. ; +---+---+---+---+---+---+---+---+
  370. ;
  371. ; Thus, in a logical sense, the above data fits into one of the 4K data cache pages, leaving the other for all other
  372. ; data. Care has been taken to assure that the tables and the stack space needed by this function fit nicely into
  373. ; the other data cache page. Only the MBlockActionStream remains to conflict with the above data structures. That
  374. ; is both unavoidable, and of minimal consequence.
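;
; The arithmetic behind the figure can be checked by hand. This is a sketch that assumes the
; pitch of 384 bytes and the 32-byte cache lines stated elsewhere in these notes:
;   4096 bytes per page / 32 bytes per cache line = 128 boxes.
;   384 bytes per frame line / 32 = 12 boxes, so each frame line starts one row lower in the figure.
;   Frame line k of a block starts at byte offset (k * 384) MOD 4096 within the page:
;     k=0 ==> 0, k=1 ==> 384, ... k=10 ==> 3840, k=11 ==> 128, k=12 ==> 512, ...
;   Since 384 = 3 * 128 and 4096 = 32 * 128, and 3 shares no factor with 32, lines k=0..31
;   land on 32 different 128-byte slots before the pattern repeats at k=32.
; This is why a column of the figure moves 4 boxes (128 bytes) to the right each time it wraps.
;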
  375. ; An algorithm has been selected that calculates fewer SWDs (Sum of Weighted Differences) than the typical log search.
  376. ; In the typical log search, a three level search is done, in which the SWDs are compared for the center point and a
  377. ; point at each 45 degrees, initially 4 pels away, then 2, then 1. This requires a total of 25 SWDs for each
  378. ; macroblock (except those near edges or corners).
  379. ;
  380. ; In this algorithm, six levels are performed, with each odd level being a horizontal search, and each even level being
  381. ; a vertical search. Each search compares the SWD for the center point with that of a point in each direction on the
  382. ; applicable axis. This requires only 13 SWDs and a much simpler control structure. Here is an example picture of a
  383. ; search, in which "0" represents the initial center point (the 0,0 motion vector), "A", and "a" represent the first
  384. ; search points, etc. In this example, the "winner" of each level of the search proceeds as follows: a, B, C, C, E, F,
  385. ; arriving at a motion vector of -1 horizontal, 5 vertical.
  386. ;
  387. ; ...............
  388. ; ...............
  389. ; ...............
  390. ; ...b...........
  391. ; ...............
  392. ; ...............
  393. ; ...............
  394. ; ...a...0...A...
  395. ; ...............
  396. ; .....d.........
  397. ; ......f........
  398. ; .c.BeCE........
  399. ; ......F........
  400. ; .....D.........
  401. ; ...............
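;
; Tracing the example level by level may help. Positions are (horz, vert) relative to "0", with
; right and down positive, as read from the picture above:
;   Level 1 (horz, step 4): compare 0=(0,0), A=(+4,0), a=(-4,0).  Winner a.  Center is now (-4,0).
;   Level 2 (vert, step 4): compare a, B=(-4,+4), b=(-4,-4).      Winner B.  Center is now (-4,+4).
;   Level 3 (horz, step 2): compare B, C=(-2,+4), c=(-6,+4).      Winner C.  Center is now (-2,+4).
;   Level 4 (vert, step 2): compare C, D=(-2,+6), d=(-2,+2).      Winner C.  Center is unchanged.
;   Level 5 (horz, step 1): compare C, E=(-1,+4), e=(-3,+4).      Winner E.  Center is now (-1,+4).
;   Level 6 (vert, step 1): compare E, F=(-1,+5), f=(-1,+3).      Winner F.  Final vector (-1,+5).
; SWDs computed: 1 for the initial center plus 2 new points at each of 6 levels = 13.
; The typical log search computes 1 + (8 points * 3 levels) = 25.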
  402. ;
  403. ;
  404. ; A word about data cache performance. Conceptually, the tables and local variables used by this function are placed
  405. ; in memory such that they will fit in one 4K page of the on-chip data cache. For the Pentium (tm) microprocessor,
  406. ; this leaves the other 4K page for other purposes. The other data structures consist of:
  407. ;
  408. ; The current frame, from which we need to access the lines of the 16*16 macroblock. Since cache lines are 32 bytes
  409. ; wide, the cache fill operations that fetch one target macroblock will serve to fetch the macroblock to the right,
  410. ; so an average of 8 cache lines are fetched for each macroblock.
  411. ;
  412. ; The previous frame, from which we need to access a reference area of 30*30 pels. For each macroblock for which we
  413. ; need to search for a motion vector, we will typically need to access no more than about 25 of these, but in general
  414. ; these lines span the 30 lines of the search area. Since cache lines are 32 bytes wide, the cache fill operations
  415. ; that fetch reference data for one macroblock, will tend to fetch data that is useful as reference data for the
  416. ; macroblock to the right, so an average of about 15 (rounded up to be safe) cache lines are fetched for each
  417. ; macroblock.
  418. ;
  419. ; The MBlockActionStream, which controls the searching (since we don't need to motion estimate blocks that are
  420. ; legislated to be intra) will disrupt cache behaviour of the other data structures, but not to a significant degree.
  421. ;
  422. ; By setting the pitch to a constant of 384, and by allocating the frames as described above, the one available 4K page
  423. ; of data cache will be able to contain the 30 lines of the reference area, the 16 lines of the target area, and the
  424. ; 16 lines of the spatially filtered area (H261 only) without any collisions.
  425. ;
  426. ;
  427. ; Here is a flowchart of the major sections of this function:
  428. ;
  429. ; +-- Execute once for Y part of each macroblock that is NOT Intra By Decree --+
  430. ; | |
  431. ; | +---------------------------------------------------------------+ |
  432. ; | | 1) Compute average value for target match points. | |
  433. ; | | 2) Prepare match points in target MB for easier matching. | |
  434. ; | | 3) Compute the SWD for (0,0) motion vector. | |
  435. ; | +---------------------------------------------------------------+ |
  436. ; | | |
  437. ; | v |
  438. ; | /---------------------------------\ Yes |
  439. ; | < 4) Is 0-motion SWD good enough? >-------------------------+ |
  440. ; | \---------------------------------/ | |
  441. ; | | | |
  442. ; | |No | |
  443. ; | v | |
  444. ; | +--- 5) While state engine has more motion vectors to check ---+ | |
  445. ; | | | | |
  446. ; | | | | |
  447. ; | | +---------------------------------------------------+ | | |
  448. ; | | | 5) Compute SWDs for 2 ref MBs and pick best of 3. |----->| | |
  449. ; | | +---------------------------------------------------+ | | |
  450. ; | | | | |
  451. ; | +--------------------------------------------------------------+ | |
  452. ; | | | |
  453. ; | v | |
  454. ; | /-----------------------------------------\ | |
  455. ; | < 6) Is best motion vector the 0-vector? > | |
  456. ; | \-----------------------------------------/ | |
  457. ; | | | | |
  458. ; | |No |Yes | |
  459. ; | v v | |
  460. ; | +-----------------+ +-------------------------------------------+ | |
  461. ; | | Mark all blocks | | 6) Identify as empty block any in which: |<-+ |
  462. ; | +--| non-empty. | | --> 0-motion SWD < EmptyThresh, and | |
  463. ; | | +-----------------+ +-------------------------------------------+ |
  464. ; | | | |
  465. ; | | v |
  466. ; | | /--------------------------------\ Yes +--------------------------+ |
  467. ; | | < 6) Are all blocks marked empty? >--->| 6) Classify FORCEDEMPTY |-->|
  468. ; | | \--------------------------------/ +--------------------------+ |
  469. ; | | | |
  470. ; | | |No |
  471. ; | | v |
  472. ; | | /--------------------------------------------\ |
  473. ; | | < 7) Are any non-phantom blocks marked empty? > |
  474. ; | | \--------------------------------------------/ |
  475. ; | | | | |
  476. ; | | |No |Yes |
  477. ; | v v v |
  478. ; | +---------------------+ +--------------------------------+ |
  479. ; | | 8) Compute IntraSWD | | Set IntraSWD artificially high | |
  480. ; | +---------------------+ +--------------------------------+ |
  481. ; | | | |
  482. ; | v v |
  483. ; | +-------------------------------+ |
  484. ; | | 10) Classify block as one of: | |
  485. ; | | INTRA |--------------------------------->|
  486. ; | | INTER | |
  487. ; | +-------------------------------+ |
  488. ; | |
  489. ; +----------------------------------------------------------------------------+
  490. ;
  491. ;
  492. OPTION PROLOGUE:None
  493. OPTION EPILOGUE:ReturnAndRelieveEpilogueMacro
  494. OPTION M510
  495. include e3inst.inc
  496. include e3mbad.inc
  497. .xlist
  498. include memmodel.inc
  499. .list
  500. .DATA
  501. ; Storage for tables and temps used by Motion Estimation function. Fit into
  502. ; 4Kbytes contiguous memory so that it uses one cache page, leaving other
  503. ; for reference area of previous frame and target macroblock of current frame.
  504. PickPoint DB 0,4,?,4,0,?,2,2 ; Map CF accum to new central pt selector.
  505. IFNDEF H261
  506. PickPoint_BLS DB 6,4,?,4,6,?,2,2 ; Same, for when doing block level search.
  507. ENDIF
  508. OffsetToRef LABEL DWORD ; Linearized adjustments to effect horz/vert motion.
  509. DD ? ; This index used when zero-valued motion vector is good enough.
  510. DD 0 ; Best fit of 3 SWDs is previous center.
  511. DD 1 ; Best fit of 3 SWDs is the ref block 1 pel to the right.
  512. DD -1 ; Best fit of 3 SWDs is the ref block 1 pel to the left.
  513. DD 1*PITCH ; Best fit of 3 SWDs is the ref block 1 pel below.
  514. DD -1*PITCH ; Best fit of 3 SWDs is the ref block 1 pel above.
  515. DD 2 ; Best fit of 3 SWDs is the ref block 2 pels to the right.
  516. DD -2 ; Best fit of 3 SWDs is the ref block 2 pels to the left.
  517. DD 2*PITCH ; Best fit of 3 SWDs is the ref block 2 pels below.
  518. DD -2*PITCH ; Best fit of 3 SWDs is the ref block 2 pels above.
  519. DD 4 ; Best fit of 3 SWDs is the ref block 4 pels to the right.
  520. DD -4 ; Best fit of 3 SWDs is the ref block 4 pels to the left.
  521. DD 4*PITCH ; Best fit of 3 SWDs is the ref block 4 pels below.
  522. DD -4*PITCH ; Best fit of 3 SWDs is the ref block 4 pels above.
  523. DD 7 ; Best fit of 3 SWDs is the ref block 7 pels to the right.
  524. DD -7 ; Best fit of 3 SWDs is the ref block 7 pels to the left.
  525. DD 7*PITCH ; Best fit of 3 SWDs is the ref block 7 pels below.
  526. DD -7*PITCH ; Best fit of 3 SWDs is the ref block 7 pels above.
  527. M0 = 4 ; Define symbolic indices into OffsetToRef lookup table.
  528. MHP1 = 8
  529. MHN1 = 12
  530. MVP1 = 16
  531. MVN1 = 20
  532. MHP2 = 24
  533. MHN2 = 28
  534. MVP2 = 32
  535. MVN2 = 36
  536. MHP4 = 40
  537. MHN4 = 44
  538. MVP4 = 48
  539. MVN4 = 52
  540. MHP7 = 56
  541. MHN7 = 60
  542. MVP7 = 64
  543. MVN7 = 68
  544. ; Map linearized motion vector to vertical part.
  545. ; (Mask bottom byte of linearized MV to zero, then use result
  546. ; as index into this array to get vertical MV.)
  547. IF PITCH-384
  548. *** error: The magic of this table assumes a pitch of 384.
  549. ENDIF
  550. DB -32, -32
  551. DB -30
  552. DB -28, -28
  553. DB -26
  554. DB -24, -24
  555. DB -22
  556. DB -20, -20
  557. DB -18
  558. DB -16, -16
  559. DB -14
  560. DB -12, -12
  561. DB -10
  562. DB -8, -8
  563. DB -6
  564. DB -4, -4
  565. DB -2
  566. DB 0
  567. UnlinearizedVertMV DB 0
  568. DB 2
  569. DB 4, 4
  570. DB 6
  571. DB 8, 8
  572. DB 10
  573. DB 12, 12
  574. DB 14
  575. DB 16, 16
  576. DB 18
  577. DB 20, 20
  578. DB 22
  579. DB 24, 24
  580. DB 26
  581. DB 28, 28
  582. DB 30
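; Worked example of the lookup above. This is a sketch under stated assumptions: the linearized
; MV is (vert * PITCH) + horz with PITCH = 384 and horz in -15..15, the index is the linearized
; MV shifted right arithmetically by 8 bits, and the table entries are in half pel units.
;   vert =  1:   384 + horz is in   369..399,   index 1       ==> entry 2
;   vert =  2:   768 + horz is in   753..783,   index 2 or 3  ==> entry 4 (hence the doubled "4, 4")
;   vert =  3:  1152 + horz is in  1137..1167,  index 4       ==> entry 6
;   vert = -1:  -384 + horz is in  -399..-369,  index -2      ==> entry -2
; Every second vertical MV occupies two adjacent indices because 384 / 256 = 1.5.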
  583. ; Map initial states to initializers for half pel search. Where search would
  584. ; illegally take us off edge of picture, set initializer artificially high.
  585. IFNDEF H261
  586. InitHalfPelSearchHorz LABEL DWORD
  587. DD 040000000H, 000000000H, 000004000H
  588. DD 040000000H, 000000000H, 000004000H
  589. DD 040000000H, 000000000H, 000004000H
  590. DD 040000000H, 000000000H, 000004000H
  591. InitHalfPelSearchVert LABEL DWORD
  592. DD 040000000H, 040000000H, 040000000H
  593. DD 000000000H, 000000000H, 000000000H
  594. DD 000004000H, 000004000H, 000004000H
  595. DD 040004000H, 040004000H, 040004000H
  596. ENDIF
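; Reading aid for the state table that follows. This is an interpretation inferred from the
; entries and their comments, not a statement from the original author: each state occupies
; 8 bytes, holding three (next state, OffsetToRef index) pairs plus 2 bytes of padding. The
; pairs are referred to as A, B, and C (as in "From 1B"), and the pair for the point with the
; best SWD appears to select the next state; next state 0 ends the search.
; For example, starting in state 1 (upper left corner):
;   State 1,  B wins (MHP4, 4 right)  ==> state 22
;   State 22, C wins (MVP4, 4 down)   ==> state 13
;   States 13, 14, 15, 16 then step horz by 2, vert by 2, horz by 1, vert by 1
;   State 16, any winner              ==> state 0 (done)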
  597. SWDState LABEL BYTE ; Rules that govern state engine of motion estimator.
  598. DB 8 DUP (?) ; 0: not used.
  599. ; 1: Upper Left Corner. Explore 4 right and 4 down.
  600. DB 21, M0 ; (0,0)
  601. DB 22, MHP4 ; (0,4)
  602. DB 23, MVP4, ?, ? ; (4,0)
  603. ; 2: Upper Edge. Explore 4 left and 4 right.
  604. DB 22, M0 ; (0, 0)
  605. DB 22, MHN4 ; (0,-4)
  606. DB 22, MHP4, ?, ? ; (0, 4)
  607. ; 3: Upper Right Corner. Explore 4 left and 4 down.
  608. DB 31, M0 ; (0, 0)
  609. DB 22, MHN4 ; (0,-4)
  610. DB 32, MVP4, ?, ? ; (4, 0)
  611. ; 4: Left Edge. Explore 4 up and 4 down.
  612. DB 23, M0 ; ( 0,0)
  613. DB 23, MVN4 ; (-4,0)
  614. DB 23, MVP4, ?, ? ; ( 4,0)
  615. ; 5: Interior Macroblock. Explore 4 up and 4 down.
  616. DB 37, M0 ; ( 0,0)
  617. DB 37, MVN4 ; (-4,0)
  618. DB 37, MVP4, ?, ? ; ( 4,0)
  619. ; 6: Right Edge. Explore 4 up and 4 down.
  620. DB 32, M0 ; ( 0,0)
  621. DB 32, MVN4 ; (-4,0)
  622. DB 32, MVP4, ?, ? ; ( 4,0)
  623. ; 7: Lower Left Corner. Explore 4 up and 4 right.
  624. DB 38, M0 ; ( 0,0)
  625. DB 39, MHP4 ; ( 0,4)
  626. DB 23, MVN4, ?, ? ; (-4,0)
  627. ; 8: Lower Edge. Explore 4 left and 4 right.
  628. DB 39, M0 ; (0, 0)
  629. DB 39, MHN4 ; (0,-4)
  630. DB 39, MHP4, ?, ? ; (0, 4)
  631. ; 9: Lower Right Corner. Explore 4 up and 4 left.
  632. DB 44, M0 ; ( 0, 0)
  633. DB 39, MHN4 ; ( 0,-4)
  634. DB 32, MVN4, ?, ? ; (-4, 0)
  635. ; 10: Left Edge, No Vertical Motion Allowed.
  636. DB 46, M0 ; (0,0)
  637. DB 48, MHP2 ; (0,2)
  638. DB 47, MHP4, ?, ? ; (0,4)
  639. ; 11: Interior Macroblock, No Vertical Motion Allowed.
  640. DB 47, M0 ; (0, 0)
  641. DB 47, MHN4 ; (0,-4)
  642. DB 47, MHP4, ?, ? ; (0, 4)
  643. ; 12: Right Edge, No Vertical Motion Allowed.
  644. DB 49, M0 ; (0, 0)
  645. DB 48, MHN2 ; (0,-2)
  646. DB 47, MHN4, ?, ? ; (0,-4)
  647. ; 13: Horz by 2, Vert by 2, Horz by 1, Vert by 1.
  648. DB 14, M0
  649. DB 14, MHP2
  650. DB 14, MHN2, ?, ?
  651. ; 14: Vert by 2, Horz by 1, Vert by 1.
  652. DB 15, M0
  653. DB 15, MVP2
  654. DB 15, MVN2, ?, ?
  655. ; 15: Horz by 1, Vert by 1.
  656. DB 16, M0
  657. DB 16, MHP1
  658. DB 16, MHN1, ?, ?
  659. ; 16: Vert by 1.
  660. DB 0, M0
  661. DB 0, MVP1
  662. DB 0, MVN1, ?, ?
  663. ; 17: Vert by 2, Horz by 2, Vert by 1, Horz by 1.
  664. DB 18, M0
  665. DB 18, MVP2
  666. DB 18, MVN2, ?, ?
  667. ; 18: Horz by 2, Vert by 1, Horz by 1.
  668. DB 19, M0
  669. DB 19, MHP2
  670. DB 19, MHN2, ?, ?
  671. ; 19: Vert by 1, Horz by 1.
  672. DB 20, M0
  673. DB 20, MVP1
  674. DB 20, MVN1, ?, ?
  675. ; 20: Horz by 1.
  676. DB 0, M0
  677. DB 0, MHP1
  678. DB 0, MHN1, ?, ?
  679. ; 21: From 1A. Upper Left. Try 2 right and 2 down.
  680. DB 24, M0 ; (0, 0)
  681. DB 25, MHP2 ; (0, 2)
  682. DB 26, MVP2, ?, ? ; (2, 0)
  683. ; 22: From 1B.
  684. ; From 2 center point would be (0,-4/0/4).
  685. ; From 3B center point would be (0,-4).
  686. DB 27, M0 ; (0, 4)
  687. DB 18, MVP2 ; (2, 4) Next: Horz 2, Vert 1, Horz 1. (1:3,1:7)
  688. DB 13, MVP4, ?, ? ; (4, 4) Next: Horz 2, Vert 2, Horz 1, Vert 1. (1:7,1:7)
  689. ; 23: From 1C.
  690. ; From 4 center point would be (-4/0/4,0).
  691. ; From 7C center point would be (-4,0).
  692. DB 29, M0 ; (4, 0)
  693. DB 14, MHP2 ; (4, 2) Next: Vert 2, Horz 1, Vert 1. (1:7,1:3)
  694. DB 17, MHP4, ?, ? ; (4, 4) Next: Vert 2, Horz 2, Vert 1, Horz 1. (1:7,1:7)
  695. ; 24: From 21A. Upper Left. Try 1 right and 1 down.
  696. DB 0, M0 ; (0, 0)
  697. DB 0, MHP1 ; (0, 1)
  698. DB 0, MVP1, ?, ? ; (1, 0)
  699. ; 25: From 21B.
  700. ; From 31B center point would be (0,-2).
  701. DB 20, M0 ; (0, 2) Next: Horz 1 (0,1:3)
  702. DB 20, MVP1 ; (1, 2) Next: Horz 1 (1,1:3)
  703. DB 15, MVP2, ?, ? ; (2, 2) Next: Horz 1, Vert 1 (1:3,1:3)
  704. ; 26: From 21C.
  705. ; From 38C center point would be (-2,0).
  706. DB 16, M0 ; (2, 0) Next: Vert 1 (1:3,0)
  707. DB 16, MHP1 ; (2, 1) Next: Vert 1 (1:3,1)
  708. DB 19, MHP2, ?, ? ; (2, 2) Next: Vert 1, Horz 1 (1:3,1:3)
  709. ; 27: From 22A.
  710. DB 28, M0 ; (0, 4)
  711. DB 28, MHN2 ; (0, 2)
  712. DB 28, MHP2, ?, ? ; (0, 6)
  713. ; 28: From 27.
  714. DB 20, M0 ; (0, 2/4/6) Next: Horz 1. (0,1:7)
  715. DB 20, MVP1 ; (1, 2/4/6) Next: Horz 1. (1,1:7)
  716. DB 20, MVP2, ?, ? ; (2, 2/4/6) Next: Horz 1. (2,1:7)
  717. ; 29: From 23A.
  718. DB 30, M0 ; (4, 0)
  719. DB 30, MVN2 ; (2, 0)
  720. DB 30, MVP2, ?, ? ; (6, 0)
  721. ; 30: From 29.
  722. DB 16, M0 ; (2/4/6, 0) Next: Vert 1. (1:7,0)
  723. DB 16, MHP1 ; (2/4/6, 1) Next: Vert 1. (1:7,1)
  724. DB 16, MHP2, ?, ? ; (2/4/6, 2) Next: Vert 1. (1:7,2)
  725. ; 31: From 3A. Upper Right. Try 2 left and 2 down.
  726. DB 33, M0 ; (0, 0)
  727. DB 25, MHN2 ; (0,-2)
  728. DB 34, MVP2, ?, ? ; (2, 0)
  729. ; 32: From 3C.
  730. ; From 6 center point would be (-4/0/4, 0)
  731. ; From 9C center point would be (-4, 0)
  732. DB 35, M0 ; (4, 0)
  733. DB 14, MHN2 ; (4,-2) Next: Vert2,Horz1,Vert1. (1:7,-1:-3)
  734. DB 17, MHN4, ?, ? ; (4,-4) Next: Vert2,Horz2,Vert1,Horz1. (1:7,-1:-7)
  735. ; 33: From 31A. Upper Right. Try 1 left and 1 down.
  736. DB 0, M0 ; (0, 0)
  737. DB 0, MHN1 ; (0,-1)
  738. DB 0, MVP1, ?, ? ; (1, 0)
  739. ; 34: From 31C.
  740. ; From 44C center point would be (-2, 0)
  741. DB 16, M0 ; (2, 0) Next: Vert 1 (1:3, 0)
  742. DB 16, MHN1 ; (2,-1) Next: Vert 1 (1:3,-1)
  743. DB 19, MHN2, ?, ? ; (2,-2) Next: Vert 1, Horz 1 (1:3,-1:-3)
  744. ; 35: From 32A.
  745. DB 36, M0 ; (4, 0)
  746. DB 36, MVN2 ; (2, 0)
  747. DB 36, MVP2, ?, ? ; (6, 0)
  748. ; 36: From 35.
  749. DB 16, M0 ; (2/4/6, 0) Next: Vert 1. (1:7, 0)
  750. DB 16, MHN1 ; (2/4/6,-1) Next: Vert 1. (1:7,-1)
  751. DB 16, MHN2, ?, ? ; (2/4/6,-2) Next: Vert 1. (1:7,-2)
  752. ; 37: From 5.
  753. DB 17, M0 ; (-4/0/4, 0) Next: Vert2,Horz2,Vert1,Horz1 (-7:7,-3: 3)
  754. DB 17, MHP4 ; (-4/0/4, 4) Next: Vert2,Horz2,Vert1,Horz1 (-7:7, 1: 7)
  755. DB 17, MHN4, ?, ? ; (-4/0/4,-4) Next: Vert2,Horz2,Vert1,Horz1 (-7:7,-7:-1)
  756. ; 38: From 7A. Lower Left. Try 2 right and 2 up.
  757. DB 42, M0 ; ( 0,0)
  758. DB 43, MHP2 ; ( 0,2)
  759. DB 26, MVN2, ?, ? ; (-2,0)
  760. ; 39: From 7B.
  761. ; From 8 center point would be (0,-4/0/4)
  762. ; From 9B center point would be (0,-4)
  763. DB 40, M0 ; ( 0,4)
  764. DB 18, MVN2 ; (-2,4) Next: Horz2,Vert1,Horz1. (-3:-1,1:7)
  765. DB 13, MVN4, ?, ? ; (-4,4) Next: Horz2,Vert2,Horz1,Vert1. (-7:-1,1:7)
  766. ; 40: From 39A.
  767. DB 41, M0 ; (0, 4)
  768. DB 41, MHN2 ; (0, 2)
  769. DB 41, MHP2, ?, ? ; (0, 6)
  770. ; 41: From 40.
  771. DB 20, M0 ; ( 0,2/4/6) Next: Horz 1. ( 0,1:7)
  772. DB 20, MVN1 ; (-1,2/4/6) Next: Horz 1. (-1,1:7)
  773. DB 20, MVN2, ?, ? ; (-2,2/4/6) Next: Horz 1. (-2,1:7)
  774. ; 42: From 38A. Lower Left. Try 1 right and 1 up.
  775. DB 0, M0 ; ( 0,0)
  776. DB 0, MHP1 ; ( 0,1)
  777. DB 0, MVN1, ?, ? ; (-1,0)
  778. ; 43: From 38B.
  779. ; From 44B center point would be (0,-2)
  780. DB 20, M0 ; ( 0,2) Next: Horz 1 ( 0,1:3)
  781. DB 20, MVN1 ; (-1,2) Next: Horz 1 (-1,1:3)
  782. DB 15, MVN2, ?, ? ; (-2,2) Next: Horz 1, Vert 1 (-1:-3,1:3)
  783. ; 44: From 9A. Lower Right. Try 2 left and 2 up.
  784. DB 45, M0 ; ( 0, 0)
  785. DB 43, MHN2 ; ( 0,-2)
  786. DB 34, MVN2, ?, ? ; (-2, 0)
  787. ; 45: From 44A. Lower Right. Try 1 left and 1 up.
  788. DB 0, M0 ; ( 0, 0)
  789. DB 0, MHN1 ; ( 0,-1)
  790. DB 0, MVN1, ?, ? ; (-1, 0)
  791. ; 46: From 10A.
  792. DB 0, M0 ; (0,0)
  793. DB 0, MHP1 ; (0,1)
  794. DB 0, MHP1, ?, ? ; (0,1)
  795. ; 47: From 10C.
  796. ; From 11 center point would be (0,4/0/-4)
  797. ; From 12C center point would be (0,-4)
  798. DB 48, M0 ; (0,4)
  799. DB 48, MHN2 ; (0,2)
  800. DB 48, MHP2, ?, ? ; (0,6)
  801. ; 48: From 10B.
  802. ; From 47 center point would be (0,2/4/6)
  803. ; From 12B center point would be (0,-2)
  804. DB 0, M0 ; (0,2)
  805. DB 0, MHN1 ; (0,1)
  806. DB 0, MHP1, ?, ? ; (0,3)
  807. ; 49: From 12A.
  808. DB 0, M0 ; (0, 0)
  809. DB 0, MHN1 ; (0,-1)
  810. DB 0, MHN1, ?, ? ; (0,-1)
  811. ; 50: Interior Macroblock. Explore 7 up and 7 down.
  812. DB 51, M0 ; ( 0,0)
  813. DB 51, MVN7 ; (-7,0)
  814. DB 51, MVP7, ?, ? ; ( 7,0)
  815. ; 51: Explore 7 left and 7 right.
  816. DB 5, M0 ; (-7|0|7, 0)
  817. DB 5, MHN7 ; (-7|0|7,-7)
  818. DB 5, MHP7, ?, ? ; (-7|0|7, 7)
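; Layout sketch for the SWDState table above, inferred from how the search
; loop indexes it (SWDState[state*8+pick] and SWDState[state*8+pick+1]).
; The C names below are illustrative only and appear nowhere else in this file:
;
;   typedef struct {
;       unsigned char NextIfCentral, CentralInc; /* pick == 0: keep center */
;       unsigned char NextIfRef1,    Ref1Inc;    /* pick == 2: ref1 wins   */
;       unsigned char NextIfRef2,    Ref2Inc;    /* pick == 4: ref2 wins   */
;       unsigned char Pad[2];                    /* the two "?" bytes      */
;   } SWDStateEntry;                             /* 8 bytes per state      */
;
; The *Inc bytes (M0, MHP4, ...) are used unscaled as offsets into
; OffsetToRef. A next-state value of 0 ends the full-pel search.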
  819. MulByNeg8 LABEL DWORD
  820. CNT = 0
  821. REPEAT 128
  822. DD WeightedDiff+CNT
  823. CNT = CNT - 8
  824. ENDM
  825. ; The following treachery puts the numbers into byte 2 of each aligned DWORD.
  826. DB 0, 0
  827. DD 193 DUP (255)
  828. DD 250,243,237,231,225,219,213,207,201,195,189,184,178,172,167,162,156
  829. DD 151,146,141,135,130,126,121,116,111,107,102, 97, 93, 89, 84, 80, 76
  830. DD 72, 68, 64, 61, 57, 53, 50, 46, 43, 40, 37, 34, 31, 28, 25, 22, 20
  831. DD 18, 15, 13, 11, 9, 7, 6, 4, 3, 2, 1
  832. DB 0, 0
  833. WeightedDiff LABEL DWORD
  834. DB 0, 0
  835. DD 0, 0, 1, 2, 3, 4, 6, 7, 9, 11, 13, 15, 18
  836. DD 20, 22, 25, 28, 31, 34, 37, 40, 43, 46, 50, 53, 57, 61, 64, 68, 72
  837. DD 76, 80, 84, 89, 93, 97,102,107,111,116,121,126,130,135,141,146,151
  838. DD 156,162,167,172,178,184,189,195,201,207,213,219,225,231,237,243,250
  839. DD 191 DUP (255)
  840. DB 255, 0
  841. IFNDEF H261
  842. MotionOffsets DD 1*PITCH,0,?,?
  843. ENDIF
  844. RemnantOfCacheLine DB 8 DUP (?)
  845. LocalStorage LABEL DWORD ; Local storage goes on the stack at addresses
  846. ; whose lower 12 bits match this address.
  847. .CODE
  848. ASSUME cs : FLAT
  849. ASSUME ds : FLAT
  850. ASSUME es : FLAT
  851. ASSUME fs : FLAT
  852. ASSUME gs : FLAT
  853. ASSUME ss : FLAT
  854. MOTIONESTIMATION proc C AMBAS: DWORD,
  855. ATargFrmBase: DWORD,
  856. APrevFrmBase: DWORD,
  857. AFiltFrmBase: DWORD,
  858. ADo15Search: DWORD,
  859. ADoHalfPelEst: DWORD,
  860. ADoBlkLvlVec: DWORD,
  861. ADoSpatialFilt: DWORD,
  862. AZeroVectorThresh: DWORD,
  863. ANonZeroMVDiff: DWORD,
  864. ABlockMVDiff: DWORD,
  865. AEmptyThresh: DWORD,
  866. AInterCodThresh: DWORD,
  867. AIntraCodDiff: DWORD,
  868. ASpatialFiltThresh: DWORD,
  869. ASpatialFiltDiff: DWORD,
  870. AIntraSWDTot: DWORD,
  871. AIntraSWDBlks: DWORD,
  872. AInterSWDTot: DWORD,
  873. AInterSWDBlks: DWORD
  874. LocalFrameSize = 128 + 168*4 + 32 ; 128 for locals; 168*4 for blocks; 32 for dummy block.
  875. RegStoSize = 16
  876. ; Arguments:
  877. MBlockActionStream_arg = RegStoSize + 4
  878. TargetFrameBaseAddress_arg = RegStoSize + 8
  879. PreviousFrameBaseAddress_arg = RegStoSize + 12
  880. FilteredFrameBaseAddress_arg = RegStoSize + 16
  881. DoRadius15Search_arg = RegStoSize + 20
  882. DoHalfPelEstimation_arg = RegStoSize + 24
  883. DoBlockLevelVectors_arg = RegStoSize + 28
  884. DoSpatialFiltering_arg = RegStoSize + 32
  885. ZeroVectorThreshold_arg = RegStoSize + 36
  886. NonZeroMVDifferential_arg = RegStoSize + 40
  887. BlockMVDifferential_arg = RegStoSize + 44
  888. EmptyThreshold_arg = RegStoSize + 48
  889. InterCodingThreshold_arg = RegStoSize + 52
  890. IntraCodingDifferential_arg = RegStoSize + 56
  891. SpatialFiltThreshold_arg = RegStoSize + 60
  892. SpatialFiltDifferential_arg = RegStoSize + 64
  893. IntraSWDTotal_arg = RegStoSize + 68
  894. IntraSWDBlocks_arg = RegStoSize + 72
  895. InterSWDTotal_arg = RegStoSize + 76
  896. InterSWDBlocks_arg = RegStoSize + 80
  897. EndOfArgList = RegStoSize + 84
  898. ; Locals (on local stack frame)
  899. MBlockActionStream EQU [esp+ 0]
  900. CurrSWDState EQU [esp+ 4]
  901. MotionOffsetsCursor EQU CurrSWDState
  902. HalfPelHorzSavings EQU CurrSWDState
  903. VertFilterDoneAddr EQU CurrSWDState
  904. IntraSWDTotal EQU [esp+ 8]
  905. IntraSWDBlocks EQU [esp+ 12]
  906. InterSWDTotal EQU [esp+ 16]
  907. InterSWDBlocks EQU [esp+ 20]
  908. MBCentralInterSWD EQU [esp+ 24]
  909. MBRef1InterSWD EQU [esp+ 28]
  910. MBRef2InterSWD EQU [esp+ 32]
  911. MBCentralInterSWD_BLS EQU [esp+ 36]
  912. MB0MVInterSWD EQU [esp+ 40]
  913. MBAddrCentralPoint EQU [esp+ 44]
  914. MBMotionVectors EQU [esp+ 48]
  915. DoHalfPelEstimation EQU [esp+ 52]
  916. DoBlockLevelVectors EQU [esp+ 56]
  917. DoSpatialFiltering EQU [esp+ 60]
  918. ZeroVectorThreshold EQU [esp+ 64]
  919. NonZeroMVDifferential EQU [esp+ 68]
  920. BlockMVDifferential EQU [esp+ 72]
  921. EmptyThreshold EQU [esp+ 76]
  922. InterCodingThreshold EQU [esp+ 80]
  923. IntraCodingDifferential EQU [esp+ 84]
  924. SpatialFiltThreshold EQU [esp+ 88]
  925. SpatialFiltDifferential EQU [esp+ 92]
  926. TargetMBAddr EQU [esp+ 96]
  927. TargetFrameBaseAddress EQU [esp+ 100]
  928. PreviousFrameBaseAddress EQU [esp+ 104]
  929. TargToRef EQU [esp+ 108]
  930. TargToSLF EQU [esp+ 112]
  931. DoRadius15Search EQU [esp+ 116]
  932. StashESP EQU [esp+ 120]
  933. BlockLen EQU 168
  934. Block1 EQU [esp+ 128+40] ; "128" is for locals. "40" is so offsets range from -40 to 124.
  935. Block2 EQU Block1 + BlockLen
  936. Block3 EQU Block2 + BlockLen
  937. Block4 EQU Block3 + BlockLen
  938. BlockN EQU Block4 + BlockLen
  939. BlockNM1 EQU Block4
  940. BlockNM2 EQU Block3
  941. BlockNP1 EQU Block4 + BlockLen + BlockLen
  942. DummyBlock EQU Block4 + BlockLen
  943. Ref1Addr EQU -40
  944. Ref2Addr EQU -36
  945. AddrCentralPoint EQU -32
  946. CentralInterSWD EQU -28
  947. Ref1InterSWD EQU -24
  948. Ref2InterSWD EQU -20
  949. CentralInterSWD_BLS EQU -16 ; CentralInterSWD, when doing blk level search.
  950. CentralInterSWD_SLF EQU -16 ; CentralInterSWD, when doing spatial filter.
  951. HalfPelSavings EQU Ref2Addr
  952. ZeroMVInterSWD EQU -12
  953. BlkHMV EQU -8
  954. BlkVMV EQU -7
  955. BlkMVs EQU -8
  956. AccumTargetPels EQU -4
  957. ; Offsets for target pels multiplied by -8 (N8T):
  958. N8T00 EQU 0
  959. N8T04 EQU 4
  960. N8T02 EQU 8
  961. N8T06 EQU 12
  962. N8T20 EQU 16
  963. N8T24 EQU 20
  964. N8T22 EQU 24
  965. N8T26 EQU 28
  966. N8T40 EQU 32
  967. N8T44 EQU 36
  968. N8T42 EQU 40
  969. N8T46 EQU 44
  970. N8T60 EQU 48
  971. N8T64 EQU 52
  972. N8T62 EQU 56
  973. N8T66 EQU 60
  974. N8T11 EQU 64
  975. N8T15 EQU 68
  976. N8T13 EQU 72
  977. N8T17 EQU 76
  978. N8T31 EQU 80
  979. N8T35 EQU 84
  980. N8T33 EQU 88
  981. N8T37 EQU 92
  982. N8T51 EQU 96
  983. N8T55 EQU 100
  984. N8T53 EQU 104
  985. N8T57 EQU 108
  986. N8T71 EQU 112
  987. N8T75 EQU 116
  988. N8T73 EQU 120
  989. N8T77 EQU 124
  990. push esi
  991. push edi
  992. push ebp
  993. push ebx
  994. ; Adjust stack ptr so that local frame fits nicely in cache w.r.t. other data.
  995. mov esi,esp
  996. sub esp,000001000H
  997. mov eax,[esp] ; Cause system to commit page.
  998. sub esp,000001000H
  999. and esp,0FFFFF000H
  1000. mov ebx,OFFSET LocalStorage+31
  1001. and ebx,000000FE0H
  1002. mov edx,PD [esi+MBlockActionStream_arg]
  1003. or esp,ebx
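; In C terms, the stack-pointer adjustment above amounts to the following
; (a sketch; OldESP is an illustrative name for the value saved in esi):
;
;   esp = ((OldESP - 0x2000) & 0xFFFFF000)
;       | (((U32) LocalStorage + 31) & 0x00000FE0);
;
; The frame base therefore lies in the 4K page that contains OldESP - 0x2000,
; at the page offset of LocalStorage rounded up to a 32-byte boundary
; (modulo 4K).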
  1004. mov eax,PD [esi+TargetFrameBaseAddress_arg]
  1005. mov TargetFrameBaseAddress,eax
  1006. mov ebx,PD [esi+PreviousFrameBaseAddress_arg]
  1007. mov PreviousFrameBaseAddress,ebx
  1008. sub ebx,eax
  1009. mov ecx,PD [esi+FilteredFrameBaseAddress_arg]
  1010. sub ecx,eax
  1011. mov TargToRef,ebx
  1012. mov TargToSLF,ecx
  1013. mov eax,PD [esi+EmptyThreshold_arg]
  1014. mov EmptyThreshold,eax
  1015. mov eax,PD [esi+DoHalfPelEstimation_arg]
  1016. mov DoHalfPelEstimation,eax
  1017. mov eax,PD [esi+DoBlockLevelVectors_arg]
  1018. mov DoBlockLevelVectors,eax
  1019. mov eax,PD [esi+DoRadius15Search_arg]
  1020. mov DoRadius15Search,eax
  1021. mov eax,PD [esi+DoSpatialFiltering_arg]
  1022. mov DoSpatialFiltering,eax
  1023. mov eax,PD [esi+ZeroVectorThreshold_arg]
  1024. mov ZeroVectorThreshold,eax
  1025. mov eax,PD [esi+NonZeroMVDifferential_arg]
  1026. mov NonZeroMVDifferential,eax
  1027. mov eax,PD [esi+BlockMVDifferential_arg]
  1028. mov BlockMVDifferential,eax
  1029. mov eax,PD [esi+InterCodingThreshold_arg]
  1030. mov InterCodingThreshold,eax
  1031. mov eax,PD [esi+IntraCodingDifferential_arg]
  1032. mov IntraCodingDifferential,eax
  1033. mov eax,PD [esi+SpatialFiltThreshold_arg]
  1034. mov SpatialFiltThreshold,eax
  1035. mov eax,PD [esi+SpatialFiltDifferential_arg]
  1036. mov SpatialFiltDifferential,eax
  1037. xor ebx,ebx
  1038. mov IntraSWDBlocks,ebx
  1039. mov InterSWDBlocks,ebx
  1040. mov IntraSWDTotal,ebx
  1041. mov InterSWDTotal,ebx
  1042. mov Block1.BlkMVs,ebx
  1043. mov Block2.BlkMVs,ebx
  1044. mov Block3.BlkMVs,ebx
  1045. mov Block4.BlkMVs,ebx
  1046. mov DummyBlock.Ref1Addr,esp
  1047. mov DummyBlock.Ref2Addr,esp
  1048. mov StashESP,esi
  1049. jmp FirstMacroBlock
  1050. ; Activity Details for this section of code (refer to flow diagram above):
  1051. ;
  1052. ; 1) To calculate an average value for the target match points of each
  1053. ; block, we sum the 32 match points. The totals for each of the 4
  1054. ; blocks are output separately.
  1055. ;
  1056. ; 2) Define each prepared match point in the target macroblock as the
  1057. ; real match point times negative 8, with the base address of the
  1058. ; WeightedDiff lookup table added. I.e.
  1059. ;
  1060. ; for (i = 0; i < 16; i += 2)
  1061. ; for (j = 0; j < 16; j += 2)
  1062. ; N8T[i][j] = ( -8 * Target[i][j]) + ((U32) WeightedDiff);
  1063. ;
  1064. ; Both the multiply and the add of the WeightedDiff array base are
  1065. ; effected by a table lookup into the array MulByNeg8.
  1066. ;
  1067. ; Then the SWD of a reference macroblock can be calculated as follows:
  1068. ;
  1069. ; SWD = 0;
  1070. ; for each match point (i,j)
  1071. ; SWD += *((U32 *) (N8T[i][j] + 8 * Ref[i][j]));
  1072. ;
  1073. ; In assembly, the fetch of a WeightedDiff array element amounts to this:
  1074. ;
  1075. ; mov edi,DWORD PTR N8T[i][j] ; Fetch N8T[i][j]
  1076. ; mov dl,BYTE PTR Ref[i][j] ; Fetch Ref[i][j]
  1077. ; mov edi,DWORD PTR[edi+edx*8] ; Fetch WeightedDiff of target & ref.
  1078. ;
  1079. ; 3) We calculate the 0-motion SWD, as described just above. We use 32
  1080. ; match points per block, and write the result separately for each
  1081. ; block. The result is accumulated into the high half of ebp.
  1082. ;
  1083. ; 4) If the SWD for the 0-motion vector is below a threshold, we don't
  1084. ; bother searching for other possibly better motion vectors. Presently,
  1085. ; this threshold is set such that an average difference of less than
  1086. ; three per match point causes the 0-motion vector to be accepted.
  1087. ;
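; A C sketch of the step-2 lookup, under two assumptions: pels are 7 bits
; wide (MulByNeg8 has only 128 entries), and WD is a hypothetical byte
; pointer standing for the address of WeightedDiff:
;
;   /* Prepared once per target pel t (0..127): */
;   const unsigned char *n8t = WD - 8 * t;       /* == MulByNeg8[t]        */
;   /* Evaluated once per candidate reference pel r (0..127): */
;   U32 w = *(const U32 *) (n8t + 8 * r);        /* table at WD + 8*(r-t)  */
;
; Since n8t + 8*r == WD + 8*(r - t), one indexed load yields the weight for
; the signed difference r - t with no subtract or absolute-value step.
;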
  1088. ; Register usage for this section:
  1089. ;
  1090. ; Input of this section:
  1091. ;
  1092. ; edx -- MBlockActionStream
  1093. ;
  1094. ; Predominant usage for body of this section:
  1095. ;
  1096. ; esi -- Target block address.
  1097. ; edi -- 0-motion reference block address.
  1098. ; ebp[ 0:12] -- Accumulator for target pels.
  1099. ; ebp[13:15] -- Loop control
  1100. ; ebp[16:31] -- Accumulator for weighted diff between target and 0-MV ref.
  1101. ; edx -- Address at which to store -8 times pels.
  1102. ; ecx -- A reference pel.
  1103. ; ebx -- A target pel.
  1104. ; eax -- A target pel times -8; and a weighted difference.
  1105. ;
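; The three ebp fields listed above can be unpacked as in this C sketch
; (acc, TargetSum, LoopBits, and SWD are illustrative names only):
;
;   U32 TargetSum = acc & 0x00001FFF;   /* bits  0:12, sum of target pels */
;   U32 LoopBits  = acc & 0x0000E000;   /* bits 13:15, block loop control */
;   U32 SWD       = acc >> 16;          /* bits 16:31, 0-MV weighted diff */
;
; This packing relies on each WeightedDiff entry being stored pre-shifted
; into bits 16 and up, so that adding an entry never disturbs the low fields.
;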
  1106. ; Expected Pentium (tm) microprocessor performance for section:
  1107. ;
  1108. ; Executed once per macroblock.
  1109. ;
  1110. ; 520 clocks for instruction execution
  1111. ; 8 clocks for bank conflicts (64 dual mem ops with 1/8 chance of conflict)
  1112. ; 80 clocks generously estimated for an average of 8 cache line fills for
  1113. ; the target macroblock and 8 cache line fills for the reference area.
  1114. ; ----
  1115. ; 608 clocks total time for this section.
  1116. ;
  1117. NextMacroBlock:
  1118. mov bl,[edx].CodedBlocks
  1119. add edx,SIZEOF T_MacroBlockActionDescr
  1120. and ebx,000000040H ; Check for end-of-stream
  1121. jne Done
  1122. FirstMacroBlock:
  1123. mov cl,[edx].CodedBlocks ; Init CBP for macroblock.
  1124. mov ebp,TargetFrameBaseAddress
  1125. mov bl,[edx].FirstMEState ; First State
  1126. mov eax,DoRadius15Search ; Searching 15 full pels out, or just 7?
  1127. neg al ; radius-15 search => al=-1, not => al=0
  1128. or cl,03FH ; Indicate all 6 blocks are coded.
  1129. and al,bl
  1130. mov esi,[edx].BlkY1.BlkOffset ; Get address of next macroblock to do.
  1131. cmp al,5
  1132. jne @f
  1133. mov bl,50 ; Cause us to search +/- 15 if central
  1134. ; ; block and willing to go that far.
  1135. @@:
  1136. mov edi,TargToRef
  1137. add esi,ebp
  1138. mov CurrSWDState,ebx ; Stash First State Number as current.
  1139. add edi,esi
  1140. xor ebp,ebp
  1141. mov TargetMBAddr,esi ; Stash address of target macroblock.
  1142. mov MBlockActionStream,edx ; Stash list ptr.
  1143. mov [edx].CodedBlocks,cl
  1144. mov ecx,INTER1MV ; Speculate INTER-coding, 1 motion vector.
  1145. mov [edx].BlockType,cl
  1146. lea edx,Block1
  1147. PrepMatchPointsNextBlock:
  1148. mov bl,PB [esi+6] ; 06A -- Target Pel 06.
  1149. add ebp,ebx ; 06B -- Accumulate target pels.
  1150. mov cl,PB [edi+6] ; 06C -- Reference Pel 06.
  1151. mov eax,MulByNeg8[ebx*4] ; 06D -- Target Pel 06 * -8 (plus WeightedDiff base).
  1152. mov bl,PB [esi+4] ; 04A
  1153. mov [edx].N8T06,eax ; 06E -- Store prepared (-8 times) Pel 06.
  1154. add ebp,ebx ; 04B
  1155. mov eax,PD [eax+ecx*8] ; 06F -- Weighted difference for Pel 06.
  1156. mov cl,PB [edi+4] ; 04C
  1157. add ebp,eax ; 06G -- Accumulate weighted difference.
  1158. mov eax,MulByNeg8[ebx*4] ; 04D
  1159. mov bl,PB [esi+2] ; 02A
  1160. mov [edx].N8T04,eax ; 04E
  1161. add ebp,ebx ; 02B
  1162. mov eax,PD [eax+ecx*8] ; 04F
  1163. mov cl,PB [edi+2] ; 02C
  1164. add ebp,eax ; 04G
  1165. mov eax,MulByNeg8[ebx*4] ; 02D
  1166. mov bl,PB [esi] ; 00A
  1167. mov [edx].N8T02,eax ; 02E
  1168. add ebp,ebx ; 00B
  1169. mov eax,PD [eax+ecx*8] ; 02F
  1170. add esi,PITCH+1
  1171. mov cl,PB [edi] ; 00C
  1172. add edi,PITCH+1
  1173. lea ebp,[ebp+eax+000004000H] ; 02G (plus loop control)
  1174. mov eax,MulByNeg8[ebx*4] ; 00D
  1175. mov bl,PB [esi+6] ; 17A
  1176. mov [edx].N8T00,eax ; 00E
  1177. add ebp,ebx ; 17B
  1178. mov eax,PD [eax+ecx*8] ; 00F
  1179. mov cl,PB [edi+6] ; 17C
  1180. add ebp,eax ; 00G
  1181. mov eax,MulByNeg8[ebx*4] ; 17D
  1182. mov bl,PB [esi+4] ; 15A
  1183. mov [edx].N8T17,eax ; 17E
  1184. add ebp,ebx ; 15B
  1185. mov eax,PD [eax+ecx*8] ; 17F
  1186. mov cl,PB [edi+4] ; 15C
  1187. add ebp,eax ; 17G
  1188. mov eax,MulByNeg8[ebx*4] ; 15D
  1189. mov bl,PB [esi+2] ; 13A
  1190. mov [edx].N8T15,eax ; 15E
  1191. add ebp,ebx ; 13B
  1192. mov eax,PD [eax+ecx*8] ; 15F
  1193. mov cl,PB [edi+2] ; 13C
  1194. add ebp,eax ; 15G
  1195. mov eax,MulByNeg8[ebx*4] ; 13D
  1196. mov bl,PB [esi] ; 11A
  1197. mov [edx].N8T13,eax ; 13E
  1198. add ebp,ebx ; 11B
  1199. mov eax,PD [eax+ecx*8] ; 13F
  1200. add esi,PITCH-1
  1201. mov cl,PB [edi] ; 11C
  1202. add edi,PITCH-1
  1203. add ebp,eax ; 13G
  1204. mov eax,MulByNeg8[ebx*4] ; 11D
  1205. mov bl,PB [esi+6] ; 26A
  1206. mov [edx].N8T11,eax ; 11E
  1207. add ebp,ebx ; 26B
  1208. mov eax,PD [eax+ecx*8] ; 11F
  1209. mov cl,PB [edi+6] ; 26C
  1210. add ebp,eax ; 11G
  1211. mov eax,MulByNeg8[ebx*4] ; 26D
  1212. mov bl,PB [esi+4] ; 24A
  1213. mov [edx].N8T26,eax ; 26E
  1214. add ebp,ebx ; 24B
  1215. mov eax,PD [eax+ecx*8] ; 26F
  1216. mov cl,PB [edi+4] ; 24C
  1217. add ebp,eax ; 26G
  1218. mov eax,MulByNeg8[ebx*4] ; 24D
  1219. mov bl,PB [esi+2] ; 22A
  1220. mov [edx].N8T24,eax ; 24E
  1221. add ebp,ebx ; 22B
  1222. mov eax,PD [eax+ecx*8] ; 24F
  1223. mov cl,PB [edi+2] ; 22C
  1224. add ebp,eax ; 24G
  1225. mov eax,MulByNeg8[ebx*4] ; 22D
  1226. mov bl,PB [esi] ; 20A
  1227. mov [edx].N8T22,eax ; 22E
  1228. add ebp,ebx ; 20B
  1229. mov eax,PD [eax+ecx*8] ; 22F
  1230. add esi,PITCH+1
  1231. mov cl,PB [edi] ; 20C
  1232. add edi,PITCH+1
  1233. add ebp,eax ; 22G
  1234. mov eax,MulByNeg8[ebx*4] ; 20D
  1235. mov bl,PB [esi+6] ; 37A
  1236. mov [edx].N8T20,eax ; 20E
  1237. add ebp,ebx ; 37B
  1238. mov eax,PD [eax+ecx*8] ; 20F
  1239. mov cl,PB [edi+6] ; 37C
  1240. add ebp,eax ; 20G
  1241. mov eax,MulByNeg8[ebx*4] ; 37D
  1242. mov bl,PB [esi+4] ; 35A
  1243. mov [edx].N8T37,eax ; 37E
  1244. add ebp,ebx ; 35B
  1245. mov eax,PD [eax+ecx*8] ; 37F
  1246. mov cl,PB [edi+4] ; 35C
  1247. add ebp,eax ; 37G
  1248. mov eax,MulByNeg8[ebx*4] ; 35D
  1249. mov bl,PB [esi+2] ; 33A
  1250. mov [edx].N8T35,eax ; 35E
  1251. add ebp,ebx ; 33B
  1252. mov eax,PD [eax+ecx*8] ; 35F
  1253. mov cl,PB [edi+2] ; 33C
  1254. add ebp,eax ; 35G
  1255. mov eax,MulByNeg8[ebx*4] ; 33D
  1256. mov bl,PB [esi] ; 31A
  1257. mov [edx].N8T33,eax ; 33E
  1258. add ebp,ebx ; 31B
  1259. mov eax,PD [eax+ecx*8] ; 33F
  1260. add esi,PITCH-1
  1261. mov cl,PB [edi] ; 31C
  1262. add edi,PITCH-1
  1263. add ebp,eax ; 33G
  1264. mov eax,MulByNeg8[ebx*4] ; 31D
  1265. mov bl,PB [esi+6] ; 46A
  1266. mov [edx].N8T31,eax ; 31E
  1267. add ebp,ebx ; 46B
  1268. mov eax,PD [eax+ecx*8] ; 31F
  1269. mov cl,PB [edi+6] ; 46C
  1270. add ebp,eax ; 31G
  1271. mov eax,MulByNeg8[ebx*4] ; 46D
  1272. mov bl,PB [esi+4] ; 44A
  1273. mov [edx].N8T46,eax ; 46E
  1274. add ebp,ebx ; 44B
  1275. mov eax,PD [eax+ecx*8] ; 46F
  1276. mov cl,PB [edi+4] ; 44C
  1277. add ebp,eax ; 46G
  1278. mov eax,MulByNeg8[ebx*4] ; 44D
  1279. mov bl,PB [esi+2] ; 42A
  1280. mov [edx].N8T44,eax ; 44E
  1281. add ebp,ebx ; 42B
  1282. mov eax,PD [eax+ecx*8] ; 44F
  1283. mov cl,PB [edi+2] ; 42C
  1284. add ebp,eax ; 44G
  1285. mov eax,MulByNeg8[ebx*4] ; 42D
  1286. mov bl,PB [esi] ; 40A
  1287. mov [edx].N8T42,eax ; 42E
  1288. add ebp,ebx ; 40B
  1289. mov eax,PD [eax+ecx*8] ; 42F
  1290. add esi,PITCH+1
  1291. mov cl,PB [edi] ; 40C
  1292. add edi,PITCH+1
  1293. add ebp,eax ; 42G
  1294. mov eax,MulByNeg8[ebx*4] ; 40D
  1295. mov bl,PB [esi+6] ; 57A
  1296. mov [edx].N8T40,eax ; 40E
  1297. add ebp,ebx ; 57B
  1298. mov eax,PD [eax+ecx*8] ; 40F
  1299. mov cl,PB [edi+6] ; 57C
  1300. add ebp,eax ; 40G
  1301. mov eax,MulByNeg8[ebx*4] ; 57D
  1302. mov bl,PB [esi+4] ; 55A
  1303. mov [edx].N8T57,eax ; 57E
  1304. add ebp,ebx ; 55B
  1305. mov eax,PD [eax+ecx*8] ; 57F
  1306. mov cl,PB [edi+4] ; 55C
  1307. add ebp,eax ; 57G
  1308. mov eax,MulByNeg8[ebx*4] ; 55D
  1309. mov bl,PB [esi+2] ; 53A
  1310. mov [edx].N8T55,eax ; 55E
  1311. add ebp,ebx ; 53B
  1312. mov eax,PD [eax+ecx*8] ; 55F
  1313. mov cl,PB [edi+2] ; 53C
  1314. add ebp,eax ; 55G
  1315. mov eax,MulByNeg8[ebx*4] ; 53D
  1316. mov bl,PB [esi] ; 51A
  1317. mov [edx].N8T53,eax ; 53E
  1318. add ebp,ebx ; 51B
  1319. mov eax,PD [eax+ecx*8] ; 53F
  1320. add esi,PITCH-1
  1321. mov cl,PB [edi] ; 51C
  1322. add edi,PITCH-1
  1323. add ebp,eax ; 53G
  1324. mov eax,MulByNeg8[ebx*4] ; 51D
  1325. mov bl,PB [esi+6] ; 66A
  1326. mov [edx].N8T51,eax ; 51E
  1327. add ebp,ebx ; 66B
  1328. mov eax,PD [eax+ecx*8] ; 51F
  1329. mov cl,PB [edi+6] ; 66C
  1330. add ebp,eax ; 51G
  1331. mov eax,MulByNeg8[ebx*4] ; 66D
  1332. mov bl,PB [esi+4] ; 64A
  1333. mov [edx].N8T66,eax ; 66E
  1334. add ebp,ebx ; 64B
  1335. mov eax,PD [eax+ecx*8] ; 66F
  1336. mov cl,PB [edi+4] ; 64C
  1337. add ebp,eax ; 66G
  1338. mov eax,MulByNeg8[ebx*4] ; 64D
  1339. mov bl,PB [esi+2] ; 62A
  1340. mov [edx].N8T64,eax ; 64E
  1341. add ebp,ebx ; 62B
  1342. mov eax,PD [eax+ecx*8] ; 64F
  1343. mov cl,PB [edi+2] ; 62C
  1344. add ebp,eax ; 64G
  1345. mov eax,MulByNeg8[ebx*4] ; 62D
  1346. mov bl,PB [esi] ; 60A
  1347. mov [edx].N8T62,eax ; 62E
  1348. add ebp,ebx ; 60B
  1349. mov eax,PD [eax+ecx*8] ; 62F
  1350. add esi,PITCH+1
  1351. mov cl,PB [edi] ; 60C
  1352. add edi,PITCH+1
  1353. add ebp,eax ; 62G
  1354. mov eax,MulByNeg8[ebx*4] ; 60D
  1355. mov bl,PB [esi+6] ; 77A
  1356. mov [edx].N8T60,eax ; 60E
  1357. add ebp,ebx ; 77B
  1358. mov eax,PD [eax+ecx*8] ; 60F
  1359. mov cl,PB [edi+6] ; 77C
  1360. add ebp,eax ; 60G
  1361. mov eax,MulByNeg8[ebx*4] ; 77D
  1362. mov bl,PB [esi+4] ; 75A
  1363. mov [edx].N8T77,eax ; 77E
  1364. add ebp,ebx ; 75B
  1365. mov eax,PD [eax+ecx*8] ; 77F
  1366. mov cl,PB [edi+4] ; 75C
  1367. add ebp,eax ; 77G
  1368. mov eax,MulByNeg8[ebx*4] ; 75D
  1369. mov bl,PB [esi+2] ; 73A
  1370. mov [edx].N8T75,eax ; 75E
  1371. add ebp,ebx ; 73B
  1372. mov eax,PD [eax+ecx*8] ; 75F
  1373. mov cl,PB [edi+2] ; 73C
  1374. add ebp,eax ; 75G
  1375. mov eax,MulByNeg8[ebx*4] ; 73D
  1376. mov bl,PB [esi] ; 71A
  1377. mov [edx].N8T73,eax ; 73E
  1378. add ebp,ebx ; 71B
  1379. mov eax,PD [eax+ecx*8] ; 73F
  1380. mov cl,PB [edi] ; 71C
  1381. add esi,PITCH-1-PITCH*8+8
  1382. add edi,PITCH-1-PITCH*8+8
  1383. add ebp,eax ; 73G
  1384. mov eax,MulByNeg8[ebx*4] ; 71D
  1385. mov ebx,ebp
  1386. mov [edx].N8T71,eax ; 71E
  1387. and ebx,000001FFFH ; Extract sum of target pels.
  1388. add edx,BlockLen ; Move to next output block
  1389. mov eax,PD [eax+ecx*8] ; 71F
  1390. mov [edx-BlockLen].AccumTargetPels,ebx ; Store acc of target pels for block.
  1391. add eax,ebp ; 71G
  1392. and ebp,000006000H ; Extract loop control
  1393. shr eax,16 ; Extract SWD; CF == 1 every second iter.
  1394. mov ebx,ecx
  1395. mov [edx-BlockLen].CentralInterSWD,eax ; Store SWD for 0-motion vector.
  1396. jnc PrepMatchPointsNextBlock
  1397. add esi,PITCH*8-16 ; Advance to block 3, or off end.
  1398. add edi,PITCH*8-16 ; Advance to block 3, or off end.
  1399. xor ebp,000002000H
  1400. jne PrepMatchPointsNextBlock ; Jump if advancing to block 3.
  1401. mov ebx,CurrSWDState ; Fetch First State Number for engine.
  1402. mov edi,Block1.CentralInterSWD
  1403. test bl,bl ; Test for INTRA-BY-DECREE.
  1404. je IntraByDecree
  1405. add eax,Block2.CentralInterSWD
  1406. add edi,Block3.CentralInterSWD
  1407. add eax,edi
  1408. mov edx,ZeroVectorThreshold
  1409. cmp eax,edx ; Compare 0-MV against ZeroVectorThresh
  1410. jle BelowZeroThresh ; Jump if 0-MV is good enough.
  1411. mov cl,PB SWDState[ebx*8+3] ; cl == Index of inc to apply to central
  1412. ; ; point to get to ref1.
  1413. mov bl,PB SWDState[ebx*8+5] ; bl == Same as cl, but for ref2.
  1414. mov edx,TargToRef
  1415. mov MB0MVInterSWD,eax ; Stash SWD for zero motion vector.
  1416. mov edi,PD OffsetToRef[ebx] ; Get inc to apply to ctr to get to ref2.
  1417. mov ebp,PD OffsetToRef[ecx] ; Get inc to apply to ctr to get to ref1.
  1418. lea esi,[esi+edx-PITCH*16] ; Calculate address of 0-MV ref block.
  1419. ;
  1420. mov MBAddrCentralPoint,esi ; Set central point to 0-MV.
  1421. mov MBCentralInterSWD,eax
  1422. mov eax,Block1.CentralInterSWD ; Stash Zero MV SWD, in case we decide
  1423. mov edx,Block2.CentralInterSWD ; the best non-zero MV isn't enough
  1424. mov Block1.ZeroMVInterSWD,eax ; better than the zero MV.
  1425. mov Block2.ZeroMVInterSWD,edx
  1426. mov eax,Block3.CentralInterSWD
  1427. mov edx,Block4.CentralInterSWD
  1428. mov Block3.ZeroMVInterSWD,eax
  1429. mov Block4.ZeroMVInterSWD,edx
  1430. ; Activity Details for this section of code (refer to flow diagram above):
  1431. ;
  1432. ; 5) The SWD for two different reference macroblocks is calculated; ref1
  1433. ; into the high order 16 bits of ebp, and ref2 into the low 16 bits.
  1434. ; This is performed for each iteration of the state engine. A normal,
  1435. ; internal macroblock will perform 6 iterations, searching +/- 4
  1436. ; horizontally, then +/- 4 vertically, then +/- 2 horizontally, then
  1437. ; +/- 2 vertically, then +/- 1 horizontally, then +/- 1 vertically.
  1438. ;
  1439. ; Register usage for this section:
  1440. ;
  1441. ; Input:
  1442. ;
  1443. ; esi -- Addr of 0-motion macroblock in ref frame.
  1444. ; ebp -- Increment to apply to get to first ref1 macroblock.
  1445. ; edi -- Increment to apply to get to first ref2 macroblock.
  1446. ; ebx, ecx -- High order 24 bits are zero.
  1447. ;
  1448. ; Output:
  1449. ;
  1450. ; ebp -- SWD for the best-fit reference macroblock.
  1451. ; ebx -- Index of increment to apply to get to best-fit reference MB.
  1452. ; MBAddrCentralPoint -- the best-fit of the previous iteration; it is the
  1453. ; value to which OffsetToRef[ebx] must be added.
  1454. ;
  1455. ;
  1456. ; Expected performance for SWDLoop code:
  1457. ;
  1458. ; Execution frequency: Six times per block for which motion analysis is done
  1459. ; beyond the 0-motion vector.
  1460. ;
  1461. ; Pentium (tm) microprocessor times per six iterations:
  1462. ; 180 clocks for instruction execution setup to DoSWDLoop
  1463. ; 2520 clocks for DoSWDLoop procedure, instruction execution.
  1464. ; 192 clocks for bank conflicts in DoSWDLoop
  1465. ; 30 clocks generously estimated for an average of 6 cache line fills for
  1466. ; the reference area.
  1467. ; ----
  1468. ; 2922 clocks total time for this section.
  1469. MBFullPelMotionSearchLoop:
  1470. lea edi,[esi+edi+PITCH*8+8]
  1471. lea esi,[esi+ebp+PITCH*8+8]
  1472. mov Block4.Ref1Addr,esi
  1473. mov Block4.Ref2Addr,edi
  1474. sub esi,8
  1475. sub edi,8
  1476. mov Block3.Ref1Addr,esi
  1477. mov Block3.Ref2Addr,edi
  1478. sub esi,PITCH*8-8
  1479. sub edi,PITCH*8-8
  1480. mov Block2.Ref1Addr,esi
  1481. mov Block2.Ref2Addr,edi
  1482. sub esi,8
  1483. sub edi,8
  1484. mov Block1.Ref1Addr,esi
  1485. mov Block1.Ref2Addr,edi
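; Net effect of the stores above, as a C sketch. Ref is an illustrative name
; for the candidate reference macroblock address (central point plus the
; ref1 or ref2 increment); the same pattern applies to Ref1Addr and Ref2Addr:
;
;   Block1.RefAddr = Ref;                 /* upper left 8x8  */
;   Block2.RefAddr = Ref + 8;             /* upper right 8x8 */
;   Block3.RefAddr = Ref + PITCH*8;       /* lower left 8x8  */
;   Block4.RefAddr = Ref + PITCH*8 + 8;   /* lower right 8x8 */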
  1486. ; esi -- Points to ref1
  1487. ; edi -- Points to ref2
  1488. ; ecx -- Upper 24 bits zero
  1489. ; ebx -- Upper 24 bits zero
  1490. call DoSWDLoop
  1491. ; ebp -- Ref1 SWD for block 4
  1492. ; edx -- Ref2 SWD for block 4
  1493. ; ecx -- Upper 24 bits zero
  1494. ; ebx -- Upper 24 bits zero
  1495. mov esi,MBCentralInterSWD ; Get SWD for central point of these 3 refs
  1496. xor eax,eax
  1497. add ebp,Block1.Ref1InterSWD
  1498. add edx,Block1.Ref2InterSWD
  1499. add ebp,Block2.Ref1InterSWD
  1500. add edx,Block2.Ref2InterSWD
  1501. add ebp,Block3.Ref1InterSWD
  1502. add edx,Block3.Ref2InterSWD
  1503. cmp ebp,edx ; Carry flag == 1 iff ref1 SWD < ref2 SWD.
  1504. mov edi,CurrSWDState ; Restore current state number.
  1505. adc eax,eax ; eax == 1 iff ref1 SWD < ref2 SWD.
  1506. cmp ebp,esi ; Carry flag == 1 iff ref1 SWD < central SWD.
  1507. adc eax,eax ;
  1508. cmp edx,esi ; Carry flag == 1 iff ref2 SWD < central SWD.
  1509. adc eax,eax ; 0 --> Pick central point.
  1510. ; ; 1 --> Pick ref2.
  1511. ; ; 2 --> Not possible.
  1512. ; ; 3 --> Pick ref2.
  1513. ; ; 4 --> Pick central point.
  1514. ; ; 5 --> Not possible.
  1515. ; ; 6 --> Pick ref1.
  1516. ; ; 7 --> Pick ref1.
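        ; For reference, the index built by the three cmp/adc pairs above can
        ; be read as the following C-like pseudocode (a sketch for readers,
        ; not part of the build; the compares are unsigned because they rely
        ; on the carry flag):
        ;   idx  = ((ref1 < ref2)    << 2)
        ;        | ((ref1 < central) << 1)
        ;        |  (ref2 < central);
        ;   pick = PickPoint[idx];
        ; Indices 2 and 5 cannot occur because they would require a
        ; contradictory ordering of the three SWDs.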
  1517. mov MBRef2InterSWD,edx
  1518. mov MBRef1InterSWD,ebp
  1519. xor edx,edx
  1520. mov dl,PB PickPoint[eax] ; dl == 0: central pt; 2: ref1; 4: ref2
  1521. mov esi,MBAddrCentralPoint ; Reload address of central ref block.
  1522. ;
  1523. ;
  1524. mov ebp,Block1.CentralInterSWD[edx*2] ; Get SWD for each block, picked pt.
  1525. mov al,PB SWDState[edx+edi*8+1] ; al == Index of inc to apply to old central
  1526. ; ; point to get new central point.
  1527. mov Block1.CentralInterSWD,ebp ; Stash SWD for new central point.
  1528. mov ebp,Block2.CentralInterSWD[edx*2]
  1529. mov Block2.CentralInterSWD,ebp
  1530. mov ebp,Block3.CentralInterSWD[edx*2]
  1531. mov Block3.CentralInterSWD,ebp
  1532. mov ebp,Block4.CentralInterSWD[edx*2]
  1533. mov Block4.CentralInterSWD,ebp
  1534. mov ebp,MBCentralInterSWD[edx*2]; Get the SWD for the point we picked.
  1535. mov dl,PB SWDState[edx+edi*8] ; dl == New state number.
  1536. mov MBCentralInterSWD,ebp ; Stash SWD for new central point.
  1537. mov edi,PD OffsetToRef[eax] ; Get inc to apply to get to new central pt.
  1538. mov CurrSWDState,edx ; Stash current state number.
  1539. mov bl,PB SWDState[edx*8+3] ; bl == Index of inc to apply to central
  1540. ; ; point to get to next ref1.
  1541. mov cl,PB SWDState[edx*8+5] ; cl == Same as bl, but for ref2.
  1542. add esi,edi ; Move to new central point.
  1543. test dl,dl
  1544. mov ebp,PD OffsetToRef[ebx] ; Get inc to apply to ctr to get to ref1.
  1545. mov edi,PD OffsetToRef[ecx] ; Get inc to apply to ctr to get to ref2.
  1546. mov MBAddrCentralPoint,esi ; Stash address of new central ref block.
  1547. jne MBFullPelMotionSearchLoop ; Jump if not done searching.
  1548. ;Done searching for integer motion vector for full macroblock
  1549. IF PITCH-384
  1550. *** Error: The magic leaks out of the following code if PITCH isn't 384.
  1551. ENDIF
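        ; Sketch of the decode that follows (C-like pseudocode for readers, not
        ; part of the build). It assumes lin = dy*PITCH + dx with dx in
        ; -64..63; the contents of UnlinearizedVertMV are not shown here:
        ;   lin  = ref_addr - TargToRef - TargetMBAddr;
        ;   PHMV = 2 * sign_extend_7(lin & 0x7F); /* shl 25, sar 24: half pels */
        ;   PVMV = UnlinearizedVertMV[lin >> 8];  /* arithmetic shift */
        ; Because PITCH == 384 == 3*128, dy*PITCH is a multiple of 128, so the
        ; low 7 bits of lin hold dx alone.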
  1552. mov ecx,TargToRef ; To Linearize MV for winning ref blk.
  1553. mov eax,esi ; Copy of ref macroblock addr.
  1554. sub eax,ecx ; To Linearize MV for winning ref blk.
  1555. mov ecx,TargetMBAddr
  1556. sub eax,ecx
  1557. mov edx,MBlockActionStream ; Fetch list ptr.
  1558. mov ebx,eax
  1559. mov ebp,DoHalfPelEstimation ; Are we doing half pel motion estimation?
  1560. shl eax,25 ; Extract horz motion component.
  1561. mov [edx].BlkY1.PastRef,esi ; Save address of reference MB selected.
  1562. sar ebx,8 ; Hi 24 bits of linearized MV lookup vert MV.
  1563. mov ecx,MBCentralInterSWD
  1564. sar eax,24 ; Finish extract horz motion component.
  1565. test ebp,ebp
  1566. mov bl,PB UnlinearizedVertMV[ebx] ; Look up proper vert motion vector.
  1567. mov [edx].BlkY1.PHMV,al ; Save winning horz motion vector.
  1568. mov [edx].BlkY1.PVMV,bl ; Save winning vert motion vector.
  1569. IFDEF H261
  1570. ELSE
  1571. je SkipHalfPelSearch_1MV
  1572. ;Search for half pel motion vector for full macroblock.
  1573. mov Block1.AddrCentralPoint,esi
  1574. lea ebp,[esi+8]
  1575. mov Block2.AddrCentralPoint,ebp
  1576. add ebp,PITCH*8-8
  1577. mov Block3.AddrCentralPoint,ebp
  1578. xor ecx,ecx
  1579. mov cl,[edx].FirstMEState
  1580. add ebp,8
  1581. mov edi,esi
  1582. mov Block4.AddrCentralPoint,ebp
  1583. mov ebp,InitHalfPelSearchHorz[ecx*4-4]
  1584. ; ebp -- Initialized to 0, except when can't search off left or right edge.
  1585. ; edi -- Ref addr for block 1. Ref1 is .5 pel to left. Ref2 is .5 to right.
  1586. call DoSWDHalfPelHorzLoop
  1587. ; ebp, ebx -- Zero
  1588. ; ecx -- Ref1 SWD for block 4
  1589. ; edx -- Ref2 SWD for block 4
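        ; Sketch of the selection that follows (C-like pseudocode for readers,
        ; not part of the build; ref1 and ref2 are the SWD sums over the four
        ; blocks, central is MBCentralInterSWD, and the compares are signed):
        ;   best = (ref1 < ref2) ? ref1 : ref2;
        ;   if (central - best <= 0) {
        ;     savings = 0;                     /* keep the integer pel MV */
        ;   } else {
        ;     savings = central - best;
        ;     PHMV   += (ref1 < ref2) ? -1 : +1;
        ;     if (ref1 < ref2) PastRef -= 1;   /* ref starts one pel left */
        ;   }
        ;   HalfPelHorzSavings = savings;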
  1590. mov esi,MBlockActionStream
  1591. xor eax,eax ; Keep pairing happy
  1592. add ecx,Block1.Ref1InterSWD
  1593. add edx,Block1.Ref2InterSWD
  1594. add ecx,Block2.Ref1InterSWD
  1595. add edx,Block2.Ref2InterSWD
  1596. add ecx,Block3.Ref1InterSWD
  1597. add edx,Block3.Ref2InterSWD
  1598. mov bl,[esi].FirstMEState
  1599. mov edi,Block1.AddrCentralPoint
  1600. cmp ecx,edx
  1601. jl MBHorz_Ref1LTRef2
  1602. mov ebp,MBCentralInterSWD
  1603. mov esi,MBlockActionStream
  1604. sub ebp,edx
  1605. jle MBHorz_CenterBest
  1606. mov al,[esi].BlkY1.PHMV ; Half pel to the right is best.
  1607. mov ecx,Block1.Ref2InterSWD
  1608. mov Block1.CentralInterSWD_BLS,ecx
  1609. mov ecx,Block3.Ref2InterSWD
  1610. mov Block3.CentralInterSWD_BLS,ecx
  1611. mov ecx,Block2.Ref2InterSWD
  1612. mov Block2.CentralInterSWD_BLS,ecx
  1613. mov ecx,Block4.Ref2InterSWD
  1614. mov Block4.CentralInterSWD_BLS,ecx
  1615. inc al
  1616. mov [esi].BlkY1.PHMV,al
  1617. jmp MBHorz_Done
  1618. MBHorz_CenterBest:
  1619. mov ecx,Block1.CentralInterSWD
  1620. xor ebp,ebp
  1621. mov Block1.CentralInterSWD_BLS,ecx
  1622. mov ecx,Block2.CentralInterSWD
  1623. mov Block2.CentralInterSWD_BLS,ecx
  1624. mov ecx,Block3.CentralInterSWD
  1625. mov Block3.CentralInterSWD_BLS,ecx
  1626. mov ecx,Block4.CentralInterSWD
  1627. mov Block4.CentralInterSWD_BLS,ecx
  1628. jmp MBHorz_Done
  1629. MBHorz_Ref1LTRef2:
  1630. mov ebp,MBCentralInterSWD
  1631. mov esi,MBlockActionStream
  1632. sub ebp,ecx
  1633. jle MBHorz_CenterBest
  1634. mov al,[esi].BlkY1.PHMV ; Half pel to the left is best.
  1635. mov edx,[esi].BlkY1.PastRef
  1636. dec al
  1637. mov ecx,Block1.Ref1InterSWD
  1638. mov Block1.CentralInterSWD_BLS,ecx
  1639. mov ecx,Block3.Ref1InterSWD
  1640. mov Block3.CentralInterSWD_BLS,ecx
  1641. mov ecx,Block2.Ref1InterSWD
  1642. mov Block2.CentralInterSWD_BLS,ecx
  1643. mov ecx,Block4.Ref1InterSWD
  1644. mov Block4.CentralInterSWD_BLS,ecx
  1645. dec edx
  1646. mov [esi].BlkY1.PHMV,al
  1647. mov [esi].BlkY1.PastRef,edx
  1648. MBHorz_Done:
  1649. mov HalfPelHorzSavings,ebp
  1650. mov ebp,InitHalfPelSearchVert[ebx*4-4]
  1651. ; ebp -- Initialized to 0, except when can't search off top or bottom edge.
  1652. ; edi -- Ref addr for block 1. Ref1 is .5 pel above. Ref2 is .5 below.
  1653. call DoSWDHalfPelVertLoop
  1654. ; ebp, ebx -- Zero
  1655. ; ecx -- Ref1 SWD for block 4
  1656. ; edx -- Ref2 SWD for block 4
  1657. add ecx,Block1.Ref1InterSWD
  1658. add edx,Block1.Ref2InterSWD
  1659. add ecx,Block2.Ref1InterSWD
  1660. add edx,Block2.Ref2InterSWD
  1661. add ecx,Block3.Ref1InterSWD
  1662. add edx,Block3.Ref2InterSWD
  1663. cmp ecx,edx
  1664. jl MBVert_Ref1LTRef2
  1665. mov ebp,MBCentralInterSWD
  1666. mov esi,MBlockActionStream
  1667. sub ebp,edx
  1668. jle MBVert_CenterBest
  1669. mov ecx,Block1.CentralInterSWD
  1670. mov edx,Block1.Ref2InterSWD
  1671. sub ecx,edx
  1672. mov edx,Block1.CentralInterSWD_BLS
  1673. sub edx,ecx
  1674. mov al,[esi].BlkY1.PVMV ; Half pel below is best.
  1675. mov Block1.CentralInterSWD,edx
  1676. inc al
  1677. mov ecx,Block3.CentralInterSWD
  1678. mov edx,Block3.Ref2InterSWD
  1679. sub ecx,edx
  1680. mov edx,Block3.CentralInterSWD_BLS
  1681. sub edx,ecx
  1682. mov ecx,Block2.CentralInterSWD
  1683. mov Block3.CentralInterSWD,edx
  1684. mov edx,Block2.Ref2InterSWD
  1685. sub ecx,edx
  1686. mov edx,Block2.CentralInterSWD_BLS
  1687. sub edx,ecx
  1688. mov ecx,Block4.CentralInterSWD
  1689. mov Block2.CentralInterSWD,edx
  1690. mov edx,Block4.Ref2InterSWD
  1691. sub ecx,edx
  1692. mov edx,Block4.CentralInterSWD_BLS
  1693. sub edx,ecx
  1694. mov [esi].BlkY1.PVMV,al
  1695. mov Block4.CentralInterSWD,edx
  1696. jmp MBVert_Done
  1697. MBVert_CenterBest:
  1698. mov ecx,Block1.CentralInterSWD_BLS
  1699. xor ebp,ebp
  1700. mov Block1.CentralInterSWD,ecx
  1701. mov ecx,Block2.CentralInterSWD_BLS
  1702. mov Block2.CentralInterSWD,ecx
  1703. mov ecx,Block3.CentralInterSWD_BLS
  1704. mov Block3.CentralInterSWD,ecx
  1705. mov ecx,Block4.CentralInterSWD_BLS
  1706. mov Block4.CentralInterSWD,ecx
  1707. jmp MBVert_Done
  1708. MBVert_Ref1LTRef2:
  1709. mov ebp,MBCentralInterSWD
  1710. mov esi,MBlockActionStream
  1711. sub ebp,ecx
  1712. jle MBVert_CenterBest
  1713. mov ecx,Block1.CentralInterSWD
  1714. mov edx,Block1.Ref1InterSWD
  1715. sub ecx,edx
  1716. mov edx,Block1.CentralInterSWD_BLS
  1717. sub edx,ecx
  1718. mov al,[esi].BlkY1.PVMV ; Half pel above is best.
  1719. mov Block1.CentralInterSWD,edx
  1720. dec al
  1721. mov ecx,Block3.CentralInterSWD
  1722. mov edx,Block3.Ref1InterSWD
  1723. sub ecx,edx
  1724. mov edx,Block3.CentralInterSWD_BLS
  1725. sub edx,ecx
  1726. mov ecx,Block2.CentralInterSWD
  1727. mov Block3.CentralInterSWD,edx
  1728. mov edx,Block2.Ref1InterSWD
  1729. sub ecx,edx
  1730. mov edx,Block2.CentralInterSWD_BLS
  1731. sub edx,ecx
  1732. mov ecx,Block4.CentralInterSWD
  1733. mov Block2.CentralInterSWD,edx
  1734. mov edx,Block4.Ref1InterSWD
  1735. sub ecx,edx
  1736. mov edx,Block4.CentralInterSWD_BLS
  1737. sub edx,ecx
  1738. mov ecx,[esi].BlkY1.PastRef
  1739. mov Block4.CentralInterSWD,edx
  1740. sub ecx,PITCH
  1741. mov [esi].BlkY1.PVMV,al
  1742. mov [esi].BlkY1.PastRef,ecx
  1743. MBVert_Done:
  1744. mov ecx,HalfPelHorzSavings
  1745. mov edx,esi
  1746. add ebp,ecx ; Savings for horz and vert half pel motion.
  1747. mov ecx,MBCentralInterSWD ; Reload SWD for new central point.
  1748. sub ecx,ebp ; Approx SWD for prescribed half pel motion.
  1749. mov esi,[edx].BlkY1.PastRef ; Reload address of reference MB selected.
  1750. mov MBCentralInterSWD,ecx
  1751. SkipHalfPelSearch_1MV:
  1752. ENDIF ; H263
  1753. mov ebp,[edx].BlkY1.MVs ; Load Motion Vectors
  1754. add esi,8
  1755. mov [edx].BlkY2.PastRef,esi
  1756. mov [edx].BlkY2.MVs,ebp
  1757. lea edi,[esi+PITCH*8]
  1758. add esi,PITCH*8-8
  1759. mov [edx].BlkY3.PastRef,esi
  1760. mov [edx].BlkY3.MVs,ebp
  1761. mov [edx].BlkY4.PastRef,edi
  1762. mov [edx].BlkY4.MVs,ebp
  1763. IFDEF H261
  1764. ELSE ; H263
  1765. mov MBMotionVectors,ebp ; Stash macroblock level motion vectors.
  1766. mov ebp,640 ; ??? BlockMVDifferential
  1767. cmp ecx,ebp
  1768. jl NoBlockMotionVectors
  1769. mov ecx,DoBlockLevelVectors
  1770. test ecx,ecx ; Are we doing block level motion vectors?
  1771. je NoBlockMotionVectors
  1772. ; Activity Details for this section of code (refer to flow diagram above):
  1773. ;
  1774. ; The following search is done similarly to the searches done above, except
  1775. ; these are block searches, instead of macroblock searches.
  1776. ;
  1777. ; Expected performance:
  1778. ;
  1779. ; Execution frequency: Six times per block for which motion analysis is done
  1780. ; beyond the 0-motion vector.
  1781. ;
  1782. ; Pentium (tm) microprocessor times per six iterations:
  1783. ; 180 clocks for instruction execution setup to DoSWDLoop
  1784. ; 2520 clocks for DoSWDLoop procedure, instruction execution.
  1785. ; 192 clocks for bank conflicts in DoSWDLoop
  1786. ; 30 clocks generously estimated for an average of 6 cache line fills for
  1787. ; the reference area.
  1788. ; ----
  1789. ; 2922 clocks total time for this section.
  1790. ;
  1791. ; Set up for the "BlkFullPelSWDLoop_4blks" loop to follow.
  1792. ; - Store the SWD values for blocks 4, 3, 2, 1.
  1793. ; - Compute and store the address of the central reference
  1794. ; point for blocks 1, 2, 3, 4.
  1795. ; - Compute and store the first address for ref 1 (minus 4
  1796. ; pels horizontally) and ref 2 (plus 4 pels horizontally)
  1797. ; for blocks 4, 3, 2, 1 (in that order).
  1798. ; - Initialize MotionOffsetsCursor
  1799. ; - On exit:
  1800. ; esi = ref 1 address for block 1
  1801. ; edi = ref 2 address for block 1
  1802. ;
  1803. mov esi,Block4.CentralInterSWD
  1804. mov edi,Block3.CentralInterSWD
  1805. mov Block4.CentralInterSWD_BLS,esi
  1806. mov Block3.CentralInterSWD_BLS,edi
  1807. mov esi,Block2.CentralInterSWD
  1808. mov edi,Block1.CentralInterSWD
  1809. mov Block2.CentralInterSWD_BLS,esi
  1810. mov eax,MBAddrCentralPoint ; Reload addr of central, integer pel ref MB.
  1811. mov Block1.CentralInterSWD_BLS,edi
  1812. mov Block1.AddrCentralPoint,eax
  1813. lea edi,[eax+PITCH*8+8+1]
  1814. lea esi,[eax+PITCH*8+8-1]
  1815. mov Block4.Ref1Addr,esi
  1816. mov Block4.Ref2Addr,edi
  1817. sub esi,8
  1818. add eax,8
  1819. mov Block2.AddrCentralPoint,eax
  1820. add eax,PITCH*8-8
  1821. mov Block3.AddrCentralPoint,eax
  1822. add eax,8
  1823. mov Block4.AddrCentralPoint,eax
  1824. sub edi,8
  1825. mov Block3.Ref1Addr,esi
  1826. mov Block3.Ref2Addr,edi
  1827. sub esi,PITCH*8-8
  1828. sub edi,PITCH*8-8
  1829. mov Block2.Ref1Addr,esi
  1830. mov Block2.Ref2Addr,edi
  1831. sub esi,8
  1832. mov eax,OFFSET MotionOffsets
  1833. mov MotionOffsetsCursor,eax
  1834. sub edi,8
  1835. mov Block1.Ref1Addr,esi
  1836. mov Block1.Ref2Addr,edi
  1837. ;
  1838. ; This loop will execute 6 times:
  1839. ; +- 4 pels horizontally
  1840. ; +- 4 pels vertically
  1841. ; +- 2 pels horizontally
  1842. ; +- 2 pels vertically
  1843. ; +- 1 pel horizontally
  1844. ; +- 1 pel vertically
  1845. ; It terminates when ref1 = ref2. This simple termination
  1846. ; condition is what forces unrestricted motion vectors (UMV)
  1847. ; to be ON when advanced prediction (4MV) is ON. Otherwise
  1848. ; we would need a state engine as above to distinguish edge
  1849. ; pels.
  1850. ;
  1851. BlkFullPelSWDLoop_4blks:
  1852. ; esi -- Points to ref1
  1853. ; edi -- Points to ref2
  1854. ; ecx -- Upper 24 bits zero
  1855. ; ebx -- Upper 24 bits zero
  1856. call DoSWDLoop
  1857. ; ebp -- Ref1 SWD for block 4
  1858. ; edx -- Ref2 SWD for block 4
  1859. ; ecx -- Upper 24 bits zero
  1860. ; ebx -- Upper 24 bits zero
  1861. mov eax,MotionOffsetsCursor
  1862. BlkFullPelSWDLoop_1blk:
  1863. xor esi,esi
  1864. cmp ebp,edx ; CF == 1 iff ref1 SWD < ref2 SWD.
  1865. mov edi,BlockNM1.CentralInterSWD_BLS; Get SWD for central pt of these 3 refs
  1866. adc esi,esi ; esi == 1 iff ref1 SWD < ref2 SWD.
  1867. cmp ebp,edi ; CF == 1 iff ref1 SWD < central SWD.
  1868. mov ebp,BlockNM2.Ref1InterSWD ; Fetch next block's Ref1 SWD.
  1869. adc esi,esi
  1870. cmp edx,edi ; CF == 1 iff ref2 SWD < central SWD.
  1871. adc esi,esi ; 0 --> Pick central point.
  1872. ; ; 1 --> Pick ref2.
  1873. ; ; 2 --> Not possible.
  1874. ; ; 3 --> Pick ref2.
  1875. ; ; 4 --> Pick central point.
  1876. ; ; 5 --> Not possible.
  1877. ; ; 6 --> Pick ref1.
  1878. ; ; 7 --> Pick ref1.
  1879. mov edx,BlockNM2.Ref2InterSWD ; Fetch next block's Ref2 SWD.
  1880. sub esp,BlockLen ; Move ahead to next block.
  1881. mov edi,[eax] ; Next ref2 motion vector offset.
  1882. mov cl,PickPoint_BLS[esi] ; cl == 6: central pt; 2: ref1; 4: ref2
  1883. mov ebx,esp ; For testing completion.
  1884. ;
  1885. ;
  1886. mov esi,BlockN.AddrCentralPoint[ecx*2-12] ; Get the addr for pt we picked.
  1887. mov ecx,BlockN.CentralInterSWD[ecx*2] ; Get the SWD for point we picked.
  1888. mov BlockN.AddrCentralPoint,esi ; Stash addr for new central point.
  1889. sub esi,edi ; Compute next ref1 addr.
  1890. mov BlockN.Ref1Addr,esi ; Stash next ref1 addr.
  1891. mov BlockN.CentralInterSWD_BLS,ecx ; Stash the SWD for central point.
  1892. lea edi,[esi+edi*2] ; Compute next ref2 addr.
  1893. xor ecx,ecx
  1894. mov BlockN.Ref2Addr,edi ; Stash next ref2 addr.
  1895. and ebx,00000001FH ; Done when esp at 32-byte bound.
  1896. jne BlkFullPelSWDLoop_1blk
  1897. add esp,BlockLen*4
  1898. add eax,4 ; Advance MotionOffsets pointer.
  1899. mov MotionOffsetsCursor,eax
  1900. cmp esi,edi
  1901. jne BlkFullPelSWDLoop_4blks
  1902. IF PITCH-384
  1903. *** Error: The magic leaks out of the following code if PITCH isn't 384.
  1904. ENDIF
  1905. ;
  1906. ; The following code has been modified to decode the motion vectors correctly.
  1907. ; The previous code simply subtracted the target frame base address
  1908. ; from the chosen (central) reference block address.
  1909. ; Now the beginning reference macroblock address is computed in ebp
  1910. ; and subtracted from the chosen (central) reference block address.
  1911. ; Then, for blocks 2, 3, and 4, the distance from block 1 to that block
  1912. ; is subtracted. Care was taken to preserve the original pairing.
  1913. ;
  1914. mov esi,Block1.AddrCentralPoint ; B1a Reload address of central ref block.
  1915. mov ebp,TargetMBAddr ; **** CHANGE **** addr. of target MB
  1916. mov edi,Block2.AddrCentralPoint ; B2a
  1917. add ebp,TargToRef ; **** CHANGE **** add Reference - Target
  1918. ; mov ebp,PreviousFrameBaseAddress **** CHANGE **** DELETED
  1919. mov Block1.Ref1Addr,esi ; B1b Stash addr central ref block.
  1920. sub esi,ebp ; B1c Addr of ref blk, but in target frame.
  1921. mov Block2.Ref1Addr,edi ; B2b
  1922. sub edi,ebp ; B2c
  1923. sub edi,8 ; **** CHANGE **** Correct for block 2
  1924. mov eax,esi ; B1e Copy linearized MV.
  1925. sar esi,8 ; B1f High 24 bits of lin MV lookup vert MV.
  1926. mov ebx,edi ; B2e
  1927. sar edi,8 ; B2f
  1928. add eax,eax ; B1g Sign extend HMV; *2 (# of half pels).
  1929. mov Block1.BlkHMV,al ; B1h Save winning horz motion vector.
  1930. add ebx,ebx ; B2g
  1931. mov Block2.BlkHMV,bl ; B2h
  1932. mov al,UnlinearizedVertMV[esi] ; B1i Look up proper vert motion vector.
  1933. mov Block1.BlkVMV,al ; B1j Save winning vert motion vector.
  1934. mov al,UnlinearizedVertMV[edi] ; B2i
  1935. mov esi,Block3.AddrCentralPoint ; B3a
  1936. mov edi,Block4.AddrCentralPoint ; B4a
  1937. mov Block3.Ref1Addr,esi ; B3b
  1938. mov Block4.Ref1Addr,edi ; B4b
  1939. mov Block2.BlkVMV,al ; B2j
  1940. sub esi,ebp ; B3c
  1941. sub esi,8*PITCH ; **** CHANGE **** Correct for block 3
  1942. sub edi,ebp ; B4c
  1943. sub edi,8*PITCH+8 ; **** CHANGE **** Correct for block 4
  1944. mov eax,esi ; B3e
  1945. sar esi,8 ; B3f
  1946. mov ebx,edi ; B4e
  1947. sar edi,8 ; B4f
  1948. add eax,eax ; B3g
  1949. mov Block3.BlkHMV,al ; B3h
  1950. add ebx,ebx ; B4g
  1951. mov Block4.BlkHMV,bl ; B4h
  1952. mov al,UnlinearizedVertMV[esi] ; B3i
  1953. mov Block3.BlkVMV,al ; B3j
  1954. mov al,UnlinearizedVertMV[edi] ; B4i
  1955. mov ebp,Block1.CentralInterSWD_BLS
  1956. mov ebx,Block2.CentralInterSWD_BLS
  1957. add ebp,Block3.CentralInterSWD_BLS
  1958. add ebx,Block4.CentralInterSWD_BLS
  1959. add ebx,ebp
  1960. mov Block4.BlkVMV,al ; B4j
  1961. mov ecx,DoHalfPelEstimation
  1962. mov MBCentralInterSWD_BLS,ebx
  1963. test ecx,ecx
  1964. je NoHalfPelBlockLevelMVs
  1965. HalfPelBlockLevelMotionSearch:
  1966. mov edi,Block1.AddrCentralPoint
  1967. xor ebp,ebp
  1968. ; ebp -- Initialized to 0, implying can search both left and right.
  1969. ; edi -- Ref addr for block 1. Ref1 is .5 pel to left. Ref2 is .5 to right.
  1970. call DoSWDHalfPelHorzLoop
  1971. ; ebp, ebx -- Zero
  1972. ; ecx -- Ref1 SWD for block 4
  1973. ; edx -- Ref2 SWD for block 4
  1974. NextBlkHorz:
  1975. mov ebx,BlockNM1.CentralInterSWD_BLS
  1976. cmp ecx,edx
  1977. mov BlockNM1.HalfPelSavings,ebp
  1978. jl BlkHorz_Ref1LTRef2
  1979. mov al,BlockNM1.BlkHMV
  1980. sub esp,BlockLen
  1981. sub ebx,edx
  1982. jle BlkHorz_CenterBest
  1983. inc al
  1984. mov BlockN.HalfPelSavings,ebx
  1985. mov BlockN.BlkHMV,al
  1986. jmp BlkHorz_Done
  1987. BlkHorz_Ref1LTRef2:
  1988. mov al,BlockNM1.BlkHMV
  1989. sub esp,BlockLen
  1990. sub ebx,ecx
  1991. jle BlkHorz_CenterBest
  1992. mov ecx,BlockN.Ref1Addr
  1993. dec al
  1994. mov BlockN.HalfPelSavings,ebx
  1995. dec ecx
  1996. mov BlockN.BlkHMV,al
  1997. mov BlockN.Ref1Addr,ecx
  1998. BlkHorz_CenterBest:
  1999. BlkHorz_Done:
  2000. mov ecx,BlockNM1.Ref1InterSWD
  2001. mov edx,BlockNM1.Ref2InterSWD
  2002. test esp,000000018H
  2003. jne NextBlkHorz
  2004. mov edi,BlockN.AddrCentralPoint
  2005. add esp,BlockLen*4
  2006. ; ebp -- Initialized to 0, implying search both up and down is okay.
  2007. ; edi -- Ref addr for block 1. Ref1 is .5 pel above. Ref2 is .5 below.
  2008. call DoSWDHalfPelVertLoop
  2009. ; ebp, ebx -- Zero
  2010. ; ecx -- Ref1 SWD for block 4
  2011. ; edx -- Ref2 SWD for block 4
  2012. NextBlkVert:
  2013. mov ebx,BlockNM1.CentralInterSWD_BLS
  2014. cmp ecx,edx
  2015. mov edi,BlockNM1.HalfPelSavings
  2016. jl BlkVert_Ref1LTRef2
  2017. mov al,BlockNM1.BlkVMV
  2018. sub esp,BlockLen
  2019. sub edx,ebx
  2020. jge BlkVert_CenterBest
  2021. inc al
  2022. sub edi,edx
  2023. mov BlockN.BlkVMV,al
  2024. jmp BlkVert_Done
  2025. BlkVert_Ref1LTRef2:
  2026. mov al,BlockNM1.BlkVMV
  2027. sub esp,BlockLen
  2028. sub ecx,ebx
  2029. jge BlkVert_CenterBest
  2030. sub edi,ecx
  2031. mov ecx,BlockN.Ref1Addr
  2032. dec al
  2033. sub ecx,PITCH
  2034. mov BlockN.BlkVMV,al
  2035. mov BlockN.Ref1Addr,ecx
  2036. BlkVert_CenterBest:
  2037. BlkVert_Done:
  2038. mov ecx,BlockNM1.Ref1InterSWD
  2039. sub ebx,edi
  2040. mov BlockN.CentralInterSWD_BLS,ebx
  2041. mov edx,BlockNM1.Ref2InterSWD
  2042. test esp,000000018H
  2043. lea ebp,[ebp+edi]
  2044. jne NextBlkVert
  2045. mov ebx,MBCentralInterSWD_BLS+BlockLen*4
  2046. add esp,BlockLen*4
  2047. sub ebx,ebp
  2048. xor eax,eax ; ??? Keep pairing happy
  2049. NoHalfPelBlockLevelMVs:
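        ; Sketch of the mode decision that follows (C-like pseudocode for
        ; readers, not part of the build; blk4 is the value in ebx on entry
        ; here, i.e. the sum of the four block-level SWDs less any half pel
        ; savings, and the compares are signed):
        ;   if (MBCentralInterSWD - blk4 > BlockMVDifferential) {
        ;     if (MB0MVInterSWD - blk4 > NonZeroMVDifferential)
        ;       use the four block vectors (INTER4MV);
        ;     else
        ;       fall back to the zero vector;
        ;   } else if (MB0MVInterSWD - MBCentralInterSWD > NonZeroMVDifferential)
        ;     keep the macroblock vector;
        ;   else
        ;     fall back to the zero vector;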
  2050. mov eax,MBCentralInterSWD
  2051. mov ecx,BlockMVDifferential
  2052. sub eax,ebx
  2053. mov edi,MB0MVInterSWD
  2054. cmp eax,ecx
  2055. jle BlockMVNotBigEnoughGain
  2056. sub edi,ebx
  2057. mov ecx,NonZeroMVDifferential
  2058. cmp edi,ecx
  2059. jle NonZeroMVNotBigEnoughGain
  2060. ; Block motion vectors are best.
  2061. mov MBCentralInterSWD,ebx ; Set MBlock's SWD to sum of 4 blocks.
  2062. mov edx,MBlockActionStream
  2063. mov eax,Block1.CentralInterSWD_BLS ; Set each block's SWD.
  2064. mov ebx,Block2.CentralInterSWD_BLS
  2065. mov Block1.CentralInterSWD,eax
  2066. mov Block2.CentralInterSWD,ebx
  2067. mov eax,Block3.CentralInterSWD_BLS
  2068. mov ebx,Block4.CentralInterSWD_BLS
  2069. mov Block3.CentralInterSWD,eax
  2070. mov Block4.CentralInterSWD,ebx
  2071. mov eax,Block1.BlkMVs ; Set each block's motion vector.
  2072. mov ebx,Block2.BlkMVs
  2073. mov [edx].BlkY1.MVs,eax
  2074. mov [edx].BlkY2.MVs,ebx
  2075. mov eax,Block3.BlkMVs
  2076. mov ebx,Block4.BlkMVs
  2077. mov [edx].BlkY3.MVs,eax
  2078. mov [edx].BlkY4.MVs,ebx
  2079. mov eax,Block1.Ref1Addr ; Set each block's reference blk addr.
  2080. mov ebx,Block2.Ref1Addr
  2081. mov [edx].BlkY1.PastRef,eax
  2082. mov [edx].BlkY2.PastRef,ebx
  2083. mov eax,Block3.Ref1Addr
  2084. mov ebx,Block4.Ref1Addr
  2085. mov [edx].BlkY3.PastRef,eax
  2086. mov eax,INTER4MV ; Set type for MB to INTER-coded, 4 MVs.
  2087. mov [edx].BlkY4.PastRef,ebx
  2088. mov [edx].BlockType,al
  2089. jmp MotionVectorSettled
  2090. NoBlockMotionVectors:
  2091. ENDIF ; H263
  2092. mov edi,MB0MVInterSWD
  2093. BlockMVNotBigEnoughGain: ; Try MB-level motion vector.
  2094. mov eax,MBCentralInterSWD
  2095. mov ecx,NonZeroMVDifferential
  2096. sub edi,eax
  2097. mov edx,MBlockActionStream
  2098. cmp edi,ecx
  2099. jg MotionVectorSettled
  2100. NonZeroMVNotBigEnoughGain: ; Settle on zero MV.
  2101. mov eax,Block1.ZeroMVInterSWD ; Restore Zero MV SWD.
  2102. mov edx,Block2.ZeroMVInterSWD
  2103. mov Block1.CentralInterSWD,eax
  2104. mov Block2.CentralInterSWD,edx
  2105. mov eax,Block3.ZeroMVInterSWD
  2106. mov edx,Block4.ZeroMVInterSWD
  2107. mov Block3.CentralInterSWD,eax
  2108. mov Block4.CentralInterSWD,edx
  2109. mov eax,MB0MVInterSWD ; Restore SWD for zero motion vector.
  2110. BelowZeroThresh:
  2111. mov edx,MBlockActionStream
  2112. mov ebx,TargetMBAddr ; Get address of this target macroblock.
  2113. mov MBCentralInterSWD,eax ; Save SWD.
  2114. xor ebp,ebp
  2115. add ebx,TargToRef
  2116. mov [edx].BlkY1.MVs,ebp ; Set horz and vert MVs to 0 in all blks.
  2117. mov [edx].BlkY1.PastRef,ebx ; Save address of ref block, all blks.
  2118. add ebx,8
  2119. mov [edx].BlkY2.PastRef,ebx
  2120. mov [edx].BlkY2.MVs,ebp
  2121. lea ecx,[ebx+PITCH*8]
  2122. add ebx,PITCH*8-8
  2123. mov [edx].BlkY3.PastRef,ebx
  2124. mov [edx].BlkY3.MVs,ebp
  2125. mov [edx].BlkY4.PastRef,ecx
  2126. mov [edx].BlkY4.MVs,ebp
  2127. ; Activity Details for this section of code (refer to flow diagram above):
  2128. ;
  2129. ; 6) We've settled on the motion vector that will be used if we do indeed
  2130. ; code the macroblock with inter-coding. We need to determine if some
  2131. ; or all of the blocks can be forced as empty (copy) blocks. If all
  2132. ; the blocks can be forced empty, we force the whole macroblock to be
  2133. ; empty.
  2134. ;
  2135. ; Expected Pentium (tm) microprocessor performance for this section:
  2136. ;
  2137. ; Execution frequency: Once per macroblock.
  2138. ;
  2139. ; 23 clocks.
  2140. ;
  2141. MotionVectorSettled:
  2142. IFDEF H261
  2143. mov edi,MBCentralInterSWD
  2144. mov eax,DoSpatialFiltering ; Are we doing spatial filtering?
  2145. mov edi,TargetMBAddr
  2146. test eax,eax
  2147. je SkipSpatialFiltering
  2148. mov ebx,MBCentralInterSWD
  2149. mov esi,SpatialFiltThreshold
  2150. cmp ebx,esi
  2151. jle SkipSpatialFiltering
  2152. add edi,TargToSLF ; Compute addr at which to put SLF prediction.
  2153. xor ebx,ebx
  2154. mov esi,[edx].BlkY1.PastRef
  2155. xor edx,edx
  2156. mov ebp,16
  2157. xor ecx,ecx
  2158. SpatialFilterHorzLoop:
  2159. mov dl,[edi] ; Pre-load cache line for output.
  2160. mov bl,[esi+6] ; p6
  2161. mov al,[esi+7] ; p7
  2162. inc bl ; p6+1
  2163. mov cl,[esi+5] ; p5
  2164. mov [edi+7],al ; p7' = p7
  2165. add al,bl ; p7 + p6 + 1
  2166. add bl,cl ; p6 + p5 + 1
  2167. mov dl,[esi+4] ; p4
  2168. add eax,ebx ; p7 + 2p6 + p5 + 2
  2169. shr eax,2 ; p6' = (p7 + 2p6 + p5 + 2) / 4
  2170. inc dl ; p4 + 1
  2171. add cl,dl ; p5 + p4 + 1
  2172. mov [edi+6],al ; p6'
  2173. mov al,[esi+3] ; p3
  2174. add ebx,ecx ; p6 + 2p5 + p4 + 2
  2175. shr ebx,2 ; p5' = (p6 + 2p5 + p4 + 2) / 4
  2176. add dl,al ; p4 + p3 + 1
  2177. mov [edi+5],bl ; p5'
  2178. mov bl,[esi+2] ; p2
  2179. add ecx,edx ; p5 + 2p4 + p3 + 2
  2180. inc bl ; p2 + 1
  2181. shr ecx,2 ; p4' = (p5 + 2p4 + p3 + 2) / 4
  2182. add al,bl ; p3 + p2 + 1
  2183. mov [edi+4],cl ; p4'
  2184. add edx,eax ; p4 + 2p3 + p2 + 2
  2185. shr edx,2 ; p3' = (p4 + 2p3 + p2 + 2) / 4
  2186. mov cl,[esi+1] ; p1
  2187. add bl,cl ; p2 + p1 + 1
  2188. mov [edi+3],dl ; p3'
  2189. add eax,ebx ; p3 + 2p2 + p1 + 2
  2190. mov dl,[esi] ; p0
  2191. shr eax,2 ; p2' = (p3 + 2p2 + p1 + 2) / 4
  2192. inc ebx ; p2 + p1 + 2
  2193. mov [edi+2],al ; p2'
  2194. add ebx,ecx ; p2 + 2p1 + 2
  2195. mov [edi],dl ; p0' = p0
  2196. add ebx,edx ; p2 + 2p1 + p0 + 2
  2197. shr ebx,2 ; p1' = (p2 + 2p1 + p0 + 2) / 4
  2198. mov al,[esi+7+8]
  2199. mov [edi+1],bl ; p1'
  2200. mov bl,[esi+6+8]
  2201. inc bl
  2202. mov cl,[esi+5+8]
  2203. mov [edi+7+8],al
  2204. add al,bl
  2205. add bl,cl
  2206. mov dl,[esi+4+8]
  2207. add eax,ebx
  2208. ;
  2209. shr eax,2
  2210. inc dl
  2211. add cl,dl
  2212. mov [edi+6+8],al
  2213. mov al,[esi+3+8]
  2214. add ebx,ecx
  2215. shr ebx,2
  2216. add dl,al
  2217. mov [edi+5+8],bl
  2218. mov bl,[esi+2+8]
  2219. add ecx,edx
  2220. inc bl
  2221. shr ecx,2
  2222. add al,bl
  2223. mov [edi+4+8],cl
  2224. add edx,eax
  2225. shr edx,2
  2226. mov cl,[esi+1+8]
  2227. add bl,cl
  2228. mov [edi+3+8],dl
  2229. add eax,ebx
  2230. mov dl,[esi+8]
  2231. shr eax,2
  2232. inc ebx
  2233. mov [edi+2+8],al
  2234. add ebx,ecx
  2235. mov [edi+8],dl
  2236. add ebx,edx
  2237. shr ebx,2
  2238. add esi,PITCH
  2239. mov [edi+1+8],bl
  2240. add edi,PITCH
  2241. dec ebp ; Done?
  2242. jne SpatialFilterHorzLoop
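        ; Reference model for the loop above (C-like pseudocode for readers,
        ; not part of the build). Each pass handles one row of the macroblock
        ; as two 8-pel halves (offsets 0..7 and 8..15), for 16 rows:
        ;   out[0] = in[0];
        ;   out[7] = in[7];
        ;   for (i = 1; i <= 6; i++)
        ;     out[i] = (in[i-1] + 2*in[i] + in[i+1] + 2) >> 2;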
  2243. mov VertFilterDoneAddr,edi
  2244. sub edi,PITCH*16
  2245. SpatialFilterVertLoop:
  2246. mov eax,[edi] ; p0
  2247. ; ; Bank conflict for sure.
  2248. ;
  2249. mov ebx,[edi+PITCH] ; p1
  2250. add eax,ebx ; p0+p1
  2251. mov ecx,[edi+PITCH*2] ; p2
  2252. add ebx,ecx ; p1+p2
  2253. mov edx,[edi+PITCH*3] ; p3
  2254. shr eax,1 ; (p0+p1)/2 dirty
  2255. mov esi,[edi+PITCH*4] ; p4
  2256. add ecx,edx ; p2+p3
  2257. mov ebp,[edi+PITCH*5] ; p5
  2258. shr ebx,1 ; (p1+p2)/2 dirty
  2259. add edx,esi ; p3+p4
  2260. and eax,07F7F7F7FH ; (p0+p1)/2 clean
  2261. and ebx,07F7F7F7FH ; (p1+p2)/2 clean
  2262. and ecx,0FEFEFEFEH ; p2+p3 pre-cleaned
  2263. and edx,0FEFEFEFEH ; p3+p4 pre-cleaned
  2264. shr ecx,1 ; (p2+p3)/2 clean
  2265. add esi,ebp ; p4+p5
  2266. shr edx,1 ; (p3+p4)/2 clean
  2267. lea eax,[eax+ebx+001010101H] ; (p0+p1)/2+(p1+p2)/2+1
  2268. shr esi,1 ; (p4+p5)/2 dirty
  2269. ;
  2270. and esi,07F7F7F7FH ; (p4+p5)/2 clean
  2271. lea ebx,[ebx+ecx+001010101H] ; (p1+p2)/2+(p2+p3)/2+1
  2272. shr eax,1 ; p1' = ((p0+p1)/2+(p1+p2)/2+1)/2 dirty
  2273. lea ecx,[ecx+edx+001010101H] ; (p2+p3)/2+(p3+p4)/2+1
  2274. shr ebx,1 ; p2' = ((p1+p2)/2+(p2+p3)/2+1)/2 dirty
  2275. lea edx,[edx+esi+001010101H] ; (p3+p4)/2+(p4+p5)/2+1
  2276. and eax,07F7F7F7FH ; p1' clean
  2277. and ebx,07F7F7F7FH ; p2' clean
  2278. shr ecx,1 ; p3' = ((p2+p3)/2+(p3+p4)/2+1)/2 dirty
  2279. mov [edi+PITCH],eax ; p1'
  2280. shr edx,1 ; p4' = ((p3+p4)/2+(p4+p5)/2+1)/2 dirty
  2281. mov eax,[edi+PITCH*6] ; p6
  2282. and ecx,07F7F7F7FH ; p3' clean
  2283. and edx,07F7F7F7FH ; p4' clean
  2284. mov [edi+PITCH*2],ebx ; p2'
  2285. add ebp,eax ; p5+p6
  2286. shr ebp,1 ; (p5+p6)/2 dirty
  2287. mov ebx,[edi+PITCH*7] ; p7
  2288. add eax,ebx ; p6+p7
  2289. and ebp,07F7F7F7FH ; (p5+p6)/2 clean
  2290. mov [edi+PITCH*3],ecx ; p3'
  2291. and eax,0FEFEFEFEH ; p6+p7 pre-cleaned
  2292. shr eax,1 ; (p6+p7)/2 clean
  2293. lea esi,[esi+ebp+001010101H] ; (p4+p5)/2+(p5+p6)/2+1
  2294. shr esi,1 ; p5' = ((p4+p5)/2+(p5+p6)/2+1)/2 dirty
  2295. mov [edi+PITCH*4],edx ; p4'
  2296. lea ebp,[ebp+eax+001010101H] ; (p5+p6)/2+(p6+p7)/2+1
  2297. and esi,07F7F7F7FH ; p5' clean
  2298. shr ebp,1 ; p6' = ((p5+p6)/2+(p6+p7)/2+1)/2 dirty
  2299. mov [edi+PITCH*5],esi ; p5'
  2300. and ebp,07F7F7F7FH ; p6' clean
  2301. add edi,4
  2302. test edi,00000000FH
  2303. mov [edi+PITCH*6-4],ebp ; p6'
  2304. jne SpatialFilterVertLoop
  2305. add edi,PITCH*8-16
  2306. mov eax,VertFilterDoneAddr
  2307. cmp eax,edi
  2308. jne SpatialFilterVertLoop
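        ; Reference model for the loop above (C-like pseudocode for readers,
        ; not part of the build). Each dword holds four pels; the sketch
        ; assumes every byte-wise sum stays within its own byte, which the
        ; code above appears to rely on:
        ;   avg(a,b) = ((a + b) >> 1) & 0x7F7F7F7F;
        ;   for (r = 1; r <= 6; r++)
        ;     out[r] = ((avg(in[r-1],in[r]) + avg(in[r],in[r+1]) + 0x01010101)
        ;               >> 1) & 0x7F7F7F7F;
        ; Rows 0 and 7 of each 8-row block are never stored, so they keep the
        ; horizontally filtered values. The "pre-cleaned" form,
        ; ((a + b) & 0xFEFEFEFE) >> 1, gives the same result as avg(a,b).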
  2309. ; Activity Details for this section of code (refer to flow diagram above):
  2310. ;
  2311. ; 9) The SAD for the spatially filtered reference macroblock is calculated
  2312. ; with half the pel differences accumulating into the low order half
  2313. ; of ebp, and the other half into the high order half.
  2314. ;
  2315. ; Register usage for this section:
  2316. ;
  2317. ; Input of this section:
  2318. ;
  2319. ; edi -- Address of pel 0,0 of spatially filtered reference macroblock.
  2320. ;
  2321. ; Predominant usage for body of this section:
  2322. ;
  2323. ; edi -- Address of pel 0,0 of spatially filtered reference macroblock.
  2324. ; esi, eax -- -8 times pel values from target macroblock.
  2325. ; ebp[ 0:15] -- SAD Accumulator for half of the match points.
  2326. ; ebp[16:31] -- SAD Accumulator for other half of the match points.
  2327. ; edx[ 0: 7] -- Weighted difference for one pel.
  2328. ; edx[ 8:15] -- Zero.
  2329. ; edx[16:23] -- Weighted difference for another pel.
  2330. ; edx[24:31] -- Zero.
  2331. ; bl, cl -- Pel values from the spatially filtered reference macroblock.
  2332. ;
  2333. ; Expected Pentium (tm) microprocessor performance for this section:
  2334. ;
  2335. ; Execution frequency: Once per block for which motion analysis is done
  2336. ; beyond the 0-motion vector.
  2337. ;
  2338. ; 146 clocks instruction execution (typically).
  2339. ; 6 clocks for bank conflicts (1/8 chance with 48 dual mem ops).
  2340. ; 0 clocks for new cache line fills.
  2341. ; ----
  2342. ; 152 clocks total time for this section.
  2343. ;
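        ; Sketch of one step of the loop below (C-like pseudocode for readers,
        ; not part of the build). The table layout is inferred from the
        ; register usage notes above and is an assumption, not something this
        ; listing defines:
        ;   edx  = *(dword *)(N8Taa + 8*ref_a);     /* diff_a in bits 16:23 */
        ;   dl   = *(byte  *)(N8Tbb + 8*ref_b + 2); /* diff_b in bits 0:7 */
        ;   ebp += edx;                             /* two 16-bit sums */
        ; The first step uses mov ebp,edx instead of add to start the sums.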
  2344. SpatialFilterDone:
  2345. sub edi,PITCH*8-8 ; Get to block 4.
  2346. xor ebp,ebp
  2347. xor ebx,ebx
  2348. xor ecx,ecx
  2349. SLFSWDLoop:
  2350. mov eax,BlockNM1.N8T00 ; Get -8 times target Pel00.
  2351. mov bl,[edi] ; Get Pel00 in spatially filtered reference.
  2352. mov esi,BlockNM1.N8T04
  2353. mov cl,[edi+4]
  2354. mov edx,[eax+ebx*8] ; Get abs diff for spatial filtered ref pel00.
  2355. mov eax,BlockNM1.N8T02
  2356. mov dl,[esi+ecx*8+2] ; Get abs diff for spatial filtered ref pel04.
  2357. mov bl,[edi+2]
  2358. mov esi,BlockNM1.N8T06
  2359. mov cl,[edi+6]
  2360. mov ebp,edx
  2361. mov edx,[eax+ebx*8]
  2362. mov eax,BlockNM1.N8T11
  2363. mov dl,[esi+ecx*8+2]
  2364. mov bl,[edi+PITCH*1+1]
  2365. mov cl,[edi+PITCH*1+5]
  2366. mov esi,BlockNM1.N8T15
  2367. add ebp,edx
  2368. mov edx,[eax+ebx*8]
  2369. mov eax,BlockNM1.N8T13
  2370. mov dl,[esi+ecx*8+2]
  2371. mov bl,[edi+PITCH*1+3]
  2372. mov cl,[edi+PITCH*1+7]
  2373. mov esi,BlockNM1.N8T17
  2374. add ebp,edx
  2375. mov edx,[eax+ebx*8]
  2376. mov eax,BlockNM1.N8T20
  2377. mov dl,[esi+ecx*8+2]
  2378. mov bl,[edi+PITCH*2+0]
  2379. mov cl,[edi+PITCH*2+4]
  2380. mov esi,BlockNM1.N8T24
  2381. add ebp,edx
  2382. mov edx,[eax+ebx*8]
  2383. mov eax,BlockNM1.N8T22
  2384. mov dl,[esi+ecx*8+2]
  2385. mov bl,[edi+PITCH*2+2]
  2386. mov cl,[edi+PITCH*2+6]
  2387. mov esi,BlockNM1.N8T26
  2388. add ebp,edx
  2389. mov edx,[eax+ebx*8]
  2390. mov eax,BlockNM1.N8T31
  2391. mov dl,[esi+ecx*8+2]
  2392. mov bl,[edi+PITCH*3+1]
  2393. mov cl,[edi+PITCH*3+5]
  2394. mov esi,BlockNM1.N8T35
  2395. add ebp,edx
  2396. mov edx,[eax+ebx*8]
  2397. mov eax,BlockNM1.N8T33
  2398. mov dl,[esi+ecx*8+2]
  2399. mov bl,[edi+PITCH*3+3]
  2400. mov cl,[edi+PITCH*3+7]
  2401. mov esi,BlockNM1.N8T37
  2402. add ebp,edx
  2403. mov edx,[eax+ebx*8]
  2404. mov eax,BlockNM1.N8T40
  2405. mov dl,[esi+ecx*8+2]
  2406. mov bl,[edi+PITCH*4+0]
  2407. mov cl,[edi+PITCH*4+4]
  2408. mov esi,BlockNM1.N8T44
  2409. add ebp,edx
  2410. mov edx,[eax+ebx*8]
  2411. mov eax,BlockNM1.N8T42
  2412. mov dl,[esi+ecx*8+2]
  2413. mov bl,[edi+PITCH*4+2]
  2414. mov cl,[edi+PITCH*4+6]
  2415. mov esi,BlockNM1.N8T46
  2416. add ebp,edx
  2417. mov edx,[eax+ebx*8]
  2418. mov eax,BlockNM1.N8T51
  2419. mov dl,[esi+ecx*8+2]
  2420. mov bl,[edi+PITCH*5+1]
  2421. mov cl,[edi+PITCH*5+5]
  2422. mov esi,BlockNM1.N8T55
  2423. add ebp,edx
  2424. mov edx,[eax+ebx*8]
  2425. mov eax,BlockNM1.N8T53
  2426. mov dl,[esi+ecx*8+2]
  2427. mov bl,[edi+PITCH*5+3]
  2428. mov cl,[edi+PITCH*5+7]
  2429. mov esi,BlockNM1.N8T57
  2430. add ebp,edx
  2431. mov edx,[eax+ebx*8]
  2432. mov eax,BlockNM1.N8T60
  2433. mov dl,[esi+ecx*8+2]
  2434. mov bl,[edi+PITCH*6+0]
  2435. mov cl,[edi+PITCH*6+4]
  2436. mov esi,BlockNM1.N8T64
  2437. add ebp,edx
  2438. mov edx,[eax+ebx*8]
  2439. mov eax,BlockNM1.N8T62
  2440. mov dl,[esi+ecx*8+2]
  2441. mov bl,[edi+PITCH*6+2]
  2442. mov cl,[edi+PITCH*6+6]
  2443. mov esi,BlockNM1.N8T66
  2444. add ebp,edx
  2445. mov edx,[eax+ebx*8]
  2446. mov eax,BlockNM1.N8T71
  2447. mov dl,[esi+ecx*8+2]
  2448. mov bl,[edi+PITCH*7+1]
  2449. mov cl,[edi+PITCH*7+5]
  2450. mov esi,BlockNM1.N8T75
  2451. add ebp,edx
  2452. mov edx,[eax+ebx*8]
  2453. mov eax,BlockNM1.N8T73
  2454. mov dl,[esi+ecx*8+2]
  2455. mov bl,[edi+PITCH*7+3]
  2456. mov cl,[edi+PITCH*7+7]
  2457. mov esi,BlockNM1.N8T77
  2458. add ebp,edx
  2459. mov edx,[eax+ebx*8]
  2460. add edx,ebp
  2461. mov cl,[esi+ecx*8+2]
  2462. shr edx,16
  2463. add ebp,ecx
  2464. and ebp,0FFFFH
  2465. sub esp,BlockLen
  2466. add ebp,edx
  2467. sub edi,8
  2468. test esp,000000008H
  2469. mov BlockN.CentralInterSWD_SLF,ebp
  2470. jne SLFSWDLoop
  2471. test esp,000000010H
  2472. lea edi,[edi-PITCH*8+16]
  2473. jne SLFSWDLoop
  2474. mov eax,Block2.CentralInterSWD_SLF+BlockLen*4
  2475. mov ebx,Block3.CentralInterSWD_SLF+BlockLen*4
  2476. mov ecx,Block4.CentralInterSWD_SLF+BlockLen*4
  2477. add esp,BlockLen*4
  2478. add ebp,ecx
  2479. lea edx,[eax+ebx]
  2480. add ebp,edx
  2481. mov edx,SpatialFiltDifferential
  2482. lea esi,[edi+PITCH*8-8]
  2483. mov edi,MBCentralInterSWD
  2484. sub edi,edx
  2485. mov edx,MBlockActionStream
  2486. cmp ebp,edi
  2487. jge SpatialFilterNotAsGood
  2488. mov MBCentralInterSWD,ebp ; Spatial filter was better. Stash
  2489. mov ebp,Block1.CentralInterSWD_SLF ; pertinent calculations.
  2490. mov Block2.CentralInterSWD,eax
  2491. mov Block3.CentralInterSWD,ebx
  2492. mov Block4.CentralInterSWD,ecx
  2493. mov Block1.CentralInterSWD,ebp
  2494. mov [edx].BlkY1.PastRef,esi
  2495. mov al,INTERSLF
  2496. mov [edx].BlockType,al
  2497. SkipSpatialFiltering:
  2498. SpatialFilterNotAsGood:
  2499. ENDIF ; H261
  2500. mov al,[edx].CodedBlocks ; Fetch coded block pattern.
  2501. mov edi,EmptyThreshold ; Get threshold for forcing block empty.
  2502. mov ebp,MBCentralInterSWD
  2503. mov esi,InterSWDBlocks
  2504. mov ebx,Block4.CentralInterSWD ; Is SWD > threshold?
  2505. cmp ebx,edi
  2506. jg @f
  2507. and al,0F7H ; If not, indicate block 4 is NOT coded.
  2508. dec esi
  2509. sub ebp,ebx
  2510. @@:
  2511. mov ebx,Block3.CentralInterSWD
  2512. cmp ebx,edi
  2513. jg @f
  2514. and al,0FBH
  2515. dec esi
  2516. sub ebp,ebx
  2517. @@:
  2518. mov ebx,Block2.CentralInterSWD
  2519. cmp ebx,edi
  2520. jg @f
  2521. and al,0FDH
  2522. dec esi
  2523. sub ebp,ebx
  2524. @@:
  2525. mov ebx,Block1.CentralInterSWD
  2526. cmp ebx,edi
  2527. jg @f
  2528. and al,0FEH
  2529. dec esi
  2530. sub ebp,ebx
  2531. @@:
  2532. mov [edx].CodedBlocks,al ; Store coded block pattern.
  2533. add esi,4
  2534. mov InterSWDBlocks,esi
  2535. xor ebx,ebx
  2536. and eax,00FH
  2537. mov MBCentralInterSWD,ebp
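; Illustrative sketch only (C-like pseudocode, not part of the original
; listing) of the empty-block test above. Names mirror the variables in
; this file; "coded" stands for the CodedBlocks byte and "BlockSWD[b]" is
; a hypothetical shorthand for Block<b>.CentralInterSWD.
;
;   n = 4;  swd = MBCentralInterSWD;
;   for (b = 4; b >= 1; b--)
;       if (BlockSWD[b] <= EmptyThreshold) {   /* signed compare (jg)    */
;           coded &= ~(1 << (b - 1));          /* mark block b not coded */
;           n--;
;           swd -= BlockSWD[b];
;       }
;   InterSWDBlocks += n;  MBCentralInterSWD = swd;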
  2538. cmp al,00FH ; Are any blocks marked empty?
  2539. jne InterBest ; If some blocks are empty, can't code as Intra
  2540. cmp ebp,InterCodingThreshold ; Is InterSWD below inter-coding threshold?
  2541. lea esi,Block1+128
  2542. mov ebp,0
  2543. jae CalculateIntraSWD
  2544. InterBest:
  2545. mov ecx,InterSWDTotal
  2546. mov ebp,MBCentralInterSWD
  2547. add ecx,ebp ; Add to total for this macroblock class.
  2548. mov PD [edx].SWD,ebp
  2549. mov InterSWDTotal,ecx
  2550. jmp NextMacroBlock
  2551. ; Activity Details for this section of code (refer to flow diagram above):
  2552. ;
  2553. ; 11) The IntraSWD is calculated as two partial sums, one in the low order
  2554. ; 16 bits of ebp and one in the high order 16 bits. An average pel
  2555. ; value for each block will be calculated to the nearest half.
  2556. ;
  2557. ; Register usage for this section:
  2558. ;
  2559. ; Input of this section:
  2560. ;
  2561. ; None
  2562. ;
  2563. ; Predominant usage for body of this section:
  2564. ;
  2565. ; esi -- Address of target block 1 (3), plus 128.
  2566. ; ebp[ 0:15] -- IntraSWD Accumulator for block 1 (3).
  2567. ; ebp[16:31] -- IntraSWD Accumulator for block 2 (4).
  2568. ; edi -- Block 2 (4) target pel, times -8, and with WeightedDiff added.
  2569. ; edx -- Block 1 (3) target pel, times -8, and with WeightedDiff added.
  2570. ; ecx[ 0: 7] -- Weighted difference for one pel in block 2 (4).
  2571. ; ecx[ 8:15] -- Zero.
  2572. ; ecx[16:23] -- Weighted difference for one pel in block 1 (3).
  2573. ; ecx[24:31] -- Zero.
  2574. ; ebx -- Average block 2 (4) target pel to nearest .5.
  2575. ; eax -- Average block 1 (3) target pel to nearest .5.
  2576. ;
  2577. ; Output of this section:
  2578. ;
  2579. ; edi -- Scratch.
  2580. ; ebp[ 0:15] -- IntraSWD. (Also written to MBlockActionStream.)
  2581. ; ebp[16:31] -- garbage.
  2582. ; ebx -- Zero.
  2583. ; eax -- MBlockActionStream.
  2584. ;
  2585. ; Expected Pentium (tm) microprocessor performance for this section:
  2586. ;
  2587. ; Executed once per macroblock, (except for those for which one or more blocks
  2588. ; are marked empty, or where the InterSWD is less than a threshold).
  2589. ;
  2590. ; 183 clocks for instruction execution
  2591. ; 12 clocks for bank conflicts (94 dual mem ops with 1/8 chance of conflict)
  2592. ; ----
  2593. ; 195 clocks total time for this section.
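; Illustrative sketch only (not part of the original listing), assuming
; AccumTargetPels holds the sum of the 32 sampled target pels of a block:
;
;   avg2 = (AccumTargetPels + 8) >> 4;     /* 2 * average, rounded       */
;
; Dividing a 32-pel sum by 16 gives twice the mean, i.e. the mean counted
; in steps of one half. That is consistent with the loop below scaling
; its index by 4 where the inter loops scale a whole pel value by 8.
; Example: a sum of 3216 gives (3216 + 8) >> 4 = 201, an average of 100.5.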
  2594. IntraByDecree:
  2595. mov eax,InterSWDBlocks ; Inc by 4, because we will undo it below.
  2596. xor ebp,ebp
  2597. mov MBMotionVectors,ebp ; Stash zero for MB level motion vectors.
  2598. mov ebp,040000000H ; Set Inter SWD artificially high.
  2599. lea esi,Block1+128
  2600. add eax,4
  2601. mov MBCentralInterSWD,ebp
  2602. mov InterSWDBlocks,eax
  2603. CalculateIntraSWD:
  2604. CalculateIntraSWDLoop:
  2605. mov eax,[esi-128].AccumTargetPels ; Fetch acc of target pels for 1st block.
  2606. mov edx,[esi-128].N8T00
  2607. add eax,8
  2608. mov ebx,[esi-128+BlockLen].AccumTargetPels
  2609. shr eax,4 ; Average block 1 target pel rounded to nearest .5.
  2610. add ebx,8
  2611. shr ebx,4
  2612. mov edi,[esi-128+BlockLen].N8T00
  2613. mov ecx,PD [edx+eax*4]
  2614. mov edx,[esi-128].N8T02
  2615. mov cl,PB [edi+ebx*4+2]
  2616. mov edi,[esi-128+BlockLen].N8T02
  2617. add ebp,ecx
  2618. mov ecx,PD [edx+eax*4]
  2619. mov edx,[esi-128].N8T04
  2620. mov cl,PB [edi+ebx*4+2]
  2621. mov edi,[esi-128+BlockLen].N8T04
  2622. add ebp,ecx
  2623. mov ecx,PD [edx+eax*4]
  2624. mov edx,[esi-128].N8T06
  2625. mov cl,PB [edi+ebx*4+2]
  2626. mov edi,[esi-128+BlockLen].N8T06
  2627. add ebp,ecx
  2628. mov ecx,PD [edx+eax*4]
  2629. mov edx,[esi-128].N8T11
  2630. mov cl,PB [edi+ebx*4+2]
  2631. mov edi,[esi-128+BlockLen].N8T11
  2632. add ebp,ecx
  2633. mov ecx,PD [edx+eax*4]
  2634. mov edx,[esi-128].N8T13
  2635. mov cl,PB [edi+ebx*4+2]
  2636. mov edi,[esi-128+BlockLen].N8T13
  2637. add ebp,ecx
  2638. mov ecx,PD [edx+eax*4]
  2639. mov edx,[esi-128].N8T15
  2640. mov cl,PB [edi+ebx*4+2]
  2641. mov edi,[esi-128+BlockLen].N8T15
  2642. add ebp,ecx
  2643. mov ecx,PD [edx+eax*4]
  2644. mov edx,[esi-128].N8T17
  2645. mov cl,PB [edi+ebx*4+2]
  2646. mov edi,[esi-128+BlockLen].N8T17
  2647. add ebp,ecx
  2648. mov ecx,PD [edx+eax*4]
  2649. mov edx,[esi-128].N8T20
  2650. mov cl,PB [edi+ebx*4+2]
  2651. mov edi,[esi-128+BlockLen].N8T20
  2652. add ebp,ecx
  2653. mov ecx,PD [edx+eax*4]
  2654. mov edx,[esi-128].N8T22
  2655. mov cl,PB [edi+ebx*4+2]
  2656. mov edi,[esi-128+BlockLen].N8T22
  2657. add ebp,ecx
  2658. mov ecx,PD [edx+eax*4]
  2659. mov edx,[esi-128].N8T24
  2660. mov cl,PB [edi+ebx*4+2]
  2661. mov edi,[esi-128+BlockLen].N8T24
  2662. add ebp,ecx
  2663. mov ecx,PD [edx+eax*4]
  2664. mov edx,[esi-128].N8T26
  2665. mov cl,PB [edi+ebx*4+2]
  2666. mov edi,[esi-128+BlockLen].N8T26
  2667. add ebp,ecx
  2668. mov ecx,PD [edx+eax*4]
  2669. mov edx,[esi-128].N8T31
  2670. mov cl,PB [edi+ebx*4+2]
  2671. mov edi,[esi-128+BlockLen].N8T31
  2672. add ebp,ecx
  2673. mov ecx,PD [edx+eax*4]
  2674. mov edx,[esi-128].N8T33
  2675. mov cl,PB [edi+ebx*4+2]
  2676. mov edi,[esi-128+BlockLen].N8T33
  2677. add ebp,ecx
  2678. mov ecx,PD [edx+eax*4]
  2679. mov edx,[esi-128].N8T35
  2680. mov cl,PB [edi+ebx*4+2]
  2681. mov edi,[esi-128+BlockLen].N8T35
  2682. add ebp,ecx
  2683. mov ecx,PD [edx+eax*4]
  2684. mov edx,[esi-128].N8T37
  2685. mov cl,PB [edi+ebx*4+2]
  2686. mov edi,[esi-128+BlockLen].N8T37
  2687. add ebp,ecx
  2688. mov ecx,PD [edx+eax*4]
  2689. mov edx,[esi-128].N8T40
  2690. mov cl,PB [edi+ebx*4+2]
  2691. mov edi,[esi-128+BlockLen].N8T40
  2692. add ebp,ecx
  2693. mov ecx,PD [edx+eax*4]
  2694. mov edx,[esi-128].N8T42
  2695. mov cl,PB [edi+ebx*4+2]
  2696. mov edi,[esi-128+BlockLen].N8T42
  2697. add ebp,ecx
  2698. mov ecx,PD [edx+eax*4]
  2699. mov edx,[esi-128].N8T44
  2700. mov cl,PB [edi+ebx*4+2]
  2701. mov edi,[esi-128+BlockLen].N8T44
  2702. add ebp,ecx
  2703. mov ecx,PD [edx+eax*4]
  2704. mov edx,[esi-128].N8T46
  2705. mov cl,PB [edi+ebx*4+2]
  2706. mov edi,[esi-128+BlockLen].N8T46
  2707. add ebp,ecx
  2708. mov ecx,PD [edx+eax*4]
  2709. mov edx,[esi-128].N8T51
  2710. mov cl,PB [edi+ebx*4+2]
  2711. mov edi,[esi-128+BlockLen].N8T51
  2712. add ebp,ecx
  2713. mov ecx,PD [edx+eax*4]
  2714. mov edx,[esi-128].N8T53
  2715. mov cl,PB [edi+ebx*4+2]
  2716. mov edi,[esi-128+BlockLen].N8T53
  2717. add ebp,ecx
  2718. mov ecx,PD [edx+eax*4]
  2719. mov edx,[esi-128].N8T55
  2720. mov cl,PB [edi+ebx*4+2]
  2721. mov edi,[esi-128+BlockLen].N8T55
  2722. add ebp,ecx
  2723. mov ecx,PD [edx+eax*4]
  2724. mov edx,[esi-128].N8T57
  2725. mov cl,PB [edi+ebx*4+2]
  2726. mov edi,[esi-128+BlockLen].N8T57
  2727. add ebp,ecx
  2728. mov ecx,PD [edx+eax*4]
  2729. mov edx,[esi-128].N8T60
  2730. mov cl,PB [edi+ebx*4+2]
  2731. mov edi,[esi-128+BlockLen].N8T60
  2732. add ebp,ecx
  2733. mov ecx,PD [edx+eax*4]
  2734. mov edx,[esi-128].N8T62
  2735. mov cl,PB [edi+ebx*4+2]
  2736. mov edi,[esi-128+BlockLen].N8T62
  2737. add ebp,ecx
  2738. mov ecx,PD [edx+eax*4]
  2739. mov edx,[esi-128].N8T64
  2740. mov cl,PB [edi+ebx*4+2]
  2741. mov edi,[esi-128+BlockLen].N8T64
  2742. add ebp,ecx
  2743. mov ecx,PD [edx+eax*4]
  2744. mov edx,[esi-128].N8T66
  2745. mov cl,PB [edi+ebx*4+2]
  2746. mov edi,[esi-128+BlockLen].N8T66
  2747. add ebp,ecx
  2748. mov ecx,PD [edx+eax*4]
  2749. mov edx,[esi-128].N8T71
  2750. mov cl,PB [edi+ebx*4+2]
  2751. mov edi,[esi-128+BlockLen].N8T71
  2752. add ebp,ecx
  2753. mov ecx,PD [edx+eax*4]
  2754. mov edx,[esi-128].N8T73
  2755. mov cl,PB [edi+ebx*4+2]
  2756. mov edi,[esi-128+BlockLen].N8T73
  2757. add ebp,ecx
  2758. mov ecx,PD [edx+eax*4]
  2759. mov edx,[esi-128].N8T75
  2760. mov cl,PB [edi+ebx*4+2]
  2761. mov edi,[esi-128+BlockLen].N8T75
  2762. add ebp,ecx
  2763. mov ecx,PD [edx+eax*4]
  2764. mov edx,[esi-128].N8T77
  2765. mov cl,PB [edi+ebx*4+2]
  2766. mov edi,[esi-128+BlockLen].N8T77
  2767. add ebp,ecx
  2768. mov ecx,PD [edx+eax*4]
  2769. mov cl,PB [edi+ebx*4+2]
  2770. mov eax,000007FFFH
  2771. add ebp,ecx
  2772. add esi,BlockLen*2
  2773. and eax,ebp
  2774. mov ecx,MBCentralInterSWD
  2775. shr ebp,16
  2776. sub ecx,IntraCodingDifferential
  2777. add ebp,eax
  2778. mov edx,MBlockActionStream ; Reload list ptr.
  2779. cmp ecx,ebp ; Is IntraSWD > InterSWD - differential?
  2780. jl InterBest
  2781. lea ecx,Block1+128+BlockLen*2
  2782. cmp ecx,esi
  2783. je CalculateIntraSWDLoop
  2784. ; ebp -- IntraSWD
  2785. ; edx -- MBlockActionStream
  2786. DoneCalcIntraSWD:
  2787. IntraBest:
  2788. mov ecx,IntraSWDTotal
  2789. mov edi,IntraSWDBlocks
  2790. add ecx,ebp ; Add to total for this macroblock class.
  2791. add edi,4 ; Accumulate # of blocks for this type.
  2792. mov IntraSWDBlocks,edi
  2793. mov edi,InterSWDBlocks
  2794. sub edi,4
  2795. mov IntraSWDTotal,ecx
  2796. mov InterSWDBlocks,edi
  2797. mov bl,INTRA
  2798. mov PB [edx].BlockType,bl ; Indicate macroblock handling decision.
  2799. IFDEF H261
  2800. xor ebx,ebx
  2801. ELSE ; H263
  2802. mov ebx,MBMotionVectors ; Set MVs to best MB level motion vectors.
  2803. ENDIF
  2804. mov PD [edx].BlkY1.MVs,ebx
  2805. mov PD [edx].BlkY2.MVs,ebx
  2806. mov PD [edx].BlkY3.MVs,ebx
  2807. mov PD [edx].BlkY4.MVs,ebx
  2808. xor ebx,ebx
  2809. mov PD [edx].SWD,ebp
  2810. jmp NextMacroBlock
  2811. ;==============================================================================
  2812. ; Internal functions
  2813. ;==============================================================================
  2814. DoSWDLoop:
  2815. ; Upon entry:
  2816. ; esi -- Points to ref1
  2817. ; edi -- Points to ref2
  2818. ; ecx -- Upper 24 bits zero
  2819. ; ebx -- Upper 24 bits zero
  2820. mov bl,PB [esi] ; 00A -- Get Pel 00 in reference ref1.
  2821. mov eax,Block1.N8T00+4 ; 00B -- Get -8 times target pel 00.
  2822. mov cl,PB [edi] ; 00C -- Get Pel 00 in reference ref2.
  2823. sub esp,BlockLen*4+28
  2824. SWDLoop:
  2825. mov edx,PD [eax+ebx*8] ; 00D -- Get weighted diff for ref1 pel 00.
  2826. mov bl,PB [esi+2] ; 02A
  2827. mov dl,PB [eax+ecx*8+2] ; 00E -- Get weighted diff for ref2 pel 00.
  2828. mov eax,BlockN.N8T02+32 ; 02B
  2829. mov ebp,edx ; 00F -- Accum weighted diffs for pel 00.
  2830. mov cl,PB [edi+2] ; 02C
  2831. mov edx,PD [eax+ebx*8] ; 02D
  2832. mov bl,PB [esi+4] ; 04A
  2833. mov dl,PB [eax+ecx*8+2] ; 02E
  2834. mov eax,BlockN.N8T04+32 ; 04B
  2835. mov cl,PB [edi+4] ; 04C
  2836. add ebp,edx ; 02F
  2837. mov edx,PD [eax+ebx*8] ; 04D
  2838. mov bl,PB [esi+6]
  2839. mov dl,PB [eax+ecx*8+2] ; 04E
  2840. mov eax,BlockN.N8T06+32
  2841. mov cl,PB [edi+6]
  2842. add ebp,edx ; 04F
  2843. mov edx,PD [eax+ebx*8]
  2844. mov bl,PB [esi+PITCH*1+1]
  2845. mov dl,PB [eax+ecx*8+2]
  2846. mov eax,BlockN.N8T11+32
  2847. mov cl,PB [edi+PITCH*1+1]
  2848. add ebp,edx
  2849. mov edx,PD [eax+ebx*8]
  2850. mov bl,PB [esi+PITCH*1+3]
  2851. mov dl,PB [eax+ecx*8+2]
  2852. mov eax,BlockN.N8T13+32
  2853. mov cl,PB [edi+PITCH*1+3]
  2854. add ebp,edx
  2855. mov edx,PD [eax+ebx*8]
  2856. mov bl,PB [esi+PITCH*1+5]
  2857. mov dl,PB [eax+ecx*8+2]
  2858. mov eax,BlockN.N8T15+32
  2859. mov cl,PB [edi+PITCH*1+5]
  2860. add ebp,edx
  2861. mov edx,PD [eax+ebx*8]
  2862. mov bl,PB [esi+PITCH*1+7]
  2863. mov dl,PB [eax+ecx*8+2]
  2864. mov eax,BlockN.N8T17+32
  2865. mov cl,PB [edi+PITCH*1+7]
  2866. add ebp,edx
  2867. mov edx,PD [eax+ebx*8]
  2868. mov bl,PB [esi+PITCH*2+0]
  2869. mov dl,PB [eax+ecx*8+2]
  2870. mov eax,BlockN.N8T20+32
  2871. mov cl,PB [edi+PITCH*2+0]
  2872. add ebp,edx
  2873. mov edx,PD [eax+ebx*8]
  2874. mov bl,PB [esi+PITCH*2+2]
  2875. mov dl,PB [eax+ecx*8+2]
  2876. mov eax,BlockN.N8T22+32
  2877. mov cl,PB [edi+PITCH*2+2]
  2878. add ebp,edx
  2879. mov edx,PD [eax+ebx*8]
  2880. mov bl,PB [esi+PITCH*2+4]
  2881. mov dl,PB [eax+ecx*8+2]
  2882. mov eax,BlockN.N8T24+32
  2883. mov cl,PB [edi+PITCH*2+4]
  2884. add ebp,edx
  2885. mov edx,PD [eax+ebx*8]
  2886. mov bl,PB [esi+PITCH*2+6]
  2887. mov dl,PB [eax+ecx*8+2]
  2888. mov eax,BlockN.N8T26+32
  2889. mov cl,PB [edi+PITCH*2+6]
  2890. add ebp,edx
  2891. mov edx,PD [eax+ebx*8]
  2892. mov bl,PB [esi+PITCH*3+1]
  2893. mov dl,PB [eax+ecx*8+2]
  2894. mov eax,BlockN.N8T31+32
  2895. mov cl,PB [edi+PITCH*3+1]
  2896. add ebp,edx
  2897. mov edx,PD [eax+ebx*8]
  2898. mov bl,PB [esi+PITCH*3+3]
  2899. mov dl,PB [eax+ecx*8+2]
  2900. mov eax,BlockN.N8T33+32
  2901. mov cl,PB [edi+PITCH*3+3]
  2902. add ebp,edx
  2903. mov edx,PD [eax+ebx*8]
  2904. mov bl,PB [esi+PITCH*3+5]
  2905. mov dl,PB [eax+ecx*8+2]
  2906. mov eax,BlockN.N8T35+32
  2907. mov cl,PB [edi+PITCH*3+5]
  2908. add ebp,edx
  2909. mov edx,PD [eax+ebx*8]
  2910. mov bl,PB [esi+PITCH*3+7]
  2911. mov dl,PB [eax+ecx*8+2]
  2912. mov eax,BlockN.N8T37+32
  2913. mov cl,PB [edi+PITCH*3+7]
  2914. add ebp,edx
  2915. mov edx,PD [eax+ebx*8]
  2916. mov bl,PB [esi+PITCH*4+0]
  2917. mov dl,PB [eax+ecx*8+2]
  2918. mov eax,BlockN.N8T40+32
  2919. mov cl,PB [edi+PITCH*4+0]
  2920. add ebp,edx
  2921. mov edx,PD [eax+ebx*8]
  2922. mov bl,PB [esi+PITCH*4+2]
  2923. mov dl,PB [eax+ecx*8+2]
  2924. mov eax,BlockN.N8T42+32
  2925. mov cl,PB [edi+PITCH*4+2]
  2926. add ebp,edx
  2927. mov edx,PD [eax+ebx*8]
  2928. mov bl,PB [esi+PITCH*4+4]
  2929. mov dl,PB [eax+ecx*8+2]
  2930. mov eax,BlockN.N8T44+32
  2931. mov cl,PB [edi+PITCH*4+4]
  2932. add ebp,edx
  2933. mov edx,PD [eax+ebx*8]
  2934. mov bl,PB [esi+PITCH*4+6]
  2935. mov dl,PB [eax+ecx*8+2]
  2936. mov eax,BlockN.N8T46+32
  2937. mov cl,PB [edi+PITCH*4+6]
  2938. add ebp,edx
  2939. mov edx,PD [eax+ebx*8]
  2940. mov bl,PB [esi+PITCH*5+1]
  2941. mov dl,PB [eax+ecx*8+2]
  2942. mov eax,BlockN.N8T51+32
  2943. mov cl,PB [edi+PITCH*5+1]
  2944. add ebp,edx
  2945. mov edx,PD [eax+ebx*8]
  2946. mov bl,PB [esi+PITCH*5+3]
  2947. mov dl,PB [eax+ecx*8+2]
  2948. mov eax,BlockN.N8T53+32
  2949. mov cl,PB [edi+PITCH*5+3]
  2950. add ebp,edx
  2951. mov edx,PD [eax+ebx*8]
  2952. mov bl,PB [esi+PITCH*5+5]
  2953. mov dl,PB [eax+ecx*8+2]
  2954. mov eax,BlockN.N8T55+32
  2955. mov cl,PB [edi+PITCH*5+5]
  2956. add ebp,edx
  2957. mov edx,PD [eax+ebx*8]
  2958. mov bl,PB [esi+PITCH*5+7]
  2959. mov dl,PB [eax+ecx*8+2]
  2960. mov eax,BlockN.N8T57+32
  2961. mov cl,PB [edi+PITCH*5+7]
  2962. add ebp,edx
  2963. mov edx,PD [eax+ebx*8]
  2964. mov bl,PB [esi+PITCH*6+0]
  2965. mov dl,PB [eax+ecx*8+2]
  2966. mov eax,BlockN.N8T60+32
  2967. mov cl,PB [edi+PITCH*6+0]
  2968. add ebp,edx
  2969. mov edx,PD [eax+ebx*8]
  2970. mov bl,PB [esi+PITCH*6+2]
  2971. mov dl,PB [eax+ecx*8+2]
  2972. mov eax,BlockN.N8T62+32
  2973. mov cl,PB [edi+PITCH*6+2]
  2974. add ebp,edx
  2975. mov edx,PD [eax+ebx*8]
  2976. mov bl,PB [esi+PITCH*6+4]
  2977. mov dl,PB [eax+ecx*8+2]
  2978. mov eax,BlockN.N8T64+32
  2979. mov cl,PB [edi+PITCH*6+4]
  2980. add ebp,edx
  2981. mov edx,PD [eax+ebx*8]
  2982. mov bl,PB [esi+PITCH*6+6]
  2983. mov dl,PB [eax+ecx*8+2]
  2984. mov eax,BlockN.N8T66+32
  2985. mov cl,PB [edi+PITCH*6+6]
  2986. add ebp,edx
  2987. mov edx,PD [eax+ebx*8]
  2988. mov bl,PB [esi+PITCH*7+1]
  2989. mov dl,PB [eax+ecx*8+2]
  2990. mov eax,BlockN.N8T71+32
  2991. mov cl,PB [edi+PITCH*7+1]
  2992. add ebp,edx
  2993. mov edx,PD [eax+ebx*8]
  2994. mov bl,PB [esi+PITCH*7+3]
  2995. mov dl,PB [eax+ecx*8+2]
  2996. mov eax,BlockN.N8T73+32
  2997. mov cl,PB [edi+PITCH*7+3]
  2998. add ebp,edx
  2999. mov edx,PD [eax+ebx*8]
  3000. mov bl,PB [esi+PITCH*7+5]
  3001. mov dl,PB [eax+ecx*8+2]
  3002. mov eax,BlockN.N8T75+32
  3003. mov cl,PB [edi+PITCH*7+5]
  3004. add ebp,edx
  3005. mov edx,PD [eax+ebx*8]
  3006. mov bl,PB [esi+PITCH*7+7]
  3007. mov dl,PB [eax+ecx*8+2]
  3008. mov eax,BlockN.N8T77+32
  3009. mov cl,PB [edi+PITCH*7+7]
  3010. add ebp,edx
  3011. mov edx,PD [eax+ebx*8]
  3012. add esp,BlockLen
  3013. mov dl,PB [eax+ecx*8+2]
  3014. mov eax,ebp
  3015. add ebp,edx
  3016. add edx,eax
  3017. shr ebp,16 ; Extract SWD for ref1.
  3018. and edx,00000FFFFH ; Extract SWD for ref2.
  3019. mov esi,BlockN.Ref1Addr+32 ; Get address of next ref1 block.
  3020. mov edi,BlockN.Ref2Addr+32 ; Get address of next ref2 block.
  3021. mov BlockNM1.Ref1InterSWD+32,ebp ; Store SWD for ref1.
  3022. mov BlockNM1.Ref2InterSWD+32,edx ; Store SWD for ref2.
  3023. mov bl,PB [esi] ; 00A -- Get Pel 00 in reference ref1.
  3024. mov eax,BlockN.N8T00+32 ; 00B -- Get -8 times target pel 00.
  3025. test esp,000000018H ; Done when esp is 32-byte aligned.
  3026. mov cl,PB [edi] ; 00C -- Get Pel 00 in reference ref2.
  3027. jne SWDLoop
  3028. ; Output:
  3029. ; ebp -- Ref1 SWD for block 4
  3030. ; edx -- Ref2 SWD for block 4
  3031. ; ecx -- Upper 24 bits zero
  3032. ; ebx -- Upper 24 bits zero
  3033. add esp,28
  3034. ret
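; Illustrative note (C-like pseudocode, not part of the original listing):
; the loop above keeps no explicit counter. Each pass adds BlockLen to esp,
; and completion is read back from esp bits 3 and 4 ("test esp,018H"):
;
;   esp -= 4*BlockLen + 28;
;   do { ...one block...;  esp += BlockLen; } while (esp & 0x18);
;   esp += 28;
;
; This yields exactly four passes only if those two bits first return to
; zero after the fourth add. That depends on BlockLen and on the stack
; frame alignment, both defined elsewhere in this file and assumed here.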
  3035. IFDEF H261
  3036. ELSE ; H263
  3037. DoSWDHalfPelHorzLoop:
  3038. ; ebp -- Initialized to 0, except when can't search off left or right edge.
  3039. ; edi -- Ref addr for block 1. Ref1 is .5 pel to left. Ref2 is .5 to right.
  3040. xor ecx,ecx
  3041. sub esp,BlockLen*4+28
  3042. xor eax,eax
  3043. xor ebx,ebx
  3044. SWDHalfPelHorzLoop:
  3045. mov al,[edi] ; 00A -- Fetch center ref pel 00.
  3046. mov esi,BlockN.N8T00+32; 00B -- Target pel 00 (times -8).
  3047. mov bl,[edi+2] ; 02A -- Fetch center ref pel 02.
  3048. mov edx,BlockN.N8T02+32; 02B -- Target pel 02 (times -8).
  3049. lea esi,[esi+eax*4] ; 00C -- Combine target pel 00 and center ref pel 00.
  3050. mov al,[edi-1] ; 00D -- Get pel to left for match against pel 00.
  3051. lea edx,[edx+ebx*4] ; 02C -- Combine target pel 02 and center ref pel 02.
  3052. mov bl,[edi+1] ; 00E -- Get pel to right for match against pel 00,
  3053. ; ; 02D -- and pel to left for match against pel 02.
  3054. mov ecx,[esi+eax*4] ; 00F -- [16:23] weighted diff for left ref pel 00.
  3055. mov al,[edi+3] ; 02E -- Get pel to right for match against pel 02.
  3056. add ebp,ecx ; 00G -- Accumulate left ref pel 00.
  3057. mov ecx,[edx+ebx*4] ; 02F -- [16:23] weighted diff for left ref pel 02.
  3058. mov cl,[edx+eax*4+2] ; 02H -- [0:7] is weighted diff for right ref pel 02.
  3059. mov al,[edi+4] ; 04A
  3060. add ebp,ecx ; 02I -- Accumulate right ref pel 02,
  3061. ; ; 02G -- Accumulate left ref pel 02.
  3062. mov bl,[esi+ebx*4+2] ; 00H -- [0:7] is weighted diff for right ref pel 00.
  3063. add ebp,ebx ; 00I -- Accumulate right ref pel 00.
  3064. mov esi,BlockN.N8T04+32; 04B
  3065. mov bl,[edi+6] ; 06A
  3066. mov edx,BlockN.N8T06+32; 06B
  3067. lea esi,[esi+eax*4] ; 04C
  3068. mov al,[edi+3] ; 04D
  3069. lea edx,[edx+ebx*4] ; 06C
  3070. mov bl,[edi+5] ; 04E & 06D
  3071. mov ecx,[esi+eax*4] ; 04F
  3072. mov al,[edi+7] ; 06E
  3073. add ebp,ecx ; 04G
  3074. mov ecx,[edx+ebx*4] ; 06F
  3075. mov cl,[edx+eax*4+2] ; 06H
  3076. mov al,[edi+PITCH*1+1] ; 11A
  3077. add ebp,ecx ; 06I & 06G
  3078. mov bl,[esi+ebx*4+2] ; 04H
  3079. add ebp,ebx ; 04I
  3080. mov esi,BlockN.N8T11+32; 11B
  3081. mov bl,[edi+PITCH*1+3] ; 13A
  3082. mov edx,BlockN.N8T13+32; 13B
  3083. lea esi,[esi+eax*4] ; 11C
  3084. mov al,[edi+PITCH*1+0] ; 11D
  3085. lea edx,[edx+ebx*4] ; 13C
  3086. mov bl,[edi+PITCH*1+2] ; 11E & 13D
  3087. mov ecx,[esi+eax*4] ; 11F
  3088. mov al,[edi+PITCH*1+4] ; 13E
  3089. add ebp,ecx ; 11G
  3090. mov ecx,[edx+ebx*4] ; 13F
  3091. mov cl,[edx+eax*4+2] ; 13H
  3092. mov al,[edi+PITCH*1+5] ; 15A
  3093. add ebp,ecx ; 13I & 13G
  3094. mov bl,[esi+ebx*4+2] ; 11H
  3095. add ebp,ebx ; 11I
  3096. mov esi,BlockN.N8T15+32; 15B
  3097. mov bl,[edi+PITCH*1+7] ; 17A
  3098. mov edx,BlockN.N8T17+32; 17B
  3099. lea esi,[esi+eax*4] ; 15C
  3100. mov al,[edi+PITCH*1+4] ; 15D
  3101. lea edx,[edx+ebx*4] ; 17C
  3102. mov bl,[edi+PITCH*1+6] ; 15E & 17D
  3103. mov ecx,[esi+eax*4] ; 15F
  3104. mov al,[edi+PITCH*1+8] ; 17E
  3105. add ebp,ecx ; 15G
  3106. mov ecx,[edx+ebx*4] ; 17F
  3107. mov cl,[edx+eax*4+2] ; 17H
  3108. mov al,[edi+PITCH*2+0] ; 20A
  3109. add ebp,ecx ; 17I & 17G
  3110. mov bl,[esi+ebx*4+2] ; 15H
  3111. add ebp,ebx ; 15I
  3112. mov esi,BlockN.N8T20+32; 20B
  3113. mov bl,[edi+PITCH*2+2] ; 22A
  3114. mov edx,BlockN.N8T22+32; 22B
  3115. lea esi,[esi+eax*4] ; 20C
  3116. mov al,[edi+PITCH*2-1] ; 20D
  3117. lea edx,[edx+ebx*4] ; 22C
  3118. mov bl,[edi+PITCH*2+1] ; 20E & 22D
  3119. mov ecx,[esi+eax*4] ; 20F
  3120. mov al,[edi+PITCH*2+3] ; 22E
  3121. add ebp,ecx ; 20G
  3122. mov ecx,[edx+ebx*4] ; 22F
  3123. mov cl,[edx+eax*4+2] ; 22H
  3124. mov al,[edi+PITCH*2+4] ; 24A
  3125. add ebp,ecx ; 22I & 22G
  3126. mov bl,[esi+ebx*4+2] ; 20H
  3127. add ebp,ebx ; 20I
  3128. mov esi,BlockN.N8T24+32; 24B
  3129. mov bl,[edi+PITCH*2+6] ; 26A
  3130. mov edx,BlockN.N8T26+32; 26B
  3131. lea esi,[esi+eax*4] ; 24C
  3132. mov al,[edi+PITCH*2+3] ; 24D
  3133. lea edx,[edx+ebx*4] ; 26C
  3134. mov bl,[edi+PITCH*2+5] ; 24E & 26D
  3135. mov ecx,[esi+eax*4] ; 24F
  3136. mov al,[edi+PITCH*2+7] ; 26E
  3137. add ebp,ecx ; 24G
  3138. mov ecx,[edx+ebx*4] ; 26F
  3139. mov cl,[edx+eax*4+2] ; 26H
  3140. mov al,[edi+PITCH*3+1] ; 31A
  3141. add ebp,ecx ; 26I & 26G
  3142. mov bl,[esi+ebx*4+2] ; 24H
  3143. add ebp,ebx ; 24I
  3144. mov esi,BlockN.N8T31+32; 31B
  3145. mov bl,[edi+PITCH*3+3] ; 33A
  3146. mov edx,BlockN.N8T33+32; 33B
  3147. lea esi,[esi+eax*4] ; 31C
  3148. mov al,[edi+PITCH*3+0] ; 31D
  3149. lea edx,[edx+ebx*4] ; 33C
  3150. mov bl,[edi+PITCH*3+2] ; 31E & 33D
  3151. mov ecx,[esi+eax*4] ; 31F
  3152. mov al,[edi+PITCH*3+4] ; 33E
  3153. add ebp,ecx ; 31G
  3154. mov ecx,[edx+ebx*4] ; 33F
  3155. mov cl,[edx+eax*4+2] ; 33H
  3156. mov al,[edi+PITCH*3+5] ; 35A
  3157. add ebp,ecx ; 33I & 33G
  3158. mov bl,[esi+ebx*4+2] ; 31H
  3159. add ebp,ebx ; 31I
  3160. mov esi,BlockN.N8T35+32; 35B
  3161. mov bl,[edi+PITCH*3+7] ; 37A
  3162. mov edx,BlockN.N8T37+32; 37B
  3163. lea esi,[esi+eax*4] ; 35C
  3164. mov al,[edi+PITCH*3+4] ; 35D
  3165. lea edx,[edx+ebx*4] ; 37C
  3166. mov bl,[edi+PITCH*3+6] ; 35E & 37D
  3167. mov ecx,[esi+eax*4] ; 35F
  3168. mov al,[edi+PITCH*3+8] ; 37E
  3169. add ebp,ecx ; 35G
  3170. mov ecx,[edx+ebx*4] ; 37F
  3171. mov cl,[edx+eax*4+2] ; 37H
  3172. mov al,[edi+PITCH*4+0] ; 40A
  3173. add ebp,ecx ; 37I & 37G
  3174. mov bl,[esi+ebx*4+2] ; 35H
  3175. add ebp,ebx ; 35I
  3176. mov esi,BlockN.N8T40+32; 40B
  3177. mov bl,[edi+PITCH*4+2] ; 42A
  3178. mov edx,BlockN.N8T42+32; 42B
  3179. lea esi,[esi+eax*4] ; 40C
  3180. mov al,[edi+PITCH*4-1] ; 40D
  3181. lea edx,[edx+ebx*4] ; 42C
  3182. mov bl,[edi+PITCH*4+1] ; 40E & 42D
  3183. mov ecx,[esi+eax*4] ; 40F
  3184. mov al,[edi+PITCH*4+3] ; 42E
  3185. add ebp,ecx ; 40G
  3186. mov ecx,[edx+ebx*4] ; 42F
  3187. mov cl,[edx+eax*4+2] ; 42H
  3188. mov al,[edi+PITCH*4+4] ; 44A
  3189. add ebp,ecx ; 42I & 42G
  3190. mov bl,[esi+ebx*4+2] ; 40H
  3191. add ebp,ebx ; 40I
  3192. mov esi,BlockN.N8T44+32; 44B
  3193. mov bl,[edi+PITCH*4+6] ; 46A
  3194. mov edx,BlockN.N8T46+32; 46B
  3195. lea esi,[esi+eax*4] ; 44C
  3196. mov al,[edi+PITCH*4+3] ; 44D
  3197. lea edx,[edx+ebx*4] ; 46C
  3198. mov bl,[edi+PITCH*4+5] ; 44E & 46D
  3199. mov ecx,[esi+eax*4] ; 44F
  3200. mov al,[edi+PITCH*4+7] ; 46E
  3201. add ebp,ecx ; 44G
  3202. mov ecx,[edx+ebx*4] ; 46F
  3203. mov cl,[edx+eax*4+2] ; 46H
  3204. mov al,[edi+PITCH*5+1] ; 51A
  3205. add ebp,ecx ; 46I & 46G
  3206. mov bl,[esi+ebx*4+2] ; 44H
  3207. add ebp,ebx ; 44I
  3208. mov esi,BlockN.N8T51+32; 51B
  3209. mov bl,[edi+PITCH*5+3] ; 53A
  3210. mov edx,BlockN.N8T53+32; 53B
  3211. lea esi,[esi+eax*4] ; 51C
  3212. mov al,[edi+PITCH*5+0] ; 51D
  3213. lea edx,[edx+ebx*4] ; 53C
  3214. mov bl,[edi+PITCH*5+2] ; 51E & 53D
  3215. mov ecx,[esi+eax*4] ; 51F
  3216. mov al,[edi+PITCH*5+4] ; 53E
  3217. add ebp,ecx ; 51G
  3218. mov ecx,[edx+ebx*4] ; 53F
  3219. mov cl,[edx+eax*4+2] ; 53H
  3220. mov al,[edi+PITCH*5+5] ; 55A
  3221. add ebp,ecx ; 53I & 53G
  3222. mov bl,[esi+ebx*4+2] ; 51H
  3223. add ebp,ebx ; 51I
  3224. mov esi,BlockN.N8T55+32; 55B
  3225. mov bl,[edi+PITCH*5+7] ; 57A
  3226. mov edx,BlockN.N8T57+32; 57B
  3227. lea esi,[esi+eax*4] ; 55C
  3228. mov al,[edi+PITCH*5+4] ; 55D
  3229. lea edx,[edx+ebx*4] ; 57C
  3230. mov bl,[edi+PITCH*5+6] ; 55E & 57D
  3231. mov ecx,[esi+eax*4] ; 55F
  3232. mov al,[edi+PITCH*5+8] ; 57E
  3233. add ebp,ecx ; 55G
  3234. mov ecx,[edx+ebx*4] ; 57F
  3235. mov cl,[edx+eax*4+2] ; 57H
  3236. mov al,[edi+PITCH*6+0] ; 60A
  3237. add ebp,ecx ; 57I & 57G
  3238. mov bl,[esi+ebx*4+2] ; 55H
  3239. add ebp,ebx ; 55I
  3240. mov esi,BlockN.N8T60+32; 60B
  3241. mov bl,[edi+PITCH*6+2] ; 62A
  3242. mov edx,BlockN.N8T62+32; 62B
  3243. lea esi,[esi+eax*4] ; 60C
  3244. mov al,[edi+PITCH*6-1] ; 60D
  3245. lea edx,[edx+ebx*4] ; 62C
  3246. mov bl,[edi+PITCH*6+1] ; 60E & 62D
  3247. mov ecx,[esi+eax*4] ; 60F
  3248. mov al,[edi+PITCH*6+3] ; 62E
  3249. add ebp,ecx ; 60G
  3250. mov ecx,[edx+ebx*4] ; 62F
  3251. mov cl,[edx+eax*4+2] ; 62H
  3252. mov al,[edi+PITCH*6+4] ; 64A
  3253. add ebp,ecx ; 62I & 62G
  3254. mov bl,[esi+ebx*4+2] ; 60H
  3255. add ebp,ebx ; 60I
  3256. mov esi,BlockN.N8T64+32; 64B
  3257. mov bl,[edi+PITCH*6+6] ; 66A
  3258. mov edx,BlockN.N8T66+32; 66B
  3259. lea esi,[esi+eax*4] ; 64C
  3260. mov al,[edi+PITCH*6+3] ; 64D
  3261. lea edx,[edx+ebx*4] ; 66C
  3262. mov bl,[edi+PITCH*6+5] ; 64E & 66D
  3263. mov ecx,[esi+eax*4] ; 64F
  3264. mov al,[edi+PITCH*6+7] ; 66E
  3265. add ebp,ecx ; 64G
  3266. mov ecx,[edx+ebx*4] ; 66F
  3267. mov cl,[edx+eax*4+2] ; 66H
  3268. mov al,[edi+PITCH*7+1] ; 71A
  3269. add ebp,ecx ; 66I & 66G
  3270. mov bl,[esi+ebx*4+2] ; 64H
  3271. add ebp,ebx ; 64I
  3272. mov esi,BlockN.N8T71+32; 71B
  3273. mov bl,[edi+PITCH*7+3] ; 73A
  3274. mov edx,BlockN.N8T73+32; 73B
  3275. lea esi,[esi+eax*4] ; 71C
  3276. mov al,[edi+PITCH*7+0] ; 71D
  3277. lea edx,[edx+ebx*4] ; 73C
  3278. mov bl,[edi+PITCH*7+2] ; 71E & 73D
  3279. mov ecx,[esi+eax*4] ; 71F
  3280. mov al,[edi+PITCH*7+4] ; 73E
  3281. add ebp,ecx ; 71G
  3282. mov ecx,[edx+ebx*4] ; 73F
  3283. mov cl,[edx+eax*4+2] ; 73H
  3284. mov al,[edi+PITCH*7+5] ; 75A
  3285. add ebp,ecx ; 73I & 73G
  3286. mov bl,[esi+ebx*4+2] ; 71H
  3287. add ebp,ebx ; 71I
  3288. mov esi,BlockN.N8T75+32; 75B
  3289. mov bl,[edi+PITCH*7+7] ; 77A
  3290. mov edx,BlockN.N8T77+32; 77B
  3291. lea esi,[esi+eax*4] ; 75C
  3292. mov al,[edi+PITCH*7+4] ; 75D
  3293. lea edx,[edx+ebx*4] ; 77C
  3294. mov bl,[edi+PITCH*7+6] ; 75E & 77D
  3295. mov ecx,[esi+eax*4] ; 75F
  3296. mov al,[edi+PITCH*7+8] ; 77E
  3297. add ebp,ecx ; 75G
  3298. mov ecx,[edx+ebx*4] ; 77F
  3299. mov cl,[edx+eax*4+2] ; 77H
  3300. add esp,BlockLen
  3301. add ecx,ebp ; 75I & 77G
  3302. mov bl,[esi+ebx*4+2] ; 75H
  3303. add ebx,ecx ; 75I
  3304. mov edi,BlockN.AddrCentralPoint+32 ; Get address of next ref1 block.
  3305. shr ecx,16 ; Extract SWD for ref1.
  3306. and ebx,00000FFFFH ; Extract SWD for ref2.
  3307. mov BlockNM1.Ref1InterSWD+32,ecx ; Store SWD for ref1.
  3308. mov BlockNM1.Ref2InterSWD+32,ebx ; Store SWD for ref2.
  3309. xor ebp,ebp
  3310. mov edx,ebx
  3311. test esp,000000018H
  3312. mov ebx,ebp
  3313. jne SWDHalfPelHorzLoop
  3314. ; Output:
  3315. ; ebp, ebx -- Zero
  3316. ; ecx -- Ref1 SWD for block 4
  3317. ; edx -- Ref2 SWD for block 4
  3318. add esp,28
  3319. ret
  3320. DoSWDHalfPelVertLoop:
  3321. ; ebp -- Initialized to 0, except when can't search off left or right edge.
  3322. ; edi -- Ref addr for block 1. Ref1 is .5 pel up. Ref2 is .5 down.
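; Reading aid, inferred from the instructions below rather than from any
; original documentation, so treat it as a sketch. ref[r] is shorthand used
; only here for the reference pel at row r of the current column.
; Each unrolled step handles one target pel at row r, column c (table N8Trc).
; The lookup address is the table pointer plus 4 times the sum of two
; vertically adjacent reference pels:
;   Ref1 (.5 pel up):   dword at [N8Trc + 4*(ref[r] + ref[r-1])]
;   Ref2 (.5 pel down): byte  at [N8Trc + 4*(ref[r] + ref[r+1]) + 2]
; ebp accumulates both results in one register.  After the last step the
; Ref1 SWD is taken from the upper 16 bits (shr ecx,16) and the Ref2 SWD
; from the lower 16 bits (and ebx,00000FFFFH).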
  3323. xor ecx,ecx
  3324. sub esp,BlockLen*4+28
  3325. xor eax,eax
  3326. xor ebx,ebx
  3327. SWDHalfPelVertLoop:
  3328. mov al,[edi]
  3329. mov esi,BlockN.N8T00+32
  3330. mov bl,[edi+2*PITCH]
  3331. mov edx,BlockN.N8T20+32
  3332. lea esi,[esi+eax*4]
  3333. mov al,[edi-1*PITCH]
  3334. lea edx,[edx+ebx*4]
  3335. mov bl,[edi+1*PITCH]
  3336. mov ecx,[esi+eax*4]
  3337. mov al,[edi+3*PITCH]
  3338. add ebp,ecx
  3339. mov ecx,[edx+ebx*4]
  3340. mov cl,[edx+eax*4+2]
  3341. mov al,[edi+4*PITCH]
  3342. add ebp,ecx
  3343. mov bl,[esi+ebx*4+2]
  3344. add ebp,ebx
  3345. mov esi,BlockN.N8T40+32
  3346. mov bl,[edi+6*PITCH]
  3347. mov edx,BlockN.N8T60+32
  3348. lea esi,[esi+eax*4]
  3349. mov al,[edi+3*PITCH]
  3350. lea edx,[edx+ebx*4]
  3351. mov bl,[edi+5*PITCH]
  3352. mov ecx,[esi+eax*4]
  3353. mov al,[edi+7*PITCH]
  3354. add ebp,ecx
  3355. mov ecx,[edx+ebx*4]
  3356. mov cl,[edx+eax*4+2]
  3357. mov al,[edi+1+1*PITCH]
  3358. add ebp,ecx
  3359. mov bl,[esi+ebx*4+2]
  3360. add ebp,ebx
  3361. mov esi,BlockN.N8T11+32
  3362. mov bl,[edi+1+3*PITCH]
  3363. mov edx,BlockN.N8T31+32
  3364. lea esi,[esi+eax*4]
  3365. mov al,[edi+1+0*PITCH]
  3366. lea edx,[edx+ebx*4]
  3367. mov bl,[edi+1+2*PITCH]
  3368. mov ecx,[esi+eax*4]
  3369. mov al,[edi+1+4*PITCH]
  3370. add ebp,ecx
  3371. mov ecx,[edx+ebx*4]
  3372. mov cl,[edx+eax*4+2]
  3373. mov al,[edi+1+5*PITCH]
  3374. add ebp,ecx
  3375. mov bl,[esi+ebx*4+2]
  3376. add ebp,ebx
  3377. mov esi,BlockN.N8T51+32
  3378. mov bl,[edi+1+7*PITCH]
  3379. mov edx,BlockN.N8T71+32
  3380. lea esi,[esi+eax*4]
  3381. mov al,[edi+1+4*PITCH]
  3382. lea edx,[edx+ebx*4]
  3383. mov bl,[edi+1+6*PITCH]
  3384. mov ecx,[esi+eax*4]
  3385. mov al,[edi+1+8*PITCH]
  3386. add ebp,ecx
  3387. mov ecx,[edx+ebx*4]
  3388. mov cl,[edx+eax*4+2]
  3389. mov al,[edi+2+0*PITCH]
  3390. add ebp,ecx
  3391. mov bl,[esi+ebx*4+2]
  3392. add ebp,ebx
  3393. mov esi,BlockN.N8T02+32
  3394. mov bl,[edi+2+2*PITCH]
  3395. mov edx,BlockN.N8T22+32
  3396. lea esi,[esi+eax*4]
  3397. mov al,[edi+2-1*PITCH]
  3398. lea edx,[edx+ebx*4]
  3399. mov bl,[edi+2+1*PITCH]
  3400. mov ecx,[esi+eax*4]
  3401. mov al,[edi+2+3*PITCH]
  3402. add ebp,ecx
  3403. mov ecx,[edx+ebx*4]
  3404. mov cl,[edx+eax*4+2]
  3405. mov al,[edi+2+4*PITCH]
  3406. add ebp,ecx
  3407. mov bl,[esi+ebx*4+2]
  3408. add ebp,ebx
  3409. mov esi,BlockN.N8T42+32
  3410. mov bl,[edi+2+6*PITCH]
  3411. mov edx,BlockN.N8T62+32
  3412. lea esi,[esi+eax*4]
  3413. mov al,[edi+2+3*PITCH]
  3414. lea edx,[edx+ebx*4]
  3415. mov bl,[edi+2+5*PITCH]
  3416. mov ecx,[esi+eax*4]
  3417. mov al,[edi+2+7*PITCH]
  3418. add ebp,ecx
  3419. mov ecx,[edx+ebx*4]
  3420. mov cl,[edx+eax*4+2]
  3421. mov al,[edi+3+1*PITCH]
  3422. add ebp,ecx
  3423. mov bl,[esi+ebx*4+2]
  3424. add ebp,ebx
  3425. mov esi,BlockN.N8T13+32
  3426. mov bl,[edi+3+3*PITCH]
  3427. mov edx,BlockN.N8T33+32
  3428. lea esi,[esi+eax*4]
  3429. mov al,[edi+3+0*PITCH]
  3430. lea edx,[edx+ebx*4]
  3431. mov bl,[edi+3+2*PITCH]
  3432. mov ecx,[esi+eax*4]
  3433. mov al,[edi+3+4*PITCH]
  3434. add ebp,ecx
  3435. mov ecx,[edx+ebx*4]
  3436. mov cl,[edx+eax*4+2]
  3437. mov al,[edi+3+5*PITCH]
  3438. add ebp,ecx
  3439. mov bl,[esi+ebx*4+2]
  3440. add ebp,ebx
  3441. mov esi,BlockN.N8T53+32
  3442. mov bl,[edi+3+7*PITCH]
  3443. mov edx,BlockN.N8T73+32
  3444. lea esi,[esi+eax*4]
  3445. mov al,[edi+3+4*PITCH]
  3446. lea edx,[edx+ebx*4]
  3447. mov bl,[edi+3+6*PITCH]
  3448. mov ecx,[esi+eax*4]
  3449. mov al,[edi+3+8*PITCH]
  3450. add ebp,ecx
  3451. mov ecx,[edx+ebx*4]
  3452. mov cl,[edx+eax*4+2]
  3453. mov al,[edi+4+0*PITCH]
  3454. add ebp,ecx
  3455. mov bl,[esi+ebx*4+2]
  3456. add ebp,ebx
  3457. mov esi,BlockN.N8T04+32
  3458. mov bl,[edi+4+2*PITCH]
  3459. mov edx,BlockN.N8T24+32
  3460. lea esi,[esi+eax*4]
  3461. mov al,[edi+4-1*PITCH]
  3462. lea edx,[edx+ebx*4]
  3463. mov bl,[edi+4+1*PITCH]
  3464. mov ecx,[esi+eax*4]
  3465. mov al,[edi+4+3*PITCH]
  3466. add ebp,ecx
  3467. mov ecx,[edx+ebx*4]
  3468. mov cl,[edx+eax*4+2]
  3469. mov al,[edi+4+4*PITCH]
  3470. add ebp,ecx
  3471. mov bl,[esi+ebx*4+2]
  3472. add ebp,ebx
  3473. mov esi,BlockN.N8T44+32
  3474. mov bl,[edi+4+6*PITCH]
  3475. mov edx,BlockN.N8T64+32
  3476. lea esi,[esi+eax*4]
  3477. mov al,[edi+4+3*PITCH]
  3478. lea edx,[edx+ebx*4]
  3479. mov bl,[edi+4+5*PITCH]
  3480. mov ecx,[esi+eax*4]
  3481. mov al,[edi+4+7*PITCH]
  3482. add ebp,ecx
  3483. mov ecx,[edx+ebx*4]
  3484. mov cl,[edx+eax*4+2]
  3485. mov al,[edi+5+1*PITCH]
  3486. add ebp,ecx
  3487. mov bl,[esi+ebx*4+2]
  3488. add ebp,ebx
  3489. mov esi,BlockN.N8T15+32
  3490. mov bl,[edi+5+3*PITCH]
  3491. mov edx,BlockN.N8T35+32
  3492. lea esi,[esi+eax*4]
  3493. mov al,[edi+5+0*PITCH]
  3494. lea edx,[edx+ebx*4]
  3495. mov bl,[edi+5+2*PITCH]
  3496. mov ecx,[esi+eax*4]
  3497. mov al,[edi+5+4*PITCH]
  3498. add ebp,ecx
  3499. mov ecx,[edx+ebx*4]
  3500. mov cl,[edx+eax*4+2]
  3501. mov al,[edi+5+5*PITCH]
  3502. add ebp,ecx
  3503. mov bl,[esi+ebx*4+2]
  3504. add ebp,ebx
  3505. mov esi,BlockN.N8T55+32
  3506. mov bl,[edi+5+7*PITCH]
  3507. mov edx,BlockN.N8T75+32
  3508. lea esi,[esi+eax*4]
  3509. mov al,[edi+5+4*PITCH]
  3510. lea edx,[edx+ebx*4]
  3511. mov bl,[edi+5+6*PITCH]
  3512. mov ecx,[esi+eax*4]
  3513. mov al,[edi+5+8*PITCH]
  3514. add ebp,ecx
  3515. mov ecx,[edx+ebx*4]
  3516. mov cl,[edx+eax*4+2]
  3517. mov al,[edi+6+0*PITCH]
  3518. add ebp,ecx
  3519. mov bl,[esi+ebx*4+2]
  3520. add ebp,ebx
  3521. mov esi,BlockN.N8T06+32
  3522. mov bl,[edi+6+2*PITCH]
  3523. mov edx,BlockN.N8T26+32
  3524. lea esi,[esi+eax*4]
  3525. mov al,[edi+6-1*PITCH]
  3526. lea edx,[edx+ebx*4]
  3527. mov bl,[edi+6+1*PITCH]
  3528. mov ecx,[esi+eax*4]
  3529. mov al,[edi+6+3*PITCH]
  3530. add ebp,ecx
  3531. mov ecx,[edx+ebx*4]
  3532. mov cl,[edx+eax*4+2]
  3533. mov al,[edi+6+4*PITCH]
  3534. add ebp,ecx
  3535. mov bl,[esi+ebx*4+2]
  3536. add ebp,ebx
  3537. mov esi,BlockN.N8T46+32
  3538. mov bl,[edi+6+6*PITCH]
  3539. mov edx,BlockN.N8T66+32
  3540. lea esi,[esi+eax*4]
  3541. mov al,[edi+6+3*PITCH]
  3542. lea edx,[edx+ebx*4]
  3543. mov bl,[edi+6+5*PITCH]
  3544. mov ecx,[esi+eax*4]
  3545. mov al,[edi+6+7*PITCH]
  3546. add ebp,ecx
  3547. mov ecx,[edx+ebx*4]
  3548. mov cl,[edx+eax*4+2]
  3549. mov al,[edi+7+1*PITCH]
  3550. add ebp,ecx
  3551. mov bl,[esi+ebx*4+2]
  3552. add ebp,ebx
  3553. mov esi,BlockN.N8T17+32
  3554. mov bl,[edi+7+3*PITCH]
  3555. mov edx,BlockN.N8T37+32
  3556. lea esi,[esi+eax*4]
  3557. mov al,[edi+7+0*PITCH]
  3558. lea edx,[edx+ebx*4]
  3559. mov bl,[edi+7+2*PITCH]
  3560. mov ecx,[esi+eax*4]
  3561. mov al,[edi+7+4*PITCH]
  3562. add ebp,ecx
  3563. mov ecx,[edx+ebx*4]
  3564. mov cl,[edx+eax*4+2]
  3565. mov al,[edi+7+5*PITCH]
  3566. add ebp,ecx
  3567. mov bl,[esi+ebx*4+2]
  3568. add ebp,ebx
  3569. mov esi,BlockN.N8T57+32
  3570. mov bl,[edi+7+7*PITCH]
  3571. mov edx,BlockN.N8T77+32
  3572. lea esi,[esi+eax*4]
  3573. mov al,[edi+7+4*PITCH]
  3574. lea edx,[edx+ebx*4]
  3575. mov bl,[edi+7+6*PITCH]
  3576. mov ecx,[esi+eax*4]
  3577. mov al,[edi+7+8*PITCH]
  3578. add ebp,ecx
  3579. mov ecx,[edx+ebx*4]
  3580. mov cl,[edx+eax*4+2]
  3581. add esp,BlockLen
  3582. add ecx,ebp
  3583. mov bl,[esi+ebx*4+2]
  3584. add ebx,ecx
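  3585. mov edi,BlockN.AddrCentralPoint+32 ; Get address of next ref1 block.
  3586. shr ecx,16 ; Extract SWD for ref1.
  3587. and ebx,00000FFFFH ; Extract SWD for ref2.
  3588. mov BlockNM1.Ref1InterSWD+32,ecx ; Store SWD for ref1.
  3589. mov BlockNM1.Ref2InterSWD+32,ebx ; Store SWD for ref2.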
  3590. xor ebp,ebp
  3591. mov edx,ebx
  3592. test esp,000000018H
  3593. mov ebx,ebp
  3594. jne SWDHalfPelVertLoop
  3595. ; Output:
  3596. ; ebp, ebx -- Zero
  3597. ; ecx -- Ref1 SWD for block 4
  3598. ; edx -- Ref2 SWD for block 4
  3599. add esp,28
  3600. ret
  3601. ENDIF ; H263
  3602. ; Performance for common macroblocks:
  3603. ; 298 clocks: prepare target pels, compute avg target pel, compute 0-MV SWD.
  3604. ; 90 clocks: compute IntraSWD.
  3605. ; 1412 clocks: 6-level search for best SWD.
  3606. ; 16 clocks: record best fit.
  3607. ; 945 clocks: calculate spatial loop filtered prediction.
  3608. ; 152 clocks: calculate SWD for spatially filtered prediction and classify.
  3609. ; ----
  3610. ; 2913 clocks total
  3611. ;
  3612. ; Performance for macroblocks in which 0-motion vector is "good enough":
  3613. ; 298 clocks: prepare target pels, compute avg target pel, compute 0-MV SWD.
  3614. ; 90 clocks: compute IntraSWD.
  3615. ; 16 clocks: record best fit.
  3616. ; 58 clocks: extra cache fill burden on adjacent MB if SWD-search not done.
  3617. ; 945 clocks: calculate spatial loop filtered prediction.
  3618. ; 152 clocks: calculate SWD for spatially filtered prediction and classify.
  3619. ; ----
  3620. ; 1559 clocks total
  3621. ;
  3622. ; Performance for macroblocks marked as intrablock by decree of caller:
  3623. ; 298 clocks: prepare target pels, compute avg target pel, compute 0-MV SWD.
  3624. ; 90 clocks: compute IntraSWD.
  3625. ; 58 clocks: extra cache fill burden on adjacent MB if SWD-search not done.
  3626. ; 20 clocks: classify (just weight the SWD for # of match points).
  3627. ; ----
  3628. ; 466 clocks total
  3629. ;
  3630. ; 160*120 performance, generously estimated (assuming lots of motion):
  3631. ;
  3632. ; 2913 * 80 = 233000 clocks for luma.
  3633. ; 2913 * 12 = 35000 clocks for chroma.
  3634. ; 268000 clocks per frame * 15 = 4,020,000 clocks/sec.
  3635. ;
  3636. ; 160*120 performance, assuming typical motion:
  3637. ;
  3638. ; 2913 * 40 + 1559 * 40 = 179000 clocks for luma.
  3639. ; 2913 * 8 + 1559 * 4 = 30000 clocks for chroma.
  3640. ; 209000 clocks per frame * 15 = 3,135,000 clocks/sec.
  3641. ;
  3642. ; Add 10-20% to allow for initial cache-filling, and unfortunate cases where
  3643. ; cache-filling policy preempts areas of the tables that are not locally "hot",
  3644. ; instead of preempting macroblocks upon which the processing was just finished.
  3645. Done:
  3646. mov eax,IntraSWDTotal
  3647. mov ebx,IntraSWDBlocks
  3648. mov ecx,InterSWDTotal
  3649. mov edx,InterSWDBlocks
  3650. mov esp,StashESP
  3651. mov edi,[esp+IntraSWDTotal_arg]
  3652. mov [edi],eax
  3653. mov edi,[esp+IntraSWDBlocks_arg]
  3654. mov [edi],ebx
  3655. mov edi,[esp+InterSWDTotal_arg]
  3656. mov [edi],ecx
  3657. mov edi,[esp+InterSWDBlocks_arg]
  3658. mov [edi],edx
  3659. pop ebx
  3660. pop ebp
  3661. pop edi
  3662. pop esi
  3663. rturn
  3664. MOTIONESTIMATION endp
  3665. END