Leaked source code of windows server 2003

/*
 *  B A S E 6 4 . C P P
 *
 *  Sources Base64 encoding
 *
 *  Copyright 1986-1997 Microsoft Corporation, All Rights Reserved
 */

#include "_xmllib.h"

/*  From RFC 1521:

5.2.  Base64 Content-Transfer-Encoding

   The Base64 Content-Transfer-Encoding is designed to represent
   arbitrary sequences of octets in a form that need not be humanly
   readable.  The encoding and decoding algorithms are simple, but the
   encoded data are consistently only about 33 percent larger than the
   unencoded data.  This encoding is virtually identical to the one used
   in Privacy Enhanced Mail (PEM) applications, as defined in RFC 1421.
   The base64 encoding is adapted from RFC 1421, with one change: base64
   eliminates the "*" mechanism for embedded clear text.

   A 65-character subset of US-ASCII is used, enabling 6 bits to be
   represented per printable character. (The extra 65th character, "=",
   is used to signify a special processing function.)

      NOTE: This subset has the important property that it is
      represented identically in all versions of ISO 646, including US
      ASCII, and all characters in the subset are also represented
      identically in all versions of EBCDIC.  Other popular encodings,
      such as the encoding used by the uuencode utility and the base85
      encoding specified as part of Level 2 PostScript, do not share
      these properties, and thus do not fulfill the portability
      requirements a binary transport encoding for mail must meet.

   The encoding process represents 24-bit groups of input bits as output
   strings of 4 encoded characters. Proceeding from left to right, a
   24-bit input group is formed by concatenating 3 8-bit input groups.
   These 24 bits are then treated as 4 concatenated 6-bit groups, each
   of which is translated into a single digit in the base64 alphabet.
   When encoding a bit stream via the base64 encoding, the bit stream
   must be presumed to be ordered with the most-significant-bit first.
   That is, the first bit in the stream will be the high-order bit in
   the first byte, and the eighth bit will be the low-order bit in the
   first byte, and so on.

   Each 6-bit group is used as an index into an array of 64 printable
   characters. The character referenced by the index is placed in the
   output string. These characters, identified in Table 1, below, are
   selected so as to be universally representable, and the set excludes
   characters with particular significance to SMTP (e.g., ".", CR, LF)
   and to the encapsulation boundaries defined in this document (e.g.,
   "-").

                     Table 1: The Base64 Alphabet

      Value Encoding  Value Encoding  Value Encoding  Value Encoding
          0 A            17 R            34 i            51 z
          1 B            18 S            35 j            52 0
          2 C            19 T            36 k            53 1
          3 D            20 U            37 l            54 2
          4 E            21 V            38 m            55 3
          5 F            22 W            39 n            56 4
          6 G            23 X            40 o            57 5
          7 H            24 Y            41 p            58 6
          8 I            25 Z            42 q            59 7
          9 J            26 a            43 r            60 8
         10 K            27 b            44 s            61 9
         11 L            28 c            45 t            62 +
         12 M            29 d            46 u            63 /
         13 N            30 e            47 v
         14 O            31 f            48 w         (pad) =
         15 P            32 g            49 x
         16 Q            33 h            50 y

   The output stream (encoded bytes) must be represented in lines of no
   more than 76 characters each.  All line breaks or other characters
   not found in Table 1 must be ignored by decoding software.  In base64
   data, characters other than those in Table 1, line breaks, and other
   white space probably indicate a transmission error, about which a
   warning message or even a message rejection might be appropriate
   under some circumstances.

   Special processing is performed if fewer than 24 bits are available
   at the end of the data being encoded.  A full encoding quantum is
   always completed at the end of a body.  When fewer than 24 input bits
   are available in an input group, zero bits are added (on the right)
   to form an integral number of 6-bit groups.  Padding at the end of
   the data is performed using the '=' character.  Since all base64
   input is an integral number of octets, only the following cases can
   arise: (1) the final quantum of encoding input is an integral
   multiple of 24 bits; here, the final unit of encoded output will be
   an integral multiple of 4 characters with no "=" padding, (2) the
   final quantum of encoding input is exactly 8 bits; here, the final
   unit of encoded output will be two characters followed by two "="
   padding characters, or (3) the final quantum of encoding input is
   exactly 16 bits; here, the final unit of encoded output will be three
   characters followed by one "=" padding character.

   Because it is used only for padding at the end of the data, the
   occurrence of any '=' characters may be taken as evidence that the
   end of the data has been reached (without truncation in transit).  No
   such assurance is possible, however, when the number of octets
   transmitted was a multiple of three.

   Any characters outside of the base64 alphabet are to be ignored in
   base64-encoded data.  The same applies to any illegal sequence of
   characters in the base64 encoding, such as "====="

   Care must be taken to use the proper octets for line breaks if base64
   encoding is applied directly to text material that has not been
   converted to canonical form.  In particular, text line breaks must be
   converted into CRLF sequences prior to base64 encoding. The important
   thing to note is that this may be done directly by the encoder rather
   than in a prior canonicalization step in some implementations.

      NOTE: There is no need to worry about quoting apparent
      encapsulation boundaries within base64-encoded parts of multipart
      entities because no hyphen characters are used in the base64
      encoding.

*/

VOID inline
EncodeAtom (LPBYTE pbIn, WCHAR* pwszOut, UINT cbIn)
{
    static const WCHAR wszBase64[] = L"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                                     L"abcdefghijklmnopqrstuvwxyz"
                                     L"0123456789+/";

    Assert (pbIn);
    Assert (pwszOut);
    Assert (cbIn);

    // Set cbIn to 3 if it's greater than three: convenient for 'switch'
    //
    if (cbIn > 3)
        cbIn = 3;

    // The first output character depends only on the first input byte.
    //
    pwszOut[0] = wszBase64[pbIn[0] >> 2];
    switch(cbIn)
    {
        case 3:

            // End of stream has not been reached yet
            //
            pwszOut[1] = wszBase64[((pbIn[0] & 0x03) << 4) + (pbIn[1] >> 4)];
            pwszOut[2] = wszBase64[((pbIn[1] & 0x0F) << 2) + (pbIn[2] >> 6)];
            pwszOut[3] = wszBase64[pbIn[2] & 0x3F];
            return;

        case 2:

            // At the end of stream, 2 bytes left: pad with one '=' character
            //
            pwszOut[1] = wszBase64[((pbIn[0] & 0x03) << 4) + (pbIn[1] >> 4)];
            pwszOut[2] = wszBase64[ (pbIn[1] & 0x0F) << 2];
            pwszOut[3] = L'=';
            return;

        case 1:

            // At the end of stream, 1 byte left: pad with two '=' characters
            //
            pwszOut[1] = wszBase64[ (pbIn[0] & 0x03) << 4];
            pwszOut[2] = L'=';
            pwszOut[3] = L'=';
            return;

        default:

            // Should never happen
            //
            Assert (FALSE);
    }
}
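
// The bit manipulation in EncodeAtom can be tried outside this code base
// with the minimal sketch below. It is an illustration under assumptions,
// not part of the original file: EncodeAtomPortable is a hypothetical name,
// and it uses std::string and std::uint8_t in place of WCHAR and LPBYTE.
// Missing trailing bytes are treated as zero bits, as RFC 1521 describes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical portable counterpart of EncodeAtom: encodes 1 to 3 input
// bytes into exactly 4 base64 characters, padding with '=' as needed.
std::string EncodeAtomPortable(const std::uint8_t* in, std::size_t n)
{
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789+/";

    assert(in != nullptr && n >= 1);
    if (n > 3)
        n = 3;

    // Bytes past the end of the input contribute zero bits.
    const std::uint8_t b0 = in[0];
    const std::uint8_t b1 = (n > 1) ? in[1] : 0;
    const std::uint8_t b2 = (n > 2) ? in[2] : 0;

    // Start fully padded, then overwrite the positions that carry data.
    std::string out(4, '=');
    out[0] = kAlphabet[b0 >> 2];
    out[1] = kAlphabet[((b0 & 0x03) << 4) | (b1 >> 4)];
    if (n > 1)
        out[2] = kAlphabet[((b1 & 0x0F) << 2) | (b2 >> 6)];
    if (n > 2)
        out[3] = kAlphabet[b2 & 0x3F];
    return out;
}
```

// For the bytes 'M', 'a', 'n', passing n = 3, 2 and 1 exercises the three
// cases of the switch above: no padding, one '=', and two '='.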

//  ------------------------------------------------------------------------
//  EncodeBase64
//
//  Encode cbIn bytes of data from pbIn into the provided buffer
//  at pwszOut, up to cchOut chars (including the terminating NULL).
//
//$REVIEW: Shouldn't this function return some kind of error if
//$REVIEW: cchOut didn't have enough space for the entire output string?!!!
//
void
EncodeBase64 (LPBYTE pbIn, UINT cbIn, WCHAR* pwszOut, UINT cchOut)
{
    //  They must have passed us at least one char of space -- for the terminal NULL.
    //  Assert may be compiled out of non-debug builds, so also bail out at
    //  run time rather than write past the end of the buffer.
    //
    Assert (cchOut);
    if (!cchOut)
        return;

    //  Loop through, encoding atoms as we go...
    //
    while (cbIn)
    {
        //  NOTE: Yes, test for STRICTLY more than 4 WCHARs of space.
        //  We will use 4 WCHARs on this pass of the loop, and we always
        //  need one for the terminal NULL!
        //  If the space is not there, stop and return a truncated (but
        //  still terminated) string instead of overrunning the buffer.
        //
        Assert (cchOut > 4);
        if (cchOut <= 4)
            break;

        //  Encode the next three bytes of data into four chars of output string.
        //  (NOTE: This does handle the case where we have <3 bytes of data
        //  left to encode -- thus we pass in cbIn!)
        //
        EncodeAtom (pbIn, pwszOut, cbIn);

        //  Update our pointers and counters.
        //
        pbIn += min(cbIn, 3);
        pwszOut += 4;
        cchOut -= 4;
        cbIn -= min(cbIn, 3);
    }

    //  Ensure Termination
    //  (But first, check that we still have one WCHAR of space left
    //  for the terminal NULL!)
    //
    Assert (cchOut >= 1);
    *pwszOut = 0;
}
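
// The $REVIEW note above asks what happens when cchOut is too small. One
// way a caller could check up front is sketched below. Both function names
// are hypothetical and do not exist in this code base. The arithmetic
// follows from the loop above: 4 characters per started group of 3 input
// bytes, plus 1 for the terminating NULL.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of output characters EncodeBase64 would need
// for cbIn input bytes, including the terminating null character.
std::size_t CchNeededEncodeBase64Sketch(std::size_t cbIn)
{
    // Guard against overflow in the arithmetic below.
    assert(cbIn <= (static_cast<std::size_t>(-1) / 4) * 3 - 3);

    // Round up to whole 3-byte groups, 4 characters each, plus terminator.
    return ((cbIn + 2) / 3) * 4 + 1;
}

// Hypothetical pre-flight check a caller could run before encoding.
bool FitsEncodeBuffer(std::size_t cbIn, std::size_t cchOut)
{
    return cchOut >= CchNeededEncodeBase64Sketch(cbIn);
}
```

// For example, 3 input bytes need 5 characters and 4 input bytes need 9,
// so a 4-character buffer is too small even for a single atom.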

SCODE
ScDecodeBase64 (WCHAR* pwszIn, UINT cchIn, LPBYTE pbOut, UINT* pcbOut)
{
    //  Base64 Reverse alphabet.  Indexed by base 64 alphabet character
    //
    static const BYTE bEq = 254;
    static const BYTE rgbDict[128] = {

        255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,    // 0-F
        255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,    // 10-1F
        255,255,255,255,255,255,255,255,255,255,255, 62,255,255,255, 63,    // 20-2F
         52, 53, 54, 55, 56, 57, 58, 59, 60, 61,255,255,255,bEq,255,255,    // 30-3F
        255,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,    // 40-4F
         15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,255,255,255,255,255,    // 50-5F
        255, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,    // 60-6F
         41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,255,255,255,255,255     // 70-7F
    };

    SCODE sc = S_OK;

    UINT cchConsumed = 0;
    UINT cbProduced = 0;
    UINT cbFudge = 0;

    Assert (pbOut);
    Assert (pcbOut);

    //  Check that they didn't lie about the size of their buffer!
    //
    Assert (!IsBadWritePtr(pbOut, *pcbOut));

    //  Check that the size of the output buffer is adequate for
    //  decoded data.
    //
    Assert (*pcbOut >= CbNeededDecodeBase64(cchIn));
    Assert (pwszIn);

    //  Output is generated in 3-byte increments for 4 bytes of input
    //
    Assert ((cchIn*3)/4 <= *pcbOut);

    //  Go until there is nothing left to decode...
    //
    while (cchConsumed < cchIn)
    {
        Assert (cbProduced <= *pcbOut);

        BYTE rgb[4];
        UINT ib = 0;

        //  However, if there is not enough space to
        //  decode the next atom into, then this has
        //  got to be an error...
        //
        if (*pcbOut - cbProduced < 3)
        {
            sc = E_DAV_BASE64_ENCODING_ERROR;
            DebugTrace ("ScDecodeBase64: Not enough space to decode next base64 atom.\n");
            break;
        }

        //  The characters that do not fall into base 64 alphabet must be
        //  ignored, so let us assemble the 4 byte chunk of data that we
        //  will actually go with for the conversion
        //
        while ((cchConsumed < cchIn) &&
               (ib < 4))
        {
            //  If the symbol is in the alphabet ...
            //
            if ((pwszIn[cchConsumed] < sizeof(rgbDict)) &&
                (rgbDict[pwszIn[cchConsumed]] != 0xFF))
            {
                //  ... save the character off into the
                //  array
                //
                rgb[ib++] = rgbDict[pwszIn[cchConsumed]];
            }

            //  ... go for the next character in the line
            //
            cchConsumed++;
        }

        //  If there is no more data at all, then go
        //  away with no error, as up to that point
        //  we converted everything just fine, and
        //  the characters in the end were ignorable
        //
        if (0 == ib)
        {
            Assert(cchConsumed == cchIn);
            break;
        }
        else if ((4 != ib) || (0 != cbFudge))
        {
            //  Either there was some data to convert but not enough to
            //  fill the 4 byte buffer, so the data is incomplete and
            //  cannot be converted; or the end bEq markers were already
            //  seen in an earlier atom, and there must not be any data
            //  after the end. In both cases the input is invalid.
            //
            sc = E_DAV_BASE64_ENCODING_ERROR;
            DebugTrace ("ScDecodeBase64: Invalid base64 input encountered, data not complete, or extra data after padding: %ws\n", pwszIn);
            break;
        }

        //  Check that the characters 1 and 2 are not bEq
        //
        if ((rgb[0] == bEq) ||
            (rgb[1] == bEq))
        {
            sc = E_DAV_BASE64_ENCODING_ERROR;
            DebugTrace ("ScDecodeBase64: Invalid base64 input encountered, terminating '=' characters earlier than expected: %ws\n", pwszIn);
            break;
        }

        //  Check if the third character is bEq
        //
        if (rgb[2] == bEq)
        {
            rgb[2] = 0;
            cbFudge += 1;

            //  ... the fourth should be also bEq if the third was that way
            //
            if (rgb[3] != bEq)
            {
                sc = E_DAV_BASE64_ENCODING_ERROR;
                DebugTrace ("ScDecodeBase64: Invalid base64 input encountered, terminating '=' characters earlier than expected: %ws\n", pwszIn);
                break;
            }
        }

        //  Check if the fourth character is bEq
        //
        if (rgb[3] == bEq)
        {
            rgb[3] = 0;
            cbFudge += 1;
        }

        //  Make sure that these are well formed 6bit characters.
        //
        Assert((rgb[0] & 0x3f) == rgb[0]);
        Assert((rgb[1] & 0x3f) == rgb[1]);
        Assert((rgb[2] & 0x3f) == rgb[2]);
        Assert((rgb[3] & 0x3f) == rgb[3]);

        //  Ok, we now have 4 6bit characters making up the 3 bytes of output.
        //
        //  Assemble them together to make a 3 byte word.
        //
        DWORD dwValue = (rgb[0] << 18) +
                        (rgb[1] << 12) +
                        (rgb[2] << 6) +
                        (rgb[3]);

        //  This addition had better not have wrapped.
        //
        Assert ((dwValue & 0xff000000) == 0);

        //  Copy over the 3 bytes into the output stream.
        //
        pbOut[0] = (BYTE)((dwValue & 0x00ff0000) >> 16);
        Assert(pbOut[0] == (rgb[0] << 2) + (rgb[1] >> 4));
        pbOut[1] = (BYTE)((dwValue & 0x0000ff00) >>  8);
        Assert(pbOut[1] == ((rgb[1] & 0xf) << 4) + (rgb[2] >> 2));
        pbOut[2] = (BYTE)((dwValue & 0x000000ff) >>  0);
        Assert(pbOut[2] == ((rgb[2] & 0x3) << 6) + rgb[3]);
        cbProduced += 3;
        pbOut += 3;

        //  If cbFudge is non-zero, there were "=" signs at the end of
        //  the input, so the 3 bytes just written overcount the real
        //  data.
        //
        //  cbFudge counts padded 6 bit chunks, but it can only take the
        //  values 0, 1 or 2, and each padded chunk corresponds to exactly
        //  one missing output byte. So the number of bytes actually
        //  produced is smaller by cbFudge:
        //
        //      cbFudge = 1: uuuuuu is padding, which can happen only
        //                   when the zzzzzzzz byte was absent
        //      cbFudge = 2: zzzzzz uuuuuu are padding, which can happen
        //                   only when the yyyyyyyy and zzzzzzzz bytes
        //                   were absent
        //
        //      xxxxxx yyyyyy zzzzzz uuuuuu     <- 6 bit chunks
        //      xxxxxx xxyyyy yyyyzz zzzzzz     <- 8 bit chunks
        //
        if (cbFudge)
        {
            Assert ((cbFudge < 3) && (cbFudge < cbProduced));
            cbProduced -= cbFudge;
            pbOut -= cbFudge;
        }
    }

    //  Tell the caller the actual size...
    //
    *pcbOut = cbProduced;
    return sc;
}
  365. }