Source code of Windows XP (NT5)

9/2/1998 JosephJ
As part of implementing promiscuous-mode multicast, we need to make some changes
to the MARS server.
Some observations:
* RFC 2022 has special support for routers wanting to monitor blocks of
NONOVERLAPPING addresses. It handles these blocks somewhat independently
of individual group joins: a block can overlap one or more individual
group joins. When a member leaves the block, the MARS server keeps that
member's individual group joins.
Changes/fixes required:
1. MarsPunchHoles needs to be rewritten to change its time complexity from
O(number of addresses in block) to
O(number of MCS address ranges + number of individual groups).
Currently, it runs through all 268 million possible addresses
when processing a join of the entire class-D address space!
The fix is to jump to the end of the hole (if in a hole) or to the
beginning of the next hole (if not in a hole).
Also, it was enumerating the list of groups incorrectly -- using
pGroup->Next instead of pGroup->NextGroup.
2. MarsAddClusterMemberTogroups:
When processing a join of the entire class-D space, it needs to check whether
the member has already joined the entire space; if so, it
should treat the join request as a duplicate (send it back on the private
VC). Currently it treats it as a new entry and hence sends
it on the cluster-control VC, so we end up with duplicate entries.
MarsDelClusterMemberFromGroups:
Similarly, when processing a leave of the entire class-D space
when the member has in fact already left, we must make sure
that we reflect it privately.
3. MARS_MULTI processing: MarsHandleRequest had to be modified to
also return the hw addresses of members monitoring the entire class-D
address space (while making sure that it doesn't send duplicate
hw addresses if a member is also monitoring the specific
group address).
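The hole-skipping fix in item 1 can be sketched as follows. This is a minimal C illustration, not the driver's MarsPunchHoles: it assumes the holes (MCS-served ranges and individually joined groups) are available as a sorted, non-overlapping array of inclusive ranges, and the names `addr_range` and `punch_holes` are hypothetical. The cursor jumps past each hole instead of stepping one address at a time, so the work is proportional to the number of holes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive range [lo, hi] of IPv4 addresses (host byte order). */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} addr_range;

/*
 * Writes to out[] the parts of `block` not covered by any hole and returns
 * how many ranges were written (at most nholes + 1). holes[] must be sorted
 * by lo and non-overlapping. Each hole is visited once: if the cursor is not
 * in a hole, the gap up to the next hole is emitted; then the cursor jumps
 * to just past the end of that hole.
 */
size_t punch_holes(addr_range block, const addr_range *holes, size_t nholes,
                   addr_range *out)
{
    size_t nout = 0;
    uint32_t cursor = block.lo;  /* first address not yet classified */
    int done = 0;                /* set when a hole reaches the end of block */

    for (size_t i = 0; i < nholes && !done; i++) {
        addr_range h = holes[i];
        if (h.hi < cursor)
            continue;            /* hole lies entirely behind the cursor */
        if (h.lo > block.hi)
            break;               /* this and later holes lie past the block */
        if (h.lo > cursor) {     /* not in a hole: emit the gap before it */
            out[nout].lo = cursor;
            out[nout].hi = h.lo - 1;
            nout++;
        }
        if (h.hi >= block.hi)
            done = 1;            /* hole covers the rest of the block */
        else
            cursor = h.hi + 1;   /* jump past the end of the hole */
    }
    if (!done) {                 /* tail after the last hole */
        out[nout].lo = cursor;
        out[nout].hi = block.hi;
        nout++;
    }
    return nout;
}
```

For the full class-D block (0xE0000000 to 0xEFFFFFFF) with two holes, the loop runs at most twice and returns at most three ranges, rather than visiting 2^28 individual addresses.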
9/14/1998 JosephJ
There are reports of bugchecks during boot. One is bug# 212412
(ATMARPS: MarsSendRedirect referenced past end of pool). The stack was:
f7cfbcdc f276ba36 00000000 fa502000 00000000 ntoskrnl!KiTrap0E+0xc3
f7cfbd80 f276cc04 80a11828 f92cff18 00000000 atmarps!MarsSendRedirect+0xbe
MarsSendRedirect is a timer callback function. It creates and sends
a redirect packet containing all the configured and registered addresses.
However, it doesn't claim any lock, so it's possible for it to be in the
middle of doing this when registered addresses are added (they are
added in ArpSCoRequestComplete callbacks).
So we fix this by locking the interface when reading/modifying
these addresses: in MarsSendRedirect and in ArpSCoRequestComplete. Checked
in today.
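A minimal sketch of the locking discipline described above, assuming a simple spin lock (a C11 `atomic_flag`) in place of the driver's IF lock and a fixed-size array of registered addresses; `intf`, `intf_add_registered` and `intf_snapshot_registered` are hypothetical names. The point is that the timer path copies both the count and the addresses while holding the same lock that the completion path takes when appending.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define MAX_ADDRS 8   /* hypothetical capacity */
#define ADDR_LEN 20   /* ATM NSAP-format addresses are 20 bytes */

/* Hypothetical interface object. The atomic_flag spin lock stands in for
 * the driver's IF lock; the real lock type is not shown in these notes. */
typedef struct {
    atomic_flag lock;
    size_t num_regd;
    unsigned char regd[MAX_ADDRS][ADDR_LEN];
} intf;

static void if_lock(intf *ifp)
{
    while (atomic_flag_test_and_set_explicit(&ifp->lock, memory_order_acquire))
        ;  /* spin until the holder releases the lock */
}

static void if_unlock(intf *ifp)
{
    atomic_flag_clear_explicit(&ifp->lock, memory_order_release);
}

void intf_init(intf *ifp)
{
    atomic_flag_clear(&ifp->lock);
    ifp->num_regd = 0;
}

/* Writer side, cf. ArpSCoRequestComplete: append a registered address. */
int intf_add_registered(intf *ifp, const unsigned char addr[ADDR_LEN])
{
    int added = 0;
    if_lock(ifp);
    if (ifp->num_regd < MAX_ADDRS) {
        memcpy(ifp->regd[ifp->num_regd], addr, ADDR_LEN);
        ifp->num_regd++;
        added = 1;
    }
    if_unlock(ifp);
    return added;
}

/* Reader side, cf. MarsSendRedirect: copy the count and the contents under
 * the same lock, so a packet never pairs a new count with stale entries. */
size_t intf_snapshot_registered(intf *ifp, unsigned char out[][ADDR_LEN])
{
    if_lock(ifp);
    size_t n = ifp->num_regd;
    memcpy(out, ifp->regd, n * ADDR_LEN);
    if_unlock(ifp);
    return n;
}
```

The redirect packet would then be built from the snapshot after the lock is dropped, which keeps the time spent holding the lock short.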
10/24/1998 JosephJ
Added statistics -- created ARP_SERVER_STATISTICS and MARS_SERVER_STATISTICS
(iocto.h, ioctl.c; also added functionality to atmarp.exe to dump this
stuff).
10/25/1998 JosephJ
Added support to actually fill in the statistics: ndis.c, mars.c, arps.c, etc.
Some clarifications...
Mcast joins: "failed" doesn't include duplicates.
Mcast requests: MCMesh requests include responses to those nodes that are in
promiscuous mode.
Discarded pkts: usually means pkts discarded due to an error/unsupported format or
because of a resource allocation failure. Also includes pkts discarded because
they are not from a cluster member. If the discarded pkt is known to be for a
particular task, say a join, then the count of failed joins is also incremented.
We do not explicitly keep track of failed MARS or ARP requests, which are
ignored (neither acked nor nacked) -- so if you see the ARP request count
increment but ack+nak not increment, you can conclude that the request
is being ignored.
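As a sketch of the last point, assuming a hypothetical three-counter subset of ARP_SERVER_STATISTICS (the real field names are not given here), the number of ignored requests between two snapshots can be derived from the deltas. Unsigned subtraction keeps each delta correct even if a 32-bit counter wraps once between snapshots.

```c
#include <stdint.h>

/* Hypothetical subset of ARP server counters. */
typedef struct {
    uint32_t requests;  /* ARP requests received */
    uint32_t acks;      /* requests answered with a reply */
    uint32_t naks;      /* requests answered with a NAK */
} arp_stats;

/* Requests that arrived between the two snapshots but were neither acked
 * nor nacked, i.e. were ignored. */
uint32_t arp_ignored_between(const arp_stats *before, const arp_stats *after)
{
    uint32_t reqs = after->requests - before->requests;
    uint32_t answered = (after->acks - before->acks) +
                        (after->naks - before->naks);
    return reqs > answered ? reqs - answered : 0;
}
```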
10/25/1998 JosephJ
While implementing the above, found the following existing bugs:
MarsReqThread -- if it gets a pkt with an unrecognized opcode, it simply
leaves it in limbo -- it should put it back on the free list! Otherwise
we'll soon run out of pkts if we get barraged with bad pkts.
MarsDelClusterMemberFromGroups: it should
(a) delete a MARS entry when its list of members goes to NULL -- deletion
consists of removing it from the hash table and calling ArpSFreeBlock on it.
(b) actually free the pGroup, calling FREE_MEM.
Checked in 10/28/1998.
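The MarsReqThread fix amounts to making sure every dispatch path, including the default case, returns the packet to the free list. A minimal sketch with a hypothetical singly linked free list and made-up opcode values (`pkt`, `req_queue` and `dispatch_pkt` are illustrative names, not the driver's):

```c
#include <stddef.h>

/* Hypothetical opcodes; the real MARS opcode values are not listed here. */
enum { OP_JOIN = 1, OP_LEAVE = 2, OP_REQUEST = 3 };

typedef struct pkt {
    struct pkt *next;
    int opcode;
} pkt;

typedef struct {
    pkt *free_list;
    unsigned handled;
    unsigned discarded;
} req_queue;

static void return_to_free_list(req_queue *q, pkt *p)
{
    p->next = q->free_list;
    q->free_list = p;
}

void dispatch_pkt(req_queue *q, pkt *p)
{
    switch (p->opcode) {
    case OP_JOIN:
    case OP_LEAVE:
    case OP_REQUEST:
        q->handled++;                /* a real handler would process p first */
        return_to_free_list(q, p);
        break;
    default:
        q->discarded++;              /* unrecognized opcode */
        return_to_free_list(q, p);   /* the fix: do not leave p in limbo */
        break;
    }
}
```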
12/22/1998
Made a change to the following code in ArpSCoRequestComplete:
[Old code]
    pIntF->NumAddressesRegd ++;
    if (pIntF->NumAddressesRegd < pIntF->NumRegdAddresses)
    {
        ...
    }
[New code]
We only increment NumAddressesRegd if
(pIntF->NumAddressesRegd < pIntF->NumRegdAddresses).
We assert otherwise. See the 12/22/1998 note in
ArpSCoRequestComplete for details.
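A sketch of the new guard, using only the two fields named above in a hypothetical struct. The old code incremented first and compared afterwards; the new code compares first and increments only when there is room. To keep the sketch testable it returns false where the driver would assert.

```c
#include <stdbool.h>

/* Hypothetical subset of the interface fields named in the note. */
typedef struct {
    unsigned NumAddressesRegd;   /* addresses registered so far */
    unsigned NumRegdAddresses;   /* addresses we intend to register */
} regd_counts;

/* Count one completed registration, never going past the limit. */
bool note_address_registered(regd_counts *c)
{
    if (c->NumAddressesRegd < c->NumRegdAddresses) {
        c->NumAddressesRegd++;
        return true;
    }
    return false;  /* the driver ASSERTs here instead */
}
```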
2/26/1999 JosephJ
Bug: #297656 ATMARPS: Unbinding arp/mars on server A does not unregister
the well-known address from server A.
Fix:
I unregister the registered addresses in shutdowninterface and block until all
the unregistrations complete. I decided to block because the completions will
likely be asynchronous and we immediately go on to close the AF and then
deallocate the object, so I was not comfortable not waiting for the
completions of the unregistrations.
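Blocking until a batch of asynchronous operations completes is often done with a counter biased by one, sketched below under stated assumptions: C11 atomics stand in for the driver's interlocked operations, a flag stands in for the event the driver would wait on, and `unreg_tracker`, `unreg_begin`, `unreg_complete` and `unreg_wait_all` are hypothetical names. The notes do not say how the driver actually tracks outstanding unregistrations.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical tracker; `done` stands in for the event the driver waits on. */
typedef struct {
    atomic_int pending;
    atomic_bool done;
} unreg_tracker;

static void unreg_release(unreg_tracker *t)
{
    /* Whoever takes the count from 1 to 0 "signals the event". */
    if (atomic_fetch_sub(&t->pending, 1) == 1)
        atomic_store(&t->done, true);
}

/* Call before issuing naddrs unregistrations. The extra 1 is a bias held
 * by the issuing thread, so completions that arrive while requests are
 * still being issued cannot signal "all done" too early. */
void unreg_begin(unreg_tracker *t, int naddrs)
{
    atomic_init(&t->done, false);
    atomic_init(&t->pending, naddrs + 1);
}

/* Call from each unregistration completion (possibly on another thread). */
void unreg_complete(unreg_tracker *t)
{
    unreg_release(t);
}

/* Call after all unregistrations have been issued: drop the bias, then
 * wait. A driver would block on an event here; this sketch spins. */
void unreg_wait_all(unreg_tracker *t)
{
    unreg_release(t);
    while (!atomic_load(&t->done))
        ;
}
```

Only after `unreg_wait_all` returns is it safe to close the AF and free the object, which is the ordering the fix above relies on.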
2/29/1999 JosephJ
Stress hit the following assert with checked atmarpc.sys, 1990:
Assert NdisPartyHandle != NULL failed: file mars.c, line 3247
The assert fired because NdisPartyHandle was NULL in the context of the
drop-party handler.
The crucial observation is the following two identical lines of debug spew before
the assert:
0:MARS: AddMemberToCC: pVc 868178c8, pMember 87736808, ConnState 2
0:MARS: AddMemberToCC: pVc 868178c8, pMember 87736808, ConnState 2
The 2nd call to NdisAddParty (from MarsAddMemberToClusterControlVc) clobbers
the previous value of pMember->NdisPartyHandle on entry and then probably fails
(because there's already a party).
Looking for ways that MarsAddMemberToClusterControlVc could be called twice
(which is not supposed to happen), the only way I can see this happening,
consistent with the debug spew, is that an incoming registration-join from a
new member (MarsHandleJoin) came in at the same time that we were handling
post-processing of the initial CC MakeCall completion
(ArpSMakeCallComplete, ndis.c line 1195).
There are holes in the way VC and member flags are set and checked that would
lead to MarsAddMemberToClusterControlVc being called from both code paths.
(The enumeration at line 1185 of ArpSMakeCallComplete is also dangerous in
that it could potentially dereference a freed pMember, but that's not what
happened here.)
The following functions have a similarly dangerous enumeration/assumption that
pMember will be valid:
MarsAbortAllMembers
ArpSDropPartyComplete
MarsDelMemberFromClusterControlVc
FIXES to all of the above:
1. Added a new function, MarsIsValidClusterMember, that makes sure a particular
member is in the cluster list.
2. This function is called from:
MarsDelMemberFromClusterControlVc (which simply returns if the member is invalid).
ArpSMakeCallComplete (which stops enumeration if a pMember is invalid).
3. MarsAbortAllMembers was fixed to do a safe enumeration of all members.
4. MarsAddMemberToClusterControlVc returns without doing anything
if (MARS_GET_CM_CONN_STATE(pMember) != CM_CONN_IDLE) (the check is made
AFTER getting the IF lock).
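Fixes 1 and 4 can be sketched as follows. The list layout, the extra connection states and the function names are hypothetical; only CM_CONN_IDLE comes from the notes, and the IF lock is assumed to be held by callers rather than shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* CM_CONN_IDLE is named in the notes; the other states are hypothetical. */
typedef enum { CM_CONN_IDLE, CM_CONN_SETUP_IN_PROGRESS, CM_CONN_ACTIVE } cm_conn_state;

typedef struct member {
    struct member *next;
    cm_conn_state conn_state;
    int add_party_calls;   /* counts simulated NdisAddParty calls */
} member;

typedef struct {
    member *members;       /* the cluster member list */
} cluster;

/* cf. MarsIsValidClusterMember: is p still on the cluster list? Callers are
 * assumed to hold the IF lock so the list cannot change during the walk. */
bool is_valid_cluster_member(const cluster *c, const member *p)
{
    for (const member *m = c->members; m != NULL; m = m->next)
        if (m == p)
            return true;
    return false;
}

/* cf. fix 4: add a party only from the idle state. The check and the state
 * change happen together (under the IF lock in the driver), so a second
 * caller racing in from another code path does nothing. */
bool add_member_to_cc(member *p)
{
    if (p->conn_state != CM_CONN_IDLE)
        return false;
    p->conn_state = CM_CONN_SETUP_IN_PROGRESS;
    p->add_party_calls++;  /* stands in for NdisAddParty */
    return true;
}
```

With this guard, the duplicate "AddMemberToCC" sequence from the debug spew above would result in a single add-party call rather than a clobbered party handle.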
3/3/1999 JosephJ Revisiting MarsAbortAllMembers and DelMembersfromVc
-- safe enumeration used.
3/3/1999 JosephJ CM_INVALID use
-- fix del member on add-party complete, but be careful on make-call complete
(maybe OK if there are no other members at this time).
3/16/1999 JosephJ Actually checked in the following files...
ioctl.c v13 fix uninit var
mars.c v12 299201 -- various robustness-re
mars.h v2 299201 -- various robustness-re
ndis.c v16 299201 -- various robustness-re
protos.h v9 299201 -- various robustness-re
All of the above are the fixes explained in the 2/29 and 3/3 entries above.
registry.c v5 ArpSReadAdapterConfig... p -- p
(make sure proper default values are in place in case the
corresponding call to read registry values fails).
There are two cases:
- pConfig->RegAddresses -- this is benign in all cases (because
pConfig->NumAllocedRegdAddresses is set to 0 on failure),
EXCEPT in ArpSReadAdapterConfiguration, where
PrevRegAddresses is freed if it is non-NULL.
Fix is to initialize pConfig->RegAddresses to NULL before
calling NdisReadConfiguration.
- pConfig->pMcsList -- this is not benign.
Fix is to initialize it to NULL before calling NdisReadConfiguration.
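The registry fix is a general pattern: give every output a safe default before a call that may fail without writing it, so cleanup code can free it unconditionally. A minimal sketch, in which `read_value` is a hypothetical stand-in for a registry read that leaves its output untouched on failure and `adapter_config` is a hypothetical two-field subset of the configuration:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of the adapter configuration. */
typedef struct {
    char *reg_addresses;   /* cf. pConfig->RegAddresses */
    char *mcs_list;        /* cf. pConfig->pMcsList */
} adapter_config;

/* Hypothetical registry read: on failure it returns -1 and does NOT write
 * *out, which is what made uninitialized outputs dangerous. */
static int read_value(const char *src, char **out)
{
    if (src == NULL)
        return -1;                     /* simulated "value not present" */
    size_t len = strlen(src) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, src, len);
    *out = copy;
    return 0;
}

void read_adapter_config(adapter_config *cfg, const char *addrs_src,
                         const char *mcs_src)
{
    /* The fix: set safe defaults BEFORE the reads, so a failed read leaves
     * NULL rather than whatever garbage was in the structure. */
    cfg->reg_addresses = NULL;
    cfg->mcs_list = NULL;
    (void)read_value(addrs_src, &cfg->reg_addresses);
    (void)read_value(mcs_src, &cfg->mcs_list);
}

/* Safe in every case, because free(NULL) is a no-op. */
void free_adapter_config(adapter_config *cfg)
{
    free(cfg->reg_addresses);
    free(cfg->mcs_list);
    cfg->reg_addresses = NULL;
    cfg->mcs_list = NULL;
}
```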
04/20/1999 JosephJ Fix for 327626 Need Rogue ARP Server detection on ARP
registration.
1st version of fix:
BEFORE registering an address, we try to make a call to the address. If it
succeeds, we log an event in the event log and fail the initialization.
2nd version of fix: keep the call open -- if we get an incoming close, we
then try again.
Note: if two nodes are doing this at the same time, they
may still both come up, i.e., rogue detection will fail.
05/05/1999 JosephJ 331517 - bugchecks due to wrong VC PacketSize
The bugcheck was triggered by the fact that the MARS server got several incoming VCs
with very large max packet sizes (they should all be 9188, but we saw 64008 for
several and 18200 for one).
Fix (in MarsHandleRequest (mars.c)) is to replace SHORT and USHORT local variables
with ULONG.
05/05/1999 JosephJ rogue ARP server detection contd...
ArpSCoAfRegisterNotify -> ArpSOpenAfComplete ->
ArpSRegisterSap -> ArpSRegisterSapComplete
ArpSBindAdapter -> ArpSReadAdapterConfiguration -> ArpSQueryAndSetAddresses
ArpSCoRequestComplete
Configured address (pIntF->ConfiguredAddress) filled by GET OID_CO_GET_ADDRESSES
Registered addresses (pIntF->RegAddresses[])
Fri 05/14/1999 JosephJ Rogue ARP server detection contd.
Things are kicked off on getting an OID_CO_ADDRESS_CHANGE notification.
We (as before) query the adapter for the configured address.
On completion of this (OID_CO_GET_ADDRESSES), we start the process of validating
and registering all addresses by calling ArpSValidateAndSetRegdAddresses.
ArpSValidateAndSetRegdAddresses allocates and initializes pIntF->pRegAddrCtxt
(which keeps all context associated with the validation and registration of
addresses). A reference is added to pIntF for pRegAddrCtxt.
The function then calls ArpSValidateOneRegdAddress.
ArpSValidateOneRegdAddress attempts to initiate the validation & registration
of a single address. If there are no addresses left to be processed, it
unlinks pIntF->pRegAddrCtxt (derefs pIntF and deallocates pRegAddrCtxt).
"Validation" consists of making a point-to-point call to the address,
using the same call params as the atmarp client uses. If the call fails,
the address is considered "validated". The protocol's context for the VC
is pRegAddrCtxt itself.
The next stage happens in the make-call complete handler
(ArpSMakeRegAddrCallComplete).
ArpSMakeRegAddrCallComplete:
- On a successful make call (which is a failed validation),
  it immediately closes the call. The close-call handler
  (ArpSCloseRegAddrCallComplete) deletes the VC, increments
  pRegAddrCtxt->RegAddrIndex and calls
  ArpSValidateOneRegdAddress (to initiate the validation & registration
  of the NEXT address, if any).
- On a failed make call (which is a successful validation), it deletes the
  VC and initiates registration of the address (calls NdisRequest with
  OID_CO_ADD_ADDRESS). On completion of the OID_CO_ADD_ADDRESS
  (ArpSCoRequestComplete), we do the following:
  - on success, copy the address to
    pIntF->RegAddresses[pIntF->NumAddressesRegd] and
    increment pIntF->NumAddressesRegd;
  - on failure or success, increment pRegAddrCtxt->RegAddrIndex
    and call ArpSValidateOneRegdAddress (to initiate the validation &
    registration of the NEXT address, if any).
A note on the use of pIntF->RegAddresses[...]
This array is initialized with all the user-specified addresses read from
the registry. As validation proceeds, however, the successfully validated
and registered addresses are copied sequentially into this array. If
*all* the addresses are successfully validated and registered, then the
final values in the array are the same as the initial values. However, if
some intermediate addresses are not validated or registered successfully,
then the end result will be different. In all cases, the first
pIntF->NumAddressesRegd entries will contain the registered addresses.
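The validate-then-register chain and the in-place compaction of RegAddresses[] can be simulated synchronously as below. This is a sketch under assumptions: ints stand in for ATM addresses, per-address outcomes come from two hypothetical arrays instead of real calls and NDIS requests, and each step calls the next one directly, whereas in the driver each step runs from an asynchronous completion handler. Because the write index (`num_regd`) never exceeds the read index, copying in place never overwrites an address that has not been processed yet.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REG 8   /* hypothetical capacity */

typedef struct {
    int addrs[MAX_REG];         /* cf. pIntF->RegAddresses[] (ints stand in for addresses) */
    size_t num_alloced;         /* addresses read from the registry */
    size_t num_regd;            /* cf. pIntF->NumAddressesRegd */
    const bool *call_succeeds;  /* simulated: validation call answered (rogue server) */
    const bool *add_succeeds;   /* simulated: OID_CO_ADD_ADDRESS result */
} reg_ctx;

static void validate_one(reg_ctx *c, size_t index);

/* cf. ArpSCoRequestComplete for OID_CO_ADD_ADDRESS. */
static void add_address_complete(reg_ctx *c, size_t index, bool ok)
{
    if (ok)  /* compact: num_regd <= index, so this copy is safe */
        c->addrs[c->num_regd++] = c->addrs[index];
    validate_one(c, index + 1);  /* on success or failure, move on */
}

/* cf. ArpSMakeRegAddrCallComplete. */
static void make_call_complete(reg_ctx *c, size_t index, bool call_ok)
{
    if (call_ok)
        validate_one(c, index + 1);  /* someone answered: skip this address */
    else
        add_address_complete(c, index, c->add_succeeds[index]);  /* validated */
}

/* cf. ArpSValidateOneRegdAddress. */
static void validate_one(reg_ctx *c, size_t index)
{
    if (index >= c->num_alloced)
        return;  /* nothing left: the driver frees pRegAddrCtxt here */
    make_call_complete(c, index, c->call_succeeds[index]);
}

/* cf. ArpSValidateAndSetRegdAddresses. */
void validate_and_set_regd_addresses(reg_ctx *c)
{
    c->num_regd = 0;
    validate_one(c, 0);
}
```

For example, with addresses {10, 20, 30, 40}, where 20 is answered by another server and the registration of 30 fails, the run leaves num_regd at 2 with 10 and 40 in the first two slots.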
09/30/1999 JosephJ Bug 405851
*RC3SS: ATMARPS: Bugcheck unloading atmarps during shutdown
The basic problem is that ArpSIfList contains a pointer to a just-freed pIF.
The biggest problem I've found is that the INTF_CLOSING state is set only in the
ArpSCloseAdapterComplete handler. This means that ArpSStopInterface (which
calls ArpSReferenceIntF, which *does* check the INTF_CLOSING flag) can be
called multiple times for the same adapter. ArpSReferenceIntF is also called
from some other places. ArpSStopInterface is called from ArpSShutDown
(called from ARP's unload handler as WELL as when handling IRP_MJ_SHUTDOWN),
ArpSCoRequest (AF closing) and ArpSUnbindAdapter (unbind adapter handler).
ArpSStopInterface is NOT idempotent (it expects to be called only once per pIF),
but given the above, it CAN be called multiple times per IF (because
INTF_CLOSING is only set on close-adapter completion).
One specific problem with this: it assumes it can use pIntF->CleanupEvent.
Another problem:
ArpSShutDown goes through each IF in ArpSIfList, refs it (which would fail
if the IF is INTF_CLOSING), releases the list lock, then calls ArpSStopInterface.
It is definitely flawed in the way it uses pIF->Next -- pIF->Next could well be
gone by the time it gets to it. HOWEVER, that's not the problem we're seeing
-- we're seeing the case of the ArpSIfList ITSELF being corrupted.
FIXES:
1. arps.c: ArpSShutdown now refs the pNext pointer -- a little bit of intricate
code that makes sure pNext is still around when we need it.
2. We use the INTF_ADAPTER_OPENED flag, set on successful completion of
open adapter (in ArpSOpenAdapterComplete), to make sure
that NdisCloseAdapter is called only once. NdisCloseAdapter was called
in a bunch of places -- now we call ArpSTryCloseAdapter, which checks
the INTF_ADAPTER_OPENED flag first.
3. ArpSStopInterface now doesn't clobber pIntF->CleanupEvent -- if it's
non-NULL, it simply doesn't wait.
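Fix 2 can be sketched with an atomic test-and-clear of the flag, so that only the caller that actually clears INTF_ADAPTER_OPENED goes on to close the adapter. The driver checks the flag under its own lock; the C11 atomics and the flag value below are assumptions, and `try_close_adapter` is a hypothetical stand-in for ArpSTryCloseAdapter.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define INTF_ADAPTER_OPENED 0x1u  /* flag name from the notes; value hypothetical */

typedef struct {
    atomic_uint flags;
    int close_calls;              /* counts simulated NdisCloseAdapter calls */
} intf_t;

/* cf. ArpSOpenAdapterComplete: record a successful open. */
void open_adapter_complete(intf_t *ifp, bool success)
{
    if (success)
        atomic_fetch_or(&ifp->flags, INTF_ADAPTER_OPENED);
}

/* cf. ArpSTryCloseAdapter: clear the flag atomically and close only if this
 * caller was the one that cleared it, so the close happens at most once. */
bool try_close_adapter(intf_t *ifp)
{
    unsigned old = atomic_fetch_and(&ifp->flags, ~INTF_ADAPTER_OPENED);
    if (!(old & INTF_ADAPTER_OPENED))
        return false;             /* never opened, or already closed */
    ifp->close_calls++;           /* stands in for NdisCloseAdapter */
    return true;
}
```

Every path that used to call NdisCloseAdapter directly can call this instead, whatever order shutdown, unbind and AF-close arrive in.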
10/07/1999 JosephJ fix for 412018
*RC3SS: ATMARPS bugchecks on unloading if no call manager bound to adapter.
On unloading, we were trying to deregister addresses that had never
been registered (in fact, the AF was never opened). Fix is in
DeregisterAllAddresses (ndis.c): it used to check that
pIntF->NumAllocedRegdAddresses is nonzero; now it checks that
NumAddressesRegd is nonzero.
01/06/2000 JosephJ fix for 416301 corrupt ArpSIfList
ArpSOpenAfComplete (ndis.c): we were setting a pIntF flag without holding the IF
lock (we only held the IF list lock). This could corrupt the other
bits in the flag field, and is most likely the cause of the problem.
Also got rid of the INTF_IN_LIST field, which is not required.
ArpSDereferenceIntF (arps.c): Arvindm added code to make sure two threads don't
try to deref the interface to zero at the same time. This is not likely to
be the cause of the bug (the bug is that the list is corrupted, not that
an IF entry was deallocated prematurely or twice) but is a hole that needs to
be fixed.
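Both problems above come from read-modify-write sequences that were not atomic. A minimal sketch of the two safe forms, assuming C11 atomics in place of the IF lock (for the flags) and in place of whatever synchronization ArpSDereferenceIntF actually uses (for the reference count); `intf_obj`, `intf_set_flag` and `intf_deref` are hypothetical names, and this is not the code Arvindm added.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_uint flags;
    atomic_int refs;
} intf_obj;

/* A plain "flags |= bit" reads, modifies and writes back the whole word; two
 * unsynchronized writers can each write back a stale word and wipe out the
 * other's bit. Holding the IF lock prevents that; so does an atomic OR. */
void intf_set_flag(intf_obj *ifp, unsigned bit)
{
    atomic_fetch_or(&ifp->flags, bit);
}

/* Returns true for exactly one caller: the one whose decrement takes the
 * count from 1 to 0 and who must therefore free the object. */
bool intf_deref(intf_obj *ifp)
{
    return atomic_fetch_sub(&ifp->refs, 1) == 1;
}
```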
03/30/2000 JosephJ Hit an assert in arps.c when the close-call handler for
the validation make call (ArpSCloseRegAddrCallComplete) is called during
shutdown -- we've nuked pIntF->NumAllocedRegdAddresses by this time as
part of shutdown, so the following assert in the function fails:
    ASSERT(pRegAddrCtxt->RegAddrIndex < pIntF->NumAllocedRegdAddresses);
Fix is to bracket this code with if (!(pIntF->Flags & INTF_STOPPING)).
04/18/2000 JosephJ
Removed the assert from the code below in ArpSValidateAndSetRegdAddresses:
    if (pIntF->pRegAddrCtxt != NULL)
    {
        ASSERT(FALSE);
        break;
    }
This could happen if we get an OID_CO_ADDRESS_CHANGE when we are
either processing an earlier one or are in the process of
initializing. We get this case (and hit the assert) during PnP stress
(1c_reset script against an Olicom 616X) -- Whistler bug#102805.