Unified Communications


96x1's needing re-login after CM interchange

  • 1.  96x1's needing re-login after CM interchange

    Posted 01-28-2020 12:50 AM

    Hi all, our CM 8.1 interchanged today (still looking into the why of that) and about 5% of our 96x1 H.323 phones got logged out and had to be re-logged in, i.e. extension and passcode. The other 95% of phones popped right back up in less than a minute, as expected.


    Any thoughts on why the re-login of these few sets? Thanks!!!!



  • 2.  RE: 96x1's needing re-login after CM interchange

    Posted 01-29-2020 08:28 AM
    Hi Chip,

    In using CM 8.1 we have seen this a few times, where the phone did not save the last login information. We do allow the extension and ID to be saved on phones where they are not in a hot desk environment. One other note, we are using all SIP phones, so I am not sure if it is only related to H.323.

    Kind Regards,
    Richard

    Richard C. Browne, Jr.
    Senior Telecommunications Engineer
    MAP Communications, Inc.
    840 Greenbrier Circle
    Suite 200
    Chesapeake, VA, 23320
    Phone: +1 800-955-9888 / +1 757-424-1191
    Facsimile: +1 757-578-4963
    Email: rbrowne@mapcommunications.com

    Visit our interactive web site
    www.MapCommunications.com




  • 3.  RE: 96x1's needing re-login after CM interchange

    Posted 01-29-2020 09:24 AM
    Maybe instability at the time of the interchange.  I've also noticed with the H.323 firmware that if the phone is off hook, it gets stuck in "discover".  Just putting it back on hook brings it back though, so I guess I'm not being helpful.

    What version of CM 8.1 are you on?  Any shared call control?  I suggest SP2 (8.1.0.2) with custom patch 25788, which includes the fixes for trunk size and shared call control (see the PSN noted in the release notes).

    Sam Osheroff
    UW-IT Telecom Engineer
    University of Washington
    sosherof@uw.edu
    206.221.6362





  • 4.  RE: 96x1's needing re-login after CM interchange

    Posted 01-30-2020 11:02 AM
    Sam, we're on R018x.01.0.890.0. It's not a shared call control situation, though. The phones that needed re-login were not off hook. A pleasant surprise was that calls in progress stayed active during the interchange. I believe that's by design, and if memory serves we tested that when we first installed the CM. Thanks for the feedback!


  • 5.  RE: 96x1's needing re-login after CM interchange

    Posted 01-30-2020 09:02 AM
    Hey Chip -

    We have run into this before on our old CM 6.3. A few years ago we raised a case with Avaya, and their response was "This happens when you have a mass of H.323 phones trying to register at once, to the same IP Address. The phone doesn't get a response in time, and after a period of waiting it reverts back to needing a Login/Password." In our case we had to do a reboot, and at that time we had over 4500 H.323 phones hitting 3 specific addresses sitting on CLANs.

    For us, it made sense because of how we utilize CLANs for H.323 phones: the design was to have our enterprise-wide DHCP server point to one address first (then secondary/tertiary). I'm not sure if you are using PROCR or CLANs for registration points, so it might be an apples to oranges comparison (as PROCR SHOULD be able to handle more traffic).

    Just wanted to share our experience with this from a few years back.

    ------------------------------
    James Davis
    Voice and Data Senior Engineer
    University of Nebraska Medical Center
    Omaha NE
    ------------------------------



  • 6.  RE: 96x1's needing re-login after CM interchange

    Posted 01-30-2020 11:08 AM

    Hi James, this part really concerns us:

    "A few years ago we raised a case with Avaya, and their response was 'This happens when you have a mass of H.323 phones trying to register at once, to the same IP Address. The phone doesn't get a response in time, and after a period of waiting it reverts back to needing a Login/Password.'"

    So if 5% of your 4500 phones needed to be re-logged in, that's 225 phones that needed manual intervention. You have a team that can do that? And the whole phoneset password situation is a bit of a mess: you might have had to change all those individually. Fortunately we didn't have nearly that many.

    We're going to need to sit down with Avaya and talk about how to design this better so that it doesn't happen again in the future.




  • 7.  RE: 96x1's needing re-login after CM interchange

    Posted 01-30-2020 11:13 AM
    By the way, we discovered why the interchange occurred: issues related to PSN020151u, which in a nutshell says "problems can happen when CM is on VMware (or AVP)" when you do basically anything to the active server. In our case, a VMware engineer attempted to dismount a leftover .iso drive from the active CM server. It caused the active CM to restart, and the two CMs briefly went into an Active/Active state, which they then needed to recover from. I don't know if this specifically caused the 5% of phones to need to be re-logged in, or if that would be expected to occur from any interchange situation. Most phones went into discover mode and came back up within about 30 seconds.


  • 8.  RE: 96x1's needing re-login after CM interchange

    Posted 01-30-2020 12:14 PM
    Chip,

    H.323 registration can become problematic when too many phones register at the exact same moment.  In the CM world, occupancy is a measurement of how busy the system is.  In addition, each registration point has a limit on the number of registrations it can accept at any given moment.  During a dirty interchange (which is what it sounds like happened), the CM server pretty much comes up cold and loses all the registration data that existed previously.  Meaning, each phone needs to log in fresh, taking up system resources.  When you have all the phones pointing to a single registration point (like PROCR), the interface will tell the phones that it's unavailable if it's too busy.  If the phone has no other registration points, it will wipe its memory and try again, thinking it is the problem (causing a logout).  Allowing other registration points on your network (mostly, that would be CLANs) will allow the phones to register to a backup location if things are too busy.  You can still set up your AGL so that a phone won't stay on the CLAN, but at least it can log in, download the AGL and go somewhere else.

    Had this been a clean interchange, I'm sure you wouldn't have seen the issue, since the registration data would have stayed active.
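
    To make the capacity argument concrete, here is a rough toy model in Python. Everything in it is an assumption for illustration: the per-tick capacities, the retry limit, and the idea that a phone gives up after a fixed number of "busy" rounds are made-up numbers and simplifications, not real CM or 96x1 behavior.

```python
def simulate_storm(num_phones, points, max_retries):
    """Count phones that give up (need manual re-login) in a toy registration storm.

    points: per-tick capacities, one per registration point the phones may
            try (hypothetical numbers, not real CM limits).
    max_retries: busy rounds a phone tolerates before wiping its login.
    """
    waiting = [0] * num_phones        # busy-round count per unregistered phone
    logged_out = 0
    while waiting:
        capacity = sum(points)        # registrations accepted this tick
        accepted = min(capacity, len(waiting))
        # Phones beyond the accepted ones were told "busy" by every point.
        still_waiting = []
        for rejections in waiting[accepted:]:
            rejections += 1
            if rejections > max_retries:
                logged_out += 1       # phone clears its login and gives up
            else:
                still_waiting.append(rejections)
        waiting = still_waiting
    return logged_out


single_point = simulate_storm(1000, [100], max_retries=3)
three_points = simulate_storm(1000, [100, 100, 100], max_retries=3)
```

    In this toy setup the single registration point leaves 600 of the 1000 phones logged out, while three points of the same size leave none. The real numbers will look nothing like this, but the shape of the problem is the same.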

    -Nick

    ------------------------------
    Nick Kwiatkowski
    Director of Design and Engineering
    Michigan State University
    ------------------------------



  • 9.  RE: 96x1's needing re-login after CM interchange

    Posted 01-30-2020 12:58 PM
    Ah, AGL = Alternate Gatekeeper List. This is what I need. So you tell the phones that if they start timing out, go look somewhere else, like an LSP? So it actually gets its config from somewhere else, or it just hangs in a holding pattern so it doesn't shut itself down?


  • 10.  RE: 96x1's needing re-login after CM interchange

    Posted 01-30-2020 01:44 PM

    Generally PROCR should be your preferred primary registration point.  It can handle a lot of registrations.   Phones can still register to CLANs, but CLAN capacity stinks and they can easily get overwhelmed, because all that traffic has to pass through the packet bus to the IPSIs to the active server.  Once a phone registers, CM will send the AGL, and that will generally steer the phone into re-registering.  I believe the AGL is constructed something like this:

     

    1)      List of procr/CLAN IPs that are in the phone's mapped network region (i.e. ip-network-map).  This list is further prioritized based on the "Gatekeeper Priority" setting.

    2)      If the phone's IP isn't in ip-network-map, then instead it's a list of procr/CLAN IPs in the same network region as their initial registration point, again sorted by "Gatekeeper Priority".

    3)      List of "Backup Servers" from the phone's mapped network region (if it has one) or the network region of the initial registration point.  The priority for these is set in the network region itself.

     

    The phone tries all these in the priority given.   (Note: This applies only to ip-interfaces with "Allow H323 Registrations" enabled.)
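
    The three rules above can be sketched in Python as follows. This is only my reading of the list, not how CM actually builds the AGL. The `Interface` class, `mapped_region`, and `build_agl` are hypothetical names, the network map is simplified to CIDR ranges, and treating a lower "Gatekeeper Priority" number as "try first" is an assumption.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass
class Interface:
    ip: str
    region: int
    priority: int            # "Gatekeeper Priority"; lower sorts first (assumption)
    allow_h323: bool = True  # "Allow H323 Registrations" on the ip-interface


def mapped_region(phone_ip, network_map):
    """Return the region mapped to phone_ip, or None if no range matches."""
    addr = ip_address(phone_ip)
    for cidr, region in network_map.items():
        if addr in ip_network(cidr):
            return region
    return None


def build_agl(phone_ip, initial_point_ip, interfaces, network_map, backups):
    """Ordered list of IPs the phone would try, per rules 1) to 3)."""
    by_ip = {i.ip: i for i in interfaces}
    region = mapped_region(phone_ip, network_map)       # rule 1
    if region is None:
        region = by_ip[initial_point_ip].region         # rule 2
    gatekeepers = sorted(
        (i for i in interfaces if i.region == region and i.allow_h323),
        key=lambda i: i.priority,
    )
    # Rule 3: backup servers for that region, already in priority order.
    return [i.ip for i in gatekeepers] + list(backups.get(region, []))


interfaces = [
    Interface("10.0.0.1", region=1, priority=5),
    Interface("10.0.0.2", region=1, priority=1),
    Interface("10.0.0.3", region=1, priority=3, allow_h323=False),
    Interface("10.0.1.1", region=2, priority=5),
]
network_map = {"192.168.1.0/24": 1}
backups = {1: ["10.9.9.9"], 2: ["10.8.8.8"]}

mapped = build_agl("192.168.1.50", "10.0.1.1", interfaces, network_map, backups)
unmapped = build_agl("172.16.0.5", "10.0.1.1", interfaces, network_map, backups)
```

    Here the mapped phone gets the region 1 interfaces followed by the region 1 backup server, and the unmapped phone falls back to the region of the interface it first registered to.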

     

    During a dirty interchange, there's going to be all sorts of registration issues.   PROCR isn't immediately available until things get sorted.  Once CM is actually up, IPSI-based cabinets get reset, so CLANs aren't available until 7 minutes after CM starts.  The PSN you sent sounds like the issue could have actually been worse than a simple dirty interchange.  With dual active servers, since the servers share an IP but not the MAC address, a GARP for the shared IP can be sent by a server that then immediately goes back into standby.  The router can then potentially send traffic to the wrong server (based on the router's ARP cache having the wrong MAC for the shared IP).  So a phone could potentially try to register to a server that's gone standby and get rejected.
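
    A tiny Python illustration of that stale-GARP scenario, purely as a thought experiment. The MAC addresses, the shared IP, and the `deliver` helper are all made up, and a real router's ARP handling is more involved than a dictionary.

```python
def deliver(arp_cache, servers, shared_ip):
    """Return how the server owning the cached MAC answers a registration."""
    mac = arp_cache[shared_ip]
    return "accepted" if servers[mac] == "active" else "rejected"


servers = {"aa:aa": "active", "bb:bb": "active"}   # brief active/active state
arp_cache = {}
shared_ip = "10.0.0.10"

arp_cache[shared_ip] = "aa:aa"   # GARP from server A
arp_cache[shared_ip] = "bb:bb"   # later GARP from server B overwrites it
servers["bb:bb"] = "standby"     # B drops back to standby right afterwards

result = deliver(arp_cache, servers, shared_ip)    # traffic still goes to B

arp_cache[shared_ip] = "aa:aa"   # cache corrected by a fresh GARP from A
recovered = deliver(arp_cache, servers, shared_ip)
```

    Until the cache is corrected, registrations sent to the shared IP land on the standby server and are rejected.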

     

    Anyway, much of the "why" some of your phones got logged out is speculation on my part.  We can discuss more at the conference.  We don't have a lot of H323 phones left but even when we did, we generally didn't allow registrations to CLANs at all (most CLANs are set to not allow H323).   The CLANs mostly existed in G650s because they're needed for TN-board firmware upgrades (i.e. TN2224CP, TN793CP, etc.).

     

     

    Sam Osheroff, ACCA

    UC Operations Engineer

    IT Infrastructure Telecom Operations

    University of Washington

    sosherof@uw.edu

    Internal: x16362 Direct: 206.221.6362

     






  • 11.  RE: 96x1's needing re-login after CM interchange

    Posted 01-30-2020 03:39 PM
    Thanks Sam, we'll talk at conference. This whole discussion has been very helpful.  Bottom line: I need to engage my business partner to help us design better resiliency in the event of any kind of interchange.