Wednesday, December 10, 2008

Operation Road blocker!

It was international hump day, 12th Nov 08, when the operations team raised an issue about an application outage caused by erratic internet connectivity. Pinging various websites showed that the network was losing a large number of packets, especially on the connection to the MDS application, while a few other websites worked fine. Packet loss to other sites such as yahoo, google, and call-mustang was only intermittent. We kept tracert running until the team realized that sites like yahoo and google were becoming stable, whereas the call-mustang trace behaved the same way as the MDS site. The MDS site was completely down for India users, yet the application was up and running fine for the US. This made it extremely tricky to analyze the root cause of the outage: the application servers had no issues, and our US team could access the application smoothly. The local service provider in India also appeared to be fine, because other internet sites were working and our application showed the same behavior over a different internet service (verified through a data card connection).
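The ping comparison described above can be sketched in a few lines of Python. This is only an illustration, not the tooling we used that day: the function names `parse_packet_loss` and `rank_by_loss` are hypothetical, and the regular expression assumes the usual summary lines printed by ping ("(25% loss)" on Windows, "25% packet loss" on Linux and macOS).

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed summary formats:
#   Windows:     "Packets: Sent = 4, Received = 3, Lost = 1 (25% loss),"
#   Linux/macOS: "4 packets transmitted, 3 received, 25% packet loss"
_LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)%\s*(?:packet\s+)?loss", re.IGNORECASE)


def parse_packet_loss(ping_output: str) -> Optional[float]:
    """Return the packet loss percentage from ping output, or None if no summary is found."""
    match = _LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def rank_by_loss(outputs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (host, loss) pairs sorted worst first, skipping hosts with no parsable summary."""
    results = []
    for host, text in outputs.items():
        loss = parse_packet_loss(text)
        if loss is not None:
            results.append((host, loss))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Feeding it the captured ping output for each site puts the worst affected destination at the top, which makes it easier to see at a glance that one application path is suffering while general internet sites are mostly healthy.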

I coordinated with all the stakeholders: the data center, the service provider, the ops users, the engineering team, and the internal business team. On further analysis, everyone agreed that the level3 connection had a problem routing requests back and forth. The data center team began coordinating with level3 engineers through their service provider (SAVVIS), while our IT team coordinated with our local service provider (Bharti).

Within three hours of the issue being reported at 10:49 am, I was able to send an email saying that the application was running smoothly and was stable with 0% packet loss, after confirming that the issue was resolved for the India team. I was sure that neither BHARTI nor SAVVIS had caused the issue directly, because other internet sites were fine from our office while the MDS apps were struggling, and the MDS application was fine from the US (confirmed by my US team member). I concluded that the issue was with the bridge (level3) between the India service provider (Bharti) and the ATOMIC service provider (SAVVIS).

Once the application was running smoothly again, I checked the route, compared it with the earlier one, and found that it had changed. The level3 issue was now resolved.
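The before and after route comparison can be sketched as a small Python helper. This is a minimal sketch under assumptions: `first_divergence` is a hypothetical name, and each route is assumed to have already been reduced to a list of hop addresses copied from the tracert output.

```python
from typing import List, Optional


def first_divergence(before: List[str], after: List[str]) -> Optional[int]:
    """Return the 0-based index of the first hop that differs, or None if the routes match."""
    for index, (old_hop, new_hop) in enumerate(zip(before, after)):
        if old_hop != new_hop:
            return index
    # All shared hops match; a length difference means one route has extra hops.
    if len(before) != len(after):
        return min(len(before), len(after))
    return None


# Made-up hop addresses from documentation ranges, not the real route from the incident.
route_during_outage = ["192.0.2.1", "198.51.100.7", "203.0.113.20"]
route_after_fix = ["192.0.2.1", "198.51.100.9", "203.0.113.20"]
print(first_divergence(route_during_outage, route_after_fix))
```

The index it returns points at the hop where the two traces part ways, which is the place to start asking which carrier owns that segment.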
