Escalation of helpdesk tickets
Valorum
11-26-2006, 01:46 PM
I understand that the emergency ticket system is a two-way street, like you say in your guidelines, so i try not to use it lightly. Typically i start a ticket as urgent even if it is affecting the effectiveness of my server. Only when the situation drags on will i escalate the ticket.
Over the last week i ran into 2 issues that in my opinion warranted an emergency ticket. Each time the ticket was reset to "Normal" by the helpdesk staff however. I'm writing this to understand why that was, and to understand what the procedures are, and to perhaps highlight a problem with the processes.
The first issue was when the name servers for the new server my account was moved to (www89) were not responding to the outside world. This meant that the domains were completely unreachable. I started the ticket as an "Urgent" ticket, and provided information and proof showing that the nameservers were timing out. After several back-and-forth replies it became clear that the person manning the helpdesk didn't understand the urgency of the situation, and thus i escalated the ticket. Within minutes the ticket was reset back to Normal status! The reason that was given was that emergency tickets should only be used for servers that are down. Well, they were, and i provided the proof. Furthermore, why would the ticket be reset to Normal status instead of Urgent, which it was originally?? The fact that my domain was unreachable is not, in my opinion, a "Normal" status ticket.
The second issue has been going on since Friday night, when i switched my account over to the new server. I have not received any new email on either the old or the new server, and as it turns out this is because the new email server has some sort of problem accepting email. The server reports a "451: Local problem" error when you try to submit email. So again i started a ticket in Urgent status (not Emergency) and reported the problem. 11 hours later(!) i had not yet received a reply and the problem still persisted. So as per the guidelines posted on the helpdesk (http://helpdesk.hostpc.com/index.php?act=help&sub=5), i escalated the ticket to emergency status. (I have been without email service for 36 hours now.) Suddenly, within a short time, the ticket was reset back to Normal status without any further reply clarifying why. I do not know if it was done because somebody is handling it, or if the helpdesk thinks it should not be an emergency ticket. A simple clarification by whoever set the status to Normal would've been very helpful. But then i still wonder why it was set back to Normal status and not Urgent, which is what it was before.
So, why did these two situations not warrant an emergency ticket, and what kind of situations do? And why do tickets get set back to Normal status instead of Urgent status? (My suggestion would be to at least include a brief explanation / reply when something like that is done to help avoid confusion / frustration of the customer.)
app-o-rama.com
11-27-2006, 07:53 AM
The first time I clicked on the guidelines posted on the helpdesk (http://helpdesk.hostpc.com/index.php?act=help&sub=5) link I saw a page that included this:
Please note: Emergency tickets are reserved for Server OUTAGES only. This means the server has been confirmed down at http://www.hostpc.com/uptime. By the time you notice, an emergency alert has already gone out to all Level 1 Techs, but if the server is not restored after 10 minutes, please feel free to open a ticket.
The second time I clicked on the link, I saw the "two-way street" text. I don't know if HostPC is currently revising their guidelines for what "Emergency" is or if this is a result of Joe's being away (see his post in the Announcements forum) more than he would like.
If the HostPC employee who answered your tickets was operating under the above definition of "Emergency", neither of your tickets would qualify for "Emergency" status since the uptime page shows www81 has been up 100% since 11/20/06.
Obviously, this isn't much of an answer to your question, but hopefully it provides a little insight. Someone from HostPC will have to give you a real answer.
I'm a little confused, I guess. Why would you start all your tickets as urgent?
As for the nameserver issue, why not switch back to the previous ones when you spotted the problem? A few minutes of downtime while the propagation happens is better than any sort of extended downtime!
Dan
starfighter
11-28-2006, 09:46 PM
Please note: Emergency tickets are reserved for Server OUTAGES only. This means the server has been confirmed down at http://www.hostpc.com/uptime (http://www.hostpc.com/community/../uptime). By the time you notice, an emergency alert has already gone out to all Level 1 Techs, but if the server is not restored after 10 minutes, please feel free to open a ticket.
This is the correct policy for emergency tickets. Any other use of emergency tickets is a violation of the policies governing access to the helpdesk. By your own admission neither ticket met this threshold, and therefore both were summarily reduced in priority. You had to read the above quote to get to the helpdesk, therefore you knew the policy. While a staff note is nice and provided when there is time, it does not always happen.
That's the summary of the emergency ticket status rules in a nutshell.
I'm a little confused, I guess. Why would you start all your tickets as urgent?
As for the nameserver issue, why not switch back to the previous ones when you spotted the problem? A few minutes of downtime while the propagation happens is better than any sort of extended downtime!
Dan
To shed some light, his downtime was not due to propagation originally. The new server WAS misconfigured (long story, first time for everything). The misconfiguration was minor and easily corrected, but at 2am, the staff online were not senior techs, and a non-forced server move is not an emergency situation. The problem was resolved within a few hours (once I got online at 4am). That's all I'm going to say about the situation in the forums. We don't discuss the details of individual clients' support issues here for privacy reasons.
Needless to say, everyone thinks their problems are more urgent than the next's. I'm 100% sure Val felt his was an emergency, but we need to evaluate each issue based on the situation as we see it and make a judgement call as to whether it's urgent or normal. Rarely are "emergency" tickets truly that.
app-o-rama.com
11-29-2006, 07:12 AM
Needless to say, everyone thinks their problems are more urgent than the next's. I'm 100% sure Val felt his was an emergency, but we need to evaluate each issue based on the situation as we see it and make a judgement call as to whether it's urgent or normal. Rarely are "emergency" tickets truly that.
Does this mean that neither of Val's scenarios rises to the level of "emergency?" If so, was it because they only affected his/her account and not the whole server? Can you provide any tips for us customers on how to diagnose if it is a server-wide or one-account-only issue?
I had assumed that "emergency" was reserved for server-wide issues and not one-account-only issues. Is that correct?
Please note: Emergency tickets are reserved for Server OUTAGES only. This means the server has been confirmed down at http://www.hostpc.com/uptime (http://www.hostpc.com/community/../uptime). By the time you notice, an emergency alert has already gone out to all Level 1 Techs, but if the server is not restored after 10 minutes, please feel free to open a ticket.
Valorum
12-01-2006, 08:53 PM
Thanks for the feedback / thoughts. This is good!
I started this thread because i ran into 2 situations which i believed warranted an escalated status, yet HostPC apparently had a different opinion. I don't intend this to be a complain & whine thread. With this discussion i hope we can create clarity and mutual understanding on when to use "urgent" and "escalated" types of tickets. This will benefit all of us. Please read & respond with that in mind.
So with that out of the way, here goes...
First off, i do need to briefly describe the issues so that we can discuss if escalation in situations like those is appropriate or not. (The particular issues that i ran into have been dealt with and are behind us, and i don't intend to discuss them in detail here unless it benefits the discussion.)
The first issue was a server-wide issue that affected any and all accounts and domains on that server, and the outage persisted for 4-10 hours, depending who you ask. As Sean explained, the nameservers for that particular webserver were unresponsive due to an accidental misconfiguration, which caused any domains on that server to not resolve for the outside world. So as far as the outside world was concerned, the domains (websites, email and all) were non-existent.
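For anyone who wants to gather this kind of proof themselves, a minimal sketch of a nameserver probe follows. It sends one hand-built DNS query straight to a given nameserver and reports whether any reply comes back in time. The function names, the fixed query ID, and the timeout are all assumptions made for illustration; this is not a HostPC tool, and the nameserver address has to be supplied by the caller.

```python
import socket
import struct


def build_dns_query(name: str, query_id: int) -> bytes:
    """Build a minimal DNS query packet asking for the A record of `name`."""
    # Header: ID, flags (recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN (Internet).
    return header + qname + struct.pack(">HH", 1, 1)


def nameserver_responds(server_ip: str, name: str, port: int = 53,
                        timeout: float = 3.0) -> bool:
    """Return True if the nameserver sends back a reply carrying our query ID."""
    query_id = 0x1234  # fixed ID keeps the sketch simple; a real tool would randomize it
    packet = build_dns_query(name, query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server_ip, port))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts and "port unreachable" errors alike.
            return False
    return len(reply) >= 2 and struct.unpack(">H", reply[:2])[0] == query_id
```

A False result only shows that this one query went unanswered from where you are sitting; repeating it against each nameserver listed for the domain, and from a second network if possible, makes for stronger evidence in a ticket.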
The other issue was that the email server (the smtp portion to be precise) was not allowing us to send or receive any email for our domain for an extended period of time (over 24 hours). The smtp server reported an error 451, which indicates a local problem on the server, whenever an outside server tried to deliver email or when i tried to send email using either the HostPC provided Squirrelmail or my own desktop email client. As it turned out, it was an issue with the ClamAV service running on that server, and Sean was able to fix it quickly once i kept urging him to look at it. I do not know if that situation also affected any other accounts / domains on that same server, but i suspected that it might and that the email service across all domains on that server was unavailable.
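As a rough illustration of what a 451 reply means, here is a small sketch that sorts SMTP reply codes by their first digit and, optionally, starts a mail transaction to see what the server says. In SMTP, 4xx codes are temporary failures and 5xx codes are permanent ones, and 451 is the "local error in processing" code. The helper names and the probe approach are assumptions for illustration only; the host, sender, and recipient are whatever the caller passes in.

```python
import smtplib
from typing import Tuple


def classify_smtp_reply(code: int) -> str:
    """Map an SMTP reply code to a rough category based on its first digit."""
    if 200 <= code < 400:
        return "ok"
    if 400 <= code < 500:
        return "temporary failure"  # e.g. 451: the server hit a local processing error
    if 500 <= code < 600:
        return "permanent failure"
    return "unknown"


def probe_smtp(host: str, sender: str, recipient: str,
               port: int = 25, timeout: float = 10.0) -> Tuple[int, str]:
    """Start a mail transaction without sending a message; return the last reply.

    If the server already refuses at the greeting, smtplib raises
    SMTPConnectError instead of returning a code here.
    """
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.ehlo()
        code, message = smtp.mail(sender)
        if classify_smtp_reply(code) == "ok":
            code, message = smtp.rcpt(recipient)
        return code, message.decode("utf-8", errors="replace")
```

Because a 4xx reply tells the sending server to retry later, mail is usually queued rather than bounced right away, which may be why an outage like this shows up as silence instead of bounce messages.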
So the discussion here is about whether those types of situations warrant an escalated ticket or not, and if not, how a customer and the helpdesk staff should handle such situations.
If the types of situations described above do not warrant an "escalated" status, surely they warrant an "urgent" status? I escalated the two tickets about these two situations (after i started them out as urgent), for reasons explained throughout this post. To my surprise HostPC not only disagreed with that assessment, but decided to change the status from "escalated" all the way down to "normal" for both these tickets. There may be a logical reason why they might be set back to "urgent", but i fail to see the logic why situations like those described above would not even warrant an "urgent" status :confused:
And finally, i would like to understand why the tickets were closed repeatedly by the helpdesk staff while the problems still persisted. The helpdesk staff would say in their response that the situation had been fixed, yet when i went to use the exact same diagnostic tools i had mentioned in the opening of the ticket, it turned out the problem was still present, and i had to re-open the ticket and inform HostPC of the problem still persisting. This obviously does not give me as a customer a warm fuzzy that the tech staff is doing a thorough job, and i start second-guessing them. That's a bad situation. I think things like that can easily be avoided if the helpdesk staff verifies the situation a little more thoroughly, especially when the opening of the ticket mentions specific URLs and symptoms that can be verified.
The discussion below is mostly about why i escalated these particular tickets. I am hoping HostPC can provide some insight as to why an escalated status is deemed inappropriate for these types of situations.
I'm a little confused, I guess. Why would you start all your tickets as urgent?
I use normal tickets for regular questions or things that don't require urgent attention.
I try to create urgent tickets only for situations where something that's out of my control negatively impacts the service you are providing to me and keeps me from being able to use the service i'm paying you for.
When i open an urgent ticket, i typically do so only after i have exhausted my other options of troubleshooting. When i say a server seems to not be working, i usually have already probed it with different tools to verify that claim, and typically i'll include that information in my ticket.
So i open urgent tickets for things such as the email server not working, mySQL not working, nameservers not responding etc. Basically all the stuff you guys can take care of and that i cannot take care of. I'm not talking about my website or some script messing up. I can troubleshoot that myself and don't need to bother you guys with that.
Then only when such a situation persists for an extended period of time, i may raise it to escalation if in my estimation it is a serious issue, such as servers being down or not working properly. So i have not ever started a ticket as "escalated" even though IMO it would've been justified in at least the first type of situation.
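The habit described above, probing a server before opening an urgent ticket, could be sketched roughly as follows. This is only a TCP connect test: it shows whether a port accepts connections, not whether the web server renders pages or the mail server accepts mail. The function names and the service-to-port mapping in the usage note are assumptions for illustration.

```python
import socket
from typing import Dict


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or the name did not resolve.
        return False


def probe_services(host: str, services: Dict[str, int],
                   timeout: float = 3.0) -> Dict[str, bool]:
    """Check several named services on one host and report which ones answer."""
    return {name: tcp_port_open(host, port, timeout)
            for name, port in services.items()}
```

A call such as probe_services("www.example.com", {"http": 80, "smtp": 25, "pop3": 110}) returns a dictionary that can be pasted into the ticket; the hostname there is a placeholder.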
As for the nameserver issue, why not switch back to the previous ones when you spotted the problem? A few minutes of downtime while the propagation happens is better than any sort of extended downtime!
This particular issue was not due to propagation, so moving back would not have solved the problem. It was because your nameservers were non-responsive (because of a misconfiguration, as it turned out). ALL domains on that particular server were unreachable, not just mine. Even if i had gone through the hassle of moving everything back, the other accounts on that server were going to continue suffering from the same problem. Hence the urgent ticket. And since the situation persisted for over an hour, and since it was not just my account that was being affected, i decided to escalate the ticket.
I felt like i was doing you guys a service by escalating this ticket, because apparently nobody in your organization (including you) was aware of these servers being non-responsive and domains being unreachable for an extended period of time. My thought also was that you just signed up a bunch of new customers and probably put at least some of them on this new server, and that it would be a bad starting customer experience w/ HostPC if they start off on a server on which their domains are unreachable for an extended period of time right off the bat. So again, i thought this was a serious issue that you might be interested in.
So if i should not have escalated that ticket, i would appreciate it if you could explain to me what i should have done in this situation. Because the helpdesk staff was telling me to just wait for propagation to take place, which was clearly not going to solve the issue. And since the helpdesk is the only way i have to interact with you, i'm stuck. Escalation was the only way i saw out of the situation.
Please note: Emergency tickets are reserved for Server OUTAGES only. This means the server has been confirmed down at http://www.hostpc.com/uptime. By the time you notice, an emergency alert has already gone out to all Level 1 Techs, but if the server is not restored after 10 minutes, please feel free to open a ticket.
This is the correct policy for emergency tickets. Any other use of emergency tickets is a violation of the policies governing access to the helpdesk. By your own admission neither ticket met this threshold, and therefore both were summarily reduced in priority. You had to read the above quote to get to the helpdesk, therefore you knew the policy. While a staff note is nice and provided when there is time, it does not always happen.
That's the summary of the emergency ticket status rules in a nutshell.
Before i escalated the tickets, i purposefully read the guidelines posted on the helpdesk concerning escalation (http://helpdesk.hostpc.com/index.php?act=help&sub=5) to make sure that i was using escalation properly. And at that time and as of this moment still, those guidelines say the following:
Emergency Tickets:
Please reserve Emergency tickets for catastrophic failure of a server.
Please make absolutely sure you have a server-related outage before using Emergency. It's a two-way street. We are trying to give you the ability to alert us because yes, we want to know if we have an outage and you're our best source of information. But if we continually get alerted for something not warranting it we'll start turning off the pagers and cell phones. We don't monitor it via pager and cell phone anymore. Hopefully we can start fresh here. Examples of an Emergency ticket are inability to reach your site and you've tried traceroute and get no response but traceroute isn't reporting an outage along the path, it's at the server. Another example is you can reach the server (you can ping it) but the web server isn't rendering pages. Not being able to get your mail server to respond when you know it isn't network related nor a username/password problem is another example. For those of you more technically inclined, if you are pretty sure one of the SMTP mail servers is getting spammed we need to know right away.
Those are the guidelines that are posted on your helpdesk. Those guidelines also seem to make a lot of sense. Don't abuse it. Only use it when you know something serious is going on.
To me they also pretty clearly indicate that if i know that there's a problem with one of the servers (such as nameservers being non-responsive and multiple domains being unreachable as a result, or an email server reporting a local problem and not accepting any email for over 24 hours) i should escalate the ticket. So i felt like i was following the rules when i escalated the tickets for both situations. Was i wrong? :confused:
The shorter version of the escalation rules that you posted (where are those on your site btw? this is the first time i've seen them?) seem to ignore the fact that your service involves not just web servers, but also nameservers and email servers. In the issue of the nameservers being down, the list that shows the status of the various webservers will list the webserver in question as humming along nicely, but the fact that the nameservers for that webserver are down is completely missed. So using the shorter rules would've missed these 2 issues. Is that what HostPC intends? :confused:
Besides that, like the shorter rules explain, if the monitoring page shows that a webserver is down, a notification has already been dispatched to hostpc staff. So that means it's not necessary to ever have an escalation ticket according to those rules... So having escalated tickets seems useless then? :confused:
The outage of the nameservers persisted well over an hour, and even according to the shorter rules, an escalation ticket is warranted if it's more than 10 minutes. The email server was unresponsive for over 24 hours. According to the "long" version of the escalation rules, such a situation warrants an escalation ticket.
So again, why does HostPC feel that in these types of situations escalation is unwarranted?
I think escalation tickets should be used to inform hostpc staff of an outage in the services hostpc provides. So in my interpretation of all the rules and the situation an escalation was justified. If HostPC does not agree with that, i'd appreciate an explanation of how such situations -should- be handled.
Thanks for trying to make this a constructive discussion! :thumbUP
app-o-rama.com
12-02-2006, 08:50 AM
My compliments on your well thought out post. These are good questions for all customers to know the answers to.
The shorter version of the escalation rules that you posted (where are those on your site btw? this is the first time i've seen them?) seem to ignore the fact that your service involves not just web servers, but also nameservers and email servers. In the issue of the nameservers being down, the list that shows the status of the various webservers will list the webserver in question as humming along nicely, but the fact that the nameservers for that webserver are down is completely missed. So using the shorter rules would've missed these 2 issues. Is that what HostPC intends? :confused:
I can't get to the short description consistently, but as I posted here (http://hostpc.com/community/showpost.php?p=14517&postcount=2), I saw the short description the first time I went to the guidelines page you linked. When I revisited this page a few days later, I saw the same thing -- first time to the guidelines page was the short description, if I refreshed it, I saw the long description.
Valorum
12-18-2006, 02:57 AM
Is HostPC staff not interested in this discussion?? :confused:
caddickj
01-11-2007, 01:04 AM
I think Valorum has some good questions here, stated extremely well. Can anyone on staff respond? I'd certainly be interested in hearing thoughts on the use of "Urgent" vs. "Normal", at least.
admin
01-11-2007, 09:51 AM
Escalate tickets as you feel you need to. By the time it's a server wide issue, a senior tech is already working on the issue thanks to proactive monitoring. As you stated, it was a server-wide issue that took a little bit to track down. Our junior techs are in place to be the first line of communication and inform the senior techs of an issue, if it's warranted and they can't handle it themselves.
If the senior techs can't handle it, I'm (as sysadmin) usually already either working on it or am in contact with the people that CAN handle it (ie. Datacenter, DA support or searching for answers via Google, DA forums, etc). If it gets to the point where I need to refer to the datacenter or DA, I'm at their mercy for time/speed of replies. We try to update the tickets as they come in, but sometimes that's not possible. I'd rather the techs work to fix the issue than take time to respond to each and every reply you enter. Keep in mind, you're entering one ticket - we've got 10K customers that may also be submitting tickets regarding it. Each ticket is important, but keep in mind we're only able to answer one at a time, per tech.
By escalating a ticket to "Emergency" - it triggers pagers, SMS, and yes, even triggers my X10 controllers to flash lights in the office and my house. (If that's abused, I will discontinue use of that controller). This has the ability to make sure every tech is on alert of an issue, wake up techs not on duty, disrupt their personal lives, etc - everyone's on call during an "emergency". So, if that's abused, we'll need to cause physical bodily pain to the abuser :)
The issue you're referring to (nearly 45 days ago) was an issue that was escalated several times and resulted in at least 30 "emergency" tickets. We were well aware of the issue and didn't necessarily need updates every 5 minutes that it was still down and "when are you going to fix" ... we were working on it. We don't let issues go for days or even weeks like some providers do. It's handled then and there, but the answer isn't always immediately available.
I hope this answers your questions
Pauldow
01-11-2007, 10:15 AM
I'm not sure about the details of your helpdesk system. Does it have the capability of assigning a master ticket so the service people can assign the multiple calls coming in for the same problem to a single record? Then the technician only has to update one ticket to migrate to the child tickets. This would keep everyone informed while minimizing the typing load on the staff. Also, can you set up a web cam in your house so we can see the flashing lights like that guy who set up his Christmas lights to be controlled over the net? http://www.komar.org/cgi-bin/christmas_webcam
No, the helpdesk doesn't have that capability. That would be a great idea though.
Most of the time, with high-volume tickets on the same problem, you're going to get a 'cut and paste' answer and your ticket closed. Or that is how I did it anyway. Not trying to short the customer, but as Joe stated, if your problem is server wide, a senior tech is probably working on it anyway before you ever noticed it.
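The master-ticket idea raised above could be sketched roughly like this. It is a hypothetical data structure, not a feature of the HostPC helpdesk (which, as stated, does not have this capability); all class and field names are made up for illustration. One update posted on the master is copied to every attached customer ticket.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ticket:
    """One customer's ticket (hypothetical fields)."""
    ticket_id: int
    customer: str
    replies: List[str] = field(default_factory=list)
    status: str = "open"


@dataclass
class MasterTicket:
    """Groups the tickets that all report the same underlying problem."""
    summary: str
    children: List[Ticket] = field(default_factory=list)

    def attach(self, ticket: Ticket) -> None:
        # Link a customer ticket to this master record.
        self.children.append(ticket)

    def post_update(self, message: str) -> None:
        # One staff reply fans out to every attached ticket.
        for ticket in self.children:
            ticket.replies.append(message)

    def resolve(self, message: str) -> None:
        # Send a final note, then close all attached tickets together.
        self.post_update(message)
        for ticket in self.children:
            ticket.status = "closed"
```

With something like this, a tech types one status note and every affected customer sees it, which is the "keep everyone informed while minimizing the typing load" goal from the suggestion.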