Yeah, I think we found your root cause

  • Yeah, I think we found your root cause

    Today, I get an instant message from my manager, letting me know that we have an incoming "hot" problem from a major bank. Their storage loses connectivity to a pair of redundant cost-more-than-I-make-in-five-years switches. The switches don't have anything to do with each other, other than being in the same data center. Combine that with the fact that the storage didn't lose connectivity anywhere else, and this is odd, to say the least.

    Before I know it, I have three levels of mgmt. in the chat window, all wanting answers, and wanting them yesterday. Some helpful soul at the customer site then pastes in the log messages they are looking at: (logs modified to protect the guilty)

    Switch X
    Date/Time  Port #  Message
    11:07:25   112     Port Down
    11:07:14   96      Port Down
    11:07:13   72      Port Down
    11:07:12   65      Port Down
    11:07:11   80      Port Down
    11:07:11   64      Port Down

    Switch Y
    Date/Time  Port #  Message
    11:07:55   80      Port Down
    11:07:51   65      Port Down
    11:07:48   72      Port Down
    11:07:46   96      Port Down
    11:07:40   112     Port Down
    11:07:37   97      Port Down
    11:07:35   104     Port Down

    Note the oddly sequential nature of those timestamps. This is not, to say the least, the normal failure pattern for hardware. When a bunch of things are going to die, they usually fail all at once. As in, simultaneously. And they don't spread to another separate piece of hardware a short time later, like some poisonous cloud.

    In fact, I would say this is just about exactly how long it would take a cable monkey to unplug a bunch of cables in a patch panel, one after another.
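
    For anyone who wants to check the arithmetic, here is a rough sketch in Python that works out the gaps between the port-down events pasted above. The timestamps and port numbers are copied from the (already sanitized) logs; the function names are made up for illustration and are not part of any switch tooling.

```python
from datetime import datetime

# Port-down events copied from the logs in this post: (time, port number).
SWITCH_X = [("11:07:25", 112), ("11:07:14", 96), ("11:07:13", 72),
            ("11:07:12", 65), ("11:07:11", 80), ("11:07:11", 64)]
SWITCH_Y = [("11:07:55", 80), ("11:07:51", 65), ("11:07:48", 72),
            ("11:07:46", 96), ("11:07:40", 112), ("11:07:37", 97),
            ("11:07:35", 104)]


def _sorted_times(events):
    """Parse the HH:MM:SS strings and return them oldest first."""
    return sorted(datetime.strptime(t, "%H:%M:%S") for t, _port in events)


def gaps_in_seconds(events):
    """Seconds between each port-down event and the next one."""
    times = _sorted_times(events)
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


def span_in_seconds(events):
    """Seconds from the first port-down event to the last one."""
    times = _sorted_times(events)
    return int((times[-1] - times[0]).total_seconds())


for name, events in (("Switch X", SWITCH_X), ("Switch Y", SWITCH_Y)):
    print(name, "gaps:", gaps_in_seconds(events),
          "span:", span_in_seconds(events), "s")
```

    A box that dies usually drops everything within the same second or so. A steady trickle of one port every second or few, first on one switch and then on the other, looks a lot more like a pair of hands.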

    The ports trickle back online a minute and a half later, at roughly the same speed.

    Let's just say that this one is not exactly going to tax my years of experience troubleshooting enterprise storage equipment.

    SirWired

    P.S. A few minutes later, someone asks on the conference call who is going to ask the customer to look into staff climbing all over the storage box. He didn't mean that metaphorically. He meant that literally, data center staff are using their $1M-ish piece of hardware as a freakin' stepladder. WTF? Is this a data center or a frat house?

  • #2
    Things like this are why people shouldn't be allowed in the cable room until they can at least tell you the difference between an RJ-45 and an RJ-11. Who thinks it's okay to just unplug a whole cable set without warning anyone?


    And besides, he needed fourteen seconds to unplug all that? I could have done it in five.
    The Rich keep getting richer because they keep doing what it was that made them rich. Ditto the Poor.
    "Hy kan tell dey is schmot qvestions, dey is makink my head hurt."
    Hoc spatio locantur.

    • #3
      Quoth Geek King
      And besides, he needed fourteen seconds to unplug all that? I could have done it in five.
      Chainsaws are not generally accepted as an IT equipment must have...
      A PSA, if I may, as well as another.

      • #4
        Quoth crazylegs
        Chainsaws are not generally accepted as an IT equipment must have...
        That's dependent on the tech...
        I AM the evil bastard!
        A+ Certified IT Technician

        • #5
          Quoth crazylegs
          Chainsaws are not generally accepted as an IT equipment must have...
          Yet, they'll let me have a sledgehammer.....
          Aerodynamics are for people who can't build engines. --Enzo Ferrari

          • #6
            Quoth crazylegs
            Chainsaws are not generally accepted as an IT equipment must have...
            Now, now, you should always use the right tool for the right job.


            Pruning shears clean up a wiring closet right quick.
            The Rich keep getting richer because they keep doing what it was that made them rich. Ditto the Poor.
            "Hy kan tell dey is schmot qvestions, dey is makink my head hurt."
            Hoc spatio locantur.

            • #7
              Quoth Geek King
              Things like this are why people shouldn't be allowed in the cable room until they can at least tell you the difference between an RJ-45 and an RJ-11.
              Oh, oh! I can! I can!

              Speaking of cable closets and server rooms, even though I am still only taking classes for my A+ and server management stuff, I got to check out the server room at our school, which was unmitigated awesome. Bonus points for the fact that all the techs are absolute nerds (All hail nerds!), up to the point where the servers were named after the Marx Brothers, the Muppets, and Star Trek characters. (I feel a point of pride that the vague server name "Crusher" was the one that tipped me off.)

              And now a question, perhaps a silly one: Do you blokes in England know Star Trek, or is it all Red Dwarf over there? Forgive my ignorance.
              ~ It is a beautiful day to be dizzy!

              • #8
                In college, our servers were pretty boring: named after past college presidents. At least we got to be creative with the printers: Fred and Wilma in one of the labs. Mail server was (appropriately) named Blackhole... since all messages sent to it tended to disappear. Oh, and it just sucked.
                Aerodynamics are for people who can't build engines. --Enzo Ferrari

                • #9
                  UPDATE: It was invisible cable-pulling rodents

                  Heard back from the customer today. Despite the fact that a tech was doing work in the exact same cabling rack, he was "just cleaning up copper cables, he didn't touch the fiber. Sure it isn't a software bug?" Ah, well that narrows it down to invisible cable-pulling rodents. Better call the Orkin Man.

                  Fer crissakes, people! Both ends of the links are reporting the same problem: "Loss of Light". What part of that is hard to understand? I don't personally care how the cables got unplugged. It isn't my problem that your data center is a freaking uncontrolled disaster zone. I really don't feel like hopping on conference calls every day for an hour while you grasp at straws. Hardware failures do NOT spread like a virus across small parts of two different pieces of equipment! And they don't fix themselves (gradually) after two minutes or so of an outage!

                  SirWired

                  • #10
                    Quoth Geek King
                    Pruning shears clean up a wiring closet right quick.
                    Hedge trimmer is faster.
                    I AM the evil bastard!
                    A+ Certified IT Technician

                    • #11
                      I believe the guy with a chainsaw in the first Die Hard had the right idea
                      Ba'al: I'm a god. Gods are all-knowing.

                      http://unrelatedcaptions.com/45147

                      • #12
                        Quoth Broomjockey
                        I believe the guy with a chainsaw in the first Die Hard had the right idea

                        If that would work in real life, you wouldn't see lumberjacks getting hurt by spiked trees. To say nothing of the spectacular effects you'd get from attempting to chainsaw a power line...

                        • #13
                          Quoth Meadhands
                          Bonus points for the fact that all the techs are absolute nerds (All hail nerds!)...
                          Just a technical note, but they're actually geeks. Geeks make a living using their skills; nerds just rant about them on the 'net.

                          Example: A comic shop owner is (probably) a comic geek, a rabid collector is likely a nerd. Especially if they have a blog about who would win in a fight between Superman and Wolverine.
                          The Rich keep getting richer because they keep doing what it was that made them rich. Ditto the Poor.
                          "Hy kan tell dey is schmot qvestions, dey is makink my head hurt."
                          Hoc spatio locantur.

                          • #14
                            Aye, to quote an infamous geek, Chris Pirillo:

                            I'm a geek, not a nerd. There's a difference.
                            I AM the evil bastard!
                            A+ Certified IT Technician

                            • #15
                              Quoth lordlundar
                              Aye, to quote an infamous geek, Chris Pirillo:
                              All hail the Lockergnome.

                              I miss him. Heck, I miss TechTV in general.

                              We've mainly named our servers after colors and cars.

                              One server is named voicemail. Another (OLDER) server is called (uni initials)Voice.

                              They decided to name the new email server (uni initials)virmail, because it's the mail server, and it's on a virtual server.
                              SC: “Yeah, Bob’s Company. I'm Bob. It's my company.” - GK
                              SuperHotelWorker made my Avi!!
