SRE, DevOps, R&D – One Way or Another, We’re All Doing NOC

Does your company operate and manage a NOC? If you said "no", it may not be as clear-cut as that. NOCs, or network operations centers, have a bad reputation for being outdated and irrelevant in today's cloud-dominated world. When you say "NOC", most people imagine a huge room with large screens and teams of people, just like the NASA control room scenes in the movies. These days, systems work differently, everything runs in the cloud, and monitoring isn't what it used to be. NOCs originated in the world of telecommunications, which is one of the reasons they are so often associated with legacy technologies. After all, they couldn't possibly still be relevant in today's world of cloud and automation… right?

Wrong.

What is NOC?

NOC stands for "network operations center", and it is exactly what the name says: a center for managing network operations. It's essentially one centralized place where R&D, DevOps, SRE, IT, and NOC teams monitor the activity of their networks. Today, it's typically used to monitor cloud infrastructure and applications, and it allows teams to handle and escalate alerts and incidents that affect the product's performance and availability. The end goal is always to meet service level agreements (SLAs) and reduce downtime, which is critical in cloud environments, where everything needs to be available at top performance 24/7.

A NOC by Any Other Name

Yes, that’s a Shakespeare reference.

NOC does come from a legacy world, but just as we have always had to maintain communication systems around the clock, we also need to maintain cloud production systems and infrastructure 24/7 to ensure availability. To achieve that, you need monitoring systems: something that generates alerts, graphs, logs, and more. You also need someone who knows the process, receives all of this information from the relevant sensors, and knows what to do with it. To do that 24 hours a day, you need a dedicated team that monitors, escalates, and resolves these alerts. In practice, that means teams (DevOps, R&D, etc.) working around the clock with automation and monitoring tools, sitting in front of screens throughout their shifts, receiving alerts, resolving issues, and keeping your cloud product or service up and running for your customers 24/7.

If you do that, we hate to break it to you, but that's a NOC. So if you're asked again whether your company operates a NOC, think twice. In our experience at XiteiT, many people who automatically answer "No" change their answer to "Yes" once we dig a little deeper.

In our cloud world, everyone operates a NOC, whether they choose to call it that or not. Sure, the name NOC sounds outdated (perhaps because of the word "network"), but the concept remains the same: ensuring efficient operations, monitoring, ongoing maintenance, and uptime for a solution running in the cloud. It's not just NOC operators sitting in front of a screen 24 hours a day, receiving alerts, and acting according to a runbook or another set of protocols. It's also DevOps teams carrying out advanced processes, customer success representatives quickly logging into an easily accessible system to check the status of an alert or issue affecting a customer, and R&D teams learning which parts of their code cause the most alerts so that they can improve that code and get better at debugging.

You’re Already Doing It – So Do It Right

Whatever you choose to call it (NOC, service ownership, accountability, operations), you're responsible for ensuring your service is available to your customers. You're operating a NOC, so do it right.

That means:

  • Creating a runbook to preserve institutional knowledge and streamline processes
  • Automating processes where relevant (using the runbook if possible), and combining automation with human intervention where needed
  • Aligning all stakeholders, such as R&D, DevOps, SRE, and management, around a common culture of service ownership
  • Standardizing events and incidents in a simple, unified format that contains not only technical information but also business information
  • Continuing to nurture collaboration and communication between teams, so that everyone understands how their work affects the others
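To make the runbook, automation, and standardized-event points a bit more concrete, here is a minimal Python sketch. It assumes a hypothetical in-house setup rather than any particular product: a unified `Event` record that carries both technical fields (check, severity, host) and business fields (affected service, owning team, customer impact), plus a small runbook table that tries an automated fix first and escalates to a human on the owning team when no automation exists or the fix fails. All names here (`Event`, `RunbookEntry`, `RUNBOOK`, `restart_service`, `handle`) are illustrative, not an existing API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class Event:
    """Hypothetical unified event format: technical plus business context."""
    # Technical context
    source: str           # which monitoring system raised the alert
    check: str            # name of the failing check (used as the runbook key)
    severity: str         # e.g. "critical" or "warning"
    host: str
    # Business context
    service: str          # customer-facing service affected
    owner_team: str       # team accountable for the service (escalation target)
    customer_impact: str  # short, human-readable impact statement
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class RunbookEntry:
    description: str
    # Optional automated remediation; returns True if the issue was resolved.
    automated_fix: Optional[Callable[[Event], bool]] = None


def restart_service(event: Event) -> bool:
    # Placeholder: a real fix would call your own orchestration tooling.
    return True


# Hypothetical runbook: one entry per check, some automated, some human-only.
RUNBOOK = {
    "process_down": RunbookEntry("Restart the process", restart_service),
    "disk_full": RunbookEntry("Free disk space; a human decides what to delete"),
}


def handle(event: Event) -> str:
    """Try automation first, otherwise escalate to the owning team."""
    entry = RUNBOOK.get(event.check)
    if entry is None:
        return f"escalate to {event.owner_team}: no runbook entry for {event.check}"
    if entry.automated_fix is not None and entry.automated_fix(event):
        return f"auto-resolved: {entry.description}"
    return f"escalate to {event.owner_team}: {entry.description}"
```

Storing the escalation target in the event itself (`owner_team`) is one way to express service ownership in data: whoever owns the affected service receives the alerts that automation cannot resolve, along with the business impact needed to prioritize them.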
