8 Data Center Operations and Maintenance Best Practices

Clint Bradford

On July 8, 2015, the New York Stock Exchange (NYSE) shut down. It happened on a Wednesday and lasted nearly four hours, from about 11:30 in the morning until 3:10 in the afternoon. All trading came to a halt, leaving traders on the floor twiddling their thumbs until just before the closing bell. On the same day, the Wall Street Journal’s website crashed and United Airlines grounded flights globally for nearly two hours due to technical problems, leading many to suspect a coordinated cyberattack. Though the cause turned out not to be malicious, the root of these failures was no less troubling.

Technical glitches and poor IT infrastructure caused all three issues. Matt Gerber, CEO of Digital Fortress, told CRN “that for those failures to have occurred, the systems must have had shortcomings in built-in redundancy, fault tolerance, and instantaneous failover capabilities, and they likely were not properly tested.” Fault tolerance is the ability of an IT system, or part of it, to keep operating uninterrupted even if a component fails. Redundancy refers to backup networks or devices within the infrastructure that allow systems to keep operating when something fails. Failover denotes the ability to switch to this backup system instantly and seamlessly.
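
To make the three terms concrete, here is a minimal Python sketch. It assumes a service can be reached through an ordered list of redundant providers; `call_with_failover`, `primary_feed`, and `backup_feed` are hypothetical names invented for this illustration, not part of any real trading system or library.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(providers: Sequence[Callable[[], T]]) -> T:
    """Try each redundant provider in order and return the first success."""
    errors: List[Exception] = []
    for provider in providers:
        try:
            return provider()
        except Exception as exc:
            # Fault tolerance: one failed component must not stop the service.
            errors.append(exc)
    raise RuntimeError(f"all {len(providers)} providers failed: {errors}")


def primary_feed() -> str:
    # Hypothetical primary system that is currently down.
    raise ConnectionError("primary unreachable")


def backup_feed() -> str:
    # Hypothetical redundant system standing by.
    return "quote served by backup"


# Failover: the caller gets an answer even though the primary failed.
print(call_with_failover([primary_feed, backup_feed]))
```

Real failover also has to deal with timeouts, health checks, and state replication, which is why it needs the kind of testing Gerber describes.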

Technology is indispensable in today’s world, and ignoring IT infrastructure can lead to costly and embarrassing failures. The NYSE event reminded IT administrators that they need to apply forward-looking and rigorous testing procedures to evaluate systems thoroughly. This is particularly important in data centers, which require operators to maintain strict security protocols to keep customer data secure while ensuring power and cooling capacity are maintained. Using data center operations and maintenance best practices can help data centers provide safer and better services for their customers, especially when data centers are connected to Internet of Things (IoT) devices.

Data Center Operations and Maintenance Best Practices

The following are some of the most important best practices for keeping data center operations and maintenance reliable and secure.

1. Ensure Uptime by Creating Redundancies

One big challenge for data centers involves creating alternate pathways for networked equipment and communication channels should something fail. These redundancies create a backup system that allows technicians to perform maintenance and install system upgrades without interrupting service, or to switch to the alternate pathway if the primary system fails.

Data centers are classified by tier, ranked from 1 to 4, and the tier indicates how much uptime customers can expect to receive. Here is a quick look at the tier system:

  • Tier 1 has no redundancies and the lowest guarantee of uptime at 99.671%, with downtime of 28.8 hours expected yearly.
  • Tier 2 offers 99.749% uptime, with an expectation of 22 hours of downtime yearly, and includes partial redundancy for powering and cooling critical systems.
  • Tier 3 allows for concurrent maintenance, with expected downtime of 1.6 hours yearly and 99.982% uptime.
  • Tier 4 guarantees 99.995% uptime, with expected yearly downtime of only about 26.3 minutes, offering not only full redundancy with compartmentalization and automatic fault tolerance, but also twelve hours of continuous cooling.
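
The yearly downtime for each tier follows directly from its uptime percentage. The short Python sketch below shows the arithmetic, assuming a 365-day year (8,760 hours); the function and variable names are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(uptime_percent: float) -> float:
    """Convert an uptime percentage into expected downtime hours per year."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


# Uptime percentages quoted for each tier in the list above.
TIER_UPTIME = {1: 99.671, 2: 99.749, 3: 99.982, 4: 99.995}

for tier, uptime in TIER_UPTIME.items():
    hours = annual_downtime_hours(uptime)
    print(f"Tier {tier}: {uptime}% uptime -> "
          f"{hours:.1f} hours ({hours * 60:.0f} minutes) per year")
```

A fraction of a percentage point matters at this scale: the gap between 99.982% and 99.995% is the difference between roughly an hour and a half and under half an hour of downtime per year.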

Software and analytics within smart building systems can help data centers monitor and conduct preventative maintenance before downtime occurs. 

2. Keep Indoor Climates Stable 

Computers, servers, and other equipment often require controlled temperatures and humidity to function properly and to protect the system’s data storage and software. For this, remote environmental management can offer a solution. IoT devices can monitor temperature and humidity, spot heat sinks, and identify when filters within HVAC systems require replacing or cleaning. Additionally, smart sensors within the HVAC system should be checked regularly to ensure they work properly.
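
As a rough sketch of what such monitoring boils down to, the Python below checks a sensor reading against allowed ranges and returns alerts. The `Reading` type, the sensor name, and both ranges are assumptions made for illustration; real limits should come from your equipment vendors and facility standards.

```python
from dataclasses import dataclass
from typing import List

# Assumed limits for illustration only, not vendor or standards guidance.
TEMP_RANGE_C = (18.0, 27.0)
HUMIDITY_RANGE_PCT = (20.0, 80.0)


@dataclass
class Reading:
    sensor_id: str       # hypothetical sensor name
    temp_c: float        # air temperature in degrees Celsius
    humidity_pct: float  # relative humidity in percent


def check_reading(reading: Reading) -> List[str]:
    """Return one alert message per measurement outside its allowed range."""
    alerts = []
    low, high = TEMP_RANGE_C
    if not low <= reading.temp_c <= high:
        alerts.append(f"{reading.sensor_id}: temperature {reading.temp_c} C "
                      f"outside {low}-{high} C")
    low, high = HUMIDITY_RANGE_PCT
    if not low <= reading.humidity_pct <= high:
        alerts.append(f"{reading.sensor_id}: humidity {reading.humidity_pct}% "
                      f"outside {low}-{high}%")
    return alerts


for alert in check_reading(Reading("rack-12-inlet", 31.5, 45.0)):
    print(alert)
```

A sensor that never alerts is not necessarily healthy, which is one reason the sensors themselves need periodic checks.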

3. Create Stronger Testing Protocols

Data center operations and maintenance best practices concerning testing could have prevented the NYSE from crashing, according to Lief Morin, president of Key Information Systems. He recommends that data centers test software updates and any other new technology prior to deployment. As Morin notes:

“There's systems and procedures. We call it: build, test, run. We have separate sets of systems that do all three of those things. You build, then test for performance, reliability, and upgradability, then there's a separate set of architectures you run on.”

Testing any technology added to the system is critical to preventing failures. 

4. Implement Predictive Maintenance 

Data center operations and maintenance best practices are not just a set of rules. Together, they focus on the goal of continuous operations, setting up strategies for data centers to supply sufficient resources and defining roles and responsibilities. A key component of this is maintenance.

Inspections and preventative maintenance are often performed at fixed time intervals to keep systems and components from failing. However, this approach does not consider actual operating conditions. Smart monitoring technologies can transform maintenance through the use of analytics. An advanced analytics platform with machine learning capabilities can anticipate maintenance needs by identifying trends and predicting when equipment will reach the thresholds at which it is likely to fail.
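
The core idea of predicting a threshold crossing can be sketched without any machine learning at all. The Python below fits a plain least-squares trend line to sensor history and extrapolates it; this is a deliberately simple stand-in, not the method of any particular analytics platform, and the vibration figures and the 5.0 limit are made-up sample values.

```python
from typing import Optional, Sequence


def hours_until_threshold(times_h: Sequence[float],
                          values: Sequence[float],
                          threshold: float) -> Optional[float]:
    """Estimate hours from the last reading until a rising linear trend
    reaches `threshold`. Returns None if the trend is flat or falling."""
    n = len(times_h)
    if n < 2 or n != len(values):
        raise ValueError("need at least two matching time/value pairs")
    mean_t = sum(times_h) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times_h)
    if var_t == 0:
        raise ValueError("readings must span more than one point in time")
    # Ordinary least-squares slope and intercept.
    slope = sum((t - mean_t) * (v - mean_v)
                for t, v in zip(times_h, values)) / var_t
    intercept = mean_v - slope * mean_t
    if slope <= 0:
        return None
    crossing_time = (threshold - intercept) / slope
    # Clamp at zero when the trend line is already past the threshold.
    return max(0.0, crossing_time - times_h[-1])


# Hypothetical daily vibration readings (mm/s) from a cooling fan.
hours = [0, 24, 48, 72]
vibration = [2.0, 2.5, 3.0, 3.5]
print(hours_until_threshold(hours, vibration, threshold=5.0))
```

With this sample data the trend rises 0.5 mm/s per day, so the estimate is about 72 hours after the last reading. Real equipment rarely degrades in a straight line, which is where richer models earn their keep.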

5. Staff for Operating & Maintaining Data Centers

Employees who maintain and operate data centers play an integral role in ensuring the system operates continuously. As a result, critical staff should be trained to implement data center operations and maintenance best practices, and tasks and responsibilities should be made clear. This keeps everything running smoothly, particularly in complex systems. 

6. Keep It Clean

Modern technology does not like dirt. Along with preventative maintenance, creating a clean environment within a data center will extend the life span of equipment and limit downtime. Providing mats at entrances and replacing them when dirty, banning food and drink in sensitive areas, routinely mopping floors, and keeping generators, HVAC filters, electrical systems, and heat exchangers clean all help a data center run better. Janitorial tracking software can help to ensure cleaning occurs on schedule, and environmental sensors connected to smart systems can help identify problems when they arise.

7. Practice Good Data Hygiene

Data storage capacity is no longer an issue, as storage technology has improved and the cost of computer memory has plummeted. That holds even for the immense amount of data that networks gather through IoT devices, which informs decision-making through analytics software that can look at trends and create actionable insights. Yet much of this data is not used.

A 2016 survey found that a third of stored and processed data is outdated, unnecessary, or inconsequential. Even with the lowered price of computer memory, storage costs trillions of dollars globally every year. Deleting and archiving data reduces the burden on data centers’ IT infrastructure, which lowers cooling costs and power demand and frees processing resources and storage for better use.
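
A first pass at finding candidates for archiving can be as simple as listing files that have not been modified in a long time. The Python sketch below does that with the standard library; the one-year cutoff and the `find_stale_files` name are assumptions for illustration, and age alone does not prove data is unnecessary, so treat the output as a review list rather than a delete list.

```python
import time
from pathlib import Path
from typing import Iterator, Optional

SECONDS_PER_DAY = 86400


def find_stale_files(root: Path, max_age_days: float,
                     now: Optional[float] = None) -> Iterator[Path]:
    """Yield files under `root` last modified more than `max_age_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    for path in root.rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            yield path


if __name__ == "__main__":
    # Read-only scan of the current directory; nothing is deleted or moved.
    for stale in find_stale_files(Path("."), max_age_days=365):
        print(stale)
```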

8. Maintain Emergency Preparedness

Even with the best infrastructure, most capable staff, and top-notch smart systems, data centers cannot eliminate all risks. Preparing for unplanned disruptions, even if they never occur, helps employees react to emergencies more effectively, more promptly, and with fewer mistakes. Part of this involves developing detailed emergency operating procedures that show workers what to do when specific scenarios occur.

Such preparedness allows data centers to respond appropriately by giving personnel advance knowledge of how to isolate certain faults and restore services, and of when to bring backup systems online. Many of these operations can be automated, monitored by IoT sensors, and triggered by analytics software when needed, while being overseen by competent and well-trained frontline technicians.

Teams should rehearse drills for likely scenarios regularly, and a clear chain of command should be established beforehand in case a situation escalates. Evaluating these practice sessions readies teams for real emergencies, and the way teams respond during drills should be analyzed so that procedures can be adjusted and made more effective.

Partner With an Expert

With all that is involved in data center operations and maintenance, best practices play a vital role in protecting critical systems and ensuring continuous service for clients. Partnering with the right master systems integrator (MSI) to integrate systems and deploy advanced analytics can make implementing those best practices easier. By choosing the right technologies and working with control experts who understand the complex needs of data centers, you can optimize operations safely and securely.

Buildings IOT provides transformative technology and services to help you implement operations and maintenance best practices in data centers. Contact our team today.


