Speeding Up Root Cause Analysis in Field Service



Field service is often a race against time. Should a fault occur with a customer asset, every second that machine is out of action means lost revenue and productivity for the business in question.

Finding the underlying cause of a fault can be a long, drawn-out process of trial and error, and field service professionals are constantly seeking new ways to speed up diagnosis. As a result, the time taken to complete root cause analysis is emerging as a key metric for judging the effectiveness of any field service provider.

Customers need to know what is wrong, why it has gone wrong, and what will be done to fix it – and in what time frame – as quickly as possible so they can make critical business decisions around the outage.

Internet of Things

Many field service providers today use predictive maintenance technology to detect problems as they develop and deal with them as quickly as possible – ideally before they result in large-scale shutdowns. However, many providers are not using the technology to its fullest potential.

When using this kind of analysis technology, you need to ensure your detection software is keyed to the appropriate metrics, so that root cause analysis can be carried out quickly enough to keep disruption for your customers to a minimum. These metrics can be referred to as “the three Ws.”

  • You need to know when something is wrong
  • You need to find where the problem is located
  • You need to understand what the issue is

Manually scanning through the logs to detect anomalies, zooming in on the times and areas where the problem occurred, and searching for errors or other events which fall outside of regular parameters can be a time- and resource-consuming task, and can push your root cause analysis time higher than is acceptable.

“You see a cluster of errors around the time of the blip, but they’re just the symptoms - complaining about failed transactions and timeouts,” says Zebrium. “So, you spend an hour working backwards from there – and you finally spot a smoking gun – a benign looking event that changed network settings. Sure enough – someone inadvertently ‘broke’ the network. You keep checking to make sure there aren’t other unexpected events, and finally conclude this was the root cause. Total time taken 1.5 hours.”

Far more effective is to apply some form of automated technology which can perform these tasks for you and significantly reduce your root cause analysis time.
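As a rough illustration of what such automation might look like, the Python sketch below scans a small log for events that occur only once in the run-up to an incident, then reports when each happened, where (which component), and what it was. The log format, the component names, and the find_rare_events helper are all hypothetical, and "rare means it appears exactly once" is a simplifying assumption made for this example, not a description of how any vendor's product works.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical log format: "<ISO timestamp> <component> <message>"
SAMPLE_LOG = """\
2023-01-10T09:00:00 pump-1 heartbeat ok
2023-01-10T09:05:00 pump-1 heartbeat ok
2023-01-10T09:10:00 gateway network settings changed
2023-01-10T09:12:00 pump-1 transaction timeout
2023-01-10T09:13:00 pump-1 transaction timeout
2023-01-10T09:14:00 pump-2 transaction timeout
"""


def parse_line(line):
    """Split one log line into (timestamp, component, message)."""
    ts, component, message = line.split(" ", 2)
    return datetime.fromisoformat(ts), component, message


def find_rare_events(lines, incident_time, window_minutes=30):
    """Return events in the window before the incident whose message
    appears exactly once in the whole log (assumed heuristic)."""
    events = [parse_line(line) for line in lines if line.strip()]
    counts = Counter(message for _, _, message in events)
    start = incident_time - timedelta(minutes=window_minutes)
    return [
        (ts, component, message)
        for ts, component, message in events
        if start <= ts <= incident_time and counts[message] == 1
    ]


if __name__ == "__main__":
    incident = datetime(2023, 1, 10, 9, 14)
    for ts, component, message in find_rare_events(SAMPLE_LOG.splitlines(), incident):
        # when / where / what
        print(ts.isoformat(), component, message)
```

On this sample, the repeated timeouts (the symptoms) are ignored and the single network settings change is surfaced as the candidate cause. Real logs are far noisier, so a one-off-event rule would need considerable refinement before it could be relied on.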

Machine Learning and AI

When you apply machine learning and AI technology to your root cause analysis routine, you gain access to software which can scan through millions of variables and identify anomalies in a fraction of the time it would take a human operator.

In addition, machine learning technology can become increasingly adept at recognizing the specific patterns associated with the assets and industries you are responsible for – thus increasing the accuracy and speed of your root cause analysis over time.
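To make the idea of anomaly flagging concrete, here is a minimal Python sketch that learns a baseline from past sensor readings and flags new readings that sit far outside it. It uses a plain z-score test as a stand-in, which is basic statistics rather than machine learning. The readings, the threshold of 3.0, and the flag_anomalies function are illustrative assumptions only.

```python
from statistics import mean, stdev

# Hypothetical temperature readings from a healthy asset (the "learned" baseline)
BASELINE = [70.1, 69.8, 70.3, 70.0, 69.9, 70.2]

# Hypothetical new readings to check against that baseline
READINGS = [70.0, 70.2, 84.5, 69.9]


def flag_anomalies(baseline, readings, threshold=3.0):
    """Return (index, value) pairs for readings more than `threshold`
    standard deviations away from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value counts as anomalous.
        return [(i, value) for i, value in enumerate(readings) if value != mu]
    return [
        (i, value)
        for i, value in enumerate(readings)
        if abs(value - mu) / sigma > threshold
    ]


if __name__ == "__main__":
    for index, value in flag_anomalies(BASELINE, READINGS):
        print(f"reading {index} looks anomalous: {value}")
```

Extending the baseline with readings later confirmed as normal is one simple way such a check could adapt to a particular asset over time, though production systems typically rely on far richer models.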

This is why we are seeing an increase in the number of providers offering “root cause as a service” (RCaaS) platforms, which are dedicated to a single role and allow the hardest and most complex parts of the process to be outsourced to experts in the field. With some RCaaS providers claiming their methods achieve accuracy of 95% or higher, entrusting root cause analysis to a third party might offer a convenient and effective way of speeding up this essential process whilst maintaining the high levels of accuracy your customers demand.

“RCaaS automatically finds the best possible root cause indicators in the logs for any kind of software or infrastructure problem,” says Zebrium. “In simple terms, it finds the same log lines a human would have eventually found by manually digging through the logs, but it does this in-real time and at any scale. The results are delivered as a root cause report containing an English language summary (leveraging the GPT-3 language model), a Word Cloud that brings attention to important words in the log lines, and a small sequence of log lines (typically 10-60) that highlight the root cause and symptoms.”

Final Thoughts

Speeding up the process of root cause analysis is an essential step along the road to providing data-driven, IoT-powered predictive maintenance. We often spend too much time on the technology which allows for predictive maintenance and neglect the fine details which enable it. RCaaS is just one method providers can use to carry out root cause analysis faster and more accurately than would previously have been possible.


You can hear Steele-Waseca Cooperative Electric General Manager Syd Briggs speak on root cause analysis and more at Field Service Palm Springs 2023, being held in April at the JW Marriott Desert Springs, Palm Springs, CA.

Download the agenda today for more information and insights.