Reliability: December 2008 Archives

Research Brings Results

Defining Mean Time Between Pump Failures
by Heinz P. Bloch, PE

In September 2008, we were contacted by a Mechanical Engineering student. He was close to completing an internship with a major U.S. oil refinery and had been asked to set up a system allowing the refinery to monitor its pump mean-time-between-failures (MTBF).

Since being given the assignment, the young man had encountered serious roadblocks. His first question was how MTBF was being calculated in the oil refining industry. He reported that some people simply take the number of months in service and divide by the number of repairs during that time, while others apparently perform a Weibull analysis. The Weibull analysis sounded much more accurate to him, but he wanted to stay with industry standards.

He ran into a second roadblock when attempting to define what a failure is. The refinery was currently contemplating a definition of "anything costing over $1,000", but he wanted to know what the standard was. Using the all-pervasive and now rather customary (and generally inadequate!) Internet search method, he found many articles that talked about MTBF studies. He did not, however, uncover any articles that shed useful light on how such studies were to be set up. Finally, he asked for help in finding some of the answers.


No Standard, Just Choices

There is no written standard on MTBF, but McKenna and Oliverson's "Glossary of Reliability and Maintenance Terms" (ISBN 0-88415-360-6) neatly defines it as:

"A basic measure of reliability for repairable items; the mean life during which all parts perform within their specified limits, during a particular measurement interval under stated conditions; an index of reliability calculated by dividing the total number of stoppages (outages) by operating time; the number of hours or cycles an item or items operated divided by the number of failures that occurred; commonly expressed as a six or 12 month rolling average; also expressed as one over the failure rate."

That pretty much explains what is common practice. By deviating from common practice, perhaps doing a Weibull plot, one achieves another benchmark. A Weibull plot is a reliability prediction technique used to evaluate the reliability parameters of components (e.g. bearings), and the data from it is more precise than MTBF calculations. These plots are also valuable during the development phase of a component. While Weibull plots are possible for failed pumps, the specialist using them will give up the straightforward comparison with others that use MTBF. That should be a concern for us.

More About Weibull

For definitions of failures, metrics, etc., go to Paul Barringer's reading list for reliability (Reference 1). Select an old document, MIL-STD-721. This is one of many military documents Paul has accumulated on his website. Specifically, go to page 11 for the words, which he has reduced to the equations below:



The MIL-STD-721 document reaches way back to the 1970's, and has now become obsolete. Better modern documents such as MIL-HDBK-338 are available today.

Quoting Paul Barringer, "The practice of summing the life of active units plus dormant units is a poor (lazy) engineering practice in calculating MTBF & MTTF metrics. It is poor because it overstates the results by including so-called life of dormant units. This sets a trap for naive people building RAM models of system performance because the flawed metrics will overstate system performance."

The military documents, such as the DoD RAM Guide, the RAM Primer, MIL-STD-785, NASA-STD-8729.1, and other documents listed on the Barringer website, provide some excellent guides for building RAM models.

For Weibull analysis of components:

$$\mathrm{MTTF} = \eta \, \Gamma\left(1 + \frac{1}{\beta}\right)$$

Here, η is the characteristic life (i.e., the life at 63.2% of the cumulative distribution function, as this is a mathematical property of the distribution; in short, it's the single-point representation of durability that you discuss without all of the ifs, ands, and buts). The β is the shape factor. For components, β tells you how things died (i.e., β<1 implies infant mortality, β≈1 implies chance failures, and β>1 implies wear-out failure modes); it is important to let the data speak rather than pontificating about how things died.

The term Γ(1 + 1/β) is the Gamma function evaluated at 1 + 1/β. For β = 0.5 it equals 2, for β = 1 it equals 1, and for β > 1 it lies between about 0.89 and 1, so as a rough rule of thumb, the MTTF is roughly equal to η, the characteristic life. You need to know the beta values to get the correct medicine because everyone will tell you things wear out, although, unfortunately, we kill more things than ever live long enough to wear out. (Note: On another website (Reference 2), Dr. Robert Abernethy provides additional insight into the differences between MTBF and MTTF. Consulting his website may be important for students of the Weibull method.)
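As a rough numerical check of the relationship described above, the following Python sketch evaluates MTTF as the characteristic life (η) multiplied by Γ(1 + 1/β), where β is the shape factor, using the standard-library `math.gamma`. The characteristic life of 60 months is an assumed, purely illustrative value, and the function names are hypothetical.

```python
import math


def weibull_mttf(eta: float, beta: float) -> float:
    """Mean time to failure of a two-parameter Weibull distribution.

    eta  -- characteristic life (result is in the same time unit)
    beta -- shape factor
    """
    if eta <= 0 or beta <= 0:
        raise ValueError("eta and beta must be positive")
    return eta * math.gamma(1.0 + 1.0 / beta)


def weibull_cdf(t: float, eta: float, beta: float) -> float:
    """Fraction of the population expected to have failed by time t."""
    return 1.0 - math.exp(-((t / eta) ** beta))


if __name__ == "__main__":
    eta = 60.0  # assumed characteristic life in months (illustrative only)
    for beta in (0.5, 1.0, 2.0, 3.5):
        print(f"beta={beta}: MTTF={weibull_mttf(eta, beta):.1f} months")
    # At t = eta the cumulative fraction failed is 1 - 1/e for any beta.
    print(f"Failed by t=eta: {weibull_cdf(eta, eta, 2.0):.3f}")
```

With a shape factor of 0.5 the MTTF is twice the characteristic life; for shape factors of 1 or more it stays within about 12% of the characteristic life, which is why the rule of thumb holds there.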

MIL-HDBK-338 on page 46 gives you a simple and clear definition of failure: "The event, or inoperable state, in which any item or part of an item does not, or would not, perform as previously specified." Reliability (lack of failures) always terminates in a failure (loss of the function when you needed it). Many other details about failures are also included in pages 46-47.

Finally, download technical paper #2, "Where Is My Data For Making Reliability Improvements," listed at the bottom of the page on Paul Barringer's website. It gives other source documents and shows how to make the calculations.


Consider Feedback from an Asset Management Expert

Several comments were also obtained from John S. Mitchell, a self-described "advocate of change" whose "Asset Management Handbook" (ISBN 0-971-7945-1-0) is listed in our essential library (Reference 3). John believes a meaningful comparison of MTBF must consider the service. Some pumps, because of the fluid and/or operating conditions, will have shorter life expectancies than others. Mitchell uses the analogy of a coal miner who smokes; the miner probably has a shorter lifetime than a non-smoking office worker.

John Mitchell has been trying, without success so far, to find a parameter that will, with one number, describe the distribution around an average. Distribution around an average might be the percentage or number of the total population more than 20% below the average MTBF. As an example, suppose a plant reports an MTBF of 48 months. This would be showing performance a bit below best in class in Table 1, from "Pump User's Handbook: Life Extension" (ISBN 0-88173-517-5), but doesn't say much beyond that. Knowing also that 2% of the total population was below 36 months would be useful information because it would tell us that the plant was aware of certain pumps that failed more often than others. (In many refineries that number is somewhere between 7 and 10 percent.) However, if one found that the MTBF of 25% to 30% of the population was below 36 months, our diagnosis might be quite different and the opportunities for improvement would shift to a new focus.
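The point that an average alone hides the tail can be illustrated with a short Python sketch. Both pump populations below are invented for illustration (they are not refinery data) and were chosen so that each averages 48 months while the share below 36 months differs sharply. The function name `share_below` is hypothetical.

```python
from statistics import mean


def share_below(mtbf_months, threshold_months):
    """Fraction of pumps whose individual MTBF falls below the threshold."""
    if not mtbf_months:
        raise ValueError("population must not be empty")
    below = sum(1 for m in mtbf_months if m < threshold_months)
    return below / len(mtbf_months)


# Hypothetical populations of 100 pumps each (individual MTBF in months).
plant_a = [30] * 2 + [48] * 94 + [57] * 4   # a few known bad actors
plant_b = [24] * 25 + [56] * 75             # a large troubled subgroup

for name, population in (("Plant A", plant_a), ("Plant B", plant_b)):
    print(
        f"{name}: mean MTBF = {mean(population):.0f} months, "
        f"below 36 months = {share_below(population, 36):.0%}"
    )
```

Both plants would report the same 48-month MTBF, yet the second number points to very different improvement priorities.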


[Table 1: Estimated pump mean-times-between-failures (image not reproduced)]

More Experience-based Advice You Can Use Today

The explanations offered by Paul Barringer and John Mitchell will have to be weighed by serious reliability professionals. Some of their suggestions were certainly considered in the mid-1970s when we wrote about calculating pump MTBF based on actual operating time. Yet, industry soon decided that the numbers looked better when the calculation encompassed all installed pumps, whether running or not. Moreover, we have always advocated picking the ripe, low-hanging fruit first, and we hasten to note that not everyone has heeded this advice. We are where we are and the picture is not rosy. Repeat failures abound and continue to be tolerated. Repeat failures are warning signs; they are the inevitable precursors to extreme failures, which very often kill people. To this day, we see CMMS (Computerized Maintenance Management System) software that allows log entries in words such as "bearing replaced." To be of use to devotees of equipment uptime, a system must recognize that accurate failure analysis is required for failure avoidance. The entries must properly identify why a bearing failed, and diligent failure analysis is absolutely necessary. Failure avoidance should be the ultimate goal because it means asset preservation and curtailment of money wasted on repeat repairs, not to mention costly remedial action after an extreme failure. All too often, persistent repeat failures are evidence of seriously flawed reasoning.

The engineering student employed as an intern at that refinery probably would not wish to lose the opportunity for easy tracking of pump failures. He was probably searching for answers to tasks assigned to him by others. We can only speculate that "persons unknown" are often looking for ways to bury the unacceptable performance of their refinery pumps. They would be delighted to obfuscate the issue by arguing over the most precise numerical evaluation. We, for our part, believe the most productive choice to reduce pump failures is to compare one's pump MTBF against that of other refineries and to itemize and comprehend what "others" do differently. Note that we are not advocating that you compare your refinery against any non-refineries, but you could make a relevant comparison between a given process unit at your refinery and a like process unit at another refinery.

Although such comparisons are usually made on the basis of MTBF, they are still more useful than anything else. They lead to the next and most important step towards implementing the necessary changes, i.e., intelligently upgrading pumps or systems that fail frequently. Typically, and with few exceptions, these changes must be made on pumps with low MTBF. The simple MTBF roadmap has been followed for the past 35 years; its relative success makes us comfortable with the McKenna-Oliverson definition mentioned earlier. In stark contrast, we consider endless debates over more precise or limited definitions both unproductive and all too often diversionary. In this context, debates generally solve nothing; they are mere exercises in bureaucracy. Exploring the failure history of a given pump in a given service in YOUR refinery and then comparing its reliability with that of a pump in the same service at SOMEONE ELSE'S refinery is of real value. It points the way to lasting improvement.


How to Recognize a Good Pump MTBF

Examining pump repair records (and the admittedly imperfect MTBF metric) is deemed useful for responsible and conscientious pump users. In view of that fact, the preface to the 2006 Pump User's Handbook (ISBN 0-88173-517-5) alludes to pump failure statistics. Again, and for the sake of convenience, these failure statistics are often translated into MTBF. Agreeing with McKenna and Oliverson, and because they wanted to avoid arguments on statistics, many of the best-practices plants in the early 2000's simply took the number of all their installed pumps, divided it by the number of repair incidents, and multiplied the result by the time period being observed. For a well-managed and reasonably reliability-focused U.S. refinery with 2,400 installed pumps and 312 repair incidents in one year, the MTBF would be (2,400/312) = 7.7 years. The refinery would count as a repair incident the replacement of parts, any parts, regardless of cost. In this instance, a drain plug worth $2.90 or a casing costing $8,000 would show up the same way on the MTBF statistics. Only the replacement of lube oil, a routine maintenance task, would not be counted as a repair.
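The bare-bones calculation described above can be written as a few lines of Python. This is only a sketch of the installed-pumps-over-repairs arithmetic from the text; the function name and the guard against zero repairs are illustrative assumptions.

```python
def simple_mtbf_years(installed_pumps, repair_incidents, period_years=1.0):
    """MTBF as (installed pumps / repair incidents) x observation period."""
    if repair_incidents <= 0:
        raise ValueError("at least one repair incident is needed")
    return installed_pumps / repair_incidents * period_years


# Figures from the refinery example in the text.
mtbf = simple_mtbf_years(installed_pumps=2400, repair_incidents=312)
print(f"MTBF = {mtbf:.1f} years ({mtbf * 12:.0f} months)")
```

Note that this method counts every installed pump, running or not, and every parts replacement as one repair incident regardless of cost.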

Using the same bare-bones measurement strategy, and from published data and observations made in the course of performing maintenance effectiveness studies and reliability audits in the late 1990's and early 2000's, the mean-times-between-failures of Table 1 have been estimated. As of 2008, we have reason to believe the figures are valid within a 10% range of accuracy.

It should again be emphasized that many plants are achieving these mean times before a failure occurs. Why, then, the difference between a "best-of-class" U.S. refinery and a somewhat mediocre performer? There are many reasons that account for the difference. An unsuitable seal with a lifetime of just two or three months will have a catastrophic effect on pump MTBF, as would a badly performing coupling or bearing. A good refinery frowns upon pulling piping towards the nozzle of a fluid machine; a mediocre refinery permits these disastrous procedures to continue for decades. One refinery supports its machine baseplates with epoxy grout; another refinery not only uses an inferior grout system, but might also allow it to soak with oil, degrade, and deteriorate. It is in those types of things, and in areas of lube application, bearing housing protection, mechanical seal selection, installation methods and so forth, that the "best-of-class" differ from the weak performers.


What Constitutes a "Failure"

Finally, we were asked what constitutes a failure. In particular, we would like to comment on the sordid implications of limiting the term "failure" to events costing over $1,000.

In "Glossary of Reliability and Maintenance Terms," McKenna and Oliverson defined a failure as:

"The termination of the ability of a functional unit to perform its required function; loss of function when the function is needed; the event, or inoperable state, in which any item or part of an item does not, or would not, perform as specified; any event that results in work performed on equipment, rather than scheduled preventive or predictive maintenance that requires the equipment to be shut down for repair or whose lack of repair could ultimately lead to an equipment shutdown. Synonym: malfunction."

We are much indebted to Paul Barringer for providing the many links that will facilitate serious research on reliability subjects. The "Essential Reliability Library, 2008" is the author's own recommendation. We consider it rather elementary, but representing a good first step for machinery engineers.

We accept this definition without qualification or reservation, and offer two examples that illustrate why. Years ago, a plant decided to count as failures only pumps that were taken to the shop for repair. One day, a badly mangled pump was parked on a flatbed trailer near the shop. Because the pump never entered the shop, it did not appear on the failure record kept by this facility. Another plant decided that "rework" should not be counted as a failure. The facility defined as rework any successive event occurring within three days of repair completion and restart. This plant then counted the second, or third, or fourth event as part of the same repair and made it show up only once on the refinery's failure log. Those are the games we have seen played when industry deviated from the definitions crafted by people with common sense and logic.
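To see how the second plant's "rework" rule shrinks the failure log, consider the Python sketch below. The event dates are invented, the function names are hypothetical, and as a simplifying assumption each logged date stands in for both the failure and the completion of the preceding repair.

```python
from datetime import date


def count_all_events(event_dates):
    """Count every event as a failure, as the accepted definition requires."""
    return len(event_dates)


def count_with_rework_rule(event_dates, window_days=3):
    """Fold any event within window_days of the previous one into that repair."""
    count = 0
    previous = None
    for current in sorted(event_dates):
        if previous is None or (current - previous).days > window_days:
            count += 1  # far enough from the last event: logged as a new failure
        previous = current
    return count


# Hypothetical event history for one pump.
events = [date(2008, 1, 5), date(2008, 1, 7), date(2008, 1, 9), date(2008, 3, 2)]
print("Every event counted:", count_all_events(events))
print("With rework rule:   ", count_with_rework_rule(events))
```

In this made-up history the rework rule reports two failures where four actually occurred, which would double the apparent MTBF for that pump.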

So, again answering the intern's question with an example: If an O-ring worth $2.20 allows oil to leak, it must be counted as a failure. If an impeller replacement were to cost $100,000 plus labor, it would also be called a failure. The most crucial issue identified here is the huge problem many refineries have today: It's a people and people-management problem. It's a problem with setting the wrong priorities. An individual tasked with managing equipment reliability must have the time and the motivation to read, to assemble a reference library (see below), to engage in effective root cause failure analysis, and to improve specifications for both new (future) and present (existing) equipment installed at his plant. He must also mentor others, and do so with knowledge and wisdom. If he neglects any of these duties, he should be viewed like a medical doctor lacking in those traits; society would deny him the title MD. Likewise, a mere dabbler in reliability engineering may not deserve to be called a professional. The medical analogy could also be extended to reliability practitioners who feed their minds only on the Internet. Reasonable people would never entrust life and health to a medical doctor whose knowledge was derived solely from the Internet, from its sales-driven advertisers and from conversations with the purveyors of anecdotal knowledge. Needless to say, a medical professional is taught by other experienced professionals and will consult relevant texts. It should be no different with reliability engineers working in industry.

I have compiled an essential reference library for those who wish to increase their knowledge, which can be found at the link listed in Reference 3 below. Rest assured that research done exclusively on the Internet will, at best, uncover disjointed pockets of information. The information so found will not follow a logical progression and will not even come close to conveying the coherent picture needed by true professionals.


References

  1. Paul Barringer's complete reading list can be found at the following link: http://www.barringer1.com/read.htm
  2. Dr. Bob Abernethy's website: http://www.bobabernethy.com
  3. The Essential Reliability Library 2008, a reading list compiled by Heinz P. Bloch: www.uptimemagazine.com/reading.htm

Heinz P. Bloch (hpbloch@mchsi.com) is a professional engineer with offices in West Des Moines, Iowa. He advises process and power plants worldwide on reliability improvement and maintenance cost reduction opportunities. Heinz is the author of 17 full-length texts and over 400 papers and technical articles. His most recent texts include "A Practical Guide to Compressor Technology" (2006, John Wiley & Sons, NY, ISBN 0-471-727930-8); "Pump User's Handbook: Life Extension," (2006, Fairmont Publishing Company, Lilburn, ISBN 0-88173-517-5) and "Machinery Uptime Improvement," (2006, Elsevier-Butterworth-Heinemann, Stoneham, MA, ISBN 0-7506-7725-2)

One Out of Many...


Pointing the Whole Organization in the Same Direction

By: Dr. Peter G. Martin

Although huge quantities of technology and intellectual property have been invested into the efficient and effective operation of industrial plants over the past century, many plants are still not operating to full potential. At least part of the reason for this has been the lack of focus on the value that the human assets can generate given a supportive, collaborative and empowering environment in which to perform. Mobilizing the valuable human assets to approach their full performance potential has been proven to result in a new operational paradigm which maximizes the business performance through all plant assets. This new paradigm is labeled "asset performance management".

Dealing with Labor

A major factor in driving toward effective asset performance management is a fundamental change in a mindset and culture held over from the industrial revolution. Changing such a mindset requires that we first understand what it is and where it originated. As industrialization started to ramp up in North America and Western Europe, one resource that was abundant was people to work the plants and factories. Unfortunately, the vast majority of the available human resources were uneducated and unskilled. From the perspective of today's culture it may be hard to relate to how uneducated these people really were.

Most could not read, write or do even basic arithmetic. This led to a huge industrial challenge - how to take advantage of such a resource. This challenge was met by Frederick Taylor, who developed an approach called Scientific Management, which focused on gaining maximum value from an uneducated workforce. In today's vernacular, Scientific Management essentially turned people into minimally functional robots, each performing a well contained and well defined function within the context of the operation of the entire plant or factory. For example, a person may have been trained to watch a gauge and keep it in a certain range. When the needle moved out of the range, the worker would turn a hand valve in one direction. When the needle moved out of range in the other direction, he turned the valve in the other direction. This person might join the workforce of the factory at 16 years old and retire 50 years later having performed that contained task his entire career. This led to the concept of a labor force in industrial companies which was so unskilled that management believed it could not be trusted to perform duties beyond menial tasks. In essence, the laborers were almost treated as a kind of industrial slave.

 
This view of the labor force was exacerbated with the introduction of automation technologies. In many cases, the automation technologies were developed to perform the same functions laborers had performed. For example, automatic controllers providing direct manipulation of control valves essentially were replacing the laborer who had previously been stationed at that valve. Early automation advancements may have allowed a single laborer to perform the scope of functionality that six or eight laborers had previously been doing. As computer-based automation systems were introduced, single operators may have been able to oversee functionality that would have required fifty people in the past. The basic value proposition for the introduction of automation technology was typically based on headcount reductions that could be achieved. Many manufacturers seem to have viewed these reductions as a double benefit to the company. First was the cost reduction for not having to pay the displaced laborers. But second was the thought that there would be fewer low-level laborers to have to manage and worry about.

The culmination of the technology-replacing-people trend took place in the 1980s when a number of management scientists and engineers supported a notion referred to as "lights-out manufacturing." The thought process behind this trend was that technology may have advanced to the point at which no frontline workers would be required at all, and without people in the plants there would be no need to turn on the lights. This was a short-lived movement because the technologists found they could not anticipate every possible issue or problem that may arise in a plant and that at least some number of people must be in the plant, if for nothing else, at least for contingency responses.

All of this has left a residual mindset in both industrial management and engineering that frontline personnel are a necessary evil that would be eliminated if possible. This has further led to an attitude prevalent across industry that the actions and activities of these frontline laborers have to be contained to only those essential to keep the plant operating. A good example of this mindset can be found in the design approach taken to the software in industrial workstations. This software is designed around the concept of "operation by exception," which basically means that the process operator is not supposed to do anything if the process is operating in a reasonable manner (except, perhaps, read the sports page). When something unexpected happens, an alarm will cause the operator to follow a predefined procedure that should bring the alarm condition under control. Once the alarm condition has been addressed, the operator goes back to the newspaper. Additionally, engineers have developed and deployed advanced control and other advanced techniques designed to operate the plant better than the operators could by themselves. The attitude of protecting the plant from the frontline laborers has continued, even while the average education and skill level of the labor force has been steadily rising. I have been in control rooms in which the frontline process operators all had college educations, and were still viewed as the unskilled, uneducated laborers of the early industrial revolution.

Organizational Silos

Having worked with industrial organizations for over three decades, I have frequently heard the rejoinder that "islands of automation" are to blame for the difficulties in developing higher performing operations. Although there is certainly much truth to this, I have found that "islands of organization" within industrial companies present a much more formidable barrier to performance improvement. As industrialization took hold and grew, the complexities introduced to manufacturing businesses became very challenging. In early industrial plants the same person might operate and maintain the equipment, design and commission new production areas and even account for the business. As more complex manufacturing systems have evolved, this level of generalization is just not feasible, which has led to the era of specialization.

Professionals specialized in engineering, accounting, management, purchasing of materials and shipping of finished products while frontline labor specialized in operations and maintenance of the equipment. This naturally resulted in separation of departments by function which, in turn, led to organizational silos. The development of specialists was necessary to the operation of the increasingly complex plants, but the development of organizational silos resulted in huge inefficiencies across organizations. Today it is not unusual to find maintenance departments that never directly communicate with operations or production teams. In some organizations they don't even like or trust each other. Adding to this, many IT organizations don't like or trust engineering, and the feelings are mutual. And nobody seems to get along well with accounting.

In many cases, the performance measures used to evaluate the performance of one group are in direct conflict with those of a second group. For example, maintenance teams are often measured on the availability of critical equipment assets while operators are measured on the utilization of the assets. Asset availability and asset utilization are inverse functions. That is, to increase utilization often requires the sacrifice of some availability and vice versa. Under this scenario, it is no wonder operations and maintenance teams seldom get along well.

As industry has invested huge amounts of capital into efficiency-increasing automation and information technologies, organizational silos have worked to destroy any potential value that may have been created by the technology. I recently attended an industrial conference in which an engineer estimated that over 80% of all advanced control that has been implemented in industrial plants has been turned off by the process operators because the operators don't trust it. If engineering and operations had a better working relationship, based on common goals and objectives, this might not be the case. Organizational silos have tended to sub-optimize plant performance by sub-optimizing the human performance within the plants. Perhaps it is time for industry to start moving away from long-outworn prejudices and consider using the valuable human resources more effectively to drive better plant performance.

Measuring Performance

You are probably familiar with the common adage: "people perform to their measures." I believe that this is very true. Most people want to be evaluated positively, and if they know that measures of performance exist for which they will be held accountable, they will strive to make those measures move in the correct direction. This is true whether the measures are driving desired behaviors or not. For example, measuring maintenance on asset availability and operations on asset utilization does not encourage the cooperative behaviors most industrial leaders would like to see.

In the early periods of industrialization, prior to the many inventions that drove the industrial revolution, most shops measured performance as each product was produced. Production was so slow that accounting for the business on the basis of piecemeal production was easily achieved. Management and operators of these firms knew exactly how they were performing compared to their plan at all times. But with the introduction of tools, such as the power loom in the textile industry, the pace of production increased to the point that piecemeal accounting was no longer feasible. The result was that industrial operations compromised and began measuring the business performance through monthly accounting methods. The primary output of these systems for measuring manufacturing performance was, and in most cases today still is, the variance report. Variance reports basically report the cost per unit made for each product produced over the past month and display this against a previously predicted expected value, referred to as the standard cost for the product class. This information may be acceptable for reporting manufacturing performance, but it has little value in enabling the plant personnel to change their behaviors to improve the performance of the operation. The information in the variance reports is both too little (providing a broad plant-wide perspective) and too late (after the month is over) to be of any value to the people actually working to keep the plants operating.
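For readers unfamiliar with variance reports, the arithmetic is simple enough to sketch in Python. The product names, monthly totals and standard costs below are invented for illustration, and `variance_report` is a hypothetical helper rather than a feature of any accounting package.

```python
def variance_report(actuals, standard_costs):
    """Compare actual monthly unit cost against the standard cost per product.

    actuals        -- {product: (total_monthly_cost, units_produced)}
    standard_costs -- {product: standard_cost_per_unit}
    Returns a list of (product, actual_unit_cost, standard_cost, variance).
    """
    rows = []
    for product, (total_cost, units) in actuals.items():
        unit_cost = total_cost / units
        standard = standard_costs[product]
        rows.append((product, unit_cost, standard, unit_cost - standard))
    return rows


# Hypothetical month of production for two product classes.
actuals = {"Product A": (10500.0, 1000), "Product B": (4800.0, 500)}
standards = {"Product A": 10.0, "Product B": 10.0}

for product, unit_cost, standard, variance in variance_report(actuals, standards):
    print(f"{product}: actual {unit_cost:.2f}, standard {standard:.2f}, "
          f"variance {variance:+.2f} per unit")
```

One line per product per month is all the report provides, which illustrates why it says little about which specific actions during the month caused the variance.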

Monthly accounting systems for reporting of manufacturing and business performance represented a compromise introduced to industry out of necessity. The tools just did not exist to measure plant performance as the plant was running. Over many years, industry got lulled into believing that monthly financial reporting was a best practice that should never be challenged. Accounting professionals earned Masters Degrees on how to do monthly accounting. Once degrees are conferred on how to do any practice, it is very challenging to ever question the validity of the practice again. Therefore, when digital computers were generally introduced into industrial operations during the 1960s and 1970s, nobody seemed to raise the question as to whether accounting and performance measurement systems might be able to be developed to account for operations as originally intended - as the products are made - in real time.

Since monthly accounting measures from cost accounting systems proved to be fairly useless in directing the actions of the operations and maintenance teams, a number of leading industrial companies started to develop a different set of operations performance measurements to supplement the accounting systems by providing more actionable feedback to plant personnel. The measures produced by these systems are commonly referred to as key performance indicators (KPIs). These KPIs were not developed to replace the accounting measures; rather, they were developed because engineers and managers did not view the measures produced in the accounting systems as adequate for directing performance and improving actions in the plant. KPIs were typically developed to measure different operational silos within plants, such as maintenance, operations and engineering. By focusing on specific functions, they tend to offer better resolution, as well as better timeliness, than accounting measures. However, by being functionally focused, they also tend to discourage cooperation between organizational groups. Even though daily measures provided a great leap forward from traditional monthly measures, frontline personnel often find daily measures too long a timeframe to offer actionable feedback. A single operator may take hundreds of specific actions each day, and an overall daily measure does not provide the timeliness for them to understand the performance impact of any specific action.

To make matters worse, KPIs tend to have little credibility with accountants, whose job it is to measure the business performance. Although many KPIs may report in monetary terms, accountants often have great difficulty reconciling the values reported through the KPIs with the values in the accounting reports. When this happens, the accounting information clearly takes precedence. I actually heard one CFO say, "If one more engineer comes to me with one more KPI telling me how much value he has created, I'll fire his $&*!"

Dynamic Performance Measures

The value of an effective and comprehensive performance measurement system cannot be overstated when it's working to drive increased levels of performance from plant assets. Industry has reached the point where the performance measures that encourage the organizational silo mentality have to be abandoned in favor of measures that drive collaboration between traditionally competing functions. A new approach to performance measurement is required, one that combines the strengths of accounting and operational measures and provides performance measures for every person in the operation, within the time frame in which they do their job and for the domain for which they are responsible. Such performance measures are referred to as dynamic performance measures (DPMs; see Figure 1).

The first issue that has to be addressed in developing a DPM approach is the availability of a database that provides real time input data. Fortunately, in most industrial plants, such a database is readily available in the form of plant sensors. Plant sensors continually measure physical and chemical properties, such as flow, level, temperature, pressure, speed and composition of process variables in real time. They are typically accessible by the installed automation systems and are used to monitor and control the process. Since both accounting and operational measures can be defined via equations, an experienced engineer can develop models of the equations in the automation system and determine which sensors can be used to populate the models needed to calculate the DPMs. The net result is a set of performance measures for each process unit or work cell in the plant.
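As a rough illustration of how an accounting-style measure might be expressed as an equation over live sensor readings, here is a minimal Python sketch. The sensor names, units, prices and the value-rate formula itself are all assumptions made for the example; they are not taken from the article or from any particular automation system.

```python
from dataclasses import dataclass


@dataclass
class SensorSnapshot:
    """One set of readings for a process unit (hypothetical tags and units)."""
    product_flow_t_per_h: float  # product flow, tonnes per hour
    feed_flow_t_per_h: float     # raw material feed, tonnes per hour
    power_kw: float              # electrical power draw, kW


# Hypothetical prices; a real system would take these from business data.
PRODUCT_PRICE_PER_T = 400.0
FEED_COST_PER_T = 250.0
ENERGY_COST_PER_KWH = 0.08


def value_rate_per_hour(s: SensorSnapshot) -> float:
    """Assumed DPM: product revenue minus material and energy cost, per hour."""
    revenue = s.product_flow_t_per_h * PRODUCT_PRICE_PER_T
    material = s.feed_flow_t_per_h * FEED_COST_PER_T
    energy = s.power_kw * ENERGY_COST_PER_KWH
    return revenue - material - energy


snapshot = SensorSnapshot(product_flow_t_per_h=10.0,
                          feed_flow_t_per_h=12.0,
                          power_kw=500.0)
print(value_rate_per_hour(snapshot))  # prints 960.0 under the assumed prices
```

Re-evaluating such a function every time new readings arrive is what would make the measure "dynamic" in the sense described above.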

[Figure 1: martin01decjan09.jpg]

In most plants there are simply too many measures for any one frontline person to deal with in real time. When working in real time environments, such as driving a car or operating a plant, ergonomic research has determined that most people can only consider up to four competing measures at a time. The question is which four measures are most appropriate for each person in the operation. This can be determined by taking the current manufacturing strategy into consideration. Dr. Thomas Vollmann developed a strategy analysis approach that can be very helpful in determining the DPMs for each person in the operation. The Vollmann Triangle diagram (see Figure 2) is helpful in understanding his approach. He points out that every plant should be working to a strategy designed to maximize the economic value of the plant output within the external and internal environment in which the plant is operating. Each manufacturing strategy should be defined by a set of actionable strategic objectives for the plant. An action plan, in which each action step is measurable, should be developed for each objective. The measures that fall out of the action steps are the strategic performance measures of the plant. These measures can be decomposed through the physical areas, units and major assets of the plant to determine the most important measures for each process unit according to the current strategy. The result can then be used to prioritize the real time KPI and accounting measures for each person who impacts the performance of the operation, and the prioritized measures can be presented on a performance dashboard contextualized to each person's responsibility. These are the DPMs of the frontline operators.
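The prioritization step can be pictured with a small Python sketch. It assumes that the strategy decomposition has already produced a numeric weight for each candidate measure; the measure names and weights below are hypothetical, and ranking by a single weight is a simplification rather than Vollmann's method.

```python
MAX_MEASURES = 4  # the limit on competing real-time measures cited above


def dashboard_measures(candidates: dict[str, float],
                       limit: int = MAX_MEASURES) -> list[str]:
    """Return the highest-weighted measure names, at most `limit` of them."""
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:limit]]


# Hypothetical strategic weights for one operator's process unit.
boiler_operator = {
    "energy cost per tonne of steam": 0.9,
    "steam production value": 0.8,
    "emissions": 0.6,
    "fuel inventory": 0.2,
    "maintained state": 0.7,
    "water use": 0.3,
}

print(dashboard_measures(boiler_operator))
```

When the strategy changes, only the weights need to change; the same selection then yields a different set of four measures for the same person.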

[Figure 2: martin02decjan09.jpg]

Developing these DPMs requires a real time computer engine that has built-in modeling capability. This is exactly what a standard automation system is. These DPMs must then be aggregated to provide performance measures in real time for every other function within the plant. This can easily be accomplished by using a standard process historian, which can also develop hourly, shift, daily, weekly and monthly accumulations of the DPMs. The availability of a comprehensive, real time, bottom-to-top performance measurement system provides the potential to drive improved performance in a number of ways previously unavailable to industrial operations. The basic value improvement that can be realized through better individual performance of frontline personnel, who can immediately see how their actions impact plant performance, has been proven to provide huge performance gains. However, this is only a starting point.
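The roll-up of real time DPM values into longer periods can be sketched in a few lines of Python. This is only an illustration of the accumulation idea, not the interface of any actual process historian; the sample format (a timestamp plus the value earned in that interval) is an assumption.

```python
from collections import defaultdict
from datetime import datetime


def hourly_totals(samples):
    """Sum (timestamp, value) samples into buckets keyed by the start of each hour."""
    totals = defaultdict(float)
    for timestamp, value in samples:
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        totals[hour_start] += value
    return dict(totals)


# Hypothetical samples: value earned by one unit in each reporting interval.
samples = [
    (datetime(2008, 12, 1, 8, 5), 100.0),
    (datetime(2008, 12, 1, 8, 35), 120.0),
    (datetime(2008, 12, 1, 9, 10), 90.0),
]

for hour, total in sorted(hourly_totals(samples).items()):
    print(hour.isoformat(), total)
```

Shift, daily, weekly and monthly totals would follow the same pattern with a different bucketing rule applied to the timestamp.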

A New Perspective on Asset Performance Management

The availability of DPMs enables asset performance management in ways previously unavailable. As previously mentioned, traditional asset management involves operators driving the assets to maximize asset utilization while maintenance teams maintain the assets to maximize asset availability. It is important to understand that neither asset availability nor asset utilization is a measure of the business objectives of any plant. Since the two measures tend to work against each other, operators and maintenance teams are frequently at odds. So, in essence, traditional performance measurement systems tend to discourage cooperation and collaboration.

It's quite useful to use an analogy from the world of sports since nearly all professional sports are performance-driven. In automobile racing, the driver is analogous to the operators in industrial plants and the pit crews are analogous to the maintenance teams. In interviewing a NASCAR driver and a pit crew chief, I noticed how well they tended to cooperate. I asked them if, as is common in industrial plants, the pit crew was measured on the availability of the car and the driver on its utilization. They told me that although utilization and availability (or maintained state, which may be a much better measure than classic availability) are important, the primary measure of both is winning the race. I asked the pit crew chief if, upon detecting a problem with the car that might negatively impact the maintained state, he would call the car into the pit. He said, "Only if the problem means we won't win the race." Then I asked the driver if he would refuse to come into the pit if called in by the crew chief. He said, "No way, I know he is calling me in because I'll lose the race if something is not done." You see, for both parties, the primary focus is winning. And since they have a shared focus, they not only trust each other, but they cooperate extremely well. So how can we define "winning" for frontline maintenance teams and operators in industrial plants to engender the same level of cooperation and even collaboration?

Most plant management teams are measured on driving the maximum production value from the plant assets over an extended period of time. Certainly the utilization and availability of each plant asset impacts business value, but neither should be treated as the primary measure of performance of any industrial operation. The real victory in industrial plants is driving the maximum business value from each plant asset over time. If every operator and maintenance person has a primary measure based on this win, the behaviors of each will change drastically and the behavior of the plant will follow suit. Industrial companies must empower frontline teams with the information, in the form of DPMs, which will drive both collaboration and continuously improving business value from all plant assets.

[Figure 3: martin03decjan09.jpg]

Asset Performance Management (APM) driven by DPMs results in operations and maintenance working together to balance plant operations for optimal business value in all circumstances. The primary measure of both frontline teams is business value. Secondary measures for maintenance include the maintained state of the equipment and the probability of a failure over time. Secondary measures for operations include operation to maintained state and the probability of a failure over time. With all DPMs prioritized to the manufacturing strategy in place, every person in the organization will be pulling in the same direction. They will all be focused on "winning." They will all be focused on doing their part, but, even more productively, doing it within the context of the overall performance of the operation.

An interesting symmetry develops between operations teams and maintenance teams when a true asset performance management approach is taken. Both operations and maintenance have advanced in three steps with the evolution of technology in each area over time. Technology impacted operations by first providing regulatory control, followed by advanced control, then followed by process optimization. Maintenance had a similar progression from reactive, to preventive and then to predictive maintenance. As each progression was underway, the KPIs for each function were used to measure progress. The next step, asset performance management, occurs when the two frontline functions converge around new measures of performance that combine accounting and operational measures into a comprehensive, prioritized performance measurement system called DPM. This is the point at which cross silo collaboration takes hold and breakthrough levels of performance are attained.

Summary

Industry is on the verge of a major new wave in performance improvement driven by collaboration across organizational silos guided by Dynamic Performance Measures. For this new wave to really take hold, industrial management and engineering have to escape from the residue of the industrial revolution and stop thinking of operations and maintenance teams as an unskilled, uneducated labor force.

Minute by minute, frontline personnel are responsible for making, or losing, more money for most industrial operations than any other group in industry. It is time we start treating them as the performance managers they are by empowering them with DPMs.

On top of this, industrial management must start to break down the organizational silos that have existed in plants for decades while simultaneously preserving the specialized knowledge and capability of each team in the plant. Again, this can be achieved by empowering the teams with the correct performance measures that define the "win" for the business. When this is accomplished, the result is a new performance-generating collaborative approach to plant operation called asset performance management. Asset performance management is the industrial performance wave that is just starting to crest. Those industrial concerns that catch this wave will be the performance leaders of this new millennium.

Peter G. Martin, PhD, D. Eng., joined The Foxboro Company in the 1970s and has worked in a variety of positions in training, engineering, product planning, marketing and strategic planning. He left Foxboro to become Vice President at Intech Controls and also at Automation Research Corporation before returning to Invensys in 1996. Since his return, he has been VP of Marketing for Foxboro and Chief Marketing Officer for Invensys Manufacturing and Process Systems prior to moving into his current position, VP Strategic Ventures. He has written two books: Bottom Line Automation and Dynamic Performance Management: The Pathway to World Class Manufacturing. Dr. Martin holds multiple patents, including patents for Dynamic Performance Measures, Real-Time Activity-Based Costing, Closed-Loop Business Control, and Asset and Resource Modeling, which are the basis for Fortune recently naming him a Hero of U.S. Manufacturing. He was also recently named as one of the 50 Most Influential Innovators of All Time by the Instrumentation, Systems, and Automation Society (ISA). Dr. Martin has BA and MS degrees in Mathematics, an MA degree in Administration and Management, a Master of Biblical Studies degree, a D. Eng. in Industrial Engineering and a PhD in Biblical Studies.


