SECTION I

## INTRODUCTION

The global demand for products with ultrahigh reliability and lower life cycle costs is driving the need for a good design for reliability (DFR) program. However, theoretical knowledge is not enough. This paper discusses six DFR heuristics, developed from years of experience, that need to be applied to ensure a successful design. Heuristics are words of wisdom based on robust knowledge and experience. The following heuristics are discussed with examples:

• Spend significant effort on requirements analysis
• Safety critical failure is not an option
• Measure reliability in terms of total life cycle cost, not the component cost
• Learn to say no to yes men
• Don't just design for reliability, design for durability
• Design for prognostics to minimize surprise failures

SECTION II

## SPEND SIGNIFICANT EFFORT ON REQUIREMENTS ANALYSIS

My experience of over 30 years suggests that about 60 percent of projects fail due to incomplete, missing, or vague requirements. Since such a high risk exists in the requirements area, it makes sense to put extra effort into writing proper requirements by performing a detailed requirements analysis. If the requirements are not comprehensive, many costly design changes may be required at later stages of the development cycle, which adversely impacts budget and schedule. At a minimum, capture the requirements given below:

• Customer Requirements
• Functional performance requirements, including what the product shall never do (such as delivering sudden acceleration in an automobile)
• Reliability requirements
• Durability Requirements (Duty Cycles of hardware and software for the expected life)
• Manufacturing line yield requirements
• Environmental requirements (temperature, humidity, altitude during operation and storage)
• Serviceability/Maintainability requirements (time to repair, tools required to repair and test, service intervals, etc.)
• User Interface requirements (to ensure ease of use, alerts, and robustness against human mistakes through human factors analysis)
• Input/Output Interface Requirements (for devices getting inputs from other devices or users and providing output to other devices or users)
• Installation Requirements (ease of installation and mistake-proofing)
• Shipping/Handling Requirements (vibration loads, shocks, protective packaging)

It is extremely important that a product meets its customers' intended use requirements, not just the designer's specification, because if a product does not meet the user's expectations, reliability is of very little value. A QFD (quality function deployment), or house of quality, is a great tool to capture customers' expectations. It captures qualitative customer expectations and converts them into quantitative engineering requirements. It helps you prioritize the requirements and benchmark your competitors' products and your predecessor products against them. It also helps you make informed decisions if you want to change your requirements after the product requirements document is written.

After you have captured all the requirements mentioned above in a product requirements document, build a use/misuse model to drive design and testing. A good design and reliability test plan should consider all the possible stresses a device will experience in the field throughout its life. The use/misuse model captures how, when, where, and by whom the product may be used or misused over the life of the product. In other words, it considers the environment of use (temperature, altitude, humidity, cleanliness, home/hospital, height from floor, portable/stationary, country of use, etc.), the user demographics (seniors/adults/children, non-responsive or responsive users, etc.), and the frequency of use (continuous 24×7 operation or 8 hours a day operation, number of cycles on a part that moves, etc.).

SECTION III

## SAFETY CRITICAL FAILURE IS NOT AN OPTION

When it comes to the safety of users, such as the passengers on a Boeing 747 or people exposed to nuclear leaks, failure is not an option. Even simple devices such as lasers, MRI scanners, implantable pacemakers, and fire alarms are safety critical. Such devices require very high reliability standards. All life-threatening failures are safety critical. Since a high reliability standard is hard to maintain in today's environment of intense pressure for shorter product-cycle times and stringent cost constraints, a good DFR process will help ensure that the device is highly reliable at a lower life cycle cost of ownership [1].

Since critical failure is not an option, an FMEA (failure modes and effects analysis) or a fault tree analysis should always be performed to identify all failure mode scenarios. Make sure the usage scenarios identified in the use/misuse model are considered during the analysis. Once all failure modes are identified, select the failure modes that could cause critical failures and try to mitigate them through design, if possible. For example, if a joint on a pressurized container can leak because of excessive variation in its components, one can prevent the failure by choosing a jointless container design. If there are no joints, there will be no leaks.
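The selection step can be sketched in a few lines of Python. This is only an illustration, not the author's procedure: the 1 to 10 rating scales, the severity threshold of 9, the risk priority number (severity × occurrence × detection, a common FMEA convention), and all example ratings are assumptions added here.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    severity: int    # 1 (negligible) .. 10 (life threatening); assumed scale
    occurrence: int  # 1 (remote) .. 10 (frequent); assumed scale
    detection: int   # 1 (almost always detected) .. 10 (undetectable); assumed scale

    @property
    def rpn(self) -> int:
        # Risk priority number, a common FMEA ranking convention.
        return self.severity * self.occurrence * self.detection


def critical_modes(modes, severity_threshold=9):
    """Return the modes at or above the severity threshold, highest RPN first."""
    critical = [m for m in modes if m.severity >= severity_threshold]
    return sorted(critical, key=lambda m: m.rpn, reverse=True)


# Hypothetical ratings for illustration only.
modes = [
    FailureMode("Joint leak on pressurized container", 10, 4, 6),
    FailureMode("Label fades", 2, 5, 2),
    FailureMode("Relief valve sticks closed", 9, 2, 8),
]

for mode in critical_modes(modes):
    print(f"{mode.description}: RPN = {mode.rpn}")
```

Filtering on severity first, rather than on the RPN alone, keeps a rare but life-threatening failure mode from being masked by a low occurrence rating.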

If it is not possible or viable to design out the critical failure mode, the team must consider alternate design strategies. The alternate strategies should consider fault tolerance in the form of active or passive redundancy. For example, a medical lab device required a fan to cool its thermal electronics. The failure of the fan could result in false positives in the test results. The problem was solved by using two lower-priced fans in active redundancy. That is, the two low-capacity fans shared the cooling load, but if one of them failed, there was still sufficient cooling to prevent false positives.
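The benefit of active redundancy can be estimated with the standard parallel-system formula, $R = 1 - \prod_i (1 - R_i)$. The sketch below assumes that the fans fail independently and that either fan alone provides sufficient cooling; the reliability values are hypothetical and do not come from the device described above.

```python
def parallel_reliability(unit_reliabilities):
    """Reliability of an actively redundant group that works if any one unit works.

    Assumes independent failures and that a single surviving unit can
    carry the full load.
    """
    prob_all_fail = 1.0
    for r in unit_reliabilities:
        prob_all_fail *= 1.0 - r
    return 1.0 - prob_all_fail


# Hypothetical mission reliabilities, for illustration only.
single_expensive_fan = 0.95
two_cheap_fans = parallel_reliability([0.90, 0.90])

print(f"One higher-priced fan:   {single_expensive_fan:.3f}")
print(f"Two lower-priced fans:   {two_cheap_fans:.3f}")
```

Under these assumptions, two less reliable fans in parallel can outperform one more reliable fan, which is the reasoning behind the design choice. Common-cause failures (for example, a shared power supply) would reduce the benefit and are not modeled here.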

SECTION IV

## MEASURE RELIABILITY IN TERMS OF TOTAL LIFE CYCLE COST, NOT THE COMPONENT COST

Reliability is defined as the probability that a system will perform its intended function without failure, under stated conditions, for a stated period of time. Typically, people without a statistical background have a hard time understanding the true impact of reliability or unreliability when it is stated in terms of probability, confidence levels, distribution models, etc. Hence, always measure and explain the impact of reliability or unreliability with an estimate of the total life cycle cost for a component or device. It is unwise to buy a cheap component that results in warranty costs several times the price of the component. Your goal should be to pay a higher price and incur zero warranty costs, because the return on such an investment is very high.

RAND Corporation data shows that warranty cost is inversely proportional to the reliability of a device [2]. Hence, more and more manufacturers are willing to invest in reliability-related tasks to reap the benefits of reduced warranty costs. A 5% increase in reliability-focused development costs is likely to return a 10% reduction in warranty costs. A 20% increase in reliability-focused development costs will typically reduce warranty costs by half. Make sure you include all the tangible as well as intangible costs of failures, not only the warranty costs. Include the cost of repair and maintenance, the cost of downtime, the cost of losing unsatisfied customers, and litigation costs, for the expected design life of the product.

An example of life cycle cost estimation: A simple assembly design required buying a shaft and a gear and welding the two pieces together. The manufacturing and supply chain engineers would procure the best piece parts and a very reliable welding machine for mass production. One obvious reliability problem everyone overlooked was that, no matter how well the welding process is controlled, some welded joints are likely to fail at different points in the life of the product, either during or after the warranty period. Customers will have to pay for repairs and go through the frustration of downtime. The designers should therefore evaluate whether welding is the right strategy for this design. A better design strategy may be a single-piece design that eliminates the need for welding, thus improving the reliability of the product. However, the single-piece design requires a higher initial cost due to a more expensive casting process. An analysis needs to be performed to compare the cost of preventing the failure with the cost of fixing the failure if it occurs.

The company assumed that users expect this assembly to last five years. Therefore, the following life cycle costs (LCC) were assessed over a five-year period (M stands for millions).

${\rm LCC}={\rm Parts\ Cost}+{\rm Inspection/Test\ Costs}+{\rm Scrap/Rework\ Costs}+{\rm Warranty\ Costs}+{\rm Safety\ Costs}$

${\rm LCC\ for\ two\ piece\ design}=6.5{\rm M}+2.5{\rm M}+3{\rm M}+20{\rm M}+5{\rm M}=37{\rm M}$

${\rm LCC\ for\ single\ piece\ design}=8.5{\rm M}+0+0+0+0=8.5{\rm M}$

Note that the parts cost for the single piece design is higher by 2M since it includes the higher initial investment in the casting process. The other costs in the single piece design are zero because there is no longer any joint failure. It is possible that the single piece design has new failure modes, and the costs of these new failures should be included. In this case, they were insignificant. The potential life cycle savings for this project are $37{\rm M}-8.5{\rm M}=28.5{\rm M}$. Therefore, the return on investment of preventing the failure from occurring is $28.5{\rm M}/2{\rm M}=14.25$, or $1425\%$.
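The comparison can be reproduced with a short script that sums the five cost terms listed above for each design, so the totals are computed rather than hard-coded. The dictionary keys and function name are illustrative choices, not part of the original analysis; the cost figures are the ones given in the example, in millions.

```python
COST_TERMS = ("parts", "inspection_test", "scrap_rework", "warranty", "safety")


def life_cycle_cost(costs):
    """Sum the five LCC terms (all values in millions)."""
    return sum(costs[term] for term in COST_TERMS)


two_piece = {
    "parts": 6.5,
    "inspection_test": 2.5,
    "scrap_rework": 3.0,
    "warranty": 20.0,
    "safety": 5.0,
}
single_piece = {
    "parts": 8.5,
    "inspection_test": 0.0,
    "scrap_rework": 0.0,
    "warranty": 0.0,
    "safety": 0.0,
}

savings = life_cycle_cost(two_piece) - life_cycle_cost(single_piece)
extra_investment = single_piece["parts"] - two_piece["parts"]
roi = savings / extra_investment

print(f"Two piece LCC:    {life_cycle_cost(two_piece):.1f}M")
print(f"Single piece LCC: {life_cycle_cost(single_piece):.1f}M")
print(f"Savings:          {savings:.1f}M")
print(f"ROI:              {roi:.0%}")
```

Keeping each cost term as a separate entry makes it easy to add the cost of any new failure modes introduced by the single piece design and to see how sensitive the return is to the warranty estimate.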

SECTION V

## LEARN TO SAY NO TO YES MEN

Why do most engineers not design for a high return on investment? There are at least two reasons. First, the majority of engineering schools treat reliability as a statistical tool rather than as a profit-making machine. They fail to emphasize its profitability to a company. The second reason is even more startling. Most schools teach consensus agreement on the design. This guarantees a low return on investment, because team members who have too many unproductive meetings to attend vote “yes” just to get the meetings over with quickly. To get a high return on investment, we need to do the opposite: go against the consensus. The team members must find faults with the design. That is when the great ideas happen!

How can we make a good decision on complex interactions until we see the situation from different points of view? If we introduce a new process or protocol, we need to see potential problems and hear about potential harm from the experience of service personnel, inexperienced and experienced users who may not be direct customers (consumers), quality assurance staff, manufacturing engineers, and customers. Each will have a different concern. Let them challenge the oversights and omissions based on their experience. Each team member should mentally perform a risk assessment before saying “yes.” Dr. Edward de Bono, the author of Six Thinking Hats, suggests having about six members on a team, with each assigned to look at a situation from a specific point of view. Since each participant is assigned to view the problems from a different perspective, the process encourages balanced and comprehensive participation. At a minimum, the team leader should ask why each member is voting “yes.”

SECTION VI

## DO NOT JUST DESIGN FOR RELIABILITY, DESIGN FOR DURABILITY

Customers who look for reliability also look for durability. You would not buy a refrigerator that is reliable during the one-year warranty period and then starts failing soon afterward so that the manufacturer can make more profit on spare parts and service calls. As a consumer, I want a refrigerator to last at least 10 years without a failure. The same goes for a car I buy. Manufacturers are now competing on reliability over a long life, which is durability. Hyundai's market share was going down until the company started offering a 10-year warranty on the powertrain. From my observation of industry reports from time to time, its market share has gone up since.

To design for durability, we need to recall our Engineering 101, where we were asked to design with a 100% safety margin. This seems to be a forgotten art. Imagine a bridge designed for 20-ton trucks. It may have no problems in the beginning, but the bridge degrades over time. After five years it may not be strong enough to take even 15 tons, and it is very likely to collapse. If there is a stress concentration anywhere, it may collapse under even a 10-ton load. If the bridge had been designed for 40 tons, it could be very safe even in the presence of a stress concentration. This is the same 100% design safety margin we were taught in engineering schools. For the same reason, electronic components in the aerospace and medical industries are de-rated by 50% so that the actual load or stress is less than 50% of the strength. This also helps to address deterioration of the components due to wear-out. We can call this paradigm “Design for Twice the Load or Twice the Life.” Fortunately, with a little creativity, designing for twice the strength can be very cheap. At a company in Michigan, the design life was increased fourfold just by changing the heat-treating method on a shaft and key assembly. Similarly, the fuselage failures around the windows of the first commercial jet airliner (the Comet) were addressed in the redesign without increasing the thickness of the fuselage. The designers just changed the radius of the corners on the windows.
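The two rules of thumb above, a 100% safety margin and 50% derating, reduce to simple ratio checks. The following sketch uses hypothetical function names and the bridge figures from the example; it illustrates the arithmetic only and is not a structural or component qualification method.

```python
def safety_margin(strength, load):
    """Safety margin as a fraction of the load.

    1.0 means the strength is twice the load (a 100% margin);
    a negative value means the load exceeds the strength.
    """
    return strength / load - 1.0


def meets_derating(stress, rated_strength, derating=0.5):
    """True if the applied stress is within the derated fraction of the rating."""
    return stress <= derating * rated_strength


# Bridge carrying 20-ton trucks, using the figures from the example above.
print(safety_margin(strength=40, load=20))  # designed for 40 tons: 100% margin
print(safety_margin(strength=20, load=20))  # designed for 20 tons: no margin
print(safety_margin(strength=15, load=20))  # degraded to 15 tons: negative margin

# A component rated for 10 units of stress, derated by 50%.
print(meets_derating(stress=4, rated_strength=10))
print(meets_derating(stress=7, rated_strength=10))
```

Running the same check against the expected end-of-life strength, rather than the as-built strength, shows whether the margin survives degradation.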

SECTION VII

## DESIGN FOR PROGNOSTICS TO MINIMIZE SURPRISE FAILURES

The purpose of prognostics is to detect the symptoms of malfunctions and failures and to warn the user well before a product actually fails. Cars are an excellent example of the implementation of prognostics principles. The warnings displayed on the dashboard give us an early indication of problems such as low oil, low pressure, or an engine running hotter than usual, which could eventually lead to a failure. The dashboard even indicates the need for a maintenance visit, per the manufacturer's recommended schedule, before a potential issue occurs.

Prognostics in devices analyze the data collected from various types of sensors in real time to diagnose performance problems, discern impending faults, and schedule maintenance procedures. This may involve monitoring the vital signs of the device and using artificial intelligence to analyze what could go wrong, diagnose potential problems, and suggest an intervention. Let us take an example of a medical device [3]. The output of a pressure feedback sensor of a ventilator is fed into a microprocessor, and the data is used to adjust the motor speed to control the pressure delivered to the patient. The sensor output signal has to stay within a certain range. If this output signal starts drifting toward either limit of the range, the prognostic circuit that monitors the sensor output signal should display a “schedule service call” message. This way, preventive action can be taken before a failure actually occurs, shuts down therapy, and puts patient safety at risk.
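A minimal sketch of such a monitor is shown below. The function name, the warning band (10% of the allowed span at each end), and the example range of 0.5 to 4.5 are assumptions made for illustration; a real device would likely filter the signal and track its trend over time rather than classify single readings.

```python
def prognostic_status(reading, low, high, warn_fraction=0.1):
    """Classify a sensor reading against its allowed range [low, high].

    Returns "fault" outside the range, "schedule service call" when the
    reading is within warn_fraction of the span from either limit, and
    "ok" otherwise.
    """
    if not low <= reading <= high:
        return "fault"
    band = warn_fraction * (high - low)
    if reading < low + band or reading > high - band:
        return "schedule service call"
    return "ok"


# Hypothetical sensor output range, for illustration only.
LOW, HIGH = 0.5, 4.5

for reading in (2.5, 4.3, 0.7, 4.8):
    print(reading, prognostic_status(reading, LOW, HIGH))
```

The warning band is the design parameter that matters: it should be wide enough that the time the signal takes to drift from the warning threshold to the hard limit exceeds the time needed to schedule and complete a service call.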

SECTION VIII

## CONCLUSION

Apply the paradigms covered in this paper prior to approving specifications to ensure a successful design for reliability and safety. Wrong and incomplete specifications will result in a wrong design.

## Footnotes

D. Raheja is with Raheja Consulting, Inc., Laurel, MD 20708, USA (draheja@aol.com).
