TAG | troubleshooting
As we all know, there are 10 kinds of people in the world.
For those of you who haven’t read Zen and the Art of Motorcycle Maintenance by Pirsig, he spends at least one chapter at the beginning talking about how we naturally tend to divide things into smaller pieces in an effort to understand them. The novice looks at a motorcycle and sees the visible things, like a seat, handlebars, and wheels, but the expert sees a fuel system, a cooling system, and the suspension. The same thing or system (motorcycle) can be subdivided different ways depending on what we want to do with it.
My tongue-in-cheek title of this post is an acknowledgement of the many ways we can categorize something like Automation Software, but for my purposes today, I’m making two categories: hammers and levels.
A carpenter carries both a hammer and a level, but the two have fundamentally different failure modes. If a hammer stops working, you’ll know it as soon as you try to use it. As long as it hammers in a nail, it doesn’t matter if the hammer is rusty, dirty, scratched or dented, it’s a working hammer. The level, on the other hand, is a measuring instrument. As novices, we assume that it comes from the factory pre-calibrated, and we happily hang our shelf or picture without testing it, but a professional carpenter knows that they have to check their levels for accuracy, or else the level is useless. You could use a level for years, but if one day it stopped being accurate, you probably wouldn’t know. This is a very different situation than the hammer.
Software in general, and automation software in particular, both have similar examples. You never need to “calibrate” the Axis 1 Advanced proximity switch on a machine because if it doesn’t work, the machine won’t make parts (and you’ll know about it instantly, usually via a 2 am phone call). On the other hand, testing data collection logic is surprisingly difficult because the only way to test it is to compare it with a known-good equivalent. Assuming you created this data collection logic to automate away a manual process, the only measuring stick we can check it against is the manual process we’re replacing. Once the system is bought off and we get rid of the paper system, how do you prove that subsequent changes don’t break the data collection system?
It’s tempting to brush off the problem by saying that anyone who makes a subsequent change has to do a full regression test of the system, including the data collection system, but anyone who has worked in a real factory environment knows that this is unlikely to work in practice. Full regression tests are expensive.
In the greater software world, they use automated unit tests. They take the logic being tested and they run it through a series of automated checks to make sure nothing changes. This works well in an environment like PC programming, but is very difficult in practice for PLC programming because (a) you usually need a physical PLC to execute the logic (unless you have some kind of emulator) and (b) the people maintaining the system are likely not familiar with concepts like unit tests, and are likely to undervalue their importance.
This screams for a system-level solution. Take accounting for instance. Double-entry accounting (the use of debits and credits to force every action to be made twice) is deliberately created to help catch manual entry errors. If your debits and credits don’t balance, you know you’ve made a mistake somewhere, and you go back and check your arithmetic.
In the automation world, the solution is to measure every input to the data collection system two ways, analyze and aggregate both separately, and compare the end results. Create a system warning or fault if the results don’t match. For instance, measure the amount of material going into the machine, and measure the amount of material exiting the machine, both as finished product, and scrap. If the input doesn’t match the sum of the outputs over the same time period, you know you have a problem. The system becomes self-checking (a hammer rather than a level).
If you follow this route, you need to take care to avoid some common traps:
- Don’t re-use logic between the two sides (in fact, try to make them work differently)
- Try to use different sensors or sensing methods (can we measure the input by speed and duration, and the output by parts and scrap weight?)
- Record both, so if there is a discrepancy, you can check them against manual measurements
It sounds like more work, but making the system self-checking actually reduces the amount of testing you have to do, so it’s not as bad as you think. Besides, writing code is a lot more fun than testing it. We automate everyone else’s job, why not the boring parts of ours?
I’ve been working on an HMI application recently (RSView ME). At the end of the day I saved everything, closed the project, then used Application Manager to backup the application to an .apa file and store it on the server.
This morning I opened the local copy of the application, and a significant portion of the work I had done yesteday was “undone”. It was only one of the displays, but all of the changes were missing. I restored a copy of the backup, and it was the same there. Now I’m pretty paranoid about saving my work, so I probably saved that display about 50 times yesterday, and it was definitely saving somewhere. The question was, where?
Here’s what I did to get myself into this problem:
- Started by copying one display (let’s say “A”) to make a new one (let’s say “B”).
- I then renamed the duplicate (let’s say from “B” to “C”). I probably didn’t close the display before renaming it (which might have caused the problem).
- I made all of my changes to the display, saving along the way.
It turns out that the changes I made were being saved against “B”, not “C” even though “B” didn’t really exist anymore. I was able to recover from this problem with the following procedure:
- Close the application (but leave RSView Studio open if you want)
- Goto C:\Documents and Settings\All Users\Documents\RSView Enterprise\ME\HMI projects\[proj-name]\Gfx
- Rename C.Gfx to D.Gfx
- Rename B.Gfx to C.Gfx
- Open the application again in RSView Studio. The changes should be restored.
I hope that helps someone one day. 🙂