You may have noticed I recently added a new section to this site: Patterns of Ladder Logic Programming. My goal, as usual, is to try to help new ladder logic programmers come up to speed faster and without all the trial and error I had to go through.
The new Patterns section is an attempt to distill ladder logic programs into their component parts. I assume the reader already knows the basic elements of ladder logic programming, such as contacts, coils, timers, counters, and one-shots. The patterns describe ways of combining these elements into larger patterns that you’re likely to see when you look through real programs. In my experience, you can program 80% of the machines out there by combining these patterns in applicable ways.
The Patterns section isn’t complete yet, but I will be adding to it slowly over time. If you think of a pattern that’s blatantly missing, please send me a note so I can include it.
Beckhoff releases new versions of TwinCAT 3 fairly often, and especially since this is a new platform you probably want to stay on top of their new updates for improved stability and new features. Here are some hard-won lessons I wanted to share with you about how to upgrade your production system to the latest TwinCAT 3 release:
Have a Test System
You should definitely have an offline test system for many good reasons. This test system should have the same operating system version as your production system and should ideally be the same hardware, though I realize that’s not always feasible. It could just be an old desktop PC you have sitting around, but that’s better than nothing. TwinCAT 3 is free for non-production use so you have no excuse for not having an offline test system.
Test your upgrade on the Test System First!
Never try upgrading your production system without running through a dry-run on your test system first. Get a copy of the latest TwinCAT solution from your production machine and get it running on your test system. It doesn’t matter if you don’t have I/O attached because the runtime will work just fine anyway. After your perform the upgrade procedure on your offline test machine, make sure you do a thorough test, including a reboot.
Follow These Steps
- Stop the machine and make it safe
- Put the runtime into Config mode
- Uninstall the old version of TwinCAT 3
- Also uninstall the old version of the Beckhoff Real-time Ethernet PnP Drivers
- Install the new version of TwinCAT 3
- Open the TwinCAT 3 solution
- Re-install any custom libraries, if you have any (optional)
- Go to the tool for configuring Real-time Ethernet devices, and install the new driver on your EtherCAT cards
- Re-link your EtherCAT master to your EtherCAT adapter under I/O, just in case
- Build each PLC project (individually, don’t use Rebuild All because it sometimes ignores errors)
- Check that you didn’t lose any I/O mapping
- Activate boot project on each PLC project
- Check under System that it’s configured to start in Run mode (if that’s what you want)
- Activate configuration and restart in run mode
- Test by doing a reboot
Upgrading the Real-time Ethernet drivers is critical. We were experiencing cases where the EtherCAT bus would just cut out on us, but only on one machine. All of our machines were upgraded to the same version, so we initially thought it must be a hardware issue. It turned out that we weren’t upgrading the Real-time Ethernet driver when we upgraded TwinCAT 3 versions, so this machine had an old version of the driver loaded, and all the other machines had a newer version. After upgrading the driver, the problem went away, so upgrading the driver is a critical step.
If you find that you did lose your I/O mapping, make sure you’ve built all your PLC projects (which generates a TMC file) and then close TwinCAT 3 XAE down and revert your .tsproj (TwinCAT solution project) file back to the original state. Then start again at the step where you open the TwinCAT 3 solution. Now you should find that your I/O mapping is back. That’s because the inputs and outputs of each PLC project are compiled into the TMC file and TwinCAT 3’s system manager links I/O against that. If the file doesn’t exist (or they changed the format during the upgrade) then it’ll just delete the links. However, the links still exist in the original .tsproj file, so creating the TMC file and then reverting the .tsproj file will put everything back to a happy state. This is also a useful trick when you’re moving the project to a new PC and you didn’t bring the .tmc files along for the ride (because they’re quite large).
I reviewed TwinCAT 3 in February of 2013 and it was a mixed bag. I lauded the amazing performance but warned about the reliability problems. I think it’s time to revisit the topic.
Things have improved greatly. When I wrote that review we had 2 production systems running TwinCAT 3 (the 32-bit version). We’re now up to 5 production systems with another on the way, all running version 3.1.4016.5 (which is a 64-bit version). The product has been more stable with each release. First we tried switching to a Beckhoff industrial PC, but we still experienced two blue screen crashes. We’ve then turned off anti-virus and disabled automatic windows updates. So far I haven’t seen another blue screen on that system, for about two months.
Manually installing windows updates isn’t a big deal, but it’s unfortunate to be running a PC-based control system with no anti-virus. Our industrial PCs are blocked from going online, and each one is behind a firewall that separates it from our corporate network, but it’s still a risk I don’t want to take. Industrial Control vendors continually tell us their products aren’t supported if you run anti-virus, and I don’t see how anyone can make statements like that in this day and age.
The performance of the runtime (ladder logic) and EtherCAT I/O is still absolutely amazing.
While the IDE is much better than the TwinCAT 2 system, the editor is still quite slow (even on a Core-i7 with a solid state drive).
The Scope is now integrated right into the IDE, and I can’t give that tool enough accolades. I recently had to use Rockwell’s integrated scope for ControlLogix 5000 and it’s pitiful in comparison to the TwinCAT 3 scope.
The TwinSAFE safety PLC editor is light years beyond the TwinCAT 2 editor, but it’s still clunky. It particularly sucks when you install a new revision of TwinCAT 3 and it has to upgrade the safety project to whatever new file format it has. We recently did this, then had to add a new 4-input safety card to the design, and it wouldn’t build the safety project because of a collision on the connection ID. It took us a couple hours of fiddling and we eventually had to manually set the connection ID to a valid value to get it to work. On another occasion, after a version upgrade, I had to go in and add missing lines in the safety program save file because it didn’t seem to upgrade the file format properly (I did this by comparing the save file to another one created in the new version).
The process of upgrading to a new TwinCAT 3 version often involves subtle problems. The rather infamous 3.1.4013 version actually broke the persistent variable feature, so if you restarted your controller, all the persistent variables would be lost. They quickly released a fix, but not before we experienced a bit of pain when I tried it on one of our systems. I’m really stunned that a bug this big and this obvious could actually be released. It’s almost as if Beckhoff doesn’t have a dedicated software testing department performing regression tests before new versions are released, but certainly nobody would develop commercial software like this without a software testing department, would they? That’s a frightening thought.
I ended my previous review by saying I couldn’t recommend TwinCAT 3 at this time. I’m prepared to change my tune a bit. I think TwinCAT 3 is now solid enough for a production environment, but I caution that it’s still a little rough around the edges.
Motion control is pretty complicated.
There’s been something really bothering me about the “integrated” motion control you find in PLCs these days (notably Allen-Bradley ControlLogix and Beckhoff TwinCAT). Don’t get me wrong, they’re certainly integrated far better than stand-alone motion controllers. Still, it just doesn’t “feel” right when you’re programming motion control from ladder logic.
When I’m programming a cylinder motion in ladder logic, I would typically use a five-rung logic block for each motion (extend/retract). One of the 5 bits is a “command” bit. This is a bit that means “do such-and-such motion now”. Importantly, if I turn that bit off, it means “stop now!” This works well for a cylinder with a valve controlling it because when I turn off power to that valve, the cylinder will stop trying to move. It would be nice if integrated motion was this simple.
It’s interesting to note that manual moves (a.k.a. “jogging”) are usually this simple. You drop a function block on a rung, give it a speed and direction, and when you execute it based on a push-button, the axis jogs in that direction, and when the push-button turns off, it stops jogging. Unfortunately none of the other features are that simple.
All other moves start motion with one function block and require you to stop it with another. The reason it works like this is because motion controllers also support blended moves. That is, I can first start a move to position (5,3) and after it’s moving there I can queue a second move to position (10,1) and it will guide the axes through a curved geometry that takes it arbitrarily close to my first point (based on parameters I give it) and then continue on to the second point without stopping. In fact you can program arbitrarily complex paths and the motion controller will perform them flawlessly. Unfortunately this means that 90% of the motion control logic out there is much more complex than it needs to be.
Aside: in object-oriented programming, such as in Java or .NET, it’s pretty normal to have to interface with a relational database such as MySQL or Microsoft SQL Server. However, when you try to mesh the two worlds of object-oriented programming and relational databases, you typically run into insidious little problems. Programmers call this the Object-relational impedance mismatch. I’m sure that if you added it up, literally billions of dollars have been spent trying to overcome these issues.
My point is that there is a similar Ladder logic-motion control impedance mismatch. The vast majority of PLC-based motion control is simple point-to-point motion. In that case, the ideal interface from ladder would be a single instance of a “go-to” function block with the following parameters:
- Target Position (X, Y…)
- Max Velocity
- Acceleration Jerk
- Deceleration Jerk
When the rung-in-condition goes true on this block, the motion control system moves to the target position with the given parameters, and when the rung-in-condition goes false, it stops. Furthermore, we should be able to change any of those parameters in real-time, and the motion controller should do its best to adjust the trajectory and dynamics to keep up. That would be all you need for most applications.
The remaining applications are cases where you need more complex geometries. Typically this is with multi-axis systems where you want to move through a series of intermediate points without stopping, or you want to follow a curved path through 2D or 3D-space. In my opinion, the ideal solution would be a combination of a path editor (where you use an editing tool to define a path, and it’s stored in an array of structures in the PLC) and a “follow path” function block with the following parameters:
- Path Tolerance
When the rung-in-condition is true, it moves forward along that path, and when it turns off, it stops. You could even add a BOOL parameter called Reverse which makes it go backwards along the path. The second parameter, “Path Tolerance” would limit how far off the path it can be before you get a motion error. I think this parameter is a good idea because it (a) allows you to initiate the instruction as long as your position is anywhere along that path, and (b) makes sure you’re not going to initiate some wild move as it tries to get to the first point.
A neat additional function block would be a way to calculate the nearest point on a path from a given position, so you could recover by jogging onto a path and continue on the path after a fault.
Obviously there needs to be a way for the PLC to generate or edit paths dynamically, but that’s hardly a big deal.
Anyway, these are my ideas. For now we’re stuck with this clunky way of writing motion control logic. Hopefully someone’s listening to us poor saps in the trenches!
Happy Canada Day!
Some of you may wonder if I’d fallen off the face of the Earth, but the truth is life just gets busy from time to time. Just for interest’s sake, here’s my latest fun project: an Arduino UNO running ladder logic!
You may remember I wrote a ladder logic editor about 5 or so years ago called SoapBox Snap. It only had the ability to run the ladder logic in a “soft” runtime (on the PC itself). This is an upgrade for SoapBox Snap so that it can download the ladder logic to an Arduino and even do online debugging and force I/O:
I haven’t released the new version yet, but it’s very close (like a few days away probably).
Edit: I’ve now released it and here is a complete tutorial on programming an Arduino in Ladder Logic using SoapBox Snap.
I’ve mentioned before that we first had a TiVo, and after getting an HDTV (TiVo doesn’t support an HD model compatible with Canadian cable TV) we got a Boxee Box and dropped our cable TV subscription. Basically we decided to stream all our TV from the internet. Now that Boxee got bought out and stopped updating the firmware for its rather outdated hardware we decided to move on. I think a lot of people are considering the leap away from cable or satellite and into streaming their TV and movies from the internet, so let me share our experiences.
First of all, I’m not convinced that a smart TV or embedded device like the Boxee Box or Roku is the answer. The Boxee Box was great for streaming content from other PCs in the house, but not for streaming stuff from online. These embedded set-top type devices, whether they’re built into the TV or not, suffer from two major drawbacks: one is that they’re usually underpowered for the price you pay, and two is that the software is typically some kind of highly customized embedded Linux with some custom user interface software built on top of it. On the Boxee Box, the web browser always seemed to be a bit slow, flaky, and the flash was typically out of date compared to what the online TV streaming sites were using. Our relatives recently purchased a brand new smart TV, and the web browser didn’t seem to support the latest flash that they needed to visit some site. Given the premium you pay for smart TV features, that seems a bit hard to swallow. Flashing the firmware was the next step, but I’m not sure how that went.
Given that background, we decided to take the plunge and just buy a PC and hook it up to the TV, then get a wireless keyboard and mouse combination. You can get a pretty good PC (Core i5) on sale for less than $500, and I’m pretty sure that even a $300 bargain model would probably do everything you wanted, and that’s less than the premium you might pay for a smart TV.
We couldn’t be happier with the new PC solution. The Windows software just stays up-to-date. It’s fast (much faster than any other set-top-box hardware you’ll see today), the interface is familiar, and all your hard-won Windows knowledge will come in handy if you have any problems. Netflix and other Hulu-like services work great. I also like that the kids can have their own login that’s limited by the parental controls feature of Windows (which I’d never used before, but is actually quite advanced).
I know one issue is where to locate the PC. We actually already had a place in the entertainment cabinet where a full desktop tower could fit, so it wasn’t a big deal, but if that’s an issue for you, there are other smaller (more expensive) options like the Zotac ZBox out there, which is just a full blown PC in a small form factor. I have also seen a wireless HDMI device (it works line-of-site) so you could hook it up to a laptop that you have on a table beside the couch (the TV would show up as a second monitor). Another issue is having the wireless keyboard and mouse out on the coffee table. That’s not a big deal if you have somewhere handy to stow it when not in use. Having a full keyboard certainly makes certain things a lot easier (searching for content, entering passwords, etc.).
One final caveat – the one bit of content that’s very hard to find online is sports. If you’re really into sports (we’re not) then you’ll need to keep some kind of paid TV subscription. That’s just the way it is.
Plus you get the benefit of a full-blown PC handy in your living room. Overall I think it’s the best solution we have right now, and it’s what I’d recommend if you’re looking to take the plunge.
Let’s assume you already know everything there is to know about motion control… you can jog a servo axis, home it, make it move to a position using trapezoidal or s-curve motion. Now what?
Sooner or later you’re going to find yourself with 2 or more axes and you’re going to want to do something fancy with them. Maybe you have an X/Y table and you want it to move on a perfect 45 degree angle, or you need it to follow a curved, but precise, path in the X/Y plane. Now you need coordinated motion.
Coordinated motion controllers are actually quite common. Every 3, 4, or 5 axis mill uses coordinated motion, every robot controller, and even those little RepRap 3D printers. What you may not know is that most integrated motion solutions you might encounter in the PLC world also offer coordinated motion (a.k.a. interpolated motion) control. If you’re from the Allen-Bradley world, the ControlLogix/CompactLogix line of PLCs allows you to use the Motion Coordinated Linear Move (MCLM) and Motion Coordinated Circular Move (MCCM) instructions along with a few others. If you’re from the Beckhoff world, you can purchase a license for their NC I product which offers a full G-code interpreter, which is the language milling machines and 3D printers speak.
Under the hood, a coordinated motion control solution offers several features necessary for a workable multi-axis solution. The first is a path planner, the second is synchronization.
The job of the path planner is fairly complex. If you say that you need to move your X/Y table from point 5,2 to point 8,3 then it needs to take the maximum motion parameters of both axes into account to make sure that neither axis exceeds it’s torque, velocity or acceleration/deceleration limits, and typically it will limit the “velocity vector” as well, meaning the actual speed of the point you’re moving in the X/Y plane. Furthermore, it must create a motion profile for each axis that, when combined, cause the tooling to move in a straight line between those points. After all, you may be trying to move a cutting tool along a precise path and you need to cut a straight line. To make matters far more complicated, after the motion is already in progress, if the controller receives another command (for instance to move to point 10,5 after the initial move to 8/3) then it will “blend” the first move into the second, depending on rules you give it.
For instance, let’s say you start at 0,0, then issue a move to 10,0 but then immediately issue a second move to 10,10. You have to option of specifying how that motion will move through the 10,0 point. If you issue a “fine” move then the X axis has to decelerate to a stop completely before the Y axis starts its motion. However, you can also tell it that you only care that you get within 1 unit of the point, in which case the Y axis will start moving as soon as you get to point 9,0 and will do a curved move through point 10,1 on its way to 10,10 without ever moving though point 10,0. This is actually useful if you’re more concerned with speed than accuracy. Another option you have is to issue a linear move to 9,0 followed by a circular move to 10,1 (with center at 9,1) followed by a linear move to 10,10. That will cause the tooling to follow a similar path, but in this case you’re in precise control of the curved path that it takes. In neither case will either axis stop until it gets to the final point.
The other important feature of coordinated motion is synchronization of the axes. Typically the controller delegates lower level control of the axes to traditional axis controllers, and the coordinated motion controller just feeds the motion profiles to each axis. However, it’s imperative that each axis starts its motion at precisely the same time, or the path won’t be correct in the multi-dimensional space. That requires some kind of clock synchronization, and that’s the reason why you see options for things like Coordinated System Time Masters on ControlLogix and CompactLogix processors.
That was very brief, but I hope it was informative. If you do have to tackle coordinated motion on your next project, definitely allocate some time for reading your manufacturer’s literature on the subject because it’s a fairly steep learning curve, but clearly necessary if your project demands it.
One of the most confusing things that new programmers face is how to break down their program into smaller pieces. Sometimes we call this architecture, but I’m not sure that gives the right feel to the process. Maybe it’s more like collecting insects…
Imagine you’re looking at your ladder logic program under a microscope, and you pick out two rungs of logic at random. Now ask yourself, in the collection of rungs that is your program, do these two rungs belong close together, or further apart? How do you make that decision? Obviously we don’t just put rungs that look the same next to each other, like we would if we were entomologists, but there’s clearly some measure of what belongs together, and what doesn’t.
Somehow this is related to the concept of Cohesion from computer science. Cohesion is this nebulous measure of how well the things inside of a single software “module” fit together. That, of course, makes you wonder how they defined a software module…
There are many different ways of structuring your ladder logic. One obvious restriction is order of execution. Sometimes you must execute one rung before another rung for your program to operate correctly, and that puts a one-way restriction on the location of these two rungs, but they could still be located a long way away.
Another obvious method of grouping rungs is by the physical concept of the machine. For instance, all the rungs for starting and stopping a motor, detecting a fault with that motor (failure to start), and summarizing the condition of that motor are typically together in one module (or file/program/function block/whatever).
Still we sometimes break that rule. We might, for instance, have the motor-failed-to-start-fault in the motor program, but likely want to map this fault to an alarm on the HMI, and that alarm will be driven in some rung under the HMI alarms program, potentially a long way away from the motor program. Why did we draw the line there? Why not put the alarm definition right beside the fault definition that drives it? In most cases it’s because the alarms, by necessity, are addressed by a number (or a bit position in a bit array) and we have to make sure we don’t double-allocate an alarm number. That’s why we put all the alarms in one file in numerical order, so we can see which numbers we’ve already used, and also so that when alarm # 153 comes up on the HMI, we can quickly find that alarm bit in the PLC by scrolling down to rung 153 (if we planned it right) in that alarms program.
I just want to point out that this is only a restriction of the HMI (and communication) technology. We group alarms into bit arrays for faster communication, but with PC-based control systems we’re nearing the day when we can just configure alarms without a numbering scheme. If you could configure your HMI software to alarm when any given tag in the PLC turned on, you wouldn’t even need an alarms file, let alone the necessity of separating the alarms from the fault logic.
Within a program, how should we order our rungs? Assuming you satisfied the execution order requirement I talked about above, I usually fall back on three rules of thumb: (1) tell a story, (2) keep things that change together grouped together, and (3) separate the “why” from the “how”.
What I mean by “tell a story” is that the rungs should be organized into a coherent thought process. The reader should be able to understand what you were thinking. First we calculate the whozit #1, then the whozit #2, then the whozit #3, and then we average the 3. That’s better than mixing the summing/averaging with the calculating of individual whozits.
Point #2 (keep things that change together grouped together) is a pragmatic rule. In general it would be nice if your code was structured in a way that you only had to make changes in one place, but as we know, that’s not always the case. If you do have a situation where changing one piece of logic means you likely have to change another piece of logic, consider putting those two rungs together, as close as you can manage.
Finally, “separate the why from the how”. Sometimes the “how” is as simple as turning on an output, and in that case this rule doesn’t apply, but sometimes you run across complicated logic just to, say, send a message to another controller, or search for something in an array based on a loop (yikes). In that case, try to remove the complexity of the “how” away from the upper level flow of the “why”. Don’t interrupt the story with gory details, just put that into an appendix. Either stuff that “how” code in the bottom of the same program, or, better yet, move it into its own program or function block.
None of these are hard and fast rules, but they are generally accepted ways of managing the complexity of your program. You have to draw those lines somewhere, so give a little thought to it at first, and save your reader a boatload of confusion.
Earlier this year I reviewed TwinCAT 3 and I admit that it was a less than stellar review. Up to now TwinCAT 3 has seemed like a beta all the way.
Well, I’m pleased to say that after using and deploying TwinCAT 3.1 for several weeks, it’s a significant improvement over its predecessor, and brings it into the realm of “release-quality” software. There are several improvements I’ll try to highlight.
64-bit! Finally! Yes, it’s so nice to get rid of the old 32-bit copy of Windows 7 and get back to the world of 64-bit computing.
The TwinSAFE safety editor is so much better. While the old one would take several minutes to verify my safety program before downloading it, and several minutes to even get online with the safety controller, this new version does it in seconds. It was so fast in comparison that I thought it didn’t work the first time. Thank goodness because that was a major shortcoming of the old version.
I’m pleased to report that Beckhoff has made changes to the source code file format to make them more friendly to source control applications (like Subversion or Mercurial). Doing a “diff” between 2 file versions now gives you a really good idea of what changed, rather than some obscure XML code. This is one of the first things I complained about and I didn’t expect it to get changed, so I’m really happy to see that. (Also remember that TwinCAT 2 files were completely unfriendly to Source Control; they were just a big binary blob!) Note that when you upgrade your TwinCAT 3.0 solution to TwinCAT 3.1, it has to do a conversion, and it’s one-way. Keep a backup of your old version just in case. In our case, the upgrade went well, except for references, but that was ok…
The old library manager in each PLC project has been replaced by a References folder, which you will be familiar with if you do any sort of .NET programming. You can now add and remove library references right in the solution tree, which is nice. That’s one thing that didn’t get upgraded automatically. I had to remove my old references to the TwinCAT 3.0 standard libraries and re-add them as TwinCAT 3.1 libraries. Not a big deal once I figured out what was causing the build error.
The scope viewer has been removed from the system tray icon menu and pulled into the Visual Studio interface. I think this is just part of their attempt to pull everything into one environment, which I applaud.
Unfortunately I still had one blue-screen-of-death when I tried to do an online change, but I couldn’t reproduce that problem. I’m using a commercial desktop computer, not an industrial PC for my testing, and Beckhoff says they won’t support the software if it isn’t installed on Beckhoff hardware. Let that be a warning to you: even though you pay significantly more for the software license in order to run it on non-Beckhoff hardware, they will not warranty software bugs in that case. That doesn’t mean you can’t get local support, but it does mean that if you have a legitimate bug report, they won’t help you unless it can be reproduced on Beckhoff hardware.
While upgrading, I also upgraded the operating system on the PC from Windows 7 32-bit to Windows 7 64-bit. That created a small problem because it caused the Beckhoff generated system ID to change, and the license is attached to the system ID. That means I have to go through generating a license file and applying the response file again. Hopefully there are no problems with duplicate licensing, as it is the same physical computer. I’m still waiting for a response though.
I did run into a couple gotchas during the upgrade. First, TwinCAT 3.1, for whatever reason, requires Intel’s Virtualization Technology Extentions (VTx) to be enabled in the BIOS, and mine weren’t. It turns out that I needed to flash the BIOS to even get that option. Secondly, we use Kaspersky Endpoint Security version 8 for our enterprise anti-virus solution. It turns out that both version 8 and version 6 prevent TwinCAT 3.1 from booting the PLC runtime into Run Mode on startup. Without Kaspersky installed it worked fine. I eventually tried Microsoft Security Essentials (the free Anti-virus solution from Microsoft) and that seems to work well. Some anti-virus is better than none, I figure.
The upgraded system has been in production for nearly 2 weeks, and seems to be working well (no problems that I can attribute directly to TwinCAT, at any rate).
In summary, if you’re looking to jump into TwinCAT 3, I know I said to wait in my last review, but I now think that TwinCAT 3.1 is a good solid base if you’re looking to get your feet wet.
As a licensed P.Eng., I recently received a copy of a draft version of a document regarding engineering practices for the design of safety critical software. It seemed to be well thought out and written, but one of the last statements caught me by surprise. It basically said that your design should prevent Stuxnet-like attacks.
I think it’s great that the profession is starting to take this stuff seriously, but I was a bit shocked at the implication of this document. Primarily I was concerned because Stuxnet was nothing like an ordinary attack. Defending against it would be like defending your office building against a laser guided bomb. Stuxnet was a nation-state-backed attack with huge funding and resources, going after a military/industrial target.
What does that mean for us practicing engineering in, say, the automotive industry? Do we just get to brush the whole cybersecurity thing aside because a Stuxnet-like attack against our facility has absolutely zero chance of ever being launched? It doesn’t seem like that’s what the document was trying to say.
In fact, is Stuxnet even a relevant example for “Safety Critical Software”? Did Stuxnet actually affect any safety critical systems, like a Safety PLC? All it did was damage equipment. Was anyone hurt? Was anyone in danger of being hurt? I guess I just disagree with the use of Stuxnet as an example here.
That’s not to say cybersecurity doesn’t need some attention from our profession. In particular, consider the case of so-called “Safety PLCs”. Here you have a device, typically sitting on some kind of network, definitely responsible for safety critical systems. There are at least 2 pieces of software installed on a Safety PLC: the firmware (the operating system, communication system, and runtime of the device) and the user-configurable logic that you download when you’re programming the system. Compromise of either of these pieces of software certainly poses a grave risk to human safety.
The user-configurable logic is supposed to be password protected. This is one step up from normal PLCs which rarely offer any kind of authentication. Now, when TUV, or whoever, is certifying these devices, they are definitely checking that once it’s locked, you can’t update the user-configurable software without a password. In fact, TUV has access to the design, so I *hope* someone there is well versed in cyber security and knows what they’re talking about. For instance, if it was found that authentication was being done in the programming software (client application) and not on the device itself, it would be completely useless security, because someone could easily bypass it. But let’s assume that nobody building these things is doing anything that blatantly wrong.
So even if the client/device protocol is rock solid, is the firmware bulletproof? I think this is unlikely. Every time a security researcher runs even the most basic of security audits against PLCs they usually find tons of obvious exploits. You have to realize that most of these devices are just embedded computers running off-the-shelf real-time operating systems like VxWorks. Usually it’s using the standard VxWorks networking protocols, which are sometimes known to have vulnerabilities (in certain versions) and are unfortunately rarely updated in the field. That’s not even counting deliberate backdoors and debug code left in the software. Does TUV do a security audit of every design? Maybe, but I doubt it. Even if they do, are they doing repeated security audits against older devices? When new vulnerabilities are found in these embedded operating systems that impact existing devices, are they requiring them to be recalled? I’ve never heard of such things.
I’m pretty sure the only reason we haven’t heard about this yet is because even security researchers who’ve woken up to the concept of PLCs and SCADA systems in the past 2 years still have no idea that Safety PLCs actually exist. I don’t think it’ll be too long until they find out.
Designing a device like this that’s intrinsically secure against hacking seems almost impossible to me. Whatever software module is responsible for receiving the user-configurable safety program and storing it in the persistent memory of the device is necessarily network-facing. Any vulnerability in that communication module could be exploited, and then you have the ability to bypass all the security protocols on the device, and write whatever user-configurable safety program you want.
So if this document says we have to design our Safety Critical systems to withstand a Stuxnet-like attack, and all the Safety PLCs we could use are likely vulnerable via communication module software exploits, and even firewalls and airgaps don’t seem to be able to defend against this kind of attack (just ask the employees of the Natanz facility in Iran), then where does that leave us? Is a P.Eng. supposed to just throw up their hands and say “sorry, it can’t be done?” Not likely.