On Managing Complexity

There’s a recurring theme on this blog when I talk about programming. It’s a skill you don’t learn in school, but which separates recent graduates from employees with 5 years of experience: how to manage complexity.

The reason you never learned how to manage complexity in nearly 20 years of school is that you never actually encountered anything of sufficient complexity that it needed managing. That’s because at the beginning of each term/semester/year you start with a blank slate. If you only ever spend up to 500 hours on a school project, you’ll never understand what it feels like to look at a system with 10,000 engineering hours logged to it.

That’s when the rules of the game change. You’ve only ever had to deal with projects where you could hold all the parts and variables in your head at one time and make sense of it. Once a project grows beyond the capability of one person to simultaneously understand every part of it, everything starts to unravel. When you change A you’ll forget that you had to change B too. Suddenly your productivity and quality start to take a nosedive. You’ve reached a tipping point, and there’s no going back.

It isn’t until you reach that point that you understand why interfaces are useful and why global variables are bad (not to mention their object-oriented cousin, the singleton).

The funny thing is that you learn about all the tools for managing complexity in school. When I was in high school I was taught about structured programming and introduced to modular programming. In university my first programming class used C++ and went through all the features like classes, inheritance and polymorphism. However, since we’d never seen a system big enough to need these tools, most of it fell on deaf ears. Now that I think about it, I wonder if our professors had ever worked on a system big enough to need those tools.

The rules of managing complexity are:

  • Remove unnecessary dependencies between parts of your system. Do this by designing each module against an abstract interface that only includes the functionality it needs, nothing else.
  • Make necessary dependencies explicit. If one module depends on another, state it explicitly, preferably in a way that can be verified for correctness by an automated checker (compiler, etc.). In object-oriented programming this is typically done with constructor injection.
  • Be consistent. Humans are pattern-recognition machines. Guide your reader’s expectations by doing similar things the same way everywhere. (Five-rung logic, anyone?)
  • Automate! Don’t make a person remember any more than they have to. Are there 3 steps to make some change? Can we make it 2, or better yet, 1?
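
Here’s a minimal Python sketch of the first two rules. The names (TemperatureSource, OverTemperatureAlarm, FakeSensor) are hypothetical and chosen for illustration. The alarm module depends only on a narrow interface containing the one method it needs. That dependency is passed in through the constructor, so a static checker such as mypy can verify that whatever you pass in actually provides it.

```python
from typing import Protocol


class TemperatureSource(Protocol):
    """The one capability the alarm needs; nothing else is exposed to it."""

    def read_celsius(self) -> float: ...


class OverTemperatureAlarm:
    """Depends only on TemperatureSource, and says so in its constructor."""

    def __init__(self, source: TemperatureSource, limit_celsius: float) -> None:
        self._source = source  # explicit, injected dependency (no globals)
        self._limit = limit_celsius

    def is_tripped(self) -> bool:
        return self._source.read_celsius() > self._limit


class FakeSensor:
    """Test double: satisfies TemperatureSource structurally, no hardware needed."""

    def __init__(self, value: float) -> None:
        self.value = value

    def read_celsius(self) -> float:
        return self.value


alarm = OverTemperatureAlarm(FakeSensor(85.0), limit_celsius=80.0)
print(alarm.is_tripped())  # True
```

Because the alarm never reaches for a global or a singleton, you can test it with a fake sensor and swap in the real hardware later without touching the alarm code.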

It would be interesting if, in university, your second year project built on your first year project, and your third year project built on your second year project, etc. Maybe by your fourth year you’d have a better appreciation for managing complexity.

Playing the Odds

Millions of people play the lottery every week. Some just buy a “quick pick” ticket (which lets the computer pick for them). Some faithfully play the same numbers over and over. Some have clever schemes.

Enough people play that even though the odds are long, a few people win. Invariably the winners are interviewed, and people are interested in what made them special. Did they have a system? Did they have a “feeling” that they would win today?

It makes a good story, and we all love a good story. We’re wired for it.

If a lottery winner wrote a book called “How I won the lottery, and you can too!” would you read it? (Believe it or not, many people make a living selling such advice — it’s usually some kind of psychic deal.) Of course you wouldn’t; you’re not foolish.

But you have to wonder, aren’t most of the books in the success section of the bookstore just by people who lucked out once? Microsoft and Google both only had one major breakthrough (an insanely great licensing deal for DOS, and a great search engine respectively). Everything else they did was through incremental building and improvement. Yet everyone seems to be focused on the big win. How can I make my numbers get drawn this week in the lottery? What’s the secret?

There is no secret, of course. Roll the dice. Be prepared to lose.

The bulk of technological progress, and even financial progress, is incremental. Just look at the difference between a good investment and a bad investment. A company with a 5% rate of growth feels slow and plodding. If a medium-sized company can post a 15% growth rate every year, it’s moving so fast it probably feels like it’s running off the rails.

Even so, the company growing at 5% will still double in 14 years (handy trick: take 70 and divide by the percent growth to get the approximate doubling time). Computer power doubles about every 18 months, but the industry was never driven by people looking to win a lottery. They just kept making die sizes smaller, overcoming each technical barrier as it arose, and packing more transistors on a chip. DNA sequencing technology is growing at an even faster rate.
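
The rule of 70 works because the exact doubling time at a growth rate r is ln 2 / ln(1 + r). For small r, ln(1 + r) is close to r, and ln 2 is about 0.693. This quick Python check compares the shortcut with the exact figure for the two growth rates mentioned above.

```python
import math


def doubling_time_rule_of_70(growth_percent: float) -> float:
    """Approximate doubling time in years: 70 divided by the percent growth."""
    return 70.0 / growth_percent


def doubling_time_exact(growth_percent: float) -> float:
    """Exact doubling time for compound annual growth: ln(2) / ln(1 + r)."""
    return math.log(2) / math.log(1 + growth_percent / 100.0)


for pct in (5, 15):
    print(f"{pct}%: rule of 70 = {doubling_time_rule_of_70(pct):.1f} years, "
          f"exact = {doubling_time_exact(pct):.1f} years")
```

At 5% the shortcut gives 14 years, compared with an exact value of about 14.2. The approximation gets looser as the rate climbs, but it stays close enough for back-of-the-envelope planning.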

Keep asking yourself “How can I cut this machine’s cycle time in half? How can I cut the scrap by a third? What similar product could we produce that has a higher demand? How can I cut energy use by 10%?”

If you do this enough, you’ll still win, and the odds are a lot better.

Time Blindness

As a detail-oriented person (and thus a bit of a pessimist), one of my biggest frustrations is people around me who are “time blind”. Let me give you an example…

It’s 9:15 am, you’re deep into solving some difficult problem, and someone calls you up and asks you to review an estimate for them. Let’s assume you can’t just say no.

You go sit down with this person and you review this quote. It involves changing the auto sequence on some widget maker so that on this one particular recipe it stops the conveyor, runs backwards 2 stations, dispenses some new chocolaty cream filling, and continues on.

The first thing you point out is that the quote only specifies 90 minutes total for 2 customer meetings. “That seems a little low, don’t you think?” you say. “After all, from the time you step out of your car at their office, it takes 20 minutes before you even get to their boardroom, we always spend about 10 to 15 minutes waiting for some last-minute guest to arrive, and I’ve never been in a meeting with them that didn’t go at least an hour.”

“So, you think we should put down 2 hours total?”

“No, I think it’s more reasonable to expect each meeting will take 90 minutes, and by the way, in what universe can you make it across town and back in 30 minutes? You’ve only got 1 hour here for total travel time for those meetings. It’s 30 minutes one way in normal traffic. What about walking to and from your car? What about waiting in the lobby?”

“No, you can get there in 20 minutes most days.”

“So you rounded down to 15?”

“We have a contingency amount of 10% over here.”

“10% of 15 minutes is 1.5 minutes… besides, contingency is for unknowns… never mind.”

Of course, this goes on forever. At 11:30 you leave this meeting, and your co-worker looks at his watch and says, “How long are you logging for this meeting? An hour and a half?”

Time Blindness

Rampant in business circles. Typical sufferers include new hires with no experience and overly optimistic managers. Unfortunately, those afflicted with Time Blindness also tend to be in denial. Some are experts at hiding it from themselves, going so far as to work extra hours without counting those hours against their projects, which reinforces the “truth” of their misguided beliefs about how long things take.

Treatment

There is none. Just don’t get sucked into their unrealistic commitments, and be careful not to get blamed when their projects invariably go over budget.

OWI-535 Robot Arm with USB Controller from C# and .NET

I got the OWI-535 “Robot Arm Edge” 5-Axis robot arm and the USB Controller add-on for Christmas.



The robot arm comes as a kit that you assemble yourself, and my 3 year old and I had lots of fun putting it together (it helped to have some tiny fingers around, honestly). It comes with a manual controller that allows you to rotate all 4 joints, plus the gripper. It’s fun to play around with, but let’s be honest, everyone wants to hook it up to their computer.

Unfortunately the software that comes with the USB controller works on Windows 7, but “32-bit only”. That was a little annoying, but hey, I didn’t really want to stick with their canned software anyway. I figured I would see if I could get it to work from C#. I was surprised to find a great site by Dr. Andrew Davison in Thailand, who wrote some Java code as part of the site for his book Killer Game Programming in Java (chapter NUI-6, Controlling a Robot Arm, which doesn’t appear in the book). Impressively, he went as far as working out the inverse kinematic equations, so you can give the arm a set of X,Y,Z co-ordinates (in mm) in “world frame” and it will calculate all the joint angles needed to reach that location, then use timed moves to get the arm into that position.
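
To give a feel for what “inverse kinematics” means here, below is a generic sketch of the standard two-link solution covering the base, shoulder, and elbow. It is not Dr. Davison’s code or the C# port. The link lengths in ArmGeometry are placeholders, not measurements of the OWI-535, and the wrist and gripper are ignored.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmGeometry:
    # Placeholder dimensions in mm; NOT measured from a real OWI-535.
    shoulder_height: float = 60.0  # table surface to shoulder pivot
    upper_arm: float = 90.0        # shoulder pivot to elbow pivot
    forearm: float = 110.0         # elbow pivot to wrist pivot


def inverse_kinematics(g: ArmGeometry, x: float, y: float, z: float):
    """Base, shoulder and elbow angles (radians) that put the wrist at (x, y, z).

    World frame: origin on the table under the base, z pointing up.
    Shoulder is measured up from horizontal; elbow is measured relative to
    the upper arm (0 = straight). Returns the elbow-up solution.
    """
    base = math.atan2(y, x)
    r = math.hypot(x, y)           # horizontal reach within the arm's plane
    h = z - g.shoulder_height      # height above the shoulder pivot
    l1, l2 = g.upper_arm, g.forearm
    # Law of cosines gives the cosine of the elbow bend.
    c = (r * r + h * h - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= c <= 1.0:
        raise ValueError("target is out of reach")
    elbow = -math.acos(c)          # negative bend keeps the elbow above the line
    shoulder = math.atan2(h, r) - math.atan2(l2 * math.sin(elbow),
                                             l1 + l2 * math.cos(elbow))
    return base, shoulder, elbow


def forward_kinematics(g: ArmGeometry, base: float, shoulder: float, elbow: float):
    """Wrist position for the given angles; used to check the inverse."""
    l1, l2 = g.upper_arm, g.forearm
    r = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    h = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return r * math.cos(base), r * math.sin(base), h + g.shoulder_height


angles = inverse_kinematics(ArmGeometry(), 120.0, 50.0, 80.0)
print([round(math.degrees(a), 1) for a in angles])
```

Feeding the angles back through forward_kinematics is a cheap way to confirm the solve before you ever power a motor.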

His Java library uses libusb-win32, and that library has a .NET equivalent called LibUsbDotNet. The APIs don’t appear to be the same, but thankfully I managed to find a thread on SourceForge about controlling the OWI-535 using LibUsbDotNet. So, over the course of a couple of nights, after the kids went to bed, I ported Dr. Davison’s Java code over to C# (quite easy, actually) and replaced the libusb-win32 calls with LibUsbDotNet API calls. It worked!

Here is the .NET solution that I wrote, called TestOwi535. I used Visual C# 2010 Express Edition to write it, which is free. Before running it, you must download and install LibUsbDotNet and run the USB InfWizard to generate a .inf file (you have to do this with the robot arm plugged in and turned on). Then use that .inf file to install LibUsbDotNet as the driver. Technically you’re installing libusb-win32 as the driver, and LibUsbDotNet is just the C# wrapper.

If you right-click on the C# project in the Solution Explorer, you’ll find three options for the startup object. MainClass is the original code I found in the SourceForge thread, but it’s just proof-of-concept code and only moves one joint in one direction for a second. The ArmCommunicator class is an interactive console application that lets you move all joints (and control the gripper and light) by typing keyboard commands. Finally, the RobotArm class is the full inverse kinematics version. In that last case, make sure you start with the arm at the zero position: base pointing away from the battery compartment, all joints in line pointing straight up in the air, gripper pointing directly up, and gripper open. It will move to a spot on the table to the front right of the arm, close the gripper, then move back to zero and open the gripper.
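
A keyboard console like ArmCommunicator ultimately turns each key press into a small motor command. The sketch below assumes the 3-byte layout described in community write-ups of the OWI-535 USB interface: one bit per motor direction, plus one bit for the light. That table is an assumption, not taken from the TestOwi535 code, so check it against your own arm. Sending the bytes over USB (for example through LibUsbDotNet) is left out.

```python
# ASSUMED bit layout from community reverse-engineering notes; verify on your arm.
# Each motion maps to (byte index, bit mask) within the 3-byte command.
MOTION_BITS = {
    "gripper_close": (0, 0x01),
    "gripper_open": (0, 0x02),
    "wrist_up": (0, 0x04),
    "wrist_down": (0, 0x08),
    "elbow_up": (0, 0x10),
    "elbow_down": (0, 0x20),
    "shoulder_up": (0, 0x40),
    "shoulder_down": (0, 0x80),
    "base_ccw": (1, 0x01),  # direction naming differs between sources
    "base_cw": (1, 0x02),
    "light_on": (2, 0x01),
}

# Pairs that would drive the same motor both ways at once.
OPPOSITES = [("gripper_close", "gripper_open"), ("wrist_up", "wrist_down"),
             ("elbow_up", "elbow_down"), ("shoulder_up", "shoulder_down"),
             ("base_ccw", "base_cw")]

STOP = bytes(3)  # all zeros: every motor off, light off


def build_command(*motions: str) -> bytes:
    """OR together the bits for the requested motions into a 3-byte command."""
    for a, b in OPPOSITES:
        if a in motions and b in motions:
            raise ValueError(f"conflicting motions: {a} and {b}")
    cmd = bytearray(3)
    for motion in motions:
        index, mask = MOTION_BITS[motion]  # KeyError for an unknown motion
        cmd[index] |= mask
    return bytes(cmd)


print(build_command("shoulder_up").hex())            # 400000
print(build_command("elbow_down", "base_cw").hex())  # 200200
```

A timed move would then send one of these commands, wait, and send STOP.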

Unfortunately, that’s where you start to see the obvious deficiency of the arm: it has no position feedback. The code tracks the arm’s position, but it’s only using timed moves, so the physical position gradually drifts away from the internal estimate until the arm no longer knows where it is. One of the biggest improvements would be to use separate joint rates for each direction, since the joints move faster going down than going up.
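
Here’s a small simulation showing why per-direction rates matter for dead reckoning. The rates are invented for illustration: the “real” joint moves at 10°/s up and 14°/s down. A tracker that assumes a single 12°/s rate believes it returns to 0° on every cycle, but the simulated joint loses 10° per out-and-back move. A tracker whose per-direction rates match doesn’t drift in this idealized model.

```python
from dataclasses import dataclass


@dataclass
class TimedJoint:
    """Dead-reckoning model of one joint driven purely by timed moves."""
    rate_up: float         # assumed deg/s when driving up (against gravity)
    rate_down: float       # assumed deg/s when driving down (with gravity)
    position: float = 0.0  # estimated angle in degrees; never actually measured

    def plan(self, target: float) -> tuple[bool, float]:
        """Direction and run time needed to reach target from the estimate."""
        delta = target - self.position
        up = delta > 0
        rate = self.rate_up if up else self.rate_down
        return up, abs(delta) / rate

    def record(self, up: bool, seconds: float) -> None:
        """Advance the estimate after running the motor for `seconds`."""
        if up:
            self.position += self.rate_up * seconds
        else:
            self.position -= self.rate_down * seconds


def true_angle_after(model: TimedJoint, real_up: float, real_down: float,
                     cycles: int) -> float:
    """Simulate out-and-back moves (0 -> 30 -> 0 deg); return the REAL angle."""
    real = model.position
    for _ in range(cycles):
        for target in (30.0, 0.0):
            up, seconds = model.plan(target)
            model.record(up, seconds)  # the model now believes it hit target
            real += real_up * seconds if up else -real_down * seconds
    return real


# Hypothetical "real" joint: 10 deg/s up, 14 deg/s down.
print(true_angle_after(TimedJoint(12.0, 12.0), 10.0, 14.0, cycles=5))  # -50.0
print(true_angle_after(TimedJoint(10.0, 14.0), 10.0, 14.0, cycles=5))  # close to 0
```

In practice, battery voltage and load also change the rates, so better rate tables only slow the drift down. Position feedback is what actually fixes it.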

Still, it’s a neat little toy for the price. I’m going to start hunting around for a way to add joint position feedback, as that would really improve the performance. I also want to write a new module from the ground up that allows concurrent joint moves (this one only moves one joint at a time). Ideally you want to control this thing in “gripper frame” instead of “world frame”.
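
Moving joints concurrently is mostly a scheduling problem: start every motor together, then drop each one out when its time is up. The helper below is hypothetical. It turns per-joint run times into segments of constant motor state, so the whole move takes as long as the slowest joint instead of the sum of all of them. It assumes a joint’s speed doesn’t change when other motors run at the same time, which may not hold when they all share one battery pack.

```python
def concurrent_schedule(moves: dict[str, float]) -> list[tuple[frozenset[str], float]]:
    """Split simultaneous timed moves into segments of constant motor state.

    `moves` maps a motion name (hypothetical, e.g. "elbow_up") to how long
    that motor must run. All motors start together; each returned segment
    lists which motors are still running and for how many seconds.
    """
    segments = []
    elapsed = 0.0
    remaining = {m: t for m, t in moves.items() if t > 0}  # m -> end time
    while remaining:
        step = min(remaining.values()) - elapsed
        segments.append((frozenset(remaining), step))
        elapsed += step
        remaining = {m: t for m, t in remaining.items() if t > elapsed}
    return segments


for running, seconds in concurrent_schedule({"base_cw": 1.0, "elbow_up": 2.5}):
    print(sorted(running), seconds)
```

The sender only has to update the motor command at each segment boundary and stop everything after the last one.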

Happy hacking!

The Psychology of Automation

It’s easy to overlook the power of human motivation in automation systems.

I’m going to assume you don’t work in a lights-out factory. That’s pretty rare. Almost all automation systems interact with people on a regular basis, and even though we have high fidelity control over our automation processes, people are notoriously difficult to predict, let alone control.

For instance, consider a process with a reject station. Finished parts are measured and parts that don’t meet specification are diverted down a chute. Good parts continue down the line.

Question: how big do you make the reject bin? Naively, you might want to make it as big as possible so the operator doesn’t have to waste time emptying it. Unfortunately, that means it’s just as easy to make bad parts as it is to make good parts. You’d be better off making the reject chute hold only 3 parts and putting a full sensor on the chute that throws a fault and stops the machine when it’s full. Then it’ll be a pain in the ass to make bad parts, and the operator will have a lot more motivation to do something about it. While you’re at it, put the reject chute on the other side of the machine so they have to walk around the machine to empty it.
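
The full-sensor interlock would normally live in PLC logic. As an illustration, here’s a Python sketch of the same behaviour. The class and method names are hypothetical, and the capacity of 3 comes from the text.

```python
class RejectChute:
    """Small reject chute whose full sensor faults the machine (sketch only)."""

    def __init__(self, capacity: int = 3) -> None:
        self.capacity = capacity  # parts the chute holds before the sensor trips
        self.count = 0
        self.faulted = False      # True = machine stopped on "reject chute full"

    @property
    def full_sensor(self) -> bool:
        return self.count >= self.capacity

    def reject_part(self) -> None:
        """Divert one bad part; fault the machine once the chute is full."""
        if self.faulted:
            raise RuntimeError("machine is stopped: empty the reject chute")
        self.count += 1
        if self.full_sensor:
            self.faulted = True

    def empty(self) -> None:
        """The operator walks around the machine and empties the chute."""
        self.count = 0

    def reset_fault(self) -> bool:
        """A fault reset only clears once the full sensor has cleared."""
        if not self.full_sensor:
            self.faulted = False
        return not self.faulted


chute = RejectChute()
for _ in range(3):
    chute.reject_part()
print(chute.faulted, chute.reset_fault())  # True False
chute.empty()
print(chute.reset_fault())                 # True
```

Refusing the reset until the chute is actually empty is what keeps the “pain” in place. Otherwise someone will just hammer the reset button.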

Consider a machine with an e-stop button. It’s a big red button with a mushroom head that’s supposed to be easy to hit. However, I’ve seen a lot of machines where the consequence of hitting that button was major downtime because the part tracking got screwed up or the machine just didn’t recover gracefully. I once watched a pallet get bumped out of the track and ride along the rail of the conveyor. I hit the e-stop just before it was about to damage some equipment, and I was scolded for my efforts: “Never hit the e-stop.” That’s the wrong motivation. You want operators to press the button when they see something wrong, so make it easy to recover.

Consider an inventory tracking system. You want people to record stuff they’ve consumed, and what cost account they’ve consumed it against. What motivation does a person standing there with a bolt in their hand have to look up that bolt in your inventory system and mark it consumed? Very little. What if you lock the door to the store room and make them request an item before you unlock the door? That’ll help, but chances are some people who don’t know exactly what they want will click the first item on the list, and just go in and browse. What if you make the inventory storage system so convoluted that the only way to find an item is to look up the storage location in the computer? Well, that might work (until your inventory system breaks).

Like water, people tend to take the easiest path downhill. You’re better off digging a channel where you want it to go than expecting it to get there under its own power. Use gravity to your advantage. Make it harder to do things the wrong way and easier to do things the right way.

NTSB Suggests Banning Use of Electronic Devices While Driving — Yeah?

I was a bit surprised to see all the hubbub about the NTSB recommending banning the use of cell phones and other electronic devices while driving. You see, they already passed similar legislation here in Ontario, and the sky hasn’t fallen.

As someone who used to answer my cell phone while driving (I always told myself it was only on a straight stretch of road or outside of town, but in truth I would answer it almost any time), I can honestly say I was wrong to do that, and the evidence converted me. Our brains just don’t seem to be wired to handle cell phone conversations while driving, even though we do much better with other tasks like talking to a passenger.

Unfortunately, our brains are also poorly wired to understand statistics. The fact that I used my cell phone while driving and never had an accident because of it was apparently all the proof I needed that it was safe. Of course, the real research disagrees.

There are, of course, edge cases. I know that here you’re free to dial 911. Also, you’re not allowed to be manipulating a Navigation System, but I believe you’re still allowed to have one turned on giving directions (that’s pretty reasonable – you can set it before you leave, or let your passenger set it).

For those of you who like to let your spouse know when you’re almost home, I’ve heard they can get an app that shows your location in real time on their phone by tracking your phone’s GPS, so there’s no need to call. (Yeah, I think that’s creepy too, and maybe they already have it installed!)

Safe(r) Data Collection from a PLC

There’s been a lot of discussion recently about the dangers of connecting automation equipment to networks, and yet there are significant pressures to do so. Of course, I don’t think you should ever take a PLC and put it on an internet-accessible IP address, but it’s certainly common practice to connect industrial automation equipment to internal LANs to facilitate data collection. People in the front office need to push production planning information down to the production floor, and they need real-time data on what’s going on (not to mention data for historical logging, historians, etc.).

It’s all too common to throw a PLC on the same network as your front office, and I’ve seen it blow up. What happens is something invariably goes wrong on the office network (someone plugs two ports together on the switch in the boardroom, or someone brings in an infected music player and plugs it in, or the DNS server at head-office goes down and the local DNS doesn’t work correctly… I’ve seen a lot). However, you want your machines to keep going when this happens.

This is all made worse now that (a) industrial automation equipment is more commonly based on off-the-shelf commodity hardware and software (e.g. Windows PCs) and (b) the people writing malware can actually spell PLC now. Up to this point there’s been some form of security through obscurity.

If you have a local IT staff that’s on the ball, you really should be getting them to handle the network layout. On the other hand, if you’re in a small facility with limited resources, there’s a lot you can do by making some simple design choices that will go a long way towards improving the reliability and security of your systems.

Most automation cells now come with an Ethernet switch already built in. Typically this is an industrial-spec, DIN-rail-mounted one. It’s not fancy, but it’s supposed to survive in a panel. These Ethernet switches are there to connect your PLC to your HMI, and increasingly to connect your PLC to Ethernet-based I/O using protocols like EtherNet/IP. The common (and wrong) thing to do is to drop a network cable from your plant network to your panel and plug it right into this Ethernet switch. This creates some technical problems right off the bat:

The automation devices typically have fixed IP addresses. I personally prefer this because it means these devices aren’t dependent on an external DNS or DHCP server, and two fewer dependencies is a good thing. Chances are these IP addresses won’t fit your plant network’s addressing scheme, so you have to manage them at the plant level. You’re also opening yourself up to someone with the wrong IP on their laptop pouncing on your PLC’s IP address, and then bam, your machine is down.

A much better way is to place some kind of router with NAT between the plant network and the machine’s Ethernet switch.

Now if you’re just a little manufacturer with two machines out back and your data-collection link isn’t critical, you can probably get away with one of those home routers from Best Buy that you’d use to connect your laptop and your desktop at home to your cable modem. Note that it doesn’t need to be wireless, and you’re probably better off if it isn’t. The way you hook it up is to connect the Internet (Uplink) port on the router to the plant network and run a cable from one of the ports on the LAN side to the existing Ethernet switch in the machine. If your data needs to be a bit more reliable, consider buying some kind of Cisco router with NAT capability (but you’ll be going from the $50 range to many hundreds of dollars – your choice).

Now, when you configure it, make sure you turn off the port forwarding function and the DMZ function, disallow remote administration, and block all anonymous internet traffic (these should be the default settings, but it’s good to check). Also, make sure the router’s local IP address doesn’t conflict with the PLC’s and HMI’s, and that it’s on the same subnet as they are. Typically you’ll want to either turn off DHCP on the local side or limit it to a range that won’t conflict with the fixed IPs. (DHCP is handy when you connect your laptop to the programming port.) What you’ve done now is make the machine network somewhat invisible from the plant side. Some piece of malware scanning for devices on your plant network should just see a black hole.
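
Before touching the router, it’s worth sanity-checking the addressing plan. This sketch uses Python’s standard ipaddress module. The function name and the example addresses (192.168.1.x) are hypothetical. It flags devices outside the router’s subnet, duplicate addresses (including the router’s own), and fixed IPs that fall inside the DHCP range. It only checks the plan and doesn’t talk to any real device.

```python
import ipaddress
from typing import Optional


def check_machine_network(router_ip: str, prefix: int, fixed_ips: dict[str, str],
                          dhcp_range: Optional[tuple[str, str]] = None) -> list[str]:
    """Return a list of problems with a machine-side addressing plan.

    router_ip:  the router's LAN-side address
    prefix:     subnet prefix length shared by the whole machine network
    fixed_ips:  device name -> fixed IP (PLC, HMI, Ethernet I/O, ...)
    dhcp_range: optional (first, last) addresses the router hands out
    """
    problems = []
    net = ipaddress.ip_interface(f"{router_ip}/{prefix}").network
    seen = {ipaddress.ip_address(router_ip): "router"}
    for name, ip in fixed_ips.items():
        addr = ipaddress.ip_address(ip)
        if addr not in net:
            problems.append(f"{name} {addr} is not in subnet {net}")
        if addr in seen:
            problems.append(f"{name} {addr} conflicts with {seen[addr]}")
        else:
            seen[addr] = name
    if dhcp_range is not None:
        first, last = (ipaddress.ip_address(a) for a in dhcp_range)
        for name, ip in fixed_ips.items():
            if first <= ipaddress.ip_address(ip) <= last:
                problems.append(f"{name} {ip} is inside the DHCP range")
    return problems


plan = {"PLC": "192.168.1.10", "HMI": "192.168.1.11", "I/O": "192.168.1.1"}
print(check_machine_network("192.168.1.1", 24, plan,
                            ("192.168.1.100", "192.168.1.149")))
# ['I/O 192.168.1.1 conflicts with router']
```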

On the PLC side, you can now initiate a connection to the data collection server, even though the data collection server can’t connect to the PLC (in the same way that your home computer can connect to Google, but Google can’t get to your PC, at least in theory). Note that a piece of malware on your data collection server, or on one of the routers or switches in your plant network, could intercept this communication and own your PLC. Still, you’ve significantly reduced the attack surface. It’s not perfect, but it’s reasonable at this time, depending on the sensitivity of your equipment. I’m assuming you’re not enriching uranium or providing drinking water to my community.
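
A toy model shows why the machine network looks like a black hole from the plant side. This is a deliberately simplified sketch of a NAT connection table, not how any particular router implements it. Real routers also track TCP state and expire idle entries. All names and addresses are hypothetical.

```python
class ToyNat:
    """Highly simplified NAT connection table (illustration only)."""

    def __init__(self, plant_side_ip: str) -> None:
        self.plant_side_ip = plant_side_ip  # router's address on the plant network
        self.next_port = 40000              # arbitrary first translated port
        # translated port -> (inside ip, inside port, remote (ip, port))
        self.table: dict[int, tuple[str, int, tuple[str, int]]] = {}

    def outbound(self, inside_ip: str, inside_port: int,
                 remote: tuple[str, int]) -> tuple[str, int]:
        """A machine-side device opens a connection to `remote`.

        Returns the (ip, port) the remote host sees as the source.
        """
        port = self.next_port
        self.next_port += 1
        self.table[port] = (inside_ip, inside_port, remote)
        return self.plant_side_ip, port

    def inbound(self, source: tuple[str, int], port: int):
        """A plant-side packet arrives at `port`; return the inside target or None."""
        entry = self.table.get(port)
        if entry is None or entry[2] != source:
            return None  # no matching outbound connection: silently dropped
        return entry[0], entry[1]


nat = ToyNat("10.0.5.20")
_, port = nat.outbound("192.168.1.10", 2222, ("10.0.0.50", 5000))
print(nat.inbound(("10.0.0.50", 5000), port))    # ('192.168.1.10', 2222)
print(nat.inbound(("10.0.0.66", 40123), 44818))  # None: probe of the EtherNet/IP port
```

Port forwarding or a DMZ host would amount to permanent entries in that table that answer anyone, which is exactly the hole you’re trying not to open.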

(I’m going to be talking specifically about Allen-Bradley products now – sorry.)

So how do you get the data from the PLC to the data collection server? In the old days you’d have some software on the server like RSSQL, which used a product like RSLinx Enterprise and, as far as I know, initiated the connection to the PLC. That won’t work in this case. Sometimes you’d throw an OPC server in there and have some kind of historian log tags to a database. That OPC server, obviously, needs to be able to initiate a connection to the PLC. To use a router with NAT, you’d need to port-forward from the router to the PLC (or to the OPC server if it was inside the machine network). That undoes a lot of our protection.

What you need to do is initiate the connection from the PLC and have the data collection computer act as a server. One way to do this is with a 3rd party Ethernet card, like the MVI56-GEC card from Prosoft for the ControlLogix line. I have used that in the past to connect to a server, but it involves a lot of ugly PLC coding. It’s your only option if you have to conform to someone else’s protocol, though.
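
The direction of the connection is the whole point: the data collection computer listens, and the machine side dials out. The sketch below illustrates only that direction, using plain TCP sockets and JSON lines on localhost. It is not CIP, NET.LOGIX, or the Prosoft card’s protocol, and the tag name is made up.

```python
import json
import socket
import threading


def collect(server_sock: socket.socket, received: list) -> None:
    """Data collection server: accept one inbound connection, read JSON lines."""
    conn, _addr = server_sock.accept()
    with conn, conn.makefile("r", encoding="utf-8") as lines:
        for line in lines:
            received.append(json.loads(line))


def push(host: str, port: int, records: list) -> None:
    """Machine side (standing in for the PLC): dial out and push the records."""
    with socket.create_connection((host, port), timeout=5) as sock:
        for record in records:
            sock.sendall((json.dumps(record) + "\n").encode("utf-8"))


def demo() -> list:
    """Run both ends on localhost and return what the server received."""
    server = socket.create_server(("127.0.0.1", 0))  # port 0 = any free port
    port = server.getsockname()[1]
    received: list = []
    listener = threading.Thread(target=collect, args=(server, received), daemon=True)
    listener.start()
    push("127.0.0.1", port, [{"tag": "PartCount", "value": 1042}])
    listener.join(timeout=5)
    server.close()
    return received


if __name__ == "__main__":
    print(demo())
```

Behind the NAT router, only the push side needs to reach the network. Nothing on the plant network ever has to open a connection into the machine.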

If you just want to write data directly into a SQL database, there are 3rd party products that will let you do this (basically a SQL Server connector card).

But there is an option without buying any new hardware. The ControlLogix/CompactLogix lines can send Unsolicited CIP messages, and you can find products that can receive these messages in the PC world, like CimQuest’s NET.LOGIX product. It can act as a server and receive data directly from the PLC – either individual tags, or even arrays of UDTs. The code on both ends is relatively simple, so all you have to pay for is the NET.LOGIX runtime license, which is cheaper than the hardware alternatives. Note that you can also do this with PLC5 and SLC500 devices, though there’s some more effort involved.

I hope that’s enlightening. This is by no means a perfect solution, but it’s reasonable for now. It doesn’t plug the laptop hole (the programming laptop is probably still your #1 vector for malware to get into your machine network). It’s susceptible to man-in-the-middle attacks between the router and the data collection server. It’s susceptible to exploitable bugs in the router’s firmware. Beware and use your own judgement.

Decision-making in Organizations

I think I can group decisions into two types:

  1. Decisions where it’s really important that we make the right decision
  2. Decisions where it’s really important that we make any decision and everyone gets behind it

For instance, deciding what products to launch for the Christmas season is really important. The choices made will have a profound impact on the bottom line of your company. On the other hand, it didn’t really matter what side of the road we decided to drive on, but it was really important that we, as a group, made a decision, and everyone agreed to it.

Now let’s talk about how organizations make decisions. I think there are typically two approaches:

  1. Appeal to authority
  2. Appeal to committee

When appealing to authority, the accounting department has the authority to make cash-flow decisions, and the engineering department has the authority to make technical decisions, and the marketing department gets to decide whether we run Superbowl ads or Craigslist ads. The CEO can override these decisions when a higher level view recognizes a different need.

When we appeal to committee, we gather all the “stakeholders” who then sit around a table, generally as equal representatives of their respective departments, and come to some kind of consensus.

I don’t think anyone’s surprised by the fact that when it comes to making decisions where being right is the most important criterion, authoritative decisions tend to be better than committee decisions. In the same way, when success of the decision is tied to consensus rather than the “correctness” of the decision, then committee decisions probably have an edge.

Now, if you’ve spent any time around government offices, you’ll realize that almost all decisions, including planning the staff Christmas gathering, are made by committee. Very large publicly traded companies don’t seem to be much different. On the other side of the spectrum, small companies don’t need much consensus because they’re small, and they tend towards decisions based on authority. Successful entrepreneurs seem to surround themselves with knowledgeable people and trust those people to make intelligent choices. This makes them well suited to make decisions where it’s important to be right, like how much raw material to buy this month, and where to commit other scarce resources.

It’s interesting to look at the outliers too. Apple is famous for being the exception that proves the rule. Despite being a huge organization, all information seems to indicate that Jobs ruled it with authority, not committee. And since he seemed to make good decisions, they were successful. Apple shareholders beware.

Now let’s go all 7-Habits on this and put it in quadrants, dividing decisions along two axes:

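  • Quadrant 1 (great risk if wrong, must have everyone’s support): Invade Iraq? What product for Christmas?
  • Quadrant 2 (little risk if wrong, must have everyone’s support): Drive on the Left or Right?
  • Quadrant 3 (great risk if wrong, don’t need everyone’s support): Bail out the banks? Windows or Linux servers?
  • Quadrant 4 (little risk if wrong, don’t need everyone’s support): Chicken or Fish? Bike-shed color?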

I divided it into four quadrants numbered 1, 2, 3, and 4. Quadrants 2 and 3 we’ve already covered. In quadrant 2, committees really shine, and in quadrant 3 authority really shines. I’m not even going to talk about quadrant 4.

Quadrant 1 is the tricky one. The Easter Island society collapsed because they were faced with a decision: do we allow everyone to cut down all the trees, or do we centrally manage it? Obviously they made the wrong decision, but the right decision would have required broad support, which is why it’s so difficult.

Apple beat the quadrant 1 decisions by rolling both authority and consensus into one charismatic (and knowledgeable) leader. People follow leaders who have a track record of delivering on their promises. Success is a positive spiral.

The idea that you can take committees and make them authoritative is misguided. On the other hand, we’ve seen our share of authority figures who’ve succeeded at the long road of building consensus around the right decisions. They are our political and cultural heroes.

All of this brings me to two conclusions:

First, unsurprisingly, is that we shouldn’t put big government bureaucracy in charge of quadrant 1 type decisions (and that’s a bit scary, because they certainly are in charge of those decisions now).

Second is that our system of government tends to promote leaders who are good consensus builders without promoting leaders who are likely to make the right decisions. I’m not saying it promotes leaders who are likely to make bad decisions; I’m just saying it’s neutral on the issue.

I’m not out to change the system of government, but I think a two-pronged offensive could make a dent. On one side, our domain experts tend to live in a world where consensus building doesn’t matter because their community has the skill to recognize logically consistent arguments. Scientists simply publish their findings and wait for others to confirm or disprove them. Engineers test various design alternatives and measure their performance. Unfortunately, this means our domain experts lack the soft skills necessary to convince us to do the right things. A marketing budget for these experts, perhaps paid for by some rational-minded philanthropists, could go a long way.

On the other side, the general public is hopelessly lacking in critical thinking skills. We live in a world where logic is first introduced in a university-level introductory philosophy class. It belongs in high school (along with some other suspiciously missing life skills like food/nutrition and childcare).

Unfortunately the high school curriculum is decided on by… a committee.

Internet Facing Water Management Infrastructure in Texas Exploited Easily

Completing the trilogy of ICS-security-related blog posts, a hacker recently demonstrated how easy it was to find and log in to an internet-facing SCADA system used for water management in a town in Texas. From the article on Threatpost:

The hacker, using the handle “pr0f” took credit for a remote compromise of supervisory control and data acquisition (SCADA) systems used by South Houston, a community in Harris County, Texas. Communicating from an e-mail address tied to a Romanian domain, the hacker told Threatpost that he discovered the vulnerable system using a scanner that looks for the online fingerprints of SCADA systems. He said South Houston had an instance of the Siemens Simatic human machine interface (HMI) software that was accessible from the Internet and that was protected with an easy-to-hack, three character password.
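
To put a three-character password in perspective, the keyspace arithmetic is easy to check. This sketch assumes passwords drawn from the 95 printable ASCII characters and a hypothetical rate of 1,000 guesses per second. Real attack rates vary enormously, and many systems allow far more.

```python
PRINTABLE_ASCII = 95  # assumed alphabet: printable ASCII characters


def keyspace(length: int, alphabet: int = PRINTABLE_ASCII) -> int:
    """Number of possible passwords of exactly `length` characters."""
    return alphabet ** length


def worst_case_hours(length: int, guesses_per_second: float) -> float:
    """Hours needed to try every password of that length at a fixed rate."""
    return keyspace(length) / guesses_per_second / 3600


for n in (3, 8, 12):
    print(n, keyspace(n), f"{worst_case_hours(n, 1000):.3g} hours")
```

Even at that modest rate, every possible three-character password can be tried in well under an hour.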

For those of us who design, build, and deploy systems like this, let’s ask ourselves what would happen if a serious incident occurred and significant equipment was damaged or, in the worst case, people were seriously injured or killed. Don’t you think the people who worked on the system would end up in court (if not in criminal court, then at least in civil court)?

When in doubt, don’t put these things directly on the internet. There are lots of secure remote access products available (Google for “VPN”). It’s worth it.

Public Water Control System Attacked

Joe Weiss recently reported on the possible hacking of a public water SCADA system, apparently in Illinois. This attack, if it was an attack, caused damage to a pump by turning it on and off repeatedly.

It seems obvious that this situation is going to be repeating itself more and more. If you’re a company with industrial control systems, or you provide control system services, now’s a great time to start thinking about your control system security strategy. Do you have the necessary skills on staff? If not, where are you going to source them from?