A Wrong Vision of Efficiency
Our obsession with efficiency is killing our effectiveness in deep learning machine vision.

I happen to be a coffee fanatic. Not a coffee snob by any means; I just want as much of the cheap stuff as I can get! When I’m tired? Coffee. Need some inspiration? Coffee. Time for a break? How about coffee and then keep it moving?!
Engineers solve problems for a living; where there's a will, there's a way! I have found that this "Can Do" attitude finds a great friend and ally in the frenetic boost the liquid gold provides. No need for sleep, just pour another cup and keep it moving!
However, eventually I find that my work pace has slowed to that of a snail, even while my heart pumps at the pace of a jackrabbit tailed by a fox. My thinking becomes fuzzy and I have to read and re-read everything. Alas, maybe a little sleep will help…
Why is it that I regularly return to this same song and dance, resisting the need for sleep until my work greatly suffers? Well, my desire for efficiency, to get as much done as quickly as possible, blinds me to what I truly need to be effective. By stopping, resting, and “not getting anything done”, I end up far more prepared to perform good work. And it just so happens that I perform that good work very efficiently when well rested.
I have seen a very similar thing happen when applying deep learning machine vision to industry. "We want to keep product moving, no down-time!" "The quickest way is the best way." You see, in our efforts to implement machine vision, and especially deep learning machine vision, we have misunderstood the keys to success and have suffered the consequence of sub-par inspections. And we have done this while remaining entirely unaware of our predicament: Our obsession with efficiency is killing our effectiveness with deep learning machine vision.
When is deep learning used in machine vision?
Before we revisit my thesis, what exactly is deep learning and why would we use it in machine vision? For years, we have been able to produce successful machine vision systems utilizing what we now refer to as rule-based machine vision. Say we want to measure the distance between two edges on a part: we light the object in a way that provides maximum contrast, and through rule-based programming, like edge-finding and measurement tools, we compute the distance and pass or fail the part accordingly. Traditional rule-based machine vision can be incredibly accurate and efficient. However, there are many tasks it simply is not suited for.
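To make that concrete, here is a minimal sketch of what a rule-based width measurement might look like, using OpenCV and NumPy. The image file, contrast threshold, pixel-to-millimeter calibration, and tolerance band are all hypothetical stand-ins, not values from any particular system.

```python
# A minimal rule-based measurement sketch (hypothetical values throughout).
import cv2
import numpy as np

# Acquire a high-contrast grayscale image of the backlit part.
img = cv2.imread("part.png", cv2.IMREAD_GRAYSCALE)

# Sample one horizontal scan line through the middle of the part and
# look for strong intensity transitions (edges) along it.
row = img[img.shape[0] // 2, :].astype(np.int32)
edges = np.where(np.abs(np.diff(row)) > 50)[0]  # fixed contrast rule

if len(edges) >= 2:
    width_px = edges[-1] - edges[0]  # pixel distance between the outer edges
    width_mm = width_px * 0.05       # hypothetical calibration: 0.05 mm/pixel
    verdict = "PASS" if 9.5 <= width_mm <= 10.5 else "FAIL"
    print(f"{verdict}: width = {width_mm:.2f} mm")
else:
    print("FAIL: edges not found")
```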
The easiest way to understand deep learning is to consider inspection tasks that humans are uniquely good at. For example, if I held a pen behind a notepad, leaving only the end of the pen in sight, you would still be able to recognize the object as a pen. Why is that so? Well, because in your lifetime, you have seen thousands of different pens, viewing them at different angles and distances, and you therefore are pretty confident in your understanding of what a pen looks like. In fact, you would know it was a pen without hesitation and may even be bothered that I had asked you such a pointless question! To try to do this same thing using rule-based machine vision, however, would be a nightmare at best and impossible at worst. How could we possibly create enough rules to identify the object as a pen? What if the pen is 50% concealed? 80% concealed? 95% concealed? What if I grab a pen of a different shape? Take off the cap? Use a different color pen? Even moving the pen further away from the camera substantially increases the difficulty for our rule-based system.
With deep learning, however, we "teach" the deep learning model what a pen looks like, much like we may teach another human being what a pen looks like. We do this by giving our model multiple examples of anything we want the model to consider a pen. If we want our model to identify black pens, blue pens, and red pens, we include examples of each. If we want to identify pens that are close to the camera and pens that are far from the camera, we include examples of both.
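In code, this "teaching by example" might look like the transfer-learning sketch below, written with PyTorch and torchvision. The folder layout, model choice, and hyperparameters are illustrative assumptions; commercial vision software typically wraps this whole process in its own tooling.

```python
# Sketch: teaching a model by example via transfer learning (PyTorch/torchvision).
# Assumes a hypothetical folder layout: dataset/black_pen, dataset/blue_pen, ...
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
data = datasets.ImageFolder("dataset", transform=tfm)  # one class per subfolder
loader = DataLoader(data, batch_size=16, shuffle=True)

# Start from a pretrained backbone and replace its final layer with our classes.
model = models.resnet18(weights="DEFAULT")
model.fc = nn.Linear(model.fc.in_features, len(data.classes))
model.train()

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):  # a few passes over our labeled examples
    for images, labels in loader:
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
```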
In short, deep learning is excellent at Location, Classification, and Analysis (or Anomaly Detection).
Location: Deep learning models can be trained to find all instances and positions of one or more objects. This can be used for counting items on a conveyor or for robotic pick-and-place applications (see the sketch after this list).
Classification: Am I looking at a screw, a bolt, a nail, or a nut? Used in conjunction with deep learning’s location capability, we can find all of the different screws, bolts, nails, or nuts in our image. Classification can be used for any kind of object and is the method behind deep learning optical character recognition.
Analysis (or Anomaly Detection): The most powerful deep learning tool, analysis is used to find defects like surface scratches, dents in a metal grate, or bubbles in plastic bottles. Just as a pulled thread on a paisley shirt pops out to our eyes, deep learning can find similar anomalies (try doing that with rule-based vision!).
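As a rough illustration of the Location capability, the sketch below finds and counts object instances with an off-the-shelf pretrained detector from torchvision. The image name and confidence cutoff are hypothetical, and a production system would of course use a model trained on its own parts rather than a generic one.

```python
# Sketch: deep learning "Location" -- find and count object instances
# with a generic pretrained detector (illustrative only).
import torch
from torchvision import models
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

model = models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

img = convert_image_dtype(read_image("conveyor.png"), torch.float)  # hypothetical image
with torch.no_grad():
    detections = model([img])[0]

# Keep confident detections; each box is an (x1, y1, x2, y2) position.
keep = detections["scores"] > 0.8
boxes = detections["boxes"][keep]
print(f"Found {len(boxes)} objects on the conveyor")
```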
Machine Vision as Fusion
While deep learning can be used to find our defects, what do we do once our defects are found? Just as I can see a pulled thread in your shirt but am only able to estimate its length (i.e., without tools, humans are not good at measurement!), so deep learning is not great at quantifying what it finds. While some model architectures are designed to give us a singular output (Good/Bad, or classification type if looking at a single object), most of the time we are provided with a heatmap highlighting the defects the model has identified.
With that in mind, almost every deep learning machine vision application is a fusion of both rule-based and deep learning tools. We use deep learning to identify the defects and rule-based tools to quantify them and produce our final Pass/Fail (or numerical) output.
Because our deep learning model provides a heatmap, each pixel of the heatmap represents how certain our model is that said pixel belongs to a defect. The heatmap is made of pixels on the grayscale (0 to 255), with a value of "0" equating to 0% certainty that the pixel is a defect and "255" equating to 100% certainty. This relationship of grayscale value to certainty is not necessarily linear, i.e., we should not assume that "128" means the model is only 50% certain it is a defect. Rather, we can simply use this variable scale to determine what we actually want to consider a defect: we set our first threshold on the heatmap values, choosing the "cutoff" value at which a pixel counts as part of a defect.
For example, setting our threshold to “128” and above would yield much larger resulting “defects” while setting our threshold to “225” and above would yield much smaller, more narrowly defined “defects.”
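In code, this first round of thresholding might look like the sketch below, assuming the model's heatmap has been saved as an 8-bit grayscale image (the file name is a hypothetical stand-in):

```python
# Sketch: first-round thresholding of a model's 8-bit defect heatmap.
import cv2
import numpy as np

heatmap = cv2.imread("heatmap.png", cv2.IMREAD_GRAYSCALE)  # hypothetical model output

# Every pixel at or above the cutoff becomes a "defect" pixel.
_, loose = cv2.threshold(heatmap, 127, 255, cv2.THRESH_BINARY)   # keeps 128+: larger defects
_, strict = cv2.threshold(heatmap, 224, 255, cv2.THRESH_BINARY)  # keeps 225+: smaller defects

print("defect pixels at 128 and above:", int(np.count_nonzero(loose)))
print("defect pixels at 225 and above:", int(np.count_nonzero(strict)))
```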
Now that we have set our threshold and obtained our resultant defects, we have a second round of thresholding to do: Pass/Fail thresholding.
At this point, we are not setting thresholds that change the sizes of our defects; rather, we are determining whether our defects are large enough (or, in some cases, small enough) to be considered true defects. We can set a number of different kinds of thresholds as well. We can set a maximum allowable size for any single defect instance. We can set a maximum allowable number of instances of any defect type. We can even set a maximum total allowable size for all defects of any one class.
We have incredible flexibility in how we handle our data at this point. For example, say I have a product with five scratches on it. My part could fail because I only allow three scratches. Or, it could fail because one of the scratches is 1 inch long. Or, it could fail because, taken together, the scratches have a total length of 2 inches. Our ability to fine-tune our programs, thanks to our rule-based machine vision tools, is really quite phenomenal.
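That scratch example might look like the following sketch in code, using connected-component analysis to turn defect pixels into measurable scratch instances. The mask file, pixel-to-inch calibration, and all three limits are hypothetical.

```python
# Sketch: second-round (pass/fail) thresholding on the quantified defects.
import cv2

binary = cv2.imread("scratch_mask.png", cv2.IMREAD_GRAYSCALE)  # hypothetical binary mask
n, _, stats, _ = cv2.connectedComponentsWithStats(binary)

# Bounding-box width stands in for scratch length (hypothetical 0.01 in/pixel).
lengths_in = stats[1:, cv2.CC_STAT_WIDTH] * 0.01  # row 0 is the background

MAX_COUNT, MAX_SINGLE_IN, MAX_TOTAL_IN = 3, 1.0, 2.0  # hypothetical limits

fail = (
    len(lengths_in) > MAX_COUNT                    # too many scratches
    or lengths_in.max(initial=0) > MAX_SINGLE_IN   # one scratch too long
    or lengths_in.sum() > MAX_TOTAL_IN             # too much total scratch length
)
print("FAIL" if fail else "PASS")
```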
Where Mistakes Are Made
The work that goes into preparing a successful deep learning model is really quite expansive and can be incredibly monotonous and boring. In this step we are curating all of the information necessary to teach the model properly and thoroughly. This includes both image acquisition from the on-line system (Yes, you want to build and implement your system before acquiring and training on images, so that you train on what you will be running.) and precise defect labeling.
Herein lies the problem. Success in deep learning machine vision is found in the work that we do prior to training the model. The biggest limiting factor for success (assuming otherwise proper machine vision practices) is the accuracy and precision of our dataset!
Simply put, it will take a lot of time to compile a proper dataset, including all possible variations we expect to see, removing any images that introduce unnecessary confusion, etc. Then, it will take even more time to label our images with accuracy and precision. When working with deep learning Analysis-type models especially, this labeling looks like manually "coloring" or "masking" over each incidence of the defects in our image, with each defect type in its own class. Just as we teach our kindergartners the importance of coloring "inside the lines," we must show the same care in our labeling! If I am too generous in my labeling, I am teaching the model that the "defect" consists of a lot more "good" than it actually does. If I am not generous enough, I am depriving my model of a full understanding of the defect. In practice, it is generally good to include a few of the pixels around the edge of our defect, as it is helpful for the model to see the transition from "good to bad." However, we need to be consistent!
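One way to enforce that consistency, assuming labels are stored as binary mask images (the file names here are hypothetical), is to grow every hand-drawn label by the same fixed margin with a morphological dilation:

```python
# Sketch: applying a consistent edge margin to every defect label mask.
import cv2

mask = cv2.imread("scratch_label.png", cv2.IMREAD_GRAYSCALE)  # hypothetical hand-drawn label

# Grow the labeled defect by the same ~2-pixel margin every time, so each
# label consistently includes the "good to bad" transition at the edge.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
consistent_mask = cv2.dilate(mask, kernel)

cv2.imwrite("scratch_label_margin.png", consistent_mask)
```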
Another common mistake comes when we do not sufficiently separate our defect classes. If I am going to train a model with separate classes of defects, say "dent," "scratch," and "hole," then I had better be sure that my labeling of each defect class contains only defects legitimately in that class. The moment I label a defect resembling a "hole" as a "dent," I have compromised the integrity of both classes, weakening the model's ability to discern between them.
All of this deep learning model training prep work happens off-line, at the computer, and can begin to feel very abstract and intangible. In fact, it can feel like we aren't getting anything done! This is where the "lie" of efficiency destroys our effectiveness.
First Things First
You see, once we have a model, we can run the model on-line, with real parts and live images. From here, we can adjust all of our thresholds: both the thresholds that determine the sensitivity of our model (what do we consider to actually be defect pixels?) and the thresholds that we use to quantify our defects for the sake of passing and failing. This work is incredibly tangible: by adjusting one threshold, I can drastically change how my program handles the objects under inspection! Every change we make has immediate, real-world impacts.
However, these changes, though very real and potent, have no effect on the actual ability of our deep learning model to properly identify defects: we are only modifying and quantifying the results of its decision-making. Therefore, our results can ultimately only be as good as our model's ability to identify defects. We cannot "improve" our model's ability by adjusting our thresholds; we can only focus it.
The only way to improve our inspection and achieve better success is to stop, step away from the line, and re-evaluate our view of what it means to be efficient and effective. It is true that the slow, mundane work of dataset creation and defect labeling can feel like a waste of time, but we must not give in to this illusion. This work is actually the most important step in the process. The end result of our inspection will only be as good as the information we feed into the model. Ever heard the old saying "garbage in, garbage out"? Well, in my time, I have seen a lot of "garbage in" when it comes to deep learning model training and I can tell you, the results are… garbage.
We live in a fast-paced world and we need to get results quickly; as engineers, it's what we do! But if we are too wound up and focused on "go, go, go" efficiency, we will not be able to appreciate that it is in our dataset curation and labeling that our deep learning battles are won or lost. We will end up wasting more time adjusting our rule-based thresholds, never to achieve the success we are looking for.
So. The next time you are looking to implement deep learning for machine vision, slow down. Keep in mind that the attention to detail you give now will pay great dividends later. Embrace the slow process of curating a fine and clean dataset. Label your work with the care of an artist. And fire up the coffee maker!