As businesses increasingly rely on machine vision to enhance quality, improve productivity, and boost the bottom line, technology providers are turning to industrial computing solutions that enable faster processing and higher efficiency, or that support entirely new tasks. Fortunately, industrial computing capabilities have expanded rapidly in recent years, clearing the way for a new range of machine vision and edge AI applications that can drive productivity and efficiency improvements for businesses worldwide.

While the latest graphics processing units (GPUs) and central processing units (CPUs) may garner much of the attention when it comes to edge AI and compute-intensive machine vision processing, field programmable gate arrays (FPGAs) are a powerful tool for faster image capture, processing, and decision making closer to the application, in tasks such as deep learning and high-speed inspection.

A Primer on CPUs, GPUs, and FPGAs

Machine vision applications involve multiple components that must be integrated into a single system. These include cameras, software, and an industrial PC that can host PCIe cards for add-on devices such as frame grabbers, GPUs, and network interface cards (NICs). On the processing side, CPUs, GPUs, and FPGAs can be used in different ways, and each offers distinct capabilities and benefits.

CPUs are the heart of today's desktop and laptop computers. They have large instruction sets and broad support from programming languages, including C, C++, Java, C#, and Python. Composed of multiple processing cores, modern CPUs are general-purpose processors suitable for a wide range of applications. Machine vision software, for example, typically runs on a computer's CPU, most often on x86 architecture.

Originally designed for 3D graphics rendering, GPUs have seen widespread adoption in compute-intensive industrial applications, such as deep learning. GPUs are made up of thousands of small cores, which makes them suitable for parallel computing or running several processes at once. Many GPUs — including those from NVIDIA — offer tools for deep learning acceleration.

FPGAs, meanwhile, are smaller, energy-efficient devices that can be programmed to execute custom image-processing tasks using thousands of logic cells configured to perform computations. An FPGA is analogous to a custom-designed factory: raw data comes in, massive processing takes place along an assembly line, and processed knowledge exits. FPGAs offer low latency, parallel processing, low-level control over components, and built-in security features.

Device selection comes down to several factors, including speed, latency, power, size, and environment. CPUs are comparatively inexpensive, energy-efficient, and easy to program, and with x86 architecture they support most machine vision software packages. GPUs cost somewhat more and consume more power, but they excel at parallel processing, deliver higher throughput for real-time processing, come with extensive software support and acceleration tools, and remain straightforward for developers to program. FPGAs generally carry the highest upfront cost but are highly customizable, compact devices that deliver powerful low-latency processing for the most demanding machine vision applications, such as providing real-time feedback at the edge.

Figure 2

From PCB Inspection to OCR Applications

Despite higher upfront costs, FPGAs can often deliver disproportionately more value in challenging applications. Take printed circuit board (PCB) inspection, where deep learning software often helps a machine vision system identify defects such as unwanted solder bridges. A CPU can run the machine vision software and handle general control and the user interface, while a GPU can train the system's neural networks on images of acceptable and unacceptable solder bridges. But while a GPU may be needed to train an AI model, using the trained model to classify faults is much less compute-intensive, so more power-efficient devices such as FPGAs can be a better fit given the application's environmental or space constraints.

In such a scenario, where real-time feedback on defects can help quickly correct a process, an FPGA makes it possible to optimize classification and detection at the edge. Deploying an FPGA can greatly improve photon-to-decision speed: it can capture and process images of PCB solder joints, for example, allow software analysis of those images, and decide whether a solder bridge is acceptable, letting the inspection process cycle faster than a CPU and GPU combination could.

Even common applications like barcode reading and optical character recognition (OCR) can benefit from FPGAs. To make accurate reads, machine vision software tools for these applications require high-quality images. Software tools exist for improving image quality, including adjusting gain, contrast stretching, sharpening, and denoising (Figure 1), but using them in a high-speed application adds processing that can slow the overall cycle. Pairing a machine vision camera with an FPGA moves more of this computation into hardware, delivering the lower-latency processing required to make reads or recognize text at even ultra-high speeds.
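To illustrate what such a preprocessing step looks like in software, the following C sketch implements a simple linear contrast stretch on an 8-bit grayscale image. It is a minimal, illustrative example rather than code from any particular machine vision library; the function name and interface are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal sketch: linear contrast stretch on an 8-bit grayscale image.
 * Remaps the darkest pixel to 0 and the brightest to 255. */
void contrast_stretch(uint8_t *img, size_t n_pixels)
{
    uint8_t lo = 255, hi = 0;

    /* Pass 1: find the darkest and brightest pixel values. */
    for (size_t i = 0; i < n_pixels; i++) {
        if (img[i] < lo) lo = img[i];
        if (img[i] > hi) hi = img[i];
    }
    if (hi == lo)
        return; /* flat image: nothing to stretch */

    /* Pass 2: remap [lo, hi] to the full [0, 255] range. */
    for (size_t i = 0; i < n_pixels; i++)
        img[i] = (uint8_t)(((img[i] - lo) * 255) / (hi - lo));
}
```

On a CPU this means two full passes over every frame before the read tool even runs; in an FPGA, the equivalent remapping can be applied as pixels stream off the sensor (for example, using statistics from the previous frame), adding essentially no latency.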

Figure 3

Recent advances in FPGA technology may open the door for new real-time applications in edge AI and beyond. In its Versal AI Edge series, for example, AMD has introduced devices equipped with not only FPGA logic and ARM cores but also AI cores specifically designed for inference and classification tasks.

AI Cores Add Flexibility

Case in point: a new embedded platform combines a Versal FPGA with a Ryzen Embedded x86 CPU to deliver real-time response across a variety of sensors using programmable I/O while handling substantial compute workloads on a single 7 x 7 x 3-inch platform. So-called “adaptive compute” boards can accommodate up to 8x GMSL or 2x 10GigE or 25GigE cameras while offering Linux and ROS (Robot Operating System) options, solid-state storage, full graphics HMI, and low-power, low-heat operation (as low as 40 W). In addition, the platform is compatible with a range of sensor types, including RGB/mono, time of flight, lidar, ultrasonic, and GPS, while offering interface support for MIPI, SubLVDS/LVDS, and SLVS-EC.

With its FPGA and AI cores, the single-board system delivers real-time image enhancement, custom AI classifiers, and x86 CPU processing. By combining multiple processing technologies in one compact platform, it lets integrators delegate each processing task to the processor best suited for it. For example, integrators can leverage the low-latency FPGA fabric for sensor communication and data processing while the AI cores tackle inference, leaving the CPU free to handle the operating system and general application processing.

Figure 4

For example, in addition to providing camera interfaces, the FPGA can improve image quality by computing each output pixel from a 7 x 7 window of input pixels. This takes about 100 operations per pixel, or roughly 200M operations per HD image; at 30 frames per second, that is about 6 giga-operations per second. The FPGA can perform this in real time, but it would exceed the compute capacity of a single CPU. The CPU, meanwhile, can run a full Linux OS and execute hundreds of tasks, including the HMI, low-speed sensors, GPS, and actuators, all while maintaining a secure cloud connection for data storage and remote access. The AI cores can be dedicated to running particular AI classification tasks on regions of an image, with the results sent back to the CPU for decision making.
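To make that arithmetic concrete, the short C program below reproduces the figures above, assuming a 1920 x 1080 HD frame and roughly two operations per multiply-add; the exact constants are illustrative.

```c
#include <stdio.h>

int main(void)
{
    const double ops_per_pixel    = 7.0 * 7.0 * 2.0; /* 49 multiply-adds ~ 100 ops */
    const double pixels_per_frame = 1920.0 * 1080.0; /* ~2M pixels per HD frame   */
    const double fps              = 30.0;

    double ops_per_frame = ops_per_pixel * pixels_per_frame; /* ~200M ops      */
    double gops          = ops_per_frame * fps / 1e9;        /* ~6 giga-ops/s  */

    printf("~%.0fM ops per frame, ~%.1f giga-ops per second\n",
           ops_per_frame / 1e6, gops);
    return 0;
}
```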

In addition to edge AI, target applications include multi-camera vision systems, autonomous mobile robots, and embedded machine vision. And because programming these hybrid devices can be challenging, partnering with a qualified FPGA programmer can help tailor these new embedded computing systems to a wide range of other applications.

Real-Time, Zero-Distance Processing

Not every machine vision or edge AI application needs the low-latency, deterministic responsiveness that FPGAs can provide. Some applications are better suited to a CPU, a GPU, or a combination of the two. However, as machine vision applications evolve, FPGA-based solutions offer the resources needed for low-latency fusion of different sensors and sensor types, enabling powerful data processing that can be passed to AI engines in real time. As a result, these solutions act as zero-distance image acquisition and processing devices that offload work from the CPU in edge applications.

SIDEBAR

Image Quality Improvement: CPU, GPU, or FPGA?

Common image processing methods used to improve image quality are based on local window operators, where a region of input pixels is processed to produce one output pixel. In a 3 x 3 convolution (Figure 2), nine pixels are multiplied by filter coefficients and added together, but in convolutions involving more pixels the compute requirements stack up quickly. A 7 x 7 convolution, for example, involves 49 multiply-adds, or nearly 100 operations per pixel, which can put significant strain on a CPU.
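For reference, a 3 x 3 convolution of the kind Figure 3 depicts can be sketched in C as below. This is an illustrative kernel, not the exact code shown in the figure; the nine multiply-adds in the inner loops are the operations being counted above.

```c
#include <stdint.h>

/* Minimal sketch of a 3 x 3 convolution: each output pixel is the sum
 * of nine neighboring input pixels, each multiplied by a coefficient. */
void convolve3x3(const uint8_t *in, int16_t *out,
                 int width, int height, const int8_t k[3][3])
{
    /* Skip the one-pixel border so every 3 x 3 window stays in bounds. */
    for (int y = 1; y < height - 1; y++) {
        for (int x = 1; x < width - 1; x++) {
            int32_t acc = 0;
            for (int ky = -1; ky <= 1; ky++)
                for (int kx = -1; kx <= 1; kx++)
                    acc += k[ky + 1][kx + 1] *
                           in[(y + ky) * width + (x + kx)];
            out[y * width + x] = (int16_t)acc; /* 9 multiply-adds per pixel */
        }
    }
}
```

With a 7 x 7 kernel, the inner loops run 49 times per output pixel, which is where the roughly 100 operations per pixel cited above come from.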

To further visualize the compute difference between CPUs and FPGAs, the left panel of Figure 3 shows a 3 x 3 convolution in C code, while the middle panel shows the machine instructions that code compiles into for CPU execution. Figure 3's right panel, meanwhile, shows the same high-level code converted into VHDL and then into FPGA logic through software tools, creating an FPGA processing pipeline that can handle image processing algorithms such as a 3 x 3 convolution more quickly.

A CPU takes a 3 x 3 convolution and executes each step sequentially for every pixel, which involves many steps and thus additional compute time. A GPU is well suited to image processing when there is a large number of convolutions to perform and ample power available, since intermediate results are stored in external memory. FPGAs, on the other hand, started out computing simple logic operations but have evolved to efficiently compute thousands of operations in parallel, all within the FPGA chip. Rather than offering a fixed number of cores, the FPGA acts like an assembly line, passing data from one compute element to the next in a pipelined fashion.

For image processing this is highly efficient: the convolution is fixed, data is passed through the pipeline, and the pipeline can produce one or more results per clock cycle.

One recently released FPGA, the Versal “Adaptive System-on-a-Chip,” contains ARM cores, FPGA logic, and AI Tiles. These AI Tiles are arranged in a 2D array and can process data in a pipelined manner but with the added advantage that each AI Tile is specially designed to efficiently execute AI operations.

In a benchmark study conducted by SiSoftware (Figure 4), common image processing algorithms were run on a 10-core Intel i9-7900X Skylake-X CPU using a 2MP camera, with the corresponding frame rates shown. For simple image processing functions, frame rates as high as 4,540 fps were reached; for more complicated filters, however, frame rates were much lower. FPGAs, meanwhile, can utilize built-in compute resources on the chip comprising hundreds to thousands of elements. In this case, an FPGA with 1,600 digital signal processor (DSP) ASIC cells, essentially 1,600 small cores, makes it possible to map and define the compute. If those 1,600 DSPs run at 250 MHz, this equates to 400 giga-ops per second, significantly more processing power than the 10-core CPU offers.
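As a rough, back-of-the-envelope comparison (assuming one operation per DSP per clock and ignoring memory bandwidth, I/O, and routing limits), those figures can be worked out as follows; the implied frame rate is an illustrative extrapolation, not a benchmark result.

```c
#include <stdio.h>

/* Peak throughput if every DSP cell completes one operation per clock. */
static double peak_ops_per_sec(double dsp_count, double clock_hz)
{
    return dsp_count * clock_hz;
}

int main(void)
{
    double peak          = peak_ops_per_sec(1600, 250e6); /* 1600 DSPs at 250 MHz      */
    double ops_per_frame = 2e6 * 100.0;                   /* 2MP frame, ~100 ops/pixel */

    printf("peak: %.0f giga-ops/s\n", peak / 1e9);                    /* ~400  */
    printf("7 x 7-filtered frames/s: ~%.0f\n", peak / ops_per_frame); /* ~2000 */
    return 0;
}
```

That frame rate is only a theoretical ceiling, but it illustrates why mapping a fixed image-processing pipeline onto FPGA compute resources scales so well compared with running it on a general-purpose CPU.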