A featured contribution from Leadership Perspectives: a curated forum reserved for leaders nominated by our subscribers and vetted by our Manufacturing Technology Insights Advisory Board.



The promise of machine vision has really taken off in the past 10 years. From self-driving cars to facial-recognition doorbells, the applications have captured the imagination of the public. A massive amount of work went into the embedded infrastructure needed to get these solutions right. It is exciting for me, as a developer of products, to see the industry continuously learn and evolve in a scalable way as the demand for machine vision has grown. I would like to share three trends:
Levels of recognition. Often, we will get requests to “recognize” objects or people. Recognition has a wide span of meanings. At the top is the deep learning/machine learning level of recognition, driven by the real-time needs of self-driving, face-based identity, and instant awareness of a large number of objects. Figure 1’s upper right quadrant represents this well and is an area that many are calling ubiquitous in commercial and industrial applications.

But in most cases, we have found that lower levels of recognition are good enough for the application. There is feature-level recognition (see Figure 2), which looks for the existence of features in an image. And there is also basic object detection. In both cases, there has been a growth in interesting options for developers.
Referring to Figure 1 again, these interesting options are represented by the upper left and lower right quadrants.
• The upper left quadrant represents the new options available when powerful edge compute devices are paired with lower-end optics and sensors. The optical scanner used for fingerprint identification on smartphones is an example: accurate results can be obtained from an optical sensor with a resolution as low as 500 dpi.
• The lower right quadrant represents the options where more embedded, autonomous compute is paired with higher-end optics and sensors. The earliest Ring doorbell, which was powered by an i.MX RT from NXP, is an example of this.
Independence at the Edge. The growth in lower-level recognition options has created an architectural shift in how dependent deep-learning systems are on the Cloud.
There are now application-specific AI accelerators that blur the lines between all three quadrants. Google’s Edge TPU, Intel’s Movidius VPUs, NVIDIA’s Jetson Nano, and others use parallel computing optimized for machine learning to add inference to the system. The result is learning done locally, without a connection to the Cloud.
Not all applications need continuous learning. Low-end hardware that only intermittently connects to the Cloud can still participate in learning by uploading privacy-safe metrics. Those metrics contribute to Big Data that can be:
• Used specifically to improve a model that comes down later in a firmware update
• Used broadly to uncover patterns not noticed before, or applied in a different application entirely
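To make the idea of privacy-safe metrics concrete, here is a minimal sketch of how a device might accumulate them between connections. It is an illustration only: the `MetricsBuffer` class, its field names, and the 0.6 confidence threshold are all assumptions made for this example, not a description of any particular product. The point is that only aggregate counts leave the device, never images or identities.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MetricsBuffer:
    """Hypothetical on-device buffer that keeps only aggregate counts.

    No images, embeddings, or user identifiers are stored here.
    """
    label_counts: Counter = field(default_factory=Counter)
    low_confidence: int = 0
    total: int = 0

    def record(self, label: str, confidence: float, threshold: float = 0.6) -> None:
        """Count one inference result; the threshold is an assumed value."""
        self.label_counts[label] += 1
        self.total += 1
        if confidence < threshold:
            self.low_confidence += 1

    def flush(self) -> dict:
        """Return a summary to upload at the next connection, then reset."""
        payload = {
            "total": self.total,
            "low_confidence": self.low_confidence,
            "label_counts": dict(self.label_counts),
        }
        self.label_counts = Counter()
        self.low_confidence = 0
        self.total = 0
        return payload


# Example: two local inferences, then one summary for the next upload.
buffer = MetricsBuffer()
buffer.record("person", 0.91)
buffer.record("package", 0.42)
summary = buffer.flush()
```

A rising share of low-confidence results in summaries like this is the kind of signal that could prompt a model improvement delivered in a later firmware update.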
That second use, uncovering patterns across applications, is a good segue into our third trend…
Imaging is Everywhere. The recent need to work from home has rapidly changed the average person’s exposure to video and imaging. This has produced some very interesting examples of how imaging is truly everywhere.
• Identity using 1-to-1 Match. Earlier this year, taxpayers looking to get a federal Identity Protection (IP) PIN from the IRS website found the biometric identification step transferred to their smartphone. The app, run by ID.me, does a 1-to-1 match, comparing a photo that you take of a government-issued ID to a selfie that you also take. Only after a match are users sent back to the website for the IP PIN to be issued.
• Layered Biometrics and Encrypted Coding. My first encounter with the CLEAR app was at CES 2022, which required a similar matching process to tie vaccination status to a government-issued ID. CLEAR is more prominently used as identification for airport security, layering eye and face detection to create an encrypted code that is used both to keep biometric data secure and for identification.
• Classic image processing and UX. The entire imaging chain is only as good as what is coming in. In both ID.me and CLEAR, users will notice guide marks and suggestions on the positioning of the government-issued ID. This is an example of how the marriage of user experience and classic image processing (setting the region of interest) is used to increase the success of the more complex algorithms running above it. Ultimately, it is about making users more successful in the new experience.
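For readers who want to see what a 1-to-1 match means in practice, here is a minimal sketch. It assumes that some face-recognition model (not shown) has already turned the ID photo and the selfie into numeric embedding vectors, and it compares them using cosine similarity against a threshold. The function names and the 0.8 threshold are hypothetical choices for illustration; this is not how ID.me or CLEAR actually implement their matching.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_one_to_one_match(id_embedding, selfie_embedding, threshold=0.8):
    """Compare one ID photo against one selfie (1-to-1, not a database search).

    The threshold is an assumed value; a real system would tune it
    against measured false-accept and false-reject rates.
    """
    if len(id_embedding) != len(selfie_embedding):
        raise ValueError("embeddings must have the same length")
    return cosine_similarity(id_embedding, selfie_embedding) >= threshold


# Example with made-up three-dimensional embeddings.
same_person = is_one_to_one_match([1.0, 0.0, 0.0], [0.9, 0.1, 0.0])
different_person = is_one_to_one_match([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Because only two images are compared, a 1-to-1 match is a much lower level of recognition than searching a face against a large database, which is part of why it fits comfortably on a smartphone.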
In closing, we, as product developers, need to determine the level of recognition that is really needed. What makes machine vision exciting is the ever-expanding range of applications in which it can be used, and the breadth of options available to developers to make the experience successful.