Machine vision systems give machines the ability to see, interpret, and act on visual information. From a single smart camera verifying a label colour to a 12-camera 3D inspection cell sorting 2000 parts per minute, the same engineering principles apply: optics, image sensors, lighting, image processing, and decision logic. This guide takes you from first principles — what is a pixel, how does a lens form an image — through every processing technique to advanced AI-based inspection and real industry commissioning practice.
A machine vision system is a pipeline: light reflects from an object, passes through a lens, strikes a sensor, is digitised to an array of numbers (pixels), processed by algorithms, and a decision (pass/fail, position, measurement) is output. Understanding each stage of this pipeline — and what can go wrong at each — is the foundation of every successful vision application.
Enter your sensor and application requirements. Get the required focal length, image resolution, and minimum detectable feature size.
This interactive canvas lets you draw on a high-resolution image and then "downsample" it to see what different sensor resolutions actually look like. Understand pixel size, aliasing, and resolution limits.
| Type | Parts/min max | Resolution | Moving OK? | 3D? | Best application |
|---|---|---|---|---|---|
| Area scan (5MP) | ~600 ppm | 2592×2048px | With strobe | No | Discrete indexed parts |
| Area scan (20MP) | ~200 ppm | 5472×3648px | With strobe | No | High detail, large FOV |
| Line scan (4k) | Unlimited web | 4096 × ∞ lines | Required | No | Web, cylinders, print |
| 3D profile (laser) | ~1200 ppm | 1024×Z px | Required | YES | Height, gap, warpage |
| 3D structured light | ~60 ppm | Full 3D point cloud | No (static) | YES | Robot guidance, metrology |
Light source → Object → Lens → Image Sensor → ADC → Frame Buffer → Processing → Decision. Every stage introduces constraints. The light source determines contrast. The lens determines resolution, FOV, and distortion. The sensor determines dynamic range, noise, and frame rate. The ADC determines bit depth (typically 8-bit = 256 grey levels, or 12-bit = 4096 levels for precision gauging). Processing speed determines throughput. Understanding the chain prevents the most common mistake: trying to fix a poor contrast problem with better algorithms instead of better lighting.
// Key relationships in image formation
//
// Working Distance (WD): distance from lens front to object
// Field of View (FOV): area of object captured
// Sensor size (SS): physical size of image sensor
// Focal length (f): lens property (mm)
//
// Primary relationship (thin lens approximation):
// Magnification m = sensor_size / FOV = f / (WD - f)
// FOV = sensor_size × WD / f (for WD >> f)
// f = sensor_size × WD / FOV
//
// Example: 1/2" sensor (6.4×4.8mm), FOV=100×75mm, WD=300mm
// f_h = 6.4 × 300 / 100 = 19.2mm → no standard lens; choose
//   25mm (will reduce FOV) or 16mm (will increase FOV)
// Actual FOV with 16mm lens: 6.4 × 300 / 16 = 120mm width
//
// Resolution (mm/pixel):
// res = FOV_width / sensor_pixels_x
// At FOV=100mm, 2448px: res = 100/2448 = 0.0409 mm/px = 40.9 µm/px
//
// Minimum detectable feature:
// min_feature = res × 2 (Nyquist sampling theorem)
// = 0.0409 × 2 = 0.082 mm = 82µm minimum visible feature
// For reliable measurement: use min_feature = res × 4-6
// → Reliable gauging down to ~0.164-0.245mm with this setup
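The relationships above can be wrapped in a small runnable helper. This is a sketch using the same thin-lens approximations as the comments; the function names are illustrative:

```python
# Lens/FOV/resolution helper — thin-lens approximations (valid for WD >> f)
def focal_length(sensor_mm, wd_mm, fov_mm):
    """Required focal length for a given sensor size, WD and FOV."""
    return sensor_mm * wd_mm / fov_mm

def fov(sensor_mm, wd_mm, f_mm):
    """Field of view achieved with a given lens."""
    return sensor_mm * wd_mm / f_mm

def resolution_mm_per_px(fov_mm, pixels):
    """Object-space sampling (mm/pixel)."""
    return fov_mm / pixels

def min_feature_mm(res, factor=2):
    """Nyquist limit (factor=2); use factor=4-6 for reliable gauging."""
    return res * factor

# Worked example from the comments: 1/2" sensor, 100mm FOV, 300mm WD
f = focal_length(6.4, 300, 100)        # 19.2mm → pick 16mm or 25mm
fov_16 = fov(6.4, 300, 16)             # actual FOV with the 16mm lens
res = resolution_mm_per_px(100, 2448)  # ~0.041 mm/px
print(f, fov_16, round(res * 1000, 1))
```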
Area scan cameras capture a full 2D image at each trigger — used for stationary or indexed parts. Line scan cameras have a single row of pixels (4096–16384 pixels wide) and build up a 2D image line-by-line as the object moves past — used for continuous webs (film, paper, textiles), cylindrical objects (cans, bottles), and very high-resolution inspection where a single area-scan frame would require impractically large sensors. 3D cameras measure height (Z) at every pixel — used for volume, height, warpage, and gap measurement. Technologies: structured light, stereo, Time-of-Flight, laser triangulation (profile sensors).
// Camera type selection guide
//
// AREA SCAN — best for:
// Indexed (stop-and-go) parts
// Parts on trays, pallets, nests
// Complex shape verification
// Code reading, OCR, colour
// Working distance > 200mm
//
// LINE SCAN — best for:
// Continuous web (film, fabric, paper, metal strip)
// Cylindrical objects (require rotation)
// Very high resolution (> 4000px width)
// Print quality inspection
// Objects moving at constant speed > 1 m/s
// Requirement: precise speed/encoder synchronisation
//
// 3D PROFILE (laser line scan) — best for:
// Height/gap measurement
// Warp/flatness inspection
// Object on conveyor profiling
// Volume and mass estimation
// Presence of raised/recessed features
//
// 3D AREA (structured light / stereo) — best for:
// Complete 3D model capture
// Robot guidance pick-and-place
// Bin picking (unstructured parts)
// Assembly gap/flush measurement
//
// Resolution comparison (2D inspection, 300mm FOV width):
// Area scan 5MP (2592px): 300/2592 = 0.116mm/px
// Area scan 20MP (5472px): 300/5472 = 0.055mm/px
// Line scan 4096px: 300/4096 = 0.073mm/px
// Line scan 16384px: 300/16384 = 0.018mm/px
CCD (Charge-Coupled Device): all pixels read out sequentially through a shift register. Excellent image uniformity, very low noise, but slower and more expensive. Now largely replaced in industry. CMOS (Complementary Metal-Oxide Semiconductor): each pixel has its own readout circuit. Faster, lower power, cheaper, can include on-chip processing. Global shutter: all pixels exposed simultaneously — essential for moving objects or strobe lighting. Rolling shutter: rows exposed sequentially — fast-moving objects appear distorted (skewed). Never use rolling shutter for motion or strobe. Pixel size: larger pixels (Sony IMX sensors: 3.45 µm to 7.4 µm) collect more photons = better SNR, but larger sensors = larger/more expensive lenses.
// Shutter type selection — critical for moving objects
//
// GLOBAL SHUTTER: all pixels exposed at exactly the same instant
// → Moving objects captured sharply (no geometric distortion)
// → Strobe synchronisation works perfectly
// → Required for:
// Any moving conveyor inspection
// Strobe LED illumination
// Trigger from encoder or sensor
// Cameras: Basler acA2040-90um (GS), Keyence IV2, SICK Inspector
//
// ROLLING SHUTTER: rows read out top-to-bottom sequentially
// → Row exposure time differs by up to 1/frame_rate
// → Fast-moving objects appear SKEWED or WOBBLY
// → LED strobe creates partially exposed images (banding)
// → Only acceptable for:
// Completely stationary objects
// Very slow motion (< 1 mm per row readout time)
// Low-cost applications with careful lighting design
// Risk: object moving at 0.5m/s, rolling shutter 33ms/frame:
// Top row sees object 16.5mm earlier than bottom row
// → 16.5mm skew in 100mm height = 16.5% geometric error
//
// Sensor selection for SNR:
// Quantum efficiency (QE): % photons converted to electrons
// Full-well capacity: max electrons before saturation (DR limit)
// Read noise: electrons RMS (limits low-light performance)
// SNR = QE × photons / sqrt(QE × photons + read_noise²)
// Target SNR > 40dB for reliable grey-level discrimination
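The SNR formula above is easy to evaluate numerically. A sketch, with illustrative photon and read-noise numbers (not taken from any particular sensor datasheet):

```python
import math

def snr_db(qe, photons, read_noise_e):
    """Shot-noise-limited SNR in dB.
    signal = QE × photons (electrons)
    noise  = sqrt(shot noise + read_noise²) = sqrt(signal + rn²)"""
    signal = qe * photons
    noise = math.sqrt(signal + read_noise_e ** 2)
    return 20 * math.log10(signal / noise)

# Illustrative: QE=0.65, 20,000 photons/pixel, 3 e- RMS read noise
print(f'{snr_db(0.65, 20000, 3.0):.1f} dB')  # above the 40dB target
```

Because the noise term is dominated by sqrt(signal), doubling the light only gains ~1.5 dB — which is why well depth and exposure matter more than algorithmic denoising.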
Optics determines what the camera sees. A perfect sensor paired with a poor lens or wrong illumination produces unusable images — no algorithm can recover information that was never captured. Investing time in optical design is the highest-return activity in machine vision system design.
Select an illumination technique and surface type. See the simulated image effect and understand why the combination produces high or low contrast.
Entocentric (standard) lenses: perspective projection — objects closer appear larger. Normal for presence/absence, colour, and coarse position. Telecentric lenses: parallel chief ray — magnification independent of object distance within the depth of field. Essential for dimensional gauging (0.1–50 µm accuracy). Object-space telecentric: constant magnification regardless of Z. Double telecentric: both input and output rays are parallel — used for very high accuracy gauging. Cost: 3–10× entocentric. Macro lenses: standard lenses focused close, 0.1–1:1 magnification. Microscope objectives: 5×–100× magnification for semiconductor and medical inspection.
// Telecentric vs entocentric — when to use each
//
// ENTOCENTRIC (standard):
// Use when: presence/absence, code reading, colour check
// Magnification change per mm defocus:
// At WD=200mm with 25mm lens: Δm/ΔWD = f/WD² = 25/40000 = 0.063%/mm
// Measurement uncertainty from ±0.5mm Z-variation: ±0.03%
// Acceptable for features > 0.5mm with loose tolerance
//
// TELECENTRIC:
// Use when: dimensional gauging < 0.1mm tolerance
// Magnification change per mm defocus: < 0.01%/mm typical
// Cost: Linos, Schneider, Edmund Optics TC lenses: €500–€2000
// Working distance: FIXED — specify at design time (cannot adjust!)
// Object must always be within depth of field: ±1–3mm typical
//
// TELECENTRIC BRANDS:
// Opto Engineering (Italy) — opto-e.com — market leader
// TC series: 0.05–2× magnification, 25–200mm FOV
// TCSM series: telecentric with extended DOF
// Schneider-Kreuznach (Germany) — schneideroptics.com
// Componon-S: high-resolution enlarger/telecentric
// Edmund Optics (USA) — edmundoptics.com
// Gold Series Telecentric: 0.038–0.5× magnification
// Navitar (USA) — navitar.com
// 1-50505: 0.5×, 50mm object size, 0.01% telecentricity error
//
// DEPTH OF FIELD (DOF) calculation:
//   DOF ≈ 2 × blur_diameter × f_number / m²   (m = magnification ≈ f/WD)
//   blur_diameter = acceptable unsharpness at the SENSOR (typically 2-3 pixels)
//   For 5µm pixels: blur = 2 px = 10µm at the sensor
//   At f/8, WD=200mm, f=25mm: m ≈ 25/200 = 1/8 → pixel at object = 40µm
//   DOF ≈ 2 × 10µm × 8 × 8² = 10.2mm (generous DOF for 2D inspection)
//   For precision gauging use f/4–f/8; for maximum DOF use f/16
//   (smaller apertures increase DOF but diffraction limits resolution)
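The entocentric magnification-error arithmetic above can be checked in a few lines. A sketch; function names are illustrative:

```python
def mag_error_pct_per_mm(f_mm, wd_mm):
    """Entocentric magnification change per mm of defocus (≈ f/WD²), in %."""
    return f_mm / wd_mm ** 2 * 100

def gauging_error_pct(f_mm, wd_mm, z_variation_mm):
    """Measurement error caused by ±z part height variation, in %."""
    return mag_error_pct_per_mm(f_mm, wd_mm) * z_variation_mm

# 25mm lens at WD=200mm, ±0.5mm part height variation
print(mag_error_pct_per_mm(25, 200))    # ≈ 0.0625 %/mm
print(gauging_error_pct(25, 200, 0.5))  # ≈ 0.031 %
```

When this error exceeds the measurement tolerance budget, that is the quantitative trigger for moving to a telecentric lens.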
Illumination is responsible for 70% of the success or failure of a machine vision application. The goal is always maximum contrast between the feature of interest and the background, and minimum variation due to ambient changes. Six fundamental techniques: bright-field (coaxial or dome — flat surfaces appear bright), dark-field (oblique low-angle — surface texture appears bright), backlight (silhouette — edges and holes appear sharp), structured light (for 3D), colour (for colour discrimination), and UV/IR (for fluorescence or penetration through materials).
| Technique | Light angle | Best for | Flat mirror | Surface defects | Edges/holes |
|---|---|---|---|---|---|
| Coaxial | 0° (on-axis) | PCB, wafers, labels | ✓✓ White bg | ✓ Dark on white | Good |
| Dome | All angles diffuse | Curved shiny parts | ✓✓ Uniform | ✓ Low glare | Poor |
| Dark-field | 10–30° oblique | Surface texture/scratches | Dark bg | ✓✓ Bright on dark | Poor |
| Backlight | Transmitted (behind) | Silhouette, holes | N/A | Poor | ✓✓ Sharp edges |
| Ring (45°) | 45° all azimuth | General purpose | Glare spots | OK | OK |
// Illumination selection guide
// Match technique to inspection task
//
// COAXIAL (on-axis, bright-field):
// Source: half-mirror beamsplitter + ring or spot LED
// Best for: flat mirror surfaces (PCBs, wafers, metal parts)
// Effect: flat surfaces = white, scratches/defects = dark
// Range: 0–100mm object size
// Brands: CCS (Japan) LFV2, Metaphase, Advanced Illumination
// DOME (diffuse bright-field):
// Source: hemispherical diffuser with ring LED
// Best for: highly curved or shiny 3D surfaces
// Effect: uniform illumination from all directions
// Eliminates: specular hot-spots on curved parts
// Brands: CCS HLV2 dome, Smart Vision Lights, Metaphase
// DARK-FIELD (low-angle ring):
// Source: ring LED at 10–30° from object plane
// Best for: surface scratches, engraved text, embossed features
// Effect: raised/recessed features = bright, flat = dark
// Great for: embossed codes on coins/parts, surface texture
// Brands: CCS LDL2-150SW, Advanced Illumination RL4560
// BACKLIGHT (transmitted):
// Source: bright flat panel behind object
// Best for: silhouette measurement, drill holes, transparent objects
// Effect: object silhouette = black, background = white
// Best edge finding precision: ±1/4 pixel typical
// Brands: CCS XL-256-200-R, Smart Vision Lights BL series
// COLOUR LED (R/G/B):
// Filter colours to enhance contrast of specific features
// Red filter → red features disappear, blue features go dark
// Blue filter → blue features bright, red features dark
// Example: green LED on printed text on yellow label
// → text appears high contrast
// Brands: CCS TH series (RGB), Gardasoft LED controllers
// STROBE TIMING:
// Strobe duration = camera exposure time (match exactly)
// Strobe delay = trigger_to_exposure_start (microseconds)
// Strobe current: 3–10× continuous rating for 1–5ms pulses
// → 10× brightness without overheating LED
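One timing constraint the strobe notes imply but do not compute: the maximum exposure (and hence strobe pulse) that still freezes motion. A sketch, assuming the blur budget is expressed in pixels; the 1-pixel default is a common rule of thumb, not a standard:

```python
def max_exposure_us(line_speed_m_s, resolution_um_per_px, max_blur_px=1.0):
    """Longest exposure/strobe pulse (µs) that keeps motion blur
    below max_blur_px at the given conveyor speed.
    Unit trick: 1 m/s = 1 µm/µs, so speed in m/s is speed in µm/µs."""
    blur_um = max_blur_px * resolution_um_per_px
    return blur_um / line_speed_m_s

# Conveyor at 0.5 m/s, 40 µm/pixel sampling, allow 1 px blur:
print(max_exposure_us(0.5, 40))  # 80 µs strobe pulse
```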
Optical filters placed in front of the lens select specific wavelengths to enhance contrast or eliminate interference. Bandpass filters (e.g. 850 nm IR, 525 nm green) block all light except the target wavelength — when combined with a monochromatic LED of the same wavelength, ambient light is almost completely eliminated. Polarising filters eliminate specular reflections from flat surfaces — a polariser on the light source and a cross-polariser on the lens means only scattered (diffuse) light reaches the sensor, making shiny parts appear matte. Neutral density (ND) filters reduce overall intensity for high-power lights without changing the spectrum.
// Optical filter selection — interference rejection
//
// Problem: ambient fluorescent lighting causes image instability
// Solution: narrow bandpass filter + monochromatic LED
//
// Setup:
// LED wavelength: 850nm (near-IR, invisible, safe)
// Filter: 850nm bandpass ±10nm, OD4 out-of-band rejection
// Effect: fluorescent light (450-650nm) blocked by 10,000×
// Camera sees only LED illumination
// Ambient variation: no effect on image
//
// Cross-polarisation for specular rejection:
// Polariser angle: 0° on LED illuminator
// Analyser angle: 90° on lens (cross-polarised)
// Effect: specular reflection preserves polarisation → blocked
// Diffuse reflection scrambles polarisation → passes
// Use for: shiny metal parts, glass, labels on bottles
// Transmission: ~50% reduction in diffuse signal
//
// Bandpass filter part numbers (Semrock / Edmund Optics):
// FF01-850/10-25 850nm ±10nm, 25mm diameter, OD4 blocking
// FF01-630/20-25 630nm (red LED match), ±20nm
// FF01-525/15-25 525nm (green LED match), ±15nm
// Custom filters: Andover, Omega Optical
//
// LED colour wavelengths:
// Red: 625–660 nm Good for: print marks, colour filters
// Green: 520–535 nm Good for: labels, blood, chlorophyll
// Blue: 455–470 nm Good for: yellow labels, contrast boost
// UV (365nm): Fluorescence excitation, security marks
// IR (850nm): Ambient rejection, covert illumination
// IR (940nm): Solar ambient rejection (outdoor)
Image processing transforms raw pixel data into measurements and decisions. The pipeline: pre-processing (noise reduction, correction) → feature extraction (edges, blobs, corners) → measurement (dimensions, angles, positions) → classification (pass/fail, type identification). Every step must be designed for robustness: the algorithm must work not just for the ideal part but for every acceptable part in every condition that will occur on the production line.
Process a synthetic test image through a configurable pipeline. Adjust each stage and see the effect on the output image and histogram. Understand how each operation changes the image.
Synthetic parts with intentional defects. Click on a part to measure its properties. Set thresholds to pass/fail each part.
Pre-processing removes noise and prepares the image for feature extraction. Gaussian blur (σ = 0.5–2 pixels): reduces sensor noise, prevents false edge detections. Median filter (3×3 or 5×5): removes salt-and-pepper noise (single-pixel spikes) better than Gaussian. Unsharp mask: enhances fine details by subtracting a blurred version. Thresholding converts a greyscale image to binary (0 or 255). Fixed threshold: simple but sensitive to lighting changes. Otsu method: automatically finds the optimal threshold by maximising inter-class variance — robust to uniform lighting variations. Morphological operations: dilation (expand white regions), erosion (shrink white regions), opening (erosion then dilation — removes thin protrusions), closing (dilation then erosion — fills small holes).
# Image pre-processing pipeline — Python/OpenCV example
import cv2
import numpy as np

def preprocess_image(raw_img):
    # 1. Gaussian denoise (σ=1.0, 5×5 kernel)
    denoised = cv2.GaussianBlur(raw_img, (5, 5), 1.0)
    # 2. Flat-field correction (compensate uneven illumination)
    #    flat_field = acquired with uniform white target
    #    corrected = (image / flat_field) × mean(flat_field)
    #    corrected = np.clip(denoised.astype(float) /
    #        flat_field.astype(float) * 128, 0, 255).astype(np.uint8)
    # 3. Histogram equalisation (CLAHE — adaptive)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(denoised)
    # 4. Otsu threshold (automatic, works across lighting variation)
    thresh_val, binary = cv2.threshold(
        enhanced, 0, 255,
        cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    print(f'Otsu threshold: {thresh_val}')
    # 5. Morphological opening (remove noise < 3px diameter)
    kernel = cv2.getStructuringElement(
        cv2.MORPH_ELLIPSE, (3, 3))
    cleaned = cv2.morphologyEx(
        binary, cv2.MORPH_OPEN, kernel)
    # 6. Morphological closing (fill holes < 5px diameter)
    kernel5 = cv2.getStructuringElement(
        cv2.MORPH_ELLIPSE, (5, 5))
    filled = cv2.morphologyEx(
        cleaned, cv2.MORPH_CLOSE, kernel5)
    return filled, thresh_val
Edge detection finds boundaries between regions. Sobel operator: computes gradient in X and Y — fast but noisy. Canny edge detector: Sobel + non-maximum suppression + hysteresis thresholding — the gold standard for edges. Sub-pixel edge location: the true edge position lies between integer pixel positions. Fit a Gaussian or parabola to the intensity profile across the edge transition — the peak of the derivative gives the edge position to ±0.05–0.1 pixel accuracy. Sub-pixel accuracy is essential for dimensional gauging: at 50 µm/pixel, ±0.1 pixel = ±5 µm measurement uncertainty.
# Canny edge detection + sub-pixel refinement
import cv2
import numpy as np
from scipy.ndimage import gaussian_filter1d

def find_edge_subpixel(img_row):
    """
    Find the sub-pixel edge position in a 1D intensity profile.
    Returns position with ±0.05 pixel accuracy.
    """
    # 1. Smooth the profile
    smoothed = gaussian_filter1d(img_row.astype(float), sigma=1.0)
    # 2. First derivative (gradient)
    gradient = np.gradient(smoothed)
    # 3. Find peak of gradient (integer pixel position)
    peak_idx = np.argmax(np.abs(gradient))
    # 4. Parabolic interpolation for sub-pixel position
    if 1 <= peak_idx <= len(gradient) - 2:
        y_m1 = abs(gradient[peak_idx - 1])
        y_0 = abs(gradient[peak_idx])
        y_p1 = abs(gradient[peak_idx + 1])
        # Parabola vertex: offset = (y_m1 - y_p1) / (2*(y_m1 - 2*y_0 + y_p1))
        denom = 2 * (y_m1 - 2 * y_0 + y_p1)
        if abs(denom) > 1e-6:
            offset = (y_m1 - y_p1) / denom
            return peak_idx + offset  # Sub-pixel position
    return float(peak_idx)

# Canny parameters:
#   low_threshold:  weak edges (kept only if connected to strong edges)
#   high_threshold: strong edges (always kept)
#   aperture_size:  Sobel kernel size (3, 5, or 7)
#   L2gradient:     True = more accurate gradient magnitude
edges = cv2.Canny(img, 50, 150, apertureSize=3, L2gradient=True)
# Rule of thumb: high = 2-3× low
# For noisy images: increase both thresholds
# For weak edges (low contrast): decrease both thresholds
Blob analysis finds connected regions (blobs) in a binary image and measures their geometric properties. Key measurements: area (pixel count), centroid (centre of mass), perimeter, bounding box, major/minor axis (ellipse fit), aspect ratio, circularity (4π×area/perimeter²), convexity (area/convex hull area). These properties feed quality decisions: a circular object with circularity < 0.85 is oval or damaged. A component with area outside [min_area, max_area] is wrong size. Statistical process control (SPC) applies X̄-R or X̄-S charts to continuous measurements to detect process drift before defects are produced.
# Blob analysis and feature measurement
import cv2
import numpy as np

def analyse_blobs(binary_img, min_area=100, max_area=50000):
    # Find connected components
    num_labels, labels, stats, centroids = \
        cv2.connectedComponentsWithStats(
            binary_img, connectivity=8)
    results = []
    for i in range(1, num_labels):  # Skip background (0)
        area = stats[i, cv2.CC_STAT_AREA]
        if not (min_area <= area <= max_area):
            continue  # Filter by area
        x = stats[i, cv2.CC_STAT_LEFT]
        y = stats[i, cv2.CC_STAT_TOP]
        w = stats[i, cv2.CC_STAT_WIDTH]
        h = stats[i, cv2.CC_STAT_HEIGHT]
        cx, cy = centroids[i]
        # Extract blob mask for detailed analysis
        mask = (labels == i).astype(np.uint8) * 255
        contours, _ = cv2.findContours(
            mask, cv2.RETR_EXTERNAL,
            cv2.CHAIN_APPROX_SIMPLE)
        cnt = contours[0]
        perimeter = cv2.arcLength(cnt, True)
        circularity = (4 * np.pi * area) / (perimeter ** 2 + 1e-6)
        # Ellipse fit gives major/minor axis (needs >= 5 contour points)
        ellipse = cv2.fitEllipse(cnt) if len(cnt) >= 5 else None
        aspect_ratio = w / h if h > 0 else 0
        # Convex hull for convexity
        hull = cv2.convexHull(cnt)
        hull_area = cv2.contourArea(hull)
        convexity = area / hull_area if hull_area > 0 else 0
        results.append({
            'area': area,
            'centroid': (cx, cy),
            'bbox': (x, y, w, h),
            'circularity': circularity,
            'aspect_ratio': aspect_ratio,
            'convexity': convexity,
            'perimeter': perimeter,
            'ellipse': ellipse,
        })
    return results

# Quality decision rules:
#   PASS if: min_area < area < max_area
#     AND circularity > 0.85 (nearly round)
#     AND aspect_ratio in [0.9, 1.1] (±10% from square)
#     AND convexity > 0.95 (no concave defects)
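The X̄-R charts mentioned for statistical process control reduce to simple control-limit arithmetic. A minimal sketch for subgroup size n=5, using the standard SPC constants for that subgroup size (A2=0.577, D3=0, D4=2.114); function and variable names are illustrative:

```python
# X-bar / R control limits for subgroups of n=5
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """subgroups: list of measurement lists (each of size 5).
    Returns (xbar centre, (xbar UCL, LCL), R centre, (R UCL, LCL))."""
    xbars = [sum(g) / len(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    xbarbar = sum(xbars) / len(xbars)   # grand mean = chart centre line
    rbar = sum(ranges) / len(ranges)    # mean range
    return (xbarbar,
            (xbarbar + A2 * rbar, xbarbar - A2 * rbar),
            rbar,
            (D4 * rbar, D3 * rbar))

# Example: hole-diameter subgroups (mm)
groups = [[5.01, 5.02, 4.99, 5.00, 5.01],
          [5.00, 5.03, 5.01, 4.98, 5.02]]
centre, (ucl, lcl), rbar, (r_ucl, r_lcl) = xbar_r_limits(groups)
```

Points outside the UCL/LCL band signal process drift before hard pass/fail limits are violated.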
Traditional rule-based vision algorithms excel at well-defined tasks with consistent conditions. AI-based vision — particularly convolutional neural networks (CNNs) — handles variability, texture-based defects, and complex classification that would require thousands of rules to program explicitly. Understanding when to use AI, how to train it correctly, and how to validate it is the most important modern skill in machine vision.
Visualise how an image flows through a convolutional neural network. Each layer progressively extracts higher-level features. Click a layer to see what it detects at that stage.
Adjust the classification threshold. See how it shifts the trade-off between false negatives (escapes) and false positives (false rejections). Find the optimal threshold for your application.
Estimate how many images you need for a reliable AI model, and how long data collection will take based on your production rate.
Convolutional Neural Networks process images through layers of learned filters. Each convolutional layer detects increasingly abstract features: early layers detect edges and corners, middle layers detect textures and shapes, later layers detect semantic objects. For defect detection, transfer learning is standard: start with a network pre-trained on ImageNet (millions of natural images), replace the final classification layer, and fine-tune on your inspection images. This requires only 100–1000 images per class instead of millions. ResNet, EfficientNet, MobileNet are common backbone architectures.
# Transfer learning for defect classification
# Using PyTorch + EfficientNet-B0 backbone
# Scenario: classify metal surface defects into 5 classes
import torch
import torch.nn as nn
from torchvision import models, transforms

# 1. Load pre-trained backbone (ImageNet weights)
model = models.efficientnet_b0(pretrained=True)

# 2. Freeze all backbone layers (keep ImageNet features)
for param in model.parameters():
    param.requires_grad = False

# 3. Replace classifier for our task (5 defect classes)
num_classes = 5  # OK, scratch, pit, crack, contamination
model.classifier[1] = nn.Linear(
    model.classifier[1].in_features, num_classes)

# 4. Only train the new classifier layer
optimizer = torch.optim.Adam(
    model.classifier.parameters(), lr=0.001)

# 5. Data augmentation — CRITICAL for robustness
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(15),
    transforms.ColorJitter(
        brightness=0.2, contrast=0.2),  # Simulate lighting variation
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225])  # ImageNet stats
])

# 6. Class imbalance handling
# Defective parts are rare in production! Typical imbalance:
#   OK: 95%, defects: 5%
# Solution: weighted loss function
class_weights = torch.tensor([0.05, 0.24, 0.24, 0.24, 0.24])
# OR: oversample minority classes in DataLoader
# OR: use focal loss (down-weights easy, already-correct examples)
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Target metrics BEFORE deployment:
#   Sensitivity (recall) on defect classes: > 99%
#   Precision on defect classes: > 90%
#   AUC-ROC: > 0.999
#   Confusion matrix on held-out validation set (never used in training)
Supervised classification requires labelled examples of every defect type. But many industrial applications have defects that are rare, varied, or unknown in advance. Anomaly detection trains only on "normal" (acceptable) examples and learns to detect anything that deviates. Autoencoders learn to compress and reconstruct normal images — anything abnormal produces a high reconstruction error (anomaly map). PatchCore and FastFlow are state-of-the-art methods achieving near-perfect anomaly detection on the MVTec benchmark. Integration in industry: MVTec HALCON 23.05 includes anomaly detection models; Cognex VisionPro and In-Sight offer built-in anomaly tools.
# Anomaly detection with autoencoder (simplified)
# Train ONLY on good (OK) images
# At inference: high reconstruction error = anomaly
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: compresses 256×256×1 → 16×16×256
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1),    # 128×128
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),   # 64×64
            nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),  # 32×32
            nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), # 16×16
            nn.ReLU(),
        )
        # Decoder: reconstructs 16×16×256 → 256×256×1
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 3, stride=2,
                               padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 3, stride=2,
                               padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 3, stride=2,
                               padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 3, stride=2,
                               padding=1, output_padding=1),
            nn.Sigmoid(),  # Output normalised 0-1
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Anomaly score = pixel-wise reconstruction error
def detect_anomaly(model, image, threshold=0.05):
    with torch.no_grad():
        reconstructed = model(image.unsqueeze(0))
        error_map = torch.abs(image - reconstructed.squeeze(0))
    anomaly_score = error_map.mean().item()
    anomaly_map = error_map.squeeze().numpy()
    is_defect = anomaly_score > threshold
    return is_defect, anomaly_score, anomaly_map

# Key advantage: no defect images needed for training
# Detects NOVEL defects never seen in training
# Limitation: may generate false positives on normal part variation
# Mitigation: augment training data to include all normal variation
Training a model is only 20% of the work. Production deployment requires: model export to ONNX or TensorRT for inference on embedded hardware or smart cameras, latency measurement (target < 10 ms on GPU, < 50 ms on CPU for industrial rates), edge deployment on NVIDIA Jetson or Intel NUC. Validation for regulated industries (medical, automotive safety) requires IQ (Installation Qualification), OQ (Operational Qualification), PQ (Performance Qualification) with statistical confidence intervals. In the EU, AI systems used in safety-critical applications fall under the AI Act (2024): high-risk AI systems require conformity assessment, technical documentation, and ongoing monitoring.
# Production deployment checklist — AI vision system
# ── MODEL EXPORT ────────────────────────────────────────────
# PyTorch → ONNX (framework-agnostic, runs on any hardware)
import torch.onnx
model.eval()
dummy_input = torch.randn(1, 1, 224, 224)
torch.onnx.export(
model, dummy_input, 'defect_model.onnx',
input_names=['image'], output_names=['class_scores'],
dynamic_axes={'image': {0: 'batch_size'}},
opset_version=17)
# ── INFERENCE BENCHMARK ─────────────────────────────────────
import onnxruntime as ort
import time

sess = ort.InferenceSession('defect_model.onnx',
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
N = 100
start = time.perf_counter()
for _ in range(N):
    outputs = sess.run(None, {'image': img_np})
elapsed_ms = (time.perf_counter() - start) / N * 1000
print(f'Inference latency: {elapsed_ms:.2f} ms per image')
# Target: < 10ms (GPU), < 50ms (CPU optimised)
# ── VALIDATION REPORT REQUIREMENTS ─────────────────────────
# For regulated applications (ISO 9001, IATF 16949, FDA 21 CFR)
# Required statistics:
# Sensitivity (recall) per defect class: > 99.0% (calculate CI)
# False Rejection Rate: < 2% (economic limit)
# Validation dataset: >= 200 per class (independent from training)
# Confidence interval: 95% CI using Wilson score method
# Repeatability study: same part 30× → score std dev < 0.01
# Reproducibility: different lighting/operators → same results
# ── EDGE HARDWARE REFERENCE ─────────────────────────────────
# NVIDIA Jetson Orin NX: up to ~100 TOPS INT8, 10-25W, camera I/O
# Intel Core i7 + OpenVINO: ~20ms inference on CPU, no GPU
# Hailo-8 AI accelerator: 26 TOPS, 2.5W, PCIe or USB
# Smart cameras with built-in AI:
# Keyence IV2-G300MA (NVIDIA GPU embedded, 60fps)
# Cognex In-Sight 9000 (Ambarella chip, deep learning)
# SICK Inspector P620 (Intel OpenVINO, HALCON runtime)
Machine vision applications span every manufacturing industry. Each has unique requirements for image quality, throughput, accuracy, and regulatory compliance. Understanding the industry context shapes every design decision from camera choice to algorithm selection to system integration.
Verify your vision system can keep up with the production line. Calculate the maximum sustainable throughput given your processing latency and part spacing.
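The throughput check boils down to comparing the part arrival rate with the total cycle time. A sketch, assuming the three stages run sequentially; the timing numbers are illustrative:

```python
def max_parts_per_minute(acquire_ms, process_ms, io_ms):
    """Maximum sustainable rate if acquisition, processing and I/O
    run one after another for each part."""
    cycle_ms = acquire_ms + process_ms + io_ms
    return 60000.0 / cycle_ms

def keeps_up(line_ppm, acquire_ms, process_ms, io_ms):
    """True if the vision system can match the line rate."""
    return max_parts_per_minute(acquire_ms, process_ms, io_ms) >= line_ppm

# 10ms acquisition + 35ms processing + 5ms I/O = 50ms cycle
print(max_parts_per_minute(10, 35, 5))  # 1200.0 ppm
print(keeps_up(600, 10, 35, 5))         # True
```

If the stages are pipelined (acquire part N+1 while processing part N), the limit becomes the slowest single stage rather than the sum.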
Electronics inspection is the highest-volume and most demanding vision application category. Solder joint inspection (AOI — Automated Optical Inspection): 2D and 3D, detect missing components, tombstoning, bridges, insufficient solder. Die bonding: ±5 µm placement accuracy using telecentric cameras and sub-pixel algorithms. Wafer inspection: dark-field and bright-field at 1× to 40× magnification, defect review at SEM level. Screen printing inspection: stencil alignment to PCB pads using fiducial marks. Standards: IPC-A-610 (acceptability of electronic assemblies), IPC-7711/7721 (rework).
// PCB inspection system — solder joint measurement
// Task: measure solder paste height and area after printing
// System: 3D structured light camera on gantry
//
// BRANDS — PCB/Electronics AOI:
//
// SAKI Corporation (Japan) — sakicorp.com
// BF-Lynx3Di: 3D AOI, 0.5-µm Z resolution, 150mm²/s
// Used in: high-end SMT lines, chip package inspection
//
// Cyberoptics (USA) — cyberoptics.com
// SQ3000+: multi-reflection suppression, 3D, PCB inspection
// SE350: solder paste inspection (SPI), ±1µm Z
//
// Mirtec (Korea) — mirtec.com
// MV-6 OMNI: 18MP cameras, multi-angle illumination
// Cost-effective for medium-volume SMT
//
// KohYoung (Korea) — kohyoung.com
// Zenith Ultra: 6µm Z resolution, AI-based false call filtering
// Solves: false positives from IC marking variations
//
// Keyence (Japan) — keyence.com
// IV2-G series: smart camera on robot for inline QC
// XG-X: PC-based vision, multi-camera, 4000fps
//
// Algorithm: solder paste height → volume calculation
// For each pad ROI:
// 1. Acquire 3D height map (Z[x,y] in µm)
// 2. Subtract board reference plane (Z_ref)
// 3. Threshold at Z_paste_min (typically 50µm)
// 4. Compute: volume = Σ (Z[x,y] - Z_ref) × pixel_area
// 5. Compare to target: volume ∈ [80%, 120%] of nominal
// 6. Area coverage: area / pad_area ∈ [80%, 110%]
Automotive: body-in-white (BIW) gap and flush measurement with laser scanners, robot guidance for door installation, seat belt latch verification, VIN plate OCR. IATF 16949 requires full traceability — every inspection result must be logged with part serial number and timestamp. Pharmaceutical: IQ/OQ/PQ validation required for any inspection system under FDA 21 CFR Part 11 (electronic records/signatures), blister pack inspection for missing tablets, label print quality verification. Food: presence of foreign materials (metal, bone, glass) via X-ray, fill level, seal integrity, date code verification.
// Vision system integration — automotive gap/flush
// Application: measure door gap (5.5±0.5mm) and flush (±0.3mm)
// System: 8× Keyence LJ-X8200 laser profilers on robot arm
//
// Measurement protocol (AIAG SPC manual, MSA 4th edition):
//
// Gauge R&R study BEFORE production use:
// 10 parts × 2 operators × 2 replicates = 40 measurements
// Repeatability (EV): within-operator variation
// Reproducibility (AV): between-operator variation
// Gauge R&R % = sqrt(EV² + AV²) / tolerance × 100
// Acceptance criteria: %R&R < 10% = acceptable
// 10-30% = marginal (requires judgement)
// > 30% = unacceptable
//
// Data logging for IATF traceability:
VAR
inspection_record : STRUCT
timestamp : STRING[20]; // ISO 8601
part_serial : STRING[24]; // VIN or part number
station_id : STRING[8];
gap_mm : ARRAY[1..8] OF REAL; // 8 measurement points
flush_mm : ARRAY[1..8] OF REAL;
result : STRING[4]; // PASS / FAIL
failure_code : STRING[16]; // Which point failed + direction
camera_serial : STRING[16]; // For traceability
recipe_version : STRING[8]; // Algorithm version
END_STRUCT;
END_VAR
// Pharma track-and-trace:
// Each blister pack: DataMatrix code → serialisation DB lookup
// Verify: correct product, correct lot, within expiry date
// Defect: broken tablet → reject + log to MES
// Regulation: EU FMD (Falsified Medicines Directive)
// FDA 21 CFR Part 11 — audit trail required
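The Gauge R&R study outlined in the comments above (10 parts × 2 operators × 2 replicates) can be sketched numerically with the classical average-and-range method. The constants K1 = 4.56 and K2 = 3.65 are the textbook AIAG values for 2 trials and 2 operators at a 5.15-sigma spread; the function name and data layout are illustrative assumptions.

```python
# Hedged sketch of an average-and-range Gauge R&R calculation.
# measurements[o][p][t] = operator o, part p, trial t.
import math

def gauge_rr_percent(measurements, tolerance, k1=4.56, k2=3.65):
    n_parts = len(measurements[0])
    n_trials = len(measurements[0][0])
    # Repeatability (EV): average within-part range, scaled by K1
    ranges = [max(trials) - min(trials)
              for op in measurements for trials in op]
    ev = (sum(ranges) / len(ranges)) * k1
    # Reproducibility (AV): spread of operator means, scaled by K2,
    # with the repeatability contribution subtracted out
    op_means = [sum(sum(p) for p in op) / (n_parts * n_trials)
                for op in measurements]
    x_diff = max(op_means) - min(op_means)
    av = math.sqrt(max((x_diff * k2) ** 2
                       - ev ** 2 / (n_parts * n_trials), 0.0))
    # %R&R against the tolerance, as in the comments above
    return math.sqrt(ev ** 2 + av ** 2) / tolerance * 100.0
```

Feeding in synthetic data with a 0.02 mm within-operator range and a 0.01 mm operator offset against a 1 mm tolerance yields a %R&R just under 10 %, i.e. on the boundary of the acceptance criterion.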
A vision system is not just a camera and an algorithm — it is a complete system: mechanical mounting, lighting enclosure, cable management, PLC integration, HMI, data logging, and alarm management. Commissioning a vision system for production requires a structured qualification process, user training, and a maintenance plan.
Visualise the complete timing sequence: trigger → exposure → readout → processing → result → PLC action. Adjust parameters and see if the system fits the production cycle.
Vision systems communicate inspection results to PLCs and MES systems through several interfaces. Digital I/O: simplest — PASS/FAIL outputs, trigger input. Serial RS-232/RS-422: for string data (measurement values, barcodes). PROFINET / EtherNet/IP / EtherCAT: structured real-time data over industrial Ethernet — the modern standard. OPC-UA: for MES/SCADA integration, reading measurement statistics and KPIs. Trigger synchronisation: the camera trigger comes from a photoelectric sensor or encoder, routed through the PLC or directly to the camera. Latency from trigger to result must fit within the allowable window — the time between consecutive parts at line speed.
// Vision system PLC integration — PROFINET example
// Cognex In-Sight with PROFINET module
// PLC: Siemens S7-1500, TIA Portal
// I/O Data structure (PLC ← Vision, 8 bytes input):
// Byte 0: Status
// Bit 0: Job complete (rising edge = new result available)
// Bit 1: Overall pass/fail (1=PASS, 0=FAIL)
// Bit 2: Trigger acknowledged
// Bit 3: System fault
//
// Byte 1: Failure code (which inspection failed)
// 0x00: All pass
// 0x01: Presence check failed
// 0x02: Dimension out of tolerance
// 0x04: Barcode not readable
// 0x08: Colour check failed
// 0x10: OCR failed
//
// Bytes 2-5: REAL — first measurement value (e.g. diameter mm)
// Bytes 6-7: UINT16 — inspection count (rollover)
// Vision ← PLC control (2 bytes output):
// Byte 0, Bit 0: Trigger (rising edge starts inspection)
// Byte 0, Bit 1: Reset statistics
// Byte 1: Recipe select (0–255 different product types)
// PLC function block:
FUNCTION_BLOCK FB_VisionInterface
VAR_INPUT
trigger_cmd : BOOL; // From machine logic
recipe_select : BYTE := 0;
END_VAR
VAR_OUTPUT
inspection_ok : BOOL;
fault_code : BYTE;
measurement_1 : REAL;
new_result : BOOL; // Pulse on each new result
system_fault : BOOL;
END_VAR
VAR
rtrig : R_TRIG;
rtrig_result : R_TRIG;
pn_input : ARRAY[0..7] OF BYTE; // PROFINET RX
pn_output : ARRAY[0..1] OF BYTE; // PROFINET TX
END_VAR
// Trigger on rising edge only
rtrig(CLK := trigger_cmd);
pn_output[0].0 := rtrig.Q;
pn_output[1] := recipe_select;
// Parse result on job_complete rising edge
rtrig_result(CLK := pn_input[0].0);
new_result := rtrig_result.Q;
inspection_ok := pn_input[0].1;
fault_code := pn_input[1];
system_fault := pn_input[0].3;
// Copy bytes 2-5 into the REAL (CODESYS-style; on an S7-1500 use
// Deserialize or a PLC data type overlay, and mind PROFINET's
// big-endian byte order)
MEMCPY(ADR(measurement_1), ADR(pn_input[2]), 4);
END_FUNCTION_BLOCK
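For logging or debugging on a PC alongside the PLC, the same 8-byte result block can be decoded in Python. This is a sketch under the byte layout defined above; note that PROFINET I/O data is big-endian, so the REAL in bytes 2–5 is unpacked with `>f`. The helper name `parse_result` is an assumption.

```python
# Decode the hypothetical 8-byte vision result frame described above.
import struct

def parse_result(data: bytes) -> dict:
    """Parse status bits, failure code, measurement, and counter."""
    assert len(data) == 8
    status = data[0]
    return {
        "job_complete": bool(status & 0x01),  # bit 0
        "overall_pass": bool(status & 0x02),  # bit 1
        "trigger_ack":  bool(status & 0x04),  # bit 2
        "system_fault": bool(status & 0x08),  # bit 3
        "failure_code": data[1],
        # Bytes 2-5: REAL, big-endian on the PROFINET wire
        "measurement":  struct.unpack(">f", data[2:6])[0],
        # Bytes 6-7: UINT16 inspection counter (rolls over)
        "count":        struct.unpack(">H", data[6:8])[0],
    }
```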
A complete vision system requires cameras, lenses, lights, controllers, and software. Each category has market leaders with specific strengths. Mixing brands is common — a Basler camera with an Opto Engineering telecentric lens and a CCS LED on a Keyence smart controller is a legitimate system.
// Complete brand reference — machine vision ecosystem
//
// ── CAMERAS ──────────────────────────────────────────────────
// Basler (Germany) — baslerweb.com — largest portfolio
// ace 2 (GigE): 1.6–25MP, global/rolling, Color/Mono, €200–800
// boost (CoaXPress): 45MP, 170fps, ultra-high speed
// dart MIPI: embedded (Jetson/RPi) integration
// FLIR / Teledyne (USA) — flir.com
// Grasshopper3: 5–50MP, Sony IMX sensors, PoE, scientific grade
// Blackfly S: compact, 1.3–24MP, USB3/GigE, cost-effective
// Baumer (Switzerland) — baumer.com
// VX series: 5–20MP, IP67, M42 lens mount, industrial
// Allied Vision (Germany) — alliedvision.com
// Alvium MIPI: embedded vision, 0.4–24MP
// Mako G: GigE, 2–16MP, trigger, €150–600
// IDS Imaging (Germany) — ids-imaging.com
// uEye Bolt: USB3, 1.6–12MP, compact, OEM
// uEye CP: GigE, 1.3–20MP, C/CS mount
//
// ── SMART CAMERAS / VISION SENSORS ──────────────────────────
// Keyence IV2-G300MA: 8MP, AI, GigE, 60fps, standalone
// SICK Inspector P620: HALCON embedded, 5MP, GigE
// Cognex In-Sight 9000: 8MP, PatMax, deep learning
// Omron FH series: PC-based, 8-camera, HALCON
// Banner PresencePLUS Pro: plug-and-play, 1.3MP
//
// ── LENSES ───────────────────────────────────────────────────
// Opto Engineering (Italy) — opto-e.com — telecentric leader
// TC23036: 0.257× magnification, 37mm object, 0.01% telecentricity
// Schneider-Kreuznach (Germany) — vision lenses Xenoplan
// Cinegon 1.9/16: 16mm, f/1.9, very sharp
// Computar (Japan/USA) — computar.com
// M0814-MP2: 8mm, 2MP rated, low distortion, economy
// Edmund Optics (USA) — Standard and telecentric, widest range
// Kowa (Japan) — kowa-lenses.com — LM25JC5M: 25mm, 5MP
//
// ── ILLUMINATION ─────────────────────────────────────────────
// CCS Inc (Japan) — ccs-grp.com — widest range
// LFV2 coaxial, HLV2 dome, LDL2 darkfield, TH colour
// Advanced Illumination (USA) — advancedillumination.com
// RL4560: dark-field ring, 45° angle, Gardasoft controller
// Smart Vision Lights (USA) — smartvisionlights.com
// MO75: modular, stackable, any configuration
// Rugged IP67 options for washdown
// Metaphase Technologies (USA) — metaphase-tech.com
// Backlight, dome, ring in standard and custom
// Gardasoft (UK) — gardasoft.com — LED controllers
// RT series: constant current, strobe, < 1µs pulse
//
// ── FRAME GRABBERS ───────────────────────────────────────────
// Euresys (Belgium) — euresys.com
// Coaxlink Quad: 4× CoaXPress, 6.25Gbit/s
// Grablink: Camera Link, Full configuration
// DALSA / Teledyne — Xtium-CL, Xcelera
// Active Silicon (UK) — multi-interface, FPGA-based
//
// ── SOFTWARE ─────────────────────────────────────────────────
// MVTec HALCON (Germany) — mvtec.com/halcon
// Most complete algorithm library, deep learning included
// Used by: Keyence, SICK, Omron OEM
// Cognex VisionPro — pattern matching gold standard (PatMax)
// National Instruments NI Vision (LabVIEW)
// OpenCV (open source) — backbone of most custom systems
// Roboflow + YOLOv8 — rapid AI prototype pipeline
Commissioning a vision system for production is a structured process. Factory Acceptance Test (FAT): verify the system meets specifications before leaving the supplier. Site Acceptance Test (SAT): verify after installation in production. Challenge pieces (golden and reject samples): physical standards verified by an independent method, stored securely, used to verify the system daily. Performance monitoring: log every inspection result; trend false reject rate and run capability studies monthly. Preventive maintenance: clean lenses and optics quarterly, verify LED intensity (LEDs degrade 0.5–1% per 1000 hours), re-qualify after any hardware change.
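The monthly capability study mentioned above reduces to computing Cp and Cpk over the logged measurement values against the specification limits. A minimal sketch, with illustrative names (`cp_cpk`, `lsl`, `usl`) rather than any vision-SDK function:

```python
# Process capability from logged inspection measurements.
# Cp compares spec width to process spread; Cpk also penalises
# off-centre processes.
import statistics

def cp_cpk(samples, lsl, usl):
    """Return (Cp, Cpk) for a list of measurements."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)   # sample standard deviation
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    return cp, cpk
```

For a perfectly centred process, Cp equals Cpk; any drift of the mean towards a limit pulls Cpk below Cp, which is the trend worth alarming on between re-qualifications.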
// Daily verification procedure — golden sample check
// Run at start of each shift, after any power cycle
// 5 PASS samples + 5 FAIL samples (known, independently measured)
VAR
golden_pass_results : ARRAY[1..5] OF BOOL; // All must = PASS
golden_fail_results : ARRAY[1..5] OF BOOL; // All must = FAIL
golden_measurements : ARRAY[1..5] OF REAL; // Compare to NIST-traceable value
golden_ref_values : ARRAY[1..5] OF REAL := [10.023, 10.018, 10.031, 10.025, 10.020];
tolerance_verify : REAL := 0.010; // Verification tolerance (mm)
shift_ok : BOOL;
shift_fault_msg : STRING;
i : INT; // FOR loop index (was missing from the declaration)
END_VAR
// Run all 10 samples:
shift_ok := TRUE;
FOR i := 1 TO 5 DO
IF NOT golden_pass_results[i] THEN
shift_ok := FALSE;
shift_fault_msg := CONCAT(CONCAT('GOLDEN PASS sample ', INT_TO_STRING(i)), ' failed');
END_IF;
IF golden_fail_results[i] THEN
shift_ok := FALSE;
shift_fault_msg := CONCAT(CONCAT('GOLDEN FAIL sample ', INT_TO_STRING(i)), ' not detected');
END_IF;
IF ABS(golden_measurements[i] - golden_ref_values[i]) > tolerance_verify THEN
shift_ok := FALSE;
shift_fault_msg := CONCAT('Measurement drift on sample ', INT_TO_STRING(i));
END_IF;
END_FOR;
IF NOT shift_ok THEN
// Lock machine — no production until vision recalibrated
production_enable := FALSE;
ALARM_SET('VISION_01', shift_fault_msg);
// Log to MES: calibration event required
ELSE
// Log successful verification
LOG_EVENT('VISION_VERIFY_OK', current_shift, current_operator);
END_IF;
Test your knowledge: optics calculations, illumination physics, AI confusion matrices, line-scan rates, and calibration — the engineering fundamentals covered in this guide.
A camera has a 1/2.9" sensor with a 2448×2048 pixel array. The field of view must be 120×100 mm. What focal length lens is required?
Why is a coaxial (on-axis) illumination technique preferred for inspecting mirror-like metallic surfaces?
A template matching algorithm achieves a score of 0.92 out of 1.0 on a reference part. What does this score represent and at what threshold should you reject?
What is the difference between "pattern matching" (geometric) and "blob analysis" in machine vision?
A telecentric lens has a working distance of 110 mm and measures an object at the nominal distance as 50.00 mm wide. The same object at 115 mm (5 mm further) measures as 50.01 mm. What is this property called and why does it matter?
What is "structured light" 3D vision and how does it differ from stereo vision?
In a convolutional neural network (CNN) used for defect detection, what is a "false negative" in production terms and why is it the more dangerous error type?
A line-scan camera captures images of a moving web material at 5000 lines/second. The web moves at 200 mm/s. What is the resulting image resolution in the motion direction?
What is the purpose of a "calibration target" (e.g. dot grid or checkerboard) in a machine vision system?
A machine vision system must inspect 120 parts per minute. The camera exposure time is 2 ms and the trigger-to-image-ready time (including processing) is 45 ms. Can this system meet the throughput requirement?