AI Glossary
The complete dictionary of Artificial Intelligence
Embedded AutoML
Subfield of AutoML specialized in the automatic generation of models optimized for the specific constraints of embedded devices, including limited memory, low computational power, and tight energy budgets.
Model Quantization
Optimization technique that reduces the numerical precision of a neural network's weights and activations (typically from 32-bit floating point to 8-bit integers or lower) to decrease model size and accelerate inference on constrained hardware.
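For illustration, a minimal sketch of post-training affine quantization in NumPy; the weight tensor and the int8 range mapping are assumptions for the example, and production toolchains (TensorFlow Lite, PyTorch) use calibrated variants of this arithmetic:

```python
import numpy as np

def quantize_int8(w):
    # Affine (asymmetric) quantization: map [w.min(), w.max()] onto [-128, 127]
    scale = (w.max() - w.min()) / 255.0
    zero_point = int(round(-128 - w.min() / scale))
    q = np.clip(np.round(w / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(256, 256).astype(np.float32)  # stand-in for a weight tensor
q, scale, zp = quantize_int8(w)
print("4x smaller; max abs error:", np.abs(w - dequantize(q, scale, zp)).max())
```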
Neural Pruning
Process of selectively removing redundant weights or neurons from a neural network to reduce its computational complexity and memory footprint with minimal loss of accuracy.
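As a sketch, unstructured magnitude pruning in PyTorch; the 50% sparsity level is an arbitrary choice for the example, and torch.nn.utils.prune provides a maintained implementation of the same idea:

```python
import torch

def magnitude_prune(weight, sparsity=0.5):
    # Zero out the fraction of weights with the smallest magnitudes
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = (weight.abs() > threshold).to(weight.dtype)
    return weight * mask

layer = torch.nn.Linear(512, 512)
with torch.no_grad():
    layer.weight.copy_(magnitude_prune(layer.weight))
print("sparsity:", (layer.weight == 0).float().mean().item())
```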
Knowledge Distillation
Transfer learning method in which a compact student model is trained to reproduce the outputs of a large teacher model, retaining much of the teacher's performance in an architecture suitable for Edge devices.
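A minimal sketch of the standard (Hinton-style) distillation loss; the temperature T and the mixing weight alpha are illustrative hyperparameters:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft-target term: KL divergence between temperature-softened
    # distributions, scaled by T^2 to keep gradient magnitudes comparable
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the true labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```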
Inference Optimization
Set of techniques aimed at reducing the time and resources required to execute a trained model, including operator fusion, efficient memory allocation, and exploitation of hardware parallelism.
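Runtimes typically expose these optimizations as configuration knobs; a hedged sketch with ONNX Runtime, where the model path and thread count are placeholder assumptions:

```python
import onnxruntime as ort

opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL  # incl. operator fusion
opts.intra_op_num_threads = 4  # match the Edge device's core count
session = ort.InferenceSession("model.onnx", sess_options=opts,  # placeholder model file
                               providers=["CPUExecutionProvider"])
```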
NAS for Edge
Constrained Neural Architecture Search that automatically optimizes network structures by specifically considering the hardware limitations of Edge devices, such as target latency and power consumption.
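One common formulation is a MnasNet-style reward that trades accuracy against measured latency; a sketch in which the function name is illustrative and w = -0.07 is the soft-constraint exponent from the MnasNet paper:

```python
def edge_reward(accuracy, latency_ms, target_ms=10.0, w=-0.07):
    # Architectures slower than the target are penalized multiplicatively;
    # faster ones are mildly rewarded (soft latency constraint)
    return accuracy * (latency_ms / target_ms) ** w
```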
Model Compiler
Tool that transforms AI computational graphs into optimized machine code for specific target architectures, incorporating optimizations like quantization and operator fusion.
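As one example of such a toolchain, a hedged sketch with Apache TVM's Relay frontend; exact APIs vary across TVM releases, and the model file and target triple are placeholders:

```python
import onnx
import tvm
from tvm import relay

model = onnx.load("model.onnx")                  # placeholder ONNX model
mod, params = relay.frontend.from_onnx(model)
target = "llvm -mtriple=aarch64-linux-gnu"       # example Edge CPU target
with tvm.transform.PassContext(opt_level=3):     # enables fusion and other passes
    lib = relay.build(mod, target=target, params=params)
lib.export_library("model.so")                   # compiled artifact for the device
```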
TensorRT
NVIDIA's optimization and runtime SDK for deploying AI models in production, using quantization, layer fusion, and kernel optimization to maximize performance on NVIDIA GPUs.
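A minimal sketch of building an FP16 engine from an ONNX file with the TensorRT Python API; details vary across TensorRT versions, and the file names are placeholders:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:           # placeholder model file
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)         # enable reduced precision
engine = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine)
```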
TinyML
Field of machine learning focused on running AI models on microcontrollers and ultra-low-power devices, typically with less than 1 MB of memory and a power budget below 1 mW.
Edge TPU
ASIC hardware accelerator developed by Google specifically for edge AI inference, optimized to run quantized TensorFlow Lite models with high energy efficiency.
Memory Optimization
Techniques for reducing the memory footprint of models, including weight sharing, compression, and dynamic allocation, to fit the constraints of embedded devices.
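As a sketch of weight sharing, Deep-Compression-style clustering in NumPy: each weight is replaced by one of 16 shared centroids, so only 4-bit indices plus a small codebook need to be stored (cluster and iteration counts are arbitrary for the example):

```python
import numpy as np

def share_weights(w, n_clusters=16):
    # k-means over scalar weights: store centroid indices + a tiny codebook
    flat = w.ravel()
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(10):  # a few Lloyd iterations
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for c in range(n_clusters):
            members = flat[idx == c]
            if members.size:
                centroids[c] = members.mean()
    return centroids, idx.reshape(w.shape).astype(np.uint8)

w = np.random.randn(128, 128).astype(np.float32)
codebook, indices = share_weights(w)
approx = codebook[indices]  # reconstructed tensor using shared weights
```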
Inference Latency
Time elapsed between feeding input data to a model and obtaining its prediction; a critical parameter in real-time Edge applications, where targets are often below 10 ms.
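A sketch for measuring it empirically; the warm-up and run counts are arbitrary, and `predict` stands in for any model's inference call:

```python
import time
import numpy as np

def measure_latency_ms(predict, sample, warmup=10, runs=100):
    for _ in range(warmup):  # warm up caches, JITs, lazy allocations
        predict(sample)
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        predict(sample)
        times.append((time.perf_counter() - t0) * 1000.0)
    return np.percentile(times, 50), np.percentile(times, 99)  # median and tail

p50, p99 = measure_latency_ms(lambda x: x @ x, np.random.randn(256, 256))
print(f"p50={p50:.2f} ms  p99={p99:.2f} ms")
```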
Lightweight Model
Neural network architecture specifically designed to minimize parameters and computational operations, such as MobileNet or EfficientNet, optimized for mobile and Edge deployments.
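The core trick behind MobileNet-style architectures is the depthwise separable convolution; a sketch comparing parameter counts (the layer sizes are arbitrary for the example):

```python
import torch.nn as nn

standard = nn.Conv2d(64, 128, kernel_size=3, padding=1)
separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise: one filter per channel
    nn.Conv2d(64, 128, kernel_size=1),                       # pointwise: 1x1 channel mixing
)

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(standard), params(separable))  # 73856 vs 8960, roughly 8x fewer
```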
Distributed Deployment
Strategy of spreading AI workloads across multiple Edge devices to make better use of their aggregate resources and improve the scalability of Edge AI applications.
Energy Optimization
Process of minimizing power consumption of AI models on Edge devices, crucial for battery-powered applications and large-scale deployments.
Edge AI
Paradigm in which artificial intelligence workloads run directly on edge devices, eliminating the need to communicate with the cloud for critical inference tasks.
AI Microcontroller
Ultra-low-power system-on-chip integrating dedicated hardware accelerators for AI inference, enabling TinyML models to run on a power budget of a few microwatts.
Hardware-Aware Optimization
AutoML approach that integrates the characteristics of the target hardware directly into the automatic model design process, ensuring the resulting models are both compatible with and efficient on the deployment device.
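One common ingredient is a per-operator latency table measured once on the target device and used as a fast proxy during model search; a sketch in which the operator names and timings are hypothetical:

```python
# Hypothetical per-operator latencies (ms) measured on the target device
LATENCY_LUT = {
    "conv3x3_s1_64": 1.20,
    "dwconv3x3_64": 0.31,
    "conv1x1_64_128": 0.42,
}

def predicted_latency_ms(architecture):
    # Sum of measured per-op costs: a cheap, hardware-aware search signal
    return sum(LATENCY_LUT[op] for op in architecture)

print(predicted_latency_ms(["conv3x3_s1_64", "dwconv3x3_64", "conv1x1_64_128"]))
```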
Operator Fusion
Compilation technique that combines several adjacent layers or operations into a single kernel, reducing memory traffic between operations and improving computational efficiency on the Edge.
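A classic instance is folding a BatchNorm layer into the preceding convolution at compile time; a PyTorch sketch, valid at inference only, assuming the BatchNorm statistics are frozen:

```python
import torch
import torch.nn as nn

def fold_batchnorm(conv, bn):
    # y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta collapses into
    # a single convolution with rescaled weights and an adjusted bias
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    scale = bn.weight.data / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    bias = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = bn.bias.data + scale * (bias - bn.running_mean)
    return fused

conv, bn = nn.Conv2d(8, 16, 3, padding=1), nn.BatchNorm2d(16).eval()
x = torch.randn(1, 8, 32, 32)
print(torch.allclose(bn(conv(x)), fold_batchnorm(conv, bn)(x), atol=1e-5))  # True
```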