23 December 2024
The world of object detection has witnessed a significant shift with the emergence of transformer-based detection models (DETRs), known for their one-to-one matching strategy. Unlike traditional detectors such as YOLO, which rely on one-to-many label assignment and therefore need Non-Maximum Suppression (NMS) to remove redundant predictions, DETR models leverage the Hungarian algorithm and multi-head attention to establish a unique mapping between detected objects and ground truth, eliminating the need for an NMS post-processing step.
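To make the one-to-one assignment concrete, here is a minimal sketch of Hungarian matching using SciPy. The cost terms and weights are simplified for illustration (the actual DETR matching cost also includes a generalized-IoU term), and the function name is ours, not from any particular codebase.

```python
# Minimal sketch of one-to-one (Hungarian) matching between predictions and
# ground-truth boxes. Cost terms and weights are illustrative, not the exact
# DETR configuration (which also includes a generalized-IoU cost).
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_probs, pred_boxes, gt_labels, gt_boxes):
    """pred_probs: (N, C) class probabilities, pred_boxes: (N, 4) normalized boxes,
    gt_labels: (M,) class indices, gt_boxes: (M, 4) normalized boxes."""
    # Classification cost: negative probability of each ground-truth class.
    cost_cls = -pred_probs[:, gt_labels]                                       # (N, M)
    # Box cost: L1 distance between predicted and ground-truth boxes.
    cost_box = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)   # (N, M)
    cost = 2.0 * cost_cls + 5.0 * cost_box                                     # illustrative weights
    pred_idx, gt_idx = linear_sum_assignment(cost)  # unique one-to-one assignment
    return list(zip(pred_idx, gt_idx))

# Each ground-truth object receives exactly one positive prediction,
# so no NMS is needed to suppress duplicates at inference time.
```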
However, despite their advantages in inference latency and stability, DETR-based models have long been plagued by slow convergence. This limitation can be attributed to sparse supervision and low-quality matches. Because one-to-one matching assigns only a single positive sample per target, the supervision signal during training is sparse, which hampers model learning, particularly for tiny object detection.
Low-quality matches arise because many queries in DETR models lack spatial alignment with their targets, so boxes with low IoU can still receive high classification scores. To address these issues, researchers have explored incorporating one-to-many (O2M) assignments into one-to-one (O2O) frameworks. While this increases the density of positive samples, it requires additional decoders that introduce overhead and redundant predictions.
A novel approach called DEIM (DETR with Improved Matching) has emerged as a game-changer in object detection. Developed by researchers at Intellindust AI Lab, DEIM combines two innovative methods: Dense O2O matching and a Matchability-Aware Loss (MAL). The first increases the number of targets in each training image, boosting the number of positive samples while preserving the one-to-one mapping. This can be achieved with simple augmentation methods such as mosaic and mixup.
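As a rough illustration of how Dense O2O raises target density, the sketch below tiles four images into one mosaic so that the composite image carries all of their ground-truth boxes. The helper name, layout, and resizing strategy are our assumptions, not DEIM's actual implementation.

```python
# Illustrative mosaic augmentation: four images are tiled into one composite,
# so the training sample contains the targets of all four sources. Under
# one-to-one matching this directly multiplies the positive samples per image.
import numpy as np
import cv2

def mosaic(images, boxes_list, out_size=640):
    """images: list of 4 HxWx3 uint8 arrays; boxes_list: list of 4 (Ni, 4) arrays
    of absolute xyxy boxes. Returns the mosaic image and the merged, rescaled boxes."""
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    offsets = [(0, 0), (0, half), (half, 0), (half, half)]  # top-left (y, x) of each quadrant
    merged = []
    for img, boxes, (oy, ox) in zip(images, boxes_list, offsets):
        h, w = img.shape[:2]
        canvas[oy:oy + half, ox:ox + half] = cv2.resize(img, (half, half))
        scale = np.array([half / w, half / h, half / w, half / h])
        merged.append(boxes * scale + np.array([ox, oy, ox, oy]))
    return canvas, np.concatenate(merged, axis=0)
```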
The second component, the Matchability-Aware Loss (MAL), scales the penalty on low-quality matches by coupling the IoU between matched queries and their targets with the classification confidence. This approach has a simpler formulation than comparable quality-aware losses, performs equally well for high-quality matches, and improves learning on lower-quality ones.
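Below is a minimal sketch of an IoU-aware classification loss in the spirit of MAL, where the soft target for a matched query is its IoU raised to a focusing power. The exact formulation and hyperparameters in the DEIM paper may differ from this simplified form.

```python
# Sketch of an IoU-aware classification loss in the spirit of MAL: the target
# for a matched (positive) query is its IoU with the assigned ground-truth box
# raised to a focusing power gamma, so poorly aligned matches are penalized
# harder, while high-IoU matches behave like ordinary binary cross-entropy.
import torch
import torch.nn.functional as F

def matchability_aware_loss(pred_logits, iou_targets, pos_mask, gamma=1.5):
    """pred_logits: (N,) raw scores for the ground-truth class of each query.
    iou_targets: (N,) IoU between each matched query and its target (0 for negatives).
    pos_mask:    (N,) boolean, True where the query is a matched (positive) sample."""
    p = pred_logits.sigmoid()
    # Positives: soft target q**gamma ties classification confidence to box quality.
    target = torch.where(pos_mask, iou_targets.clamp(min=1e-6) ** gamma,
                         torch.zeros_like(iou_targets))
    # Negatives: focal-style down-weighting of easy background predictions.
    weight = torch.where(pos_mask, torch.ones_like(p), p ** gamma)
    loss = F.binary_cross_entropy_with_logits(pred_logits, target, reduction="none")
    return (weight * loss).sum() / pos_mask.sum().clamp(min=1)
```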
The researchers integrated DEIM into popular O2O models such as D-FINE-L and D-FINE-X to assess its effectiveness. The results were impressive: models trained with DEIM outperformed their SOTA counterparts in training cost, inference latency, and detection accuracy.
Comparisons against SOTA O2M models such as YOLOv8 through YOLOv11, and against DETR-based models such as RT-DETRv2 and D-FINE, showed DEIM as the clear winner. D-FINE, a leading real-time detector, achieved remarkable gains after incorporating DEIM, including a 0.7 AP improvement and a 30% reduction in training cost.
The most notable improvements were observed in small object detection, with an AP gain of 1.5. In the O2M comparison, DEIM models also surpassed YOLO, the current standard for real-time detection.
DEIM presents a simple yet effective solution to the slow convergence of DETR-based models. Its performance, particularly on small object detection, stands out, making it a compelling framework for researchers and developers looking to improve their object detection capabilities.