A remote sensing image object detection method ABS-YOLO based on improved YOLOv11
https://doi.org/10.29235/1561-2430-2025-61-3-253-264
Abstract
This paper studies object detection in Earth remote sensing images, a task important for agricultural monitoring, urban planning, and early warning of natural disasters. Because objects in such images vary widely in size, backgrounds are complex, and small objects are densely distributed, detectors often miss a high percentage of objects and locate the detected ones with insufficient coordinate accuracy. To address this, ABS-YOLO, an improved version of YOLOv11, is proposed. It integrates Averaged Convolution (AConv), a Bidirectional Weighted Feature Pyramid (BiFPN), and a Swin Transformer attention mechanism. Experimental results on the NWPU VHR-10 dataset show that, compared with YOLOv11, ABS-YOLO increases mAP50 by 3.9 % and mAP50-95 by 2.6 %, with significant improvements in precision and recall. With these improvements, the method balances efficiency and accuracy in remote sensing object detection.
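The weighted feature fusion that gives BiFPN its name can be illustrated with a minimal sketch. It follows the fast normalized fusion rule described for BiFPN in the EfficientDet paper, not the ABS-YOLO implementation itself, whose details the abstract does not give. The function name `fast_normalized_fusion`, the toy 1x2 feature maps, and the value of `eps` are assumptions made for illustration only.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped 2D feature maps using non-negative, normalized weights.

    Hypothetical helper: each output value is sum_i(w_i * x_i) / (eps + sum_i w_i),
    where every w_i is first clipped to be non-negative (ReLU).
    """
    if len(features) != len(weights):
        raise ValueError("one weight per feature map is required")

    # ReLU keeps each (normally learnable) weight non-negative.
    clipped = [max(0.0, w) for w in weights]
    # eps avoids division by zero when all weights are clipped to 0.
    total = sum(clipped) + eps

    rows, cols = len(features[0]), len(features[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for fmap, w in zip(features, clipped):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += (w / total) * fmap[r][c]
    return fused


# Two toy feature maps of the same shape (assumed to be already resized).
p_low = [[1.0, 2.0]]
p_high = [[3.0, 4.0]]
fused_equal = fast_normalized_fusion([p_low, p_high], [1.0, 1.0])
fused_clipped = fast_normalized_fusion([p_low, p_high], [1.0, -5.0])
```

With equal weights the result is close to the element-wise mean of the two maps. With a negative second weight, that map is clipped out and the result is close to the first map alone. In a real network the weights would be trainable parameters and the inputs would be tensors.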
About the Authors
Wu Xianyi
Belarus
Wu Xianyi – Postgraduate Student
4, Nezavisimosti Ave., 220030, Minsk
S. V. Ablameyko
Belarus
Sergey V. Ablameyko – Academician of the National Academy of Sciences of Belarus, Dr. Sc. (Engineering), Professor
6, Surganov Str., 220012, Minsk