Optimizing Deeplabv3+ with Multi-Scale Attention for Semantic Segmentation

Jiajing Liu
Ahmad Khan

Abstract

DeepLabv3+, a leading model for semantic segmentation, carries a high computational cost and represents multi-scale features inadequately, which leads to blurred boundaries and poor detection of small targets. To address these problems, we introduce an efficient network built on the lightweight MobileNetV2 backbone that incorporates three novel modules. First, the DENS-ASPP module replaces the standard ASPP and captures multi-scale features through a densely connected atrous cascade. Second, the SEA module refines these features with spatial attention that models long-range, direction-sensitive context. Third, the DCE module enhances the decoder with coordinate attention, embedding positional information to sharpen object details. The model achieves 72.56% mIoU and 87.28% mPA on the PASCAL VOC 2012 dataset, indicating that the integrated framework improves segmentation performance.
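For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how mIoU (mean intersection over union) and mPA (mean pixel accuracy) are commonly computed from a confusion matrix. It assumes the standard per-class definitions; the abstract does not state the exact evaluation protocol, so the function names, the toy labels, and the choice to skip classes that never occur are illustrative assumptions rather than the authors' code.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows index the ground-truth class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average of TP / (TP + FP + FN) over classes that occur at least once."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class otherwise
        denom = tp + fp + fn
        if denom > 0:  # assumption: skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def mean_pixel_accuracy(cm):
    """Average of TP / (TP + FN) over classes present in the ground truth."""
    accs = []
    for c, row in enumerate(cm):
        total = sum(row)
        if total > 0:  # assumption: skip classes with no ground-truth pixels
            accs.append(row[c] / total)
    return sum(accs) / len(accs) if accs else 0.0


# Toy example: six "pixels" flattened into lists, three classes (hypothetical data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
miou = mean_iou(cm)
mpa = mean_pixel_accuracy(cm)
print(f"mIoU = {miou:.4f}, mPA = {mpa:.4f}")
```

Because IoU also penalizes false positives, mIoU is never higher than mPA on the same predictions, which is consistent with the gap between the two figures reported above.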

Article Details

Section
Research Article
