Optimizing Deeplabv3+ with Multi-Scale Attention for Semantic Segmentation
Abstract
DeepLabv3+, a leading model for semantic segmentation, often suffers from high computational cost and inadequate multi-scale representation, which lead to blurred boundaries and poor detection of small-scale targets. To overcome these challenges, our work introduces an efficient network built upon the lightweight MobileNetV2 backbone that incorporates three novel modules. First, our DENS-ASPP module replaces the standard ASPP and uses a densely connected atrous cascade to better capture multi-scale features. These features are then refined by the SEA module, which applies spatial attention to model extensive, direction-sensitive contextual information. Finally, the DCE module enhances the decoder with coordinate attention, embedding positional information to sharpen object details. Our model achieves 72.56% mIoU and 87.28% mPA on the PASCAL VOC 2012 dataset, demonstrating that this integrated framework yields substantial gains in segmentation performance.
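The two reported metrics, mIoU (mean intersection over union) and mPA (mean pixel accuracy), are commonly derived from a per-class confusion matrix. The sketch below shows one standard way to compute them in plain Python. It is an illustration under stated assumptions, not the authors' evaluation code: the function names, the toy labels, and the choice to average only over classes that actually occur are all hypothetical, and the abstract does not describe the exact protocol used.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels; rows are ground-truth classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def mean_iou_and_mpa(matrix):
    """Return (mIoU, mPA) from a square confusion matrix.

    Assumption: classes with no ground-truth and no predicted pixels are
    skipped for IoU, and classes with no ground-truth pixels are skipped
    for pixel accuracy.
    """
    n = len(matrix)
    ious, accs = [], []
    for c in range(n):
        tp = matrix[c][c]                            # correctly labelled pixels of class c
        gt = sum(matrix[c])                          # pixels whose true class is c
        pred = sum(matrix[r][c] for r in range(n))   # pixels predicted as class c
        union = gt + pred - tp
        if union > 0:
            ious.append(tp / union)
        if gt > 0:
            accs.append(tp / gt)
    if not ious or not accs:
        raise ValueError("confusion matrix contains no labelled pixels")
    return sum(ious) / len(ious), sum(accs) / len(accs)


# Toy example: six flattened pixels, three classes (hypothetical data).
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
miou, mpa = mean_iou_and_mpa(confusion_matrix(y_true, y_pred, 3))
print(f"mIoU = {miou:.4f}, mPA = {mpa:.4f}")
```

On the toy data, one class-0 pixel is mislabelled as class 1. That lowers the IoU of both classes but the pixel accuracy of class 0 only, which is why mPA is typically higher than mIoU for the same predictions.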
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Every article published in the Romphruek Journal of the Humanities and Social Sciences represents the opinion and point of view of its authors. It is not the viewpoint of Krirk University or the editorial department. Any part or all of the articles used for publication must be clearly cited.