Transformer-based semantic segmentation approaches, which divide the image into regions using sliding windows and model the relations inside each window, have achieved outstanding success. However, because previous work did not focus on modeling the relations between windows, those relations have not been fully exploited.
To address this problem, a research team led by Zizhang Wu published their new research on 15 October 2024 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team proposed Graph-Segmenter, an effective network comprising a Graph Transformer and a Boundary-aware Attention module. It simultaneously models the deeper relations between windows in a global view and among the pixels inside each window in a local view, and it performs substantial low-cost boundary adjustment. Specifically, they treat every window, and every pixel inside a window, as a node to construct graphs for both views and devise the Graph Transformer. The boundary-aware attention module optimizes the edge information of the target objects by modeling the relationships between the pixels on the object's edge.
In the research, they proposed a novel relation modeling method that acts on sliding windows, using graph convolutions to establish relationships between windows and between the pixels inside each window, which enhances the backbone to address the issue above. In particular, they regard each window, or each pixel inside a window, as a node of the graph network and use the visual similarity between nodes to establish the edges between them. After that, they further use the graph network to update the nodes and edges of the graph, so that different nodes can adaptively establish connections and update information during network transmission, realizing nonlinear relationship modeling between different windows and between different pixels inside them. In brief, the network's overall feature learning and representation capabilities are improved by enhancing the long-distance nonlinear modeling between different windows and different pixels inside them, which leads to an evident rise in performance.
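As a rough illustration of the idea described above, the following minimal sketch treats each window as a node, builds edges from the cosine similarity of the window features, and performs one neighborhood-averaging update. It is not the team's implementation: the function names, the similarity threshold, and the use of a plain weighted mean in place of learned graph-convolution weights are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_adjacency(features, threshold=0.5):
    """Build a weighted adjacency matrix from visual similarity.

    Two nodes are connected when their similarity exceeds `threshold`
    (a hypothetical, non-negative cut-off). Every node keeps a self-loop
    of weight 1.
    """
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                adj[i][j] = 1.0
            else:
                sim = cosine_similarity(features[i], features[j])
                adj[i][j] = sim if sim > threshold else 0.0
    return adj


def propagate(features, adj):
    """One graph update: each node becomes the edge-weighted mean of its
    neighbors (itself included). A learned layer would add a weight
    matrix and a nonlinearity here; both are omitted in this sketch."""
    n = len(features)
    dim = len(features[0])
    updated = []
    for i in range(n):
        total = sum(adj[i])  # >= 1 because of the self-loop
        updated.append(
            [sum(adj[i][j] * features[j][d] for j in range(n)) / total
             for d in range(dim)]
        )
    return updated


# Three toy "window" feature vectors: the first two look alike,
# the third is visually different.
window_features = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]
adjacency = build_adjacency(window_features)
updated_features = propagate(window_features, adjacency)
```

In this toy example the first two windows become connected and pull each other's features together, while the third window stays isolated and keeps its original feature. The same construction could in principle be applied with pixels inside a window as the nodes.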
Furthermore, they introduce an efficient boundary-aware attention-enhanced segmentation head that optimizes the boundaries of objects in the semantic segmentation task, allowing them to reduce the labeling cost even further while simultaneously improving segmentation accuracy along the boundaries of the objects under consideration. To put it another way, they develop a lightweight local information-aware attention module that improves boundary segmentation. By determining the weights of the pixels around an object's border and applying different attention coefficients to distinct pixels via local perception, the module reinforces the pixels that are critical for categorization while weakening the interfering ones. The attention module used in this study has just a few common CNN layers, which makes it efficient for segmentation boundary adjustment in terms of size, floating-point operations, and latency.
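The local reweighting described above can be sketched as follows. This is only an illustrative approximation under stated assumptions: a single fixed 3x3 Laplacian kernel stands in for the module's learned CNN layers, the absolute value and sigmoid are a hand-picked way of turning the response into attention coefficients, and all function names are hypothetical.

```python
import math


def conv3x3(feature_map, kernel):
    """3x3 cross-correlation with edge-replicated padding (same size output)."""
    h, w = len(feature_map), len(feature_map[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    # Clamp indices so image borders replicate their values.
                    yy = min(max(y + ky - 1, 0), h - 1)
                    xx = min(max(x + kx - 1, 0), w - 1)
                    acc += kernel[ky][kx] * feature_map[yy][xx]
            out[y][x] = acc
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def boundary_attention(feature_map, kernel):
    """Return (attention, reweighted) maps.

    Attention is sigmoid(|local response|): 0.5 in flat regions and
    larger where neighboring pixels disagree, i.e. near a boundary.
    """
    response = conv3x3(feature_map, kernel)
    attention = [[sigmoid(abs(r)) for r in row] for row in response]
    reweighted = [
        [f * a for f, a in zip(f_row, a_row)]
        for f_row, a_row in zip(feature_map, attention)
    ]
    return attention, reweighted


# Stand-in for learned weights: a Laplacian kernel that responds to
# local changes in the feature map.
LAPLACIAN = [
    [0.0, -1.0, 0.0],
    [-1.0, 4.0, -1.0],
    [0.0, -1.0, 0.0],
]

# Toy single-channel feature map with a vertical object boundary
# between the second and third columns.
features = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
attention, reweighted = boundary_attention(features, LAPLACIAN)
```

Here the two columns adjacent to the boundary receive higher attention coefficients than the flat interior columns, which mirrors the idea of emphasizing pixels near an object's border with only a small convolutional computation.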
DOI: 10.1007/s11704-023-2563-5