In crowdsourcing scenarios, we can obtain multiple noisy labels for each instance from different crowd workers and then infer its integrated label via label aggregation. In spite of the effectiveness of label aggregation methods, a certain level of noise still remains in the integrated labels. Thus, in recent years, some noise correction methods have been proposed to reduce the impact of this noise. However, to the best of our knowledge, existing methods rarely consider an instance's information from both its features and its multiple noisy labels simultaneously when identifying a noise instance.
To address this issue, a research team led by Liangxiao JIANG published new research on 15 October 2024 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team proposed a novel noise correction method called label distribution similarity-based noise correction (LDSNC). LDSNC reduces the impact of noise in the integrated labels by considering each instance's information from both its features and its multiple noisy labels simultaneously. Experimental results on simulated and real-world crowdsourced datasets demonstrate that LDSNC outperforms state-of-the-art competitors in terms of the noise ratio.
In the research, the team learns multiple classifiers using instances' features and their integrated labels, and then uses these classifiers to estimate the predicted label distribution for each instance. Next, they measure the similarity between each instance's predicted label distribution and its multiple noisy label distribution to filter out noise instances. Finally, they correct the noise instances using a consensus voting strategy.
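The correction step mentioned above can be pictured as a simple consensus (majority) vote over an instance's multiple noisy labels. The function below is an illustrative sketch, not code from the paper; the name and tie-breaking rule are assumptions.

```python
from collections import Counter

def consensus_vote(noisy_labels):
    """Return the label most crowd workers agree on for one instance.

    `noisy_labels` is the list of labels that one instance received
    from different workers; ties are broken by first occurrence.
    """
    counts = Counter(noisy_labels)
    label, _ = counts.most_common(1)[0]
    return label

# A flagged noise instance is relabeled with the workers' consensus.
print(consensus_vote([1, 0, 1, 1, 0]))  # prints 1
```

In this sketch, an instance identified as a noise instance simply has its integrated label replaced by the consensus label of its crowd workers.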
Firstly, to measure whether an instance's features are distinguishable, LDSNC obtains each instance's predicted label distribution by building multiple classifiers using instances' features and their integrated labels. Secondly, to measure whether an instance's multiple noisy labels are noisy, LDSNC obtains each instance's multiple noisy label distribution from its multiple noisy labels. Finally, LDSNC uses the Kullback-Leibler (KL) divergence to calculate the similarity between the predicted label distribution and the multiple noisy label distribution, and identifies instances with lower similarity as noise instances. Extensive experimental results on simulated and real-world crowdsourced datasets validate the effectiveness of LDSNC.
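The filtering step can be illustrated with a small sketch. The direction of the KL divergence, the smoothing constant, and the way the two distributions are formed here are illustrative assumptions, not details confirmed by the paper.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with smoothing to avoid log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def noisy_label_distribution(noisy_labels, num_classes):
    """Empirical distribution of one instance's crowd labels."""
    counts = np.bincount(noisy_labels, minlength=num_classes)
    return counts / counts.sum()

# Hypothetical instance: the classifiers confidently predict class 0,
# but four of the five crowd labels say class 1. The two distributions
# disagree strongly, so the divergence is large (similarity is low)
# and the instance would be flagged as a noise instance.
predicted = [0.9, 0.1]  # averaged classifier output (assumed)
noisy = noisy_label_distribution([1, 1, 0, 1, 1], num_classes=2)
divergence = kl_divergence(predicted, noisy)
```

A low KL divergence means the predicted and noisy label distributions agree, so the instance is kept as-is; a high divergence marks it for correction.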
Future work can focus on taking into account those instances whose features are indistinguishable but whose multiple noisy labels are noisy, to further enhance the effectiveness of the method.
DOI: 10.1007/s11704-023-2751-3