Inference-Time Selective Debiasing to Enhance Fairness in Text Classification Models

Kuzmin, Gleb; Yadav, Neemesh; Smirnov, Ivan V.; Baldwin, Timothy; Shelmanov, Artem O.

We propose selective debiasing, an inference-time safety mechanism designed to improve overall model quality in terms of both prediction performance and fairness, especially when retraining the model is impractical. The method draws inspiration from selective classification, where low-quality predictions, as indicated by their uncertainty scores, are discarded at inference time. In our approach, we identify potentially biased model predictions and, instead of discarding them, remove bias from them using LEACE, a post-processing debiasing method. To select problematic predictions, we propose a bias quantification approach based on KL divergence, which achieves better results than standard uncertainty quantification methods. Experiments on text classification datasets with encoder-based classification models demonstrate that selective debiasing helps to reduce the performance gap between post-processing methods and debiasing techniques from the at-training and pre-processing categories. © 2025 Association for Computational Linguistics
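
The selection step described in the abstract can be illustrated with a minimal sketch. It rests on assumptions this record does not confirm: that the KL-divergence score compares the classifier's original class probabilities with the probabilities obtained after debiasing, and that a fixed fraction of the highest-scoring predictions is replaced. The names `kl_divergence`, `selective_debias`, and `fraction` are hypothetical, and the debiased probabilities are given as plain arrays rather than produced by LEACE.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """Row-wise KL(p || q) for arrays of shape (n_samples, n_classes)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=1)


def selective_debias(probs_original, probs_debiased, fraction=0.2):
    """Replace the most 'suspicious' predictions with their debiased versions.

    Assumption: a large divergence between the original and debiased
    distributions marks a prediction as potentially biased.
    """
    scores = kl_divergence(probs_original, probs_debiased)
    n_select = int(round(fraction * len(scores)))
    # Indices of the n_select largest scores.
    selected = np.argsort(-scores)[:n_select]
    final = probs_original.copy()
    final[selected] = probs_debiased[selected]
    return final, selected


# Toy inputs (illustrative values, not results from the paper).
probs_original = np.array([[0.90, 0.10], [0.60, 0.40], [0.20, 0.80], [0.55, 0.45]])
probs_debiased = np.array([[0.88, 0.12], [0.30, 0.70], [0.21, 0.79], [0.50, 0.50]])

final, selected = selective_debias(probs_original, probs_debiased, fraction=0.25)
print(selected, final.argmax(axis=1))
```

In this toy input only the second example changes noticeably after debiasing, so it is the one whose prediction is replaced; all other predictions keep the original model's output.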

Authors

Kuzmin Gleb ²,⁴; Yadav Neemesh ⁵; Smirnov Ivan V. ⁴,³; Baldwin Timothy ¹,⁶; Shelmanov Artem O. ¹

Conference proceedings

Proc. Conf. Nations Americas Chapter Assoc. Comput. Linguist.: Hum. Lang. Technol., NAACL-HLT

Publisher

Association for Computational Linguistics (ACL)

Language

English

Pages

95-107

Status

Published

DOI

10.18653/V1/2025.NAACL-SHORT.9

Year

2025

Organizations

¹ Mohamed Bin Zayed University of Artificial Intelligence, Abu Dhabi, Abu Dhabi, United Arab Emirates
² Weakly-Supervised NLP Group
³ RUDN University, Moscow, Moscow Oblast, Russian Federation
⁴ Laboratory for Analysis and Controllable Text Generation Technologies RAS, Russian Federation
⁵ Indraprastha Institute of Information Technology, Delhi, New Delhi, India
⁶ University of Melbourne, Melbourne, VIC, Australia

Keywords

Classification (of information); Computational linguistics; Natural language processing systems; Prediction models; Text processing; Uncertainty analysis; De-biasing; Low qualities; Modeling quality; Overall-model; Prediction performance; Safety mechanisms; Text classification models; Time predictions; Time-selective; Uncertainty; Forecasting
