Study shows AI chatbots provide less-accurate information to vulnerable users

Large language models have been widely championed as revolutionary tools capable of democratizing global access to information. However, new research from the Massachusetts Institute of Technology Center for Constructive Communication indicates that these artificial intelligence systems systematically underperform for the vulnerable demographics that might benefit from them the most.
Presented at the AAAI Conference on Artificial Intelligence, the study investigated state-of-the-art chatbots, including OpenAI’s GPT-4, Anthropic’s Claude 3 Opus, and Meta’s Llama 3. Researchers tested the models using the TruthfulQA and SciQ datasets to measure factual accuracy and truthfulness, while prepending user biographies that varied by education level, English proficiency, and country of origin. The results demonstrated a significant drop in accuracy for users with less formal education or lower English proficiency. These negative effects compounded severely for users at the intersection of both categories.
The research also highlighted alarming disparities in how the models handled queries. Claude 3 Opus, for instance, refused to answer nearly 11% of questions for less educated, non-native English speakers, compared to just 3.6% for control users. In many of these refusals, the model responded with condescending, patronizing, or mocking language, occasionally mimicking broken English. The models also withheld factual information on topics like nuclear power and historical events specifically from less-educated users originating from countries such as Iran or Russia, despite answering the identical prompts correctly for other demographic profiles.
The researchers warn that as personalization features become increasingly common, these inherent sociocognitive biases risk exacerbating existing information inequities by quietly spreading harmful behavior and misinformation to those least equipped to identify it.






