Abstract:
Developing value-aligned agents is a complex undertaking and an ongoing challenge in the field of AI. Indeed, designing Large Language Models (LLMs) that can balance multiple, possibly conflicting moral values based on the context is a problem of paramount importance. In this paper, we propose a system that performs contextual value alignment based on contextual aggregation of possible responses. This aggregation is achieved by integrating a subset of possible LLM responses that are best suited to a user's input while taking into account features extracted about the user's moral preferences. The proposed system, trained using the Moral Integrity Corpus, displays better alignment to human values than state-of-the-art baselines.
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025
IEEE Keywords (Index Terms):
- Ethical Issues
- Challenge In The Field
- Human Values
- Field Of Artificial Intelligence
- Loss Function
- Ethical Principles
- Moral Responsibility
- Training Examples
- User Profile
- Artificial Intelligence Systems
- Plain English
- Moral Agency
- Human Preferences
- Aggregation Module
- Vector C
- Proximal Policy Optimization
- Reward Model
- Alignment Of Values
- Chatbot