The challenges of aligning artificial intelligence (AI) with human values are the focus of a compelling new book by Dr Travis LaCroix from our Department of Philosophy.
Artificial Intelligence and the Value Alignment Problem delves into the ethical and practical problems surrounding AI, addressing topics such as bias, fairness, transparency, and opacity.
The book aims to provide a comprehensive overview of the fundamental social and ethical dilemmas posed by AI.
Through numerous case studies, Dr LaCroix illustrates the complexities of ensuring that present-day AI systems, such as predictive-policing tools or ChatGPT, are aligned with human values.
We spoke with Dr LaCroix about the inspiration behind the book and its key insights.
What inspired you to write this book on AI?
While teaching at Dalhousie University in Canada, I developed a course on computing and society, focusing on how AI systems can misalign with human values.
However, I struggled to find a suitable textbook – existing materials were either too advanced or scattered across various journal articles.
To address this gap, I decided to write a book that introduces the topic in a structured, accessible way for students and researchers across disciplines.
Can you tell us about your new book and the topics it covers?
This book offers a philosophical and technical introduction to the "value alignment problem" in AI – the challenge of ensuring that AI systems align with human values.
While this issue is often framed in broad terms, my approach is to analyse value alignment through the principal-agent framework from economics, examining how real-world machine learning systems function and identifying concrete ways to address misalignment.
The book also connects value alignment to broader AI ethics topics, including bias, fairness, transparency, intellectual property, privacy, environmental impacts, and power dynamics.
How is this book different from others on the same topic?
Unlike most existing books on AI alignment, which are written for a general audience, this book is designed for academic teaching and research. The most well-known works on the topic – such as Stuart Russell's Human Compatible and Brian Christian's The Alignment Problem – focus on speculative artificial general intelligence (AGI). In contrast, my book examines value alignment in the context of present-day AI systems, making it more directly relevant to ongoing research and policy discussions.
What new research does the book explore?
Rather than focusing on normative questions, such as 'What values should AI systems reflect?', or technical questions, such as 'How can we encode those values into AI systems?', my book takes a structural approach, examining the systemic factors that contribute to misalignment.
It is designed as a self-contained, interdisciplinary resource, covering the history of AI, the current landscape of artificial intelligence, and the value alignment problem.
I introduce three key axes of value alignment – objectives, information, and principals – alongside an overview of different approaches to alignment, including AI safety and machine ethics.
The book also explores practical strategies for mitigating misalignment, such as benchmarking, the role of language, and the integration of human values. Additionally, an appendix addresses the topic of superintelligence and the control problem, situating these discussions within broader debates about value alignment.
What are some key takeaways you hope readers will gain from your book?
First: The value alignment problem in AI is urgent – not because of hypothetical threats from superintelligent AI, but because of how AI systems are developed and deployed by a handful of actors who wield enormous power in society.
Second: Many in the AI field operate under the guiding faith of the 'scaling hypothesis' – the idea that increasing computational power and model size leads to better performance.
My book introduces a parallel idea: as AI systems grow in scale, the risks of value misalignment also increase. Understanding this dynamic is essential for ensuring that AI serves the public interest rather than reinforcing harmful incentives.