Evaluating Trust And Safety Of Large Language Models


Amid the skyrocketing popularity of large language models (LLMs), researchers at Lawrence Livermore National Laboratory are taking a closer look at how these artificial intelligence (AI) systems perform under measurable scrutiny. LLMs are generative AI tools trained on massive amounts of data to produce text-based responses to queries. This technology has the potential to accelerate scientific research in numerous ways, from cyber security applications to autonomous experiments. But even if a billion-parameter model has been trained on trillions of data points, can we rely on its answers?

Two Livermore co-authored papers examining LLM trustworthiness (how a model uses data and makes decisions) were accepted at the 2024 International Conference on Machine Learning, one of the world's most prominent AI/ML conferences.

"This technology has a lot of momentum, and we can make it better and safer," said Bhavya Kailkhura, who co-wrote both papers.

More effective models

Training on vast amounts of data isn't confirmation of a model's trustworthiness. For instance, biased or private information could pollute a training dataset, or a model may be unable to detect erroneous information in the user's query. And although LLMs have improved significantly as they have scaled up, smaller models can sometimes outperform larger ones. Ultimately, researchers are faced with the twin challenges of gauging trustworthiness and defining the standards for doing so.

In "TrustLLM: Trustworthiness in Large Language Models," Kailkhura joined collaborators from universities and research organizations around the world to develop a comprehensive trustworthiness evaluation framework. They examined 16 mainstream LLMs - ChatGPT, Vicuna, and Llama2 among them - across eight dimensions of trustworthiness, using 30 public datasets as benchmarks on a range of simple to complex tasks.

Led by Lehigh University, the study is a deep dive into what makes a model trustworthy. The authors gathered assessment metrics from the already extensive scientific literature on LLMs, reviewing more than 600 papers published during the past five years.

"This was a large-scale effort," Kailkhura said "You cannot solve these problems on your own."
