Large Language Models (LLMs) can produce biased responses that can cause representational harms. However, conventional studies are insufficient to thoroughly evaluate biases across LLM responses for different demographic groups (a.k.a. counterfactual bias), as they do not scale to a large number of inputs and do not provide guarantees. Therefore, we propose LLMCert-B, the first framework that certifies LLMs for counterfactual bias on distributions of prompts. A certificate consists of high-confidence bounds on the probability of unbiased LLM responses for any set of counterfactual prompts - prompts differing only in their demographic groups - sampled from a distribution. We illustrate counterfactual bias certification for distributions of counterfactual prompts created by applying prefixes, sampled from prefix distributions, to a given set of prompts. We consider prefix distributions consisting of random token sequences, mixtures of manual jailbreaks, and perturbations of jailbreaks in the LLM's embedding space. We generate non-trivial certificates for SOTA LLMs, exposing their vulnerabilities over distributions of prompts generated from computationally inexpensive prefix distributions.
Large Language Models (LLMs) have shown impressive performance as chatbots, and are hence used by millions of people worldwide. This, however, brings their safety and trustworthiness to the forefront, making it imperative to guarantee their reliability. Prior work has generally focused on establishing trust in LLMs through evaluations on standard benchmarks. This analysis, however, is insufficient due to the limitations of the benchmarking datasets, their use in LLMs' safety training, and the lack of guarantees through benchmarking. As an alternative, we propose quantitative certificates for LLMs and develop a novel framework, LLMCert-B, to quantitatively certify LLMs for bias in their responses. We define bias as an asymmetry in the LLM's responses for a set of prompts that differ only by a sensitive attribute.
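To make this notion concrete, here is a minimal sketch of a bias check over a set of counterfactual responses. It assumes a scalar polarity score (e.g., a sentiment classifier) as a proxy for response asymmetry and an illustrative threshold; both are assumptions for exposition, not the exact detector used in the framework.

```python
# Sketch of the bias notion: responses to counterfactual prompts
# (identical except for a sensitive attribute) are compared, and a
# large asymmetry between them is flagged as bias. The polarity proxy
# and threshold are illustrative assumptions.
from typing import Callable, List


def is_biased(
    responses: List[str],
    polarity: Callable[[str], float],  # e.g., a sentiment score in [-1, 1]
    threshold: float = 0.5,
) -> bool:
    """Flag a set of counterfactual responses as biased if the spread
    of their polarity scores exceeds the threshold."""
    scores = [polarity(r) for r in responses]
    return max(scores) - min(scores) > threshold
```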
LLMCert-B considers a given distribution of sets of prompts to certify a target LLM. The certificate consists of high-confidence bounds on the probability of obtaining a biased response from the LLM for a randomly sampled set of prompts from the distribution. The figure below presents an overview of LLMCert-B on an example distribution of prompts developed from a sample from the BOLD dataset.
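The following sketch shows how such high-confidence bounds can be estimated. It assumes i.i.d. sampling from the distribution of prompt sets, a binary bias detector, and Clopper-Pearson intervals as a stand-in for the paper's exact bounding procedure; `sample_prompt_set`, `query_llm`, and `is_biased` are hypothetical placeholders.

```python
# Sketch: Monte Carlo estimation of high-confidence bounds on the
# probability of a biased response, using Clopper-Pearson intervals
# (an assumption here, not necessarily the paper's exact procedure).
from scipy.stats import beta


def certify_bias(sample_prompt_set, query_llm, is_biased,
                 n_samples: int = 500, confidence: float = 0.95):
    """Return (lower, upper) bounds, holding with the given confidence,
    on the probability that the LLM responds with bias to a random set
    of counterfactual prompts from the distribution."""
    alpha = 1.0 - confidence
    k = 0  # number of sampled prompt sets that received a biased response
    for _ in range(n_samples):
        prompts = sample_prompt_set()           # one set of counterfactual prompts
        responses = [query_llm(p) for p in prompts]
        k += int(is_biased(responses))
    lower = 0.0 if k == 0 else beta.ppf(alpha / 2, k, n_samples - k + 1)
    upper = 1.0 if k == n_samples else beta.ppf(1 - alpha / 2, k + 1, n_samples - k)
    return lower, upper
```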
We illustrate certificates generated by LLMCert-B for popular SOTA LLMs with 3 kinds of distributions. Each distribution is defined over a sample space whose elements are sets of prompts. Each set of prompts is developed from a fixed set of prompts by prepending a random prefix. The fixed set of prompts that characterizes a distribution of sets of prompts is derived from samples of popular fairness datasets, by varying the sensitive attributes in them. Hence, the distribution of the sets of prompts reduces to a distribution of prefixes for a fixed set of prompts. The 3 kinds of prefix distributions we consider are (details in the paper): (1) random sequences of tokens, (2) mixtures of effective jailbreaks, and (3) effective jailbreaks perturbed in the model's embedding space.
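Below is a minimal sketch of samplers for the first two prefix distributions, assuming a Hugging Face tokenizer for the random-token case; the jailbreak strings and mixing weights are placeholders. The embedding-space perturbation variant requires white-box access to the model's embedding matrix and is omitted here.

```python
# Sketches of prefix samplers for distributions (1) and (2); the
# tokenizer and jailbreak list are assumed inputs, not the paper's
# exact configuration.
import random


def sample_random_token_prefix(tokenizer, length: int = 20) -> str:
    """Prefix distribution (1): a uniformly random sequence of tokens."""
    ids = [random.randrange(tokenizer.vocab_size) for _ in range(length)]
    return tokenizer.decode(ids)


def sample_jailbreak_mixture_prefix(jailbreaks, weights=None) -> str:
    """Prefix distribution (2): a mixture of known jailbreak prefixes."""
    return random.choices(jailbreaks, weights=weights, k=1)[0]
```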
We certify popular LLMs for their bias with LLMCert-B and instances of the 3 kinds of distributions defined above. In particular, we certify the LLMs for gender and racial bias with distributions developed from samples from the BOLD and DecodingTrust datasets respectively. We observe novel trends in the performance of the LLMs, which we describe in detail in our paper. Below, we show example responses of a SOTA LLM to prompts sampled from distributions derived from each dataset, for gender and racial bias respectively, to illustrate the prompts and responses used in certification.
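As an illustration of how one dataset sample can yield a set of counterfactual prompts, here is a minimal sketch that varies the sensitive attribute in a BOLD-style template; the placeholder names, swap table, and example prompt are illustrative assumptions rather than the paper's exact preprocessing.

```python
# Sketch: building counterfactual prompts from one template by varying
# the sensitive attribute (gender here); the swap table is illustrative.
GENDER_VARIANTS = [
    {"[PERSON]": "He", "[POSSESSIVE]": "his"},
    {"[PERSON]": "She", "[POSSESSIVE]": "her"},
]


def counterfactual_prompts(template: str, variants=GENDER_VARIANTS):
    """Instantiate the template once per demographic group."""
    prompts = []
    for mapping in variants:
        prompt = template
        for placeholder, value in mapping.items():
            prompt = prompt.replace(placeholder, value)
        prompts.append(prompt)
    return prompts


example_set = counterfactual_prompts(
    "[PERSON] worked as an actor before starting [POSSESSIVE] career in"
)
```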
@article{chaudhary2024quantitative,
title={Quantitative Certification of Bias in Large Language Models},
author={Chaudhary, Isha and Hu, Qian and Kumar, Manoj and Ziyadi, Morteza and Gupta, Rahul and Singh, Gagandeep},
journal={arXiv preprint arXiv:2405.18780},
year={2024}
}
This work presents examples and code of our certification framework that can be used to reliably assess state-of-the-art LLMs for biases in their responses. While the framework is general, we have illustrated it with practical examples of prefix distributions, which can consist of potential jailbreaks. The exact adversarial nature of the prefixes is unknown, but since they are derived from popular jailbreaks, the threat they pose is important to investigate. Hence, we used these prefixes to certify the bias in popular LLMs and have informed the model developers about their potential threat.