MlingConf: A Comprehensive Investigation of Multilingual Confidence Estimation for Large Language Models
[📄 Paper Link]
This project, MlingConf, introduces a comprehensive investigation of multilingual confidence estimation for LLMs, focusing on both language-agnostic (LA) and language-specific (LS) tasks to explore the performance and language-dominance effects of multilingual confidence estimation across different tasks.
The benchmark comprises four meticulously checked and human-evaluated high-quality multilingual datasets for LA tasks, plus one LS dataset tailored to the social, cultural, and geographical contexts specific to a language. The proposed MlingConf datasets are constructed as follows:
```bash
# Build each LA dataset from its English source, then translate it into the target languages
python code/preparation.py --dataset triviaqa
python code/translate.py --stage translate --dataset triviaqa

python code/preparation.py --dataset common
python code/translate.py --stage translate --dataset common
```
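As a rough illustration of what the translate stage does, the sketch below renders each English question/answer pair into several target languages. The JSONL schema, the language list, and the `translate_text` helper are all assumptions made for illustration; the repository's actual translation pipeline may differ.

```python
import json

# Illustrative target languages; the actual MlingConf language set may differ.
TARGET_LANGS = ["zh", "ja", "fr", "th", "ar"]

def translate_text(text: str, lang: str) -> str:
    # Placeholder: swap in a real MT backend (e.g., an API call) here.
    # Returning the input unchanged keeps the sketch runnable end to end.
    return text

def translate_dataset(in_path: str, out_path: str) -> None:
    """Translate every question/answer pair into each target language,
    keeping the English original alongside the translations."""
    with open(in_path) as f:
        samples = [json.loads(line) for line in f]
    with open(out_path, "w") as f:
        for sample in samples:
            entry = {"en": sample}
            for lang in TARGET_LANGS:
                entry[lang] = {
                    "question": translate_text(sample["question"], lang),
                    "answer": translate_text(sample["answer"], lang),
                }
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```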
With the datasets in place, run LLM inference to generate answers:

```bash
# model_name options: llama3, gpt-3.5, llama2, vicuna
# dataset options:    triviaqa, common, gsm8k, sciq, lsqa
# max_length per dataset: triviaqa=16, common=16, gsm8k=200, sciq=16, lsqa=16
CUDA_VISIBLE_DEVICES=1 python code/inference.py --model_name llama3 --dataset triviaqa --max_length 16
```
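For reference, here is a minimal sketch of what the inference step might look like with Hugging Face `transformers`; the concrete checkpoint, prompt format, and decoding settings are assumptions, not the repository's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; the repo maps short names like "llama3" to concrete models.
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def answer(question: str, max_length: int = 16) -> str:
    """Generate a short answer; max_length mirrors the --max_length flag above."""
    prompt = f"Answer the question concisely.\nQuestion: {question}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_length, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    ).strip()
```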
Then elicit confidence scores for the generated answers:

```bash
# model_name options: llama3, gpt-3.5, llama2, vicuna
# dataset options:    triviaqa, common, gsm8k, sciq, lsqa
CUDA_VISIBLE_DEVICES=1 python code/confidence.py --model_name llama3 --dataset triviaqa --max_length 48
```
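The paper compares several confidence elicitation methods; purely as an illustration, the sketch below computes a generic logit-based confidence, the average token probability of the generated answer. It reuses the `model` and `tokenizer` from the previous sketch and is not necessarily the method `code/confidence.py` implements.

```python
import torch
import torch.nn.functional as F

def sequence_confidence(model, tokenizer, prompt: str, max_new_tokens: int = 48) -> float:
    """Average token probability of the generated answer,
    a common logit-based confidence estimate."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        output_scores=True,
        return_dict_in_generate=True,
    )
    gen_tokens = out.sequences[0, inputs["input_ids"].shape[1]:]
    # One score tensor per generated step; take the log-prob of the chosen token.
    logprobs = [
        F.log_softmax(step_logits[0], dim=-1)[token_id]
        for step_logits, token_id in zip(out.scores, gen_tokens)
    ]
    return torch.exp(torch.stack(logprobs).mean()).item()
```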
Finally, evaluate answer accuracy and confidence calibration:

```bash
python code/evaluation.py --model llama3 --dataset triviaqa
```
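Evaluation for confidence estimation typically pairs accuracy with a calibration metric. As an illustration, here is a standard expected calibration error (ECE) computation; whether `code/evaluation.py` reports exactly this metric is an assumption.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """Standard ECE: the weighted gap between mean confidence and
    accuracy within each confidence bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # Bin weight times |accuracy - mean confidence| in that bin.
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Usage example with toy predictions:
# print(expected_calibration_error([0.9, 0.8, 0.3], [1, 0, 0]))
```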