The project for MlingConf: A Comprehensive Investigation of Multilingual Confidence Estimation for Large Language Models

AmourWaltz/MlingConf

MlingConf: A Comprehensive Investigation of Multilingual Confidence Estimation for Large Language Models

[📄 Paper Link]

MlingConf presents a comprehensive investigation of multilingual confidence estimation for LLMs. It covers both language-agnostic (LA) and language-specific (LS) tasks to study how well confidence estimation performs across languages and how language dominance affects it on different tasks.

Data Construction

The benchmark comprises four carefully checked, human-evaluated, high-quality multilingual datasets for LA tasks and one dataset for the LS task, which is tailored to the specific social, cultural, and geographical contexts of a language. The MlingConf datasets are constructed as follows.

Translation

python code/preparation.py --dataset triviaqa

python code/translate.py --stage translate --dataset triviaqa

Consistency Check and Filter

python code/preparation.py --dataset common

python code/translate.py --stage translate --dataset common

Experiments

LLM Inference on MlingConf datasets

# model_name: [llama3, gpt-3.5, llama2, vicuna]
# dataset: [triviaqa, common, gsm8k, sciq, lsqa]
# max_length: [16, 16, 200, 16, 16]

CUDA_VISIBLE_DEVICES=1 python code/inference.py --model_name llama3 --dataset triviaqa --max_length 16
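
To run inference for every model and dataset, the per-dataset `max_length` values from the comments above can be kept in one mapping. The sketch below only builds the command lines. It assumes the `max_length` list pairs positionally with the `dataset` list, and `build_commands` is a hypothetical helper that is not part of this repository.

```python
import shlex
from itertools import product

# Model names as listed in the comment above.
MODELS = ["llama3", "gpt-3.5", "llama2", "vicuna"]

# Assumption: the max_length list pairs positionally with the dataset list.
MAX_LENGTH = {"triviaqa": 16, "common": 16, "gsm8k": 200, "sciq": 16, "lsqa": 16}


def build_commands(models=MODELS, max_length=MAX_LENGTH):
    """Return one inference command (as an argument list) per model/dataset pair."""
    commands = []
    for model, (dataset, length) in product(models, max_length.items()):
        commands.append([
            "python", "code/inference.py",
            "--model_name", model,
            "--dataset", dataset,
            "--max_length", str(length),
        ])
    return commands


if __name__ == "__main__":
    # Print the commands for review; run them with the shell or subprocess.run.
    for cmd in build_commands():
        print(shlex.join(cmd))
```

Each printed line can be prefixed with `CUDA_VISIBLE_DEVICES=<id>` to select a GPU, as in the command above.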

AUROC, ECE, and Accuracy Evaluation

# model_name: [llama3, gpt-3.5, llama2, vicuna]
# dataset: [triviaqa, common, gsm8k, sciq, lsqa]
CUDA_VISIBLE_DEVICES=1 python code/confidence.py --model_name llama3 --dataset triviaqa --max_length 48
python code/evaluation.py --model llama3 --dataset triviaqa
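
For reference, the three reported metrics can be sketched in plain Python. This illustrates the standard definitions and is not necessarily how `code/evaluation.py` computes them. It assumes each answer has a confidence score in [0, 1] and a 0/1 correctness label; the function names and the choice of 10 equal-width bins are illustrative.

```python
def accuracy(correct):
    """Fraction of answers judged correct (labels are 0 or 1)."""
    return sum(correct) / len(correct)


def auroc(confidences, correct):
    """Probability that a random correct answer receives a higher confidence
    than a random incorrect one, counting ties as 0.5."""
    pos = [c for c, y in zip(confidences, correct) if y]
    neg = [c for c, y in zip(confidences, correct) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one correct and one incorrect answer")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ece(confidences, correct, n_bins=10):
    """Expected calibration error with equal-width confidence bins."""
    total = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes to the last bin
        bins[idx].append((c, y))
    error = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        avg_acc = sum(y for _, y in items) / len(items)
        # Weight each bin's confidence/accuracy gap by its share of the samples.
        error += len(items) / total * abs(avg_conf - avg_acc)
    return error
```

A higher AUROC means confidence separates correct from incorrect answers better, while a lower ECE means the stated confidence is closer to the observed accuracy.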
