This repository contains the datasets and source code for "BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models".
Set up the environment by first downloading this repository and then running:
pip install -r requirements.txt
The datasets evaluated in this paper are available in the data/ directory:
- Probabilistic estimation: common2sense_human_annotation.csv (for evaluation) and common2sense_human_annotation.json (provided in the same format as the decision-making datasets to make inference easier).
- Decision making: common2sense.json, plasma.json, and today.json.

Each JSON dataset contains the following columns:
- scenario
- statement
- opposite_statement
- additional_sentence_label (indicates which statement each additional condition supports)
- In common2sense.json, the additional conditions are provided as added_information and oppo_added_information.
- In plasma.json and today.json, the additional conditions are listed under additional_sentences.
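As a rough illustration of the column layout described above, the following Python sketch loads one of the JSON datasets and picks out whichever additional-condition fields a record carries. It assumes that each file's top level is a JSON list of records keyed by the column names listed here; that layout is not confirmed by this README. The helper names `load_records` and `additional_conditions` are hypothetical and are not part of the BIRD code.

```python
import json
from pathlib import Path

# Field names are taken from this README; the file layout is an assumption.
CONDITION_FIELDS = (
    "added_information",       # common2sense.json
    "oppo_added_information",  # common2sense.json
    "additional_sentences",    # plasma.json and today.json
)


def load_records(path):
    """Load a dataset file, assuming its top level is a JSON list of records."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(
            f"expected a JSON list of records, got {type(data).__name__}"
        )
    return data


def additional_conditions(record):
    """Return the additional-condition fields present in a single record."""
    return {name: record[name] for name in CONDITION_FIELDS if name in record}
```

Under these assumptions, `additional_conditions(load_records("data/plasma.json")[0])` would return a dictionary with the single key `additional_sentences`, while a record from common2sense.json would yield `added_information` and `oppo_added_information`. If the files turn out to be structured differently, `load_records` raises an error rather than guessing.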
Configuration files for running the pipeline are in the scripts/ directory:
- To run the entire BIRD pipeline:
bash scripts/run_bird.sh
- To run the baselines:
bash scripts/baseline.sh
- To run the evaluation:
bash scripts/eval.sh
If you find this project helpful, please cite:
@inproceedings{
feng2025bird,
title={{BIRD}: A Trustworthy Bayesian Inference Framework for Large Language Models},
author={Yu Feng and Ben Zhou and Weidong Lin and Dan Roth},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=fAAaT826Vv}
}