# CodeOpt: Code Optimization with Two-Stage Sampling, Few-Shot Learning, and Iterative Self-Reflection

CodeOpt is a framework for optimizing code performance through a combination of Two-Stage Sampling, Few-Shot Learning, and Iterative Self-Reflection. It uses BM25 for initial preselection, CodeBERT for semantic similarity scoring, and Large Language Models (LLMs) for the optimization itself.
- Two-Stage Sampling: Combines BM25-based preselection with semantic similarity re-ranking (sketched after this list).
- Few-Shot Learning: Incorporates representative examples for contextual optimization.
- Iterative Self-Reflection: Refines code iteratively to improve runtime while preserving correctness.
- GA-COT: Optional Genetic Algorithm Inspired Chain-of-Thought.
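To make the two-stage sampler concrete, here is a minimal sketch of how BM25 preselection followed by CodeBERT re-ranking could be wired up. It assumes the `rank_bm25` package and the `microsoft/codebert-base` checkpoint; the function name `retrieve_examples` and the `top_k`/`top_n` parameters are illustrative, not the project's actual API.

```python
# Sketch: BM25 preselection, then CodeBERT cosine-similarity re-ranking.
# Assumes `pip install rank-bm25 transformers torch`; names are illustrative.
import torch
from rank_bm25 import BM25Okapi
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(code: str) -> torch.Tensor:
    """Mean-pool CodeBERT's last hidden state into a single vector."""
    inputs = tokenizer(code, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

def retrieve_examples(query: str, corpus: list[str],
                      top_k: int = 50, top_n: int = 4) -> list[str]:
    # Stage 1: cheap lexical preselection with BM25 over whitespace tokens.
    bm25 = BM25Okapi([doc.split() for doc in corpus])
    scores = bm25.get_scores(query.split())
    candidates = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:top_k]

    # Stage 2: semantic re-ranking of the survivors by cosine similarity.
    q_vec = embed(query)
    ranked = sorted(
        candidates,
        key=lambda i: torch.cosine_similarity(q_vec, embed(corpus[i]), dim=0).item(),
        reverse=True,
    )
    return [corpus[i] for i in ranked[:top_n]]
```

The two stages trade off cost and quality: BM25 narrows thousands of candidates cheaply, so the expensive CodeBERT forward passes only run on a small shortlist.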
Follow these steps to set up the project environment:
```bash
git clone https://github.com/MarttiWu/codeopt.git
cd codeopt
conda create -n codeopt python=3.10 -y
conda activate codeopt
```
Install the required Python libraries:
```bash
pip install -r requirements.txt
```
Download the dataset.
Create a `models/` folder:

```bash
mkdir models
```
Download the model:

```bash
huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.2-GGUF mistral-7b-instruct-v0.2.Q5_K_M.gguf \
  --local-dir ./models --local-dir-use-symlinks False
```
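The README does not say which runtime loads the GGUF file; `llama-cpp-python` is a common choice for this format, so a quick smoke test of the download might look like the sketch below. Only the model path comes from the steps above; everything else is illustrative and not necessarily how CodeOpt itself invokes the model.

```python
# Sketch: smoke-test the downloaded GGUF model with llama-cpp-python
# (`pip install llama-cpp-python`) -- an assumption, not confirmed
# to be the runtime CodeOpt uses.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct-v0.2.Q5_K_M.gguf",
    n_ctx=2048,  # matches max_seq_length in the example config below
)
out = llm(
    "[INST] Optimize this Python snippet: sum([i for i in range(10**6)]) [/INST]",
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```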
- Add your input code files to the data/ directory.
- Ensure your test cases are in the tests/ directory, structured as expected by the executor; an illustrative layout follows.
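The exact layout the executor expects is not spelled out here; purely as an illustration (file and directory names are hypothetical), one plausible arrangement pairs each input file with a directory of input/output test cases:

```text
data/
├── problem_001.py        # input code to optimize
└── problem_002.py
tests/
├── problem_001/          # one directory of test cases per input file
│   ├── input.0.txt
│   └── output.0.txt
└── problem_002/
    ├── input.0.txt
    └── output.0.txt
```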
Modify the config.json file to set up your desired parameters:
- use_ga: Enable/disable Genetic Algorithm Inspired Chain-of-Thought (GA-COT).
- use_fewshot: Enable/disable Two-Stage Sampling and Few-Shot Learning.
- evaluation_path: Path to save the results.
- max_new_tokens: Maximum number of new tokens the LLM may generate.
Use the following command to run the project:

```bash
python src/main.py --config config.json
```
Here’s an example `config.json`:

```json
{
    "mode": "iterative",
    "train_data_path": "processed_data/python/train_split.jsonl",
    "test_data_path": "processed_data/python/test_split.jsonl",
    "test_cases_path": "processed_data/public_test_cases/",
    "model_path": "./models/mistral-7b-instruct-v0.2.Q5_K_M.gguf",
    "log_path": "./logs/mistral",
    "evaluation_path": "./evaluation/mistral",
    "batch_size": 1,
    "max_seq_length": 2048,
    "max_new_tokens": 1024,
    "temperature": 0.7,
    "use_ga": false,
    "use_fewshot": true
}
```
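With `"mode": "iterative"`, CodeOpt runs its self-reflection loop. As a minimal sketch of what one such refinement loop could look like: `generate`, `run_tests`, and `measure_runtime` are hypothetical stand-ins for the project's actual components, and the prompts are illustrative.

```python
# Sketch: iterative self-reflection. `generate`, `run_tests`, and
# `measure_runtime` are hypothetical stand-ins, not CodeOpt's real API.
def self_reflect(source: str, tests, max_iters: int = 5) -> str:
    best_code, best_time = source, measure_runtime(source, tests)
    prompt = f"Optimize this code:\n{source}"
    for _ in range(max_iters):
        candidate = generate(prompt)                     # LLM proposes a rewrite
        passed, feedback = run_tests(candidate, tests)   # correctness check
        if not passed:
            # Reflection: feed the failure back so the next attempt can fix it.
            prompt = f"The code below fails with:\n{feedback}\nFix it:\n{candidate}"
            continue
        runtime = measure_runtime(candidate, tests)
        if runtime < best_time:                          # keep only real speedups
            best_code, best_time = candidate, runtime
        prompt = f"Make this even faster while keeping it correct:\n{candidate}"
    return best_code
```

The key design point is that test failures are not discarded: they become part of the next prompt, which is what distinguishes self-reflection from plain resampling.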
If you use this project in your research, please cite:
```bibtex
@misc{codeopt2024,
    author       = {Wu, Meng-Ting and Tsai, Agron and Chen, Leo and Li, Lori},
    title        = {CodeOpt: Code Optimization with Two-Stage Sampling, Few-Shot Learning, and Iterative Self-Reflection},
    year         = {2024},
    howpublished = {\url{https://github.com/MarttiWu/codeopt}}
}
```
We acknowledge the authors of Search-Based LLMs for Code Optimization, whose foundational work inspired our methodology and approach: S. Gao, C. Gao, W. Gu, and M. Lyu, “Search-Based LLMs for Code Optimization,” in 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), pp. 254–266, IEEE Computer Society, 2025.