diff --git a/cookbooks/Nomadic_Prompt_Optimization_Report.html b/cookbooks/Nomadic_Prompt_Optimization_Report.html
new file mode 100644
index 0000000..f67272a
--- /dev/null
+++ b/cookbooks/Nomadic_Prompt_Optimization_Report.html
[Nomadic Platform logo]

Optimizing LLM Prompts Using Nomadic Platform's Reinforcement Learning Framework


Implementation Overview


Algorithm Description


The RL Prompt Optimizer employs a reinforcement learning framework to iteratively improve the prompts used for language model evaluations. At each episode, the agent selects an action that modifies the current prompt, based on a state representation that encodes features of the prompt. The agent then receives a reward from a multi-metric evaluation of the model's response, encouraging it to discover prompts that elicit high-quality answers.

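To make the loop concrete, the following minimal sketch shows one tabular Q-learning step over prompt-editing actions. It is illustrative rather than the platform's actual implementation: the `q_learning_step` name is assumed, and the `step` callable stands in for applying an edit to the prompt and scoring the resulting model response.

    import random
    from typing import Callable, Hashable, Sequence, Tuple

    def q_learning_step(
        q_table: dict,
        state: Hashable,
        actions: Sequence[str],
        step: Callable[[Hashable, str], Tuple[Hashable, float]],
        epsilon: float = 0.1,
        alpha: float = 0.1,
        gamma: float = 0.95,
    ):
        """One tabular Q-learning update over prompt-editing actions (illustrative)."""
        # Epsilon-greedy selection: explore a random edit, or exploit the best-known one.
        if random.random() < epsilon:
            action = random.choice(list(actions))
        else:
            action = max(actions, key=lambda a: q_table.get((state, a), 0.0))
        # Apply the edit and evaluate the model's response (stand-in callable).
        next_state, reward = step(state, action)
        # Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
        best_next = max((q_table.get((next_state, a), 0.0) for a in actions), default=0.0)
        old = q_table.get((state, action), 0.0)
        q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)
        return next_state, reward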

Hyperparameters & Configuration

- Model: GPT-3.5-turbo
- Learning rate (α): 0.1 and 0.05
- Discount factor (γ): 0.95
- Initial ε: 0.1
- ε decay: 0.99
- Minimum ε: 0.01
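For reference, this configuration could be captured in code as follows. The dictionary keys and the `epsilon_at` helper are illustrative names rather than the platform's API; the values are taken directly from the list above, and the schedule assumes the decay is applied once per episode.

    CONFIG = {
        "model": "gpt-3.5-turbo",
        "learning_rates": [0.1, 0.05],  # alpha values, one run per setting
        "discount_factor": 0.95,        # gamma
        "epsilon_initial": 0.1,
        "epsilon_decay": 0.99,
        "epsilon_min": 0.01,
    }

    def epsilon_at(episode: int) -> float:
        """Exploration rate after `episode` decay steps, floored at the minimum."""
        eps = CONFIG["epsilon_initial"] * CONFIG["epsilon_decay"] ** episode
        return max(CONFIG["epsilon_min"], eps)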

Reward Structure

    weights = {
        "faithfulness": 0.4,  # Context adherence
        "correctness": 0.3,   # Response accuracy
        "relevance": 0.2,     # Query relevance
        "clarity": 0.1        # Comprehensibility
    }
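These weights suggest the composite reward is a weighted sum of the four per-metric scores. A minimal sketch under that assumption, taking each evaluator's score to lie in [0, 1] (the `compute_reward` name is illustrative):

    def compute_reward(scores: dict) -> float:
        """Combine per-metric scores into a single scalar reward."""
        weights = {
            "faithfulness": 0.4,
            "correctness": 0.3,
            "relevance": 0.2,
            "clarity": 0.1,
        }
        return sum(w * scores[metric] for metric, w in weights.items())

    # Example: compute_reward({"faithfulness": 0.9, "correctness": 0.8,
    #                          "relevance": 0.7, "clarity": 1.0}) -> 0.84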

Performance Summary

Metric                      Value
Best Score Achieved         0.733
Average Convergence Time    50.0 episodes
Mean Q-Value                0.175

Results Visualization


The following interactive visualizations illustrate various aspects of the RL prompt optimization process:

1. Prompts Overview: Displays both the initial and the current prompt for each experiment, updating dynamically as prompts change over episodes.
2. Learning Progress: Plots the score achieved in each episode, tracing the agent's learning curve (a plotting sketch follows this list).
3. Prompt Evolution: Tracks the length of the prompt over episodes, reflecting how the prompt changes over time.
4. Action Selection Counts: Visualizes how often the agent selected each action, revealing its strategy.
5. Q-values Over Time: Illustrates the average Q-value across episodes, indicating the agent's confidence in its actions.
6. Model Outputs Over Iterations: An animated line plot of model-output length over iterations.
7. Actions Taken per Episode: Displays the specific action the agent took at each episode.
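As an example of how the learning-progress panel could be produced from logged episode scores, here is a small matplotlib sketch. The function name and data layout are assumptions; the report itself renders interactive charts rather than static plots.

    import matplotlib.pyplot as plt

    def plot_learning_progress(episode_scores: list) -> None:
        """Plot per-episode composite scores as a learning curve."""
        episodes = range(1, len(episode_scores) + 1)
        plt.plot(episodes, episode_scores, marker="o", linewidth=1)
        plt.xlabel("Episode")
        plt.ylabel("Composite score")
        plt.title("Learning Progress")
        plt.tight_layout()
        plt.show()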
[Per-strategy result panels: Strategy 1 (α=0.1), Strategy 1 (α=0.05), Strategy 2 (α=0.1), Strategy 2 (α=0.05)]

Combined Metrics Visualization


Key Findings
