
Hyperparameter Optimization for AI Agents

Let's explore the art of fine-tuning AI systems: from manually adjusting prompts to creating self-optimizing agents. We will learn how to turn "it works" into "it works flawlessly" through a systematic approach to parameterization.

This section covers strategies for self-optimization: automatically improving prompts, configuring hyperparameters of agents and workflows, and creating adaptive dialogue scenarios. These techniques let bots converge on optimal solutions over successive generations of improvements and are your secret to building self-learning systems!

Questions

  • What are hyperparameters?
  • What hyperparameters do LLMs, workflows, and agents have?
  • How to automate the search for optimal configurations?
  • Can we always afford automatic optimization?
What are parameters and hyperparameters?

Parameters are usually the weights of a neural network: numerical values that are learned during training.

Hyperparameters are the settings we configure ourselves to optimize the model's performance. For a neural network this could be the number of neurons in a hidden layer or the choice of activation function (there are many); for LLMs it could be temperature, max_tokens, top_p, and so on.
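For example, a single LLM call already exposes several of these knobs. Here is a minimal sketch using the OpenAI Python client; the model name and parameter values are illustrative assumptions, not recommendations:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name
    messages=[{"role": "user", "content": "Summarize our refund policy in two sentences."}],
    temperature=0.2,      # lower values give more deterministic answers
    max_tokens=256,       # hard cap on the length of the reply
    top_p=0.9,            # nucleus sampling threshold
)
print(response.choices[0].message.content)
```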

Types of Hyperparameters

In the materials below we will mostly talk about hyperparameters for neural networks, but keep thinking about them in the context of workflows and agents. Hyperparameters can be:

  1. integer - we can move the value by +1 or -100
  2. real - we can move the value by any amount, for example by 1e-10
  3. boolean - we can turn on/off a certain mode
  4. string - we can change the text
  5. list - we can choose from a list of values
For agents, this could be:
  • the message-history length in tokens at which we summarize the history
  • temperature
  • turning on/off certain modes of your agent
  • prompts for each agent
  • a list of tools or sub-agents for the agent
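Putting both lists together, a search space that mixes integer, real, boolean, string, and list parameters for a hypothetical agent could be declared like this (every name and range here is an illustrative assumption):

```python
# Hypothetical agent search space; names, ranges, and choices are illustrative only.
agent_search_space = {
    "history_limit_tokens": {"type": "integer", "low": 2_000, "high": 16_000},  # integer
    "temperature":          {"type": "real",    "low": 0.0, "high": 1.0},       # real
    "enable_web_search":    {"type": "boolean"},                                 # boolean
    "system_prompt":        {"type": "string",  "choices": ["prompt_v1", "prompt_v2"]},          # string
    "tools":                {"type": "list",    "choices": [["search"], ["search", "calculator"]]},  # list
}
```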
In the context of LLM-based services, hyperparameters are the parameters that we configure to optimize the performance of the entire system. For example:
  • for a simple chatbot this could be:
    • system prompt
    • temperature
    • (that is, we pick, from several candidate prompts and temperatures, the combination that produces the highest-quality responses)
  • for RAG this could be:
    • k (the number of "context" chunks that will be used to augment the response)
    • chunk_size (the size of the chunk)
    • chunk_overlap (chunk overlap)
    • which subsets of documents or which databases to search
    • various embedding models
    • text preprocessing methods
    • and so on; more complex RAG pipelines have even more hyperparameters
  • for workflow this could be:
    • prompts for each step
    • response schemas used for structured outputs
    • various classification methods for routing
    • etc.
  • for agents this could be:
    • prompts for each agent
    • max_iterations
    • subagents
    • architecture
    • tools
    • tool descriptions
    • data passed from agent to agent
    • anything you want to optimize
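As a concrete illustration, the RAG hyperparameters above can be collected into one config object that the optimizer varies between runs. This is only a sketch; the field names and default values are assumptions:

```python
from dataclasses import dataclass

@dataclass
class RAGConfig:
    k: int = 4                    # number of context chunks retrieved per query
    chunk_size: int = 512         # tokens (or characters) per chunk
    chunk_overlap: int = 64       # overlap between neighbouring chunks
    embedding_model: str = "text-embedding-3-small"   # assumed embedding model name
    preprocessing: str = "lowercase+strip_html"       # assumed preprocessing pipeline label

baseline = RAGConfig()
candidate = RAGConfig(k=8, chunk_size=256, chunk_overlap=32)
```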

Examples:

  • Automatic prompt tuning: "killing prompt engineering", parts one and two
  • Evolutionary self-development of the system
  • Optimizing the system to achieve the best quality/price/speed/efficiency, etc.

Core Algorithms

Text materials: Wikipedia & Illustrations

Videos:


1. Grid Search & Random Search

  • Grid Search: Evaluates every combination in a predefined set. Best suited for small, discrete spaces, but does not scale well as dimensions are added.
  • Random Search: Samples hyperparameter values at random; in high-dimensional spaces it often outperforms grid search because, for the same budget, it explores more distinct values per dimension.
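A minimal sketch of how the two strategies pick candidates for a simple chatbot; the search space and the placeholder evaluate() function are assumptions you would replace with a run over your own eval dataset:

```python
import itertools
import random

def evaluate(config: dict) -> float:
    """Stand-in for a real evaluation over your question/answer dataset."""
    return random.random()  # replace with a measured quality score

space = {
    "temperature": [0.0, 0.3, 0.7, 1.0],
    "system_prompt": ["prompt_v1", "prompt_v2"],
}

# Grid search: every combination (4 * 2 = 8 runs here; grows multiplicatively with each dimension).
grid = [dict(zip(space, values)) for values in itertools.product(*space.values())]
best_grid = max(grid, key=evaluate)

# Random search: a fixed budget of samples, no matter how many dimensions the space has.
random_trials = [{name: random.choice(options) for name, options in space.items()} for _ in range(5)]
best_random = max(random_trials, key=evaluate)

print(best_grid, best_random)
```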

2. Bayesian Optimization

  • Mechanism: Uses probabilistic models (e.g., Gaussian Processes) to predict promising configurations, balancing exploration and exploitation.
  • Trade-offs: Sample-efficient in low-dimensional spaces, but struggles with high dimensionality.
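One practical way to try Bayesian optimization is the Optuna library, whose default TPE sampler builds a probabilistic model of the objective as trials accumulate. A hedged sketch; run_eval_set() is a toy stand-in for your real evaluation:

```python
import optuna

def run_eval_set(temperature: float, k: int, system_prompt: str) -> float:
    """Stand-in for a real evaluation: run your dataset through the system and return a score."""
    # Toy surrogate so the example runs end to end; replace with real measurements.
    return 1.0 - abs(temperature - 0.3) + 0.05 * k - (0.1 if system_prompt == "prompt_v2" else 0.0)

def objective(trial: optuna.Trial) -> float:
    temperature = trial.suggest_float("temperature", 0.0, 1.0)
    k = trial.suggest_int("k", 1, 10)
    prompt = trial.suggest_categorical("system_prompt", ["prompt_v1", "prompt_v2"])
    return run_eval_set(temperature, k, prompt)

study = optuna.create_study(direction="maximize")  # TPE sampler is the default
study.optimize(objective, n_trials=30)
print(study.best_params)
```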

3. Evolutionary Algorithms

  • Process: Mimics natural selection, iteratively evolving populations of hyperparameter sets through mutation and crossover.
  • Application: Effective for complex, non-differentiable search spaces (e.g., neural architecture search).
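A toy evolutionary loop over agent configs, just to make the selection/crossover/mutation cycle concrete; fitness() is a synthetic stand-in you would replace with a real evaluation:

```python
import random

def fitness(cfg: dict) -> float:
    """Synthetic fitness; in practice this would be an eval-set score."""
    return -abs(cfg["temperature"] - 0.4) - 0.01 * abs(cfg["k"] - 5)

def crossover(a: dict, b: dict) -> dict:
    # Each gene is inherited from one of the two parents at random.
    return {key: random.choice([a[key], b[key]]) for key in a}

def mutate(cfg: dict) -> dict:
    child = dict(cfg)
    child["temperature"] = min(1.0, max(0.0, child["temperature"] + random.gauss(0, 0.1)))
    child["k"] = max(1, child["k"] + random.choice([-1, 0, 1]))
    return child

population = [{"temperature": random.random(), "k": random.randint(1, 10)} for _ in range(10)]
for generation in range(20):
    population.sort(key=fitness, reverse=True)
    parents = population[:4]  # selection: keep the fittest configs
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

print(max(population, key=fitness))
```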

4. Hyperband & Successive Halving

  • Hyperband: Combines random search with early stopping, dynamically allocating resources to promising configurations.
  • Successive Halving: Aggressively prunes inefficient models early in training.
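A minimal successive-halving sketch: start with many configurations on a small budget, keep the best half each round, and double the budget for the survivors. score() is a placeholder you would replace with a real (partial) evaluation:

```python
import random

def score(config: dict, budget: int) -> float:
    """Placeholder: evaluate the config on `budget` questions and return the average quality."""
    return sum(random.random() for _ in range(budget)) / budget

configs = [{"temperature": random.random()} for _ in range(16)]
budget = 5  # e.g. number of eval questions per round

while len(configs) > 1:
    ranked = sorted(configs, key=lambda c: score(c, budget), reverse=True)
    configs = ranked[: len(ranked) // 2]  # prune the weaker half early
    budget *= 2                           # spend more on the surviving configs

print(configs[0])
```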

5. Population-Based Training (PBT)

  • Adaptive tuning: Simultaneously optimizes hyperparameters and model weights during training, ideal for dynamic tasks such as reinforcement learning.

Genetic Algorithms

Video resources:

Manual Optimization


Hyperparameter optimization is often done in three steps:

  1. Parameter selection

  2. Experiment

  3. Evaluation of results

    • where the experiment can be a large dataset of questions/answers or a complex automated environment,
    • and the evaluation of results can be carried out by human assessors (most often these assessors are you :) ) or more expensive LLMs

If the total cost of this selection-experiment-evaluation cycle is too high (for example, $1,000 per iteration), we cannot afford to build an automatic optimization loop and will have to tune manually.
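A quick back-of-envelope estimate helps decide which side of that line you are on; all the numbers below are illustrative assumptions:

```python
# Rough budget check before committing to automatic optimization (numbers are assumptions).
questions_per_experiment = 200   # size of the eval dataset
cost_per_question = 0.05         # USD: model calls plus LLM-as-judge grading
trials = 50                      # how many configurations the search would try

total_cost = questions_per_experiment * cost_per_question * trials
print(f"Estimated search cost: ${total_cost:,.0f}")  # $500 in this example
```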

In that case, pick any of the algorithms covered above and execute its steps by hand. Intuition and deductive reasoning will also help you a great deal with manual optimization.

Now we know...

We have learned what hyperparameters are, how they affect the performance of GenAI apps, and what optimization methods can be used to configure them. Now you can apply this knowledge to improve your AI systems.

Exercises

Think back to one of your recent projects:

  • What hyperparameters do you think are most critical for your project?
  • What optimization methods would you choose for your task and why?
