Exploring the OpenAI API
Understanding how to work with the OpenAI API will make it much easier to grasp the frameworks we cover in the Junior block, and will also help when debugging problems and when working with any OpenAI-compatible provider.
Questions
- What does the body of an API request in the OpenAI API look like?
- What parameters can be configured besides temperature and max_tokens?
- How does streaming work in the OpenAI API?
- How to use function calling in chat models?
- What is structured output?
Steps
1. Read the OpenAI API documentation
Read everything about the chat/completions endpoint; don't skip anything. All further material builds on this endpoint, and it is important that you understand in detail how it works, otherwise you will get confused later.
What are Temperature, Top P, Frequency penalty, Presence penalty?
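To make these concrete, here is a minimal sketch of a chat/completions request that sets all four parameters, using the official Python SDK (the model name and prompt are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model; any chat model works
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Suggest a name for a coffee shop."},
    ],
    temperature=0.7,        # randomness of sampling (0 = near-deterministic)
    top_p=1.0,              # nucleus sampling: keep the most likely tokens covering this probability mass
    frequency_penalty=0.5,  # penalize tokens proportionally to how often they have already appeared
    presence_penalty=0.0,   # flat penalty on any token that has appeared at least once
    max_tokens=100,         # hard cap on the number of generated tokens
)
print(response.choices[0].message.content)
```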
2. Read pages from the Cookbook
There are three ways to read further:
- Skim, to understand the landscape and capabilities of the API (a good option if the material feels difficult).
- Read carefully, understand exactly what each function does.
- Run these notebooks yourself.
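If you decide to run the notebooks, streaming (one of the questions above) is a good place to start. A minimal sketch, assuming the official Python SDK and OPENAI_API_KEY set in the environment:

```python
from openai import OpenAI

client = OpenAI()

# With stream=True the API returns an iterator of chunks instead of one
# final response; each chunk carries a small delta of the answer.
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # example model
    messages=[{"role": "user", "content": "Count from 1 to 10."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks (e.g. the final one) carry no content
        print(delta, end="", flush=True)
print()
```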
3. Read pages from Cookbook 2
- Regarding functions, read how_to_call_functions_with_chat_models in its entirety (a minimal sketch follows below).
- How to use structured outputs? Or take the alternative one-hour course on deeplearning.ai.
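As promised, a minimal sketch of both mechanisms. The get_weather tool and the extraction schema are made up for illustration, and the model name is an example:

```python
import json
from openai import OpenAI

client = OpenAI()

# --- Function calling: describe a tool and let the model decide to call it.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather in a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))  # e.g. get_weather {'city': 'Paris'}

# --- Structured output: force the reply to conform to a JSON Schema.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Extract the city: 'I live in Paris.'"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "city_extraction",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
                "additionalProperties": False,
            },
        },
    },
)
print(json.loads(response.choices[0].message.content))  # e.g. {'city': 'Paris'}
```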
Extra Steps
E1. Here we apply the divide-and-conquer principle.
https://cookbook.openai.com/examples/entity_extraction_for_long_documents
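The idea: a document that does not fit in the context window is split into token-sized chunks, each chunk is processed independently, and the per-chunk results are merged. A rough sketch (the chunk size, prompt, and model are arbitrary choices):

```python
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.get_encoding("o200k_base")  # encoding of the gpt-4o family

def chunk_by_tokens(text: str, max_tokens: int = 1000) -> list[str]:
    """Split text into pieces of at most max_tokens tokens each."""
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + max_tokens])
            for i in range(0, len(tokens), max_tokens)]

def extract_entities(document: str) -> list[str]:
    results: set[str] = set()
    for chunk in chunk_by_tokens(document):
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # example model
            messages=[
                {"role": "system",
                 "content": "List the named entities in the text, one per line."},
                {"role": "user", "content": chunk},
            ],
            temperature=0,
        )
        results.update(response.choices[0].message.content.splitlines())
    return sorted(results)  # merged, de-duplicated entities from all chunks
```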
E2. What parameters do not yet exist in any public API?
<EOS> - end of sequence - a token after which we usually stop making predictions of the next tokens
I want to share a few interesting tricks that I have not yet seen any provider expose. Usually we set the temperature and all other sampling parameters as constants, but this can also be done dynamically, so that the parameters change depending on conditions, mathematical functions, and so on. Examples:
- While the LLM is analyzing the task, we keep the temperature at zero; when it switches to writing creative text, we raise it to 0.6.
- After the LLM generates <EOS>, we can continue generating tokens.
- We can manually multiply the probability of emitting <EOS> by any value (even 0) until a certain number of tokens has been generated. And beyond multiplication, any logic is available.
- Just as with <EOS>, we can do the same with any tokens, inventing logic as wild as we like: statically, dynamically, or conditionally influencing the probabilities (0-100%) of generating a specific token, set of tokens, or sequence of tokens (see the logit_bias sketch after this list).
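The static building block for this already exists in the public OpenAI API: logit_bias adds a fixed bias (from -100, an effective ban, to +100, an effective force) to the logits of chosen token IDs. What is still missing is the ability to change that bias dynamically during generation. A minimal sketch:

```python
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.encoding_for_model("gpt-4o-mini")  # example model

# Push the logits of the token(s) for " blue" down by 100: an effective ban.
# Note that other casings/tokenizations ("Blue", "BLUE") remain possible.
banned = {str(token_id): -100 for token_id in enc.encode(" blue")}

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What color is the sky? Answer in one word."}],
    logit_bias=banned,
)
print(response.choices[0].message.content)
```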
What could be added to existing APIs:
- the ability to prohibit generating <EOS> when the sequence length is less than a certain number of tokens (MinOutputLen)
- BannedTokensToGenerate
- the ability to control the probability of <EOS> depending on the length of the generated text (BoostEos)
- NGramPenalty
- the ability to continue generation for a certain number of steps even after <EOS> (IncludeLateHypotheses)
- prohibit generating certain tokens after a certain number of repetitions (MaxRepeats)
- the ability to select a sampling algorithm via SamplerParams (Sampling, BeamSearch, StochasticBeamSearch), NumHypos & BeamSize
- the ability to change the temperature during generation - TemperatureScheduler
For reasoning models, you can multiply all these features by two: one set for the reasoning phase and another for the generation phase.
Of course, all of this can be implemented with local model inference on your own hardware (a sketch follows below). Or come work at OpenAI/Sber/Yandex/Anthropic - all these parameters are available in the private APIs!
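In fact, several of the proposed parameters map directly onto existing knobs in open-source inference libraries. A sketch with Hugging Face transformers (gpt2 chosen only because it is small): min_length suppresses <EOS> until a minimum length is reached (the MinOutputLen idea), bad_words_ids corresponds to BannedTokensToGenerate, no_repeat_ngram_size is an NGramPenalty analogue, and num_beams switches from sampling to beam search as in SamplerParams:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # small model for illustration
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The future of AI is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    min_length=30,           # suppress <EOS> until 30 tokens: MinOutputLen
    max_length=60,
    no_repeat_ngram_size=3,  # forbid repeating any 3-gram: NGramPenalty
    bad_words_ids=[tokenizer(" ugly", add_special_tokens=False).input_ids],  # BannedTokensToGenerate
    num_beams=4,             # beam search instead of sampling: SamplerParams
    pad_token_id=tokenizer.eos_token_id,  # gpt2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```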
Now we know...
We have studied the main components of the OpenAI API and learned how to:
- Count tokens in texts using tiktoken to optimize requests (a minimal sketch follows below)
- Work with streaming responses to create more responsive interfaces
- Use function calling for structured interaction with external tools
- Receive structured responses in JSON format for convenient processing
- Apply the "divide and conquer" principle for processing long documents
These skills are fundamental when building AI Agents: they let you manage communication between the agent and the LLM effectively, and they make it possible to integrate with external systems.
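For reference, the token counting mentioned above takes a few lines with tiktoken (use tiktoken.encoding_for_model to resolve the right encoding for your model):

```python
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # encoding of the gpt-4o family
text = "Hello! My name is Alex."
print(len(enc.encode(text)), "tokens")  # counts tokens the way the model sees them
```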
Exercises
- What roles can messages in a request have?
- You want to send two requests to the LLM while maintaining context:
  - "Hello! My name is Alex."
  - "What's my name?"

  What will the body of the second request to the model look like?
- Come up with three examples of function calling that could significantly expand the capabilities of an AI Agent.
- Practical task: create a simple Python script that prints only the first, second, penultimate, and last chunks of a streaming response from the model to the console.