Understanding Temperature in LLMs

One simple setting can change how your LLM responds…

We’ve all witnessed the creativity of responses from ChatGPT and other chat LLM services. The responses to our questions are often surprisingly varied; sometimes they seem foolish or just plain wrong.

When we ask an open-ended question where the answer might not be a specific value, how is the creativity of the LLM controlled? Suppose we are building a private ChatGPT-like service for employees to use across our enterprise. The documents we feed the system will include product features, help pages, HR documents, and engineering documents. How do we configure the LLM system to respond with “just the facts” and not get overly creative in its answers?

A setting called “temperature” programmatically controls how creative the LLM’s responses are. In the OpenAI API, the possible values for temperature range from 0 to 2, like this:

0 to 1: Produces focused, predictable answers with little variability or creativity. A value of 0.7 is often recommended as a good starting point.

Above 1, up to 2: Produces answers with more randomness; as you approach 2, the answers can become incomprehensible. The sketch below shows where the parameter is set.
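To make the setting concrete, here is a minimal sketch of the kind of script used for the runs that follow. It assumes the openai Python SDK (v1 or later) and an API key in the environment; the model name is an illustrative assumption, and the actual script is linked at the end of this article.

```python
# Minimal sketch: call the OpenAI chat completions API repeatedly
# at a chosen temperature. Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

PROMPT = "Fill in the last word of the phrase: 'The road is ... '"

for _ in range(10):  # the prompt is repeated ten times, as in the examples
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name, not the author's
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,  # try 0, 1.5, and 2.0 to reproduce the runs below
    )
    print(response.choices[0].message.content)
```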

Let’s look at an example:

Temperature setting “0”

PROMPT: “Fill in the last word of the phrase: ‘The road is … ‘ “

RESPONSE (from a Python script; the prompt was repeated ten times)

The road is long.

The road is long.

The road is long.

The road is long.

The road is long.

The road is long.

The road is long.

The road is long.

The road is long.

The road is long.

Analysis: The same answer was produced for each of the ten attempts. At temperature 0, sampling is effectively greedy, so the model returns its single most likely completion every time.
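Why does this happen? Conceptually, temperature divides the model’s raw scores (logits) before the softmax turns them into probabilities. The toy sketch below illustrates the standard formulation with made-up tokens and logits; it is not the OpenAI implementation, just the textbook math:

```python
# Illustrative only: how temperature rescales a toy next-token distribution.
# The token list and logit values are made up for demonstration.
import math

def softmax_with_temperature(logits, t):
    """Convert logits to probabilities, dividing by temperature first."""
    scaled = [x / t for x in logits]
    m = max(scaled)                           # subtract max for stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

tokens = ["long", "endless", "smooth", "wide"]
logits = [4.0, 2.0, 1.5, 1.0]

for t in (0.1, 0.7, 1.5, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [f"{tok}={p:.2f}" for tok, p in zip(tokens, probs)])
```

A temperature of exactly 0 would mean dividing by zero, so in practice it is treated as greedy decoding: always pick the top token. As the temperature rises, the distribution flattens, which is why low-probability and even nonsensical tokens start to appear in the higher-temperature runs below.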

Temperature setting “1.5”

PROMPT: “Fill in the last word of the phrase: ‘The road is … ‘ “

RESPONSE (from a Python script; the prompt was repeated ten times)

‘The road is long.’

‘The road is long.’

‘long.’

‘The road is long.’

‘Likely to be bustling with traffic during rush hour.’

‘The road is long.’

‘The road is long.’

‘The road is long.’

‘The road is endless.’

‘smooth.’

Analysis: At the higher temperature value (1.5), variations begin to appear in the results: most responses are still the most likely completion, but several diverge in wording or content.

Temperature setting “2.0”

PROMPT: “Fill in the last word of the phrase: ‘The road is … ‘ “

RESPONSE (from a Python script; the prompt was repeated ten times)

“The road is long.”

“measurementining!!”

“open”

“long”

“Unfortunately, this yellow ribbon can’t heal”

“TRUE kneedog cottages expensive”

“yours.go”

“down plot medicines”

“wide open”

“The road is wherever”

Analysis: At the maximum value (2.0), a few responses still make sense, but many are pure gibberish.

Summary:

Three parameters in the OpenAI API control the randomness and reproducibility of responses. In addition to “temperature,” a parameter called “top_p” also shapes randomness by restricting sampling to the most probable tokens, and “seed” requests reproducible output for identical calls. If you are building an AI LLM chatbot application, consider your use case, your users, the types of documents you might ingest into a RAG system, and the prompts a user might submit. You will want to experiment with and tune these values to produce responses that meet your requirements.
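As a hedged illustration of how the three parameters fit together in one call (again using the openai Python SDK; the model name, prompt, and values are assumptions, not recommendations):

```python
# Illustrative sketch only; the values below are assumptions to show the knobs.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Summarize our PTO policy."}],
    temperature=0.2,  # low randomness: stay close to the likeliest tokens
    top_p=0.9,        # sample only from the top 90% of probability mass
    seed=42,          # best-effort reproducibility across identical calls
)
print(response.choices[0].message.content)
```

Note that the OpenAI API reference recommends altering temperature or top_p, but not both, and that seed gives best-effort determinism rather than a guarantee.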

The short and simplified Python code that I used to generate the example output is here:

https://github.com/oregon-tony/AI-Examples/blob/main/temperature.py