Generative AI: how to control the creativity of your Large Language Model

Luigi Saetta
4 min read · Oct 7, 2023
Photo by Joshua Hoehne on Unsplash

At the heart of every generative text service there is a Large Language Model (LLM): a model trained on a very large corpus of documents to learn the structure and characteristics of language and, specifically, to generate text.

In this article, I’ll discuss some general characteristics of LLMs for text generation, especially the parameters that can be used, in the inference phase, to control the variety of possible answers to a prompt. Then, I’ll add some more details regarding the Oracle OCI AI Generative Service, which is based on LLMs provided by Cohere.

First of all, some basic concepts: how does a text-generating LLM work, what is a prompt, and what does it mean for an LLM to be creative?

An LLM for text generation is trained to generate text, in other words, a sequence of tokens (a word can be split into several tokens, for example: water-fall). The text given as input to the model is called the prompt. It can contain some context information plus, for example, a question. The model produces as output a text that should contain the correct answer to the question. This output is called the completion.
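To see the difference between words and tokens concretely, here is a small sketch using OpenAI’s tiktoken tokenizer (chosen only because it is easy to install and run; Cohere and the OCI service use their own tokenizers, so the exact splits and counts will differ):

```python
import tiktoken

# Encode a couple of strings and look at how many tokens they become.
enc = tiktoken.get_encoding("cl100k_base")
for text in ["waterfall", "What is a Large Language Model?"]:
    tokens = enc.encode(text)
    print(f"{text!r} -> {len(tokens)} tokens: {tokens}")
```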

The architecture commonly used today for LLMs is the Transformer (see, for example, the famous article “Attention Is All You Need”). The Transformer normally ends with a Softmax layer that produces, for every position in the output sequence, a probability distribution. In other words, for every position you don’t get a single token as output, but a probability for every token in the vocabulary (and the probabilities sum to 1).
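As a minimal illustration (with a made-up five-token vocabulary and made-up raw scores), this is what the Softmax step does:

```python
import numpy as np

vocab = ["cat", "dog", "house", "tree", "car"]   # toy 5-token vocabulary
logits = np.array([2.0, 1.5, 0.3, 0.1, -1.0])    # made-up raw scores for one position

probs = np.exp(logits) / np.exp(logits).sum()    # Softmax: scores -> probabilities
for token, p in zip(vocab, probs):
    print(f"{token:>6}: {p:.3f}")
print("sum =", round(float(probs.sum()), 3))     # 1.0
```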

How do we go from these probabilities to concrete words? Well, there are several possible strategies.

The simplest one is the “greedy approach”: for every token position, the model always chooses the token with the highest probability. This approach can seem reasonable but, this way, we will always get the same answer for the same prompt.
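On a toy distribution, greedy decoding is just an argmax:

```python
import numpy as np

vocab = ["cat", "dog", "house", "tree", "car"]
probs = np.array([0.45, 0.28, 0.15, 0.09, 0.03])  # toy distribution for one position

# Greedy decoding: always pick the most probable token,
# so the same prompt always yields the same completion.
print(vocab[int(np.argmax(probs))])               # -> "cat", every time
```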

In several situations, we don’t want this behavior, for example when we want to explore different possible answers. We can say that we want the model to be “more creative”.

So, these models expose several parameters that control how varied the results are and, if we want to use this term, how “more or less creative” the response is.

We will examine these parameters:

  • Max tokens
  • Top K
  • Top P
  • Temperature

All these parameters (and more) can be set, for example, in the Oracle OCI AI Generative Service.

UI of OCI GenAI Playground, for the beta phase.

“Max tokens” is quite simple: it controls the maximum length of the output sequence. We only have to keep in mind that the number counts tokens, not words; more or less, on average, a word can be split into 2 tokens. With a higher value for max_tokens we will get longer answers. Simple.

For Top K and Top P we need to remember that, for every position in the output sequence, the model produces a probability distribution. If we don’t want to follow the “greedy approach”, we can select not a single token but a set of candidate tokens, and then do random sampling within that subset, weighting each token by its probability.
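A minimal sketch of this weighted random sampling, again on a toy distribution:

```python
import numpy as np

vocab = ["cat", "dog", "house", "tree", "car"]
probs = np.array([0.45, 0.28, 0.15, 0.09, 0.03])

# Weighted random sampling: draw the next token at random,
# giving each candidate a chance proportional to its probability.
rng = np.random.default_rng()
print(rng.choice(vocab, size=5, p=probs))  # e.g. ['cat' 'dog' 'cat' 'house' 'cat'], varies per run
```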

How do we choose a subset in which to do the random sampling?

If we set Top K (for example, 10), we keep only the K tokens with the highest probabilities and do random sampling in this subset.
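A possible Top K implementation, sketched on the same toy vocabulary (the function name is mine, not an SDK API):

```python
import numpy as np

def top_k_sample(vocab, probs, k, rng=None):
    """Keep only the k most probable tokens, renormalize, then sample one."""
    rng = rng if rng is not None else np.random.default_rng()
    top = np.argsort(probs)[-k:]           # indices of the k highest probabilities
    p = probs[top] / probs[top].sum()      # renormalize so the subset sums to 1
    return vocab[rng.choice(top, p=p)]

vocab = np.array(["cat", "dog", "house", "tree", "car"])
probs = np.array([0.45, 0.28, 0.15, 0.09, 0.03])
print(top_k_sample(vocab, probs, k=2))     # only "cat" or "dog" can come out
```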

If we set Top P (for example, 0.5), we keep only the most probable tokens whose cumulative probability reaches P.
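And a corresponding Top P (nucleus) sketch, again with an illustrative function name:

```python
import numpy as np

def top_p_sample(vocab, probs, p_threshold, rng=None):
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches p_threshold, renormalize, then sample one."""
    rng = rng if rng is not None else np.random.default_rng()
    order = np.argsort(probs)[::-1]                        # most to least probable
    cutoff = np.searchsorted(np.cumsum(probs[order]), p_threshold) + 1
    keep = order[:cutoff]
    p = probs[keep] / probs[keep].sum()
    return vocab[rng.choice(keep, p=p)]

vocab = np.array(["cat", "dog", "house", "tree", "car"])
probs = np.array([0.45, 0.28, 0.15, 0.09, 0.03])
print(top_p_sample(vocab, probs, p_threshold=0.5))         # nucleus here is {"cat", "dog"}
```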

In both cases, we do random sampling and we choose the output token from the subset (produced by Top K or Top P).

Therefore, with Top K = 1 we will always produce the same output for the same prompt (it is equivalent to the greedy approach). With a higher Top K we can get a wider variety of answers. The same holds for Top P.

Temperature is a little bit more difficult to explain. It is a parameter that controls the shape of the probability distribution. With Temperature = 1, the distribution is exactly the one produced by the Softmax layer. With values < 1, it becomes more peaked around the tokens with the highest probability. With Temperature > 1, the distribution broadens.

We can think of higher temperatures as increasing the “randomness” of the output, producing “more creative” answers.
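A small sketch of how temperature reshapes the distribution (the logits are made up):

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Divide the logits by the temperature before applying Softmax.
    T < 1 sharpens the distribution, T > 1 flattens it."""
    z = np.array(logits) / temperature
    z = z - z.max()                        # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.5, 0.3, 0.1, -1.0]        # made-up scores for five tokens
for t in (0.5, 1.0, 2.0):
    print(f"T={t}: {np.round(softmax_with_temperature(logits, t), 3)}")
# Lower T concentrates the probability mass on the top tokens;
# higher T spreads it out, making "creative" picks more likely.
```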

Some more parameters, like frequency penalty and presence penalty, can be set, again to control the variety of possible answers to be produced. Maybe, I’ll explore them in another article.
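To make the parameters concrete, here is a hedged sketch using the Cohere Python SDK (the OCI service is based on Cohere models; the OCI Python SDK exposes equivalent parameters through its own client classes, so check its documentation for the exact names). The API key and model name below are placeholders:

```python
import cohere

co = cohere.Client("YOUR_API_KEY")

response = co.generate(
    model="command",
    prompt="Explain what a Large Language Model is, in two sentences.",
    max_tokens=100,       # maximum length of the completion, in tokens
    temperature=0.7,      # < 1 more focused, > 1 more "creative"
    k=10,                 # Top K: sample only among the 10 most probable tokens
    p=0.75,               # Top P: sample only within this cumulative probability mass
    frequency_penalty=0.0,
    presence_penalty=0.0,
)
print(response.generations[0].text)
```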

If you want to try the OCI Generative AI Service and its Python API, you can participate in the OCI AI beta program. More details here.

You can find more information and details on Large Language Models, for example, in the Coursera course “Generative AI with Large Language Models”.


Luigi Saetta

Born in the wonderful city of Naples, but living in Rome. Always curious about new technologies and new things, especially in the AI field.