Counting LLM Tokens

In the world of LLMs, someone eventually pays for “tokens,” and a token is not necessarily a word. Understanding how words map to tokens is critical to grasping how language models like GPT-4 process text.

While a simple word like “cat” may be a single token, a more complex word like “unbelievable” might be broken down into multiple tokens such as “un,” “believ,” and “able.” By converting text into these smaller units, language models can better understand and generate natural language, making them more effective at tasks like translation, summarization, and conversation.
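To make the idea of subword splitting concrete, here is a minimal sketch of a greedy longest-match tokenizer. The vocabulary (TOY_VOCAB) and the function name toy_tokenize are hypothetical and chosen only to reproduce the example above. Real tokenizers learn their vocabularies from data, and GPT-4 may split these words differently.

```python
# Toy illustration only: the vocabulary below is hypothetical and is NOT
# the vocabulary of GPT-4 or any other real model.
TOY_VOCAB = {"cat", "un", "believ", "able"}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Split a word into the longest matching vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first and shrink until something matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matches: fall back to one character.
            tokens.append(word[i])
            i += 1
    return tokens


for word in ["cat", "unbelievable"]:
    pieces = toy_tokenize(word)
    print(word, "->", pieces, f"({len(pieces)} token(s))")
```

With this toy vocabulary, “cat” stays whole and “unbelievable” splits into three pieces, so the token count is higher than the word count.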


Regression Testing your LLM RAG


Regression testing checks that the answers a system produces still match the expected results after something changes. Whether the system is a chatbot or a copilot, regression testing is crucial for verifying that responses stay accurate. For instance, a chatbot designed for HR queries should answer questions like “How do I change my withholding percentage on my 401K?” consistently, even after you swap the LLM or change how the input documents are embedded.

Using a Python script, you can automate this process by comparing the actual responses with the expected ones. A text similarity function scores each pair: a value close to 1 means the two responses are contextually similar, while a value closer to 0 indicates a significant difference. An example test case might look like this:

{
  "Original_Prompt": "What is the capital of France?",
  "Expected_Answer": "The capital of France is Paris."
}
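As a rough sketch of how such a score behaves, the snippet below compares an expected answer with an actual one using cosine similarity over word counts. This is a simple lexical stand-in and an assumption, not the method used by the shared script. An embedding-based comparison would capture meaning better. The actual_answer string is a hypothetical LLM response.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its words, ignoring punctuation."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two texts' word-count vectors, from 0.0 to 1.0."""
    va, vb = word_counts(a), word_counts(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with anything
    return dot / (norm_a * norm_b)


test_case = {
    "Original_Prompt": "What is the capital of France?",
    "Expected_Answer": "The capital of France is Paris.",
}
actual_answer = "Paris is the capital of France."  # hypothetical LLM response

score = cosine_similarity(test_case["Expected_Answer"], actual_answer)
print(f"similarity = {score:.2f}")
```

Because this measure ignores word order, the reordered answer scores very close to 1, while two answers with no words in common score 0. It cannot tell that two differently worded answers mean the same thing, which is where embeddings help.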

To experiment with this testing process, I have shared a sample Python script that reads prompts and expected answers from a JSON file and scores them against the actual responses generated by the LLM. The script uses the OpenAI API and is just one example of automating RAG regression testing. Check out the script and the accompanying “test_prompts.json” file, which contains sample input data, at the GitHub link below.
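The overall loop can be sketched as follows. This is not the shared script: get_actual_answer is a hypothetical stub standing in for the real LLM call, the 0.5 pass threshold is an arbitrary assumption, the input is assumed to be a JSON list of test cases shaped like the example above, and difflib from the standard library stands in for a semantic similarity function.

```python
import json
from difflib import SequenceMatcher


def get_actual_answer(prompt):
    """Hypothetical stub for the LLM/RAG call; replace with a real client."""
    canned = {"What is the capital of France?": "Paris is the capital of France."}
    return canned.get(prompt, "")


def similarity(expected, actual):
    """Character-level similarity from 0.0 to 1.0 (lexical, not semantic)."""
    return SequenceMatcher(None, expected.lower(), actual.lower()).ratio()


def run_regression(test_cases, threshold=0.5, ask=get_actual_answer):
    """Score each test case and flag it as passed or failed."""
    results = []
    for case in test_cases:
        actual = ask(case["Original_Prompt"])
        score = similarity(case["Expected_Answer"], actual)
        results.append({
            "prompt": case["Original_Prompt"],
            "score": score,
            "passed": score >= threshold,  # threshold is an assumed cutoff
        })
    return results


# Inline JSON stands in for the contents of a file such as test_prompts.json;
# the list-of-objects layout is an assumption.
sample_json = """
[
  {
    "Original_Prompt": "What is the capital of France?",
    "Expected_Answer": "The capital of France is Paris."
  }
]
"""

for result in run_regression(json.loads(sample_json)):
    status = "PASS" if result["passed"] else "FAIL"
    print(f"{status} {result['score']:.2f} {result['prompt']}")
```

Passing the LLM call in as the ask parameter keeps the loop testable without network access. The right threshold depends on the similarity function and on how much variation in wording you are willing to accept.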

For organizations that focus on AI governance and prioritize accuracy, automating RAG regression testing can be a step toward ensuring the reliability of AI systems. The script and the sample input file are available here:

https://github.com/oregon-tony/AI-Examples/blob/main/promptRegression

#RegressionTesting #AI #Automation #Python #OpenAI #Accuracy #RAG #Compliance