Problem

When you call large language models (LLMs) through an API, running the requests one after another is slow. Each call spends most of its time waiting on the network and on the provider's servers rather than on your own CPU, so sequential code sits idle between responses. Python's multithreading addresses this: because the work is I/O-bound, a pool of threads can keep many requests in flight at once, and the total time moves closer to that of the slowest single request than to the sum of all of them. For large batches the speedup can be substantial, potentially 100x or more, though the real figure depends on the number of requests, the number of worker threads, and your provider's rate limits.
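
To make the effect concrete without touching a real API, here is a minimal sketch that simulates network latency with `time.sleep()`. The `fake_api_call` function and its 0.1-second delay are assumptions for illustration only; real latencies vary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_api_call(prompt, latency=0.1):
    """Hypothetical stand-in for a network request: sleeps instead of calling an API."""
    time.sleep(latency)
    return prompt.upper()


prompts = [f"prompt {i}" for i in range(8)]

# Sequential: each call waits for the previous one to finish.
start = time.perf_counter()
sequential_results = [fake_api_call(p) for p in prompts]
sequential_seconds = time.perf_counter() - start

# Threaded: the waits overlap, so the total is close to one call's latency.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as executor:
    threaded_results = list(executor.map(fake_api_call, prompts))
threaded_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.2f}s, threaded: {threaded_seconds:.2f}s")
```

Both runs return the same results in the same order; only the elapsed time differs.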

Solution

Setting Up Your Environment

Before diving into multithreading, let's ensure you have the necessary tools. For this tutorial, we’ll be using the OpenAI API, but these principles can be adapted for nearly any API provider. Start by installing the required Python modules:

Code sample by Cloudaen
pip install --upgrade openai tqdm

Next, import `openai`, plus `tqdm` for progress tracking:

Code sample by Cloudaen
import openai
from tqdm import tqdm

You’ll also need to set up your OpenAI client using a secret API key:

Code sample by Cloudaen
OPENAI_KEY = 'YOUR_KEY_HERE'
client = openai.OpenAI(api_key=OPENAI_KEY)
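
Hard-coding a key is fine for a quick test, but it is easy to leak. As a hedged alternative, the sketch below reads the key from an environment variable. The helper name `load_api_key` is hypothetical; `OPENAI_API_KEY` is the variable the official client also looks for when no key is passed.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage (assumes the variable is set in your shell):
# client = openai.OpenAI(api_key=load_api_key())
```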

Creating a Function for API Calls

Now, let’s write a function that calls OpenAI’s `gpt-4-turbo` model to perform a specific task, such as extracting nouns from a sentence:

Code sample by Cloudaen
def get_nouns(sentence):
  response = client.chat.completions.create(
    model="gpt-4-turbo",
    temperature=0,
    max_tokens=50,
    messages=[
      {"role": "system", "content": "You are a noun extraction assistant. Extract nouns from the given sentence and list them in comma-separated format."},
      {"role": "user", "content": f"Sentence: {sentence}"}
    ]
  )
  return response.choices[0].message.content
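
Because the system prompt asks for a comma-separated list, the reply usually needs one more step before it is usable as data. The helper below is a minimal sketch of that step; `parse_nouns` is a hypothetical name, and it assumes the model followed the requested format.

```python
def parse_nouns(raw):
    """Split a comma-separated model reply into a clean list of nouns."""
    return [part.strip() for part in raw.split(",") if part.strip()]


# Usage: parse_nouns(get_nouns("The quick brown fox jumps over the lazy dog."))
```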

Implementing Multithreading

Now use Python’s `concurrent.futures` module to call `get_nouns()` on several sentences concurrently:

Code sample by Cloudaen
import concurrent.futures

sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "Today is a beautiful day to go for a walk.",
    "I enjoy reading books on rainy days.",
    "The cake was delicious and everyone loved it."
]

# ThreadPoolExecutor runs up to max_workers calls at the same time.
# executor.map() yields results in input order, so the progress bar
# advances in that order as well.
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    results = list(tqdm(executor.map(get_nouns, sentences), total=len(sentences), desc="Extracting nouns"))

# Outputting the results
for result in results:
    print(result)

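Using the `ThreadPoolExecutor`, we can process a list of sentences concurrently. While one thread waits for its response, the others have their own requests in flight, so the total time is closer to that of the slowest request than to the sum of all of them. `executor.map()` returns the results in the same order as the input sentences.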
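
One limitation of `executor.map()` is that an exception in any call is raised while you iterate over the results, so the outputs that come after it are lost. A possible variant, sketched below, uses `submit()` and `as_completed()` to collect successes and failures separately. `fake_get_nouns` is a hypothetical stand-in for `get_nouns()` so the example runs without an API key, and `run_all` is an illustrative helper name.

```python
import concurrent.futures


def fake_get_nouns(sentence):
    """Hypothetical stand-in for get_nouns(); fails on empty input."""
    if not sentence:
        raise ValueError("empty sentence")
    # Placeholder logic: return the last word without the trailing period.
    return sentence.rstrip(".").split()[-1]


def run_all(func, items, max_workers=4):
    """Run func over items in threads; return (results, errors) keyed by input index."""
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_index = {executor.submit(func, item): i for i, item in enumerate(items)}
        # as_completed() yields each future as soon as it finishes, in any order.
        for future in concurrent.futures.as_completed(future_to_index):
            i = future_to_index[future]
            try:
                results[i] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[i] = exc
    return results, errors


results, errors = run_all(fake_get_nouns, ["The dog barks.", "", "A cat sleeps."])
```

Keying both dictionaries by input index lets you match each output or error back to its sentence even though tasks finish out of order.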

Conclusion

Multithreading is a simple way to speed up batches of API calls in Python, especially calls to large language models. Because these calls are I/O-bound, running them in a thread pool overlaps the time spent waiting on the network, which shortens total run time and makes applications more responsive. Keep the number of workers within your provider's rate limits.