Leveraging LLM function calling to harness real-time knowledge

Large language models (LLMs) have made headlines by showcasing impressive capabilities like generating text in multiple languages, solving math problems, and executing complex instructions. However, due to their training methods, LLMs also present significant challenges that limit the broader adoption of generative AI for business purposes.

First, LLMs are prone to “hallucinating” answers. An LLM returns the statistically most probable response to a prompt, which is not necessarily accurate or true. Second, LLMs have a knowledge cutoff date, meaning they only have access to information available up until a certain point. For instance, OpenAI lists a cutoff of September 2021 for GPT-3.5, April 2023 for the first GPT-4 Turbo preview (the original GPT-4 also stopped at September 2021), and October 2023 for GPT-4o.

To harness the power of generative AI in business, it is crucial to curb LLMs’ hallucinations and provide them with access to real-time knowledge. The first challenge can be tackled with retrieval-augmented generation (RAG), which supplies the LLM with relevant source material so that it can give accurate, contextually relevant responses. The second can be addressed with function calling, which lets an LLM produce structured function calls that the application executes to retrieve real-time knowledge. In this article, we will explain what function calling is, how it works, and in which business scenarios it can be applied.

What function calling is and how it works

Function calling capabilities allow an LLM (like the GPT model family from OpenAI or the Gemini models from Google) to recognize when a user’s request requires executing a specific function and to generate a structured output (typically JSON) that represents a call to that function. The model does not run the function itself; the application does. This allows developers to integrate the LLM with external tools, APIs, or functions, enabling the AI to perform tasks beyond generating text.

To illustrate how function calling works, imagine asking an LLM-powered virtual assistant about the current weather in a given location. On its own, the model cannot answer reliably, because its knowledge ends at the cutoff date. To answer the question, we use function calling to query an external weather service through its API. A sample workflow could look as follows.

Define function schemas: functions like get_current_weather(location, date) are defined using JSON Schema, providing the LLM with names, descriptions, and function parameters to understand and utilize them effectively.
User provides input: the user asks a question that may trigger a function call (e.g., “What’s the current weather in Warsaw?”).
The LLM processes the input: the LLM recognizes the intent and decides whether a function call is needed, mapping it to a function based on defined schemas.
The LLM generates a function call: the LLM creates a structured output with the function name and the necessary arguments, typically encoded as a JSON string.
An application parses the function call: the system extracts the function name and parameters from the LLM’s response.
Execute the function: the application invokes the specified function with the given arguments, interacting with external APIs or services.
Return function result to the LLM: the function’s output is returned to the LLM as a message with a dedicated role, such as “tool” (or “function” in older versions of the OpenAI API).
The LLM generates the final response: the LLM integrates the function output into a coherent and contextually appropriate reply.
Application provides final reply: the application delivers the final answer to the user, completing the interaction.
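
The workflow above can be sketched in Python without committing to any particular vendor SDK. In this minimal sketch, the model’s reply is hard-coded (fake_model_reply) and the weather lookup returns canned data, so nothing here calls a real LLM or weather service. The function name get_current_weather comes from the example above (the date parameter is left out for brevity); the schema envelope, the TOOLS registry, and the “tool” role name are illustrative assumptions that only loosely follow the style of common chat APIs.

```python
import json

# Define the function schema: JSON Schema tells the model the function's name,
# purpose, and parameters. The exact envelope differs between providers.
WEATHER_SCHEMA = {
    "name": "get_current_weather",
    "description": "Return the current weather for a location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name, e.g. Warsaw"},
        },
        "required": ["location"],
    },
}


def get_current_weather(location: str) -> dict:
    """Stand-in for a real weather API call; returns canned data."""
    return {"location": location, "temperature_c": 21, "conditions": "sunny"}


# Registry mapping the function names the model may request to Python callables.
TOOLS = {"get_current_weather": get_current_weather}

# Hard-coded example of what a model might return for the question
# "What's the current weather in Warsaw?" (arguments arrive as a JSON string).
fake_model_reply = {
    "name": "get_current_weather",
    "arguments": '{"location": "Warsaw"}',
}


def handle_function_call(reply: dict) -> dict:
    """Parse the call, execute the function, and wrap the result as a message."""
    name = reply["name"]
    if name not in TOOLS:
        raise ValueError(f"Model requested an unknown function: {name}")
    arguments = json.loads(reply["arguments"])  # parse the function call
    result = TOOLS[name](**arguments)           # execute the function
    # This message would be appended to the conversation and sent back to
    # the model, which then writes the final, user-facing answer.
    return {"role": "tool", "name": name, "content": json.dumps(result)}


tool_message = handle_function_call(fake_model_reply)
print(tool_message["content"])
```

Keeping an explicit registry means the application, not the model, decides which functions can actually run; a request for an unknown name is rejected before anything is executed.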

Of course, function calling can be applied in many other scenarios, such as getting currency exchange data, accessing CRM systems to retrieve sales figures, checking stock inventories, and more.

However, there is a caveat to consider when planning to implement function calling: you need to verify that your chosen LLM supports this feature. Fortunately, most popular large language models, including the GPT family by OpenAI, Gemini by Google, Llama by Meta, Mistral, and Claude by Anthropic, do support function calling. For open-source LLMs, like those available on the Hugging Face website, you will need to check whether your specific model supports function calling.

Advanced function calling capabilities

In our example, the LLM needs to request only one function (get_current_weather). In more complex scenarios, where the user prompt contains several questions, a more advanced approach is needed. This approach is called parallel function calling: the model identifies all the functions required and requests them in a single response, and the application can then execute them concurrently and return all the results together. Handling several tasks in one round trip reduces the overall response time.

For example, if a user asks, “What’s the current weather in Warsaw and the time in Tokyo?” the LLM can generate two separate function calls: one for get_current_weather with the argument “location: Warsaw” and another for get_current_time with the argument “location: Tokyo.” The application can then execute both function calls concurrently, fetching the required information simultaneously. This not only speeds up the response but also provides a seamless experience to the user by addressing multiple queries in a single interaction.
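
A minimal sketch of the application side of this example, using Python’s asyncio, might look as follows. The two tool calls are hard-coded rather than produced by a real model, both functions return canned values after a simulated delay, and the field names (id, tool_call_id) are assumptions modeled loosely on common chat APIs.

```python
import asyncio
import json


async def get_current_weather(location: str) -> dict:
    """Stand-in for a weather API; the sleep simulates network latency."""
    await asyncio.sleep(0.1)
    return {"location": location, "temperature_c": 21}


async def get_current_time(location: str) -> dict:
    """Stand-in for a time-zone service; returns a canned value."""
    await asyncio.sleep(0.1)
    return {"location": location, "local_time": "18:30"}


TOOLS = {
    "get_current_weather": get_current_weather,
    "get_current_time": get_current_time,
}

# Hard-coded example of two calls a model might emit in a single reply.
fake_tool_calls = [
    {"id": "call_1", "name": "get_current_weather", "arguments": '{"location": "Warsaw"}'},
    {"id": "call_2", "name": "get_current_time", "arguments": '{"location": "Tokyo"}'},
]


async def run_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Run all requested calls concurrently and pair each result with its call id."""
    coroutines = [
        TOOLS[call["name"]](**json.loads(call["arguments"])) for call in tool_calls
    ]
    results = await asyncio.gather(*coroutines)  # results keep the input order
    return [
        {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
        for call, result in zip(tool_calls, results)
    ]


messages = asyncio.run(run_tool_calls(fake_tool_calls))
for message in messages:
    print(message["tool_call_id"], message["content"])
```

Because both simulated calls wait at the same time, the whole batch takes roughly as long as one call rather than two. Each result carries the id of the call it answers, so the model can match results to requests.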

Parallel function calling is particularly beneficial in applications where latency is critical, such as real-time data analysis or interactive conversational agents. The application can run the requested calls using multithreading or asynchronous programming, so the total wait approaches the duration of the slowest call rather than the sum of all calls. This also helps the system scale, as it can handle more simultaneous function calls without a proportional increase in response time.

Function calling business use cases

Function calling significantly enhances the capabilities of LLMs by enabling them to interact with external tools, systems, and APIs. This integration allows LLMs to:

gain access to real-time data that is not included in their training data,
perform specific actions like retrieving data or triggering events,
and improve their accuracy by providing responses grounded in verified knowledge.

Thanks to these capabilities, function calling can be applied in many business scenarios.

Conversational agents are the first use case that comes to mind. Function calling can be used to create virtual AI assistants or chatbots that answer complex questions by calling external APIs or accessing external knowledge bases. This enables them to provide responses that are relevant and grounded in retrieved data, which makes hallucinations far less likely.

In the realm of natural language understanding, LLMs can convert natural language into structured JSON data, extract structured information from text, and perform tasks like named entity recognition, sentiment analysis, and keyword extraction.
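
As a rough illustration of structured extraction, the sketch below defines a function schema whose only purpose is to make the model return its analysis as JSON, then decodes and checks a hard-coded example reply. The function name record_text_analysis, the field names, the sample sentence, and the company name in it are all hypothetical, and the checks are deliberately basic rather than full JSON Schema validation.

```python
import json

# Hypothetical schema: the "function" exists only to force structured output.
EXTRACT_SCHEMA = {
    "name": "record_text_analysis",
    "description": "Store entities, sentiment, and keywords found in a text.",
    "parameters": {
        "type": "object",
        "properties": {
            "entities": {"type": "array", "items": {"type": "string"}},
            "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
            "keywords": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["entities", "sentiment", "keywords"],
    },
}

# Hard-coded example of arguments a model might return for the text
# "The new Acme chatbot saves our support team hours every week."
fake_arguments = json.dumps({
    "entities": ["Acme"],
    "sentiment": "positive",
    "keywords": ["chatbot", "support team"],
})


def parse_extraction(arguments_json: str, schema: dict) -> dict:
    """Decode the model's arguments and run a few basic checks against the schema."""
    data = json.loads(arguments_json)
    params = schema["parameters"]
    missing = [key for key in params["required"] if key not in data]
    if missing:
        raise ValueError(f"Missing required fields: {missing}")
    allowed = params["properties"]["sentiment"]["enum"]
    if data["sentiment"] not in allowed:
        raise ValueError(f"Unexpected sentiment: {data['sentiment']}")
    return data


analysis = parse_extraction(fake_arguments, EXTRACT_SCHEMA)
print(analysis["sentiment"], analysis["entities"])
```

Checking the decoded arguments before using them is worthwhile because model output is not guaranteed to match the schema.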

By integrating with external APIs, LLMs can fetch data or perform actions based on user input. Function calling allows natural language queries to be converted into valid API calls, streamlining the interaction between users and various services. This is particularly helpful in building question-answering systems or AI-powered virtual assistants. For example, imagine a user request “Book me a flight to New York next Monday” converted into an API call to an airline reservation system.
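
The flight example could be sketched as follows. The endpoint URL, the argument names (destination, departure_date), and the concrete date are hypothetical, and the model’s output is hard-coded. The sketch assumes the model has already resolved “next Monday” to an ISO date; it only builds the HTTP request and never sends it.

```python
import json
from datetime import date
from urllib.request import Request

# Hypothetical endpoint; a real airline API would have its own URL and auth.
BOOKING_URL = "https://api.example.com/v1/bookings"

# Hard-coded example of arguments a model might produce for
# "Book me a flight to New York next Monday".
fake_arguments = '{"destination": "New York", "departure_date": "2024-07-15"}'


def build_booking_request(arguments_json: str) -> Request:
    """Turn the model's function-call arguments into an HTTP POST request."""
    args = json.loads(arguments_json)
    # Validate the date early so malformed model output never reaches the API;
    # fromisoformat raises ValueError for anything that is not YYYY-MM-DD.
    date.fromisoformat(args["departure_date"])
    body = json.dumps(
        {"destination": args["destination"], "departure_date": args["departure_date"]}
    ).encode("utf-8")
    return Request(
        BOOKING_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_booking_request(fake_arguments)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

In a real application, the request would be sent with urllib.request.urlopen or an HTTP client library, ideally after the user confirms the booking details.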

Function calling makes it possible to extract specific information from a given input, such as retrieving relevant news stories or references from an article. This enhances the model’s ability to provide concise and targeted information, improving the overall utility and responsiveness of the system. For example, we can create a function that retrieves the newest information on a particular topic and summarizes it.

Function calling can also be used to access real-time data or current events; personal information such as calendar appointments, emails, or to-do list items; and business information like sales data, customer information, or support queries. This extends an LLM’s capabilities beyond static knowledge to delivering dynamic, up-to-date information. A typical implementation would be a virtual assistant helping employees to organize meetings by checking attendees’ availability and proposing the best time.
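
The meeting assistant could expose a function along the following lines. This is a minimal sketch under stated assumptions: the calendars are a hypothetical in-memory dictionary of busy intervals in whole hours, the attendee names are made up, and “best time” is simplified to “earliest slot that is free for everyone.”

```python
import json

# Hypothetical in-memory calendars: busy intervals as (start_hour, end_hour).
BUSY = {
    "anna": [(9, 11), (13, 14)],
    "piotr": [(10, 12)],
    "maria": [(9, 10), (12, 13)],
}


def find_common_slot(attendees, duration_hours=1, day_start=9, day_end=17):
    """Return the earliest slot within working hours that is free for everyone."""
    for start in range(day_start, day_end - duration_hours + 1):
        end = start + duration_hours
        # A slot is free if it ends before, or starts after, every busy interval.
        if all(
            end <= busy_start or start >= busy_end
            for person in attendees
            for busy_start, busy_end in BUSY[person]
        ):
            return {"start_hour": start, "end_hour": end}
    return None  # no common slot on this day


# Hard-coded example of arguments a model might produce for
# "Find an hour when Anna, Piotr, and Maria can meet today".
fake_arguments = '{"attendees": ["anna", "piotr", "maria"], "duration_hours": 1}'

slot = find_common_slot(**json.loads(fake_arguments))
print(slot)
```

The model only supplies the attendee list and the duration; the availability logic stays in ordinary application code, where it can be tested independently of the LLM.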

Last but not least, function calling enables LLMs to automate tasks through custom functions that developers define to extend a model’s functionality and knowledge. This includes acting upon data, making changes, or updating information stored elsewhere, thereby increasing efficiency and productivity in various applications.

Function calling vs. RAG

Since function calling can be used in information retrieval scenarios, this raises the legitimate question of whether it can replace a RAG pipeline. The simplest answer is: it depends. If you have a large set of static data, such as technical documentation or maintenance manuals, RAG is the better solution. However, if access to real-time data is needed, like CRM data or inventory levels, then the function calling approach is more suitable.

A hybrid approach can also be employed by combining RAG with function calling. A classic example is a customer service chatbot. To respond to a customer’s question about a product, it can use a RAG pipeline to retrieve information from the product catalog. To check if the product is in stock, the chatbot uses a function calling feature to access the inventory management system.
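
The sketch below shows how the two pieces of such a chatbot might fit together. Everything in it is a simplified stand-in: the catalog and inventory are small hypothetical dictionaries, retrieval is a naive word-overlap match rather than the embedding search a real RAG pipeline would typically use, and the order of operations is hard-coded instead of being chosen by a model.

```python
# Toy product catalog standing in for a RAG document store.
CATALOG = {
    "SKU-100": "Cordless drill with two batteries and a 13 mm chuck.",
    "SKU-200": "Orbital sander with dust collection bag.",
}

# Toy inventory standing in for a live inventory management system.
INVENTORY = {"SKU-100": 7, "SKU-200": 0}


def retrieve_product(question: str) -> tuple[str, str]:
    """RAG stand-in: pick the catalog entry sharing the most words with the question."""
    question_words = set(question.lower().split())

    def overlap(item: tuple[str, str]) -> int:
        description_words = set(item[1].lower().replace(".", "").split())
        return len(question_words & description_words)

    return max(CATALOG.items(), key=overlap)


def check_stock(sku: str) -> dict:
    """Function-calling stand-in: look up the current stock level for one SKU."""
    units = INVENTORY.get(sku, 0)
    return {"sku": sku, "in_stock": units > 0, "units": units}


def build_context(question: str) -> dict:
    """Combine static catalog knowledge with live stock data for the model."""
    sku, description = retrieve_product(question)
    return {"retrieved_description": description, "stock": check_stock(sku)}


context = build_context("Is the cordless drill available?")
print(context)
```

The resulting dictionary is what would be handed to the LLM so that its answer covers both the product description and its current availability.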

A less obvious solution is an LLM-powered virtual assistant for industrial data analysis. Imagine this scenario: a maintenance technician or engineer needs to monitor factory machinery performance and quickly respond to alerts that could lead to faults and costly downtimes. To streamline this process, a virtual assistant can be developed to access data from industrial IoT sensors located throughout the factory. Using function calling, the assistant accesses this real-time data and visualizes it as charts highlighting anomalies in sensor readings.

Whenever an anomalous reading occurs, an alert is raised, and the technician receives a notification on their mobile phone. Since the AI assistant is also available on mobile, the technician can check the alert, verify the current and predicted sensor values, and take any necessary action. These actions are suggested by the assistant, which has access to the machinery’s technical documentation through a RAG pipeline.
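
A function that the assistant could call in this scenario might look like the sketch below. The sensor identifier, the readings, and the fixed vibration limit are all hypothetical, and a simple threshold is used only for illustration; the anomaly detection method in a real deployment could be quite different.

```python
# Hypothetical recent readings per sensor (vibration velocity in mm/s).
READINGS = {"pump-7/vibration_mm_s": [2.1, 2.3, 2.2, 6.8, 2.4]}

# Hypothetical alert limits per sensor; real limits come from the machine's specs.
LIMITS = {"pump-7/vibration_mm_s": 4.5}


def get_sensor_readings(sensor_id: str) -> dict:
    """Return recent readings for one sensor and flag values above its limit."""
    values = READINGS[sensor_id]
    limit = LIMITS[sensor_id]
    anomalies = [
        {"index": index, "value": value}
        for index, value in enumerate(values)
        if value > limit
    ]
    return {
        "sensor_id": sensor_id,
        "values": values,
        "limit": limit,
        "anomalies": anomalies,
    }


report = get_sensor_readings("pump-7/vibration_mm_s")
print(report["anomalies"])
```

The structured result gives the assistant what it needs to draw a chart with the anomalous points highlighted and to decide whether an alert should be raised.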

As you can see, this AI-powered virtual assistant for IIoT data analysis is a practical example of a hybrid approach. We use function calling to retrieve real-time data and retrieval-augmented generation to access relevant information from extensive technical documentation.

Summary: why use function calling capabilities in AI-powered apps

In our view, LLMs are becoming a commodity accessible to virtually everyone, offering little in terms of competitive advantage. What truly makes a difference, especially in the business potential of generative AI, is the overall ecosystem in which an LLM-powered app operates. Function calling, which enables effective use of external tools, systems, and APIs, is a crucial part of this ecosystem. It allows us to build AI solutions that deliver significant business benefits. Another essential piece of the puzzle is the retrieval-augmented generation mechanism, which grounds LLMs’ responses in verified knowledge. Moreover, these techniques can be combined to further enhance the capabilities of LLM-powered apps, such as in customer service chatbots or data analysis assistants.

If you are considering implementing AI-powered tools in your company that require access to real-time data and verified knowledge, we are here to help. At Fabrity, we have experience with various GenAI projects, including knowledge bots, AI-powered sales assistants, and highly specialized virtual assistants for analyzing data from industrial IoT sensors. Drop us a line at sales@fabrity.pl, and we will reach out to discuss the details. Together, we will find the best approach to implementing generative AI in your company.

Jarosław Ganczarenko
