In the evolving landscape of artificial intelligence, agent frameworks have opened up new possibilities for building robust, scalable, and intelligent systems. This article explores the agent design pattern, examines the components of an agent and their features, and gives an overview of several popular frameworks: LangChain, AutoGen, CrewAI, and PhiData. A more comprehensive list can be found in "A list of AI autonomous agents". Finally, we walk through a practical example that uses PhiData to create an agent that generates weekly news for Raspberry Pi enthusiasts.

Understanding the Agent

In GenAI development, an agent is a self-contained unit that performs tasks autonomously, driven by specific instructions and contextual understanding. It combines the intelligence of a language model with the functionality needed to carry out a task. The attached diagram offers a high-level overview of this agent design pattern. Let’s look at the components that make this pattern effective and versatile.

When viewed from a distance, an agent has at least two interfaces: input and output. On closer inspection, we also expect to see a large language model (LLM), which is the brain, or decision maker, of the agent; an agent workflow, which orchestrates the data flows inside the agent; and a config that describes the agent. Additional components make the agent more advanced:

Input and Output Proxies: These components handle the conversion of input and output schemas to and from the agent. They ensure that the data received and sent by the agent is in the correct format and enriched with necessary metadata.
Tools: A set of functions and built-in utilities that the LLM can ask the agent to call in order to perform specific tasks. These tools can interact with external APIs, execute SQL queries, retrieve knowledge from databases, and more.
Memory: Stores the prompt, context, and history of conversations. It allows the agent to maintain continuity and coherence across interactions.
Knowledge: A vector database that stores information extracted from interactions and external sources, helping the agent build a comprehensive understanding over time.
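
The components above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the API of any framework: the names AgentConfig, Agent, fake_llm and word_count are hypothetical, the "CALL <tool>: <argument>" convention is invented for this sketch, and the LLM is replaced by a stub function so the example runs without a model or network access. The knowledge store is left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentConfig:
    """Config: describes the agent and the single task it is built for."""
    name: str
    description: str
    instructions: list[str]


@dataclass
class Agent:
    config: AgentConfig
    llm: Callable[[str], str]  # the decision maker (stubbed below)
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    memory: list[str] = field(default_factory=list)  # conversation history

    def run(self, raw_input: dict) -> dict:
        """Agent workflow: input proxy -> LLM -> optional tool call -> output proxy."""
        # Input proxy: convert the external schema into plain text.
        user_text = str(raw_input["text"]).strip()

        # Build the prompt from config, memory and the new input.
        prompt = "\n".join(
            [self.config.description, *self.config.instructions, *self.memory, user_text]
        )
        decision = self.llm(prompt)

        # Convention assumed for this sketch: "CALL <tool>: <argument>" requests a tool.
        if decision.startswith("CALL "):
            tool_name, _, argument = decision[len("CALL "):].partition(": ")
            answer = self.tools[tool_name](argument)
        else:
            answer = decision

        # Memory: keep the exchange so later calls see earlier context.
        self.memory += [f"user: {user_text}", f"agent: {answer}"]

        # Output proxy: wrap the answer in the external schema, with metadata.
        return {"agent": self.config.name, "text": answer}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: always asks for the word_count tool."""
    last_line = prompt.splitlines()[-1]
    return f"CALL word_count: {last_line}"


def word_count(text: str) -> str:
    """A trivial tool: count whitespace-separated words."""
    return str(len(text.split()))


agent = Agent(
    config=AgentConfig("counter", "You count words.", ["Answer with a number only."]),
    llm=fake_llm,
    tools={"word_count": word_count},
)
result = agent.run({"text": "  hello agent world  "})
print(result)
```

Keeping the proxies, the tool dispatch and the memory update inside one run method makes the data flow easy to follow; a real agent would typically split these into separate, individually testable pieces.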

From the above diagram, it is evident that agents can range from very simple to highly complex. Based on our experience developing agents for various tasks, as well as insights from research papers and tech blogs (Multi AI Agent 101, Building AI Agents), an agent needs to be designed for a sufficiently specific task in order to perform well. Keep in mind that here we want to use agents for business-critical tasks, not merely to generate a piece of text, and business requirements allow a much smaller margin of error.

As mentioned above, both our own research and the wider body of academic research suggest that an LLM performs significantly better with a prompt tailored to a specific task than with a generic prompt. A survey of efficient prompting methods for LLMs on arXiv discusses the diversity and detail of prompts for specific tasks and the challenges of long natural-language prompts. A further research article on arXiv offers insights into the mixed effects of naive prompts on LLM performance and the importance of delivering tailored responses for tasks that require empathy and analytical precision.

Consequently, we believe that all components of the agent should also be task-specific. This includes the configuration description, knowledge base, and memory, which should all be relevant to the particular task. Below, we will quickly review some of the most well-known agent frameworks, as well as one lesser-known framework (PhiData) that aligns closely with the above diagram.
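
As a small illustration of what "task-specific" can mean for the config, the sketch below assembles a system prompt from a description and a list of instructions. The function name build_system_prompt and the example texts are hypothetical and are not taken from any framework; the point is only that the tailored prompt carries constraints that the generic one lacks.

```python
def build_system_prompt(description: str, instructions: list[str]) -> str:
    """Join a description and numbered instructions into one system prompt."""
    lines = [description]
    if instructions:
        lines.append("Instructions:")
        lines += [f"{i}. {text}" for i, text in enumerate(instructions, start=1)]
    return "\n".join(lines)


# A generic prompt: no task, no constraints.
generic = build_system_prompt("You are a helpful assistant.", [])

# A task-specific prompt for a Raspberry Pi news agent (example wording only).
tailored = build_system_prompt(
    "You write a weekly news digest for Raspberry Pi enthusiasts.",
    [
        "Only include items published in the last seven days.",
        "For each item give the publish date, a short summary and the source link.",
    ],
)

print(tailored)
```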

Comparing Agent Frameworks

LangChain

LangChain focuses on building language model applications that are tightly integrated with various data sources. It excels in applications requiring complex language understanding and generation capabilities.

Strengths: Strong language model integration, versatile data handling.
Use Cases: Conversational AI, content generation, and complex data querying.

AutoGen

AutoGen emphasizes automation and ease of use, providing tools to quickly build and deploy agents with minimal coding. It targets developers who need rapid prototyping and deployment capabilities.

Strengths: Quick setup, user-friendly interface, extensive automation.
Use Cases: Prototyping, automated workflows, simple task automation.

CrewAI

CrewAI is designed for collaborative environments where multiple agents work together to achieve a common goal. It focuses on coordination, communication, and synergy between agents.

Strengths: Collaboration-focused, robust communication protocols.
Use Cases: Multi-agent systems, collaborative tasks, distributed problem-solving.

PhiData

PhiData provides a comprehensive framework for building sophisticated agents with advanced memory and knowledge management. It is ideal for applications requiring deep contextual understanding and long-term learning.

Strengths: Advanced memory management, rich knowledge base, strong contextual understanding.
Use Cases: Long-term projects, knowledge-intensive tasks, personalized user interactions.

Building an Agent with PhiData

Let’s create a simple example using PhiData. Our goal is to build an agent that generates weekly news for Raspberry Pi enthusiasts. We will use the OpenAI API with the default model, gpt-4o, and the DuckDuckGo search API as a tool.

Step 1: Add Dependencies

First, install PhiData and set up your project environment:

pip install phidata openai duckduckgo-search

Step 2: Defining the Agent

Define the structure of your agent, including config and tools.

Config class

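from phi.llm.base import LLM


class PiNewsLetterConfig:
    """Task-specific settings that describe the newsletter agent."""

    def __init__(
        self,
        name: str,
        description: str,
        instructions: list[str],
        llm: LLM,
    ) -> None:
        self.name = name                  # short identifier for the agent
        self.description = description    # what the agent is for
        self.instructions = instructions  # task-specific guidance for the LLM
        self.llm = llm                    # the model that drives the agent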

Agent class

from phi.assistant import Assistant


class PiNewsLetterAgent:
    """Combines a config and a list of tools into a PhiData Assistant."""

    def __init__(self, config: PiNewsLetterConfig, tools: list) -> None:
        self.config = config
        self.tools = tools

    @property
    def agent(self) -> Assistant:
        # Note: a new Assistant is built on every access to this property.
        return Assistant(
            llm=self.config.llm,
            tools=self.tools,
            description=self.config.description,
            instructions=self.config.instructions,
        )

Step 3: Write Executable Code

To simplify the process, combine the above classes into a single Python script main.py and replace the placeholder with your OpenAI API key.
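
The script itself is not reproduced here, so the following is only a sketch of what the part of main.py below the two classes might look like. It assumes that PiNewsLetterConfig and PiNewsLetterAgent from Step 2 are pasted at the top of the same file, and that the installed PhiData version provides OpenAIChat in phi.llm.openai, DuckDuckGo in phi.tools.duckduckgo, and Assistant.print_response; please check these against your version. The agent name, description, instructions and question are illustrative wording chosen for this sketch.

```python
import os

# Placeholder: replace with your own OpenAI API key before running.
OPENAI_API_KEY = "<YOUR_OPENAI_API_KEY>"

# Illustrative values for this sketch (assumptions, not from the original script).
AGENT_NAME = "pi-weekly-news"
DESCRIPTION = "You write a weekly news digest for Raspberry Pi enthusiasts."
INSTRUCTIONS = [
    "Search the web for Raspberry Pi news from the past week.",
    "For each item give the publish date, a short summary and the source link.",
]
QUESTION = "What is the Raspberry Pi news of this week?"


def key_is_set(key: str) -> bool:
    """True once the placeholder has been replaced with a real-looking key."""
    return bool(key) and not key.startswith("<")


def main() -> None:
    if not key_is_set(OPENAI_API_KEY):
        print("Replace the OPENAI_API_KEY placeholder before running.")
        return
    # The OpenAI client reads the key from this environment variable.
    os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

    # Imported here so the check above runs even if PhiData is not installed.
    from phi.llm.openai import OpenAIChat
    from phi.tools.duckduckgo import DuckDuckGo

    # PiNewsLetterConfig and PiNewsLetterAgent are the Step 2 classes,
    # assumed to be defined earlier in this same file.
    config = PiNewsLetterConfig(
        name=AGENT_NAME,
        description=DESCRIPTION,
        instructions=INSTRUCTIONS,
        llm=OpenAIChat(model="gpt-4o"),
    )
    newsletter = PiNewsLetterAgent(config=config, tools=[DuckDuckGo()])
    # Prints the model's answer to the terminal.
    newsletter.agent.print_response(QUESTION, markdown=True)


if __name__ == "__main__":
    main()
```

For anything beyond a quick experiment, it is safer to export OPENAI_API_KEY in the shell than to store the key in the script.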

Step 4: Running the Agent

Run your agent to generate the weekly news:

python main.py

Result

Today is June 22, 2024, and the results were generated at 9 am BST. The following results list several impressive news items about the Raspberry Pi, each with the publish date, a summary of the article, and the source link (the links disappeared when copied from the terminal). However, you may notice that they are not entirely satisfactory. This is because we used a generic prompt and search tool instead of ones tailored to my specific interests, such as DIY projects with camera modules and new hardware releases. In upcoming posts in this series, we will explore various solutions to improve the results, including more targeted prompts and specialized tools.

Conclusion

The agent design pattern is a powerful approach that ensures agents are scalable, autonomous, and self-contained base units. By focusing on task-specific design, agents can be fine-tuned to meet precise business requirements, reducing the margin for error and enhancing overall efficiency. For agent developers, this design isolates issues, simplifies debugging, and improves version control. This approach is especially vital in multi-agent systems, where performance and coordination are crucial, as small errors can be amplified throughout the communication chain.
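
One simple way to see how errors add up along a chain is a back-of-the-envelope calculation. It assumes, purely for illustration, that each agent in a sequential chain succeeds independently with the same probability; the 95% figure and the function name chain_success are hypothetical, not measurements.

```python
def chain_success(per_agent_success: float, n_agents: int) -> float:
    """Probability that every agent in a sequential chain succeeds,
    assuming independent failures and equal per-agent reliability."""
    return per_agent_success ** n_agents


for n in (1, 3, 5, 10):
    print(f"{n:>2} agents at 95% each -> {chain_success(0.95, n):.1%} end to end")
```

Under these assumptions, a chain of five agents completes correctly only about 77% of the time, and a chain of ten about 60% of the time, which is why per-agent reliability matters more as chains grow.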

In our comparison of LangChain, AutoGen, CrewAI, and PhiData, we highlighted the unique capabilities of each framework, demonstrating how they can be leveraged to build effective agents. Using PhiData, we showcased a simple yet practical example of an agent that generates weekly news for Raspberry Pi enthusiasts.

Looking ahead, we will continue to explore the capabilities and structures of agents and multi-agent systems. Our future publications will delve into more complex agents, integrating various tools to tackle a range of intriguing tasks. As AI technology advances, the role of well-designed agent frameworks will also evolve. We are committed to ongoing investigation and innovation in this field.

Binome Multi-agent Framework Chapter 1 — Key Design Principles for LLM-Based Agents