AI agents are designed to perform specific tasks, answer questions, and automate processes for users. These agents vary widely in complexity. They range from simple chatbots to copilots to advanced AI assistants in the form of digital or robotic systems that can run complex workflows autonomously.
This article provides conceptual overviews and detailed implementation samples for AI agents. Unlike standalone large language models (LLMs) or rule-based software/hardware systems, AI agents have these common features: planning, tool usage, perception, and memory.

Note: The usage of the term memory in the context of AI agents is different from the concept of computer memory (like volatile, nonvolatile, and persistent memory).

Copilots are a type of AI agent. They work alongside users rather than operating independently. Unlike fully automated agents, copilots provide suggestions and recommendations to assist users in completing tasks. For instance, when a user is writing an email, a copilot might suggest phrases, sentences, or paragraphs. The user might also ask the copilot to find relevant information in other emails or files to support the suggestion (see retrieval-augmented generation). The user can accept, reject, or edit the suggested passages.

Autonomous agents can operate more independently. When you set up autonomous agents to assist with email composition, you could enable them to handle the task end to end: finding relevant information in other emails and files, composing the reply, and sending it. You can configure the agents to perform each of these tasks with or without human approval.

A popular strategy for achieving performant autonomous agents is the use of multi-agent systems. In multi-agent systems, multiple autonomous agents, whether in digital or robotic form, interact or work together to achieve individual or collective goals. Agents in the system can operate independently and possess their own knowledge or information. Each agent might also have the capability to perceive its environment, make decisions, and execute actions based on its objectives. This autonomy, local knowledge, and coordination are the key characteristics of multi-agent systems. A multi-agent system provides several advantages over a copilot or a single instance of LLM inference.

Complex reasoning and planning are the hallmark of advanced autonomous agents. Popular frameworks for autonomous agents incorporate one or more of the following methodologies (each described in papers on arXiv) for reasoning and planning:

- Self-Ask: Improves on chain-of-thought prompting by having the model explicitly ask itself (and answer) follow-up questions before answering the initial question.
- Reason and Act (ReAct): Uses LLMs to generate both reasoning traces and task-specific actions in an interleaved manner. Reasoning traces help the model induce, track, and update action plans, along with handling exceptions. Actions allow the model to connect with external sources, such as knowledge bases or environments, to gather additional information.
- Plan and Solve: Devises a plan to divide the entire task into smaller subtasks, and then carries out the subtasks according to the plan. This approach mitigates the calculation errors, missing-step errors, and semantic misunderstanding errors that are often present in zero-shot chain-of-thought prompting.
- Reflect/Self-critique: Uses Reflexion agents that verbally reflect on task feedback signals. These agents maintain their own reflective text in an episodic memory buffer to induce better decision-making in subsequent trials.

Various frameworks and tools can facilitate the development and deployment of AI agents. For tool usage and perception that don't require sophisticated planning and memory, some popular LLM orchestrator frameworks are LangChain, LlamaIndex, Prompt Flow, and Semantic Kernel. For advanced and autonomous planning and execution workflows, AutoGen propelled the multi-agent wave that began in late 2022. OpenAI's Assistants API allows its users to create agents natively within the GPT ecosystem. LangChain Agents and LlamaIndex Agents also emerged around the same time.
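To make the interleaved reason-and-act pattern concrete, here's a minimal, framework-agnostic sketch in Python. Everything in it (the call_llm stub, the toy tool registry, the transcript format) is illustrative rather than any particular framework's API.

```python
from typing import Callable

def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call (for example, an Azure OpenAI chat completion).
    return "Thought: I have enough information.\nFinal Answer: a relaxing cruise"

TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"(stub) top result for '{query}'",
}

def react_agent(question: str, max_steps: int = 5) -> str:
    """Interleave reasoning traces and tool actions until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = call_llm(transcript)
        transcript += output + "\n"
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        if "Action:" in output:
            # Expected format: "Action: tool_name[tool input]"
            name, _sep, arg = output.split("Action:", 1)[1].strip().partition("[")
            observation = TOOLS.get(name.strip(), lambda _: "unknown tool")(arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"
    return "No answer within the step budget."

print(react_agent("Which cruise is the most relaxing?"))
```

A production agent replaces the stubbed model call and tool registry with real inference and real tools, but the control flow, alternating reasoning traces with tool observations, stays essentially the same.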
Tip: The implementation sample later in this article shows how to build a simple multi-agent system by using one of the popular frameworks and a unified agent memory system.

The prevalent practice for experimenting with AI-enhanced applications from 2022 through 2024 has been using standalone database management systems for various data workflows or types. For example, you can use an in-memory database for caching, a relational database for operational data (including tracing/activity logs and LLM conversation history), and a pure vector database for embedding management.

However, this practice of using a complex web of standalone databases can hurt an AI agent's performance. Integrating all these disparate databases into a cohesive, interoperable, and resilient memory system for AI agents is its own challenge. Also, many of the frequently used database services are not optimal for the speed and scalability that AI agent systems need. These databases' individual weaknesses are exacerbated in multi-agent systems:

- In-memory databases are excellent for speed but might struggle with the large-scale data persistence that AI agents need.
- Relational databases are not ideal for the varied modalities and fluid schemas of the data that agents handle. They require manual effort, and even downtime, to manage provisioning, partitioning, and sharding.
- Pure vector databases tend to be less effective for transactional operations, real-time updates, and distributed workloads. The popular pure vector databases nowadays typically offer availability guarantees of about 99.9 percent.

Just as efficient database management systems are critical to the performance of software applications, it's critical to provide LLM-powered agents with relevant and useful information to guide their inference. Robust memory systems enable organizing and storing various kinds of information that the agents can retrieve at inference time.

Currently, LLM-powered applications often use retrieval-augmented generation that relies on basic semantic search or vector search to retrieve passages or documents. Vector search can be useful for finding general information, but it might not capture the specific context, structure, or relationships that are relevant for a particular task or domain. For example, if the task is to write code, vector search might not be able to retrieve the syntax tree, file system layout, code summaries, or API signatures that are important for generating coherent and correct code. Similarly, if the task is to work with tabular data, vector search might not be able to retrieve the schema, the foreign keys, the stored procedures, or the reports that are useful for querying or analyzing the data.
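As a small, self-contained illustration of that gap, the following Python sketch serves both a similarity query and a structured query from the same store. The toy embed function stands in for a real embedding model; the point is that retrieval such as "all code chunks" is a structured filter that plain vector similarity doesn't express reliably.

```python
import math

def embed(text: str) -> list[float]:
    # Toy character-frequency embedding; a real system uses an embedding model.
    v = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - ord("a")] += 1.0
    return v

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    n = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / n if n else 0.0

docs = [
    {"text": "def book_cruise(name, room): ...", "kind": "code", "file": "tools.py"},
    {"text": "Tranquil Breeze Cruise: seven relaxing nights", "kind": "brochure", "file": None},
]
for d in docs:
    d["vector"] = embed(d["text"])

def vector_search(query: str, k: int = 1) -> list[dict]:
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, d["vector"]), reverse=True)[:k]

def structured_search(kind: str) -> list[dict]:
    # A filter on metadata, which similarity ranking alone can't guarantee.
    return [d for d in docs if d["kind"] == kind]

print(vector_search("a relaxing vacation")[0]["text"])
print(structured_search("code")[0]["file"])
```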
Weaving together a web of standalone in-memory, relational, and vector databases (as described earlier) is not an optimal solution for these varied data types. This approach might work for prototypical agent systems, but it adds complexity and performance bottlenecks that can hamper the performance of advanced autonomous agents. A robust memory system should therefore have the following characteristics.

AI agent memory systems should provide collections that store metadata, relationships, entities, summaries, or other types of information that can be useful for various tasks and domains. These collections can be based on the structure and format of the data, such as documents, tables, or code. Or they can be based on the content and meaning of the data, such as concepts, associations, or procedural steps.

Memory systems aren't just critical to AI agents. They're also important for the humans who develop, maintain, and use these agents. For example, humans might need to supervise agents' planning and execution workflows in near real time. While supervising, humans might interject with guidance or make in-line edits of agents' dialogues or monologues. Humans might also need to audit the reasoning and actions of agents to verify the validity of the final output. Human/agent interactions are likely in natural or programming languages, whereas agents "think," "learn," and "remember" through embeddings. This difference poses another requirement on memory systems' consistency across data modalities.

Memory systems should provide memory banks that store information that's relevant for the interaction with the user and the environment. Such information might include chat history, user preferences, sensory data, decisions made, facts learned, or other operational data that's updated with high frequency and at high volumes. These memory banks can help the agents remember short-term and long-term information, avoid repeating or contradicting themselves, and maintain task coherence. These requirements must hold true even if the agents perform a multitude of unrelated tasks in succession. In advanced cases, agents might also test numerous branch plans that diverge or converge at different points.

At the macro level, memory systems should enable multiple AI agents to collaborate on a problem, or to process different aspects of it, by providing shared memory that's accessible to all the agents. Shared memory can facilitate the exchange of information and the coordination of actions among the agents. At the same time, the memory system must allow agents to preserve their own persona and characteristics, such as their unique collections of prompts and memories.

The preceding characteristics require AI agent memory systems to be highly scalable and swift. As noted earlier, painstakingly weaving together disparate in-memory, relational, and vector databases might work for early-stage AI-enabled applications, but the added complexity and performance bottlenecks can hamper advanced autonomous agents.

In place of all the standalone databases, Azure Cosmos DB can serve as a unified solution for AI agent memory systems. Its robustness successfully enabled OpenAI's ChatGPT service to scale dynamically with high reliability and low maintenance. Powered by an atom-record-sequence engine, it's the world's first globally distributed NoSQL, relational, and vector database service that offers a serverless mode. AI agents built on Azure Cosmos DB offer speed, scale, and simplicity.

Azure Cosmos DB provides single-digit millisecond latency, which makes it suitable for processes that require rapid data access and management. These processes include caching (both traditional and semantic), transactions, and operational workloads. Low latency is crucial for AI agents that need to perform complex reasoning, make real-time decisions, and provide immediate responses. In addition, the service's use of the DiskANN algorithm provides accurate and fast vector search with minimal memory consumption.
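To illustrate the vector capability, here's a hedged sketch of creating and querying a vector index on Azure Cosmos DB for MongoDB (vCore) with pymongo. The database, collection, and field names are assumptions for this example, and the supported index kinds and options can vary, so check the current service documentation.

```python
from pymongo import MongoClient

# The connection string, database, collection, and field names are illustrative.
client = MongoClient("<your-cosmos-db-for-mongodb-vcore-connection-string>")
db = client["travel"]

# Create a vector index over the `contentVector` field.
db.command({
    "createIndexes": "packages",
    "indexes": [{
        "name": "vectorSearchIndex",
        "key": {"contentVector": "cosmosSearch"},
        "cosmosSearchOptions": {
            "kind": "vector-ivf",   # other kinds may be available on the service
            "numLists": 1,
            "similarity": "COS",    # cosine similarity
            "dimensions": 1536,     # must match the embedding model's output size
        },
    }],
})

# Query the index with an aggregation pipeline (the zero vector is a placeholder
# for a real query embedding).
pipeline = [{
    "$search": {
        "cosmosSearch": {"vector": [0.0] * 1536, "path": "contentVector", "k": 5},
        "returnStoredSource": True,
    }
}]
results = list(db["packages"].aggregate(pipeline))
```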
Azure Cosmos DB is engineered for global distribution and horizontal scalability, with support for multiple-region I/O and multitenancy. The service helps ensure that memory systems can expand seamlessly and keep up with rapidly growing agents and associated data. The availability guarantee in its service-level agreement (SLA) translates to less than 5 minutes of downtime per year; pure vector database services, by contrast, come with 9 hours or more of downtime per year. This availability provides a solid foundation for mission-critical workloads. At the same time, the various service models in Azure Cosmos DB, like Reserved Capacity or Serverless, can help reduce financial costs.

Azure Cosmos DB can simplify data management and architecture by integrating multiple database functionalities into a single, cohesive platform. Its integrated vector database capabilities can store, index, and query embeddings alongside the corresponding data in natural or programming languages. This capability enables greater data consistency, scale, and performance. Its flexibility supports the varied modalities and fluid schemas of the metadata, relationships, entities, summaries, chat history, user preferences, sensory data, decisions, facts learned, or other operational data involved in agent workflows. The database automatically indexes all data without requiring schema or index management, which helps AI agents perform complex queries quickly and efficiently.

Azure Cosmos DB is fully managed, which eliminates the overhead of database administration tasks like scaling, patching, and backups. Without this overhead, developers can focus on building and optimizing AI agents without worrying about the underlying data infrastructure.

Azure Cosmos DB incorporates advanced features such as change feed, which allows tracking and responding to changes in data in real time. This capability is useful for AI agents that need to react to new information promptly; a brief sketch follows this section. Additionally, the built-in support for multi-master writes enables high availability and resilience to help ensure continuous operation of AI agents, even after regional failures. The five available consistency levels (from strong to eventual) can also cater to various distributed workloads, depending on the scenario requirements.

Tip: You can choose from two Azure Cosmos DB APIs to build your AI agent memory system: Azure Cosmos DB for NoSQL and Azure Cosmos DB for MongoDB vCore. For information about the availability guarantees for these APIs, see the service SLAs.
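As the promised sketch of the change feed capability, here's a short example that uses the azure-cosmos Python SDK (NoSQL API). The account URL, key, and the database and container names are assumptions for the example.

```python
from azure.cosmos import CosmosClient

# Placeholders: supply your own account URL, key, database, and container.
client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("travel").get_container_client("history")

# Read every change from the beginning of the feed; an agent could use this
# loop to react to new or updated memory records as they arrive.
for item in container.query_items_change_feed(is_start_from_beginning=True):
    print(item["id"])
```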
This section explores the implementation of an autonomous agent to process traveler inquiries and bookings in a travel application for a cruise line. Chatbots are a long-standing concept, but AI agents are advancing beyond basic human conversation to carry out tasks based on natural language. These tasks traditionally required coded logic. The AI travel agent in this implementation sample uses the LangChain Agent framework for agent planning, tool usage, and perception. The agent's unified memory system uses the vector database and document store capabilities of Azure Cosmos DB to address traveler inquiries and facilitate trip bookings. Using Azure Cosmos DB for this purpose helps ensure speed, scale, and simplicity, as described earlier.

The sample agent operates within a Python FastAPI back end and supports user interactions through a React JavaScript user interface. All of the code and sample datasets are available in this GitHub repository, which includes the loader, api, and web folders.

The GitHub repository contains a Python project in the loader directory. It's intended for loading the sample travel documents into Azure Cosmos DB. Set up your Python virtual environment in the loader directory by running, for example:

python -m venv venv

Activate your environment (venv\Scripts\activate on Windows, or source venv/bin/activate on macOS and Linux) and install the dependencies in the loader directory:

pip install -r requirements.txt

Create a file named .env in the loader directory to store the environment variables that the loader needs, such as the Azure Cosmos DB connection string and the settings for generating embeddings.

The Python file main.py serves as the central entry point for loading data into Azure Cosmos DB. This code processes the sample travel data from the GitHub repository, including information about ships and destinations. The code also generates travel itinerary packages for each ship and destination, so that travelers can book them by using the AI agent. The CosmosDBLoader tool is responsible for creating collections, vector embeddings, and indexes in the Azure Cosmos DB instance. The full main.py listing is in the GitHub repository.

Load the documents, load the vectors, and create the indexes by running main.py from the loader directory:

python main.py

The AI travel agent is hosted in a back-end API through Python FastAPI, which facilitates integration with the front-end user interface. The API project processes agent requests by grounding the LLM prompts against the data layer, specifically the vectors and documents in Azure Cosmos DB. The agent makes use of various tools, particularly the Python functions provided at the API service layer. This article focuses on the code necessary for AI agents within the API code. The API project in the GitHub repository is organized into web, service, data, and model components. We used Python version 3.11.4 for the development and testing of the API.

Set up your Python virtual environment in the api directory, then activate it and install the dependencies by using the requirements file, as before:

python -m venv venv
pip install -r requirements.txt

Create a file named .env in the api directory to store your environment variables, such as the Azure Cosmos DB connection string and the Azure OpenAI settings.

Now that you've configured the environment and set up variables, start the FastAPI server with Uvicorn from the api directory (for example, uvicorn main:app, assuming the FastAPI app object is defined in main.py). The FastAPI server starts on the localhost loopback 127.0.0.1, port 8000, by default. You can access the Swagger documentation at http://127.0.0.1:8000/docs.

It's imperative for the travel agent to be able to reference previously provided information within the ongoing conversation. This ability is commonly known as memory in the context of LLMs. To achieve this objective, use the chat message history that's stored in the Azure Cosmos DB instance. The history for each chat session is stored through a session ID, to ensure that only messages from the current conversation session are accessible. That requirement is why the API includes a Get Session method: a placeholder method for managing web sessions that illustrates the use of chat message history.

Select Try it out for /session/. For the AI agent, you only need to simulate a session. The stubbed-out method merely returns a generated session ID for tracking message history. In a practical implementation, this session would be stored in Azure Cosmos DB and potentially in React localStorage. The full web/session.py listing is in the GitHub repository.
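A minimal sketch of such a placeholder session endpoint might look like this (illustrative FastAPI code, not the repository's exact implementation):

```python
import uuid
from fastapi import APIRouter

router = APIRouter()

@router.get("/session/")
def get_session() -> dict:
    # Return a generated ID for tracking chat message history. A real
    # implementation would persist the session in Azure Cosmos DB and
    # potentially in React localStorage.
    return {"session_id": uuid.uuid4().hex}
```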
Use the session ID that you obtained from the previous step to start a new dialogue with the AI agent, so you can validate its functionality. Select Try it out for /agent/agent_chat, and conduct the test by submitting the phrase "I want to take a relaxing vacation" along with the session ID.

The initial execution results in a recommendation for the Tranquil Breeze Cruise and the Fantasy Seas Adventure Cruise, because the agent anticipates that they're the most relaxing cruises available through the vector search. These documents have the highest scores for similarity_search_with_score, which is called in the data layer of the API, data.mongodb.travel.similarity_search(). The similarity search scores appear as output from the API for debugging purposes.

Tip: If documents are not being returned from the vector search, modify the similarity_search_with_score limit or the score filter value as needed ([doc for doc, score in docs if score >= 0.78]) in data.mongodb.travel.similarity_search().

Calling agent_chat for the first time creates a new collection named history in Azure Cosmos DB to store the conversation by session. This call enables the agent to access the stored chat message history as needed. Subsequent executions of agent_chat with the same parameters produce varying results, because the agent draws from memory.

When you integrate the AI agent into the API, the web components are responsible for initiating all requests. They're followed by the service layer, and finally the data components. In this specific case, you use a MongoDB data search that connects to Azure Cosmos DB. The layers facilitate the exchange of model components, with the AI agent and the AI agent tool code residing in the service layer. This approach enables the seamless interchangeability of data sources, and it extends the capabilities of the AI agent with additional, more intricate functionalities or tools.

The service layer forms the cornerstone of core business logic. In this particular scenario, it plays a crucial role as the repository for the LangChain agent code. It facilitates the seamless integration of user prompts with Azure Cosmos DB data, conversation memory, and agent functions for the AI agent. The service layer employs a singleton pattern module for handling agent-related initializations in the init.py file. The full service/init.py listing is in the GitHub repository.

The init.py file initiates the loading of environment variables from an .env file by using the load_dotenv(override=False) method. Then, a global variable named agent_with_chat_history is instantiated for the agent. This agent is intended for use by TravelAgent.py. The LLM_init() method is invoked during module initialization to configure the AI agent for conversation via the API web layer. The OpenAI chat object is instantiated through the GPT-3.5 model and incorporates specific parameters, such as model name and temperature. The chat object, tools list, and prompt template are combined to generate AgentExecutor, which operates as the AI travel agent. The agent with history, agent_with_chat_history, is established through RunnableWithMessageHistory with chat history (MongoDBChatMessageHistory). This action enables it to maintain a complete conversation history via Azure Cosmos DB.
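Based on the components named above, a condensed sketch of that wiring might look like the following. Import paths, the tools list, the connection-string environment variable name, and the database and collection names are assumptions that vary by LangChain version; treat this as illustrative rather than the repository's exact code.

```python
import os

from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_community.chat_message_histories import MongoDBChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.2)
tools = []  # the travel agent tools would be registered here

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful and friendly travel assistant for a cruise company."),
    ("placeholder", "{chat_history}"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)

# Wrap the executor so that every call reads and writes per-session history
# stored in Azure Cosmos DB through the MongoDB API.
agent_with_chat_history = RunnableWithMessageHistory(
    executor,
    lambda session_id: MongoDBChatMessageHistory(
        connection_string=os.environ["MONGO_CONNECTION_STRING"],  # assumed name
        session_id=session_id,
        database_name="travel",
        collection_name="history",
    ),
    input_messages_key="input",
    history_messages_key="chat_history",
)
```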
The LLM prompt initially began with the simple statement "You are a helpful and friendly travel assistant for a cruise company." However, testing showed that you can obtain more consistent results by including the instruction "Answer travel questions to the best of your ability, providing only relevant information. To book a cruise, capturing the person's name is essential." The results appear in HTML format to enhance the visual appeal of the web interface.

Tools are interfaces that an agent can use to interact with the world, often through function calling. When you create an agent, you must furnish it with a set of tools that it can use. The @tool decorator offers the most straightforward approach to defining a custom tool. By default, the decorator uses the function name as the tool name, although you can replace it by providing a string as the first argument. The decorator uses the function's docstring as the tool's description, so every tool function must provide a docstring. The full service/TravelAgentTools.py listing is in the GitHub repository.
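A minimal sketch of a tool definition in that style might look like this (the function name, docstring, and body are illustrative, not the repository's exact code):

```python
from langchain_core.tools import tool

@tool
def vacation_lookup(query: str) -> str:
    """Find cruise packages that match a traveler's request."""
    # In the sample, a tool like this would call the data layer's vector
    # search (for example, data.mongodb.travel.similarity_search).
    return "Tranquil Breeze Cruise: seven relaxing nights at sea."
```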
The TravelAgentTools.py file defines three such tools for the agent. Specific instructions ("In order to book a cruise I need to know your name") might be necessary to ensure the capture of the passenger's name and room number for booking the cruise package, even though you included such instructions in the LLM prompt.

The fundamental concept that underlies agents is to use a language model for selecting a sequence of actions to execute. The full service/TravelAgent.py listing is in the GitHub repository. The TravelAgent.py file is straightforward, because agent_with_chat_history and its dependencies (tools, prompt, and LLM) are initialized and configured in the init.py file. This file calls the agent by using the input received from the user, along with the session ID for conversation memory. Afterward, PromptResponse (model/prompt) is returned with the agent's output and response time.

With the data loaded and the AI agent accessible through the API, you can now complete the solution by establishing a web user interface (by using React) for your travel website. Using the capabilities of React helps illustrate the seamless integration of the AI agent into a travel site. This integration enhances the user experience with a conversational travel assistant for inquiries and bookings.

Install Node.js and the dependencies before testing the React interface. Perform a clean installation of the project dependencies from the web directory (for example, npm ci). The installation might take some time.

Next, create a file named .env within the web directory to store environment variables. Include the following detail in the newly created .env file:

REACT_APP_API_HOST=http://127.0.0.1:8000

Now initiate the React web user interface from the web directory (for example, npm start). Running the command opens the React web application.

The web project of the GitHub repository is a straightforward application that facilitates user interaction with the AI agent. The primary components required to converse with the agent are TravelAgent.js and ChatLayout.js. The Main.js file serves as the central module, or user landing page.

The main component serves as the central manager of the application and the designated entry point for routing. Within the render function, it produces JSX code to delineate the main page layout. This layout encompasses placeholder elements for the application, such as logos and links, a section that houses the travel agent component, and a footer that contains a sample disclaimer about the application's nature. The full main.js listing is in the GitHub repository.

The travel agent component has a straightforward purpose: capturing user inputs and displaying responses. It plays a key role in managing the integration with the back-end AI agent, primarily by capturing sessions and forwarding user prompts to the FastAPI service. The resulting responses are stored in an array for display, facilitated by the chat layout component. The full TripPlanning/TravelAgent.js listing is in the GitHub repository. Select Effortlessly plan your voyage to open the travel assistant.

The chat layout component oversees the arrangement of the chat. It systematically processes the chat messages and applies the formatting specified in the message JSON object. The full TripPlanning/ChatLayout.js listing is in the GitHub repository. User prompts appear on the right side in blue, and responses from the AI travel agent appear on the left side in green. The HTML-formatted responses are rendered within the conversation.

When your AI agent is ready to go into production, you can use semantic caching to improve query performance by 80% and to reduce LLM inference and API call costs. To implement semantic caching, see this post on the Stochastic Coder blog.
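To sketch the idea behind semantic caching: before paying for a new LLM call, compare the embedding of the incoming query against the embeddings of previously answered queries, and reuse the stored response when they're close enough. The embed stub and the 0.95 threshold below are assumptions for this illustration; the linked post covers a production approach.

```python
import math

def embed(text: str) -> list[float]:
    # Toy character-frequency embedding, as in the earlier sketch; a real
    # system uses an embedding model.
    v = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - ord("a")] += 1.0
    return v

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    n = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / n if n else 0.0

def call_llm(query: str) -> str:
    return f"(stub) response for '{query}'"  # stand-in for real inference

cache: list[tuple[list[float], str]] = []  # (query embedding, cached response)

def answer(query: str, threshold: float = 0.95) -> str:
    q = embed(query)
    for cached_embedding, cached_response in cache:
        if cosine(q, cached_embedding) >= threshold:
            return cached_response       # semantically similar: skip the LLM call
    response = call_llm(query)           # cache miss: pay for inference once
    cache.append((q, response))
    return response

print(answer("I want to take a relaxing vacation."))
print(answer("I'd like a relaxing vacation."))  # likely served from the cache
```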