# Open Agent RL Dataset: High Quality AI Agent | Tool Use & Function Calls | Reinforcement Learning Datasets

The DeepNLP website provides **high-quality, genuine requests from online users** as Agent & RL datasets, to help LLM foundation training, SFT, and post-training produce models that are more capable at function calling, tool use, and planning. The datasets are collected and sampled from users' requests on our various clients (Web/App/Mini App), the [Open OneKey Agent Router](https://www.deepnlp.org/agent/onekey-mcp-router), and the [Open OneKey MCP Router](https://www.deepnlp.org/agent/onekey-mcp-router).

Some datasets require credits. You can easily earn more credits through activities such as commenting, joining discussions, and uploading your own datasets to the community (https://www.deepnlp.org/workspace/billing). Visit the Dataset tab of our AI Store to select a [Dataset](https://www.deepnlp.org/store/dataset).

**Disclaimer**: For privacy, personal and personalized information is marked and filtered out.

**Function Call Plot Charts**

**Multi-Agent/MCP Parallel Running**

## Dataset Features

**Genuine Users' Queries**: Most of the high-quality datasets are collected from the query logs of our live AI agents, such as the [MCP Tool Use Agent](https://agent.deepnlp.org/agent/mcp_tool_use), the [Open OneKey Agent Router](https://www.deepnlp.org/agent/onekey-mcp-router), and the [Open OneKey MCP Router](https://www.deepnlp.org/agent/onekey-mcp-router).

**Function Call and MCP Servers Support**: The datasets cover a wide range of MCP servers from the Open MCP Marketplace and Playgrounds.

**Users' Actions and Human Feedback**: Users' actual feedback is crucial to improving the AI agent training process. We collect users' genuine actions, such as **ACCEPT/REJECT** when confirming function call results, **Upvote/Downvote** on the final responses, and other feedback on clickable elements.

**Various Domains and Tasks**: We cover 40+ categories of AI agent tool use scenarios, ranging from information seeking (AI search, map search, etc.) to autonomous AI agents for browser use, computer use, data analysis, and Excel spreadsheet and PowerPoint creation.

**Example AI Agent Dataset Dialogues**

| Domain | Related MCP Server | Demo |
| ---- | ---- | ---- |
| Office File Agent | Excel Spreadsheet, Powerpoint, PDF, etc. | [Example](https://agent.deepnlp.org/agent/mcp_tool_use/share/ee640008-6bc1-4c3a-832b-2557f985b540) [MCP]() |
| AI Search/Deep Research | Bing/Google Custom/Perplexity/Tavily/Firecrawl | [Demo](https://agent.deepnlp.org/agent/mcp_tool_use?server=tavily-ai/tavily-mcp) [MCP]() |
| Map Trip Planning | GoogleMap, Amap(Gaode), BaiduMap, etc. | [Example](https://agent.deepnlp.org/agent/mcp_tool_use/share/8ab0b25c-b72d-4cae-9c86-a852df8c6541) [MCP](https://agent.deepnlp.org/agent/mcp_tool_use?server=amap-mcp/amap-mcp-%E9%AB%98%E5%BE%B7%E5%9C%B0%E5%9B%BE-mcp) [Use MCP]() |
| Browser Usage | Playwright, Puppeteer, etc. | [Demo](https://agent.deepnlp.org/agent/mcp_tool_use?server=puppeteer/puppeteer) [MCP]() |
| Chart,Graph,Image | Gemini NanoBanana3,everart,mcp-server-charts(AntV),canva-mcp,bing-image-search-mcp | [Example](https://agent.deepnlp.org/agent/mcp_tool_use/share/1a6101e0-9afa-47c4-828c-f3d373dc96f1) [MCP]() |
| Multi-Agent Parallel Running | Deep Research/Trip Planning and more. | [Example](https://agent.deepnlp.org) |

## Dataset Introduction

We provide the following main types of AI agent datasets, in list-of-messages JSON format together with scalar data such as rewards.

| Dataset Name | Description | User Feedback | Example Dataset Download | Full Dataset Download |
| ---- | ---- | ---- | ---- | ---- |
| Tool Use Multi-Turn Dialogue | The tool use multi-turn dialogue dataset is in the list-of-messages format. Useful for AI Search/Deep Research/Map/Financial Data/etc. | YES | 50 instances, [Download](https://www.deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-tool-use-dialogue-open-dataset-example) | 1k, [Download](https://www.deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-tool-use-dialogue-open-dataset) |
| Function Calling Tool Use | The dataset contains **messages** and **available tools** as input, and as output the chosen **tool_call** result indicating which tool to use and its arguments. The datasets are collected from calls to SOTA LLMs such as GPT, OpenAI o-series, Claude, Qwen, Kimi, etc. | No | 50 instances, [Download](https://deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-function-calling-open-dataset-example) | 1k, [Download](https://deepnlp.org/store/ai-agent/ai-agent/pub-deepnlp/agent-function-calling-open-dataset) |
| Reinforcement Learning | Sessions of multi-turn dialogues between user and assistant, with rewards derived from users' feedback in the session, such as confirmation clicks (Accept/Reject) and Upvote/Downvote on the responses. | YES | 50 instances, [Download](https://deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-reinforcement-learning-open-dataset-example) | 1k, [Download](https://deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-reinforcement-learning-open-dataset) |

### Using Datasets Training Scripts

How do you use these datasets? We also provide SFT/RL training demo scripts built on packages such as `trl` and `verl`. You can find the demo scripts on [GitHub](https://github.com/AI-Hub-Admin/tutorial_agent_training_sft_rl/tree/main).

### Dataset 1 Tool Use Multi-Turn Dialogue Dataset

**Dataset Description**

| KEY | Type | Description |
| ---- | ---- | ---- |
| trace_id | String | Identifies each unique new user request or API call |
| session_id | String | The identifier of each dialogue, which consists of multiple turns; every user input produces a new trace_id |
| messages | List of Json Object | Dialogue Messages |

The data instance below is a multi-turn dialogue in which the user calls the Google Maps **get_weather** tool to check the recent weather in San Francisco.
The dialogue contains four types of messages:

```
User: query, the original question that the user asks,
User: available_tools, the list of JSON objects that the user provides to the LLM,
Assistant: message, content.type='tool_use', the LLM output of which tool to use and its parameters,
User: message, content.type='tool_result', the actual results of running the function call.
```

```
[
  {
    "role": "user",
    "content": "What is the weather like in San Francisco?"
  },
  {
    "role": "assistant",
    "content": [
      {
        "type": "text",
        "text": "I need to use get_weather, and the user wants SF, which is likely San Francisco, CA."
      },
      {
        "type": "tool_use",
        "id": "toolu_01A09q90qw90lq917835lq9",
        "name": "get_weather",
        "input": { "location": "San Francisco, CA", "unit": "celsius" }
      }
    ]
  },
  {
    "role": "user",
    "content": [
      {
        "type": "tool_result",
        "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
        "content": "15 degrees"
      }
    ]
  }
]
```

Note that the function call format differs across models. We mainly collect data in the OpenAI and Anthropic function calling formats. Both are supported, and the differences are described in the official documentation.

**Multi-modal and Files**: Images and raw descriptions of files, such as paths, are also attached as context variables.
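As a rough illustration of how the Anthropic-style dialogue above might be consumed, the following Python sketch matches each `tool_use` block with the `tool_result` block that answers it through the shared id. The helper name `pair_tool_calls` is hypothetical and not part of any DeepNLP tooling, and the sample messages are abbreviated from the example above.

```python
from typing import Any


def pair_tool_calls(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Match each assistant tool_use block with the tool_result that answers it."""
    calls: dict[str, dict[str, Any]] = {}
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-text messages carry no tool blocks
        for block in content:
            if block.get("type") == "tool_use":
                calls[block["id"]] = {
                    "name": block["name"],
                    "input": block["input"],
                    "result": None,  # filled in once the matching tool_result is seen
                }
            elif block.get("type") == "tool_result":
                call = calls.get(block["tool_use_id"])
                if call is not None:
                    call["result"] = block.get("content")
    return list(calls.values())


# Abbreviated copy of the dialogue shown above.
messages = [
    {"role": "user", "content": "What is the weather like in San Francisco?"},
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "I need to use get_weather."},
            {
                "type": "tool_use",
                "id": "toolu_01A09q90qw90lq917835lq9",
                "name": "get_weather",
                "input": {"location": "San Francisco, CA", "unit": "celsius"},
            },
        ],
    },
    {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
                "content": "15 degrees",
            }
        ],
    },
]

pairs = pair_tool_calls(messages)
print(pairs)
```

Pairing by id rather than by position keeps the sketch usable when a single assistant turn contains several tool calls.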
![Excel Spreadsheet Usage](https://raw.githubusercontent.com/aiagenta2z/mcp-marketplace/refs/heads/main/app/mcp_tool_use/docs/office_excel_use_agent.jpg)

**OpenAI/Qwen/etc. Function Call Formats**

```
{
  "tool_call": {
    "id": "call_d6f4ed29ce614390b99a05",
    "function": {
      "arguments": "{\"url\": \"https://www.stackoverflow.com\", \"browserType\": \"chromium\"}",
      "name": "playwright_navigate"
    },
    "type": "function",
    "index": 0
  }
}
```

**Anthropic Tool Use Formats**

```
{
  "type": "tool_use",
  "id": "toolu_01A09q90qw90lq917835lq9",
  "name": "get_weather",
  "input": { "location": "San Francisco, CA", "unit": "celsius" }
}
```

### Dataset 2 Function Calling Tool Use Dataset

The function calling tool use dataset contains the list of **messages** (users' original queries), the **Available Tools** (list of JSON objects), and the **Tool Call Results** produced by SOTA LLMs. For example, the LLM chooses the **puppeteer_navigate** tool from the 7 tools of the MCP server puppeteer and sets the parameters to **{\"url\": \"https://www.baidu.com\"}**.

**Dataset Description**

| KEY | Description |
| ---- | ---- |
| trace_id | Identifies each unique new user request or API call |
| session_id | The identifier of each dialogue, which consists of multiple turns; every user input produces a new trace_id |
| data | The JSON object of each function call |
| data.function_calls.messages | List of messages in {"role":"", "content":""} format |
| data.function_calls.tools | List of available tools, including the schema of each tool's name, description and parameters |
| data.function_calls.tool_calls | Dict of tool_id, function, arguments |
| tool_calls_results | Note: tool call results are the actual outputs of running the tools. They can be found in the dialogue messages of Dataset 1, but are not needed for training function call models |

Format

```
{
  "model": "claude-opus-4",
  "session_id": "37e83ca0-d0d4-4204-b3e6-95be51bf57a6",
  "trace_id": "5c6a8e17-e121-4786-955d-0a716ed733ae",
  "data": {
    "function_calls": [
      {
        "messages": [],
        "tools": [],
        "tool_calls": { "id": "", "function": {"arguments": "", "name": ""}, "type": "" }
      },
      {
        "messages": [],
        "tools": [],
        "tool_calls": { "id": "", "function": {"arguments": "", "name": ""}, "type": "" }
      }
    ]
  }
}
```

Data Example of Puppeteer Navigate Open Website.

```
{ "trace_id": "72d51b6e-8c48-42ee-bfc9-05209d1444c8", "session_id": "7ca7e84c-cee5-4ddd-be1e-748df2be481e", "model": "claude-opus-4", "data": { "function_calls": [{ "messages": [{ "role": "user", "content": "Visit website baidu.com and take a screenshot" }, { "role": "assistant", "content": "", "tool_calls": [{ "id": "call_f02ba9ee6b344dd0af9041", "type": "tool_use", "function": { "name": "puppeteer_navigate", "arguments": "{\"url\": \"https://www.baidu.com\"}" } }] }, { "role": "tool", "tool_call_id": "call_f02ba9ee6b344dd0af9041", "name": "puppeteer_navigate", "content": "" }, { "role": "assistant", "content": "", "tool_calls": [{ "id": "call_c1e8658beda04f6881e8eb", "type": "tool_use", "function": { "name": "puppeteer_screenshot", "arguments": "{\"name\": \"baidu_homepage\"}" } }] }, { "role": "tool", "tool_call_id": "call_c1e8658beda04f6881e8eb", "name": "puppeteer_screenshot", "content": "" }], "tools": [{ "type": "function", "function": { "name": "puppeteer_navigate", "description": "Navigate to a URL", "parameters": { "type": "object", "properties": { "url": { "type": "string", "description": "URL to navigate to" }, 
"launchOptions": { "type": "object", "description": "PuppeteerJS LaunchOptions. Default null. If changed and not null, browser restarts. Example: { headless: true, args: ['--no-sandbox'] }" }, "allowDangerous": { "type": "boolean", "description": "Allow dangerous LaunchOptions that reduce security. When false, dangerous args like --no-sandbox will throw errors. Default false." } }, "required": ["url"] } } }, { "type": "function", "function": { "name": "puppeteer_screenshot", "description": "Take a screenshot of the current page or a specific element", "parameters": { "type": "object", "properties": { "name": { "type": "string", "description": "Name for the screenshot" }, "selector": { "type": "string", "description": "CSS selector for element to screenshot" }, "width": { "type": "number", "description": "Width in pixels (default: 800)" }, "height": { "type": "number", "description": "Height in pixels (default: 600)" }, "encoded": { "type": "boolean", "description": "If true, capture the screenshot as a base64-encoded data URI (as text) instead of binary image content. Default false." 
} }, "required": ["name"] } } }, { "type": "function", "function": { "name": "puppeteer_click", "description": "Click an element on the page", "parameters": { "type": "object", "properties": { "selector": { "type": "string", "description": "CSS selector for element to click" } }, "required": ["selector"] } } }, { "type": "function", "function": { "name": "puppeteer_fill", "description": "Fill out an input field", "parameters": { "type": "object", "properties": { "selector": { "type": "string", "description": "CSS selector for input field" }, "value": { "type": "string", "description": "Value to fill" } }, "required": ["selector", "value"] } } }, { "type": "function", "function": { "name": "puppeteer_select", "description": "Select an element on the page with Select tag", "parameters": { "type": "object", "properties": { "selector": { "type": "string", "description": "CSS selector for element to select" }, "value": { "type": "string", "description": "Value to select" } }, "required": ["selector", "value"] } } }, { "type": "function", "function": { "name": "puppeteer_hover", "description": "Hover an element on the page", "parameters": { "type": "object", "properties": { "selector": { "type": "string", "description": "CSS selector for element to hover" } }, "required": ["selector"] } } }, { "type": "function", "function": { "name": "puppeteer_evaluate", "description": "Execute JavaScript in the browser console", "parameters": { "type": "object", "properties": { "script": { "type": "string", "description": "JavaScript code to execute" } }, "required": ["script"] } } }], "tool_calls": { "id": "call_f02ba9ee6b344dd0af9041", "function": { "arguments": "{\"url\": \"https://www.baidu.com\"}", "name": "puppeteer_navigate" }, "type": "function" } }, { "messages": [{ "role": "user", "content": "Visit website baidu.com and take a screenshot" }, { "role": "assistant", "content": "", "tool_calls": [{ "id": "call_f02ba9ee6b344dd0af9041", "type": "tool_use", "function": { "name": 
"puppeteer_navigate", "arguments": "{\"url\": \"https://www.baidu.com\"}" } }] }, { "role": "tool", "tool_call_id": "call_f02ba9ee6b344dd0af9041", "name": "puppeteer_navigate", "content": "" }, { "role": "assistant", "content": "", "tool_calls": [{ "id": "call_c1e8658beda04f6881e8eb", "type": "tool_use", "function": { "name": "puppeteer_screenshot", "arguments": "{\"name\": \"baidu_homepage\"}" } }] }, { "role": "tool", "tool_call_id": "call_c1e8658beda04f6881e8eb", "name": "puppeteer_screenshot", "content": "" }], "tools": [{ "type": "function", "function": { "name": "puppeteer_navigate", "description": "Navigate to a URL", "parameters": { "type": "object", "properties": { "url": { "type": "string", "description": "URL to navigate to" }, "launchOptions": { "type": "object", "description": "PuppeteerJS LaunchOptions. Default null. If changed and not null, browser restarts. Example: { headless: true, args: ['--no-sandbox'] }" }, "allowDangerous": { "type": "boolean", "description": "Allow dangerous LaunchOptions that reduce security. When false, dangerous args like --no-sandbox will throw errors. Default false." } }, "required": ["url"] } } }, { "type": "function", "function": { "name": "puppeteer_screenshot", "description": "Take a screenshot of the current page or a specific element", "parameters": { "type": "object", "properties": { "name": { "type": "string", "description": "Name for the screenshot" }, "selector": { "type": "string", "description": "CSS selector for element to screenshot" }, "width": { "type": "number", "description": "Width in pixels (default: 800)" }, "height": { "type": "number", "description": "Height in pixels (default: 600)" }, "encoded": { "type": "boolean", "description": "If true, capture the screenshot as a base64-encoded data URI (as text) instead of binary image content. Default false." 
} }, "required": ["name"] } } }, { "type": "function", "function": { "name": "puppeteer_click", "description": "Click an element on the page", "parameters": { "type": "object", "properties": { "selector": { "type": "string", "description": "CSS selector for element to click" } }, "required": ["selector"] } } }, { "type": "function", "function": { "name": "puppeteer_fill", "description": "Fill out an input field", "parameters": { "type": "object", "properties": { "selector": { "type": "string", "description": "CSS selector for input field" }, "value": { "type": "string", "description": "Value to fill" } }, "required": ["selector", "value"] } } }, { "type": "function", "function": { "name": "puppeteer_select", "description": "Select an element on the page with Select tag", "parameters": { "type": "object", "properties": { "selector": { "type": "string", "description": "CSS selector for element to select" }, "value": { "type": "string", "description": "Value to select" } }, "required": ["selector", "value"] } } }, { "type": "function", "function": { "name": "puppeteer_hover", "description": "Hover an element on the page", "parameters": { "type": "object", "properties": { "selector": { "type": "string", "description": "CSS selector for element to hover" } }, "required": ["selector"] } } }, { "type": "function", "function": { "name": "puppeteer_evaluate", "description": "Execute JavaScript in the browser console", "parameters": { "type": "object", "properties": { "script": { "type": "string", "description": "JavaScript code to execute" } }, "required": ["script"] } } }], "tool_calls": { "id": "call_c1e8658beda04f6881e8eb", "function": { "arguments": "{\"name\": \"baidu_homepage\"}", "name": "puppeteer_screenshot" }, "type": "function" } }] } }
```

### Dataset 3 Reinforcement Learning

We collect users' positive and negative feedback on the AI agent workflow.

**Positive feedback** includes clicks on the **ACCEPT** button of the function call results and on the **Upvote** button. We set the reward to 1.0 and use the "reward_description" field to describe the user's action in detail.

**Negative feedback** includes clicks on the **REJECT** button of the function call results and on the **Downvote** button. We set the reward to -1.0.

**Dataset Description**

| KEY | Type | Description |
| ---- | ---- | ---- |
| trace_id | String | Identifies each unique new user request or API call |
| session_id | String | The identifier of each dialogue, which consists of multiple turns; every user input produces a new trace_id |
| messages | List of Json Object | Dialogue Messages |
| message.reward | scalar | Users' feedback at the level of each individual message, function call, or tool result (ACCEPT/REJECT): 1.0 for positive feedback, -1.0 for negative feedback |
| message.reward_description | string | Detailed description of the user's action |
| reward | scalar | Overall feedback on the whole session from the Bottom Bar actions: 1.0 for positive feedback, -1.0 for negative feedback |
| reward_description | string | Detailed description of the action, such as ACCEPT/REJECT, UPVOTE, DOWNVOTE, etc. |
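To make the two reward fields in the table above concrete, here is a minimal Python sketch that reads the session-level `reward` and any message-level rewards from one record. It assumes the message-level reward is stored under a `reward` key on the message object, which is how `message.reward` is read here; the helper name `collect_rewards` and the tiny sample session are hypothetical, not taken from the dataset.

```python
from typing import Any


def collect_rewards(session: dict[str, Any]) -> dict[str, Any]:
    """Return the session-level reward plus every message-level reward present."""
    message_rewards = [
        (index, float(message["reward"]))
        for index, message in enumerate(session.get("messages", []))
        if "reward" in message  # assumption: message-level reward sits on the message object
    ]
    return {
        "session_reward": session.get("reward"),
        "session_action": session.get("reward_description"),
        "message_rewards": message_rewards,
    }


# Hypothetical, heavily trimmed session: the user accepts one function call
# but downvotes the overall response.
session = {
    "messages": [
        {"role": "user", "content": "Latest AI Company Research News"},
        {"role": "user", "content": "ACCEPT", "reward": 1.0},
    ],
    "reward": -1.0,
    "reward_description": "DOWNVOTE",
}

summary = collect_rewards(session)
print(summary)
```

Keeping the two levels separate lets a training script decide how to combine them, for example using the session reward as the episode return and the message rewards as per-step signals.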

**Note**: Rewards are provided at two levels, the message or function call level and the whole dialogue session level.

- Message or function call level: JSON path `$.message.reward`. For example, users who are not satisfied because the LLM chose the wrong tool may click REJECT, or they may leave feedback after an auto-run.
- Dialogue session level: JSON path `$.reward`, from the Bottom Bar buttons (Upvote, Downvote, etc.).
- In Auto-Run mode, function calls don't need users' confirmation and default to ACCEPT.

**Positive Reward Example**

In this financial data checking scenario, the user clicks ACCEPT to confirm the function call of the tool 'get_stock_price_global_market' with parameters "{\"symbol_list\": [\"MSFT\", \"NVDA\"], \"market\": \"US\"}" from the server finance-agent-mcp-server.

```
{ "messages": [{ "role": "user", "content": "Find the Microsoft and Nvidia Stock Price and Market Cap. Who has higher market cap today?" }, { "role": "assistant", "content": "

Finding Relevant Tools Using Queries:

find

market cap

compare

stock

and

finance data

market

Relevant MCP Servers:

Relevant Tools:

GetAwsSolutionsConstructPattern

SearchGenAICDKConstructs

LambdaLayerDocumentationProvider

pay_order

create_refund

get_refund

sequentialthinking

get_stock_price_global_market

read_documentation

search_documentation

recommend

brave_local_search

firecrawl_scrape

firecrawl_search

firecrawl_extract

firecrawl_deep_research

list_directory

list_directory_with_sizes

move_file

tavily_map

| \n

\n \u25b6\n Call Tool get_stock_price_global_market of Server finance-agent-mcp-server\n

Parameters

{\n \"market\": \"US\",\n \"symbol_list\": [\n \"MSFT\",\n \"NVDA\"\n ]\n}

\n \n \n

Results

\n\n|Searching MCP Marketplace Tools and Servers", "context": "{\"tool_id\": \"call_45853133e9a84899adcbb3\", \"tool_name\": \"get_stock_price_global_market\", \"tool_arguments\": {\"symbol_list\": [\"MSFT\", \"NVDA\"], \"market\": \"US\"}, \"toolname_to_server_dict\": {\"GetAwsSolutionsConstructPattern\": [\"awslabs.cdk-mcp-server\"], \"SearchGenAICDKConstructs\": [\"awslabs.cdk-mcp-server\"], \"LambdaLayerDocumentationProvider\": [\"awslabs.cdk-mcp-server\"], \"pay_order\": [\"paypal\"], \"create_refund\": [\"paypal\"], \"get_refund\": [\"paypal\"], \"sequentialthinking\": [\"sequential-thinking\"], \"get_stock_price_global_market\": [\"finance-agent-mcp-server\"], \"read_documentation\": [\"awslabs.aws-documentation-mcp-server\"], \"search_documentation\": [\"awslabs.aws-documentation-mcp-server\"], \"recommend\": [\"awslabs.aws-documentation-mcp-server\"], \"brave_local_search\": [\"brave-search\"], \"firecrawl_scrape\": [\"firecrawl-mcp\"], \"firecrawl_search\": [\"firecrawl-mcp\"], \"firecrawl_extract\": [\"firecrawl-mcp\"], \"firecrawl_deep_research\": [\"firecrawl-mcp\"], \"list_directory\": [\"filesystem\"], \"list_directory_with_sizes\": [\"filesystem\"], \"move_file\": [\"filesystem\"], \"tavily_map\": [\"tavily-remote-mcp\"]}}" }, { "role": "user", "content": "ACCEPT" }, { "role": "assistant", "content": "", "tool_calls": [{ "id": "call_45853133e9a84899adcbb3", "type": "tool_use", "function": { "name": "get_stock_price_global_market", "arguments": "{\"symbol_list\": [\"MSFT\", \"NVDA\"], \"market\": \"US\"}" } }] }, { "role": "tool", "tool_call_id": "call_45853133e9a84899adcbb3", "name": "get_stock_price_global_market", "content": "[{\"avg_price\": \"517.10 USD\", \"high\": \"522.82 USD\", \"low\": \"514.02 USD\", \"symbol\": \"MSFT\", \"update_time\": \"\", \"previous_close\": \"520.17 USD\", \"change\": \"-3.07\", \"market_capitalization\": \"3.84T USD\", \"pe_ratio\": \"37.98\", \"data_source\": \"morningstar.com\", \"source_url\": 
\"https://www.morningstar.com/stocks/xnas/msft/quote\"}, {\"avg_price\": \"182.01 USD\", \"high\": \"182.94 USD\", \"low\": \"180.59 USD\", \"symbol\": \"NVDA\", \"update_time\": \"\", \"previous_close\": \"180.45 USD\", \"change\": \"+1.56\", \"market_capitalization\": \"4.44T USD\", \"pe_ratio\": \"57.04\", \"data_source\": \"morningstar.com\", \"source_url\": \"https://www.morningstar.com/stocks/xnas/nvda/quote\"}]" }, { "role": "system", "content": "\n \n **Background**\n You are an AI Agent expert good at generating answers given users' original query and related context information, such as web pages search result, tool function call results, etc.\n\n **Tasks**:\n - Analyse User's query and consider current date, current time and users' location (country, city, district) if provided\n - Identify the language of user's original query and generate the answers using the same language, such as English, Chinese, French, Russion, etc.\n - Identify the expected output content-type of the answers, choose from \"text/markdown\", \"text/html\", \"application/code\", with default to \"text/html\". If users want you to generate a code with explanation, use mixed content type for different sections, \"application/code\" and \"text/markdown\".\n - Output detailed response to answer users' original question directly, without the thinking process. Please include the necessary information.\n - Coding or Math: If users' query is to write codes or math formula, please generate the code in the first place. Secondly, generate a few points describing the meanings of the main functions or changes.\n - MCP Tool Use or Function Call: Answer Users' questions using the mcp or function call results. 
Please include all the result information such as the weather forecast for all 3 days, etc.\n - Citation and Data Sources: If the results contains URL of sources and citations, remember to include the main text in plan htmlp a tag, such as {citation_name}\n\n **Output Format**:\n - The output consists of multiple sections. Each section begins with a content type field describing the content-type of the paragraph and then the main content.\n\n **Example**:\n **User Query**:\n Please generate a python code of adding two numbers\n **Output**:\n application/code\n ```\n def add(a, b):\n return a + b\n ```\n text/markdown\n ### Description\n This function takes two numbers and output the sum.\n" }, { "role": "user", "content": " \n **Context Variables**\n Datetime: 2025-08-19 10:08:17 CST (Asia/Shanghai)\n User Location: \n **User Query**\n ACCEPT\n" }], "reward": 1.0, "reward_description": "ACCEPT" }
```

**Negative Reward Example**

The user is not satisfied with the LLM's choice of "firecrawl_search" and its results, and clicks the Downvote button.

```
{ "messages": [{ "role": "user", "content": "Latest AI Company Research News" }, { "role": "assistant", "content": "

Finding Relevant Tools Using Queries:

company

latest

AI news

artificial intelligence

technology trends

research updates

Relevant MCP Servers:

Relevant Tools:

firecrawl_search

firecrawl_extract

firecrawl_deep_research

get_pricing

generate_cost_report

get_pricing_service_codes

get_pricing_attribute_values

get_price_list_urls

sequentialthinking

get_notification_details

get_stock_price_global_market

tavily_search

ExplainCDKNagRule

CheckCDKNagSuppressions

SearchGenAICDKConstructs

show_subscription_plan_details

show_subscription_details

get_order

accept_dispute_claim

get_refund

| \n

\n \u25b6\n Call Tool firecrawl_search of Server firecrawl-mcp\n

Parameters

{\n \"country\": \"us\",\n \"lang\": \"en\",\n \"limit\": 5,\n \"query\": \"latest AI company research news\",\n \"scrapeOptions\": {\n \"formats\": [\n \"markdown\"\n ],\n \"onlyMainContent\": true\n }\n}

\n \n \n

Results

\n\n|Searching MCP Marketplace Tools and Servers", "id": "33ea10a7-c66a-48cd-80d3-ac9ace6ae747", "context": "{\"tool_id\": \"call_30fa90e8ecb944bd8613b3\", \"tool_name\": \"firecrawl_search\", \"tool_arguments\": {\"query\": \"latest AI company research news\", \"limit\": 5, \"lang\": \"en\", \"country\": \"us\", \"scrapeOptions\": {\"formats\": [\"markdown\"], \"onlyMainContent\": true}}, \"toolname_to_server_dict\": {\"firecrawl_search\": [\"firecrawl-mcp\"], \"firecrawl_extract\": [\"firecrawl-mcp\"], \"firecrawl_deep_research\": [\"firecrawl-mcp\"], \"get_pricing\": [\"awslabs.aws-pricing-mcp-server\"], \"generate_cost_report\": [\"awslabs.aws-pricing-mcp-server\"], \"get_pricing_service_codes\": [\"awslabs.aws-pricing-mcp-server\"], \"get_pricing_attribute_values\": [\"awslabs.aws-pricing-mcp-server\"], \"get_price_list_urls\": [\"awslabs.aws-pricing-mcp-server\"], \"sequentialthinking\": [\"sequential-thinking\"], \"get_notification_details\": [\"github\"], \"get_stock_price_global_market\": [\"finance-agent-mcp-server\"], \"tavily_search\": [\"tavily-remote-mcp\"], \"ExplainCDKNagRule\": [\"awslabs.cdk-mcp-server\"], \"CheckCDKNagSuppressions\": [\"awslabs.cdk-mcp-server\"], \"SearchGenAICDKConstructs\": [\"awslabs.cdk-mcp-server\"], \"show_subscription_plan_details\": [\"paypal\"], \"show_subscription_details\": [\"paypal\"], \"get_order\": [\"paypal\"], \"accept_dispute_claim\": [\"paypal\"], \"get_refund\": [\"paypal\"]}}" }, { "role": "user", "content": "ACCEPT", "reward": 1.0 }, { "role": "assistant", "content": "", "tool_calls": [{ "id": "call_31924021412031204fsdf", "type": "tool_use", "function": { "name": "firecrawl_search", "arguments": "{\"query\": \"AI Research News\"}" } }] }, { "role": "tool", "tool_call_id": "call_31924021412031204fsdf", "name": "firecrawl_search", "content": "\nThe latest AI company research news highlights significant advancements and strategic moves by leading tech firms. 
Here are some key updates:\n\n- **Google DeepMind** introduced **AlphaFold 3**, a revolutionary model capable of predicting the structure and interactions of all life\u2019s molecules, including proteins, DNA, and RNA, with unprecedented accuracy. This breakthrough could transform drug discovery and molecular biology research. [Read more](https://www.deepmind.com/)\n\n- **OpenAI** is advancing its **o3** and **o4** reasoning models, focusing on improved planning, tool use, and problem-solving capabilities. The company is also reportedly developing a new AI model named **Strawberry**, aimed at enhancing deep reasoning and autonomous task execution. [Source](https://openai.com/research)\n\n- **Anthropic** released **Claude 3.5 Sonnet**, which outperforms its predecessors and competing models in coding, math, and reasoning benchmarks. The model is now being integrated into enterprise workflows for enhanced decision support and automation. [Learn more](https://www.anthropic.com/news/claude-3-5-sonnet)\n\n- **Microsoft** and **NVIDIA** have deepened their AI partnership, co-developing next-gen AI chips and cloud infrastructure to support large-scale AI training. Microsoft is also integrating more AI capabilities into its Azure platform, including custom LLM deployments. [Microsoft AI News](https://news.microsoft.com/ai)\n\n- **xAI**, Elon Musk\u2019s AI company, continues training **Grok-3** on real-time data from X (formerly Twitter), aiming to create a more dynamic and context-aware AI assistant. Grok is being tested for advanced reasoning and multi-modal capabilities. [Updates via X](https://x.com/xai)\n\n- **Meta AI** unveiled **Llama 3.1**, a fully open-source large language model with enhanced multilingual support and reasoning abilities. It is now one of the most widely adopted models in the open-source AI community. 
[Meta AI Blog](https://ai.meta.com/llama/)\n\nThese developments reflect a rapid pace of innovation in the AI sector, with companies focusing on more powerful, efficient, and accessible models for both research and commercial applications." }, { "role": "assistant", "content": "Start Running Tool firecrawl_search ", "id": "3cbb4504-f080-4bb7-a9cc-9cfaa3a9756d", "context": "" }, { "role": "assistant", "content": "

Finding Relevant Tools Using Queries: company, latest, AI news, artificial intelligence, technology trends, research updates\nRelevant MCP Servers:\nRelevant Tools: firecrawl_search, firecrawl_extract, firecrawl_deep_research, get_pricing, generate_cost_report, get_pricing_service_codes, get_pricing_attribute_values, get_price_list_urls, sequentialthinking, get_notification_details, get_stock_price_global_market, tavily_search, ExplainCDKNagRule, CheckCDKNagSuppressions, SearchGenAICDKConstructs, show_subscription_plan_details, show_subscription_details, get_order, accept_dispute_claim, get_refund | \n \u25b6\n Call Tool firecrawl_search of Server firecrawl-mcp\nParameters\n \n \nResults
\n\n|Searching MCP Marketplace Tools and Servers", "id": "33ea10a7-c66a-48cd-80d3-ac9ace6ae747", "context": "{\"tool_id\": \"call_30fa90e8ecb944bd8613b3\", \"tool_name\": \"firecrawl_search\", \"tool_arguments\": {\"query\": \"latest AI company research news\", \"limit\": 5, \"lang\": \"en\", \"country\": \"us\", \"scrapeOptions\": {\"formats\": [\"markdown\"], \"onlyMainContent\": true}}, \"toolname_to_server_dict\": {\"firecrawl_search\": [\"firecrawl-mcp\"], \"firecrawl_extract\": [\"firecrawl-mcp\"], \"firecrawl_deep_research\": [\"firecrawl-mcp\"], \"get_pricing\": [\"awslabs.aws-pricing-mcp-server\"], \"generate_cost_report\": [\"awslabs.aws-pricing-mcp-server\"], \"get_pricing_service_codes\": [\"awslabs.aws-pricing-mcp-server\"], \"get_pricing_attribute_values\": [\"awslabs.aws-pricing-mcp-server\"], \"get_price_list_urls\": [\"awslabs.aws-pricing-mcp-server\"], \"sequentialthinking\": [\"sequential-thinking\"], \"get_notification_details\": [\"github\"], \"get_stock_price_global_market\": [\"finance-agent-mcp-server\"], \"tavily_search\": [\"tavily-remote-mcp\"], \"ExplainCDKNagRule\": [\"awslabs.cdk-mcp-server\"], \"CheckCDKNagSuppressions\": [\"awslabs.cdk-mcp-server\"], \"SearchGenAICDKConstructs\": [\"awslabs.cdk-mcp-server\"], \"show_subscription_plan_details\": [\"paypal\"], \"show_subscription_details\": [\"paypal\"], \"get_order\": [\"paypal\"], \"accept_dispute_claim\": [\"paypal\"], \"get_refund\": [\"paypal\"]}}" }], "reward": -1.0, "reward_description": "DOWNVOTE" }
```

### Running SFT/RL Post-Training Using the Dataset

See the details on [GitHub](https://github.com/AI-Hub-Admin/tutorial_model_training_sft_rl).

SFT on Qwen3/OpenAI-format function call datasets: [Full Example](https://github.com/AI-Hub-Admin/tutorial_agent_training_sft_rl/blob/main/scripts/train_function_call_sft_qwen3.py)

```
mkdir -p ../train/dataset/function_call
cd ../train/dataset/function_call
wget https://static.aiagenta2z.com/scripts/doc/file/06766e91894147319ffd0116b04ff94d/example_deepnlp_agent_function_call_202510.json
cd ../../scripts
python train_function_call_sft_qwen3.py
```

Load the TRL package and the LoRA config:

```
def sft_trainer(dataset_path, model_path,
                output_sft_model_dir="./output/sft_checkpoints",
                output_model_dir="./output/sft_model"):
    """
        pip install -U transformers huggingface_hub
        pip install --upgrade torch torchvision torchaudio

        dataset_path="../train/function_calling_dataset_qwen"

        The two output directories are assumed defaults; adjust them to your layout.
    """
    #### Qwen3 SFT Model
    import os

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM
    from trl import SFTTrainer, SFTConfig
    from datasets import load_from_disk
    from peft import LoraConfig

    model_id = "Qwen/Qwen3-0.6B"
    load_from_cache = True
    run_on_mac = True

    if load_from_cache:
        # Assumes the tokenizer is stored alongside the cached model in model_path
        tokenizer = AutoTokenizer.from_pretrained(model_path)
        tokenizer.pad_token = tokenizer.eos_token
        if run_on_mac:
            model = AutoModelForCausalLM.from_pretrained(
                model_path,
                torch_dtype=torch.float32,
                trust_remote_code=True
            )
            model.to("mps")
        else:
            model = AutoModelForCausalLM.from_pretrained(model_path)
    else:
        # 1. Load Tokenizer and Model
        tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
        tokenizer.pad_token = tokenizer.eos_token
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            device_map="auto",
            # torch_dtype=torch.bfloat16,
            trust_remote_code=True
        )

    # 2. LoRA Configuration (Saves memory and prevents forgetting)
    # Defined here but only used if peft_config is passed to SFTTrainer below
    peft_config = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )

    if run_on_mac:
        # Disable mixed precision, which is not supported on the MPS backend
        os.environ["ACCELERATE_DISABLE_MIXED_PRECISION"] = "true"
        os.environ["ACCELERATE_MIXED_PRECISION"] = "no"
        os.environ["ACCELERATE_CONFIG_FILE"] = "/dev/null"

    # 3. Training Arguments (tested on Mac; change bf16/fp16 when running on GPU)
    sft_config = SFTConfig(
        output_dir=output_sft_model_dir,
        dataset_text_field="messages",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=1e-4,
        num_train_epochs=3,
        bf16=False,
        fp16=False,
        dataloader_pin_memory=False,
    )

    # 4. Trainer
    dataset = load_from_disk(dataset_path)
    trainer = SFTTrainer(
        model=model,
        train_dataset=dataset,
        # peft_config=peft_config,
        args=sft_config,
        processing_class=tokenizer,
    )

    # 5. Execute
    trainer.train()
    trainer.save_model(output_model_dir)
```
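Before training, it can help to inspect how a record encodes tool calls and feedback. In the example record above, each message carries a `context` field holding a JSON-encoded string with `tool_name`, `tool_arguments` and `toolname_to_server_dict`, and the record ends with a scalar `reward` plus a `reward_description`. The snippet below is a minimal sketch of reading that structure using only the standard library. It assumes the message list sits under a top-level `messages` key, which the excerpt does not confirm. The sample record, the user query and the helper names `extract_tool_calls` and `split_by_reward` are hypothetical and exist only for illustration.

```python
import json

# Hypothetical record mirroring the shape of the example above.
# The top-level "messages" key and the user query are assumptions.
record = {
    "messages": [
        {"role": "user", "content": "What is the latest AI company news?", "id": "u1", "context": ""},
        {"role": "assistant", "content": "Start Running Tool firecrawl_search", "id": "a1", "context": ""},
        {
            "role": "assistant",
            "content": "Searching MCP Marketplace Tools and Servers",
            "id": "a2",
            # In the dataset, context is a JSON-encoded string, not a nested object.
            "context": json.dumps({
                "tool_id": "call_30fa90e8ecb944bd8613b3",
                "tool_name": "firecrawl_search",
                "tool_arguments": {"query": "latest AI company research news", "limit": 5},
                "toolname_to_server_dict": {"firecrawl_search": ["firecrawl-mcp"]},
            }),
        },
    ],
    "reward": -1.0,
    "reward_description": "DOWNVOTE",
}


def extract_tool_calls(rec):
    """Return (server, tool_name, arguments) for every message whose context holds a tool call."""
    calls = []
    for message in rec.get("messages", []):
        raw = message.get("context") or ""
        if not raw:
            continue
        try:
            ctx = json.loads(raw)
        except json.JSONDecodeError:
            continue  # context is present but not valid JSON; skip it
        if not isinstance(ctx, dict) or "tool_name" not in ctx:
            continue
        name = ctx["tool_name"]
        servers = ctx.get("toolname_to_server_dict", {}).get(name, [])
        calls.append((servers[0] if servers else None, name, ctx.get("tool_arguments", {})))
    return calls


def split_by_reward(records):
    """Partition records into (positive, negative) lists by the sign of their reward."""
    positive = [r for r in records if r.get("reward", 0.0) > 0]
    negative = [r for r in records if r.get("reward", 0.0) < 0]
    return positive, negative


calls = extract_tool_calls(record)
positive, negative = split_by_reward([record])
print(calls)
print(len(positive), len(negative))
```

A split like this is one plausible way to keep upvoted dialogues for SFT while reserving downvoted ones as negative examples for preference-based or RL training; whether the tutorial scripts do this is not stated in the text.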