Open Agent RL Dataset: High Quality AI Agent | Tool Use & Function Calls | Reinforcement Learning Datasets

The DeepNLP website provides high-quality, genuine Agent and RL datasets built from online users' requests. They are meant to help foundation-model training, SFT, and post-training produce models that are more capable at function calling, tool use, and planning. The datasets are collected and sampled from users' requests on our clients (Web/App/Mini App), the Open OneKey Agent Router, and the Open OneKey MCP Router. Some datasets cost credits to download. You can earn more credits through activities such as commenting, joining discussions, and uploading your own datasets to the community (https://www.deepnlp.org/workspace/billing).

Visit the Dataset tab of our AI Store to select a dataset.

Disclaimer: Privacy-sensitive and personalized information is marked and filtered out.

Figure: Function Call Plot Charts

Figure: Multi-Agent/MCP Parallel Running

Dataset Features

Genuine Users' Queries: Most of the high-quality datasets are collected from the query logs of our live AI agents, such as the MCP Tool Use Agent, the Open OneKey Agent Router, and the Open OneKey MCP Router.

Function Call and MCP Server Support: The datasets cover a wide range of MCP servers from the Open MCP Marketplace and Playgrounds.

User Actions and Human Feedback: Users' actual feedback is crucial for improving AI agent training. We collect users' genuine actions, such as ACCEPT/REJECT when confirming function call results, Upvote/Downvote on final responses, and feedback on many other clickable elements.

Various Domains and Tasks: We cover 40+ categories of AI agent tool-use scenarios, ranging from information seeking (AI search, map search, etc.) to autonomous browser use, computer use, data analysis, and Excel spreadsheet and PowerPoint creation.

Example AI Agent Dataset Dialogues

Domain | Related MCP Servers | Demo
Office File Agent | Excel Spreadsheet, PowerPoint, PDF, etc. | Example MCP
AI Search/Deep Research | Bing/Google Custom/Perplexity/Tavily/Firecrawl | Demo MCP
Map Trip Planning | GoogleMap, Amap (Gaode), BaiduMap, etc. | Example MCP / Use MCP
Browser Usage | Playwright, Puppeteer, etc. | Demo MCP
Chart, Graph, Image | Gemini NanoBanana3, everart, mcp-server-charts (AntV), canva-mcp, bing-image-search-mcp | Example MCP
Multi-Agent Parallel Running | Deep Research/Trip Planning and more | Example

Dataset Introduction

We provide three main types of AI agent datasets. They are stored as lists of messages in JSON format, together with scalar data such as rewards.

Dataset Name | Description | User Feedback | Example Download | Full Dataset Download
Tool Use Multi-Turn Dialogue | Multi-turn tool-use dialogues in list-of-messages format. Useful for AI Search/Deep Research/Map/Financial Data, etc. | Yes | 50 instances, Download | 1k, Download
Function Calling Tool Use | Input: messages and available tools. Output: the chosen tool_call, indicating which tool to use and its arguments. Collected from calls to SOTA LLMs such as GPT, the OpenAI o-series, Claude, Qwen, Kimi, etc. | No | 50 instances, Download | 1k, Download
Reinforcement Learning | Sessions of multi-turn user/assistant dialogue, with rewards from users' feedback in the session, such as confirmation clicks (Accept/Reject) and Upvote/Downvote on responses. | Yes | 50 instances, Download | 1k, Download

Using the Datasets: Training Scripts

How do you use these datasets? We also provide SFT/RL training demo scripts built on packages such as trl and verl. You can find them on GitHub.
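The published demo scripts are the reference for training. As a hedged starting point, the sketch below loads a downloaded file and groups records by session_id. It assumes the file is either a single JSON array or JSON Lines (one record per line), because the actual download format is not stated here. load_records and group_by_session are hypothetical helper names and are not part of the demo scripts.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_records(path: str) -> List[Dict[str, Any]]:
    """Load records from a file holding either a JSON array or JSON Lines.

    The on-disk layout is an assumption; both shapes are accepted.
    """
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        return []
    if text.startswith("["):
        # One JSON array containing every record.
        return json.loads(text)
    # Otherwise treat it as JSON Lines: one record per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def group_by_session(records: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group records by session_id; each user input in a session has its own trace_id."""
    sessions: Dict[str, List[Dict[str, Any]]] = {}
    for rec in records:
        sessions.setdefault(rec.get("session_id", ""), []).append(rec)
    return sessions
```

Grouping by session keeps all traces of one dialogue together, which matters when splitting train and eval sets so that one session does not leak across both.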

Dataset 1 Tool Use Multi-Turn Dialogue Dataset

Dataset Description

KEY | Type | Description
trace_id | String | Identifies each unique user request or API call
session_id | String | Identifies each dialogue; a dialogue consists of multiple turns, and every user input produces a new trace_id
messages | List of JSON objects | Dialogue messages

This data instance shows a multi-turn dialogue in which the user calls the Google Maps get_weather tool to get the current weather in San Francisco. The dialogue contains the following types of messages:

User: query, the original question the user asks.
User: available_tools, the list of JSON tool definitions the user provides to the LLM.
Assistant: message with content.type='tool_use', in which the LLM outputs which tool to use and its parameters.
User: message with content.type='tool_result', the actual result of running the function call.
[
        {
            "role": "user",
            "content": "What is the weather like in San Francisco?"
        },
        {
            "role": "assistant",
            "content": [
                {
                    "type": "text",
                    "text": "I need to use get_weather, and the user wants SF, which is likely San Francisco, CA."
                },
                {
                    "type": "tool_use",
                    "id": "toolu_01A09q90qw90lq917835lq9",
                    "name": "get_weather",
                    "input": {
                        "location": "San Francisco, CA",
                        "unit": "celsius"
                    }
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
                    "content": "15 degrees"
                }
            ]
        }
]
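To turn a dialogue like the one above into (call, result) pairs, match each assistant tool_use block to the user tool_result whose tool_use_id equals the block's id. The following is a minimal sketch, assuming the Anthropic-style content blocks shown above; pair_tool_calls is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def pair_tool_calls(messages: List[Dict[str, Any]]) -> List[Tuple[str, Dict[str, Any], Any]]:
    """Match assistant tool_use blocks with the user tool_result that answers them.

    Returns (tool_name, tool_input, result_content) tuples in call order;
    a call with no matching result gets None as its result.
    """
    calls: Dict[str, Dict[str, Any]] = {}
    order: List[str] = []
    results: Dict[str, Any] = {}
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            continue  # plain-string content (e.g. the first user query) has no blocks
        for block in content:
            if block.get("type") == "tool_use" and msg.get("role") == "assistant":
                calls[block["id"]] = block
                order.append(block["id"])
            elif block.get("type") == "tool_result" and msg.get("role") == "user":
                results[block["tool_use_id"]] = block.get("content")
    return [(calls[i]["name"], calls[i]["input"], results.get(i)) for i in order]
```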

Note that function calls come in different formats depending on the model being called. We mainly collect data in the OpenAI and Anthropic function-calling formats. Both are supported, and the official documentation describes how they differ.

Multi-modal inputs and files are also attached: images and raw file descriptions, such as file paths, are included as context variables.

Figure: Excel Spreadsheet Usage

OpenAI/Qwen/etc. Function Call Format

{
	"tool_call": {
		"id": "call_d6f4ed29ce614390b99a05",
		"function": {
			"arguments": "{\"url\": \"https://www.stackoverflow.com\", \"browserType\": \"chromium\"}",
			"name": "playwright_navigate"
		},
		"type": "function",
		"index": 0
	}
}

Anthropic Tool Use Format

{
	"type": "tool_use",
	"id": "toolu_01A09q90qw90lq917835lq9",
	"name": "get_weather",
	"input": {
		"location": "San Francisco, CA",
		"unit": "celsius"
	}
}
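The main practical difference between the two formats is where the arguments are stored. The OpenAI format stores them as a JSON-encoded string in function.arguments, while the Anthropic format stores a JSON object in input. The sketch below converts between the two shapes shown above. It is an illustration under that assumption, not an official converter. It keeps the original id unchanged and ignores provider-specific fields beyond those shown. The function names are hypothetical.

```python
import json
from typing import Any, Dict


def anthropic_to_openai(block: Dict[str, Any], index: int = 0) -> Dict[str, Any]:
    """Convert an Anthropic tool_use block into the OpenAI-style tool_call shape above."""
    if block.get("type") != "tool_use":
        raise ValueError("expected a tool_use block")
    return {
        "tool_call": {
            "id": block["id"],  # id is kept as-is; prefixes differ by provider
            "function": {
                # OpenAI-style arguments are a JSON string, not an object.
                "arguments": json.dumps(block["input"], ensure_ascii=False),
                "name": block["name"],
            },
            "type": "function",
            "index": index,
        }
    }


def openai_to_anthropic(tool_call: Dict[str, Any]) -> Dict[str, Any]:
    """Inverse direction: parse the arguments string back into an input object."""
    fn = tool_call["function"]
    return {
        "type": "tool_use",
        "id": tool_call["id"],
        "name": fn["name"],
        "input": json.loads(fn["arguments"] or "{}"),
    }
```

Round-tripping a record through both functions is a cheap way to check that the arguments strings in a downloaded file are valid JSON.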

Dataset 2 Function Calling Tool Use Dataset

The function calling tool use dataset contains the list of messages (the user's original query), the available tools (a list of JSON tool schemas), and the tool calls produced by SOTA LLMs. For example, the LLM chooses the puppeteer_navigate tool from the 7 tools of the puppeteer MCP server and sets the parameters to {"url": "https://www.baidu.com"}.

Dataset Description

KEY | Description
trace_id | Identifies each unique user request or API call
session_id | Identifies each dialogue; a dialogue consists of multiple turns, and every user input produces a new trace_id
data | The JSON object holding function_calls, a list with one entry per function call
data.function_calls.messages | List of messages in {"role": "", "content": ""} format
data.function_calls.tools | List of available tools, including each tool's name, description, and parameter schema
data.function_calls.tool_calls | Dict with the tool call id, function (name and arguments), and type
tool_calls_results | Note: tool call results are the actual outputs of running the tools. They can be found in the dialogue messages of Dataset 1, but are not needed to train function-calling models.

Format


{
  "model": "claude-opus-4",
  "session_id": "37e83ca0-d0d4-4204-b3e6-95be51bf57a6",
  "trace_id": "5c6a8e17-e121-4786-955d-0a716ed733ae",
  "data": {
    "function_calls": [{
      "messages": [],
      "tools": [],
      "tool_calls": {
        "id": "",
        "function": {"arguments": "", "name": ""},
        "type": ""
      }
    },
    {
      "messages": [],
      "tools": [],
      "tool_calls": {
        "id": "",
        "function": {"arguments": "", "name": ""},
        "type": ""
      }
    }
    ]
  }
}
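Because function_calls is a list and each arguments field is a JSON string, a quick structural check can catch malformed records before training. The validator below follows the key table and the full example in this section, which nest function_calls under data. validate_function_call_record is a hypothetical helper, and the checks reflect an assumption about what a training pipeline needs. They are not an official schema.

```python
import json
from typing import Any, Dict, List


def validate_function_call_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems in a Dataset 2 record (an empty list means it looks valid)."""
    problems: List[str] = []
    for key in ("trace_id", "session_id", "model"):
        if not isinstance(record.get(key), str):
            problems.append(f"missing or non-string {key}")
    calls = (record.get("data") or {}).get("function_calls")
    if not isinstance(calls, list):
        return problems + ["data.function_calls must be a list"]
    for i, call in enumerate(calls):
        if not isinstance(call.get("messages"), list):
            problems.append(f"function_calls[{i}].messages must be a list")
        if not isinstance(call.get("tools"), list):
            problems.append(f"function_calls[{i}].tools must be a list")
        tc = call.get("tool_calls")
        fn = tc.get("function", {}) if isinstance(tc, dict) else {}
        try:
            # arguments must be a JSON-encoded string, e.g. "{\"url\": \"...\"}"
            json.loads(fn.get("arguments", ""))
        except (TypeError, json.JSONDecodeError):
            problems.append(f"function_calls[{i}].tool_calls.function.arguments is not a JSON string")
    return problems
```

The empty-string placeholders in the skeleton above are expected to fail the arguments check; real records should pass.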

Data example: Puppeteer navigating to a website and taking a screenshot.

{
	"trace_id": "72d51b6e-8c48-42ee-bfc9-05209d1444c8",
	"session_id": "7ca7e84c-cee5-4ddd-be1e-748df2be481e",
	"model": "claude-opus-4",
	"data": {
		"function_calls": [{
			"messages": [{
				"role": "user",
				"content": "Visit website baidu.com and take a screenshot"
			}, {
				"role": "assistant",
				"content": "",
				"tool_calls": [{
					"id": "call_f02ba9ee6b344dd0af9041",
					"type": "tool_use",
					"function": {
						"name": "puppeteer_navigate",
						"arguments": "{\"url\": \"https://www.baidu.com\"}"
					}
				}]
			}, {
				"role": "tool",
				"tool_call_id": "call_f02ba9ee6b344dd0af9041",
				"name": "puppeteer_navigate",
				"content": ""
			}, {
				"role": "assistant",
				"content": "",
				"tool_calls": [{
					"id": "call_c1e8658beda04f6881e8eb",
					"type": "tool_use",
					"function": {
						"name": "puppeteer_screenshot",
						"arguments": "{\"name\": \"baidu_homepage\"}"
					}
				}]
			}, {
				"role": "tool",
				"tool_call_id": "call_c1e8658beda04f6881e8eb",
				"name": "puppeteer_screenshot",
				"content": ""
			}],
			"tools": [{
				"type": "function",
				"function": {
					"name": "puppeteer_navigate",
					"description": "Navigate to a URL",
					"parameters": {
						"type": "object",
						"properties": {
							"url": {
								"type": "string",
								"description": "URL to navigate to"
							},
							"launchOptions": {
								"type": "object",
								"description": "PuppeteerJS LaunchOptions. Default null. If changed and not null, browser restarts. Example: { headless: true, args: ['--no-sandbox'] }"
							},
							"allowDangerous": {
								"type": "boolean",
								"description": "Allow dangerous LaunchOptions that reduce security. When false, dangerous args like --no-sandbox will throw errors. Default false."
							}
						},
						"required": ["url"]
					}
				}
			}, {
				"type": "function",
				"function": {
					"name": "puppeteer_screenshot",
					"description": "Take a screenshot of the current page or a specific element",
					"parameters": {
						"type": "object",
						"properties": {
							"name": {
								"type": "string",
								"description": "Name for the screenshot"
							},
							"selector": {
								"type": "string",
								"description": "CSS selector for element to screenshot"
							},
							"width": {
								"type": "number",
								"description": "Width in pixels (default: 800)"
							},
							"height": {
								"type": "number",
								"description": "Height in pixels (default: 600)"
							},
							"encoded": {
								"type": "boolean",
								"description": "If true, capture the screenshot as a base64-encoded data URI (as text) instead of binary image content. Default false."
							}
						},
						"required": ["name"]
					}
				}
			}, {
				"type": "function",
				"function": {
					"name": "puppeteer_click",
					"description": "Click an element on the page",
					"parameters": {
						"type": "object",
						"properties": {
							"selector": {
								"type": "string",
								"description": "CSS selector for element to click"
							}
						},
						"required": ["selector"]
					}
				}
			}, {
				"type": "function",
				"function": {
					"name": "puppeteer_fill",
					"description": "Fill out an input field",
					"parameters": {
						"type": "object",
						"properties": {
							"selector": {
								"type": "string",
								"description": "CSS selector for input field"
							},
							"value": {
								"type": "string",
								"description": "Value to fill"
							}
						},
						"required": ["selector", "value"]
					}
				}
			}, {
				"type": "function",
				"function": {
					"name": "puppeteer_select",
					"description": "Select an element on the page with Select tag",
					"parameters": {
						"type": "object",
						"properties": {
							"selector": {
								"type": "string",
								"description": "CSS selector for element to select"
							},
							"value": {
								"type": "string",
								"description": "Value to select"
							}
						},
						"required": ["selector", "value"]
					}
				}
			}, {
				"type": "function",
				"function": {
					"name": "puppeteer_hover",
					"description": "Hover an element on the page",
					"parameters": {
						"type": "object",
						"properties": {
							"selector": {
								"type": "string",
								"description": "CSS selector for element to hover"
							}
						},
						"required": ["selector"]
					}
				}
			}, {
				"type": "function",
				"function": {
					"name": "puppeteer_evaluate",
					"description": "Execute JavaScript in the browser console",
					"parameters": {
						"type": "object",
						"properties": {
							"script": {
								"type": "string",
								"description": "JavaScript code to execute"
							}
						},
						"required": ["script"]
					}
				}
			}],
			"tool_calls": {
				"id": "call_f02ba9ee6b344dd0af9041",
				"function": {
					"arguments": "{\"url\": \"https://www.baidu.com\"}",
					"name": "puppeteer_navigate"
				},
				"type": "function"
			}
		}, {
			"messages": [{
				"role": "user",
				"content": "Visit website baidu.com and take a screenshot"
			}, {
				"role": "assistant",
				"content": "",
				"tool_calls": [{
					"id": "call_f02ba9ee6b344dd0af9041",
					"type": "tool_use",
					"function": {
						"name": "puppeteer_navigate",
						"arguments": "{\"url\": \"https://www.baidu.com\"}"
					}
				}]
			}, {
				"role": "tool",
				"tool_call_id": "call_f02ba9ee6b344dd0af9041",
				"name": "puppeteer_navigate",
				"content": ""
			}, {
				"role": "assistant",
				"content": "",
				"tool_calls": [{
					"id": "call_c1e8658beda04f6881e8eb",
					"type": "tool_use",
					"function": {
						"name": "puppeteer_screenshot",
						"arguments": "{\"name\": \"baidu_homepage\"}"
					}
				}]
			}, {
				"role": "tool",
				"tool_call_id": "call_c1e8658beda04f6881e8eb",
				"name": "puppeteer_screenshot",
				"content": ""
			}],
			"tools": [{
				"type": "function",
				"function": {
					"name": "puppeteer_navigate",
					"description": "Navigate to a URL",
					"parameters": {
						"type": "object",
						"properties": {
							"url": {
								"type": "string",
								"description": "URL to navigate to"
							},
							"launchOptions": {
								"type": "object",
								"description": "PuppeteerJS LaunchOptions. Default null. If changed and not null, browser restarts. Example: { headless: true, args: ['--no-sandbox'] }"
							},
							"allowDangerous": {
								"type": "boolean",
								"description": "Allow dangerous LaunchOptions that reduce security. When false, dangerous args like --no-sandbox will throw errors. Default false."
							}
						},
						"required": ["url"]
					}
				}
			}, {
				"type": "function",
				"function": {
					"name": "puppeteer_screenshot",
					"description": "Take a screenshot of the current page or a specific element",
					"parameters": {
						"type": "object",
						"properties": {
							"name": {
								"type": "string",
								"description": "Name for the screenshot"
							},
							"selector": {
								"type": "string",
								"description": "CSS selector for element to screenshot"
							},
							"width": {
								"type": "number",
								"description": "Width in pixels (default: 800)"
							},
							"height": {
								"type": "number",
								"description": "Height in pixels (default: 600)"
							},
							"encoded": {
								"type": "boolean",
								"description": "If true, capture the screenshot as a base64-encoded data URI (as text) instead of binary image content. Default false."
							}
						},
						"required": ["name"]
					}
				}
			}, {
				"type": "function",
				"function": {
					"name": "puppeteer_click",
					"description": "Click an element on the page",
					"parameters": {
						"type": "object",
						"properties": {
							"selector": {
								"type": "string",
								"description": "CSS selector for element to click"
							}
						},
						"required": ["selector"]
					}
				}
			}, {
				"type": "function",
				"function": {
					"name": "puppeteer_fill",
					"description": "Fill out an input field",
					"parameters": {
						"type": "object",
						"properties": {
							"selector": {
								"type": "string",
								"description": "CSS selector for input field"
							},
							"value": {
								"type": "string",
								"description": "Value to fill"
							}
						},
						"required": ["selector", "value"]
					}
				}
			}, {
				"type": "function",
				"function": {
					"name": "puppeteer_select",
					"description": "Select an element on the page with Select tag",
					"parameters": {
						"type": "object",
						"properties": {
							"selector": {
								"type": "string",
								"description": "CSS selector for element to select"
							},
							"value": {
								"type": "string",
								"description": "Value to select"
							}
						},
						"required": ["selector", "value"]
					}
				}
			}, {
				"type": "function",
				"function": {
					"name": "puppeteer_hover",
					"description": "Hover an element on the page",
					"parameters": {
						"type": "object",
						"properties": {
							"selector": {
								"type": "string",
								"description": "CSS selector for element to hover"
							}
						},
						"required": ["selector"]
					}
				}
			}, {
				"type": "function",
				"function": {
					"name": "puppeteer_evaluate",
					"description": "Execute JavaScript in the browser console",
					"parameters": {
						"type": "object",
						"properties": {
							"script": {
								"type": "string",
								"description": "JavaScript code to execute"
							}
						},
						"required": ["script"]
					}
				}
			}],
			"tool_calls": {
				"id": "call_c1e8658beda04f6881e8eb",
				"function": {
					"arguments": "{\"name\": \"baidu_homepage\"}",
					"name": "puppeteer_screenshot"
				},
				"type": "function"
			}
		}]
	}
}
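In the example above, every function_calls entry carries the full dialogue, including the assistant turn that makes the target call. One way to build SFT pairs is to cut each entry's messages just before that turn, so the prompt does not already contain the answer. The sketch below assumes that truncation is what you want. This is a design choice, not a rule stated by the dataset. to_sft_examples is a hypothetical name.

```python
import json
from typing import Any, Dict, List


def to_sft_examples(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one Dataset 2 record into (prompt messages, tools, target call) examples."""
    examples = []
    for step in record["data"]["function_calls"]:
        target = step["tool_calls"]
        prompt = []
        for msg in step["messages"]:
            ids = [tc.get("id") for tc in (msg.get("tool_calls") or [])]
            if msg.get("role") == "assistant" and target["id"] in ids:
                break  # stop at the turn that produced the target call
            prompt.append(msg)
        examples.append({
            "messages": prompt,
            "tools": step["tools"],
            "target_name": target["function"]["name"],
            # Arguments arrive as a JSON string; decode them for easier comparison.
            "target_arguments": json.loads(target["function"]["arguments"]),
        })
    return examples
```

Applied to the record above, the first example's prompt holds only the user request and the second holds the request plus the completed puppeteer_navigate call and its tool message.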

Dataset 3 Reinforcement Learning

We collect users' positive and negative feedback on the AI agent workflow. Positive feedback includes clicking the ACCEPT button on function call results and clicking the Upvote button. Its reward is set to 1.0, and the "reward_description" field describes the user's action in detail. Negative feedback includes clicking the REJECT button on function call results and clicking the Downvote button. Its reward is set to -1.0.
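The reward rule above is a direct lookup from user action to scalar. Below is a minimal sketch. The <action>...</action> tag format is taken from the user messages in the examples below, while reward_for_action and action_from_message are hypothetical helper names.

```python
import re
from typing import Optional

# Reward values as described above: positive actions map to 1.0, negative to -1.0.
ACTION_REWARDS = {
    "ACCEPT": 1.0,
    "UPVOTE": 1.0,
    "REJECT": -1.0,
    "DOWNVOTE": -1.0,
}

# Matches the action tag used in user messages, e.g. "<action>ACCEPT</action>".
ACTION_TAG = re.compile(r"<action>\s*([A-Za-z]+)\s*</action>")


def reward_for_action(action: str) -> float:
    """Map a user action label (as in reward_description) to a scalar reward."""
    key = action.strip().upper()
    if key not in ACTION_REWARDS:
        raise ValueError(f"unknown action: {action!r}")
    return ACTION_REWARDS[key]


def action_from_message(content: str) -> Optional[str]:
    """Extract the action label from a user message, or None if there is no tag."""
    match = ACTION_TAG.search(content)
    return match.group(1).upper() if match else None
```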

Dataset Description

KEY | Type | Description
trace_id | String | Identifies each unique user request or API call
session_id | String | Identifies each dialogue; a dialogue consists of multiple turns, and every user input produces a new trace_id
messages | List of JSON objects | Dialogue messages
message.reward | Scalar | User feedback on an individual message or function call (tool result level), e.g. ACCEPT/REJECT: 1.0 for positive feedback, -1.0 for negative feedback
message.reward_description | String | Detailed description of the user's action
reward | Scalar | Overall feedback on the whole session from the bottom bar actions: 1.0 for positive feedback, -1.0 for negative feedback
reward_description | String | Detailed description of the action, such as ACCEPT/REJECT, UPVOTE, DOWNVOTE, etc.


Note: rewards are given at two levels. One is the individual message or function call level, and the other is the whole dialogue session level.

Message or function call level: JSON path $.messages[*].reward. For example, users who are not satisfied because the LLM chose the wrong tool may click REJECT, or may give feedback after an auto-run. Dialogue session level: JSON path $.reward, set by the bottom bar buttons (Upvote, Downvote, etc.).

In Auto-Run mode, function calls do not need the user's confirmation and default to ACCEPT.
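Because rewards appear at two levels, a training pipeline typically needs to read them separately: per-message rewards from $.messages[*].reward and the session reward from $.reward. The following sketch assumes that a message without a reward key simply carries no message-level feedback. collect_rewards is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional, Tuple


def collect_rewards(record: Dict[str, Any]) -> Tuple[List[Tuple[int, float]], Optional[float]]:
    """Split a Dataset 3 record into message-level and session-level rewards.

    Returns ([(message_index, reward), ...], session_reward). Messages without
    a "reward" key are skipped; session_reward is None if the record has none.
    """
    step_rewards = [
        (i, float(msg["reward"]))
        for i, msg in enumerate(record.get("messages", []))
        if "reward" in msg
    ]
    session = record.get("reward")
    return step_rewards, (float(session) if session is not None else None)
```

How the two levels are combined (for example, a per-step reward plus a terminal session reward) is left to the training setup.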

Positive Reward Example

In this financial data scenario, the user clicks ACCEPT to confirm the function call of the tool get_stock_price_global_market with parameters {"symbol_list": ["MSFT", "NVDA"], "market": "US"} from the server finance-agent-mcp-server.


{
	"messages": [{
		"role": "user",
		"content": "Find the Microsoft and Nvidia Stock Price and Market Cap. Who has higher market cap today?"
	}, {
		"role": "assistant",
		"content": "<p>Finding Relevant Tools Using Queries: </p><div><div class=\"div_planning_highlight\">find</div> <div class=\"div_planning_highlight\">market cap</div> <div class=\"div_planning_highlight\">compare</div> <div class=\"div_planning_highlight\">stock</div> <div class=\"div_planning_highlight\">and</div> <div class=\"div_planning_highlight\">finance data</div> <div class=\"div_planning_highlight\">market</div></div>\r<p>Relevant MCP Servers: </p><div></div>\r<p>Relevant Tools: </p> <div><div class=\"div_planning_highlight\">GetAwsSolutionsConstructPattern</div><div class=\"div_planning_highlight\">SearchGenAICDKConstructs</div><div class=\"div_planning_highlight\">LambdaLayerDocumentationProvider</div><div class=\"div_planning_highlight\">pay_order</div><div class=\"div_planning_highlight\">create_refund</div><div class=\"div_planning_highlight\">get_refund</div><div class=\"div_planning_highlight\">sequentialthinking</div><div class=\"div_planning_highlight\">get_stock_price_global_market</div><div class=\"div_planning_highlight\">read_documentation</div><div class=\"div_planning_highlight\">search_documentation</div><div class=\"div_planning_highlight\">recommend</div><div class=\"div_planning_highlight\">brave_local_search</div><div class=\"div_planning_highlight\">firecrawl_scrape</div><div class=\"div_planning_highlight\">firecrawl_search</div><div class=\"div_planning_highlight\">firecrawl_extract</div><div class=\"div_planning_highlight\">firecrawl_deep_research</div><div class=\"div_planning_highlight\">list_directory</div><div class=\"div_planning_highlight\">list_directory_with_sizes</div><div class=\"div_planning_highlight\">move_file</div><div class=\"div_planning_highlight\">tavily_map</div></div>| \n    <div class=\"tool-call-container\">\n        <div class=\"tool-call-header\">\n            <span class=\"arrow\">\u25b6</span>\n            <span class=\"header-text\">Call Tool get_stock_price_global_market of Server finance-agent-mcp-server</span>\n        </div>\n        <div class=\"collapsible-content\">\n            <div class=\"div_tool_call_row\"><h3>Parameters</h3><div class=\"div_tool_call_json\"> {\n    \"market\": \"US\",\n    \"symbol_list\": [\n        \"MSFT\",\n        \"NVDA\"\n    ]\n} </div></div>\n            <div class=\"div_tool_call_row div_tool_call_row_options\">\n                <input type=\"button\" class=\"agent-button-base agent-button-highlight\" value=\"ACCEPT\" set_on_click=\"true\">\n                <input type=\"button\" class=\"agent-button-base\" value=\"REJECT\" set_on_click=\"true\">\n            </div>\n            <div class=\"div_tool_call_row results-section\"><h3>Results</h3><div class=\"div_tool_call_json\"></div></div>\n        </div>\n    </div>\n\n|Searching MCP Marketplace Tools and Servers",
		"context": "{\"tool_id\": \"call_45853133e9a84899adcbb3\", \"tool_name\": \"get_stock_price_global_market\", \"tool_arguments\": {\"symbol_list\": [\"MSFT\", \"NVDA\"], \"market\": \"US\"}, \"toolname_to_server_dict\": {\"GetAwsSolutionsConstructPattern\": [\"awslabs.cdk-mcp-server\"], \"SearchGenAICDKConstructs\": [\"awslabs.cdk-mcp-server\"], \"LambdaLayerDocumentationProvider\": [\"awslabs.cdk-mcp-server\"], \"pay_order\": [\"paypal\"], \"create_refund\": [\"paypal\"], \"get_refund\": [\"paypal\"], \"sequentialthinking\": [\"sequential-thinking\"], \"get_stock_price_global_market\": [\"finance-agent-mcp-server\"], \"read_documentation\": [\"awslabs.aws-documentation-mcp-server\"], \"search_documentation\": [\"awslabs.aws-documentation-mcp-server\"], \"recommend\": [\"awslabs.aws-documentation-mcp-server\"], \"brave_local_search\": [\"brave-search\"], \"firecrawl_scrape\": [\"firecrawl-mcp\"], \"firecrawl_search\": [\"firecrawl-mcp\"], \"firecrawl_extract\": [\"firecrawl-mcp\"], \"firecrawl_deep_research\": [\"firecrawl-mcp\"], \"list_directory\": [\"filesystem\"], \"list_directory_with_sizes\": [\"filesystem\"], \"move_file\": [\"filesystem\"], \"tavily_map\": [\"tavily-remote-mcp\"]}}"
	}, {
		"role": "user",
		"content": "<action>ACCEPT</action>"
	}, {
		"role": "assistant",
		"content": "",
		"tool_calls": [{
			"id": "call_45853133e9a84899adcbb3",
			"type": "tool_use",
			"function": {
				"name": "get_stock_price_global_market",
				"arguments": "{\"symbol_list\": [\"MSFT\", \"NVDA\"], \"market\": \"US\"}"
			}
		}]
	}, {
		"role": "tool",
		"tool_call_id": "call_45853133e9a84899adcbb3",
		"name": "get_stock_price_global_market",
		"content": "[{\"avg_price\": \"517.10 USD\", \"high\": \"522.82 USD\", \"low\": \"514.02 USD\", \"symbol\": \"MSFT\", \"update_time\": \"\", \"previous_close\": \"520.17 USD\", \"change\": \"-3.07\", \"market_capitalization\": \"3.84T USD\", \"pe_ratio\": \"37.98\", \"data_source\": \"morningstar.com\", \"source_url\": \"https://www.morningstar.com/stocks/xnas/msft/quote\"}, {\"avg_price\": \"182.01 USD\", \"high\": \"182.94 USD\", \"low\": \"180.59 USD\", \"symbol\": \"NVDA\", \"update_time\": \"\", \"previous_close\": \"180.45 USD\", \"change\": \"+1.56\", \"market_capitalization\": \"4.44T USD\", \"pe_ratio\": \"57.04\", \"data_source\": \"morningstar.com\", \"source_url\": \"https://www.morningstar.com/stocks/xnas/nvda/quote\"}]"
	}, {
		"role": "system",
		"content": "\n    \n    **Background**\n    You are an AI Agent expert good at generating answers given users' original query and related context information, such as web pages search result, tool function call results, etc.\n\n    **Tasks**:\n    - Analyse User's query and consider current date, current time and users' location (country, city, district) if provided\n    - Identify the language of user's original query and generate the answers using the same language, such as English, Chinese, French, Russion, etc.\n    - Identify the expected output content-type of the answers, choose from \"text/markdown\", \"text/html\", \"application/code\", with default to \"text/html\". If users want you to generate a code with explanation, use mixed content type for different sections, \"application/code\" and  \"text/markdown\".\n    - Output detailed response to answer users' original question directly, without the thinking process. Please include the necessary information.\n    - Coding or Math: If users' query is to write codes or math formula, please generate the code in the first place. Secondly, generate a few points describing the meanings of the main functions or changes.\n    - MCP Tool Use or Function Call: Answer Users' questions using the mcp or function call results. Please include all the result information such as the weather forecast for all 3 days, etc.\n    - Citation and Data Sources: If the results contains URL of sources and citations, remember to include the main text in plan htmlp a tag, such as <a href=\"{url}\" class=\"citation_url\">{citation_name}</a>\n\n    **Output Format**:\n    - The output consists of multiple sections. Each section begins with a content type field describing the content-type of the paragraph and then the main content.\n\n    **Example**:\n    **User Query**:\n    Please generate a python code of adding two numbers\n    **Output**:\n    <content-type>application/code</content-type>\n    ```\n    def add(a, b):\n        return a + b\n    ```\n    <content-type>text/markdown</content-type>\n    ### Description\n    This function takes two numbers and output the sum.\n"
	}, {
		"role": "user",
		"content": " \n    **Context Variables**\n    Datetime: 2025-08-19 10:08:17 CST (Asia/Shanghai)\n    User Location: \n    **User Query**\n    <action>ACCEPT</action>\n"
	}],
	"reward": 1.0,
	"reward_description": "ACCEPT"
}
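The assistant's context field is itself a JSON-encoded string. It records the proposed tool_id, tool_name, and tool_arguments, plus a toolname_to_server_dict that maps each candidate tool to its MCP server(s). The sketch below decodes it, which is useful for labeling which server a rewarded call came from. parse_tool_context is a hypothetical helper, and the returned keys are an illustrative choice.

```python
import json
from typing import Any, Dict, List


def parse_tool_context(message: Dict[str, Any]) -> Dict[str, Any]:
    """Decode the JSON-string "context" field of an assistant planning message.

    Returns the proposed tool, its arguments, the MCP server(s) exposing it,
    and the sorted list of candidate tools. Messages without context return {}.
    """
    raw = message.get("context")
    if not raw:
        return {}
    ctx = json.loads(raw)
    name = ctx.get("tool_name")
    mapping: Dict[str, List[str]] = ctx.get("toolname_to_server_dict", {})
    return {
        "tool_id": ctx.get("tool_id"),
        "tool_name": name,
        "tool_arguments": ctx.get("tool_arguments", {}),
        "servers": mapping.get(name, []),
        "candidate_tools": sorted(mapping),
    }
```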

Negative Reward Example

The user is not satisfied with the LLM's choice of firecrawl_search and its results, and clicks the Downvote button.


{
	"messages": [{
		"role": "user",
		"content": "Latest AI Company Research News"
	}, {
		"role": "assistant",
		"content": "<p>Finding Relevant Tools Using Queries: </p><div><div class=\"div_planning_highlight\">company</div> <div class=\"div_planning_highlight\">latest</div> <div class=\"div_planning_highlight\">AI news</div> <div class=\"div_planning_highlight\">ai</div> <div class=\"div_planning_highlight\">artificial intelligence</div> <div class=\"div_planning_highlight\">technology trends</div> <div class=\"div_planning_highlight\">research updates</div></div>\r<p>Relevant MCP Servers: </p><div></div>\r<p>Relevant Tools: </p> <div><div class=\"div_planning_highlight\">firecrawl_search</div><div class=\"div_planning_highlight\">firecrawl_extract</div><div class=\"div_planning_highlight\">firecrawl_deep_research</div><div class=\"div_planning_highlight\">get_pricing</div><div class=\"div_planning_highlight\">generate_cost_report</div><div class=\"div_planning_highlight\">get_pricing_service_codes</div><div class=\"div_planning_highlight\">get_pricing_attribute_values</div><div class=\"div_planning_highlight\">get_price_list_urls</div><div class=\"div_planning_highlight\">sequentialthinking</div><div class=\"div_planning_highlight\">get_notification_details</div><div class=\"div_planning_highlight\">get_stock_price_global_market</div><div class=\"div_planning_highlight\">tavily_search</div><div class=\"div_planning_highlight\">ExplainCDKNagRule</div><div class=\"div_planning_highlight\">CheckCDKNagSuppressions</div><div class=\"div_planning_highlight\">SearchGenAICDKConstructs</div><div class=\"div_planning_highlight\">show_subscription_plan_details</div><div class=\"div_planning_highlight\">show_subscription_details</div><div class=\"div_planning_highlight\">get_order</div><div class=\"div_planning_highlight\">accept_dispute_claim</div><div class=\"div_planning_highlight\">get_refund</div></div>| \n    <div class=\"tool-call-container\">\n        <div class=\"tool-call-header\">\n            <span class=\"arrow\">\u25b6</span>\n            <span class=\"header-text\">Call Tool firecrawl_search of Server firecrawl-mcp</span>\n        </div>\n        <div class=\"collapsible-content\">\n            <div class=\"div_tool_call_row\"><h3>Parameters</h3><div class=\"div_tool_call_json\"> {\n    \"country\": \"us\",\n    \"lang\": \"en\",\n    \"limit\": 5,\n    \"query\": \"latest AI company research news\",\n    \"scrapeOptions\": {\n        \"formats\": [\n            \"markdown\"\n        ],\n        \"onlyMainContent\": true\n    }\n} </div></div>\n            <div class=\"div_tool_call_row div_tool_call_row_options\">\n                <input type=\"button\" class=\"agent-button-base agent-button-highlight\" value=\"ACCEPT\" set_on_click=\"true\">\n                <input type=\"button\" class=\"agent-button-base\" value=\"REJECT\" set_on_click=\"true\">\n            </div>\n            <div class=\"div_tool_call_row results-section\"><h3>Results</h3><div class=\"div_tool_call_json\"></div></div>\n        </div>\n    </div>\n\n|Searching MCP Marketplace Tools and Servers",
		"id": "33ea10a7-c66a-48cd-80d3-ac9ace6ae747",
		"context": "{\"tool_id\": \"call_30fa90e8ecb944bd8613b3\", \"tool_name\": \"firecrawl_search\", \"tool_arguments\": {\"query\": \"latest AI company research news\", \"limit\": 5, \"lang\": \"en\", \"country\": \"us\", \"scrapeOptions\": {\"formats\": [\"markdown\"], \"onlyMainContent\": true}}, \"toolname_to_server_dict\": {\"firecrawl_search\": [\"firecrawl-mcp\"], \"firecrawl_extract\": [\"firecrawl-mcp\"], \"firecrawl_deep_research\": [\"firecrawl-mcp\"], \"get_pricing\": [\"awslabs.aws-pricing-mcp-server\"], \"generate_cost_report\": [\"awslabs.aws-pricing-mcp-server\"], \"get_pricing_service_codes\": [\"awslabs.aws-pricing-mcp-server\"], \"get_pricing_attribute_values\": [\"awslabs.aws-pricing-mcp-server\"], \"get_price_list_urls\": [\"awslabs.aws-pricing-mcp-server\"], \"sequentialthinking\": [\"sequential-thinking\"], \"get_notification_details\": [\"github\"], \"get_stock_price_global_market\": [\"finance-agent-mcp-server\"], \"tavily_search\": [\"tavily-remote-mcp\"], \"ExplainCDKNagRule\": [\"awslabs.cdk-mcp-server\"], \"CheckCDKNagSuppressions\": [\"awslabs.cdk-mcp-server\"], \"SearchGenAICDKConstructs\": [\"awslabs.cdk-mcp-server\"], \"show_subscription_plan_details\": [\"paypal\"], \"show_subscription_details\": [\"paypal\"], \"get_order\": [\"paypal\"], \"accept_dispute_claim\": [\"paypal\"], \"get_refund\": [\"paypal\"]}}"
	}, {
		"role": "user",
		"content": "<action>ACCEPT</action>",
		"reward": 1.0
	}, {
		"role": "assistant",
		"content": "",
		"tool_calls": [{
			"id": "call_31924021412031204fsdf",
			"type": "tool_use",
			"function": {
				"name": "firecrawl_search",
				"arguments": "{\"query\": \"AI Research News\"}"
			}
		}]
	}, {
		"role": "tool",
		"tool_call_id": "call_31924021412031204fsdf",
		"name": "firecrawl_search",
		"content": "\nThe latest AI company research news highlights significant advancements and strategic moves by leading tech firms. Here are some key updates:\n\n- **Google DeepMind** introduced **AlphaFold 3**, a revolutionary model capable of predicting the structure and interactions of all life\u2019s molecules, including proteins, DNA, and RNA, with unprecedented accuracy. This breakthrough could transform drug discovery and molecular biology research. [Read more](https://www.deepmind.com/)\n\n- **OpenAI** is advancing its **o3** and **o4** reasoning models, focusing on improved planning, tool use, and problem-solving capabilities. The company is also reportedly developing a new AI model named **Strawberry**, aimed at enhancing deep reasoning and autonomous task execution. [Source](https://openai.com/research)\n\n- **Anthropic** released **Claude 3.5 Sonnet**, which outperforms its predecessors and competing models in coding, math, and reasoning benchmarks. The model is now being integrated into enterprise workflows for enhanced decision support and automation. [Learn more](https://www.anthropic.com/news/claude-3-5-sonnet)\n\n- **Microsoft** and **NVIDIA** have deepened their AI partnership, co-developing next-gen AI chips and cloud infrastructure to support large-scale AI training. Microsoft is also integrating more AI capabilities into its Azure platform, including custom LLM deployments. [Microsoft AI News](https://news.microsoft.com/ai)\n\n- **xAI**, Elon Musk\u2019s AI company, continues training **Grok-3** on real-time data from X (formerly Twitter), aiming to create a more dynamic and context-aware AI assistant. Grok is being tested for advanced reasoning and multi-modal capabilities. [Updates via X](https://x.com/xai)\n\n- **Meta AI** unveiled **Llama 3.1**, a fully open-source large language model with enhanced multilingual support and reasoning abilities. It is now one of the most widely adopted models in the open-source AI community. [Meta AI Blog](https://ai.meta.com/llama/)\n\nThese developments reflect a rapid pace of innovation in the AI sector, with companies focusing on more powerful, efficient, and accessible models for both research and commercial applications."
	},
	{
		"role": "assistant",
		"content": "Start Running Tool firecrawl_search ",
		"id": "3cbb4504-f080-4bb7-a9cc-9cfaa3a9756d",
		"context": ""
	}, {
		"role": "assistant",
		"content": "<p>Finding Relevant Tools Using Queries: </p><div><div class=\"div_planning_highlight\">company</div> <div class=\"div_planning_highlight\">latest</div> <div class=\"div_planning_highlight\">AI news</div> <div class=\"div_planning_highlight\">ai</div> <div class=\"div_planning_highlight\">artificial intelligence</div> <div class=\"div_planning_highlight\">technology trends</div> <div class=\"div_planning_highlight\">research updates</div></div>\r<p>Relevant MCP Servers: </p><div></div>\r<p>Relevant Tools: </p> <div><div class=\"div_planning_highlight\">firecrawl_search</div><div class=\"div_planning_highlight\">firecrawl_extract</div><div class=\"div_planning_highlight\">firecrawl_deep_research</div><div class=\"div_planning_highlight\">get_pricing</div><div class=\"div_planning_highlight\">generate_cost_report</div><div class=\"div_planning_highlight\">get_pricing_service_codes</div><div class=\"div_planning_highlight\">get_pricing_attribute_values</div><div class=\"div_planning_highlight\">get_price_list_urls</div><div class=\"div_planning_highlight\">sequentialthinking</div><div class=\"div_planning_highlight\">get_notification_details</div><div class=\"div_planning_highlight\">get_stock_price_global_market</div><div class=\"div_planning_highlight\">tavily_search</div><div class=\"div_planning_highlight\">ExplainCDKNagRule</div><div class=\"div_planning_highlight\">CheckCDKNagSuppressions</div><div class=\"div_planning_highlight\">SearchGenAICDKConstructs</div><div class=\"div_planning_highlight\">show_subscription_plan_details</div><div class=\"div_planning_highlight\">show_subscription_details</div><div class=\"div_planning_highlight\">get_order</div><div class=\"div_planning_highlight\">accept_dispute_claim</div><div class=\"div_planning_highlight\">get_refund</div></div>| \n    <div class=\"tool-call-container\">\n        <div class=\"tool-call-header\">\n            <span class=\"arrow\">\u25b6</span>\n            <span class=\"header-text\">Call Tool firecrawl_search of Server firecrawl-mcp</span>\n        </div>\n        <div class=\"collapsible-content\">\n            <div class=\"div_tool_call_row\"><h3>Parameters</h3><div class=\"div_tool_call_json\"> {\n    \"country\": \"us\",\n    \"lang\": \"en\",\n    \"limit\": 5,\n    \"query\": \"latest AI company research news\",\n    \"scrapeOptions\": {\n        \"formats\": [\n            \"markdown\"\n        ],\n        \"onlyMainContent\": true\n    }\n} </div></div>\n            <div class=\"div_tool_call_row div_tool_call_row_options\">\n                <input type=\"button\" class=\"agent-button-base agent-button-highlight\" value=\"ACCEPT\" set_on_click=\"true\">\n                <input type=\"button\" class=\"agent-button-base\" value=\"REJECT\" set_on_click=\"true\">\n            </div>\n            <div class=\"div_tool_call_row results-section\"><h3>Results</h3><div class=\"div_tool_call_json\"></div></div>\n        </div>\n    </div>\n\n|Searching MCP Marketplace Tools and Servers",
		"id": "33ea10a7-c66a-48cd-80d3-ac9ace6ae747",
		"context": "{\"tool_id\": \"call_30fa90e8ecb944bd8613b3\", \"tool_name\": \"firecrawl_search\", \"tool_arguments\": {\"query\": \"latest AI company research news\", \"limit\": 5, \"lang\": \"en\", \"country\": \"us\", \"scrapeOptions\": {\"formats\": [\"markdown\"], \"onlyMainContent\": true}}, \"toolname_to_server_dict\": {\"firecrawl_search\": [\"firecrawl-mcp\"], \"firecrawl_extract\": [\"firecrawl-mcp\"], \"firecrawl_deep_research\": [\"firecrawl-mcp\"], \"get_pricing\": [\"awslabs.aws-pricing-mcp-server\"], \"generate_cost_report\": [\"awslabs.aws-pricing-mcp-server\"], \"get_pricing_service_codes\": [\"awslabs.aws-pricing-mcp-server\"], \"get_pricing_attribute_values\": [\"awslabs.aws-pricing-mcp-server\"], \"get_price_list_urls\": [\"awslabs.aws-pricing-mcp-server\"], \"sequentialthinking\": [\"sequential-thinking\"], \"get_notification_details\": [\"github\"], \"get_stock_price_global_market\": [\"finance-agent-mcp-server\"], \"tavily_search\": [\"tavily-remote-mcp\"], \"ExplainCDKNagRule\": [\"awslabs.cdk-mcp-server\"], \"CheckCDKNagSuppressions\": [\"awslabs.cdk-mcp-server\"], \"SearchGenAICDKConstructs\": [\"awslabs.cdk-mcp-server\"], \"show_subscription_plan_details\": [\"paypal\"], \"show_subscription_details\": [\"paypal\"], \"get_order\": [\"paypal\"], \"accept_dispute_claim\": [\"paypal\"], \"get_refund\": [\"paypal\"]}}"
	}],
	"reward": -1.0,
	"reward_description": "DOWNVOTE"
}
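The record above carries two kinds of human feedback: a step-level `reward` on the user's `<action>ACCEPT</action>` turn (confirming the tool call) and a trajectory-level `reward` / `reward_description` (here `-1.0` / `DOWNVOTE`) on the whole dialogue. A minimal sketch of how both signals could be pulled out for RL post-training, assuming each record is a dict whose dialogue turns sit under a top-level `messages` key (that key name and the helper `extract_rewards` are assumptions for illustration, not part of a documented format):

```python
import json
import re

# Matches the feedback tag used in the example, e.g. <action>ACCEPT</action>
ACTION_RE = re.compile(r"<action>\s*(ACCEPT|REJECT)\s*</action>")


def extract_rewards(record):
    """Hypothetical helper: split a dialogue record into step- and trajectory-level rewards.

    Assumes `record` is one dialogue dict with a "messages" list; user feedback
    turns may carry a "reward" field, and the record itself may carry
    "reward" and "reward_description" (e.g. DOWNVOTE).
    """
    steps = []
    for index, message in enumerate(record.get("messages", [])):
        if message.get("role") != "user":
            continue
        match = ACTION_RE.search(message.get("content") or "")
        if match is None:
            continue
        steps.append({
            "index": index,                   # position of the feedback turn in the dialogue
            "action": match.group(1),         # ACCEPT or REJECT
            "reward": message.get("reward"),  # None if the turn carries no reward
        })
    return {
        "steps": steps,
        "trajectory_reward": record.get("reward"),
        "trajectory_label": record.get("reward_description"),
    }


# Usage on a trimmed-down record shaped like the example above
example = json.loads("""
{"messages": [
    {"role": "user", "content": "latest AI company research news"},
    {"role": "user", "content": "<action>ACCEPT</action>", "reward": 1.0},
    {"role": "assistant", "content": "", "tool_calls": []}
 ],
 "reward": -1.0,
 "reward_description": "DOWNVOTE"}
""")
print(extract_rewards(example))
```

Keeping the two levels separate lets a trainer reward the accepted tool call while still penalizing the downvoted final answer, rather than collapsing both into one number.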

Running SFT/RL Post-Training with the Dataset

See the Details on GitHub

Full Example: SFT on Function Call Datasets in Qwen3/OpenAI Format

mkdir -p ../train/dataset/function_call
cd ../train/dataset/function_call

wget https://static.aiagenta2z.com/scripts/doc/file/06766e91894147319ffd0116b04ff94d/example_deepnlp_agent_function_call_202510.json
cd ../../scripts
python train_function_call_sft_qwen3.py
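The downloaded dialogues mix model-facing turns with UI logging turns: assistant messages whose `content` is rendered HTML plus a `context` field, status lines such as "Start Running Tool firecrawl_search", and user `<action>` feedback. The training function below loads its data with `load_from_disk`, so some conversion step has to happen before training, presumably inside `train_function_call_sft_qwen3.py`. A minimal sketch of one possible cleaning pass, assuming the heuristics listed in the docstring (inferred from the example record, not taken from the training script; `to_sft_messages` is a hypothetical name):

```python
import json
import re

# A user turn that is nothing but a UI feedback tag, e.g. <action>ACCEPT</action>
ACTION_ONLY_RE = re.compile(r"^\s*<action>\s*\w+\s*</action>\s*$")


def to_sft_messages(messages):
    """Hypothetical filter: keep only the turns a chat template needs for SFT.

    Heuristics inferred from the example record (assumptions, not a documented format):
      * assistant turns carrying a "context" field are UI/log turns -> dropped
      * user turns that only contain an <action>...</action> tag -> dropped
      * tool-call "arguments" stored as JSON strings -> parsed into dicts
    """
    cleaned = []
    for message in messages:
        role = message.get("role")
        content = message.get("content") or ""
        if role == "assistant" and "context" in message:
            continue
        if role == "user" and ACTION_ONLY_RE.match(content):
            continue
        kept = {"role": role, "content": content}
        if message.get("tool_calls"):
            calls = []
            for call in message["tool_calls"]:
                function = dict(call.get("function", {}))  # copy, so the input is not mutated
                if isinstance(function.get("arguments"), str):
                    function["arguments"] = json.loads(function["arguments"])
                calls.append({**call, "function": function})
            kept["tool_calls"] = calls
        if role == "tool":
            kept["tool_call_id"] = message.get("tool_call_id")
            kept["name"] = message.get("name")
        cleaned.append(kept)
    return cleaned


# Usage on a trimmed-down version of the example dialogue
sample = [
    {"role": "user", "content": "latest AI company research news"},
    {"role": "assistant", "content": "<p>Finding Relevant Tools ...</p>",
     "id": "33ea10a7", "context": "{}"},
    {"role": "user", "content": "<action>ACCEPT</action>", "reward": 1.0},
    {"role": "assistant", "content": "", "tool_calls": [{
        "id": "call_1", "type": "tool_use",
        "function": {"name": "firecrawl_search",
                     "arguments": '{"query": "AI Research News"}'}}]},
    {"role": "tool", "tool_call_id": "call_1", "name": "firecrawl_search",
     "content": "The latest AI company research news ..."},
]
for turn in to_sft_messages(sample):
    print(turn["role"], turn.get("tool_calls", ""))
```

The cleaned turns could then be collected into a `datasets.Dataset` with a `messages` column and written with `save_to_disk`, which is the layout `load_from_disk` reads back.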

Load the TRL Package and LoRA Config

def sft_trainer(dataset_path, model_path, output_model_dir):
    """
        Supervised fine-tuning of Qwen3 on a function call dataset with TRL's SFTTrainer and LoRA.

        pip install -U transformers huggingface_hub trl peft datasets
        pip install --upgrade torch torchvision torchaudio

        dataset_path="../train/function_calling_dataset_qwen"  (a dataset saved with Dataset.save_to_disk)
        model_path: local checkpoint directory, used when load_from_cache=True
        output_model_dir: where checkpoints and the final model are written
    """
    #### Qwen3 SFT Model
    import os
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM
    from trl import SFTTrainer, SFTConfig
    from datasets import load_from_disk
    from peft import LoraConfig

    model_id = "Qwen/Qwen3-0.6B"

    load_from_cache = True  # True: load from the local model_path; False: download model_id from the Hub
    run_on_mac = True       # True: float32 training on Apple Silicon (MPS)
    if load_from_cache:
        # 1. Load tokenizer and model from the same local checkpoint
        tokenizer = AutoTokenizer.from_pretrained(model_path)
        tokenizer.pad_token = tokenizer.eos_token
        if run_on_mac:
            model = AutoModelForCausalLM.from_pretrained(
                model_path,
                torch_dtype=torch.float32,  # no bf16/fp16 mixed precision on MPS here
                trust_remote_code=True
            )
            model.to("mps")
        else:
            model = AutoModelForCausalLM.from_pretrained(model_path)
    else:
        # 1. Load tokenizer and model from the Hugging Face Hub
        tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
        tokenizer.pad_token = tokenizer.eos_token
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            device_map="auto",
            # torch_dtype=torch.bfloat16,
            trust_remote_code=True
        )

    # 2. LoRA configuration: trains small adapter matrices instead of all weights,
    #    which saves memory and limits forgetting of the base model
    peft_config = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )

    if run_on_mac:
        # Turn off Accelerate mixed precision before the trainer is created
        os.environ["ACCELERATE_DISABLE_MIXED_PRECISION"] = "true"
        os.environ["ACCELERATE_MIXED_PRECISION"] = "no"
        os.environ["ACCELERATE_CONFIG_FILE"] = "/dev/null"

    # 3. Training arguments: bf16/fp16 are off for testing on Mac.
    #    On a GPU, set bf16=True (or fp16=True) instead.
    #    Effective batch per device = 4 * 4 = 16 sequences per optimizer step.
    sft_config = SFTConfig(
        output_dir=output_model_dir,
        dataset_text_field="messages",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=1e-4,
        num_train_epochs=3,
        bf16=False,
        fp16=False,
        dataloader_pin_memory=False,
    )

    # 4. Trainer
    dataset = load_from_disk(dataset_path)

    trainer = SFTTrainer(
        model=model,
        train_dataset=dataset,
        peft_config=peft_config,  # apply the LoRA adapter configured above
        args=sft_config,
        processing_class=tokenizer,
    )

    # 5. Execute, then save (with peft_config set, this saves the LoRA adapter weights)
    trainer.train()
    trainer.save_model(output_model_dir)
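LoRA keeps the base weights frozen and trains two low-rank factors per targeted projection: for a weight of shape d_out × d_in, the adapter adds B (d_out × r) and A (r × d_in), i.e. r · (d_in + d_out) trainable parameters, with the update scaled by lora_alpha / r (here 32 / 16 = 2). Combined with the batch settings in `SFTConfig` (4 × 4 = 16 sequences per optimizer step per device), a back-of-the-envelope helper might look like this. The layer shapes in the usage lines are placeholders, not Qwen3-0.6B's actual dimensions, and the step count is an approximation of what the Hugging Face Trainer reports.

```python
import math


def lora_trainable_params(shapes, r):
    """Trainable parameters LoRA adds: r * (d_in + d_out) per adapted weight.

    `shapes` is a list of (d_out, d_in) tuples, one per targeted projection.
    """
    return sum(r * (d_in + d_out) for d_out, d_in in shapes)


def optimizer_steps(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Approximate optimizer steps: one step per effective batch, per epoch."""
    effective_batch = per_device_batch * grad_accum * num_devices
    return epochs * math.ceil(num_examples / effective_batch)


# Placeholder shapes (d_out, d_in) for one decoder layer: four attention
# projections and three MLP projections. NOT the real Qwen3-0.6B dimensions.
layer = [(1024, 1024)] * 4 + [(4096, 1024), (4096, 1024), (1024, 4096)]
print(lora_trainable_params(layer, r=16))  # per layer; multiply by the layer count
print(optimizer_steps(1000, per_device_batch=4, grad_accum=4, epochs=3))
```

Because the adapter size grows linearly with r, doubling r to 32 doubles the trainable parameters while leaving the frozen base model untouched.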