crawl4ai mcp server

# NOTICE

> **MCP SERVER CURRENTLY UNDER DEVELOPMENT**
> **NOT READY FOR PRODUCTION USE**
> **WILL UPDATE WHEN OPERATIONAL**

# Crawl4AI MCP Server

High-performance MCP Server for Crawl4AI. It enables AI assistants to access web scraping, crawling, and deep research via the Model Context Protocol, and is intended as a faster, more efficient alternative to Firecrawl.

## Overview

This project implements a custom Model Context Protocol (MCP) Server that integrates with Crawl4AI, an open-source web scraping and crawling library. The server is deployed as a remote MCP server on Cloudflare Workers, allowing AI assistants like Claude to access Crawl4AI's web scraping capabilities.

## Documentation

For comprehensive details about this project, please refer to the following documentation:

- [Migration Plan](docs/MIGRATION_PLAN.md) - Detailed plan for migrating from Firecrawl to Crawl4AI
- [Enhanced Architecture](docs/ENHANCED_ARCHITECTURE.md) - Multi-tenant architecture with cloud provider flexibility
- [Implementation Guide](docs/IMPLEMENTATION_GUIDE.md) - Technical implementation details and code examples
- [Codebase Simplification](docs/SIMPLIFICATION.md) - Details on code simplification and best practices implemented
- [Docker Setup Guide](docs/DOCKER.md) - Instructions for Docker setup for local development and production

## Features

### Web Data Acquisition

- **Single Webpage Scraping**: Extract content from individual webpages
- **Web Crawling**: Crawl websites with configurable depth and page limits
- **URL Discovery**: Map and discover URLs from a starting point
- **Asynchronous Crawling**: Crawl entire websites efficiently

### Content Processing

- **Deep Research**: Conduct comprehensive research across multiple pages
- **Structured Data Extraction**: Extract specific data using CSS selectors or LLM-based extraction
- **Content Search**: Search through previously crawled content

### Integration & Security

- **MCP Integration**: Seamless integration with MCP clients (Claude Desktop, etc.)
- **OAuth Authentication**: Secure access with proper authorization
- **Authentication Options**: Secure access via OAuth or API key (Bearer token)
- **High Performance**: Optimized for speed and efficiency

## Project Structure

```plaintext
crawl4ai-mcp/
├── src/
│   ├── index.ts              # Main entry point with OAuth provider setup
│   ├── auth-handler.ts       # Authentication handler
│   ├── mcp-server.ts         # MCP server implementation
│   ├── crawl4ai-adapter.ts   # Adapter for Crawl4AI API
│   ├── tool-schemas/         # MCP tool schema definitions
│   │   └── [...].ts          # Tool schemas
│   ├── handlers/
│   │   ├── crawl.ts          # Web crawling implementation
│   │   ├── search.ts         # Search functionality
│   │   └── extract.ts        # Content extraction
│   └── utils/                # Utility functions
├── tests/                    # Test cases
├── .github/                  # GitHub configuration
├── wrangler.toml             # Cloudflare Workers configuration
├── tsconfig.json             # TypeScript configuration
├── package.json              # Node.js dependencies
└── README.md                 # Project documentation
```

## Getting Started

### Prerequisites

- [Node.js](https://nodejs.org/) (v18 or higher)
- [npm](https://www.npmjs.com/)
- [Wrangler](https://developers.cloudflare.com/workers/wrangler/install-and-update/) (Cloudflare Workers CLI)
- A Cloudflare account

### Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/BjornMelin/crawl4ai-mcp-server.git
   cd crawl4ai-mcp-server
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Set up a Cloudflare KV namespace:

   ```bash
   wrangler kv:namespace create CRAWL_DATA
   ```

4. Update `wrangler.toml` with the KV namespace ID:

   ```toml
   kv_namespaces = [
     { binding = "CRAWL_DATA", id = "your-namespace-id" }
   ]
   ```

## Development

### Local Development

#### Using NPM

1. Start the development server:

   ```bash
   npm run dev
   ```

2. The server will then be available locally.

#### Using Docker

You can also use Docker for local development, which includes the Crawl4AI API and a debug UI:

1. Set up environment variables:

   ```bash
   cp .env.example .env
   # Edit .env file with your API key
   ```

2. Start the Docker development environment:

   ```bash
   docker-compose up -d
   ```

3. Access the services:
   - MCP Server
   - Crawl4AI UI

See the [Docker Setup Guide](docs/DOCKER.md) for more details.

### Testing

The project includes a comprehensive test suite using Jest. To run tests:

```bash
# Run all tests
npm test

# Run tests with watch mode during development
npm run test:watch

# Run tests with coverage report
npm run test:coverage

# Run only unit tests
npm run test:unit

# Run only integration tests
npm run test:integration
```

When running in Docker:

```bash
docker-compose exec mcp-server npm test
```

## Deployment

1. Deploy to Cloudflare Workers:

   ```bash
   npm run deploy
   ```

2. Your server will be available at the Cloudflare Workers URL assigned to your deployed worker.

## Usage with MCP Clients

This server implements the Model Context Protocol, allowing AI assistants to access its tools.

### Authentication

- Implement OAuth authentication with workers-oauth-provider
- Add API key authentication using Bearer tokens
- Create login page and token management

### Connecting to an MCP Client

1. Use the Cloudflare Workers URL assigned to your deployed worker
2. In Claude Desktop or other MCP clients, add this server as a tool source

### Available Tools

- `crawl`: Crawl web pages from a starting URL
- `getCrawl`: Retrieve crawl data by ID
- `listCrawls`: List all crawls or filter by domain
- `search`: Search indexed documents by query
- `extract`: Extract structured content from a URL

## Configuration

The server can be configured by modifying environment variables in `wrangler.toml`:

- `MAX_CRAWL_DEPTH`: Maximum depth for web crawling (default: 3)
- `MAX_CRAWL_PAGES`: Maximum pages to crawl (default: 100)
- `API_VERSION`: API version string (default: "v1")
- `OAUTH_CLIENT_ID`: OAuth client ID for authentication
- `OAUTH_CLIENT_SECRET`: OAuth client secret for authentication

## Roadmap

The project is being developed with these components in mind:

1. **Project Setup and Configuration**: Cloudflare Worker setup, TypeScript configuration
2. **MCP Server and Tool Schemas**: Implementation of MCP server with tool definitions
3. **Crawl4AI Adapter**: Integration with the Crawl4AI functionality
4. **OAuth Authentication**: Secure authentication implementation
5. **Performance Optimizations**: Enhancing speed and reliability
6. **Advanced Extraction Features**: Improving structured data extraction capabilities

## Contributing

Contributions are welcome! Please check the open issues or create a new one before starting work on a feature or bug fix. See the [Contributing Guidelines](CONTRIBUTING.md) for details.
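
## Example: Applying the Crawl Limits

The Configuration section lists `MAX_CRAWL_DEPTH` (default 3) and `MAX_CRAWL_PAGES` (default 100). The following is a minimal sketch of how such limits might be read and applied to a request. It is not taken from the project's code: `CrawlEnv`, `CrawlLimits`, `parsePositiveInt`, `loadLimits`, and `clampRequest` are hypothetical names, and the sketch assumes the values arrive as strings and that a caller's request should never exceed the server-wide ceiling.

```typescript
// Hypothetical shape of the environment bindings. Only the two variable names
// and their defaults come from the Configuration section.
interface CrawlEnv {
  MAX_CRAWL_DEPTH?: string;
  MAX_CRAWL_PAGES?: string;
}

interface CrawlLimits {
  maxDepth: number;
  maxPages: number;
}

// Parse a positive integer, falling back to the default when the value is
// missing or does not parse to a positive integer.
function parsePositiveInt(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Read the server-wide limits, using the documented defaults (3 and 100).
function loadLimits(env: CrawlEnv): CrawlLimits {
  return {
    maxDepth: parsePositiveInt(env.MAX_CRAWL_DEPTH, 3),
    maxPages: parsePositiveInt(env.MAX_CRAWL_PAGES, 100),
  };
}

// Cap a caller's requested depth and page count at the server limits.
// Missing fields fall back to the server limit itself.
function clampRequest(
  requested: Partial<CrawlLimits>,
  limits: CrawlLimits,
): CrawlLimits {
  return {
    maxDepth: Math.min(requested.maxDepth ?? limits.maxDepth, limits.maxDepth),
    maxPages: Math.min(requested.maxPages ?? limits.maxPages, limits.maxPages),
  };
}

// Usage: the server allows depth 2, the caller asks for depth 5 and 10 pages.
const limits = loadLimits({ MAX_CRAWL_DEPTH: "2" });
const effective = clampRequest({ maxDepth: 5, maxPages: 10 }, limits);
console.log(effective);
```

This sketch only enforces upper bounds. A real handler would probably also reject negative or non-integer values in the request itself.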
## Support

If you encounter issues or have questions:

- Open an issue on the GitHub repository
- Check the [Crawl4AI documentation](https://crawl4ai.com/docs)
- Refer to the [Model Context Protocol specification](https://github.com/anthropics/model-context-protocol)

## How to Cite

If you use Crawl4AI MCP Server in your research or projects, please cite it using the following BibTeX entry:

```bibtex
@software{crawl4ai_mcp_2025,
  author = {Melin, Bjorn},
  title = {Crawl4AI MCP Server: High-performance Web Crawling for AI Assistants},
  url = {https://github.com/BjornMelin/crawl4ai-mcp-server},
  version = {1.0.0},
  year = {2025},
  month = {5}
}
```

## License

[MIT](LICENSE)
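
## Example: Checking a Bearer Token

The Authentication notes mention API key access using Bearer tokens. As a rough illustration of that idea, the sketch below pulls the token out of an `Authorization` header value and compares it with an expected key. The function names `extractBearerToken` and `isAuthorized` are hypothetical, and nothing here reflects how the project or workers-oauth-provider actually handles credentials.

```typescript
// Extract the token from an Authorization header value such as "Bearer abc123".
// Returns null when the header is missing or is not a Bearer credential.
function extractBearerToken(header: string | null | undefined): string | null {
  if (!header) return null;
  // The scheme name is matched case-insensitively; the token must not contain spaces.
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}

// Hypothetical check against a single expected API key.
// Plain equality is used for brevity; it is not a constant-time comparison.
function isAuthorized(
  header: string | null | undefined,
  expectedKey: string,
): boolean {
  const token = extractBearerToken(header);
  return token !== null && token === expectedKey;
}

// Usage with made-up values.
console.log(isAuthorized("Bearer example-key", "example-key"));
console.log(isAuthorized("Basic dXNlcjpwYXNz", "example-key"));
```

A production check would likely use a timing-safe comparison and look keys up in a store rather than comparing against one hard-coded value.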