# NOTICE
> **MCP SERVER CURRENTLY UNDER DEVELOPMENT**
> **NOT READY FOR PRODUCTION USE**
> **WILL UPDATE WHEN OPERATIONAL**
# Crawl4AI MCP Server
High-performance MCP server for Crawl4AI: it gives AI assistants access to web scraping, crawling, and deep research through the Model Context Protocol, and is designed to be faster and more efficient than Firecrawl.
## Overview
This project implements a custom Model Context Protocol (MCP) Server that integrates with Crawl4AI, an open-source web scraping and crawling library. The server is deployed as a remote MCP server on CloudFlare Workers, allowing AI assistants like Claude to access Crawl4AI's powerful web scraping capabilities.
## Documentation
For comprehensive details about this project, please refer to the following documentation:
- [Migration Plan](docs/MIGRATION_PLAN.md) - Detailed plan for migrating from Firecrawl to Crawl4AI
- [Enhanced Architecture](docs/ENHANCED_ARCHITECTURE.md) - Multi-tenant architecture with cloud provider flexibility
- [Implementation Guide](docs/IMPLEMENTATION_GUIDE.md) - Technical implementation details and code examples
- [Codebase Simplification](docs/SIMPLIFICATION.md) - Details on code simplification and best practices implemented
- [Docker Setup Guide](docs/DOCKER.md) - Instructions for Docker setup for local development and production
## Features
### Web Data Acquisition
- **Single Webpage Scraping**: Extract content from individual webpages
- **Web Crawling**: Crawl websites with configurable depth and page limits
- **URL Discovery**: Map and discover URLs from a starting point
- **Asynchronous Crawling**: Crawl entire websites efficiently
### Content Processing
- **Deep Research**: Conduct comprehensive research across multiple pages
- **Structured Data Extraction**: Extract specific data using CSS selectors or LLM-based extraction
- **Content Search**: Search through previously crawled content
### Integration & Security
- **MCP Integration**: Seamless integration with MCP clients (Claude Desktop, etc.)
- **Authentication Options**: Secure access via OAuth or API key (Bearer token)
- **High Performance**: Optimized for speed and efficiency
## Project Structure
```plaintext
crawl4ai-mcp/
├── src/
│   ├── index.ts              # Main entry point with OAuth provider setup
│   ├── auth-handler.ts       # Authentication handler
│   ├── mcp-server.ts         # MCP server implementation
│   ├── crawl4ai-adapter.ts   # Adapter for Crawl4AI API
│   ├── tool-schemas/         # MCP tool schema definitions
│   │   └── [...].ts          # Tool schemas
│   ├── handlers/
│   │   ├── crawl.ts          # Web crawling implementation
│   │   ├── search.ts         # Search functionality
│   │   └── extract.ts        # Content extraction
│   └── utils/                # Utility functions
├── tests/                    # Test cases
├── .github/                  # GitHub configuration
├── wrangler.toml             # CloudFlare Workers configuration
├── tsconfig.json             # TypeScript configuration
├── package.json              # Node.js dependencies
└── README.md                 # Project documentation
```
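To illustrate the layout, here is a minimal sketch of what a tool schema under `src/tool-schemas/` might look like; the file name, field names, and parameters are illustrative assumptions, not the project's actual code. The shape follows the standard MCP tool definition format (name, description, JSON Schema input):
```typescript
// Hypothetical sketch of a tool schema (e.g. src/tool-schemas/crawl.ts).
// MCP tools are described by a name, a description, and a JSON Schema
// for their input arguments.
export const crawlToolSchema = {
  name: "crawl",
  description: "Crawl web pages starting from a URL",
  inputSchema: {
    type: "object",
    properties: {
      url: { type: "string", description: "Starting URL for the crawl" },
      maxDepth: { type: "number", description: "Maximum link depth to follow" },
      maxPages: { type: "number", description: "Maximum number of pages to fetch" },
    },
    required: ["url"],
  },
} as const;
```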
## Getting Started
### Prerequisites
- [Node.js](https://nodejs.org/) (v18 or higher)
- [npm](https://www.npmjs.com/)
- [Wrangler](https://developers.cloudflare.com/workers/wrangler/install-and-update/) (CloudFlare Workers CLI)
- A CloudFlare account
### Installation
1. Clone the repository:
```bash
git clone https://github.com/BjornMelin/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server
```
2. Install dependencies:
```bash
npm install
```
3. Set up CloudFlare KV namespace:
```bash
wrangler kv:namespace create CRAWL_DATA
```
4. Update `wrangler.toml` with the KV namespace ID:
```toml
kv_namespaces = [
  { binding = "CRAWL_DATA", id = "your-namespace-id" }
]
```
## Development
### Local Development
#### Using NPM
1. Start the development server:
```bash
npm run dev
```
2. The server will be available at the local address Wrangler prints on startup (by default `http://localhost:8787`)
#### Using Docker
You can also use Docker for local development, which includes the Crawl4AI API and a debug UI:
1. Set up environment variables:
```bash
cp .env.example .env
# Edit .env file with your API key
```
2. Start the Docker development environment:
```bash
docker-compose up -d
```
3. Access the services at the host ports mapped in `docker-compose.yml`:
- MCP Server
- Crawl4AI UI
See the [Docker Setup Guide](docs/DOCKER.md) for more details.
### Testing
The project includes a comprehensive test suite using Jest. To run tests:
```bash
# Run all tests
npm test

# Run tests in watch mode during development
npm run test:watch

# Run tests with a coverage report
npm run test:coverage

# Run only unit tests
npm run test:unit

# Run only integration tests
npm run test:integration
```
When running in Docker:
```bash
docker-compose exec mcp-server npm test
```
## Deployment
1. Deploy to CloudFlare Workers:
```bash
npm run deploy
```
2. Your server will be available at the CloudFlare Workers URL assigned to your deployed worker.
## Usage with MCP Clients
This server implements the Model Context Protocol, allowing AI assistants to access its tools.
### Authentication
The server is designed to support two authentication options (a minimal token check is sketched after this list):
- **OAuth**: implemented with `workers-oauth-provider`, including a login page and token management
- **API key**: a Bearer token supplied in the `Authorization` header
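A minimal sketch of the API-key path, assuming a secret named `API_KEY` in the Worker environment (the variable name is an assumption, not the project's actual binding):
```typescript
// Hypothetical Bearer-token check; API_KEY is an assumed secret name.
function isAuthorized(request: Request, env: { API_KEY?: string }): boolean {
  const header = request.headers.get("Authorization") ?? "";
  const token = header.startsWith("Bearer ")
    ? header.slice("Bearer ".length)
    : "";
  return token.length > 0 && token === env.API_KEY;
}
```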
### Connecting to an MCP Client
1. Use the CloudFlare Workers URL assigned to your deployed worker
2. In Claude Desktop or other MCP clients, add this server as a tool source
### Available Tools
- `crawl`: Crawl web pages from a starting URL
- `getCrawl`: Retrieve crawl data by ID
- `listCrawls`: List all crawls or filter by domain
- `search`: Search indexed documents by query
- `extract`: Extract structured content from a URL
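As an example of invoking one of these tools from code, here is a sketch using the TypeScript MCP SDK (`@modelcontextprotocol/sdk`). The worker URL and the `/sse` endpoint path are placeholders, and the argument names follow the schema sketch above rather than the project's actual schemas:
```typescript
// Sketch: calling the crawl tool from a TypeScript MCP client over SSE.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { SSEClientTransport } from "@modelcontextprotocol/sdk/client/sse.js";

const transport = new SSEClientTransport(
  new URL("https://your-worker.workers.dev/sse") // placeholder URL
);
const client = new Client(
  { name: "example-client", version: "1.0.0" },
  { capabilities: {} }
);

await client.connect(transport);
console.log(await client.listTools()); // should list crawl, search, extract, ...

const result = await client.callTool({
  name: "crawl",
  arguments: { url: "https://example.com", maxDepth: 2, maxPages: 10 },
});
console.log(result);
```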
## Configuration
The server can be configured via environment variables in `wrangler.toml` (a sketch of how they might be consumed follows this list):
- `MAX_CRAWL_DEPTH`: Maximum depth for web crawling (default: 3)
- `MAX_CRAWL_PAGES`: Maximum pages to crawl (default: 100)
- `API_VERSION`: API version string (default: "v1")
- `OAUTH_CLIENT_ID`: OAuth client ID for authentication
- `OAUTH_CLIENT_SECRET`: OAuth client secret for authentication
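A sketch of how these values might be read inside the Worker, falling back to the documented defaults; the helper and its handling of the variables are illustrative, not the project's actual code:
```typescript
// Illustrative sketch: reading crawl limits from the Worker environment.
interface Env {
  MAX_CRAWL_DEPTH?: string;
  MAX_CRAWL_PAGES?: string;
  API_VERSION?: string;
}

function crawlConfig(env: Env) {
  return {
    maxDepth: Number(env.MAX_CRAWL_DEPTH ?? "3"),
    maxPages: Number(env.MAX_CRAWL_PAGES ?? "100"),
    apiVersion: env.API_VERSION ?? "v1",
  };
}
```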
## Roadmap
The project is being developed with these components in mind:
1. **Project Setup and Configuration**: CloudFlare Worker setup, TypeScript configuration
2. **MCP Server and Tool Schemas**: Implementation of MCP server with tool definitions
3. **Crawl4AI Adapter**: Integration with the Crawl4AI functionality
4. **OAuth Authentication**: Secure authentication implementation
5. **Performance Optimizations**: Enhancing speed and reliability
6. **Advanced Extraction Features**: Improving structured data extraction capabilities
## Contributing
Contributions are welcome! Please check the open issues, or open a new one, before starting work on a feature or bug fix. See the [Contributing Guidelines](CONTRIBUTING.md) for details.
## Support
If you encounter issues or have questions:
- Open an issue on the GitHub repository
- Check the [Crawl4AI documentation](https://crawl4ai.com/docs)
- Refer to the [Model Context Protocol specification](https://github.com/anthropics/model-context-protocol)
## How to Cite
If you use Crawl4AI MCP Server in your research or projects, please cite it using the following BibTeX entry:
```bibtex
@software{crawl4ai_mcp_2025,
  author  = {Melin, Bjorn},
  title   = {Crawl4AI MCP Server: High-performance Web Crawling for AI Assistants},
  url     = {https://github.com/BjornMelin/crawl4ai-mcp-server},
  version = {1.0.0},
  year    = {2025},
  month   = {5}
}
```
## License
[MIT](LICENSE)