# SuiteScrape Full Agent Context SuiteScrape is a production SaaS at https://suitescrape.com for AI-ready web extraction. It combines Playwright-backed scraping, static HTML fallback, Claude question-answering, stored scrape jobs, batch queues, same-site URL mapping, bounded crawls, URL monitors, signed webhooks, export formats, account usage limits, API keys, Stripe billing, and MemPalace saved-page memory. ## Canonical Hosts - Website: https://suitescrape.com - API: https://api.suitescrape.com - Documentation UI: https://suitescrape.com/docs - Pricing UI: https://suitescrape.com/pricing - Public status: https://suitescrape.com/status - OpenAPI: https://suitescrape.com/openapi.json - Short LLM context: https://suitescrape.com/llms.txt - Agent skill: https://suitescrape.com/agent-onboarding/SKILL.md - Remote MCP endpoint: https://api.suitescrape.com/mcp - Local MCP bridge: `SUITESCRAPE_API_KEY=ss_... npm run mcp` ## Authentication Browser users can create accounts and API keys in the SuiteScrape workbench. API keys use the `ss_...` prefix and must be sent with `Authorization: Bearer ss_your_key`. Agents must not invent credentials. Ask the operator for `SUITESCRAPE_API_KEY` or read it only from an approved secret manager or environment variable. Do not print, log, commit, or paste the key. ## Core Agent API Use the API host for all calls: `https://api.suitescrape.com`. ## Local MCP Bridge Agents with Streamable HTTP MCP support can connect directly to `https://api.suitescrape.com/mcp` or `https://api.suitescrape.com/api/mcp` with `Authorization: Bearer ss_your_key`. The remote endpoint is stateless and exposes the same tools as the local bridge. The SuiteScrape repo also includes a local stdio Model Context Protocol bridge at `mcp/server.mjs`. Run it with `SUITESCRAPE_API_KEY` set. Both remote and local MCP expose these tools: `get_plans`, `scrape_url`, `ask_job`, `map_site`, `crawl_site`, `search_jobs`, `export_job`, and `create_monitor`. Claude Desktop or Cursor-style config: ```json { "mcpServers": { "suitescrape": { "command": "node", "args": ["/absolute/path/to/suitescrape-backend-fix/mcp/server.mjs"], "env": { "SUITESCRAPE_API_KEY": "ss_your_key", "SUITESCRAPE_API_URL": "https://api.suitescrape.com" } } } } ``` ### Scrape one URL `POST /api/agent/scrape` Body: ```json { "url": "example.com", "question": "Return the key facts as bullets.", "format": "markdown", "tags": ["research"], "webhookUrl": "https://hooks.example.com/suitescrape" } ``` Returns a stored job with `id` / `jobId`, scrape result, optional Claude answer, usage metadata, and webhook metadata when configured. ### Ask about content or a stored job `POST /api/agent/ask` Body with a stored job: ```json { "jobId": "job_id_here", "question": "What changed or matters here?" } ``` Body with direct content: ```json { "content": "Text to analyze", "question": "Summarize the strongest signal." } ``` ### Search jobs `GET /api/agent/jobs?status=completed&q=pricing&limit=25` Use `q`, `tag`, `status`, `limit`, and `offset` for retrieval. Store useful metadata with `PATCH /api/agent/jobs/:id`. ### Export jobs - `GET /api/agent/jobs/export?format=json` - `GET /api/agent/jobs/export?format=csv` - `GET /api/agent/jobs/:id/export?format=markdown` - `GET /api/agent/jobs/:id/export?format=json` - `GET /api/agent/jobs/:id/export?format=summary` - `GET /api/agent/jobs/:id/export?format=raw` ### Discover and crawl URLs - `POST /api/agent/map` maps same-site URLs from `sitemap.xml` and root-page links without charging scrape quota. - `POST /api/agent/crawl` maps a site, scrapes the root, and queues up to the requested bounded page count. Example: ```json { "url": "iana.org/domains/reserved", "maxPages": 5, "format": "summary" } ``` ### Monitors - `GET /api/agent/monitors` - `POST /api/agent/monitors` - `PATCH /api/agent/monitors/:id` - `POST /api/agent/monitors/:id/check` - `POST /api/agent/monitors/:id/webhook-test` - `DELETE /api/agent/monitors/:id` Monitors can run manual, 6-hour, daily, or weekly checks. They can alert on content changes, word delta thresholds, keywords, email, or signed webhook delivery. ### MemPalace memory - `POST /api/mempalace/ingest` saves a completed scrape result into account memory. - `GET /api/mempalace/items?q=pricing` searches saved pages. - `POST /api/mempalace/ask` asks Claude across saved pages. ## Webhooks Stored jobs and monitors support signed webhook callbacks. Signature headers: - `x-suitescrape-signature: sha256=...` - `x-suitescrape-timestamp` - `x-suitescrape-delivery` Signatures are HMAC-SHA256 over `${timestamp}.${rawBody}` using the per-job or per-monitor webhook secret returned by SuiteScrape. ## Output Formats - `markdown` - clean text for LLM ingestion. - `json` - structured title, metadata, content, links, sections, and query result. - `summary` - compact summary-like result. - `raw` - plain extracted content. ## Pricing And Limits - Free: $0/mo, 50 scrapes, 25 AI queries, 1 API key, 3 monitors, 25 memory items. - Starter: $9/mo, 1,000 scrapes, 250 AI queries, 2 API keys, 25 monitors, 250 memory items. - Pro: $29/mo, 10,000 scrapes, 2,500 AI queries, 5 API keys, 100 monitors, 2,500 memory items. - Team: $79/mo, 50,000 scrapes, 10,000 AI queries, 15 API keys, 500 monitors, 10,000 memory items. - Enterprise: custom. Free and Starter use the default Haiku model for Claude-backed answers. Pro, Team, and Enterprise use the configured Sonnet model. ## Safety And Reliability - SuiteScrape blocks private/internal URL targets, including localhost, RFC1918 ranges, link-local metadata hosts, and unsafe redirects. - Do not use SuiteScrape to scrape private data or content you do not have rights to process. - Use bounded crawl sizes. Default to small page counts unless the operator asks for a larger run. - Prefer batch queues, monitors, and webhooks for long-running workflows. - Treat `429` as a signal to back off or upgrade plan limits. - Treat `503` from AI routes as missing AI configuration or temporary AI dependency failure. ## Recommended Agent Workflow 1. Confirm `SUITESCRAPE_API_KEY` is available. 2. Read `https://suitescrape.com/openapi.json` for exact schemas. 3. For one page, call `POST /api/agent/scrape`. 4. For multiple pages, call `POST /api/agent/map` first, then `POST /api/agent/crawl` or `POST /api/scrape-jobs/batch`. 5. Use `GET /api/agent/jobs` to retrieve completed results. 6. Use exports or MemPalace ingestion for downstream analysis. 7. Use signed webhooks or monitors for recurring workflows.