# SuiteScrape Full Agent Context

SuiteScrape is a production SaaS at https://suitescrape.com for AI-ready web extraction. It combines Playwright-backed scraping, static HTML fallback, Claude question-answering, stored scrape jobs, batch queues, same-site URL mapping, bounded crawls, URL monitors, signed webhooks, export formats, account usage limits, API keys, Stripe billing, and MemPalace saved-page memory.

## Canonical Hosts

- Website: https://suitescrape.com
- API: https://api.suitescrape.com
- Documentation UI: https://suitescrape.com/docs
- Pricing UI: https://suitescrape.com/pricing
- Public status: https://suitescrape.com/status
- OpenAPI: https://suitescrape.com/openapi.json
- Short LLM context: https://suitescrape.com/llms.txt
- Agent skill: https://suitescrape.com/agent-onboarding/SKILL.md
- Remote MCP endpoint: https://api.suitescrape.com/mcp
- Local MCP bridge: `SUITESCRAPE_API_KEY=ss_... npm run mcp`

## Authentication

Browser users can create accounts and API keys in the SuiteScrape workbench. API keys use the `ss_...` prefix and must be sent with `Authorization: Bearer ss_your_key`.

Agents must not invent credentials. Ask the operator for `SUITESCRAPE_API_KEY` or read it only from an approved secret manager or environment variable. Do not print, log, commit, or paste the key.

## Core Agent API

Use the API host for all calls: `https://api.suitescrape.com`.

## Local MCP Bridge

Agents with Streamable HTTP MCP support can connect directly to `https://api.suitescrape.com/mcp` or `https://api.suitescrape.com/api/mcp` with `Authorization: Bearer ss_your_key`. The remote endpoint is stateless and exposes the same tools as the local bridge.

The SuiteScrape repo also includes a local stdio Model Context Protocol bridge at `mcp/server.mjs`. Run it with `SUITESCRAPE_API_KEY` set. Both remote and local MCP expose these tools: `get_plans`, `scrape_url`, `ask_job`, `map_site`, `crawl_site`, `search_jobs`, `export_job`, and `create_monitor`.

Claude Desktop or Cursor-style config:

```json
{
  "mcpServers": {
    "suitescrape": {
      "command": "node",
      "args": ["/absolute/path/to/suitescrape-backend-fix/mcp/server.mjs"],
      "env": {
        "SUITESCRAPE_API_KEY": "ss_your_key",
        "SUITESCRAPE_API_URL": "https://api.suitescrape.com"
      }
    }
  }
}
```

### Scrape one URL

`POST /api/agent/scrape`

Body:

```json
{
  "url": "example.com",
  "question": "Return the key facts as bullets.",
  "format": "markdown",
  "tags": ["research"],
  "webhookUrl": "https://hooks.example.com/suitescrape"
}
```

Returns a stored job with `id` / `jobId`, scrape result, optional Claude answer, usage metadata, and webhook metadata when configured.

### Ask about content or a stored job

`POST /api/agent/ask`

Body with a stored job:

```json
{
  "jobId": "job_id_here",
  "question": "What changed or matters here?"
}
```

Body with direct content:

```json
{
  "content": "Text to analyze",
  "question": "Summarize the strongest signal."
}
```

### Search jobs

`GET /api/agent/jobs?status=completed&q=pricing&limit=25`

Use `q`, `tag`, `status`, `limit`, and `offset` for retrieval. Store useful metadata with `PATCH /api/agent/jobs/:id`.

### Export jobs

- `GET /api/agent/jobs/export?format=json`
- `GET /api/agent/jobs/export?format=csv`
- `GET /api/agent/jobs/:id/export?format=markdown`
- `GET /api/agent/jobs/:id/export?format=json`
- `GET /api/agent/jobs/:id/export?format=summary`
- `GET /api/agent/jobs/:id/export?format=raw`

### Discover and crawl URLs

- `POST /api/agent/map` maps same-site URLs from `sitemap.xml` and root-page links without charging scrape quota.
- `POST /api/agent/crawl` maps a site, scrapes the root, and queues up to the requested bounded page count.

Example:

```json
{
  "url": "iana.org/domains/reserved",
  "maxPages": 5,
  "format": "summary"
}
```

### Monitors

- `GET /api/agent/monitors`
- `POST /api/agent/monitors`
- `PATCH /api/agent/monitors/:id`
- `POST /api/agent/monitors/:id/check`
- `POST /api/agent/monitors/:id/webhook-test`
- `DELETE /api/agent/monitors/:id`

Monitors can run manual, 6-hour, daily, or weekly checks. They can alert on content changes, word delta thresholds, keywords, email, or signed webhook delivery.

### MemPalace memory

- `POST /api/mempalace/ingest` saves a completed scrape result into account memory.
- `GET /api/mempalace/items?q=pricing` searches saved pages.
- `POST /api/mempalace/ask` asks Claude across saved pages.

## Webhooks

Stored jobs and monitors support signed webhook callbacks. Signature headers:

- `x-suitescrape-signature: sha256=...`
- `x-suitescrape-timestamp`
- `x-suitescrape-delivery`

Signatures are HMAC-SHA256 over `${timestamp}.${rawBody}` using the per-job or per-monitor webhook secret returned by SuiteScrape.

## Output Formats

- `markdown` - clean text for LLM ingestion.
- `json` - structured title, metadata, content, links, sections, and query result.
- `summary` - compact summary-like result.
- `raw` - plain extracted content.

## Pricing And Limits

- Free: $0/mo, 50 scrapes, 25 AI queries, 1 API key, 3 monitors, 25 memory items.
- Starter: $9/mo, 1,000 scrapes, 250 AI queries, 2 API keys, 25 monitors, 250 memory items.
- Pro: $29/mo, 10,000 scrapes, 2,500 AI queries, 5 API keys, 100 monitors, 2,500 memory items.
- Team: $79/mo, 50,000 scrapes, 10,000 AI queries, 15 API keys, 500 monitors, 10,000 memory items.
- Enterprise: custom.

Free and Starter use the default Haiku model for Claude-backed answers. Pro, Team, and Enterprise use the configured Sonnet model.

## Safety And Reliability

- SuiteScrape blocks private/internal URL targets, including localhost, RFC1918 ranges, link-local metadata hosts, and unsafe redirects.
- Do not use SuiteScrape to scrape private data or content you do not have rights to process.
- Use bounded crawl sizes. Default to small page counts unless the operator asks for a larger run.
- Prefer batch queues, monitors, and webhooks for long-running workflows.
- Treat `429` as a signal to back off or upgrade plan limits.
- Treat `503` from AI routes as missing AI configuration or temporary AI dependency failure.

## Recommended Agent Workflow

1. Confirm `SUITESCRAPE_API_KEY` is available.
2. Read `https://suitescrape.com/openapi.json` for exact schemas.
3. For one page, call `POST /api/agent/scrape`.
4. For multiple pages, call `POST /api/agent/map` first, then `POST /api/agent/crawl` or `POST /api/scrape-jobs/batch`.
5. Use `GET /api/agent/jobs` to retrieve completed results.
6. Use exports or MemPalace ingestion for downstream analysis.
7. Use signed webhooks or monitors for recurring workflows.