# opendata-query

> This skill should be used when the user asks to "query open data", "find datasets", "search for data on OpenDataSoft", "get data about" a topic, "explore datasets", "export data", asks questions about public data (population, weather, transport, environment, etc.), or needs to combine data from multiple datasets. Provides workflow guidance for the OpenDataSoft MCP server tools.

- Author: loic
- Repository: bacoco/opendata-mcp
- Version: 20260209174419
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-09
- Source: https://github.com/bacoco/opendata-mcp
- Web: https://mule.run/skillshub/@@bacoco/opendata-mcp~opendata-query:20260209174419

---

# OpenDataSoft Query Skill

Guide Claude through querying any OpenDataSoft portal using the MCP server's progressive disclosure pattern.

## Available MCP Tools

The OpenDataSoft MCP server exposes four tools:

| Tool | Purpose | When to use |
|------|---------|-------------|
| `discover()` | List ODSQL doc topics + API routes | Always call first |
| `search_catalog(query)` | Search local dataset cache (no API call) | To find datasets by keyword — use BEFORE `getDatasets` |
| `get_odsql_doc(topic)` | Load ODSQL documentation for a topic | Before writing complex queries |
| `call_api(operation_id, params)` | Execute an API call | To fetch actual data |

## Core Workflow

### Step 1: Discover

Always start with `discover()` to get the list of available API routes and ODSQL documentation topics. This costs ~800 tokens.

### Step 2: Find Datasets (use search_catalog FIRST)

Use the local catalog cache to find datasets instantly without API calls:

```
search_catalog(query="weather france")
```

This returns dataset IDs, titles, keywords, themes, fields, and a relevance score — typically ~300 tokens for 10 results.

**Only fall back to `call_api("getDatasets", ...)` if `search_catalog` is unavailable or returns no results.**

```
call_api("getDatasets", {
  "where": "search(\"keyword\")",
  "limit": 5,
  "select": "dataset_id"
})
```
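The cache-first rule above can be sketched as a small Python helper. This is only an illustration: `find_datasets` is a hypothetical name, the two tools are passed in as plain callables, and the assumption that both return a list of dictionaries is not confirmed by the server. Double quotes are stripped from the keyword rather than escaped, to avoid guessing at ODSQL escaping rules.

```python
from typing import Any, Callable, Dict, List

Dataset = Dict[str, Any]


def find_datasets(
    query: str,
    search_catalog: Callable[[str], List[Dataset]],
    call_api: Callable[[str, Dict[str, Any]], List[Dataset]],
    limit: int = 5,
) -> List[Dataset]:
    """Try the local catalog first; fall back to the getDatasets route."""
    try:
        results = search_catalog(query)
    except Exception:
        # Assumption: any failure here means the cache is unavailable.
        results = []
    if results:
        return results

    # Fallback: same parameters as the call_api example above.
    keyword = query.replace('"', " ")
    where = 'search("' + keyword + '")'
    return call_api(
        "getDatasets",
        {"where": where, "limit": limit, "select": "dataset_id"},
    )
```

Passing the tools in as arguments keeps the helper independent of how the MCP client actually invokes them.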

### Step 3: Load ODSQL Docs (if needed)

Before writing complex queries (aggregations, geo filters, date functions), load the relevant topic:

| Topic | Load when... |
|-------|-------------|
| `language_basics` | Unsure about field names, literals, date format |
| `clauses` | Need SELECT, WHERE, GROUP BY, ORDER BY syntax |
| `functions` | Need date/string/geo functions in SELECT |
| `predicates` | Need WHERE filters: search, geo, comparison |
| `aggregate_functions` | Need avg, count, sum, max, min, percentile |
| `grouping_functions` | Need range(), geo_cluster() in GROUP BY |

### Step 4: Query Records

Fetch data from a specific dataset:

```
call_api("getRecords", {
  "dataset_id": "the-dataset-id",
  "where": "field > value",
  "select": "field1, field2",
  "order_by": "field desc",
  "limit": 10
})
```
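When the parameters are assembled programmatically, it can help to drop the clauses that are not set so the request only carries what is needed. A minimal sketch, assuming a hypothetical helper named `build_records_params` whose keyword names mirror the parameters shown in this document:

```python
from typing import Any, Dict, Optional


def build_records_params(
    dataset_id: str,
    where: Optional[str] = None,
    select: Optional[str] = None,
    order_by: Optional[str] = None,
    group_by: Optional[str] = None,
    limit: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a getRecords parameter dict, omitting unset clauses."""
    params = {
        "dataset_id": dataset_id,
        "where": where,
        "select": select,
        "order_by": order_by,
        "group_by": group_by,
        "limit": limit,
    }
    # Keep only the clauses the caller actually provided.
    return {key: value for key, value in params.items() if value is not None}
```

The resulting dictionary is what would be passed as the second argument of `call_api("getRecords", ...)`.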

### Step 5: Multi-Dataset Queries

For questions spanning multiple domains (e.g., "weather in Paris AND population of the 5th arrondissement"):

1. Search datasets separately for each domain
2. Query each dataset independently
3. Synthesize results into a coherent answer

There is no cross-dataset join — orchestrate multiple `call_api` calls.
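The orchestration described above amounts to a loop over independent calls. The sketch below assumes a hypothetical `query_many` helper and a `call_api` callable supplied by the caller; the labels and the shape of the returned values are illustrative only.

```python
from typing import Any, Callable, Dict


def query_many(
    requests: Dict[str, Dict[str, Any]],
    call_api: Callable[[str, Dict[str, Any]], Any],
) -> Dict[str, Any]:
    """Run one getRecords call per labelled request and collect the results.

    `requests` maps a free-form label (for example "weather") to the
    getRecords parameters for that dataset.
    """
    results: Dict[str, Any] = {}
    for label, params in requests.items():
        # Each dataset is queried on its own; there is no server-side join.
        results[label] = call_api("getRecords", params)
    return results
```

Combining the per-label results into one answer remains a separate synthesis step.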

## Key API Operations

### Catalog Operations
- `getDatasets` — Search/list datasets (use `where: "search(\"keyword\")"`)
- `getDataset` — Get metadata for one dataset (needs `dataset_id`)
- `getDatasetsFacets` — List facet values for filtering

### Dataset Operations
- `getRecords` — Query records (main workhorse, needs `dataset_id`)
- `getRecord` — Get single record (needs `dataset_id` + `record_id`)
- `getRecordsFacets` — List facets for a dataset
- `exportRecordsCSV` — Export as CSV (needs `dataset_id`)

## Common Patterns

### Search for datasets (preferred: local cache)
```
search_catalog(query="transport paris")
```

### Search for datasets (fallback: API)
```
call_api("getDatasets", {"where": "search(\"transport paris\")", "limit": 5, "select": "dataset_id"})
```

### Explore dataset fields
```
call_api("getRecords", {"dataset_id": "xxx", "limit": 1})
```
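Once a single record has come back, the field names can be read off its keys. The sketch below assumes the response is a dictionary with a `results` list of records, which is how the Opendatasoft Explore API v2.1 shapes record responses; the MCP server may wrap or reshape this, so treat `field_names` as a hypothetical helper to adapt.

```python
from typing import Any, Dict, List


def field_names(response: Dict[str, Any]) -> List[str]:
    """Return the sorted field names of the first record, or [] if none."""
    results = response.get("results", [])
    if not results:
        return []
    return sorted(results[0].keys())
```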

### Filter with ODSQL
```
call_api("getRecords", {
  "dataset_id": "xxx",
  "where": "population > 100000 AND country = 'France'",
  "order_by": "population desc",
  "limit": 20
})
```

### Aggregate data
```
call_api("getRecords", {
  "dataset_id": "xxx",
  "select": "count(*), avg(population)",
  "group_by": "region",
  "order_by": "count(*) desc"
})
```

### Date filtering
```
call_api("getRecords", {
  "dataset_id": "xxx",
  "where": "`date` >= date'2024-01-01'",
  "order_by": "`date` desc"
})
```
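If the cutoff date comes from code rather than being typed by hand, the `date'YYYY-MM-DD'` literal can be generated from a `datetime.date`. In this sketch `date_filter` is a hypothetical helper, and back-quoting the field name is an assumption made to guard against field names that collide with ODSQL keywords such as `date`.

```python
from datetime import date


def date_filter(field: str, since: date) -> str:
    """Build an ODSQL clause selecting records on or after `since`."""
    # isoformat() yields YYYY-MM-DD, the format used in the date literal.
    return f"`{field}` >= date'{since.isoformat()}'"
```

The returned string is meant to be used as the `where` value of a `getRecords` call.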

## Error Recovery

- **400 Bad Request**: Usually an ODSQL syntax error. Call `get_odsql_doc(topic)` for the relevant topic, fix the query, and retry.
- **404 Not Found**: Wrong `dataset_id` or `operation_id`. Use `discover()` to check operation IDs, and `search_catalog` (or `getDatasets`) to find valid dataset IDs.
- **Empty results**: Broaden the search. Try the `search()` predicate instead of an exact match, and check field names with a `limit: 1` query.
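
The recovery rules above can be expressed as a small decision function. This is a sketch only: `recovery_action` and the action labels it returns are hypothetical, and it assumes the caller can observe an HTTP-style status code and a record count for each call.

```python
def recovery_action(status_code: int, record_count: int = 0) -> str:
    """Map the outcome of a call to the next step suggested above."""
    if status_code == 400:
        # Likely an ODSQL syntax error: reload the docs and fix the query.
        return "load_odsql_doc_and_fix_query"
    if status_code == 404:
        # Wrong dataset_id or operation_id: look the identifiers up again.
        return "recheck_dataset_id_or_operation_id"
    if status_code == 200 and record_count == 0:
        # Valid query but nothing matched: loosen filters, verify field names.
        return "broaden_search_and_check_field_names"
    if status_code == 200:
        return "ok"
    # Anything else is outside the cases covered in this document.
    return "unhandled"
```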

## Tips

- Always use `select` to limit returned fields — reduces response size dramatically
- Use `limit` to avoid huge responses (default is 10, max 100)
- For dataset discovery, `select: "dataset_id"` is sufficient
- To explore a dataset's schema, query with `limit: 1` and inspect the fields
- The portal is configured via the `ODS_BASE_URL` environment variable, and all queries target that portal
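
The `limit` bounds mentioned above can be enforced before a call is made. A minimal sketch, assuming a hypothetical `clamp_limit` helper; the lower bound of 1 is a choice made for this example, not something the document states.

```python
from typing import Optional

DEFAULT_LIMIT = 10   # default stated in the tips above
MAX_LIMIT = 100      # maximum stated in the tips above


def clamp_limit(requested: Optional[int]) -> int:
    """Return a limit within the documented bounds."""
    if requested is None:
        return DEFAULT_LIMIT
    # Floor of 1 is an assumption of this sketch.
    return max(1, min(requested, MAX_LIMIT))
```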

## Additional Resources

### Reference Files

For detailed ODSQL syntax and patterns:
- **`references/odsql-cheatsheet.md`** — Quick reference for all ODSQL syntax
- **`references/multi-dataset-patterns.md`** — Patterns for complex multi-dataset queries