
Datasets

Datasets are the fundamental building blocks of Catalyzed. Each dataset is a logical collection of data tables that can be queried, transformed, and analyzed.

A dataset in Catalyzed is a container that:

  • Belongs to a specific team
  • Contains one or more tables (data sources)
  • Defines a schema for each table
  • Can be queried using SQL via the Query Engine

Within a dataset, data is organized into tables. Each table:

  • Has a schema defining its columns and types
  • Supports schema evolution and migrations
  • Maintains statistics for query optimization
  • Can be indexed for faster lookups

You can create a dataset through the UI or the API. The following request creates one through the API:

curl -X POST https://api.catalyzed.ai/datasets \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "teamId": "YOUR_TEAM_ID",
    "name": "my-dataset",
    "description": "My first dataset"
  }'
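
For scripted use, the same request can be wrapped in a small shell helper. The following is a minimal sketch, not an official client: the function names and the CATALYZED_API_TOKEN environment variable are hypothetical, and it assumes the API signals failure with an HTTP status of 400 or higher. Only the endpoint and the three request fields come from the example above.

```shell
# Escape a string for use inside a JSON string literal.
# Simplification: only backslashes and double quotes are handled,
# so values containing newlines or other control characters are not supported.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body from a team ID, a dataset name, and a description.
build_dataset_payload() {
  printf '{"teamId":"%s","name":"%s","description":"%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

# Send the create request. CATALYZED_API_TOKEN is a hypothetical variable name.
create_dataset() {
  if [ -z "${CATALYZED_API_TOKEN:-}" ]; then
    echo "CATALYZED_API_TOKEN is not set" >&2
    return 1
  fi
  dataset_payload=$(build_dataset_payload "$1" "$2" "$3")
  # --fail makes curl exit non-zero when the server returns HTTP 400 or higher.
  curl --fail --silent --show-error -X POST "https://api.catalyzed.ai/datasets" \
    -H "Authorization: Bearer ${CATALYZED_API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$dataset_payload"
}
```

After exporting the token, a call such as create_dataset "YOUR_TEAM_ID" "my-dataset" "My first dataset" sends the same request as the curl command above. Building the body in a separate function keeps quotes in names or descriptions from producing invalid JSON.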

Once data is loaded into a dataset, you can query it using SQL through the Query Engine. See the Query Engine documentation for details.

See the Datasets API for complete endpoint documentation.