Datasets
Datasets are the fundamental building blocks of Catalyzed. Each dataset is a logical collection of data tables that can be queried, transformed, and analyzed.
What is a Dataset?
A dataset in Catalyzed is a container that:
- Belongs to a specific team
- Contains one or more tables (data sources)
- Has defined schemas per table
- Can be queried using SQL via the Query Engine
Dataset Tables
Within a dataset, data is organized into tables. Each table:
- Has a schema defining its columns and types
- Supports schema evolution and migrations
- Maintains statistics for query optimization
- Can be indexed for faster lookups
Creating a Dataset
Datasets can be created through the UI or API:
curl -X POST https://api.catalyzed.ai/datasets \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "teamId": "YOUR_TEAM_ID",
    "name": "my-dataset",
    "description": "My first dataset"
  }'
Querying Datasets
Once data is loaded into a dataset, you can query it using SQL through the Query Engine. See the Query Engine documentation for details.
API Reference
See the Datasets API for complete endpoint documentation.
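For scripted use, the creation request from Creating a Dataset can be wrapped in a small shell function. The following is a minimal sketch, not an official client: the endpoint and the teamId, name, and description fields come from the curl example on this page, while the function names (json_escape, build_dataset_payload, create_dataset) and the CATALYZED_API_TOKEN environment variable are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch only: helper names and CATALYZED_API_TOKEN are assumptions,
# not part of the Catalyzed API.

# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Control characters such as newlines are not handled here.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body. $1 = team ID, $2 = dataset name, $3 = description.
build_dataset_payload() {
  printf '{"teamId": "%s", "name": "%s", "description": "%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

# Send the same POST request as the curl example, taking the token from the
# environment. --fail makes curl return a non-zero status on HTTP errors.
create_dataset() {
  curl --fail --silent --show-error -X POST https://api.catalyzed.ai/datasets \
    -H "Authorization: Bearer ${CATALYZED_API_TOKEN:?set CATALYZED_API_TOKEN first}" \
    -H "Content-Type: application/json" \
    -d "$(build_dataset_payload "$1" "$2" "$3")"
}
```

With the token exported, a call such as create_dataset "YOUR_TEAM_ID" "my-dataset" "My first dataset" would issue the request. Reading the token from the environment keeps it out of the script itself, and building the payload in a separate function lets it be inspected before anything is sent.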