Pipelines
Pipelines are automated workflows that transform, enrich, and process your data. They allow you to build reproducible data processing logic that runs on demand or on a schedule.
What is a Pipeline?
A pipeline is a sequence of steps that:
- Read data from one or more sources
- Apply transformations and business logic
- Write results to a destination
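The read, transform, write shape above can be sketched in plain Python. This is an illustrative toy, not a Catalyzed API: the CSV format, field names, and `run_pipeline` function are all assumptions.

```python
import csv
import io

def run_pipeline(source_csv: str) -> str:
    """Toy read -> transform -> write pipeline over CSV text."""
    # Read: parse rows from the source.
    rows = list(csv.DictReader(io.StringIO(source_csv)))
    # Transform: apply business logic (keep active rows, normalize names).
    rows = [
        {"name": r["name"].strip().title(), "status": r["status"]}
        for r in rows
        if r["status"] == "active"
    ]
    # Write: serialize the result to a destination buffer.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "status"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

In a real pipeline each of the three phases would be a separate, declaratively configured step rather than one function.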
Pipelines are defined declaratively and executed by Catalyzed’s distributed computation infrastructure.
Pipeline Components
Pipelines consist of ordered steps, each performing a specific operation:
- SQL Transforms - Filter, join, and aggregate data using SQL
- Python Transforms - Custom Python code for complex logic
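A Python transform step might look like the following sketch. The function name, the list-of-dicts record format, and the field names are assumptions for illustration, not Catalyzed's actual transform contract:

```python
# Hypothetical Python transform: filter and enrich order records.
# The transform(records) signature and record fields are assumptions.

def transform(records):
    """Drop cancelled orders and compute a line total for the rest."""
    enriched = []
    for rec in records:
        if rec.get("status") == "cancelled":
            continue  # business logic: exclude cancelled orders
        rec = dict(rec)  # copy to avoid mutating the input
        rec["total"] = rec["quantity"] * rec["unit_price"]
        enriched.append(rec)
    return enriched
```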
Triggers
Pipelines can be executed:
- Manually - On-demand via UI or API
- Scheduled - Run on a cron schedule
- Event-driven - Triggered by file uploads or other events
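A scheduled trigger is typically expressed as a cron expression. The payload below is a hedged sketch; the `schedule` field name is an assumption, not a documented Catalyzed parameter:

```python
# Hypothetical scheduled-trigger configuration. The "schedule" field name
# is an assumption; the cron expression means "every day at 02:00".
payload = {
    "name": "nightly-refresh",
    "schedule": "0 2 * * *",  # minute hour day-of-month month day-of-week
}

# A cron expression always has exactly five space-separated fields.
minute, hour, dom, month, dow = payload["schedule"].split()
```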
Creating a Pipeline
```sh
curl -X POST https://api.catalyzed.ai/pipelines \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "teamId": "YOUR_TEAM_ID",
    "name": "my-pipeline",
    "description": "Transform and enrich data"
  }'
```

Triggering a Pipeline
```sh
curl -X POST https://api.catalyzed.ai/pipelines/{pipelineId}/trigger \
  -H "Authorization: Bearer YOUR_API_TOKEN"
```

Monitoring Executions
Track pipeline runs through the executions API:
```sh
curl "https://api.catalyzed.ai/pipeline-executions?pipelineId={pipelineId}" \
  -H "Authorization: Bearer YOUR_API_TOKEN"
```

API Reference
See the Pipelines API for complete endpoint documentation.