Query Engine
The Query Engine is Catalyzed’s distributed SQL execution layer, powered by Apache Ballista. It allows you to run analytical queries across your datasets at scale.
Overview
Catalyzed’s Query Engine provides:
- Standard SQL - ANSI SQL support for familiar querying
- Distributed Execution - Queries are parallelized across multiple workers
- High Performance - Columnar execution with vectorized processing
- Federation - Query across multiple tables in a single query
Running Queries
Execute queries via the REST API:

    curl -X POST https://api.catalyzed.ai/queries \
      -H "Authorization: Bearer YOUR_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "sql": "SELECT * FROM my_table LIMIT 10",
        "tables": {
          "my_table": "TABLE_ID"
        }
      }'

The tables parameter maps table names used in your SQL to actual table IDs in Catalyzed.
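The same request can be issued from a script. The following is a minimal sketch using only the Python standard library. It mirrors the curl call above (endpoint, headers, and the sql/tables body); the helper names `build_query_request` and `run_query` are hypothetical, and the assumption that the response body is JSON is not confirmed here, so check the Queries API reference for the actual response shape.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.catalyzed.ai/queries"


def build_query_request(sql: str, tables: dict[str, str], token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a query.

    `tables` maps the table names used in `sql` to Catalyzed table IDs.
    """
    body = json.dumps({"sql": sql, "tables": tables}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def run_query(sql: str, tables: dict[str, str], token: str, timeout: float = 30.0):
    """Send the query and decode the response.

    Assumption: the API replies with a JSON document. Its structure is
    not described on this page, so it is returned as parsed JSON only.
    """
    request = build_query_request(sql, tables, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Inspect the request without sending it (no network access needed).
    request = build_query_request(
        "SELECT * FROM my_table LIMIT 10",
        {"my_table": "TABLE_ID"},
        "YOUR_API_TOKEN",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Separating request construction from sending makes it easy to log or unit-test the payload before any query reaches the engine.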
SQL Support
The Query Engine supports a comprehensive SQL dialect including:
- SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT
- JOIN (INNER, LEFT, RIGHT, FULL, CROSS)
- Subqueries and CTEs (WITH clauses)
- Window functions (ROW_NUMBER, RANK, LAG, LEAD, etc.)
- Aggregate functions (COUNT, SUM, AVG, MIN, MAX, etc.)
- Date/time, string, and mathematical functions
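To illustrate how several of these features combine, here is a sketch of a request body whose query uses a CTE, an INNER JOIN across two tables, an aggregate, and a window function. The table names (`orders`, `customers`), their columns, and the table ID placeholders are hypothetical and exist only for this example.

```python
import json

# Rank customers by total spend within each region.
# Table and column names are illustrative assumptions, not a real schema.
SQL = """
WITH order_totals AS (
    SELECT o.customer_id, c.region, SUM(o.amount) AS total_spent
    FROM orders AS o
    INNER JOIN customers AS c ON c.id = o.customer_id
    GROUP BY o.customer_id, c.region
)
SELECT customer_id,
       region,
       total_spent,
       RANK() OVER (PARTITION BY region ORDER BY total_spent DESC) AS region_rank
FROM order_totals
ORDER BY region, region_rank
LIMIT 20
""".strip()

# Every table name referenced in the SQL needs an entry in "tables".
payload = {
    "sql": SQL,
    "tables": {
        "orders": "ORDERS_TABLE_ID",        # placeholder table ID
        "customers": "CUSTOMERS_TABLE_ID",  # placeholder table ID
    },
}

if __name__ == "__main__":
    # This JSON document is what would be sent as the POST body.
    print(json.dumps(payload, indent=2))
```

Because the query references two table names, the tables map carries two entries, one per name.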
Performance Tips
- Use LIMIT - Always limit results during exploration
- Filter early - Apply WHERE clauses to reduce data scanned
- Select specific columns - Avoid SELECT * in production
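The tips above could be applied on the client side along these lines. This is a rough sketch, not part of Catalyzed: `with_exploration_limit` is a hypothetical helper that appends a LIMIT to ad-hoc queries that lack one, and the `events` table and its columns are made up for illustration. The check is a simple pattern match on the end of the statement, so it is only suitable for straightforward single-statement queries.

```python
import re

# Matches a trailing "LIMIT n" or "LIMIT n OFFSET m" (case-insensitive).
_TRAILING_LIMIT = re.compile(r"\blimit\s+\d+(\s+offset\s+\d+)?$", re.IGNORECASE)


def with_exploration_limit(sql: str, max_rows: int = 100) -> str:
    """Return `sql` with a LIMIT appended unless it already ends with one."""
    statement = sql.strip().rstrip(";").rstrip()
    if _TRAILING_LIMIT.search(statement):
        return statement
    return f"{statement} LIMIT {max_rows}"


# Exploration: SELECT * is acceptable here, but the row count is capped.
exploration_sql = with_exploration_limit("SELECT * FROM events")

# Production: name only the needed columns and filter as early as possible.
# Table and column names are illustrative assumptions.
production_sql = (
    "SELECT user_id, event_type, created_at "
    "FROM events "
    "WHERE created_at >= DATE '2024-01-01' AND event_type = 'purchase'"
)

if __name__ == "__main__":
    print(exploration_sql)
    print(production_sql)
```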
API Reference
See the Queries API for complete endpoint documentation.