
Query Engine

The Query Engine is Catalyzed’s distributed SQL execution layer, powered by Apache Ballista. It allows you to run analytical queries across your datasets at scale.

Catalyzed’s Query Engine provides:

  • Standard SQL - ANSI SQL support for familiar querying
  • Distributed Execution - Queries are parallelized across multiple workers
  • High Performance - Columnar execution with vectorized processing
  • Federation - Query across multiple tables in a single query

Execute queries via the REST API:

curl -X POST https://api.catalyzed.ai/queries \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "SELECT * FROM my_table LIMIT 10",
    "tables": {
      "my_table": "TABLE_ID"
    }
  }'

The tables parameter maps each table name used in your SQL to a Catalyzed table ID.
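Because every name in tables is resolved separately, one request can reference several tables. The sketch below follows the same request shape as the curl example above and joins two tables. The table names orders and customers, their columns, the placeholder IDs, and the CATALYZED_API_TOKEN variable are all hypothetical; substitute your own.

```shell
# Request body: two table names used in the SQL, each mapped to its own table ID.
# Table names, columns, and IDs are placeholders for illustration only.
BODY=$(cat <<'EOF'
{
  "sql": "SELECT c.name, COUNT(o.order_id) AS order_count FROM orders o INNER JOIN customers c ON o.customer_id = c.customer_id GROUP BY c.name ORDER BY order_count DESC LIMIT 10",
  "tables": {
    "orders": "ORDERS_TABLE_ID",
    "customers": "CUSTOMERS_TABLE_ID"
  }
}
EOF
)

# Send the request only when a token is set (CATALYZED_API_TOKEN is an assumed variable name).
if [ -n "${CATALYZED_API_TOKEN:-}" ]; then
  curl -X POST https://api.catalyzed.ai/queries \
    -H "Authorization: Bearer ${CATALYZED_API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$BODY"
fi
```

Without the variable set, the script only builds the request body, so it is safe to run while you adjust the query.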

The Query Engine supports the following SQL features:

  • SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT
  • JOIN (INNER, LEFT, RIGHT, FULL, CROSS)
  • Subqueries and CTEs (WITH clauses)
  • Window functions (ROW_NUMBER, RANK, LAG, LEAD, etc.)
  • Aggregate functions (COUNT, SUM, AVG, MIN, MAX, etc.)
  • Date/time, string, and mathematical functions
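These features can be combined in one query. As a sketch, assuming a hypothetical sales table with region, product, and revenue columns, the request below uses a CTE, an aggregate, and a window function to rank products by revenue within each region. JSON strings cannot contain raw line breaks, so the script writes the SQL over several lines for readability and flattens it before building the body. SALES_TABLE_ID and CATALYZED_API_TOKEN are placeholders.

```shell
# Multi-line SQL in a quoted heredoc, so the shell does not expand anything inside it.
# The sales table and its columns are hypothetical.
SQL=$(cat <<'EOF'
WITH regional AS (
  SELECT region, product, SUM(revenue) AS total_revenue
  FROM sales
  GROUP BY region, product
)
SELECT region, product, total_revenue,
       RANK() OVER (PARTITION BY region ORDER BY total_revenue DESC) AS revenue_rank
FROM regional
ORDER BY region, revenue_rank
LIMIT 50
EOF
)

# JSON strings cannot contain raw newlines, so join the query onto one line.
# Embedding it with printf is only safe because this SQL has no double quotes or backslashes.
SQL_ONE_LINE=$(printf '%s' "$SQL" | tr '\n' ' ')
BODY=$(printf '{"sql": "%s", "tables": {"sales": "%s"}}' "$SQL_ONE_LINE" "SALES_TABLE_ID")

# Send the request only when a token is set (CATALYZED_API_TOKEN is an assumed variable name).
if [ -n "${CATALYZED_API_TOKEN:-}" ]; then
  curl -X POST https://api.catalyzed.ai/queries \
    -H "Authorization: Bearer ${CATALYZED_API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$BODY"
fi
```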

To keep queries fast and inexpensive:

  1. Use LIMIT - Always limit results while exploring data
  2. Filter early - Apply WHERE clauses to reduce the amount of data scanned
  3. Select specific columns - List only the columns you need; avoid SELECT * in production
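A sketch of an exploratory request that applies all three practices: it names only the columns it needs, filters with WHERE, and caps the result with LIMIT. The events table, its columns, EVENTS_TABLE_ID, and CATALYZED_API_TOKEN are hypothetical.

```shell
# Specific columns, an early filter, and a LIMIT, instead of SELECT * over the whole table.
# Table name, columns, and ID are placeholders for illustration only.
BODY=$(cat <<'EOF'
{
  "sql": "SELECT event_id, user_id, event_type FROM events WHERE event_type = 'purchase' LIMIT 100",
  "tables": { "events": "EVENTS_TABLE_ID" }
}
EOF
)

# Send the request only when a token is set (CATALYZED_API_TOKEN is an assumed variable name).
if [ -n "${CATALYZED_API_TOKEN:-}" ]; then
  curl -X POST https://api.catalyzed.ai/queries \
    -H "Authorization: Bearer ${CATALYZED_API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$BODY"
fi
```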

See the Queries API for complete endpoint documentation.