Query Engine

The Query Engine is Catalyzed’s distributed SQL execution layer, powered by Apache Ballista. It allows you to run analytical queries across your datasets at scale.

Overview

Catalyzed’s Query Engine provides:

Standard SQL - ANSI SQL support, so familiar query syntax works as expected
Distributed Execution - Queries are parallelized across multiple workers
High Performance - Columnar execution with vectorized processing
Federation - Join and combine multiple tables in a single query

Running Queries

Execute queries via the REST API:

curl -X POST https://api.catalyzed.ai/queries \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "SELECT * FROM my_table LIMIT 10",
    "tables": {
      "my_table": "TABLE_ID"
    }
  }'

The tables parameter maps table names used in your SQL to actual table IDs in Catalyzed.
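If you call the API from Python rather than curl, the same request can be built with only the standard library. This is a minimal sketch that assumes the endpoint, headers, and body shape are exactly those in the curl example. The function name build_query_request is hypothetical, and the sketch does not cover the response format.

```python
import json
import urllib.request

# Base URL taken from the curl example; DEFAULT_BASE_URL is a name chosen for this sketch.
DEFAULT_BASE_URL = "https://api.catalyzed.ai"


def build_query_request(sql, tables, token, base_url=DEFAULT_BASE_URL):
    """Build (but do not send) a POST /queries request mirroring the curl example.

    `tables` maps each table name used in `sql` to a Catalyzed table ID.
    """
    body = json.dumps({"sql": sql, "tables": tables}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/queries",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_query_request(
        "SELECT * FROM my_table LIMIT 10",
        {"my_table": "TABLE_ID"},
        "YOUR_API_TOKEN",
    )
    # Show what would be sent; nothing goes over the network here.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To send it, call urllib.request.urlopen(req) and handle the response
    # as described in the Queries API reference.
```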

SQL Support

The Query Engine supports a broad SQL dialect, including:

SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT
JOIN (INNER, LEFT, RIGHT, FULL, CROSS)
Subqueries and CTEs (WITH clauses)
Window functions (ROW_NUMBER, RANK, LAG, LEAD, etc.)
Aggregate functions (COUNT, SUM, AVG, MIN, MAX, etc.)
Date/time, string, and mathematical functions
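To show several of these features working together, the sketch below builds a request body whose SQL combines a CTE, an INNER JOIN, an aggregate, and a window function. The tables orders and customers, their columns, and the placeholder IDs are all hypothetical. The body uses the same sql/tables shape as the curl example. The CTE name customer_totals is defined inside the query, so presumably it needs no entry in the tables map.

```python
import json

# Hypothetical tables: "orders" (customer_id, amount) and
# "customers" (customer_id, region). Replace the IDs with your own.
SQL = """
WITH customer_totals AS (
    SELECT c.region, o.customer_id, SUM(o.amount) AS total_spent
    FROM orders o
    INNER JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.region, o.customer_id
)
SELECT
    region,
    customer_id,
    total_spent,
    RANK() OVER (PARTITION BY region ORDER BY total_spent DESC) AS region_rank
FROM customer_totals
ORDER BY region, region_rank
LIMIT 100
"""

# Each real table referenced in the SQL gets an entry in the tables map.
payload = {
    "sql": SQL.strip(),
    "tables": {
        "orders": "ORDERS_TABLE_ID",
        "customers": "CUSTOMERS_TABLE_ID",
    },
}

if __name__ == "__main__":
    # Print the JSON body that would be sent to POST /queries.
    print(json.dumps(payload, indent=2))
```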

Performance Tips

Use LIMIT - Always limit results during exploration
Filter early - Apply WHERE clauses to reduce data scanned
Select specific columns - Avoid SELECT * in production; with columnar execution, listing only the columns you need reduces the data the engine processes
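If you generate exploratory queries in code, you can apply these tips automatically. This is a minimal sketch that assumes SQL strings are built on the client. The helper exploration_query and its default limit of 100 rows are hypothetical choices, not part of the Catalyzed API. The returned string goes in the sql field of the request body.

```python
import re

# Conservative identifier check for this sketch: plain, unquoted names only.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def exploration_query(table, columns, where=None, limit=100):
    """Build a SELECT that names its columns, filters early, and caps rows.

    `where` is inserted verbatim, so it must come from trusted code, not user input.
    """
    if not columns:
        raise ValueError("list the columns you need instead of SELECT *")
    for name in [table, *columns]:
        if not _IDENT.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    if int(limit) <= 0:
        raise ValueError("limit must be positive")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return sql + f" LIMIT {int(limit)}"


if __name__ == "__main__":
    print(exploration_query("events", ["user_id", "event_type"],
                            where="event_type = 'click'", limit=10))
```

The sketch accepts only unquoted identifiers to stay simple. Because the where argument passes through unchanged, never build it from untrusted input.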

API Reference

See the Queries API for complete endpoint documentation.