Querying Data
Catalyzed supports standard SQL for querying your data. This guide covers common query patterns and best practices.
Executing Queries
Section titled “Executing Queries”Use the /queries endpoint to run SQL queries against your tables. The tables parameter maps the table names used in your SQL to their table IDs.
Basic SELECT query
curl -X POST https://api.catalyzed.ai/queries \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "sql": "SELECT * FROM users LIMIT 10", "tables": {"users": "KzaMsfA0LSw_Ld0KyaXIS"} }'const response = await fetch("https://api.catalyzed.ai/queries", { method: "POST", headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json", }, body: JSON.stringify({ sql: "SELECT * FROM users LIMIT 10", tables: { users: "KzaMsfA0LSw_Ld0KyaXIS" }, }),});const data = await response.json();response = requests.post( "https://api.catalyzed.ai/queries", headers={"Authorization": f"Bearer {api_token}"}, json={ "sql": "SELECT * FROM users LIMIT 10", "tables": {"users": "KzaMsfA0LSw_Ld0KyaXIS"} })data = response.json()Joining Tables
Section titled “Joining Tables”Query across multiple tables by including them all in the tables mapping:
JOIN across tables
curl -X POST https://api.catalyzed.ai/queries \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "sql": "SELECT o.order_id, c.name, o.total FROM orders o JOIN customers c ON o.customer_id = c.id", "tables": { "orders": "Ednc5U676CO4hn-FqsXeA", "customers": "6fTBbbj4uv8TVMVh0gVch" } }'const response = await fetch("https://api.catalyzed.ai/queries", { method: "POST", headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json", }, body: JSON.stringify({ sql: `SELECT o.order_id, c.name, o.total FROM orders o JOIN customers c ON o.customer_id = c.id`, tables: { orders: "Ednc5U676CO4hn-FqsXeA", customers: "6fTBbbj4uv8TVMVh0gVch", }, }),});response = requests.post( "https://api.catalyzed.ai/queries", headers={"Authorization": f"Bearer {api_token}"}, json={ "sql": """ SELECT o.order_id, c.name, o.total FROM orders o JOIN customers c ON o.customer_id = c.id """, "tables": { "orders": "Ednc5U676CO4hn-FqsXeA", "customers": "6fTBbbj4uv8TVMVh0gVch" } })SQL Syntax Reference
Section titled “SQL Syntax Reference”SELECT Clauses
Section titled “SELECT Clauses”-- Select specific columnsSELECT name, email, created_at FROM users
-- Select all columnsSELECT * FROM users
-- Column aliasesSELECT name AS customer_name, email AS contact FROM users
-- ExpressionsSELECT name, price * quantity AS total FROM ordersFiltering (WHERE)
Section titled “Filtering (WHERE)”-- Basic comparisonsSELECT * FROM users WHERE status = 'active'SELECT * FROM orders WHERE amount > 100SELECT * FROM orders WHERE created_at >= '2024-01-01'
-- Multiple conditionsSELECT * FROM orders WHERE status = 'active' AND amount > 100SELECT * FROM orders WHERE status = 'active' OR status = 'pending'
-- NULL checksSELECT * FROM users WHERE deleted_at IS NULLSELECT * FROM orders WHERE notes IS NOT NULL
-- IN operatorSELECT * FROM orders WHERE status IN ('active', 'pending', 'review')
-- LIKE pattern matchingSELECT * FROM users WHERE email LIKE '%@example.com'SELECT * FROM users WHERE name LIKE 'John%'
-- BETWEEN rangeSELECT * FROM orders WHERE amount BETWEEN 100 AND 500Sorting (ORDER BY)
Section titled “Sorting (ORDER BY)”-- Ascending (default)SELECT * FROM orders ORDER BY created_at
-- DescendingSELECT * FROM orders ORDER BY created_at DESC
-- Multiple columnsSELECT * FROM orders ORDER BY status ASC, created_at DESC
-- With NULLS handlingSELECT * FROM orders ORDER BY completed_at NULLS LASTPagination (LIMIT/OFFSET)
Section titled “Pagination (LIMIT/OFFSET)”-- First 10 rowsSELECT * FROM users LIMIT 10
-- Skip first 20, get next 10SELECT * FROM users LIMIT 10 OFFSET 20
-- Recommended: use ORDER BY with paginationSELECT * FROM users ORDER BY id LIMIT 10 OFFSET 20Aggregations
Section titled “Aggregations”-- CountSELECT COUNT(*) FROM ordersSELECT COUNT(DISTINCT customer_id) FROM orders
-- Sum, Average, Min, MaxSELECT SUM(amount) FROM ordersSELECT AVG(amount) FROM orders WHERE status = 'completed'SELECT MIN(created_at), MAX(created_at) FROM orders
-- Group bySELECT status, COUNT(*) as count, SUM(amount) as totalFROM ordersGROUP BY status
-- Having (filter groups)SELECT customer_id, COUNT(*) as order_countFROM ordersGROUP BY customer_idHAVING COUNT(*) > 5-- INNER JOIN (only matching rows)SELECT o.*, c.nameFROM orders oJOIN customers c ON o.customer_id = c.id
-- LEFT JOIN (all from left, matching from right)SELECT c.name, o.order_idFROM customers cLEFT JOIN orders o ON c.id = o.customer_id
-- Multiple joinsSELECT o.order_id, c.name, p.product_nameFROM orders oJOIN customers c ON o.customer_id = c.idJOIN products p ON o.product_id = p.idSubqueries
Section titled “Subqueries”-- Subquery in WHERESELECT * FROM ordersWHERE customer_id IN ( SELECT id FROM customers WHERE status = 'premium')
-- Subquery in FROMSELECT avg_amount, COUNT(*) as customer_countFROM ( SELECT customer_id, AVG(amount) as avg_amount FROM orders GROUP BY customer_id) customer_averagesGROUP BY avg_amountCommon Table Expressions (CTEs)
Section titled “Common Table Expressions (CTEs)”WITH high_value_customers AS ( SELECT customer_id, SUM(amount) as total_spent FROM orders GROUP BY customer_id HAVING SUM(amount) > 10000)SELECT c.name, hvc.total_spentFROM high_value_customers hvcJOIN customers c ON hvc.customer_id = c.idORDER BY hvc.total_spent DESCWindow Functions
Section titled “Window Functions”-- Row numberSELECT name, amount, ROW_NUMBER() OVER (ORDER BY amount DESC) as rankFROM orders
-- Partition bySELECT customer_id, order_date, amount, SUM(amount) OVER (PARTITION BY customer_id) as customer_totalFROM orders
-- Running totalSELECT order_date, amount, SUM(amount) OVER (ORDER BY order_date) as running_totalFROM ordersQuery Response Format
Section titled “Query Response Format”Queries return results in a structured JSON format:
{ "queryId": "qry_abc123", "columns": [ {"name": "id", "type": "Utf8"}, {"name": "name", "type": "Utf8"}, {"name": "email", "type": "Utf8"}, {"name": "created_at", "type": "Timestamp(Microsecond, Some(\"UTC\"))"} ], "rows": [ {"id": "1", "name": "Alice", "email": "alice@example.com", "created_at": "2024-01-15T10:00:00Z"}, {"id": "2", "name": "Bob", "email": "bob@example.com", "created_at": "2024-01-15T11:00:00Z"} ], "rowCount": 2, "truncated": false}Query Options
Section titled “Query Options”Limit Results
Section titled “Limit Results”Prevent accidentally returning too many rows:
{ "sql": "SELECT * FROM users", "tables": {"users": "KzaMsfA0LSw_Ld0KyaXIS"}, "maxRows": 1000}Get Execution Plan
Section titled “Get Execution Plan”Preview how a query will execute without running it:
curl -X POST https://api.catalyzed.ai/queries/explain \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "sql": "SELECT * FROM users WHERE status = '\''active'\''", "tables": {"users": "KzaMsfA0LSw_Ld0KyaXIS"} }'Performance Tips
Section titled “Performance Tips”1. Use Specific Columns
Section titled “1. Use Specific Columns”-- Good: Select only needed columnsSELECT id, name, status FROM users
-- Avoid: Select all columnsSELECT * FROM users2. Filter Early
Section titled “2. Filter Early”-- Good: Filter in WHERESELECT * FROM orders WHERE created_at > '2024-01-01'
-- Avoid: Filter in application codeSELECT * FROM orders -- then filter in code3. Use LIMIT
Section titled “3. Use LIMIT”-- Good: Always limit during developmentSELECT * FROM orders WHERE status = 'active' LIMIT 100
-- Avoid: Unbounded queriesSELECT * FROM orders4. Create Indexes
Section titled “4. Create Indexes”For columns frequently used in WHERE clauses:
curl -X POST https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/indexes \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "indexName": "idx_status", "columnName": "status", "indexType": "btree" }'5. Use Appropriate Data Types
Section titled “5. Use Appropriate Data Types”- Use
timestampfor dates, not strings - Use numeric types (
int64,float64) for numbers, not strings - This enables efficient comparisons and aggregations
Error Handling
Section titled “Error Handling”Syntax Errors
Section titled “Syntax Errors”{ "error": "QUERY_ERROR", "message": "SQL syntax error at position 45: unexpected token 'FORM'"}Missing Table Bindings
Section titled “Missing Table Bindings”{ "error": "VALIDATION_ERROR", "message": "Table 'orders' referenced in SQL but not found in tables mapping"}Timeout
Section titled “Timeout”Long-running queries may timeout. Use filters and limits to reduce data scanned.
SQL Translation
Section titled “SQL Translation”Convert natural language queries to SQL using the /queries/translate endpoint. The translator understands table schemas, column types, and supports semantic search functions.
Translate natural language to SQL
curl -X POST https://api.catalyzed.ai/queries/translate \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "tables": { "orders": "Ednc5U676CO4hn-FqsXeA", "customers": "6fTBbbj4uv8TVMVh0gVch" }, "query": "Find customers who placed orders over $1000 last month", "includeExplanation": true }'const response = await fetch("https://api.catalyzed.ai/queries/translate", { method: "POST", headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json", }, body: JSON.stringify({ tables: { orders: "Ednc5U676CO4hn-FqsXeA", customers: "6fTBbbj4uv8TVMVh0gVch", }, query: "Find customers who placed orders over $1000 last month", includeExplanation: true, }),});const { sql, explanation, confidence } = await response.json();response = requests.post( "https://api.catalyzed.ai/queries/translate", headers={"Authorization": f"Bearer {api_token}"}, json={ "tables": { "orders": "Ednc5U676CO4hn-FqsXeA", "customers": "6fTBbbj4uv8TVMVh0gVch" }, "query": "Find customers who placed orders over $1000 last month", "includeExplanation": True })result = response.json()sql = result["sql"]explanation = result.get("explanation")confidence = result["confidence"]Response:
{ "sql": "SELECT DISTINCT c.name, c.email FROM customers c JOIN orders o ON c.id = o.customer_id WHERE o.amount > 1000 AND o.created_at >= date_trunc('month', CURRENT_DATE - INTERVAL '1 month')", "confidence": "high", "explanation": "This query joins customers and orders tables, filters for orders over $1000, and restricts to the previous month using date_trunc.", "warnings": [], "referencedTables": ["customers", "orders"], "referencedColumns": ["customers.name", "customers.email", "customers.id", "orders.customer_id", "orders.amount", "orders.created_at"], "usesVectorSearch": false}Translation Features
Section titled “Translation Features”- Schema Awareness — understands table structures, column types, and sample data
- Semantic Search — auto-detects vector columns and enables
knn_search(),knn_cosine(),knn_l2(), andtext_to_embedding() - SQL Validation — validates generated SQL syntax via
EXPLAIN(enabled by default) - Confidence Levels — returns
"high","medium", or"low"confidence for the translation - Explanations — optional natural language explanation of the generated SQL
Request Options
Section titled “Request Options”| Option | Type | Description |
|---|---|---|
tables | object | Map of table aliases (as used in SQL) to table IDs |
query | string | Natural language query to translate (max 4000 chars) |
includeExplanation | boolean | Include explanation of generated SQL (default: false) |
validateSyntax | boolean | Validate SQL syntax before returning (default: true) |
Example: Semantic Search Query
Section titled “Example: Semantic Search Query”{ "tables": { "products": "tbl_abc123" }, "query": "Find products similar to wireless headphones", "includeExplanation": true}Generated SQL:
SELECT * FROM knn_cosine( 'products', 'embedding', text_to_embedding('wireless headphones'), 10)For optimizing translation quality — table naming, column naming, confidence handling, and the translate→execute workflow — see the SQL Translation Guide.
See the SQL Translation API for the complete endpoint reference.
Vector Search
Section titled “Vector Search”Catalyzed also supports semantic similarity search using vector embeddings. Use knn_search() to find similar documents, products, or any embedded content:
SELECT * FROM knn_cosine( 'products', 'embedding', text_to_embedding('wireless headphones'), 10)See the Vector Search Guide for complete documentation on:
knn_search(),knn_cosine(), andknn_l2()functionstext_to_embedding()for query-time embeddings- Filtered vector search
- Distance metrics and thresholds
Next Steps
Section titled “Next Steps”- Vector Search - Semantic similarity search with embeddings
- Tables - Learn about table schemas and indexes
- Schema Management - Evolve your schemas safely