Using VQL with Data Models
VQL can also be used in Data Models to dynamically filter data at query execution time. In Data Models, VQL is typically combined with macros to enable flexible, runtime-driven query behavior without requiring the caller to directly modify SQL queries.
The following sections explain how dynamic variables, categories, and categorical macros work when using VQL in Data Models.
Data Model Template Queries
Data Models support template-based queries, where certain parts of the SQL can be dynamically replaced or enabled at runtime using macros.
Currently, Data Models provide __$dynamic* macros that allow values to be injected into SQL queries at execution time.
What are Data Model (DM) Variables?
DM variables are value templates that are substituted into the SQL at runtime.
If a value for a variable is not provided, it is either:
- Replaced with an appropriate SQL value that does not affect statement execution, or
- Replaced with a user-defined default value, as applicable.
The $__dynamicVariable macro alone is sufficient to alter SQL behavior at runtime, including modifying entire SQL statements. However, this approach requires the caller to be SQL-aware.
Given this complexity, we propose splitting the templating problem into two distinct parts:
- Allowing the caller to replace specific SQL fragments at runtime.
- Allowing the caller to enable or disable certain SQL logic at runtime.
Together, these capabilities provide more granular control at the external caller level while preserving flexibility through user input. This enables the caller to control SQL execution without needing to be SQL-aware. To address this, Data Models introduce categories.
Categories
Categories can be considered as control flags defined by users (query authors) that allow the caller or executor to modify specific query behavior at execution time. For example, categories can be used to enable bucketing or time bucketing, or to change the type of aggregation being performed.
There should be a predefined and exhaustive set of categories from which users can choose. Self-defined categories, essentially variations of predefined categories used to distinguish subcategories, may be allowed, subject to certain considerations.
The following categories can be defined:
timebucketagghostappenvservicebranchcityetc.
For self-defined categories, the following naming convention can be enforced:
<predefined_category>_<desired_name>
These categories must then be associated with SQL expressions that can be enabled or disabled at runtime. To achieve this, we introduce categorical macros.
Categorical Macros
$__cat*
We introduce a set of $__cat* macros that conditionally enable specific SQL segments when the corresponding categories are active, and disable them otherwise.
Syntax
$__catSelect(category, expr, repl=>NULL)
$__catFilter(category, expr, repl=>NULL)
$__catGroup(category, expr, repl=>NULL)
$__catOrder(category, expr, repl=>NULL)
All of these macros follow the same syntax and usage pattern.
Arguments
- category: The category name.
- expr: The SQL expression to be enabled when the category is active.
- repl (optional): The SQL expression to use when the category is disabled. This defaults to a value appropriate for the underlying database. By default, an appropriate SQL expression is used to ensure that
expris effectively disabled and that the SQL interpreter does not raise an error.
SELECT
Syntax
SELECT $__catSelect(category, expr)
FROM table_name
WHERE condition
Example
User-defined query:
SELECT timestamp,
cpu_used_pct AS "CPU Util",
mem_used_pct AS "Memory Util",
$__catSelect(app, app_name AS "Application"),
$__catSelect(branch, branch_name AS "Branch")
FROM system_stats
WHERE $__timeFilter("timestamp")
Macro Expansion
Assume the caller enables the app category.
SELECT timestamp,
cpu_used_pct AS "CPU Util",
mem_used_pct AS "Memory Util",
app_name AS Application,
NULL
FROM system_stats
WHERE <time filter on "timestamp">
The NULL value in the SELECT clause is ignored during data retrieval.
WHERE
Syntax
SELECT column_name(s)
FROM table_name
WHERE $__catFilter(category, expr)
Example
User-defined query:
SELECT timestamp,
cpu_used_pct AS "CPU Util",
mem_used_pct AS "Memory Util",
$__catSelect(app, app_name AS "Application"),
$__catSelect(branch, branch_name AS "Branch")
FROM system_stats
WHERE $__timeFilter("timestamp")
AND $__catFilter(app, $__dynamicFilter("app_name", $Application))
AND $__catFilter(branch, $__dynamicFilter("branch_name", $Branch))
Macro Expansion
Assume the caller enables the app category and provides
$Application = ['App1', 'App2'].
SELECT timestamp,
cpu_used_pct AS "CPU Util",
mem_used_pct AS "Memory Util",
app_name AS Application
FROM system_stats
WHERE <time filter on "timestamp">
AND "app_name" IN ('App1', 'App2')
AND true
GROUP BY
Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY $__catGroup(category, expr)
The $__catGroup macro should always be used together with $__catSelect to avoid GROUP BY errors.
Example
User-defined query:
SELECT timestamp,
avg(cpu_used_pct) AS "CPU Util",
avg(mem_used_pct) AS "Memory Util",
$__catSelect(app, app_name AS "Application"),
$__catSelect(host, server_ip AS "Server IP"),
$__catSelect(branch, branch_name AS "Branch")
FROM system_stats
WHERE $__timeFilter("timestamp")
GROUP BY $__catGroup(app, "Application"),
$__catGroup(host, "Server IP"),
$__catGroup(branch, "Branch")
Macro Expansion
Assume the caller enables the app category.
SELECT timestamp,
avg(cpu_used_pct) AS "CPU Util",
avg(mem_used_pct) AS "Memory Util",
app_name AS "Application",
NULL,
NULL
FROM system_stats
WHERE <time filter on "timestamp">
GROUP BY "Application",
NULL,
NULL
ORDER BY
Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
ORDER BY $__catOrder(expr, category)
The sort order (ASC / DESC) is determined dynamically by the caller module at query execution time, based on the requirement. If the caller does not specify an order, ASC is used by default.
Example
User-defined query:
SELECT timestamp,
avg(cpu_used_pct) AS "CPU Util",
avg(mem_used_pct) AS "Memory Util",
$__catSelect(app, app_name AS "Application"),
$__catSelect(app, server_ip AS "Server IP"),
$__catSelect(branch, branch_name AS "Branch")
FROM system_stats
WHERE $__timeFilter("timestamp")
ORDER BY $__catOrder(app, "Application"),
$__catOrder(app, "Server IP"),
$__catOrder(branch, "Branch")
Macro Expansion
Assume the caller enables the app category.
SELECT timestamp,
avg(cpu_used_pct) AS "CPU Util",
avg(mem_used_pct) AS "Memory Util",
app_name AS "Application",
server_ip AS "Server IP",
NULL
FROM system_stats
WHERE <time filter on "timestamp">
ORDER BY "Application",
"Server IP",
NULL
$__catDynReplace
We introduce this powerful categorical dynamic replacement macro to address any use case not already covered by the existing $__cat* macros. This macro provides maximum flexibility by allowing multiple expressions, default values for each expression, and the ability to disable the expression—similar to standard $__cat* macros—by specifying a replacement expression when the category is disabled.
Syntax
$__catDynReplace(cat, expr, repl, clause, choices=>NULL)
Arguments (supports named notation)
- category: The category name.
- expr: The SQL expression to enable when the category is enabled. This expression may include variables. If variables are used in
expr, the user must also providechoices. - repl: The SQL expression to use when the category is disabled. This macro is intended for advanced users who understand database behavior; therefore, no default replacement is applied.
- clause: The SQL clause in which the macro is used.
- choices: The allowed choices for this macro, including default values.
- The number of choices must match the number of variables.
- If one choice is provided per variable, it is treated as the default value.
- If multiple choices are provided per variable, the first choice for each variable is treated as the default value.
With great power comes great responsibility. Since this macro provides extensive control over SQL behavior, it is the user’s responsibility to ensure that the DM schema is preserved for all replacements. This includes proper use of AS in the SELECT clause, correct usage of repl where required, and overall SQL correctness.
Example Use Cases
How can you write a query that enables time bucketing with different intervals?
If you want to write a query that enables time bucketing with different intervals, the following query can be used.
SELECT $__catDynReplace(
timebucket,
toStartOfInterval("timestamp", INTERVAL $interval),
toStartOfInterval("timestamp", INTERVAL '1 hour'),
SELECT,
choices=>('1 hour', '5 minutes', '15 minutes', '30 minutes')
) AS ts,
$__catSelect(
timebucket,
avg(cpu_used_pct),
cpu_used_pct
) AS "CPU Util",
$__catSelect(
timebucket,
avg(mem_used_pct),
mem_used_pct
) AS "Memory Util"
FROM system_stats
WHERE $__timeFilter("timestamp")
GROUP BY ts
How can you aggregate fields dynamically?
If you want to write a query that allows you to dynamically choose the aggregation type and the column on which the aggregation is performed, the following query can be used.
SELECT $__catDynReplace(
agg,
$aggType($aggField),
min(cpu_used_pct),
SELECT,
choices=>(
(min, max, avg),
(cpu_used_pct, mem_used_pct)
)
) AS aggregate,
$__catSelect(app, app_name AS "Application"),
$__catSelect(branch, branch_name AS "Branch"),
city
FROM system_stats
WHERE $__timeFilter("timestamp")
GROUP BY $__catGroup(app, "Application"),
$__catGroup(branch, "Branch"),
city
The above query returns one aggregate of type $aggType on the field $aggField.
How can you return multiple dynamic aggregates?
If you want the query to return two aggregates, you can achieve this in the following ways.
Option 1: Use two instances of $__catDynReplace(...)
SELECT $__catDynReplace(
agg,
$aggType1($aggField1),
min(cpu_used_pct),
SELECT,
choices=>(
(min, max, avg),
(cpu_used_pct, mem_used_pct)
)
) AS aggregate1,
$__catDynReplace(
agg,
$aggType2($aggField2),
min(cpu_used_pct),
SELECT,
choices=>(
(min, max, avg),
(cpu_used_pct, mem_used_pct)
)
) AS aggregate2,
$__catSelect(app, app_name AS "Application"),
$__catSelect(branch, branch_name AS "Branch"),
city
FROM system_stats
WHERE $__timeFilter("timestamp")
GROUP BY $__catGroup(app, "Application"),
$__catGroup(branch, "Branch"),
city
Option 2: Use a single instance and handle the expression appropriately
SELECT $__catDynReplace(
agg,
$aggType1($aggField1) AS aggregate1,
$aggType2($aggField2) AS aggregate2,
min(cpu_used_pct) AS aggregate1,
min(mem_used_pct) AS aggregate2,
SELECT,
choices=>(
(min, max, avg),
("cpu_used_pct", "mem_used_pct"),
(min, max, avg),
("cpu_used_pct", "mem_used_pct")
)
),
$__catSelect(app, app_name AS "Application"),
$__catSelect(branch, branch_name AS "Branch"),
city
FROM system_stats
WHERE $__timeFilter("timestamp")
GROUP BY $__catGroup(app, "Application"),
$__catGroup(branch, "Branch"),
city
In the second approach, $__catDynReplace(...) is used to return multiple selections within a single expression. While this is supported, the query author must ensure that the DM schema is preserved. Failure to do so may result in unforeseen downstream issues.
How can the same query supply both time-bucketed and raw time series data?
If you want to write a query that enables time bucketing when requested, and otherwise returns raw time series data, the following query can be used.
SELECT $__catDynReplace(
timebucket,
toStartOfInterval("timestamp", INTERVAL $interval),
"timestamp",
SELECT,
choices=>('5 minutes', '15 minutes', '30 minutes', '1 hour')
) AS ts,
$__catSelect(
timebucket,
avg(cpu_used_pct),
cpu_used_pct
) AS "CPU Util",
$__catSelect(
timebucket,
avg(mem_used_pct),
mem_used_pct
) AS "Memory Util"
FROM system_stats
WHERE $__timeFilter("timestamp")
GROUP BY $__catGroup(timebucket, ts)
