Specification

The Dalmatiner Query Language (DQL) is loosely based on SQL syntax combining it with a function centric graphite like approach to data transformation.

We feel this combines two concepts that are very powerful in their respective areas. SQL is en excellent tool to describe data identity and relations, while functions are a lot more convenient when it comes to transformations.

DQL is a typed language, currently supporting four types:

  • metric - a metric stream, can not be a constant is returned as part of get statement
  • integer - a whole number i.e. 1 or 42 or -23
  • float - a floating point number i.e. 1.0 or 1e-10
  • time - time based data, a integer with a time suffixes: ms, s, m, h, d, w 1w
  • date - an iso date written encapsulated in " such as "2016-04-19" or a relative time which is the combination of a time and the keyword AGO such as: 1w AGO
  • string - a possibly quoted string. Many strings can be written without quoting them but when there are conflicts with keywords or strings contain spaces or special characters they can and need to be encapsulated in single quotes. So both banana and 'banana pudding' are valid strings.

Each query returns one or more metrics, the other types are not returnable.

Each query is following the same structure.

SELECT <statement | events>[, <statement | events>]* [ALIASSES <get statement>] <time range> [<limit>].

The following sections will describe any part and it's sub parts. If you find something in <> that is neither a datatype nor explained in the section itself it will be a own section as it is complex enough to warrant that.

time range

A time range specifies the timeframe over which the data is queried there are multiple ways to describe this. Supported ranges are:

  • BETWEEN <date> AND <date> - the range between two dates, the order of the dates does not matter.
  • LAST <time> - same as BETWEEN <now> AND <time> AGO
  • AFTER <date> FOR <time> - same as BETWEEN <date> AND <date> + <time>
  • BEFORE <date> FOR <time> - same as BETWEEN <date> AND <date> - <time>

limit

The limit allows to cut down on the number of returned 'rows'. This is done by first defining how many rows are desired, and then by what criteria they should be selected. As a criteria every aggregate function can be used, omitting both the 'metric' part of it as well as the time range.

  • TOP <integer> BY <function> - Returns rows for the the N largest values
  • BOTTOM <integer> BY <function> - Returns rows for the the N smallest values

event statement

Extracting events works nearly the same was as extracting data points a events statement has the form:

EVENTS FORM <bucket> [WHERE <event selector>] [AS <name>] [METADATA <metadata>]

The same time range applies as for metric queries. The <metadata> and <as> section are similar to a metrics query with the exceptions that lookup variables will provide results.

event selector

It is possible to filter on events before returning them multiple filters can be combined with AND or OR. Each filter has the form:

<event path> <comparison operator> <value>

  • <event path> is a JSON like path segment that can access nested both objects as well as arries, an example could be data[1].
  • <comparison operator> currently the following currently supported ==, !=, >, <, >=, =< and ~=.
  • <value> Values can be either integers, floats or strings.

The ~= operator is a special case, it will treat the value, which must be a string, as a regular expression and matches it against the value.

statement

A statement is the fundamental building block of a query it defines the source of a metric as well as the transformations carried out of that a statement it consists of two parts:

<transformation> [AS <naming>] [METADATA <metadata>]

Statements that use a get statement with globs or dimensions might be expanded so that a single statement in the query results in more then one row in the results.

Multiple statements can be combined using combination functions or arithmetics, this can be done to both expanded statements and manually written statements.

naming

Where naming can be a . separated list of strings, positional lookups such as $<position> or dimensional. <position> is an integer and both namespace and dimension are strings.

Naming is especially interesting when using expanding gets to identify different parts of the result.

metadata

The metadata section for each query allows to add some structured data to the returned results. It takes the form close a simplified JSON object without support for nesting or arrays. Both positional and dimensional variables are supported as values to keys along with strings, integers and floats.

An simple example would be: METADATA {color: red, host: $host, id: 7}

transformation

A transformation is recursively defined as one of the following things:

  • <group by | get statement | time shift>
  • <function>(<transformation>[, <arguments>]*)

Functions can either be transformations or combinations. The exact function is determined based on the argument types.

dimension

A reference to a dimension written as $[<namespace key>:]<dimension key>.

group by

<get statement> [GROUP BY <dimension>[, <dimension>] USING <function name>

This requires the get statement to be a dimension based statement and will expand the metrics then group them by the dimensions given using the function with the given name. The functions typing is a metric list.

time shift

<group by | get statement> SHIFT BY <time> Shifts the underlying get statement back in time, so if <time> is 1w then the data returned is one week older then what the <time range> specified.

get statement

Get statements can have two forms depending on weather you use dimensions or the classical tree like structure.

For the classical structure the statement is very simple - <metric key> BUCKET <bucket name> will return one ore multiple metric.

  • <bucket name> is a possible string.
  • <metric key> is a list strings or asterisks separated by the . character such as some.metric or some.'metric with a . in there' or some.*.metrics

When using dimensions the query looks slightly different - <metric name>|ALL FROM <collection name> [WHERE <where clause>]. Using the ALL keyword instead of a metric name will select all metrics in a collection.