v1.1.0 Summary Appendix I: Schema microlanguage

Appendix I: Schema microlanguage

One of the most important features available in pgModeler is code generation. This process consists of translating the database model into SQL or XML code depending on the desired usage. In the case of SQL generation, there is an additional feature which is the support for different PostgreSQL syntaxes from the series 10.x to 16.x. It is a fact that compared to its prior releases the DBMS changed some aspects of the SQL syntax for certain kinds of objects and due to these updates building a code generator tool capable of creating code for a wide range of versions can be challenging.

Thinking of that, pgModeler implements an internal schema micro language that is exclusively used to generate code from templates or schemas. This language is quite rudimentary and consists of simple syntactical structures that resemble the ones present in some common programming languages but on a very small scale. The reason for the micro language to exist is to avoid creating the code templates in the form of raw strings in pgModeler's source code which can greatly complicate the maintenance. Another reason is that the user can change the default formatting for the generated code just by tweaking the schema files (the source code of the templates) without the need to change the source code of the software which does not require a new compilation but a single restart.

The template or schema files are the ones suffixed with .sch and stored in the schemas folder in the pgModeler's installation path. There are several usages for them as told before but specifically, they are used to translate the model to SQL or XML, generate instructions that keep up-to-date the database in relation to a certain database model, and create commands that are used to reverse engineer a database. You can open one of these schema files in any text editor to see how the schema microlanguage is used.

Before we start to describe the language's structures it is important to inform some of its limitations:

  • It is a structural language, which means, any portion of code is interpreted from top to bottom, line by line;
  • There are no repetition structures like for, while, and do;
  • There are no jump instructions like goto;
  • Being structural there is no support for structures and classes;
  • The only data type available is a string;
  • Logical expressions can't be prioritized by using parenthesis as in other languages;
  • Only the logical operations and, or and 'not` are supported;
  • There are only basic comparison operations and they involve only string values being them == (equals to), != (different from), > (greater than), < (less than), >= (greater or equal to), <= (less or equal to);
  • There is no support for arrays, user-defined types, memory allocations, and many other advanced features provided by known modern programming languages.

The table below details all available instructions in the pgModeler's micro language and how they work.

Instruction Description
# Comment This is a comment statement. Starting a text with # will cause the parser to completely ignore any character from the position where the hash appears until the end of the line.
{attribute} References or outputs an attribute's value.
&{attribute} Outputs an attribute's value having the special characters & < > " converted to their respective XML entities.
%set {attrib} expr
%set @{attrib} expr
Defines a new attribute named attrib whose value is expr. The expression here can be another attribute, a string value, a metacharacter, or a plain text enclosed by [].

The instruction variation where we have an @ starting an attribute is used to create another attribute which name will be the value in the attribute {attrib}.
%unset {attrib} Clears the attribute's value.
[some plain text] By enclosing a text between [] causes the parser to not interpret any word in it and the string will be processed as it appears.
$br, $sp, $tb, $ob, $cb, $oc, $cc
$ms, $hs, $ps, $at, $ds, $am
These are metacharacters or special keywords that are converted into a specified character, printable or not. They are used to insert characters in the resulting code after being processed by the parser. Their reason to exist is that the schema parser ignores blank spaces, tabulations, and line breaks in the schema source code, so to use these characters in the generated code, the metacharacters were created. The meaning of each keyword is:

$br: line break
$sp: blank space
$tb: tabulation
$ob: open square bracket [
$cb: close square bracket ]
$oc: open curly bracket {
$cc: close curly bracket
$ms: money sign $
$hs: hash/number sign #
$ps: percentage sign %
$at: at character @
$ds: special data separator character
$am: ampersand character &
%if, %then, %else, %end Simple conditional instructions. They work almost the same way as the ones found in other programming languages. The only difference here is that the logical expression elements that come between %if and %then are evaluated as they appear. For instance, assume the logical expression A or B and C. First, the or operation will occur between A and B. The result of that operation will be compared with C through an and operation.
%and, %or, %not These are the logical operators. They are all self explainable and an example of their usage in a %if instruction could be:

%if {a} %or {b} %and %not {c} %then
...
%end

Note that the %not keyword is used only to negate the value of an attribute and can't be used to invert an expression as a whole as we can do in Boolean logic. Unlike other languages, an attribute with any value different from null (empty) is considered as true in the expression above otherwise is false, unless we use the comparison expression explicitly.
==, !=, >, <, >=, <= These are the comparison operators, being them: equals to (==), different from (!=), greater than (>), less than (<), greater or equals to (>=) and less or equals to (<=). They are all used to perform value comparison in the comparison expressions used in %if instruction.

By default, the attributes are compared as strings but there's a special behavior of these operations that let the values be compared either as integer or floating point values. Just append an i, e.g., A <=i B to implicitly convert the values of the attributes to integer or f, e.g., A >=f B to convert the values to floating-point before the comparison. This is useful when you need to compare two numbers wrapped in a sting like this %if "9.0" <f "10.0" %then .... If the value conversion is not used in the previous example the result would be false because the string 9.0 is greater than 10.0, now, if the floating-point conversion is used then the result of the expression would be true.
({attrib} oper “value”) This is a comparison expression and it can be used when we need to compare the current value of an attribute against a constant instead of only checking if an attribute has a value or not. One sample usage for a comparison expression can be:

%if ({pgsql-version} == ”9.4”) %then
...
%end

The expression above will only be true when the value of the attribute {pgsql-version} is 9.4 otherwise the expression is evaluated as false.


To give a practical example of how the schema micro language works let's take a look at the sample code below:

At the moment the parser starts to read the document the lines 1 and 2 are discarded because the first is a commented line and the second is empty. When it reaches lines 3 and 4 the parser will create an attribute called name whose value is tableA and a second empty attribute named columns.

Proceeding with the source code analysis, line 5 is discarded as well because is empty. Entering in line 6 the parser finds a plain text statement and it will output the text as it is written so the result will be CREATE TABLE . Note that additional space at the end of the statement, since the parser will not reject any character in this kind of instruction the blank space is kept. After that, the reference to {name} is found and the parser is instructed to print the attribute's value. Now, the parser will find another plain text statement and it will be outputted as ( and lastly a metacharacter $br is identified as being the one that outputs a line break. The whole parsed line will look like CREATE TABLE tableA (\n (the \n indicates a line break which isn't a printable character).

At line 7 the parser finds an%if statement with a simple expression %not {columns} %and ({name} == "tableA") that can be translated as “if columns is empty and the name is tableA”. The result of the expression is true because {columns} is empty and {name} has the value tableA as the declaration defined in the lines 3 and 4. Since the evaluated logical expression is true then the lines 8 and 9 are processed as well. The metacharacter in them are translated and the plain text statement is processed as well. The lines 10 to 12 are ignored because they are the else part of the if and the parser already evaluated it.

Finally, the parser reaches the last line 13 and finds another plain text statement [);] and a metacharacter $br and translates them. Thus the whole resulting code is:

CREATE TABLE tableA (
    id serial,
    PRIMARY KEY(id)
);


Despite the excessive usage of schema code to generate so few SQL lines the main advantage of this approach is to facilitate the code maintenance and give users the freedom to change the default formatting or even add additional instructions to the resulting code. Currently, the tool does not offer an interface to handle schema microlanguage easily. The only way to play around with it is to make use of the class SchemaParser available in pgModeler's codebase, which means, code in C++ and Qt. If the user wants to go deeper into this topic contact the pgModeler's developer for additional support.


Mar 25, 2024 at 10:37