The Semantics of SystemVerilog Syntax
Trying to grasp any programming language from scratch can be difficult, especially when you start by reading the Language Reference Manual (LRM). The SystemVerilog LRM is 1315 pages. For comparison, the C (2018) LRM is 520 pages and the C++ (2020) LRM is 1853 pages. LRMs are not written to teach a language; they are mainly worded for people writing tools that must comply with the standard. Although there are a number of additional resources for learning SystemVerilog, this post tackles one key aspect of SystemVerilog that is usually left out: the BNF.
Backus-Naur Form
Syntax is a set of formal grammar rules, expressed in a metalanguage called Backus-Naur Form (BNF) and listed in Annex A of the SystemVerilog LRM. Since the creation of ALGOL, considered the ancestor of most modern programming languages, BNF has been used by the compilers of many tool chains to determine what each token in the source code represents (e.g., keyword, identifier, operator, string, comment). All the possible legal groupings of tokens are given names, and the rules that define them are called “productions”.
Semantics is everything that happens after the source code is parsed. Does the compiler understand how to implement what it just parsed? Does what you have written violate some use pattern deemed too unsafe to implement? You can have illegal code with legal syntax, but you cannot have legal code without legal syntax. Syntax allows you to write ~A, but semantics will not allow it if A is a real number.
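As a small illustration of that split, here is a minimal sketch (the module name syntax_vs_semantics and the variables i, r, and x are hypothetical, chosen only for this example). Both assignments have the same syntactic shape; only the type of the operand decides whether the bitwise negation is accepted, because the LRM defines bitwise operators for integral operands only.

```systemverilog
module syntax_vs_semantics;   // hypothetical example module
  logic [3:0] i = 4'b1010;    // integral variable
  real        r = 1.5;        // real variable
  logic [3:0] x;

  initial begin
    x = ~i;      // legal syntax and legal semantics: bitwise negation of an integral value
    // x = ~r;   // same syntactic shape, but a semantic error: ~ is not defined for real operands
  end
endmodule
```

The offending line is left commented out so the sketch compiles; uncommenting it should produce an error that the tool reports after parsing has succeeded.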
Say we see the piece of code below, assuming A, B, and C have already been declared as integral variables:
A <= B <= C;
This code can be interpreted in different ways depending on where it appears. As a statement, it would be interpreted as “a non-blocking assignment to A of the expression B less than or equal to C”. As a constraint expression, it would be interpreted as “the result of A less than or equal to B must be less than or equal to C”. If we see that same piece of code with more context:
always @(posedge clk) begin
  A <= B <= C;
end
the BNF rules, by the position of all the tokens, determine that the first <= is a non-blocking assignment operator and the second <= is a less-than-or-equal-to operator. If we assume the always is in the proper location within a module, we can start looking for a matching production in the BNF.
always_construct ::= always_keyword statement
always_keyword ::= always | always_comb | always_latch | always_ff
statement ::= [ block_identifier : ] { attribute_instance } statement_item
statement_item ::=
    blocking_assignment ;
  | nonblocking_assignment ;
  | procedural_continuous_assignment ;
  | inc_or_dec_expression ;
  | procedural_timing_control_statement
  | seq_block
  | … [14 more productions]
The always_construct production lets the parser know that SystemVerilog allows an always_keyword production followed by a single statement production. The always_keyword production lists the four possible keywords that can start an always_construct. Then the parser looks through the 20 possible productions for a statement_item and finds that @(posedge clk) matches the start of a procedural_timing_control_statement:
procedural_timing_control_statement ::= procedural_timing_control statement_or_null
If you follow the statement_or_null production, you notice that it recurses back to statement. That is what allows you to write a series of timing controls before any procedural statement. Then the parser comes to the begin keyword, which starts a sequential block (seq_block).
seq_block ::= // from A.6.3
    begin [ : block_identifier ] { block_item_declaration } { statement_or_null }
    end [ : block_identifier ]
From this we see that a begin/end block can take an optional block_identifier, then 0 or more block_item_declaration productions, followed by 0 or more statement_or_null productions (curly braces in the BNF mean zero or more repetitions). By the way, this BNF rule dictates that declarations must come before any procedural statements. You cannot have declarations in the middle of a set of procedural statements.
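Both observations can be sketched in a few lines. The names below (seq_block_demo, clk, count, tmp, demo) are hypothetical and exist only for this example. Because procedural_timing_control_statement ends in statement_or_null, several timing controls can be stacked in front of one statement, and inside begin/end every declaration has to precede the first statement.

```systemverilog
module seq_block_demo;          // hypothetical example module
  logic clk = 0;
  int   count = 0;

  always #5 clk = ~clk;         // free-running clock, only for this sketch

  initial begin : demo          // optional block_identifier
    int tmp;                    // block_item_declaration: must come before any statement
    // three chained timing controls in front of a single blocking assignment
    @(posedge clk) #1 @(negedge clk) tmp = count + 1;
    count = tmp;
    // int late;                // syntax error if uncommented: declaration after a statement
    $finish;
  end : demo
endmodule
```

The commented-out declaration is the case the seq_block production rules out; some tools may report it with a message about an unexpected keyword rather than a misplaced declaration.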
Finally, we get down to parsing the variable A, which could start 3 of the 20 available productions for a statement_item: a blocking_assignment, a nonblocking_assignment, or an inc_or_dec_expression. The next token, <=, matches only one of them:
nonblocking_assignment ::=
variable_lvalue <= [ delay_or_event_control ] expression
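To put the two readings of A <= B <= C side by side, here is a minimal sketch with hypothetical names (Item, order_c, two_contexts). The token sequence is identical in both places; which production it matches depends on whether the parser is expecting a constraint expression or a statement.

```systemverilog
class Item;                       // hypothetical class for the constraint context
  rand bit [3:0] A, B, C;
  // constraint expression context: both <= are relational operators,
  // so this reads as (A <= B) <= C
  constraint order_c { A <= B <= C; }
endclass

module two_contexts;              // hypothetical module for the statement context
  logic       clk;
  logic [3:0] A, B, C;

  // statement context: the first <= is a nonblocking assignment,
  // the second is a relational operator, so A receives the result of (B <= C)
  always @(posedge clk) begin
    A <= B <= C;
  end
endmodule
```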
Compiler Errors
Syntax errors occur when the compiler can’t find a match in the BNF for what you wrote in your source code. If you wrote A => B as a statement, no rule in the BNF determines what that is supposed to represent. Parsing any code after the first syntax error is extremely difficult, because the compiler has to throw out all the tokens it could not match, and the token that follows could be part of the statement that had the error or the start of a new statement.
Semantic errors come at a later stage, when the compiler knows what you wrote but other rules in the LRM prevent it from being legal. For example, if it turned out that B and C were class variables, that would be a semantic error and would be reported much later in compilation. Because of parameterization, sometimes the error cannot be generated until the final elaboration of the design.
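The difference between the two kinds of error can be sketched as follows. The names Packet and error_kinds are hypothetical, the offending lines are left commented out so the rest compiles, and the exact wording and timing of the diagnostics depend on the tool.

```systemverilog
class Packet;                 // hypothetical empty class
endclass

module error_kinds;           // hypothetical example module
  logic  A;
  Packet B, C;                // class variables (handles)

  initial begin
    // A => B;                // syntax error: no statement production matches these tokens
    // A <= B <= C;           // parses as a nonblocking assignment, but is a semantic error:
    //                        // the relational <= is not defined for class handles
    A <= (B == C);            // legal: class handles may be compared for equality
  end
endmodule
```

The first commented line fails in the parser; the second gets through the parser and is rejected only once the types of B and C are known.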
DVCon
That’s it for just one small aspect of working with SystemVerilog. I hope to meet you online at DVCon 2021, where I’ll be chairing a number of sessions and presenting my paper, “The Life of SystemVerilog Variables”.
Exercise
Here is an exercise for you to work on yourself:
module top;
  logic A;
  logic [7:0] B = 3;
  initial begin
    A = |~B;    // illegal
    A = |(~B);  // legal with parentheses
    #B;         // legal
    #B++;       // illegal
    #(B++);     // legal with parentheses
  end
endmodule
Question: Why do the statements above that are illegal without parentheses become legal when ()’s are added?
Hint: Start with expression and delay_control in the BNF.