This document is provided for historical interest. This feature is now implemented in the form of the $next
keyword.
Many structures have many or most fields laid out consecutively, possibly with padding for alignment. For example:
struct Simple: 0 [+2] UInt field_1 2 [+4] UInt field_2 6 [+2] UInt field_3
For simple structures of fixed-size fields, the main issue is unchecked redundancy: it is relatively easy to enter the wrong value for the field offset, and no compiler checks will help.
For more complex structures with multiple variable-sized fields, this can lead to unwieldy offsets:
struct Complex: 0 [+2] UInt header_length (h) 2 [+h] Header header 2+h [+2] UInt body_length (b) 4+h [+b] Body body 4+h+b [+4] UInt crc
In both cases, there is some benefit to a shorthand notation that says 'this field should be placed immediately after the end of the lexically-previous field.
(Exact syntax TBD.)
struct Complex: 0 [+2] UInt header_length (h) $next [+h] Header header $next [+2] UInt body_length (b) $next [+b] Body body $next [+4] UInt crc
It is tempting to use some more specialized, terser syntax, like:
struct Complex: 0 [+2] UInt header_length (h) ^^ [+h] Header header ^^ [+2] UInt body_length (b) ^^ [+b] Body body ^^ [+4] UInt crc
However, an explicit symbol has the advantage that you can use it in expressions, if needed:
struct ComplexWithGap: 0 [+2] UInt header_length (h) $next [+h] Header header $next [+2] UInt body_length (b) # 2-byte reserved gap. $next+2 [+b] Body body $next [+4] UInt crc
Or, with a (not-yet-implemented) $align()
function:
struct ComplexWithAlignment: 0 [+2] UInt header_length (h) $align($next, 4) [+h] Header header $align($next, 4) [+2] UInt body_length (b) $align($next, 4) [+b] Body body $align($next, 4) [+4] UInt crc
Assuming the “new symbol” approach:
$
.LITERAL_TOKEN_PATTERNS
in compiler/front_end/tokenizer.py
.builtin-word -> "$new_symbol"
to the _word()
function in module_ir.py
.synthetics.synthesize_fields
, to replace the new symbol with the expanded representation. This should be relatively straightforward -- something that uses fast_traverse_ir_top_down()
to find all ir_pb2.Structure
elements in the IR, then iterates over the field offsets within each structure, and recursively replaces any ir_pb2.Expression
s with a builtin_reference.canonical_name.object_path[0]
equal to "$new_symbol"
. It would probably be useful to make traverse_ir._fast_traverse_proto_top_down()
into a public function, so that you do not have to re-write Expression
traversal.For this change, the back end should not need any modifications.