doc/how-to-design.md - third_party/github/google/emboss - Git at Google

 # Things to Think About When Designing Features for Emboss (An Incomplete List)

 Original Author:

 Ben Olmstead (aka reventlov, aka Dmitri Prime), original designer and author of
 Emboss


 # General Design Principles

 There are many, many books, articles, talks, classes, and exercises on good
 software design, and most general design principles apply to Emboss.  In this
 section, I will only cover the "most important" principles and those that I do
 not see highlighted in many other places.


 ## Design to Real Problems, Not Hypotheticals

 In order to avoid "second system effect," designs that do not work in practice,
 and wasted effort, it is best to design to a specific problem — preferably a
 few instances of that problem, so that your design is more likely to solve a
 wide range of real world problems.

 For example, in Emboss if you wait until you have a specific data structure
 that is awkward or impossible to express, then try to find examples of other
 structures that are awkward in the same way, and then design a feature to
 handle those data structures, you are much more likely to come up with a
 solution that a) will actually be used, and b) will be used in more than one
 place.


 ## Design to the Problem, Not the Solution

 Often, users will have a problem, think "I could solve this if I could do X,"
 and then ask for a feature for X without mentioning their original problem.  As
 a software designer, one of the first things you should do is try to figure out
 the original problem — usually by asking the user some probing questions — so
 that you can design to the problem, not to the user's solution.

 (Note that this is sometimes true even if you are the user: it is easy to get
 tunnel vision about a solution you came up with.  Sometimes you need to step
 back and try to find a different solution.)


 ## Do Not Try to Do Everything

 Avoid the temptation to cover every possible use case, even if some of those
 would generally fit within the domain of your project.  A project like Emboss
 will attract extremely specific requests — requests whose solutions do not
 generalize.


 ### Emboss is a "95% Solution"

 Instead of trying to cover every use case for every user, leave "escape
 hatches" in your design, so that users can use Emboss for the cases it covers,
 and integrate their own solutions in the places that Emboss does not cover.

 There will always be formats that Emboss cannot handle without becoming an
 actual programming language — even something as "basic" as compression is
 generally beyond what Emboss is meant to be capable of.


 ## Be Conservative

 Emboss has strong backwards-compatibility guarantees: in particular, once a
 feature is "released," support for that feature is guaranteed more or less
 forever.  Because of this, new features should be narrow, even if there are
 "obvious" expansions, and even if narrowing the feature actually takes more
 code in the compiler.  You can always expand a feature later, but narrowing it
 or cutting it out would break Emboss's support guarantees.

 Although this principle is very standard for professional, publicly-released
 software, it may be a culture shock to developers who are used to
 "monorepo"[^mono] environments such as Google — it is not possible to just
 update all users in the real world!  Note that even many of Google's *open
 source* projects, such as Abseil, require their users to periodically update
 their code to the latest conventions, which imposes a cost on users of those
 projects.  Emboss is intended for smaller developers and embedded systems,
 which often do not have the resources for such migrations.

 [^mono]: In the several years that Emboss spent inside Google's monorepo it
     underwent many large, backwards-incompatible changes that made the current
     language significantly better.  Early incubation in a controlled
     environment can be valuable for a new language!


 ## Design for Later Expansion

 ### Leave "Reserved Space" for Future Features

 Emboss uses `$` in many keyword names, but does not allow `$` to be used in
 user identifiers — this lets Emboss add `$` keywords without worrying about
 colliding with identifiers in existing code.  (This is in direct contrast to
 most programming languages, where introducing new keywords often breaks
 existing code.)

 As another example, Emboss disallows identifiers that collide with keywords in
 many programming languages — this gives room for Emboss to add back ends for
 those programming languages later, without having to figure out a convention
 for mangling identifiers that collide.  As a real-world counterexample,
 Protocol Buffers had to figure out a convention for handling field names that
 collide with C++ identifiers such as `class` — and `protoc` still generates
 broken C++ code if you have two fields named `class` and `class_` in the same
 `message`.


 ### Leave "Extension Points"

 An "extension point" is a place where someone should be able to hook into the
 system without changing the system.  This can be an API, a "hook," a defined
 data format, or something else entirely, but the defining factor is that it is
 a way to add new features or alter behavior without changing the existing
 software.

 In practice, many extension points won't "just work" until there are at least a
 few things using them, due to bugs or unexpected coupling, but in principle
 they should not require any modification.

 One extension point in the Emboss compiler is the full separation between front
 and back ends, so that future back ends (such as Rust, Protocol Buffers, PDF
 documentation, etc.) can be added without changing the overall design or
 (theoretically) any of the existing compiler.[^ext]

 [^ext]: This is not unique or original to Emboss: separate front and back ends
     are totally standard in modern compiler design.

 In the physical world, an electrical outlet or a network port is an extension
 point — there is nothing there right now, but there is a defined place for
 something to be added later.


 ### Leave "Lines of Cleavage"

 A "line of cleavage" is similar to an extension point, except that instead of
 being a ready-to-go place to add something new, it's a place where the major
 work was done, but there are still some pieces that need to be fixed up.

 A line of cleavage in the Emboss compiler is the use of a special `.emb` file
 (`prelude.emb`) to define "built-in" types, with the aim of eventually allowing
 end users to define their own types at the same level.  This feature still has
 open design decisions, such as:

 *   How will users define their type for the back end(s)?
 *   How will users define the range of an integer type for the expression
     system?

 But these are relatively minor compared to the larger question of "how can
 Emboss allow end users to define their own basic types?"

 In software, lines of cleavage are usually invisible to end users, and can be
 difficult to see even for developers working on the code.

 In the physical world, an example of this is putting empty conduit into walls
 or ceilings: that way, new electrical or communication wires or pneumatic tubes
 can be pulled through the conduit and attached to new outlets, without having
 to open up *all* the walls.


 ## Consider Known Potential Features

 Every complex software system has a cloud of potential features around it:
 features which, for one reason or another, have not been implemented yet, but
 which some stakeholder(s) want.  These features usually exist at every stage
 from "idle thought in a developer's mind" to "partially implemented, but not
 finished," and the likelihoods of each one to become a finished feature cover
 an equally wide range.

 When designing a new feature there are very good reasons to think about these
 potential features:

 First, you should ensure that your new feature does not make another
 highly-desirable feature impossible.  In Emboss, for example, if your new
 feature made it impossible to support a string type, that would be a very good
 reason to redesign your feature (or abandon it, if it is fundamentally
 incompatible).

 Second, sometimes you can tweak your design so that a potential feature becomes
 obsolete: fundamentally, every feature request exists to solve a problem, and
 often it is not the only way to solve that problem.  If you can solve it in a
 different way, you can make users happy and avoid some future work.  (Though be
 careful: it can be difficult to infer the full scope of a user's problem(s)
 from a feature request.)

 Third, thinking about specific potential features can help narrow the amount of
 "future design space" that you need to consider, which makes it easier to put
 extension points and lines of cleavage in your design in places where they will
 actually be used.


 # General Language Design Principles

 In contrast to general software design principles, there are far fewer sources
 on good *language* design.  I speculate that this is because there are far
 fewer language designers than software designers.  (There are tens of millions
 of software developers, but only tens of thousands of programming, markup, and
 data definition languages — and of those, maybe two thousand or so are
 "serious" languages with significant real-world use.)

 Luckily, there are many publicly available and documented languages to learn
 from directly.

 Language design can be very roughly divided into syntactic and semantic
 concerns: syntax is how the language *looks* (what symbols and keywords are
 used, and in what order), while semantics cover how the language *works* (what
 actually happens).  It might seem like semantics are more important, but syntax
 has a huge effect on how easy it is to understand existing code and to write
 correct code, which are both incredibly important in real-world use.

 In this section, I will try to outline language design principles that I have
 found or developed, particularly when they are useful for Emboss.


 ## Be Mindful of the Power/Analysis Tradeoff

 [Turing-complete languages cannot be fully
 analyzed](https://en.wikipedia.org/wiki/Halting_problem).  This is one of the
 reasons that languages like HTML and CSS are not programming languages: the
 more expressive a language is, the more difficult it is to analyze.

 The `.emb` format is intended to be more on the declarative side, so that
 definitions can be analyzed and transformed as necessary.


 ## Look at Other Languages

 Although Emboss is a data definition language (DDL), not a programming
 language, many lessons and principles from programming language design can be
 applied, as well as lessons from other DDLs, and sometimes even interface
 definition languages (IDLs), as well as markup and query languages.

 In particular, for Emboss it is often worth looking at:

 *   Popular programming languages: C, C++, Rust, JavaScript, TypeScript, C#,
     Java, Go, Python 3, Swift, Objective C, Lua.  "Systems" programming
     languages such as C, C++, and Rust are usually the most relevant of these,
     but it is useful to survey all the popular languages because many Emboss
     users will be familiar with them.  Note that Lua is used for Wireshark
     packet definitions.

 *   Selected "interesting" programming languages: Wuffs, Haskell, Ocaml, Agda,
     Coq.  These have some lessons for Emboss, especially its expression system
     — in particular, they're all much more principled than "standard"
     programming languages about how they handle types and values.  There are
     many other programming languages that have interesting ideas (FORTH,
     Prolog, D, Perl, Logo, Scratch, APL, so-called "esoteric" programming
     languages), but they usually are not relevant to Emboss.

 *   DDLs: Kaitai Struct, Protocol Buffers, Cap'n Proto, SQL-DDL.  Kaitai Struct
     is the closest of these to solving the same problem as Emboss (though it
     has some fundamentally different design decisions which make it far worse
     for embedded systems), but all have some lessons.  Some higher-level schema
     languages like DTD, XML Schema, or JSON Schema tend to be less relevant to
     Emboss.  Note that there are a number of DDLs that are also IDLs: in actual
     use, some of them (Protocol Buffers) are used more often for their DDL
     features, while others (XPIDL, COM) are used more for their IDL features.


 ## Learn Academic Theory

 Many (most?) languages are designed by people who have minimal knowledge of the
 academic theories of how programming languages work — for Emboss, Category
 Theory is particularly useful, and the computer science of parsers (especially
 LR(1) parsers) is useful for tweaking the parser generator or adding new
 syntax.

 This is a case where a little bit of learning goes a long way: you do not need
 to learn a *lot* about parsers or Category Theory to benefit from them.


 ## Try to Acquire Practical Knowledge

 Many of the academic topics related to programming language design have
 corresponding industrial knowledge, and there are practical concerns that have
 very little to do with academic theory.

 The Emboss compiler is (loosely) based on the design of LLVM, with a series of
 transformation passes that operate somewhat independently, and independent back
 end code generators.[^designoops]

 [^designoops]: After many years of experience with this, I think that this is
     not quite the right design for Emboss, and I would make two major changes:
     first (and simplest), I would divide the current "front end" into a true
     front end that only handled syntax and some types of syntax sugar, and a
     "middle end" that handled all of the symbol resolution, bounds analysis,
     constraint checking, etc.  Second, I would use a "compute-on-demand" (lazy
     evaluation) approach in the middle end, which would allow certain
     operations to be decoupled.  The LLVM design is more suited for independent
     optimization passes, not for the kind of gradual annotation process in the
     Emboss middle end.

 As another example, understanding how (and how well) Clang, GCC, and MSVC can
 optimize C++ code is crucial to generating high-performance code from Emboss
 (and Emboss leans very heavily on the C++ compiler to optimize its output).

 Some bits of practical knowledge are tiny little bits of almost-trivia.  For
 example, if you have C or C++ code in a (text) template, and you use `$` to
 indicate substitution variables (as in `$var` or `$var$`), then most editors
 and code formatters will treat your substitution variables as normal
 identifiers.  This is because almost every C and C++ compiler allows you to use
 `$` in identifiers, even though there has never been a C or C++ standard that
 allows those names, and it is rarely noted in any compiler, editor, or
 formatter's documentation.


 ## Use Existing Syntax

 Emboss pulls many conventions from programming, data definition, and markup
 languages.  In general, if there is a feature in Emboss that works in a way
 that is the same as in other languages, it is best to pull syntax from
 elsewhere — ideally, pull in the most common syntax.  Many examples of this in
 Emboss are so common you might not even think about them:

 *   Arithmetic operators (`+`, `-`, `*`)
 *   Operator precedence (`*` binds more tightly than `+` and `-`, but also: see
     the next section)

 Other examples are most specific, with no universal convention:

 *   `: Type` syntax for type annotation (TypeScript, Python, Ocaml, Rust, ...)

 This is *especially* important for Emboss, because most people reading or
 writing Emboss code will not want to spend much time becoming an "Emboss
 expert" — where someone might be willing to spend days or weeks to learn how to
 write Rust code, they are more likely to spend hours or minutes learning to
 write Emboss.


 ## Avoid Existing Syntax

 However, there are three main reasons to avoid using existing syntax:

 *   The "standard" syntax is error prone.  One example of this is operator
     precedence in most programming languages: errors related to not knowing the
     relative precedence of `&&` and `||` are so common that most compilers have
     an option to warn if they are mixed without parentheses.  Emboss handles
     this — and a few other error-prone constructs — by having a *partial
     ordering* for precedence instead of the standard total ordering, and making
     it a syntax error to mix operators such as `&&` and `||` that have
     incomparable (neither equal, less than, nor greater than) precedence.  As
     far as I can tell, this is a totally new innovation in Emboss: there is no
     precedent (no pun intended) whatsoever for partial precedence order.

     When avoiding syntax in this way, it is ideal to make the standard syntax
     into a syntax error (so that no one can use it accidentally) and to add an
     error message to the compiler that suggests the correct syntax.

 *   The existing syntax is not used consistently: if multiple programming
     languages use the same syntax for slightly different semantics, it is
     usually worth avoiding the syntax.  For example, `/` has quite a few
     different semantics — in many languages, it is a type-parameterized
     division, where the numeric result depends on the (static or dynamic) types
     of its operands, and across languages, the "integer division" flavor is not
     consistent — in most programming languages it is *truncating division* (`-7
     / 3 == -2`), but in some programming languages it is *flooring division*
     (`-7 / 3 == -3`).

 *   The semantics do not match: if an Emboss feature is *almost*, but *not
     quite* equivalent to a feature in other languages, it is best to avoid
     making the Emboss feature look like the other feature.


 ## Poll Users/Programmers

 When designing a new feature, try to come up with several alternatives and poll
 Emboss users (or sometimes non-Emboss-using programmers) as to which one they
 prefer.

 For syntax, one especially powerful technique is to show an example of the
 proposed syntax to people who have never seen it, and ask "what do you think
 this means?" without any hinting or prompting.  This is the "gold standard" way
 of finding out whether your syntax is clear or not.


 ## Avoid Error-Prone Constructs

 Computing now has roughly seventy years of experience with artificial languages
 (in programming, markup, data definition, query, etc. flavors), and we have
 learned a lot about what kinds of constructs are error-prone for humans to use.
 Avoid these, where possible!  Some examples include:

 *   Large semantic differences should not have small, easily-overlooked
     syntactic differences.  For example, allowing single- and double-character
     operators (`=` and `==`, `|` and `||`, etc.) in the same contexts: a
     classic C-family programming error is to use `=` in a condition instead of
     `==`.  Many modern languages either force `=` to be used only in "statement
     context" (and some, like C#, also ban side-effectless statements such as `x
     == y;`) or use a different operator like `:=` for assignment.  (Or both, as
     in Python, which allows `:=` but not `=` for "expression assignment.")

 *   Syntax should have *consistent* semantic meaning.  For example, in
     JavaScript these two snippets mean the same thing:

     ```js
     return f() + 10;
     ```

     ```js
     return f() +
               10;
     ```

     but this one is different (it returns `undefined`, thanks to JavaScript's
     automatic `;` insertion):

     ```js
     return
         f() + 10;
     ```

     A small difference in the placement of the line break leads to totally
     different semantics!

     C++ has a number of places where identical syntax can have wildly different
     semantics, especially (ab)use of operator overloads and [the most vexing
     parse](https://en.wikipedia.org/wiki/Most_vexing_parse).

 *   Hoare calls "null" his "billion-dollar mistake," and the way that null
     pointers are handled in most programming languages, especially C and C++,
     is particularly error-prone.  (But note that it isn't really "null" itself
     that is problematic — it's that there is no way to mark a pointer as "not
     null," and that doing anything with a null pointer leads to undefined
     behavior.  However, some popular language features, such as the `?.`
     operator found in several programming languages and the `std::optional<>`
     type in C++, show that there is some utility to nullable types, as long as
     there is language support for enforcing null checks and/or allowing null to
     propagate in the same way that NaN can.)

 *   Edge cases, such as integer overflow, are difficult for humans to reason
     about.  In systems programming languages like C and C++, this leads to a
     significant percentage of security flaws.  (C and C++ compilers use the
     "integer overflow is undefined" rule *extensively* in optimization, so
     there are pragmatic trade-offs in general.  Emboss is used in smaller
     contexts with tighter safety guarantees.)


 # Emboss-Specific Considerations

 Emboss sits in a section of design space that has very few alternatives, and as
 a result there are things to think about when designing Emboss features that do
 not apply to many other languages.

 Also, because Emboss already exists, there are a number of systems within
 Emboss-the-language that may interact with new features.

 And finally, if you want your feature to become implemented, it is necessary to
 consider how difficult it would be to implement new features in `embossc`.


 ## Survey Data Formats

 Maybe the least fun (at least for me[^unfun]) part of designing Emboss features
 is reading through data sheets, programming manuals, RFCs, and user guides to
 understand the data formats used in the real world, so that any new feature can
 handle a reasonable subset of those formats.  Some sources to consider:

 *   Data sheets and programming manuals for:
     *   complex sensors, such as LiDAR
     *   GPS receivers
     *   servos
     *   LED panels and segmented displays
     *   clock hardware
     *   ADCs and DACs
     *   camera sensors
     *   power control devices
     *   simple sensors such as barometers, hygrometers, current sensors,
         voltage sensors, light sensors, etc. (though many very simple sensors
         use analog outputs or very, very simple digital outputs that do not
         have a "protocol" as such)
 *   RFCs for low-level protocols such as Ethernet, IP, ICMP, UDP, TCP, and ARP

 <!-- TODO: assemble a list of links to actual examples -->

 [^unfun]: One of my original motivations for creating Emboss is that I find
     reading data sheets and implementing code to read/write the data formats
     therein to be extremely tedious.


 ## Structure Layout System

 The "heart" of Emboss is what may be called the "structure layout system:" the
 engine that determines which bits to read and write in order to produce or
 commit the values of fields.  When designing, consider:

 *   Does this feature require reaching "outside" of a scope?  For example,
     referencing a sibling field from within a field's scope is currently
     impossible, because each field has its own scope.  Allowing `[requires:
     this == sibling]` means expanding that scope.

 *   Does this feature require information that is not (currently) available to
     the layout engine, or not available at the right place or time?  For
     example, if you are designing a feature to allow field sizes to be `$auto`,
     how does that interact with structures that are variable size?

 *   Does this feature require information that is potentially circular, or
     would it interact with another potential feature to require circular
     information, and is there a way to resolve that?  For example: if you are
     designing a feature to allow field sizes to be `$auto`, inferring their
     size from their type, how will that interact with the potential feature to
     allow `struct`s that grow to the size they are given?


 ## Expression System

 Although most expressions in Emboss definitions are simple (such as `x*4` or
 even just `0`), the expression system in Emboss tracks a lot of information,
 such as:

 *   What is the type of every subexpression (e.g., integer, specific
     enumeration, opaque, etc.)?
 *   For integer and boolean expressions, does the expression evaluate to fixed
     (constant) value?
 *   For integer expressions, what are the upper and lower bounds of the
     expression?  (Used for determining the correct integer types to use in
     generated code.)
 *   For integer expressions, is the value guaranteed to be equal to some fixed
     value modulo some constant?  (Used for generating faster code for aligned
     memory access.)

 When designing a feature, consider:

 *   Will any new types be `opaque` to the expression system, or will it be
     possible to perform operations on them?  If they are `opaque` for now, will
     they stay that way, or will it be possible to manipulate them in the
     future?  For example, adding a string type in Emboss might start as
     `opaque`, but allow operations like "value at index" or "substring" in the
     future.
 *   When adding new operations, how will they interact with the bounds and
     alignment tracking?  For example: truncating division often breaks
     alignment tracking, whereas flooring division does not.
 *   Will this feature invalidate existing code?  Anything that causes the
     inferred integer bounds of existing code to expand can break existing code.

 Note that the entire point of Emboss is to provide a bridge between physical
 data layout (as defined in the structure layout system) and abstract values
 with no specific representation (as exposed through the expression system).


 ## Parsing

 Any new syntax has to be added to the parser.  Aside from the language design
 considerations for new syntax (see the ["General Language Design Principles"
 section](#general-language-design-principles)), there are a few levels of
 concern for the actual implementation:

 *   Is it computationally feasible to parse this syntax in an intuitive,
     unambiguous way?
 *   Is it humanly feasible to express this syntax as an LR(1) grammar that can
     be parsed by Emboss's shift-reduce parser engine?
 *   Is it feasible to parse this syntax using a different parsing engine type
     (Earley, recursive descent, TDOP, parser combinator, etc.)?

 The first consideration is more of a general language design consideration: if
 your language design says "users will be able to specify their program in
 English," that is not really feasible (or unambiguous).  (Not that it hasn't
 been tried, many times.)

 The second consideration — can you add this syntax to `embossc`? — is the most
 practical and important consideration for Emboss.  LR(1) grammars are pretty
 restrictive (though shift-reduce parsers have advantages — there are reasons
 Emboss is using one), and even when it is *possible* to express a particular
 syntactic construct in LR(1)[^zimm], it may be difficult for most programmers to
 actually do so.  As a practical matter, I recommend trying to actually add your
 syntax to `module_ir.py`.

 [^zimm]: I (Ben Olmstead) think it would be awesome to implement [[Zimmerman,
     2022](https://arxiv.org/abs/2209.08383)] plus a few extensions of my own
     devising in Emboss's shift-reduce engine, which would make the grammar
     design space significantly larger.  I would also separate the parser
     generator engine into its own project.

 The third consideration is more future-focused and abstract: does this syntax
 lock Emboss into using a shift-reduce parser in the future?  Ideally, no.
 Luckily(?), LR(1) grammars are one of the more restrictive types of grammars in
 common use, so it is likely that anything that can be handled by the current
 parser can be handled by many other types of parsers.


 ## Generated Code

 Right now, there is only the generated C++ code, but there should be other back
 ends in the future.  Some new features are pure syntax sugar (e.g., `$next` or
 `a < b < c`) that are replaced in the IR long before it reaches the back end
 (e.g., with the offset+length of the syntactically-previous field, or the IR
 equivalent of `a < b && b < c`), while others require extensive changes to how
 code is generated.

 *   What information will the back end need in order to generate working code?
 *   Does this feature require embedded-unfriendly generated code?  (E.g.,
     memory allocation, I/O.)
 *   Can the existing C++ back end, which just walks the IR tree in a single
     pass while building up strings which are combined into a `.h`, handle this
     feature in its current design?
 *   How will this feature interact with various generated templates?
 *   Can/should this feature be, itself, templated?


 ## C++ Runtime Library

 The runtime library will be included with every program that touches Emboss, so
 it is important to make it efficient.  When adding features, consider:

 *   Can the feature be added in such a way that it does not cost anything for
     programs that do not use the feature?  A standalone C++ template will not
     be included in a program unless the program instantiates the template, but
     if the new code is used from somewhere in an existing function, it may be
     included in programs that do not use it directly.

 *   Can the feature be added without allocating any heap memory?  Can it be
     added with O(1) stack memory use?  Both of these are important for some
     embedded systems, such as OS-less microcontroller and hard-real-time
     environments.  Some features may intrinsically require memory allocation,
     in which case it is best if they can be separated: for example, Emboss
     structure-to-string conversion requires allocation, and even `#include`'ing
     the appropriate headers can be too much for some environments, even if the
     serialization code is never included in the final binary.

 *   How much can you rely on the C++ compiler to optimize things?  If you have
     to implement your own optimizations, that will cost more development time
     and add more complexity to the standard library.


 ## Compiler Complexity

 The Emboss compiler is already quite complex, and has many subsystems that
 interact.  It is already quite difficult to reason about some interactions.

 *   Can the feature be added at an "edge" of the compiler?  For example, if you
     can implement your feature as syntax sugar that converts the new feature to
     existing IR early in the compilation process, it is much easier to verify
     that it will not cause problematic interactions.  Similarly, if you can
     implement your feature entirely in the back end or in the runtime library,
     you do not need to worry about interactions inside the front end.

 *   If a feature cannot be added at an edge, how can you design it to minimize
     the complexity?  (Ideally, you could even unify existing systems in such a
     way that the overall complexity of the compiler is lower at the end.)


 ## Future Back Ends

 It is important to have some idea of how any feature would be implemented
 against future back ends.


 ### Programming Language (Rust/Python/Java/Go/C#/Lua/etc) Back Ends

 Some features may be difficult to implement in other languages.  For example,
 Python does not have a native `switch` statement, so any `switch`-like feature
 in Emboss may be awkward to implement — but this does not necessarily mean that
 Emboss should not have a `switch`.

 As a rule of thumb, languages can be grouped into tiers:

 1.  "Systems"/embedded-friendly languages: C++, Rust, C.  Top support.
 2.  Languages used for parsing/analyzing raw sensor dumps: C#, Java, Go,
     Python, etc.  Should have good support, but not gate any features.
 3.  Languages that are rarely used to touch binary data: JavaScript,
     TypeScript, etc.  Can be mostly ignored.
 4.  Dead and obscure languages: Perl, COBOL, APL, INTERCAL, etc.  Can be
     ignored entirely.

 (It may be difficult to classify some languages, such as FORTRAN, which is
 still hanging around in 2024.)

 Remember that other back ends may have different requirements and guarantees
 than the C++ back end: for example, it would be unreasonable for a Java back
 end to promise "no dynamic memory allocation."


 ### Other Data Format (Protobuf/JSON/etc) Back Ends

 These back ends would translate binary structures into alternate
 representations that are easier for some tools to use: for example, Google has
 many, many tools for processing Protocol Buffers, and JSON is popular in the
 open-source world.

 Most other formats have limitations that may make some kinds of Emboss
 constructs difficult or impossible to correctly reproduce: for example, Emboss
 already supports "infinitely nested" `struct` types, like:

 ```
 struct Foo:
     0 [+10]  Foo  child_foo
 ```

 Formats like Protobuf or JSON, which do not have any way of representing loops
 in their data graph, cannot handle this.

 Until the most recent versions of Protobuf, mismatches between Protobuf `enum`
 and Emboss `enum` made it functionally impossible to map any Emboss `enum`
 types onto Protobuf `enum` types: Emboss `enum` types are open (allow any
 value, even ones that are not listed in the `enum`), where all Protobuf `enum`
 types were closed (only allowed known values).  (The most recent Protobuf
 versions, Proto3 and Editions, allow you to have open `enum` types.)

 Generally, it is not worth blocking an Emboss feature because of these kinds of
 mismatches, but it is worth thinking about how to avoid them, if possible.


 ### Documentation (PDF/Markdown/etc) Back Ends

 These back ends would translate `.emb` files to a form of human-readable
 documentation, intended for publication on a web site, in an RFC, or as part of
 a PDF datasheet.  This type of back end is the motivation for having both `--`
 documentation blocks and `#` comments in Emboss.

 Since the output from these back ends would be intended for human consumption,
 for the most part you would only need to ensure that your feature can be
 understood by humans.