# Things to Think About When Designing Features for Emboss (An Incomplete List)
Original Author:
Ben Olmstead (aka reventlov, aka Dmitri Prime), original designer and author of
Emboss
# General Design Principles
There are many, many books, articles, talks, classes, and exercises on good
software design, and most general design principles apply to Emboss. In this
section, I will only cover the "most important" principles and those that I do
not see highlighted in many other places.
## Design to Real Problems, Not Hypotheticals
In order to avoid "second system effect," designs that do not work in practice,
and wasted effort, it is best to design to a specific problem — preferably a
few instances of that problem, so that your design is more likely to solve a
wide range of real world problems.
For example, in Emboss, suppose you wait until you have a specific data
structure that is awkward or impossible to express, then look for other
structures that are awkward in the same way, and only then design a feature to
handle those data structures. You are much more likely to come up with a
solution that a) will actually be used, and b) will be used in more than one
place.
## Design to the Problem, Not the Solution
Often, users will have a problem, think "I could solve this if I could do X,"
and then ask for a feature for X without mentioning their original problem. As
a software designer, one of the first things you should do is try to figure out
the original problem — usually by asking the user some probing questions — so
that you can design to the problem, not to the user's solution.
(Note that this is sometimes true even if you are the user: it is easy to get
tunnel vision about a solution you came up with. Sometimes you need to step
back and try to find a different solution.)
## Do Not Try to Do Everything
Avoid the temptation to cover every possible use case, even if some of those
would generally fit within the domain of your project. A project like Emboss
will attract extremely specific requests — requests whose solutions do not
generalize.
### Emboss is a "95% Solution"
Instead of trying to cover every use case for every user, leave "escape
hatches" in your design, so that users can use Emboss for the cases it covers,
and integrate their own solutions in the places that Emboss does not cover.
There will always be formats that Emboss cannot handle without becoming an
actual programming language — even something as "basic" as compression is
generally beyond what Emboss is meant to be capable of.
## Be Conservative
Emboss has strong backwards-compatibility guarantees: in particular, once a
feature is "released," support for that feature is guaranteed more or less
forever. Because of this, new features should be narrow, even if there are
"obvious" expansions, and even if narrowing the feature actually takes more
code in the compiler. You can always expand a feature later, but narrowing it
or cutting it out would break Emboss's support guarantees.
Although this principle is very standard for professional, publicly-released
software, it may be a culture shock to developers who are used to
"monorepo"[^mono] environments such as Google — it is not possible to just
update all users in the real world! Note that even many of Google's *open
source* projects, such as Abseil, require their users to periodically update
their code to the latest conventions, which imposes a cost on users of those
projects. Emboss is intended for smaller developers and embedded systems,
which often do not have the resources for such migrations.
[^mono]: In the several years that Emboss spent inside Google's monorepo it
underwent many large, backwards-incompatible changes that made the current
language significantly better. Early incubation in a controlled
environment can be valuable for a new language!
## Design for Later Expansion
### Leave "Reserved Space" for Future Features
Emboss uses `$` in many keyword names, but does not allow `$` to be used in
user identifiers — this lets Emboss add `$` keywords without worrying about
colliding with identifiers in existing code. (This is in direct contrast to
most programming languages, where introducing new keywords often breaks
existing code.)
As another example, Emboss disallows identifiers that collide with keywords in
many programming languages — this gives room for Emboss to add back ends for
those programming languages later, without having to figure out a convention
for mangling identifiers that collide. As a real-world counterexample,
Protocol Buffers had to figure out a convention for handling field names that
collide with C++ identifiers such as `class` — and `protoc` still generates
broken C++ code if you have two fields named `class` and `class_` in the same
`message`.
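As a rough illustration of both ideas, a front end could reject any user identifier that contains `$` or that collides with a keyword in a likely target language. The sketch below is not Emboss's actual check: the `RESERVED_WORDS` table is a small, deliberately incomplete sample, and the helper names are hypothetical.

```python
import keyword

# Hypothetical, incomplete sample of reserved words per potential target
# language; the real list used by Emboss may differ.
RESERVED_WORDS = {
    "c++": {"class", "struct", "namespace", "template", "switch"},
    "rust": {"fn", "impl", "match", "struct", "trait"},
    "python": set(keyword.kwlist),
}


def colliding_languages(identifier):
    """Returns the sorted list of languages whose keywords match identifier."""
    return sorted(
        language
        for language, words in RESERVED_WORDS.items()
        if identifier in words
    )


def is_allowed_identifier(identifier):
    """Rejects `$` (reserved for keywords) and any cross-language keyword."""
    return "$" not in identifier and not colliding_languages(identifier)
```

With this table, `child_foo` is accepted, while `class`, `match`, and `$next` are all rejected as user identifiers.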
### Leave "Extension Points"
An "extension point" is a place where someone should be able to hook into the
system without changing the system. This can be an API, a "hook," a defined
data format, or something else entirely, but the defining factor is that it is
a way to add new features or alter behavior without changing the existing
software.
In practice, many extension points won't "just work" until there are at least a
few things using them, due to bugs or unexpected coupling, but in principle
they should not require any modification.
One extension point in the Emboss compiler is the full separation between front
and back ends, so that future back ends (such as Rust, Protocol Buffers, PDF
documentation, etc.) can be added without changing the overall design or
(theoretically) any of the existing compiler.[^ext]
[^ext]: This is not unique or original to Emboss: separate front and back ends
are totally standard in modern compiler design.
In the physical world, an electrical outlet or a network port is an extension
point — there is nothing there right now, but there is a defined place for
something to be added later.
### Leave "Lines of Cleavage"
A "line of cleavage" is similar to an extension point, except that instead of
being a ready-to-go place to add something new, it's a place where the major
work was done, but there are still some pieces that need to be fixed up.
A line of cleavage in the Emboss compiler is the use of a special `.emb` file
(`prelude.emb`) to define "built-in" types, with the aim of eventually allowing
end users to define their own types at the same level. This feature still has
open design decisions, such as:
* How will users define their type for the back end(s)?
* How will users define the range of an integer type for the expression
system?
But these are relatively minor compared to the larger question of "how can
Emboss allow end users to define their own basic types?"
In software, lines of cleavage are usually invisible to end users, and can be
difficult to see even for developers working on the code.
In the physical world, an example of this is putting empty conduit into walls
or ceilings: that way, new electrical or communication wires or pneumatic tubes
can be pulled through the conduit and attached to new outlets, without having
to open up *all* the walls.
## Consider Known Potential Features
Every complex software system has a cloud of potential features around it:
features which, for one reason or another, have not been implemented yet, but
which some stakeholder(s) want. These features usually exist at every stage
from "idle thought in a developer's mind" to "partially implemented, but not
finished," and the likelihood of any one becoming a finished feature varies
just as widely.
When designing a new feature there are very good reasons to think about these
potential features:
First, you should ensure that your new feature does not make another
highly-desirable feature impossible. In Emboss, for example, if your new
feature made it impossible to support a string type, that would be a very good
reason to redesign your feature (or abandon it, if it is fundamentally
incompatible).
Second, sometimes you can tweak your design so that a potential feature becomes
obsolete: fundamentally, every feature request exists to solve a problem, and
often it is not the only way to solve that problem. If you can solve it in a
different way, you can make users happy and avoid some future work. (Though be
careful: it can be difficult to infer the full scope of a user's problem(s)
from a feature request.)
Third, thinking about specific potential features can help narrow the amount of
"future design space" that you need to consider, which makes it easier to put
extension points and lines of cleavage in your design in places where they will
actually be used.
# General Language Design Principles
In contrast to general software design principles, there are far fewer sources
on good *language* design. I speculate that this is because there are far
fewer language designers than software designers. (There are tens of millions
of software developers, but only tens of thousands of programming, markup, and
data definition languages — and of those, maybe two thousand or so are
"serious" languages with significant real-world use.)
Luckily, there are many publicly available and documented languages to learn
from directly.
Language design can be very roughly divided into syntactic and semantic
concerns: syntax is how the language *looks* (what symbols and keywords are
used, and in what order), while semantics cover how the language *works* (what
actually happens). It might seem like semantics are more important, but syntax
has a huge effect on how easy it is to understand existing code and to write
correct code, which are both incredibly important in real-world use.
In this section, I will try to outline language design principles that I have
found or developed, particularly when they are useful for Emboss.
## Be Mindful of the Power/Analysis Tradeoff
[Turing-complete languages cannot be fully
analyzed](https://en.wikipedia.org/wiki/Halting_problem). This is one of the
reasons that languages like HTML and CSS are not programming languages: the
more expressive a language is, the more difficult it is to analyze.
The `.emb` format is intended to be more on the declarative side, so that
definitions can be analyzed and transformed as necessary.
## Look at Other Languages
Although Emboss is a data definition language (DDL), not a programming
language, many lessons and principles from programming language design can be
applied, as well as lessons from other DDLs, and sometimes even interface
definition languages (IDLs), as well as markup and query languages.
In particular, for Emboss it is often worth looking at:
* Popular programming languages: C, C++, Rust, JavaScript, TypeScript, C#,
Java, Go, Python 3, Swift, Objective-C, Lua. "Systems" programming
languages such as C, C++, and Rust are usually the most relevant of these,
but it is useful to survey all the popular languages because many Emboss
users will be familiar with them. Note that Lua is used for Wireshark
packet definitions.
* Selected "interesting" programming languages: Wuffs, Haskell, OCaml, Agda,
Coq. These have some lessons for Emboss, especially its expression system
— in particular, they're all much more principled than "standard"
programming languages about how they handle types and values. There are
many other programming languages that have interesting ideas (FORTH,
Prolog, D, Perl, Logo, Scratch, APL, so-called "esoteric" programming
languages), but they usually are not relevant to Emboss.
* DDLs: Kaitai Struct, Protocol Buffers, Cap'n Proto, SQL-DDL. Kaitai Struct
is the closest of these to solving the same problem as Emboss (though it
has some fundamentally different design decisions which make it far worse
for embedded systems), but all have some lessons. Some higher-level schema
languages like DTD, XML Schema, or JSON Schema tend to be less relevant to
Emboss. Note that there are a number of DDLs that are also IDLs: in actual
use, some of them (Protocol Buffers) are used more often for their DDL
features, while others (XPIDL, COM) are used more for their IDL features.
## Learn Academic Theory
Many (most?) languages are designed by people who have minimal knowledge of the
academic theories of how programming languages work. For Emboss, Category
Theory is particularly useful, and the computer science of parsers (especially
LR(1) parsers) is useful for tweaking the parser generator or adding new
syntax.
This is a case where a little bit of learning goes a long way: you do not need
to learn a *lot* about parsers or Category Theory to benefit from them.
## Try to Acquire Practical Knowledge
Many of the academic topics related to programming language design have
corresponding industrial knowledge, and there are practical concerns that have
very little to do with academic theory.
The Emboss compiler is (loosely) based on the design of LLVM, with a series of
transformation passes that operate somewhat independently, and independent back
end code generators.[^designoops]
[^designoops]: After many years of experience with this, I think that this is
not quite the right design for Emboss, and I would make two major changes:
first (and simplest), I would divide the current "front end" into a true
front end that only handled syntax and some types of syntax sugar, and a
"middle end" that handled all of the symbol resolution, bounds analysis,
constraint checking, etc. Second, I would use a "compute-on-demand" (lazy
evaluation) approach in the middle end, which would allow certain
operations to be decoupled. The LLVM design is more suited for independent
optimization passes, not for the kind of gradual annotation process in the
Emboss middle end.
As another example, understanding how (and how well) Clang, GCC, and MSVC can
optimize C++ code is crucial to generating high-performance code from Emboss
(and Emboss leans very heavily on the C++ compiler to optimize its output).
Some bits of practical knowledge are tiny little bits of almost-trivia. For
example, if you have C or C++ code in a (text) template, and you use `$` to
indicate substitution variables (as in `$var` or `$var$`), then most editors
and code formatters will treat your substitution variables as normal
identifiers. This is because almost every C and C++ compiler allows you to use
`$` in identifiers, even though there has never been a C or C++ standard that
allows those names, and it is rarely noted in any compiler, editor, or
formatter's documentation.
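Python's standard `string.Template` class happens to use the same `$var` convention, so it is a convenient way to sketch the idea. The template text and the names `cpp_type` and `field_name` below are made up for illustration; this is not a claim about how Emboss's own templates are written.

```python
import string

# A fragment of C++ held in a text template.  To most editors and
# formatters, `$cpp_type` and `$field_name` look like ordinary identifiers.
ACCESSOR_TEMPLATE = string.Template(
    "$cpp_type $field_name() const { return ${field_name}_; }"
)


def render_accessor(cpp_type, field_name):
    """Fills in the hypothetical accessor template."""
    return ACCESSOR_TEMPLATE.substitute(
        cpp_type=cpp_type, field_name=field_name
    )
```

The braces in `${field_name}_` are needed only because the variable is immediately followed by another identifier character.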
## Use Existing Syntax
Emboss pulls many conventions from programming, data definition, and markup
languages. In general, if there is a feature in Emboss that works in a way
that is the same as in other languages, it is best to pull syntax from
elsewhere — ideally, pull in the most common syntax. Many examples of this in
Emboss are so common you might not even think about them:
* Arithmetic operators (`+`, `-`, `*`)
* Operator precedence (`*` binds more tightly than `+` and `-`, but also: see
the next section)
Other examples are more specific, with no universal convention:
* `: Type` syntax for type annotation (TypeScript, Python, OCaml, Rust, ...)
This is *especially* important for Emboss, because most people reading or
writing Emboss code will not want to spend much time becoming an "Emboss
expert" — where someone might be willing to spend days or weeks to learn how to
write Rust code, they are more likely to spend hours or minutes learning to
write Emboss.
## Avoid Existing Syntax
However, there are three main reasons to avoid using existing syntax:
* The "standard" syntax is error prone. One example of this is operator
precedence in most programming languages: errors related to not knowing the
relative precedence of `&&` and `||` are so common that most compilers have
an option to warn if they are mixed without parentheses. Emboss handles
this — and a few other error-prone constructs — by having a *partial
ordering* for precedence instead of the standard total ordering, and making
it a syntax error to mix operators such as `&&` and `||` that have
incomparable (neither equal, less than, nor greater than) precedence. As
far as I can tell, this is a totally new innovation in Emboss: there is no
precedent (no pun intended) whatsoever for partial precedence order.
When avoiding syntax in this way, it is ideal to make the standard syntax
into a syntax error (so that no one can use it accidentally) and to add an
error message to the compiler that suggests the correct syntax.
* The existing syntax is not used consistently: if multiple programming
languages use the same syntax for slightly different semantics, it is
usually worth avoiding the syntax. For example, `/` has quite a few
different semantics — in many languages, it is a type-parameterized
division, where the numeric result depends on the (static or dynamic) types
of its operands, and across languages, the "integer division" flavor is not
consistent — in most programming languages it is *truncating division* (`-7
/ 3 == -2`), but in some programming languages it is *flooring division*
(`-7 / 3 == -3`).
* The semantics do not match: if an Emboss feature is *almost*, but *not
quite* equivalent to a feature in other languages, it is best to avoid
making the Emboss feature look like the other feature.
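The partial ordering described in the first bullet can be sketched as a small "binds more tightly than" graph, where two operators may be mixed without parentheses only if one is reachable from the other. The table below is a hypothetical simplification, not Emboss's real precedence table, and the function names are illustrative.

```python
# Hypothetical "directly binds more tightly than" edges; not Emboss's real
# table.  `&&` and `||` are deliberately left unrelated to each other.
DIRECTLY_TIGHTER = {
    "*": {"+"},
    "+": {"=="},
    "==": {"&&", "||"},
}


def looser_operators(op):
    """Returns every operator that binds more loosely than op (transitively)."""
    seen = set()
    stack = [op]
    while stack:
        for looser in DIRECTLY_TIGHTER.get(stack.pop(), ()):
            if looser not in seen:
                seen.add(looser)
                stack.append(looser)
    return seen


def comparable(a, b):
    """True if a and b have equal or ordered precedence."""
    return a == b or b in looser_operators(a) or a in looser_operators(b)


def check_mixable(a, b):
    """Raises SyntaxError if a and b cannot be mixed without parentheses."""
    if not comparable(a, b):
        raise SyntaxError(f"cannot mix '{a}' and '{b}' without parentheses")
```

Under this table `*` and `||` are comparable, so `a * b || c` would be accepted, while `check_mixable("&&", "||")` raises an error that a real compiler could pair with a suggestion to add parentheses.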
## Poll Users/Programmers
When designing a new feature, try to come up with several alternatives and poll
Emboss users (or sometimes non-Emboss-using programmers) as to which one they
prefer.
For syntax, one especially powerful technique is to show an example of the
proposed syntax to people who have never seen it, and ask "what do you think
this means?" without any hinting or prompting. This is the "gold standard" way
of finding out whether your syntax is clear or not.
## Avoid Error-Prone Constructs
Computing now has roughly seventy years of experience with artificial languages
(in programming, markup, data definition, query, etc. flavors), and we have
learned a lot about what kinds of constructs are error-prone for humans to use.
Avoid these, where possible! Some examples include:
* Large semantic differences should not have small, easily-overlooked
syntactic differences. For example, allowing single- and double-character
operators (`=` and `==`, `|` and `||`, etc.) in the same contexts: a
classic C-family programming error is to use `=` in a condition instead of
`==`. Many modern languages either force `=` to be used only in "statement
context" (and some, like C#, also ban side-effectless statements such as `x
== y;`) or use a different operator like `:=` for assignment. (Or both, as
in Python, which allows `:=` but not `=` for "expression assignment.")
* Syntax should have *consistent* semantic meaning. For example, in
JavaScript these two snippets mean the same thing:
```js
return f() + 10;
```
```js
return f() +
10;
```
but this one is different (it returns `undefined`, thanks to JavaScript's
automatic `;` insertion):
```js
return
f() + 10;
```
A small difference in the placement of the line break leads to totally
different semantics!
C++ has a number of places where identical syntax can have wildly different
semantics, especially (ab)use of operator overloads and [the most vexing
parse](https://en.wikipedia.org/wiki/Most_vexing_parse).
* Hoare calls "null" his "billion-dollar mistake," and the way that null
pointers are handled in most programming languages, especially C and C++,
is particularly error-prone. (But note that it isn't really "null" itself
that is problematic — it's that there is no way to mark a pointer as "not
null," and that doing anything with a null pointer leads to undefined
behavior. However, some popular language features, such as the `?.`
operator found in several programming languages and the `std::optional<>`
type in C++, show that there is some utility to nullable types, as long as
there is language support for enforcing null checks and/or allowing null to
propagate in the same way that NaN can.)
* Edge cases, such as integer overflow, are difficult for humans to reason
about. In systems programming languages like C and C++, this leads to a
significant percentage of security flaws. (C and C++ compilers use the
"integer overflow is undefined" rule *extensively* in optimization, so
there are pragmatic trade-offs in general. Emboss is used in smaller
contexts with tighter safety guarantees.)
# Emboss-Specific Considerations
Emboss sits in a section of design space that has very few alternatives, and as
a result there are things to think about when designing Emboss features that do
not apply to many other languages.
Also, because Emboss already exists, there are a number of systems within
Emboss-the-language that may interact with new features.
And finally, if you want your feature to be implemented, it is necessary to
consider how difficult it would be to implement new features in `embossc`.
## Survey Data Formats
Maybe the least fun (at least for me[^unfun]) part of designing Emboss features
is reading through data sheets, programming manuals, RFCs, and user guides to
understand the data formats used in the real world, so that any new feature can
handle a reasonable subset of those formats. Some sources to consider:
* Data sheets and programming manuals for:
    * complex sensors, such as LiDAR
    * GPS receivers
    * servos
    * LED panels and segmented displays
    * clock hardware
    * ADCs and DACs
    * camera sensors
    * power control devices
    * simple sensors such as barometers, hygrometers, current sensors,
      voltage sensors, light sensors, etc. (though many very simple sensors
      use analog outputs or very, very simple digital outputs that do not
      have a "protocol" as such)
* RFCs for low-level protocols such as Ethernet, IP, ICMP, UDP, TCP, and ARP
<!-- TODO: assemble a list of links to actual examples -->
[^unfun]: One of my original motivations for creating Emboss is that I find
reading data sheets and implementing code to read/write the data formats
therein to be extremely tedious.
## Structure Layout System
The "heart" of Emboss is what may be called the "structure layout system": the
engine that determines which bits to read and write in order to produce or
commit the values of fields. When designing, consider:
* Does this feature require reaching "outside" of a scope? For example,
referencing a sibling field from within a field's scope is currently
impossible, because each field has its own scope. Allowing `[requires:
this == sibling]` means expanding that scope.
* Does this feature require information that is not (currently) available to
the layout engine, or not available at the right place or time? For
example, if you are designing a feature to allow field sizes to be `$auto`,
how does that interact with structures that are variable size?
* Does this feature require information that is potentially circular, or
would it interact with another potential feature to require circular
information, and is there a way to resolve that? For example: if you are
designing a feature to allow field sizes to be `$auto`, inferring their
size from their type, how will that interact with the potential feature to
allow `struct`s that grow to the size they are given?
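One way to make the circularity question concrete is to write down which pieces of layout information depend on which others, then check the resulting graph for cycles. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The node names are hypothetical and do not correspond to real `embossc` data structures.

```python
import graphlib


def find_cycle(dependencies):
    """Returns a list of nodes forming a cycle, or None if there is none.

    `dependencies` maps each node to the set of nodes it needs first.
    """
    sorter = graphlib.TopologicalSorter(dependencies)
    try:
        sorter.prepare()
    except graphlib.CycleError as error:
        # The second element of `args` is the detected cycle.
        return error.args[1]
    return None


# Hypothetical: an `$auto`-sized field takes its size from its type...
AUTO_SIZE_ONLY = {
    "Outer.payload.size": {"Inner.size"},
    "Inner.size": set(),
}

# ...but if `Inner` also grows to the size it is given, the two features
# need each other's answer.
AUTO_SIZE_PLUS_GROWING_STRUCT = {
    "Outer.payload.size": {"Inner.size"},
    "Inner.size": {"Outer.payload.size"},
}
```

Here `find_cycle(AUTO_SIZE_ONLY)` finds nothing, while the combined graph reports a cycle through both nodes, which is the kind of interaction worth resolving at design time.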
## Expression System
Although most expressions in Emboss definitions are simple (such as `x*4` or
even just `0`), the expression system in Emboss tracks a lot of information,
such as:
* What is the type of every subexpression (e.g., integer, specific
enumeration, opaque, etc.)?
* For integer and boolean expressions, does the expression evaluate to a
fixed (constant) value?
* For integer expressions, what are the upper and lower bounds of the
expression? (Used for determining the correct integer types to use in
generated code.)
* For integer expressions, is the value guaranteed to be equal to some fixed
value modulo some constant? (Used for generating faster code for aligned
memory access.)
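The bounds question above can be sketched with simple interval arithmetic. This is a minimal illustration, not the Emboss implementation: the `Bounds` class, the supported operations, and the rule for choosing a C++ type are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Inclusive lower and upper bounds of an integer expression."""

    low: int
    high: int

    def __add__(self, other):
        return Bounds(self.low + other.low, self.high + other.high)

    def __mul__(self, other):
        # The extremes of a product always occur at corner combinations.
        corners = [
            self.low * other.low,
            self.low * other.high,
            self.high * other.low,
            self.high * other.high,
        ]
        return Bounds(min(corners), max(corners))


def smallest_cpp_type(bounds):
    """Picks the narrowest fixed-width C++ type that can hold bounds."""
    for bits in (8, 16, 32, 64):
        if 0 <= bounds.low and bounds.high < 2**bits:
            return f"std::uint{bits}_t"
        if -(2 ** (bits - 1)) <= bounds.low and bounds.high < 2 ** (bits - 1):
            return f"std::int{bits}_t"
    raise ValueError("expression does not fit in 64 bits")
```

For an 8-bit unsigned field `x`, the expression `x*4` has bounds `Bounds(0, 255) * Bounds(4, 4)`, which no longer fits in 8 bits, so this sketch would select a 16-bit unsigned type.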
When designing a feature, consider:
* Will any new types be `opaque` to the expression system, or will it be
possible to perform operations on them? If they are `opaque` for now, will
they stay that way, or will it be possible to manipulate them in the
future? For example, adding a string type in Emboss might start as
`opaque`, but allow operations like "value at index" or "substring" in the
future.
* When adding new operations, how will they interact with the bounds and
alignment tracking? For example: truncating division often breaks
alignment tracking, whereas flooring division does not.
* Will this feature invalidate existing code? Anything that causes the
inferred integer bounds of existing code to expand can break existing code.
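The division example can be checked numerically. In the sketch below, every input is known to be 2 modulo 8, and each is divided by 4. The helper names and sample values are chosen for illustration only.

```python
def floor_div(x, d):
    """Flooring division: rounds toward negative infinity."""
    return x // d


def trunc_div(x, d):
    """Truncating division: rounds toward zero, as in C and C++."""
    quotient = abs(x) // abs(d)
    return quotient if (x < 0) == (d < 0) else -quotient


def residues(divide, values, divisor, modulus):
    """Returns the set of (result mod modulus) over all values."""
    return {divide(x, divisor) % modulus for x in values}


# Every value here is known to be 2 modulo 8.
VALUES = [-14, -6, 2, 10, 18]

FLOOR_RESIDUES = residues(floor_div, VALUES, 4, 2)
TRUNC_RESIDUES = residues(trunc_div, VALUES, 4, 2)
```

With flooring division every result is 0 modulo 2, so the alignment fact survives the operation. With truncating division the negative inputs land on odd results, so `TRUNC_RESIDUES` contains both 0 and 1 and nothing can be concluded about the result's alignment.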
Note that the entire point of Emboss is to provide a bridge between physical
data layout (as defined in the structure layout system) and abstract values
with no specific representation (as exposed through the expression system).
## Parsing
Any new syntax has to be added to the parser. Aside from the language design
considerations for new syntax (see the ["General Language Design Principles"
section](#general-language-design-principles)), there are a few levels of
concern for the actual implementation:
* Is it computationally feasible to parse this syntax in an intuitive,
unambiguous way?
* Is it humanly feasible to express this syntax as an LR(1) grammar that can
be parsed by Emboss's shift-reduce parser engine?
* Is it feasible to parse this syntax using a different parsing engine type
(Earley, recursive descent, TDOP, parser combinator, etc.)?
The first consideration is more of a general language design consideration: if
your language design says "users will be able to specify their program in
English," that is not really feasible (or unambiguous). (Not that it hasn't
been tried, many times.)
The second consideration — can you add this syntax to `embossc`? — is the most
practical and important consideration for Emboss. LR(1) grammars are pretty
restrictive (though shift-reduce parsers have advantages — there are reasons
Emboss is using one), and even when it is *possible* to express a particular
syntactic construct in LR(1)[^zimm], it may be difficult for most programmers to
actually do so. As a practical matter, I recommend trying to actually add your
syntax to `module_ir.py`.
[^zimm]: I (Ben Olmstead) think it would be awesome to implement [[Zimmerman,
2022](https://arxiv.org/abs/2209.08383)] plus a few extensions of my own
devising in Emboss's shift-reduce engine, which would make the grammar
design space significantly larger. I would also separate the parser
generator engine into its own project.
The third consideration is more future-focused and abstract: does this syntax
lock Emboss into using a shift-reduce parser in the future? Ideally, no.
Luckily(?), LR(1) grammars are one of the more restrictive types of grammars in
common use, so it is likely that anything that can be handled by the current
parser can be handled by many other types of parsers.
## Generated Code
Right now, there is only the generated C++ code, but there should be other back
ends in the future. Some new features are pure syntax sugar (e.g., `$next` or
`a < b < c`) that are replaced in the IR long before it reaches the back end
(e.g., with the offset+length of the syntactically-previous field, or the IR
equivalent of `a < b && b < c`), while others require extensive changes to how
code is generated.
* What information will the back end need in order to generate working code?
* Does this feature require embedded-unfriendly generated code? (E.g.,
memory allocation, I/O.)
* Can the existing C++ back end, which just walks the IR tree in a single
pass while building up strings which are combined into a `.h`, handle this
feature in its current design?
* How will this feature interact with various generated templates?
* Can/should this feature be, itself, templated?
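As a sketch of the "pure syntax sugar" case mentioned at the start of this section, the chained comparison `a < b < c` can be rewritten into pairwise comparisons before any back end sees it. The tuple-based IR here is a toy invented for the example and is much simpler than the real Emboss IR.

```python
def desugar_chain(operands, operators):
    """Rewrites `a < b < c` style chains into pairwise comparisons.

    operands:  e.g. ["a", "b", "c"]
    operators: e.g. ["<", "<"]  (one fewer than operands)
    Returns a nested tuple: (op, lhs, rhs) or ("&&", left, right).
    """
    if not operators or len(operands) != len(operators) + 1:
        raise ValueError("need N operators and N + 1 operands, N >= 1")
    comparisons = [
        (op, lhs, rhs)
        for op, lhs, rhs in zip(operators, operands, operands[1:])
    ]
    result = comparisons[0]
    for comparison in comparisons[1:]:
        result = ("&&", result, comparison)
    return result
```

Because the output uses only comparison and `&&` nodes, a back end that already handles those needs no changes to support the chained form.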
## C++ Runtime Library
The runtime library will be included with every program that touches Emboss, so
it is important to make it efficient. When adding features, consider:
* Can the feature be added in such a way that it does not cost anything for
programs that do not use the feature? A standalone C++ template will not
be included in a program unless the program instantiates the template, but
if the new code is used from somewhere in an existing function, it may be
included in programs that do not use it directly.
* Can the feature be added without allocating any heap memory? Can it be
added with O(1) stack memory use? Both of these are important for some
embedded systems, such as OS-less microcontroller and hard-real-time
environments. Some features may intrinsically require memory allocation,
in which case it is best if they can be separated: for example, Emboss
structure-to-string conversion requires allocation, and even `#include`'ing
the appropriate headers can be too much for some environments, even if the
serialization code is never included in the final binary.
* How much can you rely on the C++ compiler to optimize things? If you have
to implement your own optimizations, that will cost more development time
and add more complexity to the runtime library.
## Compiler Complexity
The Emboss compiler is already quite complex, and has many subsystems that
interact. It is already quite difficult to reason about some interactions.
* Can the feature be added at an "edge" of the compiler? For example, if you
can implement your feature as syntax sugar that converts the new feature to
existing IR early in the compilation process, it is much easier to verify
that it will not cause problematic interactions. Similarly, if you can
implement your feature entirely in the back end or in the runtime library,
you do not need to worry about interactions inside the front end.
* If a feature cannot be added at an edge, how can you design it to minimize
the complexity? (Ideally, you could even unify existing systems in such a
way that the overall complexity of the compiler is lower at the end.)
## Future Back Ends
It is important to have some idea of how any feature would be implemented
against future back ends.
### Programming Language (Rust/Python/Java/Go/C#/Lua/etc) Back Ends
Some features may be difficult to implement in other languages. For example,
Python does not have a native `switch` statement, so any `switch`-like feature
in Emboss may be awkward to implement — but this does not necessarily mean that
Emboss should not have a `switch`.
As a rule of thumb, languages can be grouped into tiers:
1. "Systems"/embedded-friendly languages: C++, Rust, C. Top support.
2. Languages used for parsing/analyzing raw sensor dumps: C#, Java, Go,
Python, etc. Should have good support, but not gate any features.
3. Languages that are rarely used to touch binary data: JavaScript,
TypeScript, etc. Can be mostly ignored.
4. Dead and obscure languages: Perl, COBOL, APL, INTERCAL, etc. Can be
ignored entirely.
(It may be difficult to classify some languages, such as FORTRAN, which is
still hanging around in 2024.)
Remember that other back ends may have different requirements and guarantees
than the C++ back end: for example, it would be unreasonable for a Java back
end to promise "no dynamic memory allocation."
### Other Data Format (Protobuf/JSON/etc) Back Ends
These back ends would translate binary structures into alternate
representations that are easier for some tools to use: for example, Google has
many, many tools for processing Protocol Buffers, and JSON is popular in the
open-source world.
Most other formats have limitations that may make some kinds of Emboss
constructs difficult or impossible to correctly reproduce: for example, Emboss
already supports "infinitely nested" `struct` types, like:
```
struct Foo:
  0 [+10]  Foo  child_foo
```
Formats like Protobuf or JSON, which do not have any way of representing loops
in their data graph, cannot handle this.
Until the most recent versions of Protobuf, mismatches between Protobuf `enum`
and Emboss `enum` made it functionally impossible to map any Emboss `enum`
types onto Protobuf `enum` types: Emboss `enum` types are open (allow any
value, even ones that are not listed in the `enum`), where all Protobuf `enum`
types were closed (only allowed known values). (The most recent Protobuf
versions, Proto3 and Editions, allow you to have open `enum` types.)
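The open/closed distinction can be illustrated with Python's own `enum` module, whose enumerations are closed by default. The `ClosedColor` type and the two decode helpers are hypothetical; they only model the mismatch and are unrelated to any real Emboss or Protobuf API.

```python
import enum


class ClosedColor(enum.IntEnum):
    """Closed: only listed values can be represented."""

    RED = 1
    GREEN = 2


def decode_closed(raw):
    """Raises ValueError for unlisted values, like a closed enum."""
    return ClosedColor(raw)


def decode_open(raw):
    """Keeps unlisted values as plain integers, like an open enum."""
    try:
        return ClosedColor(raw)
    except ValueError:
        return raw
```

A value such as 7, which is perfectly legal in an open enum, survives `decode_open` unchanged but cannot be represented at all by `decode_closed`.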
Generally, it is not worth blocking an Emboss feature because of these kinds of
mismatches, but it is worth thinking about how to avoid them, if possible.
### Documentation (PDF/Markdown/etc) Back Ends
These back ends would translate `.emb` files to a form of human-readable
documentation, intended for publication on a web site, in an RFC, or as part of
a PDF datasheet. This type of back end is the motivation for having both `--`
documentation blocks and `#` comments in Emboss.
Since the output from these back ends would be intended for human consumption,
for the most part you would only need to ensure that your feature can be
understood by humans.