Design: Alternate Enum Field Cases

This document is provided for historical interest. This feature is now implemented in the form of the [enum_case] attribute on enum values, which can also be $defaulted on module, struct, bits, and enum definitions.

Motivation

Currently, the Emboss compiler requires that enum fields are SHOUTY_CASE, but this is discouraged in some code styles, such as Google's C++ style guide which preferskPrefixedCamelCase. This design considers options for allowing other cases in enums and their possible design.

Open Issues

This design document is related to the following open GitHub issue:

Design

This design will focus on the implementation for the C++ backend, as that is the only currently-supported backend in Emboss. However, this approach should be valid for other backends if and when they are supported, and it is encouraged that new backends that support this or similar functionality use the same or a similar design.

The enum_case Attribute

An attribute will be added to the C++ backend: enum_case. It would apply to all enum fields and specifies which case to use for enum members. More than one case can be specified, in which case the backend will emit both enum member names with the same values. Initially this will support two cases:

  • SHOUTY_CASE - (default) All-capital case with words separated by an underscore
  • kCamelCase - Capitalized camel case prefixed with “k”

The options will be provided as a string to the attribute as comma-separated values. At least one value must be present. More options can be supported in the future, and the implementation in the C++ backend will be written so that new case options shouldn't require much more than adding an identifier and a translation function.

Translations will always be from SHOUTY_CASE since that is the requirement in an Emboss definition file. For kCamelCase, the words will be split on the underscore, the first letter of each word will remain capitalized, and all following letters of each word will be lowercased, then prefixed with the “k”.

Transitioning From SHOUTY_CASE To kCamelCase

The intended purpose of allowing multiple enum_case options to be specified is to enable transitioning between two cases in the event that the Emboss definition and the client code that uses the definitions cannot be updated atomically.

When more than one option is present the backend will emit a definition that includes all specified name-value pairs. The names will be emitted in the order specified, so a reverse name lookup from an enum value will return the first case provided. Thus adding an additional case (by appending to the end of the comma-separated list) should be fully backwards-compatible.

Removing a case will always be backwards-incompatible, so care should be taken to migrate client code to the new case before removing an old case.

Examples

The examples below modify an existing Emboss definition:

enum Foo:
  BAR             = 1
  BAZ             = 2
  MULTI_WORD_ENUM = 4

Use kCamelCase Instead

To allow C++ code to use kBar, kBaz, or kMultiWordEnum to refer to the enum members instead of BAR, BAZ, or MULTI_WORD_ENUM, the enum_case attribute can be added to each field:

enum Foo:
  BAR             = 1  [(cpp) enum_case: "kCamelCase"]
  BAZ             = 2  [(cpp) enum_case: "kCamelCase"]
  MULTI_WORD_ENUM = 4  [(cpp) enum_case: "kCamelCase"]

This would emit code similar to:

enum class Foo: uint64_t {
  kBar = 1,
  kBaz = 2,
  kMultiWordEnum = 4,
};

Note that as written, this would not allow C++ code to refer to Foo::BAR, Foo::BAZ, or Foo::MULTI_WORD_ENUM.

Default enum_case

Additionally, the same code would be emitted with either of the following:

enum Foo:
  [$default (cpp) enum_case: "kCamelCase"]
  BAR             = 1
  BAZ             = 2
  MULTI_WORD_ENUM = 4

or

[$default (cpp) enum_case: "kCamelCase"]

...

enum Foo:
  BAR             = 1
  BAZ             = 2
  MULTI_WORD_ENUM = 4

With the differences being that the former would have the enum_case attribute apply to any new fields of Foo by default, and the latter woulds apply to all enum fields in the Emboss definition file by default.

Transitioning To kCamelCase

In the case that Foo should use kCamelCase but it is used in code that must be updated separately from the .emb file and backwards-compatibility must be maintained, the enum_case attribute will need multiple options specified. For instance:

enum Foo:
  [$default (cpp) enum_case: "SHOUTY_CASE, kCamelCase"]
  BAR             = 1
  BAZ             = 2
  MULTI_WORD_ENUM = 4

would emit code similar to:

enum class Foo: uint64_t {
  BAR = 1,
  kBar = 1,
  BAZ = 2,
  kBaz = 2,
  MULTI_WORD_ENUM = 4,
  kMultiWordEnum = 4,
};

Note that using enum_case: "kCamelCase, SHOUTY_CASE" would technically be backwards-incompatible as that would change the result of code like TryToGetNameFromEnum(Foo::BAR) from "BAR" to "kBar", but if there are no usages of that functionality, it would be backwards-compatible as well.

Once all usages of Foo have been migrated to kShoutyCase, and there is no client code that uses SHOUTY_CASE or relies on the reverse lookup functionality mentioned above, then the SHOUTY_CASE could be removed. The usual caveats of backwards-incompatible changes apply.

Alternatives Considered

In the development of this design, some other alternative designs were considered. A short explanation is provided of each below.

Loosen Enum Name Requirements

The “obvious” approach to allow names like kCamelCase is to simply loosen the requirement that an enum field name must be SHOUTY_CASE.

Pros

  • Flexible and straightforward for users

Cons

  • Adds complexity to the grammar and front-end.
    • Not as simple of an implementation as it first appears.
  • Allows Emboss definition files to diverge from each other, which goes against the design goals of Emboss where all .emb files should look similar to each other.
    • Additionally it adds cognitive overhead in reading an unfamiliar Emboss definition in a different “style”.
  • Backend/language considerations.
    • A style used in C++ (kCamelCase) would also be used in languages where that is not the style.
    • Setting the name for all languages could cause issues in languages where the case of a variable has semantic meaning, like the visibility of a variable in Go.

Specifying An Exact Name In Attributes

Instead of specifying a case transformation in an attribute, provide the specific name to be emitted. For example:

enum Foo:
  BAR             = 1  [(cpp) name: "kBar"]
  BAZ             = 2  [(cpp) name: "kBaz"]
  MULTI_WORD_ENUM = 4  [(cpp) name: "kMultiWordEnum"]

Note that the proposed enum_case design does not preclude an attribute of this nature for resolving other use-cases. Under the principle of “specific overrides general” a name-like attribute could override any enum_case attribute. See the future work section below for planned work on this.

Pros

  • Simple to implement
  • Applies to more than just enum fields.
  • Applies to other use cases (working around restrictions/reserved keywords in backends that are not also restricted/reserved in Emboss).

Cons

  • Not possible to provide a $default attribute that applies generically to all enum fields.
    • This would require an attribute added to every enum member if the intent is to always use a particular style.
    • Requires a user to specify the translation for every field, making it easier to mix cases or styles unintentionally.
      • If mixing cases is intended, this is still possible with the enum_case attribute by overriding the default.

Transitional Cases or Attributes

This alternative design would still use enum_case or something similar, but not allow multiple case options to be asserted. Instead, either a new transition-specific case or a transitional attribute would be used to mark a transition in progress. For example:

enum Foo:
  BAR             = 1  [(cpp) enum_case: "kCamelCase-transitional"]
  BAZ             = 2  [(cpp) enum_case: "kCamelCase-transitional"]
  MULTI_WORD_ENUM = 4  [(cpp) enum_case: "kCamelCase-transitional"]

or

enum Foo:
  BAR             = 1
    [(cpp) enum_case: "kCamelCase"]
    [(cpp) enum_case_transitional: true]
  BAZ             = 2
    [(cpp) enum_case: "kCamelCase"]
    [(cpp) enum_case_transitional: true]
  MULTI_WORD_ENUM = 4
    [(cpp) enum_case: "kCamelCase"]
    [(cpp) enum_case_transitional: true]

These would emit both SHOUTY_CASE and kCamelCase forms for each value.

Pros

  • Explicitly marks a transition in progress, and the reason for having multiple aliasing names to the same enumerated value.
  • Allows codegen to include [[deprecated]] attributes in the generated code so that build time warnings/errors are produced when building client code.
    • However, this could be supported by tagging the cases as transitional, see the future work section for planned work on this.

Cons

  • Requires migrating twice to transition between two non-SHOUTY_CASE cases (old -> shouty -> new)
  • Requires two separate attributes or a suffix to the case name, which can cause readability issues
  • Doesn't allow supporting more than 2 cases if needed, and requires that one case be SHOUTY_CASE.

Implementation

Front End

Now that the attribute checking is separate for the front end and backend (#80), only a small change (to both the grammar and the IR) is required to support attributes on enum values. Specifically:

Grammar

Change the existing grammar

enum-value                             -> constant-name "=" expression doc?
                                          Comment? eol enum-value-body?
enum-value-body                        -> Indent doc-line* Dedent

to

enum-value                             -> constant-name "=" expression doc?
                                          attribute* Comment? eol
                                          enum-value-body?
enum-value-body                        -> Indent doc-line* attribute-line*
                                          Dedent

Intermediate Representation

The only change to IR to support this design would require a Repeated(Attribute) member field to EnumValue.

Back End

The C++ backend can likely retain the same templates for codegen. This design should only require a change in codegen to read the attribute on an attribute name-value pair and translate the name (potentially multiple times for multiple specified cases).

Future Work

The name attribute

Cases may cause name collisions which are not present in SHOUTY_CASE, so there should be some means to override the generated name. For instance, consider:

enum Port:
  # Names taken from manufacturer's programming manual.
  USB    = 128   -- USB port, virtual port 0    # kUsb
  USB_1  = 129   -- USB port, virtual port 1    # kUsb1
  USB1   = 1440  -- USB port 1, virtual port 0  # kUsb1 -- collision
  USB1_1 = 1441  -- USB port 1, virtual port 1  # kUsb11

Additionally, there are other use-cases for setting an alternate name to the one used in the Emboss definition. Thus, an attribute should be provided that can override all naming, including the default name setting in Emboss and any enum_case attributes. For instance:

enum Port:
  # Names taken from manufacturer's programming manual.
  USB    = 128   -- USB port, virtual port 0
    [(cpp) name: "kUsb"]
  USB_1  = 129   -- USB port, virtual port 1
    [(cpp) name: "kUsb_1"]
  USB1   = 1440  -- USB port 1, virtual port 0
    [(cpp) name: "kUsb1"]
  USB1_1 = 1441  -- USB port 1, virtual port 1
    [(cpp) name: "kUsb1_1")

This would not emit names like kUsb11 even if a $default case was set to kCamelCase because the name attribute would always override other naming settings. Similar to enum_case, multiple names could be provided in a comma separated list.

This will be completed in future work, the specifics of which may be updated here or in a separate design. However, the implementation of enum_case should be made to allow name or a similar attribute to be added without major refactoring.

Deprecated Cases/Names

When transitioning between cases or alternate names, it would be useful to mark the old field as [[deprecated]] in the C++ source, so that client code that uses the generated Emboss code will produce build-time warnings or errors and alert maintainers that there will be an upcoming breaking change that could break the client code's build.

One way to do this would be to allow tagging a name or case as deprecated in the attribute string itself. For instance:

enum Foo:
  BAR = 1
    [(cpp) enum_case: "SHOUTY_CASE -deprecated, kCamelCase"]
  BAZ = 2
    [(cpp) enum_case: "SHOUTY_CASE -deprecated, kCamelCase"]
  MULTI_WORD_ENUM = 4
    [(cpp) enum_case: "SHOUTY_CASE -deprecated, kCamelCase"]

This would follow the normal $default rules as it would be the same as any other attribute value, so for instance, to set SHOUTY_CASE to be deprecated in favor of kCamelCase for all members of the enum:

enum Foo:
  [$default (cpp) enum_case: "SHOUTY_CASE -deprecated, kCamelCase"]
  BAR = 1
  BAZ = 2
  MULTI_WORD_ENUM = 4

and to set it for all enums in the module:

[$default (cpp) enum_case: "SHOUTY_CASE -deprecated, kCamelCase"]

...

enum Foo:
  BAR = 1
  BAZ = 2
  MULTI_WORD_ENUM = 4

This will be completed in future work, the specifics of which may be updated here or in a separate design. However, the implementation of enum_case should be made to allow adding -deprecated or a similar approach without major refactoring.