| # Writing a schema |
| |
| The syntax of the schema language (aka IDL, Interface Definition |
| Language) should look quite familiar to users of any of the C family of |
| languages, and also to users of other IDLs. Let's look at an example |
| first: |
| |
| // example IDL file |
| |
| namespace MyGame; |
| |
| attribute "priority"; |
| |
| enum Color : byte { Red = 1, Green, Blue } |
| |
| union Any { Monster, Weapon, Pickup } |
| |
| struct Vec3 { |
| x:float; |
| y:float; |
| z:float; |
| } |
| |
| table Monster { |
| pos:Vec3; |
| mana:short = 150; |
| hp:short = 100; |
| name:string; |
| friendly:bool = false (deprecated, priority: 1); |
| inventory:[ubyte]; |
| color:Color = Blue; |
| test:Any; |
| } |
| |
| root_type Monster; |
| |
| (`Weapon` and `Pickup` are not defined as part of this example.) |
| |
| ### Tables |
| |
| Tables are the main way of defining objects in FlatBuffers, and consist |
| of a name (here `Monster`) and a list of fields. Each field has a name, |
| a type, and optionally a default value (if omitted, it defaults to 0 / |
| NULL). |
| |
| Each field is optional: It does not have to appear in the wire |
| representation, and you can choose to omit fields for each individual |
| object. As a result, you have the flexibility to add fields without fear of |
| bloating your data. This design is also FlatBuffers' mechanism for forward |
| and backward compatibility. Note that: |
| |
| - You can add new fields in the schema ONLY at the end of a table |
| definition. Older data will still |
| read correctly, and give you the default value when read. Older code |
| will simply ignore the new field. |
| If you want to have flexibility to use any order for fields in your |
| schema, you can manually assign ids (much like Protocol Buffers), |
| see the `id` attribute below. |
| |
| - You cannot delete fields you don't use anymore from the schema, |
| but you can simply |
| stop writing them into your data for almost the same effect. |
| Additionally you can mark them as `deprecated` as in the example |
| above, which will prevent the generation of accessors in the |
| generated C++, as a way to enforce the field not being used any more. |
| (careful: this may break code!). |
| |
| - You may change field names and table names, if you're ok with your |
| code breaking until you've renamed them there too. |
| |
| |
| |
| ### Structs |
| |
| Structs are similar to tables, except that none of the fields are optional (so |
| there are no defaults either), and fields may not be added or deprecated. |
| Structs may only contain scalars or other structs. Use them for |
| simple objects where you are very sure no changes will ever be made |
| (as is the case for `Vec3` in the example). Structs use less memory than |
| tables and are even faster to access (they are always stored in-line in their |
| parent object, and use no virtual table). |
| |
| ### Types |
| |
| Built-in scalar types are: |
| |
| - 8 bit: `byte ubyte bool` |
| |
| - 16 bit: `short ushort` |
| |
| - 32 bit: `int uint float` |
| |
| - 64 bit: `long ulong double` |
| |
| Built-in non-scalar types: |
| |
| - Vector of any other type (denoted with `[type]`). Nesting vectors |
| is not supported; instead, you can wrap the inner vector in a table. |
| |
| - `string`, which may only hold UTF-8 or 7-bit ASCII. For other text encodings |
| or general binary data use vectors (`[byte]` or `[ubyte]`) instead. |
| |
| - References to other tables or structs, enums or unions (see |
| below). |
| |
| You can't change types of fields once they're used, with the exception |
| of same-size data where a `reinterpret_cast` would give you a desirable result, |
| e.g. you could change a `uint` to an `int` if no values in current data use the |
| high bit yet. |
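The `uint` to `int` case can be illustrated with a minimal Python sketch. It is not FlatBuffers code: it only uses the standard `struct` module to show how the same four little-endian bytes read back under each type. The helper name `reinterpret_uint_as_int` is made up for this illustration.

```python
import struct


def reinterpret_uint_as_int(value: int) -> int:
    """Store `value` as a 32-bit unsigned integer, then read the same
    bytes back as a 32-bit signed integer (hypothetical helper)."""
    raw = struct.pack("<I", value)      # bytes as a `uint` field would hold them
    return struct.unpack("<i", raw)[0]  # the same bytes, read as an `int` field


# A value that does not use the high bit survives the type change.
print(reinterpret_uint_as_int(1000))        # 1000
# A value that uses the high bit turns negative when read as `int`.
print(reinterpret_uint_as_int(0x80000000))  # -2147483648
```

Values below 2^31 are unaffected, which is why the change is only safe while no existing data uses the high bit.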
| |
| ### (Default) Values |
| |
| Values are a sequence of digits, optionally followed by a `.` and more digits |
| for float constants, and optionally prefixed by a `-`. Non-scalar defaults are |
| currently not supported (always NULL). |
| |
| You generally do not want to change default values after they're initially |
| defined. Fields that have the default value are not actually stored in the |
| serialized data but are generated in code, so when you change the default, you'd |
| now get a different value than from code generated from an older version of |
| the schema. There are situations however where this may be |
| desirable, especially if you can ensure a simultaneous rebuild of |
| all code. |
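The effect of changing a default can be sketched with a toy model in Python. This is an assumption-laden illustration, not the real wire format: a table is modelled as a plain dict, and `write_table` and `read_field` are hypothetical helpers that mimic "skip values equal to the default" and "fall back to the default when absent".

```python
def write_table(values: dict, schema_defaults: dict) -> dict:
    """Keep only the fields whose value differs from the writer's default."""
    return {name: v for name, v in values.items() if v != schema_defaults[name]}


def read_field(stored: dict, name: str, schema_defaults: dict):
    """Return the stored value, or the reader's default if the field is absent."""
    return stored.get(name, schema_defaults[name])


old_defaults = {"hp": 100, "mana": 150}  # defaults from the example schema
new_defaults = {"hp": 200, "mana": 150}  # hypothetical later edit to the schema

# Written with the old schema: hp equals the default, so it is not stored.
data = write_table({"hp": 100, "mana": 30}, old_defaults)
print(data)                                  # {'mana': 30}

# Readers built from different schema versions now disagree about hp.
print(read_field(data, "hp", old_defaults))  # 100
print(read_field(data, "hp", new_defaults))  # 200
```

Because the value is supplied by the reading code rather than by the data, the two readers only agree if all code is rebuilt against the same schema version.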
| |
| ### Enums |
| |
| Define a sequence of named constants, each with a given value, or |
| increasing by one from the previous one. The default first value |
| is `0`. As you can see in the enum declaration, you specify the underlying |
| integral type of the enum with `:` (in this case `byte`), which then determines |
| the type of any fields declared with this enum type. |
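The numbering rule can be written out as a short Python sketch. The function `assign_enum_values` is a hypothetical helper for illustration only; it models the rule described above and is not part of `flatc`.

```python
def assign_enum_values(members):
    """Assign a value to each enum member.

    `members` is a list of (name, explicit_value_or_None) pairs. A member
    without an explicit value gets the previous value plus one; the first
    member defaults to 0.
    """
    values = {}
    next_value = 0
    for name, explicit in members:
        value = next_value if explicit is None else explicit
        values[name] = value
        next_value = value + 1
    return values


# enum Color : byte { Red = 1, Green, Blue }
print(assign_enum_values([("Red", 1), ("Green", None), ("Blue", None)]))
# {'Red': 1, 'Green': 2, 'Blue': 3}
```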
| |
| ### Unions |
| |
| Unions share a lot of properties with enums, but instead of new names |
| for constants, you use names of tables. You can then declare |
| a union field which can hold a reference to any of those types, and |
| additionally a hidden field with the suffix `_type` is generated that |
| holds the corresponding enum value, allowing you to know which type to |
| cast to at runtime. |
| |
| Unions are a good way to be able to send multiple message types as a FlatBuffer. |
| Note that because a union field is really two fields, it must always be |
| part of a table, it cannot be the root of a FlatBuffer by itself. |
| |
| If you have a need to distinguish between different FlatBuffers in a more |
| open-ended way, for example for use as files, see the file identification |
| feature below. |
| |
| ### Namespaces |
| |
| These will generate the corresponding namespace in C++ for all helper |
| code, and packages in Java. You can use `.` to specify nested namespaces / |
| packages. |
| |
| ### Includes |
| |
| You can include other schema files in your current one, e.g.: |
| |
| include "mydefinitions.fbs"; |
| |
| This makes it easier to refer to types defined elsewhere. `include` |
| automatically ensures each file is parsed just once, even when referred to |
| more than once. |
| |
| When using the `flatc` compiler to generate code for schema definitions, |
| only definitions in the current file will be generated, not those from the |
| included files (those you still generate separately). |
| |
| ### Root type |
| |
| This declares what you consider to be the root table (or struct) of the |
| serialized data. This is particularly important for parsing JSON data, |
| which doesn't include object type information. |
| |
| ### File identification and extension |
| |
| Typically, a FlatBuffer binary buffer is not self-describing, i.e. it |
| needs you to know its schema to parse it correctly. But if you |
| want to use a FlatBuffer as a file format, it would be convenient |
| to be able to have a "magic number" in there, like most file formats |
| have, to be able to do a sanity check to see if you're reading the |
| kind of file you're expecting. |
| |
| Now, you can always prefix a FlatBuffer with your own file header, |
| but FlatBuffers has a built-in way to add an identifier to a |
| FlatBuffer that takes up minimal space, and keeps the buffer |
| compatible with buffers that don't have such an identifier. |
| |
| You can specify in a schema, similar to `root_type`, that you intend |
| for this type of FlatBuffer to be used as a file format: |
| |
| file_identifier "MYFI"; |
| |
| Identifiers must always be exactly 4 characters long. These 4 characters |
| will end up as bytes at offsets 4-7 (inclusive) in the buffer. |
| |
| For any schema that has such an identifier, `flatc` will automatically |
| add the identifier to any binaries it generates (with `-b`), |
| and generated calls like `FinishMonsterBuffer` also add the identifier. |
| If you have specified an identifier and wish to generate a buffer |
| without one, you can always still do so by calling |
| `FlatBufferBuilder::Finish` explicitly. |
| |
| After loading a buffer, you can use a call like |
| `MonsterBufferHasIdentifier` to check if the identifier is present. |
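To make the byte layout concrete, here is a minimal Python sketch of the same kind of check, based only on the statement above that the identifier occupies offsets 4-7. The function `buffer_has_identifier` and the sample bytes are made up for this illustration; in real code, prefer the generated call.

```python
def buffer_has_identifier(buf: bytes, identifier: str) -> bool:
    """Check whether bytes 4-7 (inclusive) of `buf` equal `identifier`."""
    ident = identifier.encode("ascii")
    if len(ident) != 4:
        raise ValueError("identifiers must be exactly 4 characters long")
    return len(buf) >= 8 and buf[4:8] == ident


# Hypothetical buffer: 4 placeholder bytes, the identifier, then more data.
sample = b"\x00\x00\x00\x00" + b"MYFI" + b"\x00" * 8

print(buffer_has_identifier(sample, "MYFI"))  # True
print(buffer_has_identifier(sample, "OTHR"))  # False
```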
| |
| Note that this is best for open-ended uses such as files. If you simply wanted |
| to send one of a set of possible messages over a network for example, you'd |
| be better off with a union. |
| |
| Additionally, by default `flatc` will output binary files as `.bin`. |
| This declaration in the schema will change that to whatever you want: |
| |
| file_extension "ext"; |
| |
| ### Comments & documentation |
| |
| Comments may be written as in most C-based languages. Additionally, a |
| triple-slash comment (`///`) on a line by itself signals that the comment is |
| documentation for whatever is declared on the line after it |
| (table/struct/field/enum/union/element), and the comment is output |
| in the corresponding C++ code. Multiple such lines per item are allowed. |
| |
| ### Attributes |
| |
| Attributes may be attached to a declaration, behind a field, or after |
| the name of a table/struct/enum/union. They may or may not have a value. |
| Some attributes, like `deprecated`, are understood by the compiler. |
| User-defined ones need to be declared with the attribute declaration |
| (like `priority` in the example above), and are |
| available to query if you parse the schema at runtime. |
| This is useful if you write your own code generators/editors etc., and |
| you wish to add additional information specific to your tool (such as a |
| help text). |
| |
| Currently understood attributes: |
| |
| - `id: n` (on a table field): manually set the field identifier to `n`. |
| If you use this attribute, you must use it on ALL fields of this table, |
| and the numbers must be a contiguous range from 0 onwards. |
| Additionally, since a union type effectively adds two fields, its |
| id must be that of the second field (the first field is the type |
| field and not explicitly declared in the schema). |
| For example, if the last field before the union field had id 6, |
| the union field should have id 8, and the union's type field will |
| implicitly be 7. |
| IDs allow the fields to be placed in any order in the schema. |
| When a new field is added to the schema, it must use the next available ID. |
| - `deprecated` (on a field): do not generate accessors for this field |
| anymore, code should stop using this data. |
| - `required` (on a non-scalar table field): this field must always be set. |
| By default, all fields are optional, i.e. may be left out. This is |
| desirable, as it helps with forwards/backwards compatibility, and |
| flexibility of data structures. It is also a burden on the reading code, |
| since for non-scalar fields it requires you to check against NULL and |
| take appropriate action. By specifying this attribute, you force code that |
| constructs FlatBuffers to ensure this field is initialized, so the reading |
| code may access it directly, without checking for NULL. If the constructing |
| code does not initialize this field, it will trigger an assert, and the |
| verifier will also fail on buffers that have missing required fields. |
| - `original_order` (on a table): since elements in a table do not need |
| to be stored in any particular order, they are often optimized for |
| space by sorting them by size. This attribute stops that from happening. |
| - `force_align: size` (on a struct): force the alignment of this struct |
| to be something higher than what it is naturally aligned to. Causes |
| these structs to be aligned to that amount inside a buffer, IF that |
| buffer is allocated with that alignment (which is not necessarily |
| the case for buffers accessed directly inside a `FlatBufferBuilder`). |
| - `bit_flags` (on an enum): the values of this field indicate bits, |
| meaning that any value N specified in the schema will end up |
| representing 1<<N, or if you don't specify values at all, you'll get |
| the sequence 1, 2, 4, 8, ... |
| - `nested_flatbuffer: "table_name"` (on a field): this indicates that the field |
| (which must be a vector of ubyte) contains flatbuffer data, for which the |
| root type is given by `table_name`. The generated code will then produce |
| a convenient accessor for the nested FlatBuffer. |
| - `key` (on a field): this field is meant to be used as a key when sorting |
| a vector of the type of table it sits in. Can be used for in-place |
| binary search. |
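The `id` rules from the list above (every slot used exactly once, a contiguous range from 0, and a union occupying two slots) can be checked with a small Python sketch. `layout_field_ids` is a hypothetical helper written for this illustration; it does not reproduce `flatc`'s actual validation code.

```python
def layout_field_ids(fields):
    """Return slot names ordered by id, or raise ValueError if the ids are invalid.

    `fields` is a list of (name, id, is_union) tuples in any order. A union
    with id n also claims slot n - 1 for its implicit `_type` field.
    """
    slots = {}

    def claim(slot_id, slot_name):
        if slot_id in slots:
            raise ValueError(f"id {slot_id} is used twice")
        slots[slot_id] = slot_name

    for name, field_id, is_union in fields:
        if is_union:
            claim(field_id - 1, name + "_type")
        claim(field_id, name)

    if sorted(slots) != list(range(len(slots))):
        raise ValueError("ids must form a contiguous range starting at 0")
    return [slots[i] for i in range(len(slots))]


# The Monster table with explicit ids, listed in arbitrary order.
monster_fields = [
    ("test", 8, True),  # union: `test_type` implicitly takes id 7
    ("pos", 0, False), ("mana", 1, False), ("hp", 2, False),
    ("name", 3, False), ("friendly", 4, False),
    ("inventory", 5, False), ("color", 6, False),
]
print(layout_field_ids(monster_fields))
```

Giving the union id 7 instead would make its type field collide with `color` at id 6, which the sketch reports as an error.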
| |
| ## JSON Parsing |
| |
| The same parser that parses the schema declarations above is also able |
| to parse JSON objects that conform to this schema. So, unlike other JSON |
| parsers, this parser is strongly typed, and parses directly into a FlatBuffer |
| (see the compiler documentation on how to do this from the command line, or |
| the C++ documentation on how to do this at runtime). |
| |
| Besides needing a schema, there are a few other changes to how it parses |
| JSON: |
| |
| - It accepts field names with and without quotes, like many JSON parsers |
| already do. It outputs them without quotes as well, though it can be made |
| to output them with quotes using the `strict_json` flag. |
| - If a field has an enum type, the parser will recognize symbolic enum |
| values (with or without quotes) instead of numbers, e.g. |
| `field: EnumVal`. If a field is of integral type, you can still use |
| symbolic names, but values need to be prefixed with their type and |
| need to be quoted, e.g. `field: "Enum.EnumVal"`. For enums |
| representing flags, you may place multiple inside a string |
| separated by spaces to OR them, e.g. |
| `field: "EnumVal1 EnumVal2"` or `field: "Enum.EnumVal1 Enum.EnumVal2"`. |
| - Similarly, unions need to be specified with two fields, much like |
| you do when serializing from code. E.g. for a field `foo`, you must |
| add a field `foo_type: FooOne` right before the `foo` field, where |
| `FooOne` would be the table out of the union you want to use. |
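The space-separated flag form combines with the `bit_flags` attribute described earlier. The Python sketch below models both rules. The helpers `bit_flag_values` and `parse_flag_string` are made up for this illustration, the enum has three placeholder members named after the examples above, and the numbering of unvalued members that follow an explicit value is an assumption carried over from ordinary enums.

```python
def bit_flag_values(members):
    """Model `bit_flags`: a schema value N ends up representing 1 << N.

    `members` is a list of (name, explicit_N_or_None) pairs. Unvalued
    members are assumed to continue from the previous N plus one.
    """
    values = {}
    next_bit = 0
    for name, explicit in members:
        bit = next_bit if explicit is None else explicit
        values[name] = 1 << bit
        next_bit = bit + 1
    return values


def parse_flag_string(text, enum_name, values):
    """OR together the space-separated flag names in a JSON string value."""
    prefix = enum_name + "."
    result = 0
    for token in text.split():
        if token.startswith(prefix):  # accept the "Enum.EnumVal" spelling too
            token = token[len(prefix):]
        result |= values[token]
    return result


flags = bit_flag_values([("EnumVal1", None), ("EnumVal2", None), ("EnumVal3", None)])
print(flags)  # {'EnumVal1': 1, 'EnumVal2': 2, 'EnumVal3': 4}
print(parse_flag_string("EnumVal1 EnumVal2", "Enum", flags))            # 3
print(parse_flag_string("Enum.EnumVal1 Enum.EnumVal3", "Enum", flags))  # 5
```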
| |
| When parsing JSON, it recognizes the following escape codes in strings: |
| |
| - `\n` - linefeed. |
| - `\t` - tab. |
| - `\r` - carriage return. |
| - `\b` - backspace. |
| - `\f` - form feed. |
| - `\"` - double quote. |
| - `\\` - backslash. |
| - `\/` - forward slash. |
| - `\uXXXX` - 16-bit Unicode code point, converted to the equivalent UTF-8 |
| representation. |
| - `\xXX` - 8-bit binary hexadecimal number XX. This is the only one that is |
| not in the JSON spec (see http://json.org/), but is needed to be able to |
| encode arbitrary binary in strings to text and back without losing |
| information (e.g. the byte 0xFF can't be represented in standard JSON). |
| |
| It also generates these escape codes back again when generating JSON from a |
| binary representation. |
| |
| ## Gotchas |
| |
| ### Schemas and version control |
| |
| FlatBuffers relies on new field declarations being added at the end, and on |
| earlier declarations not being removed but marked deprecated when needed. We |
| think this is an improvement over the manual number assignment that happens in |
| Protocol Buffers (and which is still an option using the `id` attribute |
| mentioned above). |
| |
| One place where this can be problematic, however, is source control. Suppose |
| user A adds a field and generates new binary data with the new schema, while |
| user B has already committed a different new field. If user A then simply |
| auto-merges the schema and commits it together with the binary data, the |
| binary files are invalid with respect to the merged schema. |
| |
| The solution of course is that you should not be generating binary data before |
| your schema changes have been committed, ensuring consistency with the rest of |
| the world. If this is not practical for you, use explicit field ids, which |
| should always generate a merge conflict if two people try to allocate the same |
| id. |
| |