Thrift
Thrift is the format that Protobuf displaced. It is also the format
that arrived at most of Protobuf's design decisions independently and
roughly contemporaneously, with a few differences worth understanding.
The differences are not subtle, and the parts of Thrift that are
better than Protobuf — the explicit required/optional distinction
the schema preserved through its lifetime, the in-band protocol
selection, the integrated RPC stack — are interesting precisely
because they were rejected on operational grounds rather than design
ones. Thrift is the working alternate-history of typed binary
serialization.
Origin
Thrift was built at Facebook in 2007 by Mark Slee and a small team to
solve the same problem Protobuf was solving at Google: heterogeneous
service-to-service communication across a polyglot infrastructure
where the build pipeline could not afford to redeploy every service
in lockstep. Slee's team designed Thrift from scratch rather than
adopting Protobuf, partly because Protobuf was not yet public, partly
because Facebook needed an integrated RPC story, and partly because
Slee believed Protobuf's approach to certain problems (notably field
presence, which Protobuf 1 and 2 expressed via the required keyword)
was correct and wanted to keep that mechanism while extending the
format with multiple wire encodings.
Facebook open-sourced Thrift in 2007 and donated it to the Apache Software Foundation in 2008, where it became Apache Thrift. The project has been in continuous, if slow, maintenance ever since, with the bulk of the activity coming from Apache committers rather than Facebook (which has long since moved to a fork called fbthrift, maintained internally with periodic open-source drops).
The Apache Thrift project's center of gravity is now in the embedded and high-performance computing communities — places where the choice of wire format predates gRPC's dominance and where the cost of migration exceeds the cost of staying put. Thrift remains the external API surface for HBase, Cassandra (until its Thrift API was deprecated in 4.0), several legacy financial systems, and a long tail of internal services at companies that adopted it before 2015. For new projects in 2026, Thrift is rarely the right choice; the ecosystem momentum has shifted decisively to Protobuf and gRPC. But the format is widely deployed enough that engineers will encounter it, and understanding it is worth the chapter.
The format on its own terms
Thrift is unusual among schema-first wire formats in defining multiple wire encodings within a single specification. The two encodings worth knowing are Binary Protocol (the original) and Compact Protocol (the dense one, added in 2008 to compete with Protobuf on size). There is also a JSON Protocol for debugging, a TupleProtocol for fixed-schema fast access, and a Multiplexed Protocol that wraps another encoding to support multiple services over one connection. The encoding is selected by the client and server at connection setup, not by the schema, and a single Thrift schema can be consumed in any of them.
Binary Protocol is straightforward and not particularly compact: each field is encoded as a one-byte type code, a two-byte field ID, and the value. Integers are big-endian fixed-width (i32 takes four bytes regardless of value, i64 takes eight). Strings are length-prefixed with a four-byte length. The encoding is easy to parse, easy to debug in a hex viewer, and pays a substantial size overhead for small values. It exists primarily for backward compatibility; new Thrift deployments use Compact Protocol.
Compact Protocol is the encoding worth understanding. It is
philosophically similar to Protobuf: varint-encoded integers, packed
field tags, length-delimited strings. The differences are mostly in
the framing details. Compact Protocol packs the field type code and
a delta from the previous field ID into a single byte; if the
delta fits in four bits and the type code fits in four bits, the
field tag is one byte total. Boolean values are folded entirely
into the type code (type 1 means "bool true," type 2 means "bool
false," with no payload byte), which is denser than Protobuf's
single-byte boolean encoding. Signed integers are zigzag-encoded,
which Protobuf requires the schema to opt into via the sint32
type.
The data model is comparable to Protobuf's, with a few Thrift-
specific additions. Thrift has a native set type and a native map
type at the wire level, where Protobuf has only repeated fields and
synthesizes maps from the schema language. Thrift's binary type
is distinct from string at the schema level (Protobuf 3 also
makes this distinction, but earlier versions did not). Thrift has
no oneof; the closest equivalent is a struct with several optional
fields and an application convention.
The most interesting schema-level distinction is field presence.
Thrift fields can be declared required, optional, or
default. Required means the field must be present, and a
decoder that does not see it raises an error. Optional means the
field may or may not be present, with no penalty for absence.
Default (the default for fields that don't declare otherwise)
means the field may be absent on the wire, but the generated code
treats absence as the schema-declared default value. Protobuf 2 had
the same three modes; Protobuf 3 dropped required and conflated
default and optional for scalar types (until 3.15 partially
restored the distinction). The Thrift community kept all three and
considers the loss of required a strict regression. The argument
on the other side — that required is a schema-evolution
landmine, because removing a field that consumers expect to be
required produces a protocol break — is genuine, and the Protobuf
team's experience apparently led them to conclude that the
operational cost of required exceeded its safety benefit.
Wire tour
Schema:
struct Person {
1: required i64 id;
2: required string name;
3: optional string email;
4: required i32 birth_year;
5: required list<string> tags;
6: required bool active;
}
Encoded with Compact Protocol:
16 54 field 1 (delta 1, type i64=6), zigzag(42)=84
18 0c 41 64 61 20 4c 6f 76 65 6c 61 63 65 field 2 (delta 1, type binary=8), len 12, "Ada Lovelace"
18 15 61 64 61 40 61 6e 61 6c 79 74 69 63
61 6c 2e 65 6e 67 69 6e 65 field 3 (delta 1, type binary=8), len 21, "ada@..."
15 ae 1c field 4 (delta 1, type i32=5), zigzag(1815)=3630
19 28 field 5 (delta 1, type list=9)
28 0d 6d 61 74 68 65 6d 61 74 69 63 69 61 6e list of 2 strings, "mathematician"
0a 70 72 6f 67 72 61 6d 6d 65 72 "programmer"
11 field 6 (delta 1, type bool=true=1)
00 stop field
71 bytes — essentially identical to Protobuf for this payload, which is no accident. The two formats made similar choices about varint encoding, length prefixes, and tag packing, and the small differences (Thrift's zigzag-by-default for signed ints, Thrift's fold-bool-into- type-code, Thrift's explicit list-element-type byte) tend to wash out across mixed payloads. Thrift Compact occasionally wins by a few bytes for boolean-heavy payloads; Protobuf occasionally wins by a few bytes for unsigned-integer-heavy payloads. Neither formula dominates.
The structural differences from Protobuf are visible in the bytes.
The list field encoding includes an explicit element-type byte
(28 — top nibble is the count 2, bottom nibble is the type 8 for
binary/string), which Protobuf does not require because Protobuf's
repeated fields are not natively typed at the wire level. The stop
field byte (00) marks the end of the struct and is required;
Protobuf has no stop marker because messages are length-delimited
when embedded and rely on EOF when at the top level.
If email were absent, the encoding would skip those 23 bytes, and
the next field's header byte would carry a delta of 2 instead of 1
to compensate. This delta encoding is the small detail that
distinguishes Thrift Compact from Protobuf at the wire level and
explains how Thrift packs the field tag into a single byte for most
fields: the delta from the previous field's number is almost always
small.
Evolution and compatibility
Thrift's evolution rules are closely parallel to Protobuf's. Adding a field with a new ID is safe in both directions if the field is declared optional; consumers without the field's schema skip it, producers without it omit it. Removing a field is safe if the removal is coordinated with consumers; the field ID should be retired and not reused. Renaming a field is safe at the wire level (IDs are what matter). Changing a field's type is mostly unsafe with a similar set of exceptions to Protobuf's.
The single substantial difference is the required keyword. A
producer that omits a required field, or a consumer that does not
recognize a required field, will either fail at decode time or
produce undefined behavior depending on the language binding. This
makes required a one-way door: once a field is declared required
and deployed, removing it is a coordinated migration. The Thrift
guidance is therefore "use required sparingly," which produces the
question of why the keyword exists at all if every guidance
document warns against using it. The Protobuf team eventually
concluded that the answer is "it shouldn't" and removed it. The
Thrift community kept it because the cases where it's right are real.
The deterministic-encoding question for Thrift is the same as for Protobuf: not specified at the wire level, achievable with care. Thrift has no canonical encoding subset, no mandated map ordering, no specified varint widths. Sign-and-verify schemes over Thrift typically canonicalize the bytes themselves rather than rely on the encoder.
Ecosystem reality
The Thrift ecosystem is mature, fragmented, and slowly contracting. The Apache Thrift project ships generators for C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, JavaScript, Smalltalk, OCaml, Delphi, and a few others. Quality varies; the C++ and Java generators are first-class, the Python and Go generators are good, and the long tail is functional but not state of the art.
Facebook's fork, fbthrift, has diverged substantially from Apache Thrift over the years. fbthrift adds streaming support, a ContextStack for cross-cutting concerns, and a number of internal optimizations Facebook needed for its scale. The two are wire-compatible at the Compact Protocol level but not API-compatible at the source level. Most companies pick one or the other; few maintain both.
Twitter's Finagle uses Thrift extensively, and Finagle's Scrooge generator (for Scala) is the canonical Scala Thrift implementation. Apache HBase exposes a Thrift gateway as one of its primary external APIs. Apache Cassandra's Thrift API was the original client interface; it was deprecated in 2.x, removed in 4.0, and is now of historical interest only. Apache Hive uses Thrift internally for its metastore protocol; the Thrift-defined HiveMetaStore service is one of the more durable Thrift deployments in the analytics ecosystem.
Two ecosystem gotchas worth noting. First, the multiple-protocol design means that the wire format is not a property of the schema; it is a runtime configuration choice. A Thrift service in production may speak Binary, Compact, or JSON depending on how the server was started. Tools that snoop on Thrift traffic must detect the protocol from the first few bytes; some tools do this poorly, and traces from the wrong protocol look like noise.
Second, the integrated RPC stack — TServer, TTransport, TProtocol in the canonical naming — is not optional in many of the language bindings. You cannot easily use Apache Thrift's serialization without dragging in its server and transport classes. fbthrift makes this cleaner, as do Finagle's Scrooge bindings. For serialization-only uses, this overhead is real and worth weighing.
When to reach for it
Thrift is the right choice when interoperating with an existing Thrift-using ecosystem: HBase, Hive, fbthrift-using companies, the remaining Cassandra-via-Thrift deployments. It is a defensible choice for new systems where the integrated RPC stack matters and gRPC's HTTP/2 baseline is unwelcome (some embedded environments, some legacy network topologies).
It is the right choice when required field semantics are a hard
requirement and the alternative would be to reimplement them in
application code over Protobuf.
When not to
Thrift is not the right choice for new microservices in greenfield environments. gRPC plus Protobuf has won that space operationally — better tooling (Buf), better runtime libraries, better integration with modern observability stacks, broader language support — and the few axes on which Thrift is technically superior are not enough to overcome the ecosystem gap.
Thrift is also not the right choice when the operational cost of selecting a wire protocol at runtime is unwelcome. Protobuf's single wire format is, on balance, simpler to reason about than Thrift's three.
Position on the seven axes
Schema-required. Not self-describing. Row-oriented. Parse rather
than zero-copy. Codegen-first, with runtime support via dynamic
protocols. Non-deterministic by spec. Evolution by tagged fields,
with required/optional/default distinguishing presence
semantics.
Thrift's stance differs from Protobuf's in two places: the
multiple-protocol design (Thrift permits Binary, Compact, JSON, and
others; Protobuf has a single wire format) and the preserved
required keyword. Both differences are intelligible as choices,
and both produced operational costs that the Protobuf team
explicitly chose to avoid.
A note on the required-field debate
The argument over required is the single most theologically
charged disagreement in the schema-first wire format world, and it
is worth getting the shape of it right. The case for required is
that it documents an invariant the schema author considers
load-bearing: this field cannot be absent without rendering the
record meaningless. The decoder-side enforcement turns silent bugs
(consumer reads garbage because producer omitted a field) into
loud bugs (decode fails). The schema reader can see, at a glance,
which fields are critical versus which are optional, and the
generated code's API reflects that distinction.
The case against required is that it is a one-way door at the
schema level. Once a field is required, you cannot remove it
without coordinating the removal across every producer and consumer
in your fleet. Coordination across a fleet of services in
heterogeneous deployment states is exactly the problem
schema-evolution mechanisms are supposed to solve, and required
makes one of those problems insolvable: the field cannot be
removed because removing it would break decoders, but if the field
is no longer used by anyone, it is dead weight that everyone has
to keep around forever. The Protobuf team's experience inside
Google was that this scenario arose often enough to make
required a net liability, and Protobuf 3 dropped it.
The Thrift community's experience appears to be different, perhaps
because Thrift deployments are typically smaller and more
self-contained than the cross-org Protobuf deployments at Google.
A Thrift schema owned by a single team, deployed on a single
release schedule, can use required safely. A Protobuf schema
crossing dozens of teams and hundreds of services cannot. The
disagreement is therefore not really about the keyword; it is about
the deployment topology in which schema evolution happens. Both
sides are right for their respective topologies.
Epitaph
Thrift is Protobuf's contemporaneous twin, with three wire formats
and the courage to keep required; deployed widely, growing slowly,
displaced operationally by gRPC.