Spec for Iceberg.
Structs§
- BlobMetadata - Represents a blob of metadata, which is a part of a statistics file
- ByteBuf - Wrapper around Vec<u8> to serialize and deserialize efficiently.
- DataFile - Data file carries data file path, partition tuple, metrics, …
- DataFileBuilder - Builder for DataFile.
- Datum - Literal associated with its type. The value and type pair is checked at construction, so the type and value are guaranteed to be correct when used.
- EncryptedKey - Keys used for table encryption
- FieldSummary - Field summary for partition field in the spec.
- ListType - A list is a collection of values with some element type. The element field has an integer id that is unique in the table schema. Elements can be either optional or required. Element types may be any type.
- Manifest - A manifest contains metadata and a list of entries.
- ManifestEntry - A manifest is an immutable Avro file that lists data files or delete files, along with each file’s partition data tuple, metrics, and tracking information.
- ManifestFile - Entry in a manifest list.
- ManifestList - Snapshots are embedded in table metadata, but the list of manifests for a snapshot is stored in a separate manifest list file.
- ManifestListWriter - A manifest list writer.
- ManifestMetadata - Metadata of a manifest that is stored in the key-value metadata of the Avro file
- ManifestWriter - A manifest writer.
- ManifestWriterBuilder - The builder used to create a ManifestWriter.
- Map - Map is a collection of key-value pairs with a key type and a value type. It is used in Literal::Map. To make it hashable, the order of key-value pairs is stored in a separate vector so that the map can be hashed deterministically. This also means that the order of key-value pairs affects the hash value.
- MapType - A map is a collection of key-value pairs with a key type and a value type. Both the key field and value field each have an integer id that is unique in the table schema. Map keys are required and map values can be either optional or required. Both map keys and map values may be any type, including nested types.
- MappedField - Maps field names to IDs.
- MetadataLog - Encodes changes to the previous metadata files for the table
- NameMapping - Iceberg fallback field name to ID mapping.
- NestedField - A struct is a tuple of typed values. Each field in the tuple is named and has an integer id that is unique in the table schema. Each field can be either optional or required, meaning that values can (or cannot) be null. Fields may be any type. Fields may have an optional comment or doc string. Fields can have default values.
- PartitionField - Partition fields capture the transform from table data to partition values.
- PartitionKey - A partition key represents a specific partition in a table, containing the partition spec, schema, and the actual partition values.
- PartitionSpec - Partition spec that defines how to produce a tuple of partition values from a record.
- PartitionSpecBuilder - Create valid partition specs for a given schema.
- PartitionStatisticsFile - Statistics file for a partition
- Schema - Defines schema in iceberg.
- SchemaBuilder - Schema builder.
- Snapshot - A snapshot represents the state of a table at some time and is used to access the complete set of data files in the table.
- SnapshotLog - A log of when each snapshot was made.
- SnapshotReference - Iceberg tables keep track of branches and tags using snapshot references.
- SnapshotRowRange - Row range of a snapshot, contains first_row_id and added_rows_count.
- SnapshotSummaryCollector - SnapshotSummaryCollector collects and aggregates snapshot update metrics. It gathers metrics about added or removed data files and manifests, and tracks partition-specific updates.
- SortField - Entry for every column that is to be sorted
- SortOrder - A sort order is defined by a sort order id and a list of sort fields. The order of the sort fields within the list defines the order in which the sort is applied to the data.
- SortOrderBuilder - Builder for SortOrder.
- SqlViewRepresentation - The SQL representation stores the view definition as a SQL SELECT, with metadata such as the SQL dialect.
- StatisticsFile - Represents a statistics file
- Struct - The partition struct stores the tuple of partition values for each file. Its type is derived from the partition fields of the partition spec used to write the manifest file. In v2, the partition struct’s field ids must match the ids from the partition spec.
- StructType - DataType for a specific struct
- Summary - Summarises the changes in the snapshot.
- TableMetadata - Fields for the version 2 of the table metadata.
- TableMetadataBuildResult - Result of modifying or creating a TableMetadata.
- TableMetadataBuilder - Manipulating table metadata.
- TableProperties - TableProperties that contains the properties of a table.
- UnboundPartitionField - Unbound partition field can be built without a schema and later bound to a schema.
- UnboundPartitionSpec - Unbound partition spec can be built without a schema and later bound to a schema. They are used to transport schema information as part of the REST specification. The main difference to PartitionSpec is that the field ids are optional.
- UnboundPartitionSpecBuilder - Create a new UnboundPartitionSpec
- ViewMetadata - Fields for the version 1 of the view metadata.
- ViewMetadataBuilder - Manipulating view metadata.
- ViewRepresentations - A list of view representations.
- ViewVersion - A view version represents the definition of a view at a specific point in time.
- ViewVersionLog - A log of when each snapshot was made.
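The Map entry above says that key-value pairs are kept in a separate vector so the map can be hashed deterministically, and that pair order therefore affects the hash. The following is a minimal sketch of that idea only. OrderedMap, build, and hash_of are hypothetical names invented for illustration, and the key and value types are simplified; this is not the crate's Map implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical insertion-ordered map (illustration only, not the crate's `Map`).
#[derive(Debug, Default)]
struct OrderedMap {
    // Maps each key to its position in `pairs` for lookups.
    index: HashMap<String, usize>,
    // Key-value pairs in insertion order; only this vector is hashed.
    pairs: Vec<(String, Option<i64>)>,
}

impl OrderedMap {
    fn insert(&mut self, key: &str, value: Option<i64>) {
        match self.index.get(key) {
            // Existing key: overwrite the value, keep its original position.
            Some(&pos) => self.pairs[pos].1 = value,
            None => {
                self.index.insert(key.to_string(), self.pairs.len());
                self.pairs.push((key.to_string(), value));
            }
        }
    }

    fn get(&self, key: &str) -> Option<&Option<i64>> {
        self.index.get(key).map(|&pos| &self.pairs[pos].1)
    }
}

impl Hash for OrderedMap {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Iterating a HashMap visits entries in an unspecified order,
        // so the hash is computed from the ordered vector instead.
        self.pairs.hash(state);
    }
}

/// Builds a map by inserting the entries in the given order.
fn build(entries: &[(&str, Option<i64>)]) -> OrderedMap {
    let mut map = OrderedMap::default();
    for (key, value) in entries {
        map.insert(key, *value);
    }
    map
}

/// Hashes a map with the standard library's default hasher.
fn hash_of(map: &OrderedMap) -> u64 {
    let mut hasher = DefaultHasher::new();
    map.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let ab = build(&[("a", Some(1)), ("b", None)]);
    let ab_again = build(&[("a", Some(1)), ("b", None)]);
    let ba = build(&[("b", None), ("a", Some(1))]);

    println!("lookup b: {:?}", ab.get("b"));
    println!("same order, same hash: {}", hash_of(&ab) == hash_of(&ab_again));
    println!("different order, same hash: {}", hash_of(&ab) == hash_of(&ba));
}
```

The trade-off in this sketch is the one the description states: two maps holding the same pairs in a different order hash differently, so callers that need order-insensitive comparison would have to normalize the order first.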
Enums§
- DataContentType - Type of content stored by the data file: data, equality deletes, or position deletes (all v1 files are data files)
- DataFileBuilderError - Error type for DataFileBuilder
- DataFileFormat - Format of this data.
- FormatVersion - Iceberg format version
- Literal - Values present in iceberg type
- ManifestContentType - The type of files tracked by the manifest, either data or delete files; Data(0) for all v1 manifests
- ManifestStatus - Used to track additions and deletions in ManifestEntry.
- NullOrder - Describes the order of null values when sorted.
- Operation - The operation field is used by some operations, like snapshot expiration, to skip processing certain snapshots.
- PrimitiveLiteral - Values present in iceberg type
- PrimitiveType - Primitive data types
- SnapshotRetention - The snapshot expiration procedure removes snapshots from table metadata and applies the table’s retention policy.
- SortDirection - Sort direction in a partition, either ascending or descending
- SortOrderBuilderError - Error type for SortOrderBuilder
- Transform - Transform is used to transform predicates to partition predicates, in addition to transforming data values.
- Type - All data types are either primitives or nested types, which are maps, lists, or structs.
- ViewFormatVersion - Iceberg format version
- ViewRepresentation - View definitions can be represented in multiple ways. Representations are documented ways to express a view definition.
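SortDirection and NullOrder together decide how two values of a sorted column compare. The sketch below shows one way such a comparison could work for a nullable integer column. Direction, Nulls, compare, and sorted are hypothetical names, not the crate's API, and the sketch assumes that null placement is independent of the sort direction.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a sort direction.
#[derive(Clone, Copy)]
enum Direction {
    Ascending,
    Descending,
}

/// Hypothetical stand-in for a null ordering.
#[derive(Clone, Copy)]
enum Nulls {
    First,
    Last,
}

/// Compares two nullable values. Nulls are placed by `nulls` alone;
/// `dir` only affects how two non-null values compare (an assumption).
fn compare(a: Option<i64>, b: Option<i64>, dir: Direction, nulls: Nulls) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => match nulls {
            Nulls::First => Ordering::Less,
            Nulls::Last => Ordering::Greater,
        },
        (Some(_), None) => match nulls {
            Nulls::First => Ordering::Greater,
            Nulls::Last => Ordering::Less,
        },
        (Some(x), Some(y)) => match dir {
            Direction::Ascending => x.cmp(&y),
            Direction::Descending => y.cmp(&x),
        },
    }
}

/// Returns the values sorted under the given direction and null order.
fn sorted(mut values: Vec<Option<i64>>, dir: Direction, nulls: Nulls) -> Vec<Option<i64>> {
    values.sort_by(|a, b| compare(*a, *b, dir, nulls));
    values
}

fn main() {
    let values = vec![Some(2), None, Some(1)];
    println!("{:?}", sorted(values.clone(), Direction::Ascending, Nulls::Last));
    println!("{:?}", sorted(values, Direction::Descending, Nulls::First));
}
```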
Constants§
- DEFAULT_SCHEMA_ID - Default schema id.
- DEFAULT_SCHEMA_NAME_MAPPING - Property name for name mapping.
- INITIAL_ROW_ID - Initial row id for row lineage for new v3 tables and older tables upgrading to v3.
- LIST_FIELD_NAME - Field name for list type.
- MAIN_BRANCH - The ref name of the main branch of the table.
- MAP_KEY_FIELD_NAME - Field name for map type’s key.
- MAP_VALUE_FIELD_NAME - Field name for map type’s value.
- MIN_FORMAT_VERSION_ROW_LINEAGE - Minimum format version that supports row lineage (v3).
- UNASSIGNED_SEQUENCE_NUMBER - Placeholder for sequence number. The field with this value must be replaced with the actual sequence number before it is written.
- UNASSIGNED_SNAPSHOT_ID - Placeholder for snapshot ID. The field with this value must be replaced with the actual snapshot ID before it is committed.
- VIEW_PROPERTY_REPLACE_DROP_DIALECT_ALLOWED - Property key for allowing to drop dialects when replacing a view.
- VIEW_PROPERTY_REPLACE_DROP_DIALECT_ALLOWED_DEFAULT - Default value for the property key for allowing to drop dialects when replacing a view.
- VIEW_PROPERTY_VERSION_HISTORY_SIZE - Property key for the number of history entries to keep.
- VIEW_PROPERTY_VERSION_HISTORY_SIZE_DEFAULT - Default value for the property key for the number of history entries to keep.
Traits§
- PartnerAccessor - Accessor used to get child partner from parent partner.
- SchemaVisitor - A post order schema visitor.
- SchemaWithPartnerVisitor - A post order schema visitor with partner.
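A post-order visitor processes the children of a type before the type itself, so each callback receives the results already computed for its children. The sketch below illustrates that traversal order on a toy type tree. Type, Field, PostOrderVisitor, visit_type, and LeafCounter are hypothetical and far smaller than the crate's SchemaVisitor trait; the method names and signatures are assumptions made for illustration.

```rust
/// Toy type tree (hypothetical; not the crate's `Type`).
enum Type {
    Primitive(&'static str),
    Struct(Vec<Field>),
    List(Box<Field>),
}

/// Toy field carrying only an id and a type.
struct Field {
    id: i32,
    ty: Type,
}

/// Hypothetical post-order visitor: every callback receives the
/// results already produced for the children of the node.
trait PostOrderVisitor {
    type T;
    fn primitive(&mut self, name: &str) -> Self::T;
    fn field(&mut self, field: &Field, value: Self::T) -> Self::T;
    fn struct_type(&mut self, results: Vec<Self::T>) -> Self::T;
    fn list(&mut self, element: Self::T) -> Self::T;
}

/// Walks the tree depth-first, calling the visitor after the children.
fn visit_type<V: PostOrderVisitor>(ty: &Type, v: &mut V) -> V::T {
    match ty {
        Type::Primitive(name) => v.primitive(name),
        Type::Struct(fields) => {
            let mut results = Vec::with_capacity(fields.len());
            for f in fields {
                let child = visit_type(&f.ty, v);
                results.push(v.field(f, child));
            }
            v.struct_type(results)
        }
        Type::List(element) => {
            let child = visit_type(&element.ty, v);
            let child = v.field(element, child);
            v.list(child)
        }
    }
}

/// Counts primitive leaves and records the order in which fields are visited.
struct LeafCounter {
    order: Vec<i32>,
}

impl PostOrderVisitor for LeafCounter {
    type T = usize;
    fn primitive(&mut self, _name: &str) -> usize {
        1
    }
    fn field(&mut self, field: &Field, value: usize) -> usize {
        self.order.push(field.id);
        value
    }
    fn struct_type(&mut self, results: Vec<usize>) -> usize {
        results.into_iter().sum()
    }
    fn list(&mut self, element: usize) -> usize {
        element
    }
}

/// struct { 1: long, 2: list<3: string> }
fn example() -> Type {
    Type::Struct(vec![
        Field { id: 1, ty: Type::Primitive("long") },
        Field {
            id: 2,
            ty: Type::List(Box::new(Field { id: 3, ty: Type::Primitive("string") })),
        },
    ])
}

fn main() {
    let mut counter = LeafCounter { order: Vec::new() };
    let leaves = visit_type(&example(), &mut counter);
    println!("leaves = {leaves}, field visit order = {:?}", counter.order);
}
```

In this sketch the list element (id 3) is reported before the list field that contains it (id 2), which is the defining property of a post-order walk.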
Functions§
- deserialize_data_file_from_json - Deserialize a DataFile from a JSON string.
- prune_columns - Visits a schema and returns only the fields selected by the id set
- read_data_files_from_avro - Parse data files from avro bytes.
- serialize_data_file_to_json - Serialize a DataFile to a JSON string.
- visit_schema - Visit schema in post order.
- visit_schema_with_partner - Visit schema in post order.
- visit_struct - Visit struct type in post order.
- visit_struct_with_partner - Visit struct type in post order.
- write_data_files_to_avro - Convert data files to avro bytes and write to writer. Return the bytes written.
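Column pruning by field id, as described for prune_columns, can be pictured as a recursive filter over the schema tree. The sketch below is a simplified illustration on a toy schema. Type, Field, prune, and example_schema are hypothetical, and the rule used here (keep a selected field whole; keep a struct with only its selected descendants) is an assumption that may differ from the crate's actual behavior.

```rust
use std::collections::HashSet;

/// Toy type tree (hypothetical; not the crate's `Type`).
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Primitive(&'static str),
    Struct(Vec<Field>),
}

/// Toy field with an id, a name, and a type.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    id: i32,
    name: &'static str,
    ty: Type,
}

/// Keeps a field whole when its id is selected. Otherwise, keeps a struct
/// only if some descendant is selected, with unselected children removed.
fn prune(fields: &[Field], selected: &HashSet<i32>) -> Vec<Field> {
    let mut kept = Vec::new();
    for field in fields {
        if selected.contains(&field.id) {
            kept.push(field.clone());
            continue;
        }
        if let Type::Struct(children) = &field.ty {
            let pruned = prune(children, selected);
            if !pruned.is_empty() {
                kept.push(Field {
                    id: field.id,
                    name: field.name,
                    ty: Type::Struct(pruned),
                });
            }
        }
    }
    kept
}

/// 1: id long, 2: location struct<3: lat double, 4: lon double>, 5: note string
fn example_schema() -> Vec<Field> {
    vec![
        Field { id: 1, name: "id", ty: Type::Primitive("long") },
        Field {
            id: 2,
            name: "location",
            ty: Type::Struct(vec![
                Field { id: 3, name: "lat", ty: Type::Primitive("double") },
                Field { id: 4, name: "lon", ty: Type::Primitive("double") },
            ]),
        },
        Field { id: 5, name: "note", ty: Type::Primitive("string") },
    ]
}

fn main() {
    let selected: HashSet<i32> = [1, 4].iter().copied().collect();
    for field in prune(&example_schema(), &selected) {
        println!("{:?}", field);
    }
}
```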
Type Aliases§
- ManifestEntryRef - Reference to ManifestEntry.
- NestedFieldRef - Reference to nested field.
- PartitionSpecRef - Reference to PartitionSpec.
- SchemaId - Type alias for schema id.
- SchemaRef - Reference to Schema.
- SnapshotRef - Reference to Snapshot.
- SortOrderRef - Reference to SortOrder.
- TableMetadataRef - Reference to TableMetadata.
- UnboundPartitionSpecRef - Reference to UnboundPartitionSpec.
- ViewMetadataRef - Reference to ViewMetadata.
- ViewVersionId - Alias for the integer type used for view version ids.
- ViewVersionRef - Reference to ViewVersion.