Module spec

Spec for Iceberg.

Structs

BlobMetadata
Represents a blob of metadata, which is a part of a statistics file
ByteBuf
Wrapper around Vec<u8> to serialize and deserialize efficiently.
DataFile
Data file carries data file path, partition tuple, metrics, …
DataFileBuilder
Builder for DataFile.
Datum
Literal associated with its type. The value and type pair is checked during construction, so the type and value are guaranteed to be correct when used.
EncryptedKey
Keys used for table encryption
FieldSummary
Field summary for partition field in the spec.
ListType
A list is a collection of values with some element type. The element field has an integer id that is unique in the table schema. Elements can be either optional or required. Element types may be any type.
Manifest
A manifest contains metadata and a list of entries.
ManifestEntry
A manifest is an immutable Avro file that lists data files or delete files, along with each file’s partition data tuple, metrics, and tracking information.
ManifestFile
Entry in a manifest list.
ManifestList
Snapshots are embedded in table metadata, but the list of manifests for a snapshot are stored in a separate manifest list file.
ManifestListWriter
A manifest list writer.
ManifestMetadata
Metadata of a manifest that is stored in the key-value metadata of the Avro file
ManifestWriter
A manifest writer.
ManifestWriterBuilder
The builder used to create a ManifestWriter.
Map
Map is a collection of key-value pairs with a key type and a value type. It is used in Literal::Map. To make it hashable, the order of the key-value pairs is stored in a separate vector so that the map can be hashed deterministically. This also means that the order of the key-value pairs affects the hash value.
MapType
A map is a collection of key-value pairs with a key type and a value type. Both the key field and value field each have an integer id that is unique in the table schema. Map keys are required and map values can be either optional or required. Both map keys and map values may be any type, including nested types.
MappedField
Maps field names to IDs.
MetadataLog
Encodes changes to the previous metadata files for the table
NameMapping
Iceberg fallback field name to ID mapping.
NestedField
A struct is a tuple of typed values. Each field in the tuple is named and has an integer id that is unique in the table schema. Each field can be either optional or required, meaning that values can (or cannot) be null. Fields may be any type. Fields may have an optional comment or doc string. Fields can have default values.
PartitionField
Partition fields capture the transform from table data to partition values.
PartitionKey
A partition key represents a specific partition in a table, containing the partition spec, schema, and the actual partition values.
PartitionSpec
Partition spec that defines how to produce a tuple of partition values from a record.
PartitionSpecBuilder
Create valid partition specs for a given schema.
PartitionStatisticsFile
Statistics file for a partition
Schema
Defines a schema in Iceberg.
SchemaBuilder
Schema builder.
Snapshot
A snapshot represents the state of a table at some time and is used to access the complete set of data files in the table.
SnapshotLog
A log of when each snapshot was made.
SnapshotReference
Iceberg tables keep track of branches and tags using snapshot references.
SnapshotRowRange
Row range of a snapshot, contains first_row_id and added_rows_count.
SnapshotSummaryCollector
SnapshotSummaryCollector collects and aggregates snapshot update metrics. It gathers metrics about added or removed data files and manifests, and tracks partition-specific updates.
SortField
Entry for every column that is to be sorted
SortOrder
A sort order is defined by a sort order id and a list of sort fields. The order of the sort fields within the list defines the order in which the sort is applied to the data.
SortOrderBuilder
Builder for SortOrder.
SqlViewRepresentation
The SQL representation stores the view definition as a SQL SELECT, with metadata such as the SQL dialect.
StatisticsFile
Represents a statistics file
Struct
The partition struct stores the tuple of partition values for each file. Its type is derived from the partition fields of the partition spec used to write the manifest file. In v2, the partition struct’s field ids must match the ids from the partition spec.
StructType
DataType for a specific struct
Summary
Summarises the changes in the snapshot.
TableMetadata
Fields for version 2 of the table metadata.
TableMetadataBuildResult
Result of modifying or creating a TableMetadata.
TableMetadataBuilder
Manipulating table metadata.
TableProperties
TableProperties that contains the properties of a table.
UnboundPartitionField
Unbound partition field can be built without a schema and later bound to a schema.
UnboundPartitionSpec
Unbound partition spec can be built without a schema and later bound to a schema. They are used to transport schema information as part of the REST specification. The main difference to PartitionSpec is that the field ids are optional.
UnboundPartitionSpecBuilder
Create a new UnboundPartitionSpec
ViewMetadata
Fields for version 1 of the view metadata.
ViewMetadataBuilder
Manipulating view metadata.
ViewRepresentations
A list of view representations.
ViewVersion
A view version represents the definition of a view at a specific point in time.
ViewVersionLog
A log of when each view version became the current version.
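
The `Map` entry above keeps key-value pairs in a separate vector so that hashing is deterministic, which makes insertion order part of the hash. A minimal std-only sketch of that trade-off, using a hypothetical `OrderedMap` type rather than the crate's `Map`:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the crate's `Map`: pairs are stored in a Vec in
/// insertion order, so hashing always visits them in the same order.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct OrderedMap<K, V> {
    pairs: Vec<(K, V)>,
}

impl<K: PartialEq, V> OrderedMap<K, V> {
    pub fn new() -> Self {
        OrderedMap { pairs: Vec::new() }
    }

    /// Overwrites the value of an existing key, or appends a new pair at the end.
    pub fn insert(&mut self, key: K, value: V) {
        if let Some(slot) = self.pairs.iter_mut().find(|(k, _)| *k == key) {
            slot.1 = value;
        } else {
            self.pairs.push((key, value));
        }
    }

    /// Linear lookup; adequate for the small maps a literal typically holds.
    pub fn get(&self, key: &K) -> Option<&V> {
        self.pairs.iter().find(|(k, _)| k == key).map(|(_, v)| v)
    }
}

/// Hashes any `Hash` value with the std SipHash-based `DefaultHasher`.
pub fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Two maps holding the same pairs inserted in a different order compare unequal under the derived `PartialEq` and, in practice, hash differently, which is the behaviour the `Map` description points out.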

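`SortOrder` applies its `SortField`s in list order, each with a `SortDirection` and a `NullOrder`. A std-only sketch of that lexicographic comparison over rows of optional integers, assuming hypothetical `Direction`, `Nulls`, and `Key` types (the crate's `SortField` also carries a source field id and a transform, omitted here) and assuming null placement is applied independently of the direction:

```rust
use std::cmp::Ordering;

/// Hypothetical simplified counterparts of `SortDirection` and `NullOrder`.
#[derive(Clone, Copy)]
pub enum Direction { Asc, Desc }

#[derive(Clone, Copy)]
pub enum Nulls { First, Last }

/// One sort key: which column of the row to read, plus its ordering rules.
#[derive(Clone, Copy)]
pub struct Key {
    pub column: usize,
    pub direction: Direction,
    pub nulls: Nulls,
}

/// Compares two values of one column. In this sketch, nulls are placed
/// first or last regardless of direction; direction only affects non-nulls.
fn compare_value(a: Option<i64>, b: Option<i64>, key: Key) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => match key.nulls {
            Nulls::First => Ordering::Less,
            Nulls::Last => Ordering::Greater,
        },
        (Some(_), None) => match key.nulls {
            Nulls::First => Ordering::Greater,
            Nulls::Last => Ordering::Less,
        },
        (Some(x), Some(y)) => match key.direction {
            Direction::Asc => x.cmp(&y),
            Direction::Desc => y.cmp(&x),
        },
    }
}

/// Sorts rows by the keys in list order: a later key only breaks ties
/// left by all earlier keys.
pub fn sort_rows(rows: &mut [Vec<Option<i64>>], keys: &[Key]) {
    rows.sort_by(|a, b| {
        keys.iter()
            .map(|k| compare_value(a[k.column], b[k.column], *k))
            .find(|ord| *ord != Ordering::Equal)
            .unwrap_or(Ordering::Equal)
    });
}
```
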
Enums

DataContentType
Type of content stored by the data file: data, equality deletes, or position deletes (all v1 files are data files)
DataFileBuilderError
Error type for DataFileBuilder
DataFileFormat
Format of this data.
FormatVersion
Iceberg format version
Literal
Values of Iceberg types
ManifestContentType
The type of files tracked by the manifest, either data or delete files; Data(0) for all v1 manifests
ManifestStatus
Used to track additions and deletions in ManifestEntry.
NullOrder
Describes the order of null values when sorted.
Operation
The operation field is used by some operations, like snapshot expiration, to skip processing certain snapshots.
PrimitiveLiteral
Values of Iceberg primitive types
PrimitiveType
Primitive data types
SnapshotRetention
The snapshot expiration procedure removes snapshots from table metadata and applies the table’s retention policy.
SortDirection
Sort direction in a partition, either ascending or descending
SortOrderBuilderError
Error type for SortOrderBuilder
Transform
Transform is used to transform predicates to partition predicates, in addition to transforming data values.
Type
All data types are either primitives or nested types, which are maps, lists, or structs.
ViewFormatVersion
Iceberg view format version
ViewRepresentation
View definitions can be represented in multiple ways. Representations are documented ways to express a view definition.
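
`Transform` covers functions such as `identity`, `bucket[N]`, `truncate[W]`, and the year/month/day/hour transforms. To illustrate just one of them, here is a std-only sketch of `truncate[W]`, following the integer formula in the Iceberg table specification (`v - (((v % W) + W) % W)`) and its rule of keeping at most `W` code points for strings. The function names are hypothetical and are not the crate's `Transform` API:

```rust
/// Hypothetical helper: Iceberg-style `truncate[W]` for integer values.
/// Rounds down to a multiple of `width`, so negative values move away from
/// zero (for example -1 becomes -10 with width 10).
/// Overflow at the extremes of i64 is not handled in this sketch.
pub fn truncate_i64(value: i64, width: i64) -> i64 {
    assert!(width > 0, "truncation width must be positive");
    // Rust's `%` keeps the sign of the dividend; adding `width` and taking
    // `%` again yields a remainder in 0..width.
    value - (((value % width) + width) % width)
}

/// Hypothetical helper: `truncate[W]` for strings keeps at most `width`
/// Unicode code points (Rust `char`s) and never splits a multi-byte character.
pub fn truncate_str(value: &str, width: usize) -> &str {
    match value.char_indices().nth(width) {
        Some((byte_idx, _)) => &value[..byte_idx],
        None => value,
    }
}
```

For example, `truncate_i64(15, 10)` yields `10`, and `truncate_str("iceberg", 3)` yields `"ice"`.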

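`Datum` pairs a literal with its type and checks the pair once, at construction, so later code can rely on it. A std-only sketch of that pattern, assuming hypothetical `PrimType`, `PrimLit`, and `TypedDatum` types with only three variants (the crate's `PrimitiveType`, `PrimitiveLiteral`, and `Datum` are richer and have their own constructors):

```rust
use std::fmt;

/// Hypothetical, cut-down counterparts of `PrimitiveType` and `PrimitiveLiteral`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrimType { Int, Long, String }

#[derive(Debug, Clone, PartialEq)]
pub enum PrimLit { Int(i32), Long(i64), String(String) }

/// Error returned when a literal does not match the requested type.
#[derive(Debug, PartialEq)]
pub struct TypeMismatch {
    pub expected: PrimType,
    pub got: PrimLit,
}

impl fmt::Display for TypeMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "literal {:?} is not a valid {:?}", self.got, self.expected)
    }
}

/// The pair is validated in `new`; because the fields are private, code
/// outside this module cannot build a mismatched `TypedDatum`.
#[derive(Debug, Clone, PartialEq)]
pub struct TypedDatum {
    ty: PrimType,
    literal: PrimLit,
}

impl TypedDatum {
    pub fn new(ty: PrimType, literal: PrimLit) -> Result<Self, TypeMismatch> {
        let ok = matches!(
            (ty, &literal),
            (PrimType::Int, PrimLit::Int(_))
                | (PrimType::Long, PrimLit::Long(_))
                | (PrimType::String, PrimLit::String(_))
        );
        if ok {
            Ok(TypedDatum { ty, literal })
        } else {
            Err(TypeMismatch { expected: ty, got: literal })
        }
    }

    pub fn data_type(&self) -> PrimType {
        self.ty
    }

    pub fn literal(&self) -> &PrimLit {
        &self.literal
    }
}
```
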
Constants

DEFAULT_SCHEMA_ID
Default schema id.
DEFAULT_SCHEMA_NAME_MAPPING
Property name for name mapping.
INITIAL_ROW_ID
Initial row id for row lineage for new v3 tables and older tables upgrading to v3.
LIST_FIELD_NAME
Field name for list type.
MAIN_BRANCH
The ref name of the main branch of the table.
MAP_KEY_FIELD_NAME
Field name for map type’s key.
MAP_VALUE_FIELD_NAME
Field name for map type’s value.
MIN_FORMAT_VERSION_ROW_LINEAGE
Minimum format version that supports row lineage (v3).
UNASSIGNED_SEQUENCE_NUMBER
Placeholder for sequence number. A field holding this value must be replaced with the actual sequence number before it is written.
UNASSIGNED_SNAPSHOT_ID
Placeholder for snapshot ID. The field with this value must be replaced with the actual snapshot ID before it is committed.
VIEW_PROPERTY_REPLACE_DROP_DIALECT_ALLOWED
Property key that controls whether dialects may be dropped when replacing a view.
VIEW_PROPERTY_REPLACE_DROP_DIALECT_ALLOWED_DEFAULT
Default value for the property key that controls whether dialects may be dropped when replacing a view.
VIEW_PROPERTY_VERSION_HISTORY_SIZE
Property key for the number of history entries to keep.
VIEW_PROPERTY_VERSION_HISTORY_SIZE_DEFAULT
Default value for the property key for the number of history entries to keep.
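
`UNASSIGNED_SEQUENCE_NUMBER` and `UNASSIGNED_SNAPSHOT_ID` mark fields that are filled in only at commit time. A std-only sketch of that replacement step, assuming both placeholders are `-1` (check the crate's constants) and using a hypothetical `Entry` type in place of `ManifestEntry`: entries still carrying a placeholder take the committing snapshot's values, while entries with real values keep them.

```rust
/// Assumed placeholder values; local stand-ins for the crate's constants.
pub const UNASSIGNED_SEQUENCE_NUMBER: i64 = -1;
pub const UNASSIGNED_SNAPSHOT_ID: i64 = -1;

/// Hypothetical, cut-down stand-in for a manifest entry.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub file_path: String,
    pub snapshot_id: i64,
    pub sequence_number: i64,
}

/// Replaces placeholder fields with the values assigned at commit time.
/// Entries that already carry real values are left untouched.
pub fn assign_commit_ids(entries: &mut [Entry], snapshot_id: i64, sequence_number: i64) {
    for entry in entries.iter_mut() {
        if entry.snapshot_id == UNASSIGNED_SNAPSHOT_ID {
            entry.snapshot_id = snapshot_id;
        }
        if entry.sequence_number == UNASSIGNED_SEQUENCE_NUMBER {
            entry.sequence_number = sequence_number;
        }
    }
}
```
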

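`INITIAL_ROW_ID`, `MIN_FORMAT_VERSION_ROW_LINEAGE`, and `SnapshotRowRange` (which carries `first_row_id` and `added_rows_count`) relate to row lineage in format v3. A std-only sketch of how consecutive commits can receive non-overlapping row-id ranges, assuming `INITIAL_ROW_ID` is 0 and using a hypothetical `RowIdAllocator` that tracks the table's next unassigned row id:

```rust
/// Assumed starting row id; a local stand-in for the crate's `INITIAL_ROW_ID`.
pub const INITIAL_ROW_ID: u64 = 0;

/// Hypothetical counterpart of `SnapshotRowRange`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowRange {
    pub first_row_id: u64,
    pub added_rows_count: u64,
}

impl RowRange {
    /// Exclusive upper bound of the ids assigned to this snapshot.
    pub fn end(&self) -> u64 {
        self.first_row_id + self.added_rows_count
    }
}

/// Hypothetical allocator holding the table's next unassigned row id.
pub struct RowIdAllocator {
    next_row_id: u64,
}

impl RowIdAllocator {
    pub fn new() -> Self {
        RowIdAllocator { next_row_id: INITIAL_ROW_ID }
    }

    /// Reserves a contiguous block of ids for one commit's new rows and
    /// advances the counter past it, so ranges never overlap.
    pub fn allocate(&mut self, added_rows_count: u64) -> RowRange {
        let range = RowRange { first_row_id: self.next_row_id, added_rows_count };
        self.next_row_id += added_rows_count;
        range
    }

    pub fn next_row_id(&self) -> u64 {
        self.next_row_id
    }
}
```
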
Traits

PartnerAccessor
Accessor used to get child partner from parent partner.
SchemaVisitor
A post order schema visitor.
SchemaWithPartnerVisitor
A post order schema visitor with partner.
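
`SchemaVisitor` and `visit_schema` walk a schema in post order: children are visited before the struct or list that contains them, so each callback receives its children's results. A minimal std-only sketch of that traversal over a hypothetical `Ty` tree and `Visitor` trait (not the crate's trait, whose methods and signatures differ, and which also handles maps):

```rust
/// Hypothetical simplified type tree: primitives, lists, and structs.
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    Primitive(&'static str),
    List(Box<Ty>),
    /// Fields as (field id, name, type).
    Struct(Vec<(i32, String, Ty)>),
}

/// Hypothetical post-order visitor: each method gets its children's results.
pub trait Visitor {
    type T;
    fn on_primitive(&mut self, name: &str) -> Self::T;
    fn on_list(&mut self, element: Self::T) -> Self::T;
    fn on_struct(&mut self, fields: Vec<Self::T>) -> Self::T;
}

/// Visits children first, then combines their results in the parent callback.
pub fn visit<V: Visitor>(ty: &Ty, visitor: &mut V) -> V::T {
    match ty {
        Ty::Primitive(name) => visitor.on_primitive(name),
        Ty::List(element) => {
            let inner = visit(element, visitor);
            visitor.on_list(inner)
        }
        Ty::Struct(fields) => {
            let mut results: Vec<V::T> = Vec::with_capacity(fields.len());
            for (_id, _name, field_ty) in fields {
                results.push(visit(field_ty, visitor));
            }
            visitor.on_struct(results)
        }
    }
}

/// Example visitor that renders a type as a string, bottom-up.
pub struct TypePrinter;

impl Visitor for TypePrinter {
    type T = String;
    fn on_primitive(&mut self, name: &str) -> String {
        name.to_string()
    }
    fn on_list(&mut self, element: String) -> String {
        format!("list<{element}>")
    }
    fn on_struct(&mut self, fields: Vec<String>) -> String {
        format!("struct<{}>", fields.join(", "))
    }
}
```

Because results flow upward, a visitor never needs to keep its own stack of partially built parents.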

Functions

deserialize_data_file_from_json
Deserialize a DataFile from a JSON string.
prune_columns
Visit a schema and return only the fields selected by the given set of field ids.
read_data_files_from_avro
Parse data files from avro bytes.
serialize_data_file_to_json
Serialize a DataFile to a JSON string.
visit_schema
Visit schema in post order.
visit_schema_with_partner
Visit schema in post order.
visit_struct
Visit struct type in post order.
visit_struct_with_partner
Visit struct type in post order.
write_data_files_to_avro
Convert data files to Avro bytes and write them to the writer. Return the number of bytes written.
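
`prune_columns` keeps only the fields selected by a set of field ids. A std-only sketch of one reasonable policy, assuming a hypothetical `Field` tree and the rule that a selected struct is kept whole while an unselected struct survives only as a container for selected descendants (the crate's exact rules, for example for lists, maps, and full-type selection, may differ):

```rust
use std::collections::HashSet;

/// Hypothetical nested field: an id, a name, and struct children
/// (empty for primitive fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub id: i32,
    pub name: String,
    pub children: Vec<Field>,
}

impl Field {
    pub fn leaf(id: i32, name: &str) -> Self {
        Field { id, name: name.to_string(), children: Vec::new() }
    }

    pub fn group(id: i32, name: &str, children: Vec<Field>) -> Self {
        Field { id, name: name.to_string(), children }
    }
}

/// Returns a pruned copy of `field`, or `None` if nothing in it is selected.
fn prune_field(field: &Field, selected: &HashSet<i32>) -> Option<Field> {
    if selected.contains(&field.id) {
        // A selected field is kept with everything beneath it.
        return Some(field.clone());
    }
    let kept: Vec<Field> = field
        .children
        .iter()
        .filter_map(|child| prune_field(child, selected))
        .collect();
    if kept.is_empty() {
        None
    } else {
        // Keep the parent only as a container for its selected descendants.
        Some(Field { id: field.id, name: field.name.clone(), children: kept })
    }
}

/// Prunes the top-level fields of a schema to the selected ids.
pub fn prune(fields: &[Field], selected: &HashSet<i32>) -> Vec<Field> {
    fields.iter().filter_map(|f| prune_field(f, selected)).collect()
}
```
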

Type Aliases

ManifestEntryRef
Reference to ManifestEntry.
NestedFieldRef
Reference to nested field.
PartitionSpecRef
Reference to PartitionSpec.
SchemaId
Type alias for schema id.
SchemaRef
Reference to Schema.
SnapshotRef
Reference to Snapshot.
SortOrderRef
Reference to SortOrder.
TableMetadataRef
Reference to TableMetadata.
UnboundPartitionSpecRef
Reference to UnboundPartitionSpec.
ViewMetadataRef
Reference to ViewMetadata.
ViewVersionId
Alias for the integer type used for view version ids.
ViewVersionRef
Reference to ViewVersion.
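
Most of these aliases name a shared handle to a spec object, while `SchemaId` and `ViewVersionId` are plain integer aliases. A std-only sketch of why a shared handle is convenient, assuming the reference aliases wrap `Arc` (for example `type SchemaRef = Arc<Schema>`; check each alias's definition) and using a hypothetical `Schema` struct and `Reader` consumer: cloning the handle is cheap, and every clone points at the same immutable value.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the crate's `Schema`.
#[derive(Debug, PartialEq)]
pub struct Schema {
    pub schema_id: i32,
    pub field_names: Vec<String>,
}

/// Assumed shape of the alias: a thread-safe, reference-counted pointer.
pub type SchemaRef = Arc<Schema>;

/// Hypothetical consumer that keeps its own handle to the schema.
pub struct Reader {
    pub schema: SchemaRef,
}

impl Reader {
    pub fn new(schema: SchemaRef) -> Self {
        Reader { schema }
    }
}
```

Cloning a `SchemaRef` only bumps a reference count; the schema itself is never copied.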