Module spec

Spec for Iceberg.

Structs§

BlobMetadata
Represents a blob of metadata, which is a part of a statistics file
DataFile
Data file carries data file path, partition tuple, metrics, …
DataFileBuilder
Builder for DataFile.
Datum
Literal associated with its type. The value and type pair is checked at construction, so the type and value are guaranteed to be consistent when used.
FieldSummary
Field summary for partition field in the spec.
ListType
A list is a collection of values with some element type. The element field has an integer id that is unique in the table schema. Elements can be either optional or required. Element types may be any type.
Manifest
A manifest contains metadata and a list of entries.
ManifestEntry
A manifest is an immutable Avro file that lists data files or delete files, along with each file’s partition data tuple, metrics, and tracking information.
ManifestFile
Entry in a manifest list.
ManifestList
Snapshots are embedded in table metadata, but the list of manifests for a snapshot is stored in a separate manifest list file.
ManifestListWriter
A manifest list writer.
ManifestMetadata
Metadata of a manifest, stored in the key-value metadata of the Avro file.
ManifestWriter
A manifest writer.
ManifestWriterBuilder
The builder used to create a ManifestWriter.
Map
Map is a collection of key-value pairs with a key type and a value type. It is used in Literal::Map. To make it hashable, the order of key-value pairs is stored in a separate vector so that the map can be hashed deterministically. This also means that the order of key-value pairs affects the hash value.
MapType
A map is a collection of key-value pairs with a key type and a value type. The key field and the value field each have an integer id that is unique in the table schema. Map keys are required and map values can be either optional or required. Both map keys and map values may be any type, including nested types.
MappedField
Maps field names to IDs.
MetadataLog
Encodes changes to the previous metadata files for the table
NameMapping
Iceberg fallback field name to ID mapping.
NestedField
A struct is a tuple of typed values. Each field in the tuple is named and has an integer id that is unique in the table schema. Each field can be either optional or required, meaning that values can (or cannot) be null. Fields may be any type. Fields may have an optional comment or doc string. Fields can have default values.
PartitionField
Partition fields capture the transform from table data to partition values.
PartitionSpec
Partition spec that defines how to produce a tuple of partition values from a record.
PartitionSpecBuilder
Create valid partition specs for a given schema.
PartitionStatisticsFile
Statistics file for a partition
RawLiteral
Raw literal representation used for serde. Serialization follows the form expected by the Avro serializer.
Schema
Defines schema in iceberg.
SchemaBuilder
Schema builder.
Snapshot
A snapshot represents the state of a table at some time and is used to access the complete set of data files in the table.
SnapshotLog
A log of when each snapshot was made.
SnapshotReference
Iceberg tables keep track of branches and tags using snapshot references.
SnapshotSummaryCollector
SnapshotSummaryCollector collects and aggregates snapshot update metrics. It gathers metrics about added or removed data files and manifests, and tracks partition-specific updates.
SortField
Entry for every column that is to be sorted
SortOrder
A sort order is defined by a sort order id and a list of sort fields. The order of the sort fields within the list defines the order in which the sort is applied to the data.
SortOrderBuilder
Builder for SortOrder.
SqlViewRepresentation
The SQL representation stores the view definition as a SQL SELECT, with metadata such as the SQL dialect.
StatisticsFile
Represents a statistics file
Struct
The partition struct stores the tuple of partition values for each file. Its type is derived from the partition fields of the partition spec used to write the manifest file. In v2, the partition struct’s field ids must match the ids from the partition spec.
StructType
DataType for a specific struct
Summary
Summarises the changes in the snapshot.
TableMetadata
Fields for version 2 of the table metadata.
TableMetadataBuildResult
Result of modifying or creating a TableMetadata.
TableMetadataBuilder
Manipulating table metadata.
UnboundPartitionField
Unbound partition field can be built without a schema and later bound to a schema.
UnboundPartitionSpec
Unbound partition spec can be built without a schema and later bound to a schema. They are used to transport schema information as part of the REST specification. The main difference to PartitionSpec is that the field ids are optional.
UnboundPartitionSpecBuilder
Create a new UnboundPartitionSpec
ViewMetadata
Fields for version 1 of the view metadata.
ViewMetadataBuilder
Manipulating view metadata.
ViewRepresentations
A list of view representations.
ViewVersion
A view version represents the definition of a view at a specific point in time.
ViewVersionLog
A log of when each view version was made.
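
The Map entry above describes keeping the key-value order in a separate vector so that hashing is deterministic. The following is a minimal sketch of that idea using only the standard library. `OrderedMap`, its fields, and `hash_of` are hypothetical names chosen for illustration; they are not the crate's `Map` type or its API, and the key and value types are simplified to `String` and `Option<i64>`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for an insertion-ordered, hashable map.
/// This is an illustration only, not the crate's `Map`.
#[derive(Debug, Default)]
pub struct OrderedMap {
    /// Maps each key to its position in `pairs` for fast lookup.
    index: HashMap<String, usize>,
    /// Key-value pairs in insertion order; this vector drives hashing.
    pairs: Vec<(String, Option<i64>)>,
}

impl OrderedMap {
    pub fn new() -> Self {
        Self::default()
    }

    /// Inserts a pair and returns the previous value if the key existed.
    /// An existing key keeps its original position.
    pub fn insert(&mut self, key: &str, value: Option<i64>) -> Option<Option<i64>> {
        if let Some(&pos) = self.index.get(key) {
            Some(std::mem::replace(&mut self.pairs[pos].1, value))
        } else {
            self.index.insert(key.to_string(), self.pairs.len());
            self.pairs.push((key.to_string(), value));
            None
        }
    }

    pub fn get(&self, key: &str) -> Option<&Option<i64>> {
        self.index.get(key).map(|&pos| &self.pairs[pos].1)
    }
}

impl Hash for OrderedMap {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash only the ordered vector: HashMap iteration order is not stable.
        self.pairs.hash(state);
    }
}

/// Convenience helper that hashes a map with the standard default hasher.
pub fn hash_of(map: &OrderedMap) -> u64 {
    let mut hasher = DefaultHasher::new();
    map.hash(&mut hasher);
    hasher.finish()
}
```

In this sketch, two maps built with the same insertion order hash identically, while the same pairs inserted in a different order will generally produce a different hash. That is the trade-off the Map description points out.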

Enums§

DataContentType
Type of content stored by the data file: data, equality deletes, or position deletes (all v1 files are data files)
DataFileBuilderError
Error type for DataFileBuilder
DataFileFormat
Format of this data.
FormatVersion
Iceberg format version
Literal
Values present in iceberg type
ManifestContentType
The type of files tracked by the manifest, either data or delete files; Data(0) for all v1 manifests
ManifestStatus
Used to track additions and deletions in ManifestEntry.
NullOrder
Describes the order of null values when sorted.
Operation
The operation field is used by some operations, like snapshot expiration, to skip processing certain snapshots.
PrimitiveLiteral
Values present in iceberg type
PrimitiveType
Primitive data types
SnapshotRetention
The snapshot expiration procedure removes snapshots from table metadata and applies the table’s retention policy.
SortDirection
Sort direction in a partition, either ascending or descending
SortOrderBuilderError
Error type for SortOrderBuilder
Transform
Transform is used to transform predicates to partition predicates, in addition to transforming data values.
Type
All data types are either primitives or nested types, which are maps, lists, or structs.
ViewFormatVersion
Iceberg view format version
ViewRepresentation
View definitions can be represented in multiple ways. Representations are documented ways to express a view definition.
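
SortDirection and NullOrder above jointly determine how values compare during a sort. The sketch below shows one way such a comparison could work on nullable integers. `Direction`, `Nulls`, `compare`, and `sort_values` are hypothetical names for illustration, not the crate's enums or functions, and the sketch assumes that null placement is applied independently of the sort direction.

```rust
use std::cmp::Ordering;

/// Hypothetical counterpart of a sort direction.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Direction {
    Ascending,
    Descending,
}

/// Hypothetical counterpart of a null ordering.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Nulls {
    First,
    Last,
}

/// Compares two nullable values. Null placement is decided first and is
/// not affected by the direction (an assumption of this sketch).
pub fn compare(a: Option<i64>, b: Option<i64>, dir: Direction, nulls: Nulls) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => {
            if nulls == Nulls::First { Ordering::Less } else { Ordering::Greater }
        }
        (Some(_), None) => {
            if nulls == Nulls::First { Ordering::Greater } else { Ordering::Less }
        }
        (Some(x), Some(y)) => match dir {
            Direction::Ascending => x.cmp(&y),
            Direction::Descending => y.cmp(&x),
        },
    }
}

/// Sorts a slice in place using the comparison above.
pub fn sort_values(values: &mut [Option<i64>], dir: Direction, nulls: Nulls) {
    values.sort_by(|a, b| compare(*a, *b, dir, nulls));
}
```

A multi-field sort order could chain several such comparisons in list order, falling through to the next field only when the current one compares equal.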

Constants§

DEFAULT_SCHEMA_NAME_MAPPING
Property name for name mapping.
LIST_FIELD_NAME
Field name for list type.
MAIN_BRANCH
The ref name of the main branch of the table.
MAP_KEY_FIELD_NAME
Field name for map type’s key.
MAP_VALUE_FIELD_NAME
Field name for map type’s value.
PROPERTY_CURRENT_SCHEMA
Reserved table property for the JSON representation of current schema.
PROPERTY_CURRENT_SNAPSHOT_ID
Reserved table property for current snapshot id.
PROPERTY_CURRENT_SNAPSHOT_SUMMARY
Reserved table property for current snapshot summary.
PROPERTY_CURRENT_SNAPSHOT_TIMESTAMP
Reserved table property for current snapshot timestamp.
PROPERTY_DEFAULT_PARTITION_SPEC
Reserved table property for the JSON representation of the current (default) partition spec.
PROPERTY_DEFAULT_SORT_ORDER
Reserved table property for the JSON representation of the current (default) sort order.
PROPERTY_FORMAT_VERSION
Reserved table property for table format version.
PROPERTY_METADATA_PREVIOUS_VERSIONS_MAX
Property key for max number of previous versions to keep.
PROPERTY_METADATA_PREVIOUS_VERSIONS_MAX_DEFAULT
Default value for max number of previous versions to keep.
PROPERTY_SNAPSHOT_COUNT
Reserved table property for the total number of snapshots.
PROPERTY_UUID
Reserved table property for table UUID.
PROPERTY_WRITE_PARTITION_SUMMARY_LIMIT
Property key for max number of partitions to keep summary stats for.
PROPERTY_WRITE_PARTITION_SUMMARY_LIMIT_DEFAULT
Default value for the max number of partitions to keep summary stats for.
RESERVED_PROPERTIES
Reserved Iceberg table properties list.
UNASSIGNED_SEQUENCE_NUMBER
Placeholder for sequence number. A field with this value must be replaced with the actual sequence number before it is written.
UNASSIGNED_SNAPSHOT_ID
Placeholder for snapshot ID. A field with this value must be replaced with the actual snapshot ID before it is committed.
VIEW_PROPERTY_REPLACE_DROP_DIALECT_ALLOWED
Property key that controls whether dialects may be dropped when replacing a view.
VIEW_PROPERTY_REPLACE_DROP_DIALECT_ALLOWED_DEFAULT
Default value for the property that controls whether dialects may be dropped when replacing a view.
VIEW_PROPERTY_VERSION_HISTORY_SIZE
Property key for the number of history entries to keep.
VIEW_PROPERTY_VERSION_HISTORY_SIZE_DEFAULT
Default value for the property key for the number of history entries to keep.

Traits§

PartnerAccessor
Accessor used to get child partner from parent partner.
SchemaVisitor
A post order schema visitor.
SchemaWithPartnerVisitor
A post order schema visitor with partner.
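
To illustrate what "post order" means for the visitor traits above, here is a small self-contained sketch: children are visited first, and their results are passed to the parent's callback. `Ty`, `PostOrderVisitor`, `visit`, and `Render` are hypothetical, simplified names; they do not reproduce the signatures of `SchemaVisitor` or `visit_schema`.

```rust
/// Hypothetical, simplified type tree (not the crate's `Type`).
pub enum Ty {
    Primitive(&'static str),
    List(Box<Ty>),
    Struct(Vec<(&'static str, Ty)>),
}

/// Hypothetical post-order visitor: each callback receives the
/// already-computed results of its children.
pub trait PostOrderVisitor {
    type Out;
    fn primitive(&mut self, name: &str) -> Self::Out;
    fn list(&mut self, element: Self::Out) -> Self::Out;
    fn structure(&mut self, fields: Vec<(&str, Self::Out)>) -> Self::Out;
}

/// Walks the tree, visiting children before their parent.
pub fn visit<V: PostOrderVisitor>(ty: &Ty, v: &mut V) -> V::Out {
    match ty {
        Ty::Primitive(name) => v.primitive(name),
        Ty::List(elem) => {
            let element = visit(elem, v);
            v.list(element)
        }
        Ty::Struct(fields) => {
            let results: Vec<(&str, V::Out)> =
                fields.iter().map(|(n, t)| (*n, visit(t, v))).collect();
            v.structure(results)
        }
    }
}

/// Example visitor that renders the tree as a string.
pub struct Render;

impl PostOrderVisitor for Render {
    type Out = String;

    fn primitive(&mut self, name: &str) -> String {
        name.to_string()
    }

    fn list(&mut self, element: String) -> String {
        format!("list<{element}>")
    }

    fn structure(&mut self, fields: Vec<(&str, String)>) -> String {
        let parts: Vec<String> = fields.iter().map(|(n, t)| format!("{n}: {t}")).collect();
        format!("struct<{}>", parts.join(", "))
    }
}
```

Because each callback only sees finished child results, a visitor like `Render` needs no explicit stack or recursion of its own.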

Functions§

prune_columns
Visits a schema and returns only the fields selected by the id set.
read_data_files_from_avro
Parse data files from Avro bytes.
visit_schema
Visit schema in post order.
visit_schema_with_partner
Visit schema in post order.
visit_struct
Visit struct type in post order.
visit_struct_with_partner
Visit struct type in post order.
write_data_files_to_avro
Convert data files to Avro bytes and write them to the writer. Returns the number of bytes written.
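
The idea behind selecting fields by an id set, as prune_columns is described above, can be sketched on a toy field tree. `Field`, `leaf`, and `prune` are hypothetical names, not the crate's types or the real `prune_columns` signature. The sketch assumes that selecting a field keeps its whole subtree and that an unselected parent is kept only when one of its descendants is selected; the crate's exact semantics may differ.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified field node (not the crate's `NestedField`).
#[derive(Clone, Debug, PartialEq)]
pub struct Field {
    pub id: i32,
    pub name: String,
    pub children: Vec<Field>,
}

/// Builds a field without children.
pub fn leaf(id: i32, name: &str) -> Field {
    Field { id, name: name.to_string(), children: Vec::new() }
}

/// Returns only the fields whose id is selected, plus any ancestors
/// needed to reach a selected descendant.
pub fn prune(fields: &[Field], selected: &HashSet<i32>) -> Vec<Field> {
    fields
        .iter()
        .filter_map(|f| {
            if selected.contains(&f.id) {
                // Selected directly: keep the entire subtree.
                Some(f.clone())
            } else {
                // Not selected: keep only if some descendant survives.
                let kept = prune(&f.children, selected);
                if kept.is_empty() {
                    None
                } else {
                    Some(Field { id: f.id, name: f.name.clone(), children: kept })
                }
            }
        })
        .collect()
}
```

For example, pruning a schema with fields 1, 2 (containing 3 and 4), and 5 down to the id set {1, 4} would keep field 1 and a copy of field 2 that contains only field 4.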

Type Aliases§

ManifestEntryRef
Reference to ManifestEntry.
NestedFieldRef
Reference to nested field.
PartitionSpecRef
Reference to PartitionSpec.
SchemaId
Type alias for schema id.
SchemaRef
Reference to Schema.
SnapshotRef
Reference to Snapshot.
SortOrderRef
Reference to SortOrder.
TableMetadataRef
Reference to TableMetadata.
UnboundPartitionSpecRef
Reference to UnboundPartitionSpec.
ViewMetadataRef
Reference to ViewMetadata.
ViewVersionId
Alias for the integer type used for view version ids.
ViewVersionRef
Reference to ViewVersion.
