Struct DataFileBuilder

impl DataFileBuilder

pub fn content(&mut self, value: DataContentType) -> &mut Self

field id: 134

Type of content stored by the data file: data, equality deletes, or position deletes (all v1 files are data files)

pub fn file_path(&mut self, value: String) -> &mut Self

field id: 100

Full URI for the file with FS scheme

pub fn file_format(&mut self, value: DataFileFormat) -> &mut Self

field id: 101

String file format name, avro, orc, parquet, or puffin

pub fn partition(&mut self, value: Struct) -> &mut Self

field id: 102

Partition data tuple, schema based on the partition spec output using partition field ids for the struct field ids

pub fn record_count(&mut self, value: u64) -> &mut Self

field id: 103

Number of records in this file, or the cardinality of a deletion vector

pub fn file_size_in_bytes(&mut self, value: u64) -> &mut Self

field id: 104

Total file size in bytes

pub fn column_sizes(&mut self, value: HashMap<i32, u64>) -> &mut Self

field id: 108 key field id: 117 value field id: 118

Map from column id to the total size on disk of all regions that store the column. Does not include bytes necessary to read other columns, like footers. Leave null for row-oriented formats (Avro)

pub fn value_counts(&mut self, value: HashMap<i32, u64>) -> &mut Self

field id: 109 key field id: 119 value field id: 120

Map from column id to number of values in the column (including null and NaN values)

pub fn null_value_counts(&mut self, value: HashMap<i32, u64>) -> &mut Self

field id: 110 key field id: 121 value field id: 122

Map from column id to number of null values in the column

pub fn nan_value_counts(&mut self, value: HashMap<i32, u64>) -> &mut Self

field id: 137 key field id: 138 value field id: 139

Map from column id to number of NaN values in the column
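The three count maps above are related: the value count covers every slot in the column, including nulls and NaNs, while the other two count subsets of it. The following is a minimal sketch of that relationship, assuming float columns modelled as `Vec<Option<f64>>` with `None` standing in for null. `CountMaps` and `count_columns` are hypothetical names for illustration and are not part of the crate.

```rust
use std::collections::HashMap;

/// Hypothetical container for the three maps the setters accept.
#[derive(Debug, Default)]
pub struct CountMaps {
    pub value_counts: HashMap<i32, u64>,
    pub null_value_counts: HashMap<i32, u64>,
    pub nan_value_counts: HashMap<i32, u64>,
}

/// Computes per-column counts, keyed by column id.
/// Each column is modelled as a list of optional floats (`None` = null).
pub fn count_columns(columns: &[(i32, Vec<Option<f64>>)]) -> CountMaps {
    let mut maps = CountMaps::default();
    for (column_id, values) in columns {
        let nulls = values.iter().filter(|v| v.is_none()).count() as u64;
        let nans = values.iter().flatten().filter(|x| x.is_nan()).count() as u64;
        // The value count includes null and NaN values.
        maps.value_counts.insert(*column_id, values.len() as u64);
        maps.null_value_counts.insert(*column_id, nulls);
        maps.nan_value_counts.insert(*column_id, nans);
    }
    maps
}
```

The resulting maps have the `HashMap<i32, u64>` shape that `value_counts`, `null_value_counts`, and `nan_value_counts` take.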

pub fn lower_bounds(&mut self, value: HashMap<i32, Datum>) -> &mut Self

field id: 125 key field id: 126 value field id: 127

Map from column id to lower bound in the column serialized as binary. Each value must be less than or equal to all non-null, non-NaN values in the column for the file.

Reference:

Binary single-value serialization

pub fn upper_bounds(&mut self, value: HashMap<i32, Datum>) -> &mut Self

field id: 128 key field id: 129 value field id: 130

Map from column id to upper bound in the column serialized as binary. Each value must be greater than or equal to all non-null, non-NaN values in the column for the file.

Reference:

Binary single-value serialization
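The lower and upper bound rules both exclude null and NaN values. The following is a rough sketch of how such bounds could be derived for one float column, assuming plain `f64` values in place of the crate's `Datum` and `None` in place of null. `float_bounds` is a hypothetical helper, not a crate function.

```rust
/// Returns `(lower, upper)` over the non-null, non-NaN values,
/// or `None` when the column has no such values.
pub fn float_bounds(values: &[Option<f64>]) -> Option<(f64, f64)> {
    let mut bounds: Option<(f64, f64)> = None;
    // `flatten` skips nulls; the filter skips NaNs.
    for x in values.iter().flatten().copied().filter(|x| !x.is_nan()) {
        bounds = Some(match bounds {
            None => (x, x),
            Some((lo, hi)) => (lo.min(x), hi.max(x)),
        });
    }
    bounds
}
```

A column that holds only nulls and NaNs produces no bounds in this sketch, so the caller would leave that column id out of both maps.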

pub fn key_metadata(&mut self, value: Option<Vec<u8>>) -> &mut Self

field id: 131

Implementation-specific key metadata for encryption

pub fn split_offsets(&mut self, value: Vec<i64>) -> &mut Self

field id: 132 element field id: 133

Split offsets for the data file. For example, all row group offsets in a Parquet file. Must be sorted ascending
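Because split offsets must be sorted ascending, a writer that collects them out of order may want to check or normalise them before calling the setter. A small sketch with two hypothetical helpers follows; whether duplicate offsets are acceptable is not stated above, so tolerating them here is an assumption.

```rust
/// Returns true when `offsets` is in ascending order.
/// Assumption: equal neighbours are tolerated.
pub fn is_sorted_ascending(offsets: &[i64]) -> bool {
    offsets.windows(2).all(|pair| pair[0] <= pair[1])
}

/// Sorts offsets gathered in arbitrary order, for example from row group metadata.
pub fn sorted_split_offsets(mut offsets: Vec<i64>) -> Vec<i64> {
    offsets.sort_unstable();
    offsets
}
```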

pub fn equality_ids(&mut self, value: Vec<i32>) -> &mut Self

field id: 135 element field id: 136

Field ids used to determine row equality in equality delete files. Required when content is EqualityDeletes and should be null otherwise. Fields with ids listed in this column must be present in the delete file
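The rule tying equality ids to the content type can be checked before building. The sketch below uses a local stand-in enum, `ContentKind`, rather than the crate's `DataContentType`, and treats an empty slice as the "null" case, which is an assumption based on the setter taking a `Vec<i32>`. `check_equality_ids` is a hypothetical helper.

```rust
/// Local stand-in for the content type; not the crate's enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContentKind {
    Data,
    PositionDeletes,
    EqualityDeletes,
}

/// Checks that equality ids are present exactly when the file holds equality deletes.
/// Assumption: an empty slice models "null".
pub fn check_equality_ids(content: ContentKind, equality_ids: &[i32]) -> Result<(), String> {
    match (content, equality_ids.is_empty()) {
        (ContentKind::EqualityDeletes, false) => Ok(()),
        (ContentKind::EqualityDeletes, true) => {
            Err("equality delete files require at least one equality id".to_string())
        }
        (_, true) => Ok(()),
        (other, false) => Err(format!("equality ids should be unset for {:?} files", other)),
    }
}
```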

pub fn sort_order_id(&mut self, value: i32) -> &mut Self

field id: 140

ID representing sort order for this file.

If sort order ID is missing or unknown, then the order is assumed to be unsorted. Only data files and equality delete files should be written with a non-null order id. Position deletes are required to be sorted by file and position, not a table order, and should set sort order id to null. Readers must ignore sort order id for position delete files.

pub fn first_row_id(&mut self, value: Option<i64>) -> &mut Self

field id: 142

The _row_id for the first row in the data file. For more details, refer to https://github.com/apache/iceberg/blob/main/format/spec.md#first-row-id-inheritance

pub fn partition_spec_id(&mut self, value: i32) -> &mut Self

This field is not part of the spec. It is stored only in the in-memory representation used during processing.

pub fn referenced_data_file(&mut self, value: Option<String>) -> &mut Self

field id: 143

Fully qualified location (URI with FS scheme) of a data file that all deletes reference. Position delete metadata can use referenced_data_file when all deletes tracked by the entry are in a single data file. Setting the referenced file is required for deletion vectors.

pub fn content_offset(&mut self, value: Option<i64>) -> &mut Self

field id: 144

The offset in the file where the content starts. The content_offset and content_size_in_bytes fields are used to reference a specific blob for direct access to a deletion vector. For deletion vectors, these values are required and must exactly match the offset and length stored in the Puffin footer for the deletion vector blob.

pub fn content_size_in_bytes(&mut self, value: Option<i64>) -> &mut Self

field id: 145

The length of a referenced content stored in the file; required if content_offset is present
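The pairing rule for `content_offset` and `content_size_in_bytes` can be expressed as a small check that turns the two optional values into a byte range. This is a sketch under stated assumptions: `blob_range` is a hypothetical helper, and rejecting a size without an offset, as well as negative values, goes beyond what the descriptions above state.

```rust
/// Combines the two optional fields into a byte range within the file.
/// Returns `Ok(None)` when neither is set.
pub fn blob_range(
    content_offset: Option<i64>,
    content_size_in_bytes: Option<i64>,
) -> Result<Option<std::ops::Range<i64>>, String> {
    match (content_offset, content_size_in_bytes) {
        (None, None) => Ok(None),
        // The size is required whenever the offset is present.
        (Some(_), None) => {
            Err("content_size_in_bytes is required when content_offset is set".to_string())
        }
        // Assumption: a size without an offset is also rejected.
        (None, Some(_)) => {
            Err("content_size_in_bytes is set but content_offset is missing".to_string())
        }
        (Some(offset), Some(size)) => {
            // Assumption: negative values are treated as invalid.
            if offset < 0 || size < 0 {
                return Err("offset and size must be non-negative".to_string());
            }
            let end = offset
                .checked_add(size)
                .ok_or_else(|| "offset + size overflows i64".to_string())?;
            Ok(Some(offset..end))
        }
    }
}
```

For a deletion vector, the resulting range would still need to be compared against the offset and length recorded in the Puffin footer; that comparison is outside this sketch.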

pub fn build(&self) -> Result<DataFile, DataFileBuilderError>

Builds a new DataFile.

Errors

If a required field has not been initialized.
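The setters above take `&mut self` and return `&mut Self`, so calls can be chained, and `build` borrows the builder and fails when a required field was never set. The following self-contained sketch mirrors that shape with a much smaller stand-in type. `MiniFile`, `MiniFileBuilder`, and `MiniBuildError` are hypothetical and only illustrate the pattern; they are not the crate's types, and which fields the real builder treats as required is not shown here.

```rust
/// Hypothetical, reduced stand-in for the built value.
#[derive(Debug, Clone, PartialEq)]
pub struct MiniFile {
    pub file_path: String,
    pub record_count: u64,
}

/// Hypothetical error naming the field that was never set.
#[derive(Debug, PartialEq)]
pub enum MiniBuildError {
    UninitializedField(&'static str),
}

/// Builder with the same setter and `build` shapes as listed above.
#[derive(Debug, Clone, Default)]
pub struct MiniFileBuilder {
    file_path: Option<String>,
    record_count: Option<u64>,
}

impl MiniFileBuilder {
    pub fn file_path(&mut self, value: String) -> &mut Self {
        self.file_path = Some(value);
        self
    }

    pub fn record_count(&mut self, value: u64) -> &mut Self {
        self.record_count = Some(value);
        self
    }

    /// Borrows the builder, so it can be reused or cloned after building.
    pub fn build(&self) -> Result<MiniFile, MiniBuildError> {
        Ok(MiniFile {
            file_path: self
                .file_path
                .clone()
                .ok_or(MiniBuildError::UninitializedField("file_path"))?,
            record_count: self
                .record_count
                .ok_or(MiniBuildError::UninitializedField("record_count"))?,
        })
    }
}
```

Because `build` takes `&self`, one builder can produce several values: set the shared fields once, then change only the fields that differ between files before each call to `build`.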

Trait Implementations

impl Clone for DataFileBuilder

fn clone(&self) -> DataFileBuilder

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Default for DataFileBuilder

fn default() -> Self

Returns the “default value” for a type.

Auto Trait Implementations

impl UnwindSafe for DataFileBuilder

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dst.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.