pub struct DataFileBuilder { /* private fields */ }
Builder for DataFile.
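Every setter listed on this page takes `&mut self` and returns `&mut Self`, so calls can be chained on a mutable binding. The following is a minimal sketch of that shape only. `SketchBuilder` and `configure` are hypothetical stand-ins, not part of the crate, and the file path is a made-up value.

```rust
// Hypothetical stand-in type: it mirrors only the `&mut self -> &mut Self`
// setter shape shown in the signatures on this page, not the real builder.
#[derive(Clone, Debug, Default)]
struct SketchBuilder {
    file_path: Option<String>,
    record_count: Option<u64>,
    file_size_in_bytes: Option<u64>,
}

impl SketchBuilder {
    fn file_path(&mut self, value: String) -> &mut Self {
        self.file_path = Some(value);
        self
    }

    fn record_count(&mut self, value: u64) -> &mut Self {
        self.record_count = Some(value);
        self
    }

    fn file_size_in_bytes(&mut self, value: u64) -> &mut Self {
        self.file_size_in_bytes = Some(value);
        self
    }
}

// Chains three setters on one mutable binding, then hands the builder back.
fn configure() -> SketchBuilder {
    let mut builder = SketchBuilder::default();
    builder
        .file_path("s3://bucket/data/part-0.parquet".to_string()) // made-up path
        .record_count(1_000)
        .file_size_in_bytes(4_096);
    builder
}
```

Because the setters borrow rather than consume the builder, the binding stays usable after the chain, which is also why a `Clone` implementation is handy for reusing a partially configured builder.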
Implementations§
impl DataFileBuilder
pub fn content(&mut self, value: DataContentType) -> &mut Self
field id: 134
Type of content stored by the data file: data, equality deletes, or position deletes (all v1 files are data files)
pub fn file_path(&mut self, value: String) -> &mut Self
field id: 100
Full URI for the file with FS scheme
pub fn file_format(&mut self, value: DataFileFormat) -> &mut Self
field id: 101
String file format name: avro, orc, parquet, or puffin
pub fn partition(&mut self, value: Struct) -> &mut Self
field id: 102
Partition data tuple, schema based on the partition spec output using partition field ids for the struct field ids
pub fn record_count(&mut self, value: u64) -> &mut Self
field id: 103
Number of records in this file, or the cardinality of a deletion vector
pub fn file_size_in_bytes(&mut self, value: u64) -> &mut Self
field id: 104
Total file size in bytes
pub fn column_sizes(&mut self, value: HashMap<i32, u64>) -> &mut Self
field id: 108 key field id: 117 value field id: 118
Map from column id to the total size on disk of all regions that store the column. Does not include bytes necessary to read other columns, like footers. Leave null for row-oriented formats (Avro)
pub fn value_counts(&mut self, value: HashMap<i32, u64>) -> &mut Self
field id: 109 key field id: 119 value field id: 120
Map from column id to number of values in the column (including null and NaN values)
pub fn null_value_counts(&mut self, value: HashMap<i32, u64>) -> &mut Self
field id: 110 key field id: 121 value field id: 122
Map from column id to number of null values in the column
pub fn nan_value_counts(&mut self, value: HashMap<i32, u64>) -> &mut Self
field id: 137 key field id: 138 value field id: 139
Map from column id to number of NaN values in the column
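Since the value count for a column includes its null and NaN values, the three maps can be cross-checked against each other. The sketch below is one way to do that. The helper name `plain_value_counts` is hypothetical, and treating a missing null or NaN entry as zero is an assumption of this example, not something the page states.

```rust
use std::collections::HashMap;

/// For each column id in `value_counts`, returns how many values are neither
/// null nor NaN. Returns `None` if the null and NaN counts for any column
/// add up to more than its total value count.
/// Assumption: a column missing from a null/NaN map is treated as zero.
fn plain_value_counts(
    value_counts: &HashMap<i32, u64>,
    null_value_counts: &HashMap<i32, u64>,
    nan_value_counts: &HashMap<i32, u64>,
) -> Option<HashMap<i32, u64>> {
    let mut out = HashMap::new();
    for (&id, &total) in value_counts {
        let nulls = null_value_counts.get(&id).copied().unwrap_or(0);
        let nans = nan_value_counts.get(&id).copied().unwrap_or(0);
        // `checked_sub` yields `None` on underflow, i.e. inconsistent counts.
        let rest = total.checked_sub(nulls)?.checked_sub(nans)?;
        out.insert(id, rest);
    }
    Some(out)
}
```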
pub fn lower_bounds(&mut self, value: HashMap<i32, Datum>) -> &mut Self
field id: 125 key field id: 126 value field id: 127
Map from column id to lower bound in the column serialized as binary. Each value must be less than or equal to all non-null, non-NaN values in the column for the file.
Reference:
pub fn upper_bounds(&mut self, value: HashMap<i32, Datum>) -> &mut Self
field id: 128 key field id: 129 value field id: 130
Map from column id to upper bound in the column serialized as binary. Each value must be greater than or equal to all non-null, non-NaN values in the column for the file.
Reference:
pub fn key_metadata(&mut self, value: Option<Vec<u8>>) -> &mut Self
field id: 131
Implementation-specific key metadata for encryption
pub fn split_offsets(&mut self, value: Vec<i64>) -> &mut Self
field id: 132 element field id: 133
Split offsets for the data file. For example, all row group offsets in a Parquet file. Must be sorted ascending
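The ordering requirement can be checked before the offsets are handed to the builder. This is a minimal sketch: `is_sorted_ascending` is a hypothetical helper, and accepting equal neighbouring offsets is an assumption, since the text only says "sorted ascending".

```rust
/// Returns true when every offset is less than or equal to the one after it.
/// Empty and single-element slices count as sorted.
fn is_sorted_ascending(offsets: &[i64]) -> bool {
    offsets.windows(2).all(|pair| pair[0] <= pair[1])
}
```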
pub fn equality_ids(&mut self, value: Vec<i32>) -> &mut Self
field id: 135 element field id: 136
Field ids used to determine row equality in equality delete files. Required when content is EqualityDeletes and should be null otherwise. Fields with ids listed in this column must be present in the delete file
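The rule above ties `equality_ids` to the content type, which can be sketched as a small pre-check. `Content` is a hypothetical stand-in enum, not the crate's `DataContentType`, and `check_equality_ids` is a made-up helper. Treating an empty slice as "null", and rejecting ids on other content types even though the text only says "should", are both assumptions of this example.

```rust
// Hypothetical stand-in for the content type; not the crate's own enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Content {
    Data,
    PositionDeletes,
    EqualityDeletes,
}

/// Checks that equality ids are present for equality deletes and absent
/// otherwise. Assumption: an empty slice stands for "null".
fn check_equality_ids(content: Content, equality_ids: &[i32]) -> Result<(), String> {
    match (content, equality_ids.is_empty()) {
        (Content::EqualityDeletes, true) => {
            Err("equality_ids is required for equality delete files".to_string())
        }
        (Content::EqualityDeletes, false) => Ok(()),
        (_, true) => Ok(()),
        (_, false) => Err("equality_ids should be null for this content type".to_string()),
    }
}
```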
pub fn sort_order_id(&mut self, value: i32) -> &mut Self
field id: 140
ID representing sort order for this file.
If sort order ID is missing or unknown, then the order is assumed to be unsorted. Only data files and equality delete files should be written with a non-null order id. Position deletes are required to be sorted by file and position, not a table order, and should set sort order id to null. Readers must ignore sort order id for position delete files.
pub fn first_row_id(&mut self, value: Option<i64>) -> &mut Self
field id: 142
The _row_id for the first row in the data file. For more details, refer to https://github.com/apache/iceberg/blob/main/format/spec.md#first-row-id-inheritance
pub fn partition_spec_id(&mut self, value: i32) -> &mut Self
This field is not part of the spec. It is stored only in the in-memory representation used during processing.
pub fn referenced_data_file(&mut self, value: Option<String>) -> &mut Self
field id: 143
Fully qualified location (URI with FS scheme) of a data file that all deletes reference.
Position delete metadata can use referenced_data_file when all deletes tracked by the entry are in a single data file. Setting the referenced file is required for deletion vectors.
pub fn content_offset(&mut self, value: Option<i64>) -> &mut Self
field id: 144
The offset in the file where the content starts.
The content_offset and content_size_in_bytes fields are used to reference a specific blob for direct access to a deletion vector. For deletion vectors, these values are required and must exactly match the offset and length stored in the Puffin footer for the deletion vector blob.
pub fn content_size_in_bytes(&mut self, value: Option<i64>) -> &mut Self
field id: 145
The length of a referenced content stored in the file; required if content_offset is present
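For a deletion vector, the rules for `content_offset` and `content_size_in_bytes` can be sketched as a single check. `BlobLocation` and `check_deletion_vector_range` are hypothetical names introduced for illustration. The struct stands in for the offset and length read from the Puffin footer, and how those are obtained is outside this example.

```rust
// Hypothetical stand-in for the offset and length recorded in the Puffin
// footer for one deletion vector blob.
struct BlobLocation {
    offset: i64,
    length: i64,
}

/// Checks the deletion vector rules: both values are present, and they match
/// the blob's footer entry exactly.
fn check_deletion_vector_range(
    content_offset: Option<i64>,
    content_size_in_bytes: Option<i64>,
    blob: &BlobLocation,
) -> Result<(), String> {
    match (content_offset, content_size_in_bytes) {
        (Some(offset), Some(size)) if offset == blob.offset && size == blob.length => Ok(()),
        (Some(_), Some(_)) => {
            Err("content_offset/content_size_in_bytes do not match the footer".to_string())
        }
        (Some(_), None) => {
            Err("content_size_in_bytes is required when content_offset is present".to_string())
        }
        (None, _) => Err("content_offset is required for deletion vectors".to_string()),
    }
}
```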
Trait Implementations§
impl Clone for DataFileBuilder
fn clone(&self) -> DataFileBuilder
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations§
impl Freeze for DataFileBuilder
impl RefUnwindSafe for DataFileBuilder
impl Send for DataFileBuilder
impl Sync for DataFileBuilder
impl Unpin for DataFileBuilder
impl UnwindSafe for DataFileBuilder
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.