Class KeyValueGroupedDataset<K,V> 
- All Implemented Interfaces:
- Serializable
A Dataset that has been logically grouped by a user-specified grouping key. Users should not
 construct a KeyValueGroupedDataset directly, but should instead call groupByKey on
 an existing Dataset.
 - Since:
- 2.0.0
Method Summary
Each entry gives the modifier and type (where available), the method, and its description.
- agg(TypedColumn<V, U1> col1)
  Computes the given aggregation, returning a Dataset of tuples for each unique key and the result of computing this aggregation over all elements in the group.
- agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2)
  Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3)
  Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4)
  Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5)
  Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6)
  Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6, TypedColumn<V, U7> col7)
  Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6, TypedColumn<V, U7> col7, TypedColumn<V, U8> col8)
  Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- <U,R> Dataset<R> cogroup(KeyValueGroupedDataset<K, U> other, CoGroupFunction<K, V, U, R> f, Encoder<R> encoder)
- <U,R> Dataset<R> cogroup(KeyValueGroupedDataset<K, U> other, scala.Function3<K, scala.collection.Iterator<V>, scala.collection.Iterator<U>, scala.collection.IterableOnce<R>> f, Encoder<R> evidence$30)
- <U,R> Dataset<R> cogroupSorted(KeyValueGroupedDataset<K, U> other, Column[] thisSortExprs, Column[] otherSortExprs, CoGroupFunction<K, V, U, R> f, Encoder<R> encoder)
- <U,R> Dataset<R> cogroupSorted(KeyValueGroupedDataset<K, U> other, scala.collection.immutable.Seq<Column> thisSortExprs, scala.collection.immutable.Seq<Column> otherSortExprs, scala.Function3<K, scala.collection.Iterator<V>, scala.collection.Iterator<U>, scala.collection.IterableOnce<R>> f, Encoder<R> evidence$21)
- count()
  Returns a Dataset that contains a tuple with each key and the number of items present for that key.
- <U> Dataset<U> flatMapGroups(FlatMapGroupsFunction<K, V, U> f, Encoder<U> encoder)
  (Java-specific) Applies the given function to each group of data.
- <U> Dataset<U> flatMapGroups(scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>> f, Encoder<U> evidence$22)
  (Scala-specific) Applies the given function to each group of data.
- <S,U> Dataset<U> flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K, V, S, U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf)
  (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.
- <S,U> Dataset<U> flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K, V, S, U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState)
- <S,U> Dataset<U> flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, scala.collection.Iterator<U>> func, Encoder<S> evidence$12, Encoder<U> evidence$13)
- <S,U> Dataset<U> flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, scala.collection.Iterator<U>> func, Encoder<S> evidence$10, Encoder<U> evidence$11)
  (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.
- <U> Dataset<U> flatMapSortedGroups(Column[] SortExprs, FlatMapGroupsFunction<K, V, U> f, Encoder<U> encoder)
  (Java-specific) Applies the given function to each group of data.
- <U> Dataset<U> flatMapSortedGroups(scala.collection.immutable.Seq<Column> sortExprs, scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>> f, Encoder<U> evidence$3)
  (Scala-specific) Applies the given function to each group of data.
- <L> KeyValueGroupedDataset<L,V> keyAs
  Returns a new KeyValueGroupedDataset where the type of the key has been mapped to the specified type.
- keys()
  Returns a Dataset that contains each unique key.
- <U> Dataset<U> mapGroups(MapGroupsFunction<K, V, U> f, Encoder<U> encoder)
  (Java-specific) Applies the given function to each group of data.
- <U> Dataset<U> mapGroups(scala.Function2<K, scala.collection.Iterator<V>, U> f, Encoder<U> evidence$23)
  (Scala-specific) Applies the given function to each group of data.
- <S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder)
  (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.
- <S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf)
  (Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.
- <S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState)
- <S,U> Dataset<U> mapGroupsWithState(GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$8, Encoder<U> evidence$9)
- <S,U> Dataset<U> mapGroupsWithState(GroupStateTimeout timeoutConf, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$6, Encoder<U> evidence$7)
  (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.
- <S,U> Dataset<U> mapGroupsWithState(scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$4, Encoder<U> evidence$5)
  (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state.
- <W> KeyValueGroupedDataset<K,W> mapValues(MapFunction<V, W> func, Encoder<W> encoder)
  Returns a new KeyValueGroupedDataset where the given function func has been applied to the data.
- <W> KeyValueGroupedDataset<K,W> mapValues
  Returns a new KeyValueGroupedDataset where the given function func has been applied to the data.
- org.apache.spark.sql.execution.QueryExecution queryExecution()
- reduceGroups
  (Java-specific) Reduces the elements of each group of data using the specified binary function.
- reduceGroups(scala.Function2<V, V, V> f)
  (Scala-specific) Reduces the elements of each group of data using the specified binary function.
- toString()
Methods inherited from class org.apache.spark.sql.api.KeyValueGroupedDataset
cogroup, cogroup, cogroupSorted, cogroupSorted, flatMapGroupsWithState, flatMapGroupsWithState, mapGroupsWithState, mapGroupsWithState
- 
Method Details
- 
agg
Description copied from class: KeyValueGroupedDataset
Computes the given aggregation, returning a Dataset of tuples for each unique key and the result of computing this aggregation over all elements in the group.
- Overrides:
- agg in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- col1 - (undocumented)
- Returns:
- (undocumented)
- 
agg
Description copied from class: KeyValueGroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- Overrides:
- agg in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- col1 - (undocumented)
- col2 - (undocumented)
- Returns:
- (undocumented)
- 
agg
public <U1,U2,U3> Dataset<scala.Tuple4<K,U1,U2,U3>> agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3)
Description copied from class: KeyValueGroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- Overrides:
- agg in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- col1 - (undocumented)
- col2 - (undocumented)
- col3 - (undocumented)
- Returns:
- (undocumented)
- 
agg
public <U1,U2,U3,U4> Dataset<scala.Tuple5<K,U1,U2,U3,U4>> agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4)
Description copied from class: KeyValueGroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- Overrides:
- agg in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- col1 - (undocumented)
- col2 - (undocumented)
- col3 - (undocumented)
- col4 - (undocumented)
- Returns:
- (undocumented)
- 
agg
public <U1,U2,U3,U4,U5> Dataset<scala.Tuple6<K,U1,U2,U3,U4,U5>> agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5)
Description copied from class: KeyValueGroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- Overrides:
- agg in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- col1 - (undocumented)
- col2 - (undocumented)
- col3 - (undocumented)
- col4 - (undocumented)
- col5 - (undocumented)
- Returns:
- (undocumented)
- 
agg
public <U1,U2,U3,U4,U5,U6> Dataset<scala.Tuple7<K,U1,U2,U3,U4,U5,U6>> agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6)
Description copied from class: KeyValueGroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- Overrides:
- agg in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- col1 - (undocumented)
- col2 - (undocumented)
- col3 - (undocumented)
- col4 - (undocumented)
- col5 - (undocumented)
- col6 - (undocumented)
- Returns:
- (undocumented)
- 
agg
public <U1,U2,U3,U4,U5,U6,U7> Dataset<scala.Tuple8<K,U1,U2,U3,U4,U5,U6,U7>> agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6, TypedColumn<V, U7> col7)
Description copied from class: KeyValueGroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- Overrides:
- agg in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- col1 - (undocumented)
- col2 - (undocumented)
- col3 - (undocumented)
- col4 - (undocumented)
- col5 - (undocumented)
- col6 - (undocumented)
- col7 - (undocumented)
- Returns:
- (undocumented)
- 
agg
public <U1,U2,U3,U4,U5,U6,U7,U8> Dataset<scala.Tuple9<K,U1,U2,U3,U4,U5,U6,U7,U8>> agg(TypedColumn<V, U1> col1, TypedColumn<V, U2> col2, TypedColumn<V, U3> col3, TypedColumn<V, U4> col4, TypedColumn<V, U5> col5, TypedColumn<V, U6> col6, TypedColumn<V, U7> col7, TypedColumn<V, U8> col8)
Description copied from class: KeyValueGroupedDataset
Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
- Overrides:
- agg in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- col1 - (undocumented)
- col2 - (undocumented)
- col3 - (undocumented)
- col4 - (undocumented)
- col5 - (undocumented)
- col6 - (undocumented)
- col7 - (undocumented)
- col8 - (undocumented)
- Returns:
- (undocumented)
- 
cogroup
public <U,R> Dataset<R> cogroup(KeyValueGroupedDataset<K, U> other, scala.Function3<K, scala.collection.Iterator<V>, scala.collection.Iterator<U>, scala.collection.IterableOnce<R>> f, Encoder<R> evidence$30)
- 
cogroup
public <U,R> Dataset<R> cogroup(KeyValueGroupedDataset<K, U> other, CoGroupFunction<K, V, U, R> f, Encoder<R> encoder)
- 
cogroupSorted
public <U,R> Dataset<R> cogroupSorted(KeyValueGroupedDataset<K, U> other, scala.collection.immutable.Seq<Column> thisSortExprs, scala.collection.immutable.Seq<Column> otherSortExprs, scala.Function3<K, scala.collection.Iterator<V>, scala.collection.Iterator<U>, scala.collection.IterableOnce<R>> f, Encoder<R> evidence$21)
- 
cogroupSorted
public <U,R> Dataset<R> cogroupSorted(KeyValueGroupedDataset<K, U> other, Column[] thisSortExprs, Column[] otherSortExprs, CoGroupFunction<K, V, U, R> f, Encoder<R> encoder)
- 
count
Description copied from class: KeyValueGroupedDataset
Returns a Dataset that contains a tuple with each key and the number of items present for that key.
- Overrides:
- count in class KeyValueGroupedDataset<K, V, Dataset>
- Returns:
- (undocumented)
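The shape of the result can be pictured with a small plain-Java model. This is only a sketch of the semantics, not Spark code: `CountPerKeySketch` and `countPerKey` are hypothetical names, and an in-memory list stands in for the Dataset.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class CountPerKeySketch {
    /** Groups values by keyFn and counts the members of each group (hypothetical helper). */
    static <K, V> Map<K, Long> countPerKey(List<V> values, Function<V, K> keyFn) {
        Map<K, Long> counts = new LinkedHashMap<>();
        for (V v : values) {
            // One (key, count) pair per distinct key, like the tuples count() returns.
            counts.merge(keyFn.apply(v), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = List.of("spark", "scala", "java", "sql", "jvm");
        // Key each word by its length.
        System.out.println(countPerKey(words, String::length)); // prints {5=2, 4=1, 3=2}
    }
}
```

The insertion order seen here comes from `LinkedHashMap`; the sketch makes no claim about row order in a real Dataset.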
- 
flatMapGroups
public <U> Dataset<U> flatMapGroups(scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>> f, Encoder<U> evidence$22)
Description copied from class: KeyValueGroupedDataset
(Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions.Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
- Overrides:
- flatMapGroups in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- f - (undocumented)
- evidence$22 - (undocumented)
- Returns:
- (undocumented)
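The advice about not materializing the iterator can be illustrated with a plain-Java model. This is a sketch under stated assumptions rather than Spark code: `FlatMapGroupsSketch`, its `flatMapGroups` stand-in, and `summarize` are hypothetical, and an in-memory map plays the role of the grouped data. The point is that the per-group function consumes the iterator in a single pass and keeps only a running total.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

public class FlatMapGroupsSketch {
    /** Hypothetical stand-in: calls f once per key with an iterator over that group's values. */
    static <K extends Comparable<K>, V, U> List<U> flatMapGroups(
            Map<K, List<V>> groups, BiFunction<K, Iterator<V>, Iterator<U>> f) {
        List<U> out = new ArrayList<>();
        // TreeMap only gives the sketch a stable key order for printing.
        for (Map.Entry<K, List<V>> e : new TreeMap<K, List<V>>(groups).entrySet()) {
            f.apply(e.getKey(), e.getValue().iterator()).forEachRemaining(out::add);
        }
        return out;
    }

    /** Single pass over the iterator: constant memory per group, no toList-style copy. */
    static Iterator<String> summarize(String key, Iterator<Integer> values) {
        long sum = 0;
        int n = 0;
        while (values.hasNext()) {
            sum += values.next();
            n++;
        }
        return List.of(key + ": n=" + n + ", sum=" + sum).iterator();
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> groups = Map.of("a", List.of(1, 2, 3), "b", List.of(10));
        flatMapGroups(groups, FlatMapGroupsSketch::summarize).forEach(System.out::println);
    }
}
```

A function written this way never holds more than one element of the group at a time, which is the property the description asks callers to preserve when groups may be large.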
- 
flatMapGroups
Description copied from class: KeyValueGroupedDataset
(Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions.Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
- Overrides:
- flatMapGroups in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- f - (undocumented)
- encoder - (undocumented)
- Returns:
- (undocumented)
- 
flatMapGroupsWithState
public <S,U> Dataset<U> flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, scala.collection.Iterator<U>> func, Encoder<S> evidence$10, Encoder<U> evidence$11)
Description copied from class: KeyValueGroupedDataset
(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
- Specified by:
- flatMapGroupsWithState in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- outputMode - The output mode of the function.
- timeoutConf - Timeout configuration for groups that do not receive data for a while. See Encoder for more details on what types are encodable to Spark SQL.
- func - Function to be called on every group.
- evidence$10 - (undocumented)
- evidence$11 - (undocumented)
- Returns:
- (undocumented)
- 
flatMapGroupsWithState
public <S,U> Dataset<U> flatMapGroupsWithState(OutputMode outputMode, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, scala.collection.Iterator<U>> func, Encoder<S> evidence$12, Encoder<U> evidence$13)
- 
flatMapGroupsWithState
public <S,U> Dataset<U> flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K, V, S, U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf)
Description copied from class: KeyValueGroupedDataset
(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
- Overrides:
- flatMapGroupsWithState in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- func - Function to be called on every group.
- outputMode - The output mode of the function.
- stateEncoder - Encoder for the state type.
- outputEncoder - Encoder for the output type.
- timeoutConf - Timeout configuration for groups that do not receive data for a while. See Encoder for more details on what types are encodable to Spark SQL.
- Returns:
- (undocumented)
- 
flatMapGroupsWithState
public <S,U> Dataset<U> flatMapGroupsWithState(FlatMapGroupsWithStateFunction<K, V, S, U> func, OutputMode outputMode, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState)
- 
flatMapSortedGroups
public <U> Dataset<U> flatMapSortedGroups(scala.collection.immutable.Seq<Column> sortExprs, scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>> f, Encoder<U> evidence$3)
Description copied from class: KeyValueGroupedDataset
(Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and a sorted iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions.Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
This is equivalent to KeyValueGroupedDataset.flatMapGroups(scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>>, org.apache.spark.sql.Encoder<U>), except that the iterator is sorted according to the given sort expressions. That sorting does not add computational complexity.
- Specified by:
- flatMapSortedGroups in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- sortExprs - (undocumented)
- f - (undocumented)
- evidence$3 - (undocumented)
- Returns:
- (undocumented)
- See Also:
- org.apache.spark.sql.api.KeyValueGroupedDataset#flatMapGroups
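What a sorted iterator buys the per-group function can be seen in a plain-Java model. This is a sketch only, not Spark code: `SortedGroupsSketch` and `firstAndLastPerKey` are hypothetical, and a natural-order sort on integer timestamps stands in for the sort expressions. Because the values arrive in order, the function can read the first and last element in one pass without collecting or sorting anything itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedGroupsSketch {
    /** Groups (key, timestamp) rows by key and reads each group through a sorted iterator. */
    static List<String> firstAndLastPerKey(List<Map.Entry<String, Integer>> rows) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> row : rows) {
            groups.computeIfAbsent(row.getKey(), k -> new ArrayList<>()).add(row.getValue());
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            g.getValue().sort(Comparator.naturalOrder()); // stands in for the sort expressions
            Iterator<Integer> sorted = g.getValue().iterator();
            int first = sorted.next();   // smallest value arrives first
            int last = first;
            while (sorted.hasNext()) {
                last = sorted.next();    // the final element is the largest
            }
            out.add(g.getKey() + ": first=" + first + ", last=" + last);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> rows = List.of(
                Map.entry("u1", 30), Map.entry("u2", 5), Map.entry("u1", 10), Map.entry("u1", 20));
        firstAndLastPerKey(rows).forEach(System.out::println);
    }
}
```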
 
- 
flatMapSortedGroups
public <U> Dataset<U> flatMapSortedGroups(Column[] SortExprs, FlatMapGroupsFunction<K, V, U> f, Encoder<U> encoder)
Description copied from class: KeyValueGroupedDataset
(Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and a sorted iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions.Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
This is equivalent to KeyValueGroupedDataset.flatMapGroups(scala.Function2<K, scala.collection.Iterator<V>, scala.collection.IterableOnce<U>>, org.apache.spark.sql.Encoder<U>), except that the iterator is sorted according to the given sort expressions. That sorting does not add computational complexity.
- Overrides:
- flatMapSortedGroups in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- SortExprs - (undocumented)
- f - (undocumented)
- encoder - (undocumented)
- Returns:
- (undocumented)
- See Also:
- org.apache.spark.sql.api.KeyValueGroupedDataset#flatMapGroups
- 
keyAs
Description copied from class: KeyValueGroupedDataset
Returns a new KeyValueGroupedDataset where the type of the key has been mapped to the specified type. The mapping of key columns to the type follows the same rules as the as method on Dataset.
- Specified by:
- keyAs in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- evidence$1 - (undocumented)
- Returns:
- (undocumented)
- 
keys
Description copied from class: KeyValueGroupedDataset
Returns a Dataset that contains each unique key. This is equivalent to mapping over the Dataset to extract the keys and then running a distinct operation on them.
- Specified by:
- keys in class KeyValueGroupedDataset<K, V, Dataset>
- Returns:
- (undocumented)
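The stated equivalence (extract the key from every element, then keep distinct keys) can be written out as a plain-Java model. This is a sketch, not Spark code: `KeysSketch` and its `keys` helper are hypothetical, and a `List` stands in for the Dataset.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class KeysSketch {
    /** Maps every value to its key, then drops duplicate keys (hypothetical helper). */
    static <K, V> List<K> keys(List<V> values, Function<V, K> keyFn) {
        return values.stream()
                .map(keyFn)      // step 1: extract the key of each element
                .distinct()      // step 2: keep each key once
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("spark", "scala", "java", "sql");
        System.out.println(keys(words, String::length)); // prints [5, 4, 3]
    }
}
```

The encounter order shown is a property of the Java stream in this sketch, not a statement about the order of keys in a real Dataset.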
- 
mapGroups
public <U> Dataset<U> mapGroups(scala.Function2<K, scala.collection.Iterator<V>, U> f, Encoder<U> evidence$23)
Description copied from class: KeyValueGroupedDataset
(Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a new Dataset.
This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions.Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
- Overrides:
- mapGroups in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- f - (undocumented)
- evidence$23 - (undocumented)
- Returns:
- (undocumented)
- 
mapGroups
Description copied from class: KeyValueGroupedDataset
(Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a new Dataset.
This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions.Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
- Overrides:
- mapGroups in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- f - (undocumented)
- encoder - (undocumented)
- Returns:
- (undocumented)
- 
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$4, Encoder<U> evidence$5)
Description copied from class: KeyValueGroupedDataset
(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
- Specified by:
- mapGroupsWithState in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- func - Function to be called on every group. See Encoder for more details on what types are encodable to Spark SQL.
- evidence$4 - (undocumented)
- evidence$5 - (undocumented)
- Returns:
- (undocumented)
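The idea of state that is saved across invocations can be pictured with a plain-Java model. This is a sketch under stated assumptions, not Spark code: `GroupStateSketch` and `processBatch` are hypothetical, a `Map` stands in for the per-group state, each call to `processBatch` plays the role of one trigger, and the sketch assumes the function runs only for keys that have data in that batch (timeouts are not modelled).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupStateSketch {
    // Hypothetical stand-in for the saved per-group state: one running count per key.
    private final Map<String, Long> state = new LinkedHashMap<>();

    /** Processes one batch ("trigger"): updates each touched key's state and emits one row per key. */
    List<String> processBatch(List<String> batch) {
        Map<String, Long> seenInBatch = new LinkedHashMap<>();
        for (String key : batch) {
            seenInBatch.merge(key, 1L, Long::sum);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : seenInBatch.entrySet()) {
            long updated = state.getOrDefault(e.getKey(), 0L) + e.getValue();
            state.put(e.getKey(), updated); // saved for the next invocation
            out.add(e.getKey() + "=" + updated);
        }
        return out;
    }

    public static void main(String[] args) {
        GroupStateSketch s = new GroupStateSketch();
        System.out.println(s.processBatch(List.of("a", "b", "a"))); // prints [a=2, b=1]
        System.out.println(s.processBatch(List.of("a", "c")));      // prints [a=3, c=1]
    }
}
```

In the second call the count for `a` continues from 2 rather than restarting, which is the behaviour the description attributes to streaming Datasets; on a static batch there would be only the single call.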
- 
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(GroupStateTimeout timeoutConf, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$6, Encoder<U> evidence$7)
Description copied from class: KeyValueGroupedDataset
(Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
- Specified by:
- mapGroupsWithState in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- timeoutConf - Timeout configuration for groups that do not receive data for a while. See Encoder for more details on what types are encodable to Spark SQL.
- func - Function to be called on every group.
- evidence$6 - (undocumented)
- evidence$7 - (undocumented)
- Returns:
- (undocumented)
- 
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState, scala.Function3<K, scala.collection.Iterator<V>, GroupState<S>, U> func, Encoder<S> evidence$8, Encoder<U> evidence$9)
- 
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder)
Description copied from class: KeyValueGroupedDataset
(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
- Overrides:
- mapGroupsWithState in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- func - Function to be called on every group.
- stateEncoder - Encoder for the state type.
- outputEncoder - Encoder for the output type. See Encoder for more details on what types are encodable to Spark SQL.
- Returns:
- (undocumented)
- 
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf)
Description copied from class: KeyValueGroupedDataset
(Java-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.
- Overrides:
- mapGroupsWithState in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- func - Function to be called on every group.
- stateEncoder - Encoder for the state type.
- outputEncoder - Encoder for the output type.
- timeoutConf - Timeout configuration for groups that do not receive data for a while. See Encoder for more details on what types are encodable to Spark SQL.
- Returns:
- (undocumented)
- 
mapGroupsWithState
public <S,U> Dataset<U> mapGroupsWithState(MapGroupsWithStateFunction<K, V, S, U> func, Encoder<S> stateEncoder, Encoder<U> outputEncoder, GroupStateTimeout timeoutConf, KeyValueGroupedDataset<K, S> initialState)
- 
mapValues
Description copied from class: KeyValueGroupedDataset
Returns a new KeyValueGroupedDataset where the given function func has been applied to the data. The grouping key is unchanged by this.
  // Create values grouped by key from a Dataset[(K, V)]
  ds.groupByKey(_._1).mapValues(_._2) // Scala
- Specified by:
- mapValues in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- func - (undocumented)
- evidence$2 - (undocumented)
- Returns:
- (undocumented)
- 
mapValues
Description copied from class: KeyValueGroupedDataset
Returns a new KeyValueGroupedDataset where the given function func has been applied to the data. The grouping key is unchanged by this.
  // Create Integer values grouped by String key from a Dataset<Tuple2<String, Integer>>
  Dataset<Tuple2<String, Integer>> ds = ...;
  KeyValueGroupedDataset<String, Integer> grouped =
    ds.groupByKey(t -> t._1, Encoders.STRING()).mapValues(t -> t._2, Encoders.INT());
- Overrides:
- mapValues in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- func - (undocumented)
- encoder - (undocumented)
- Returns:
- (undocumented)
- 
queryExecution
public org.apache.spark.sql.execution.QueryExecution queryExecution()
- 
reduceGroups
Description copied from class: KeyValueGroupedDataset
(Scala-specific) Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.
- Specified by:
- reduceGroups in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- f - (undocumented)
- Returns:
- (undocumented)
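Why the function must be commutative and associative can be seen in a plain-Java model. This is a sketch, not Spark code: `ReduceOrderSketch` and its two helpers are hypothetical, and the sketch assumes only that an engine is free to reduce parts of a group separately and then merge the partial results. Addition gives the same answer either way; subtraction does not.

```java
import java.util.List;
import java.util.function.BinaryOperator;

public class ReduceOrderSketch {
    /** Folds left to right, as if the whole group were reduced in one pass (needs a non-empty list). */
    static int reduceInOnePass(List<Integer> xs, BinaryOperator<Integer> f) {
        int acc = xs.get(0);
        for (int i = 1; i < xs.size(); i++) {
            acc = f.apply(acc, xs.get(i));
        }
        return acc;
    }

    /** Reduces two halves independently, then merges the two partial results (needs 2+ elements). */
    static int reduceInTwoParts(List<Integer> xs, BinaryOperator<Integer> f) {
        int mid = xs.size() / 2;
        int left = reduceInOnePass(xs.subList(0, mid), f);
        int right = reduceInOnePass(xs.subList(mid, xs.size()), f);
        return f.apply(left, right);
    }

    public static void main(String[] args) {
        List<Integer> group = List.of(10, 3, 2, 1);
        BinaryOperator<Integer> sum = Integer::sum;      // commutative and associative
        BinaryOperator<Integer> minus = (a, b) -> a - b; // neither
        System.out.println(reduceInOnePass(group, sum) + " " + reduceInTwoParts(group, sum));     // prints 16 16
        System.out.println(reduceInOnePass(group, minus) + " " + reduceInTwoParts(group, minus)); // prints 4 6
    }
}
```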
- 
reduceGroups
Description copied from class: KeyValueGroupedDataset
(Java-specific) Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.
- Overrides:
- reduceGroups in class KeyValueGroupedDataset<K, V, Dataset>
- Parameters:
- f - (undocumented)
- Returns:
- (undocumented)
- 
toString
 
-