pyspark.sql.functions.nth_value

pyspark.sql.functions.nth_value(col, offset, ignoreNulls=False)
Window function: returns the value that is the offset-th row of the window frame (counting from 1), and null if the size of the window frame is less than offset rows.

It will return the offset-th non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

This is equivalent to the nth_value function in SQL.

New in version 3.1.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
- col : Column or str
  name of the column or expression
- offset : int
  the number of the row to use as the value
- ignoreNulls : bool, optional
  whether the Nth value should skip nulls when determining which row to use
 
Returns
- Column
  the value of the nth row.
 
Examples

>>> from pyspark.sql import Window
>>> df = spark.createDataFrame([("a", 1),
...                             ("a", 2),
...                             ("a", 3),
...                             ("b", 8),
...                             ("b", 2)], ["c1", "c2"])
>>> df.show()
+---+---+
| c1| c2|
+---+---+
|  a|  1|
|  a|  2|
|  a|  3|
|  b|  8|
|  b|  2|
+---+---+
>>> w = Window.partitionBy("c1").orderBy("c2")
>>> df.withColumn("nth_value", nth_value("c2", 1).over(w)).show()
+---+---+---------+
| c1| c2|nth_value|
+---+---+---------+
|  a|  1|        1|
|  a|  2|        1|
|  a|  3|        1|
|  b|  2|        2|
|  b|  8|        2|
+---+---+---------+
>>> df.withColumn("nth_value", nth_value("c2", 2).over(w)).show()
+---+---+---------+
| c1| c2|nth_value|
+---+---+---------+
|  a|  1|     NULL|
|  a|  2|        2|
|  a|  3|        2|
|  b|  2|     NULL|
|  b|  8|        8|
+---+---+---------+
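
The examples above only exercise the default ignoreNulls=False. As a minimal sketch (not part of the official examples, and assuming the same running spark session as the doctests above), the following shows how ignoreNulls=True changes which row is picked; the names df2, w2, first_any and first_non_null are illustrative, and an explicit all-rows frame is used so every row in the partition sees the same result:

>>> from pyspark.sql import Window
>>> from pyspark.sql.functions import nth_value
>>> df2 = spark.createDataFrame([("a", None), ("a", 1), ("a", 2)], ["c1", "c2"])
>>> w2 = (Window.partitionBy("c1").orderBy("c2")
...       .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing))
>>> # Nulls sort first under ascending order, so the 1st row of the frame is the
>>> # null row: "first_any" is NULL, while ignoreNulls=True skips the null and
>>> # "first_non_null" is 1 on every row of the partition.
>>> df2.select("c1", "c2",
...            nth_value("c2", 1).over(w2).alias("first_any"),
...            nth_value("c2", 1, ignoreNulls=True).over(w2).alias("first_non_null")).show()

Because the function is equivalent to nth_value in SQL, the second example above can also be written through spark.sql; the temporary view name here is only illustrative:

>>> df.createOrReplaceTempView("nth_value_example")
>>> spark.sql(
...     "SELECT c1, c2, nth_value(c2, 2) OVER (PARTITION BY c1 ORDER BY c2) AS nth_value "
...     "FROM nth_value_example").show()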