pyspark.pandas.DataFrame.set_index
- DataFrame.set_index(keys, drop=True, append=False, inplace=False)
- Set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length). The index can replace the existing index or expand on it.
- Parameters
- keys : label or array-like or list of labels/arrays
- This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays. Here, “array” encompasses Series, Index and np.ndarray.
- drop : bool, default True
- Delete columns to be used as the new index.
- append : bool, default False
- Whether to append columns to the existing index.
- inplace : bool, default False
- Modify the DataFrame in place (do not create a new object).
 
- Returns
- DataFrame
- Changed row labels. 
 
- See also
- DataFrame.reset_index
- Opposite of set_index. 
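A minimal sketch of the round trip between set_index and reset_index, using the same sample data as the Examples section. The import fallback to plain pandas is an assumption made here only so the snippet can run without a Spark installation; it is not part of the documented API.

```python
try:
    import pyspark.pandas as ps
except ImportError:
    # Assumption: plain pandas as a stand-in when pandas-on-Spark is not installed.
    import pandas as ps

df = ps.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]},
                  columns=['month', 'year', 'sale'])

# Move 'month' out of the columns and into the index.
indexed = df.set_index('month')

# reset_index moves the index level back into the columns.
restored = indexed.reset_index()

print(list(indexed.columns))   # columns remaining after set_index
print(list(restored.columns))  # columns after the round trip
```

Here 'month' was the first column to begin with, so the round trip restores the original column order; reset_index places former index levels at the front of the columns.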
- Examples
>>> df = ps.DataFrame({'month': [1, 4, 7, 10],
...                    'year': [2012, 2014, 2013, 2014],
...                    'sale': [55, 40, 84, 31]},
...                   columns=['month', 'year', 'sale'])
>>> df
   month  year  sale
0      1  2012    55
1      4  2014    40
2      7  2013    84
3     10  2014    31

- Set the index to become the ‘month’ column:
>>> df.set_index('month')
       year  sale
month
1      2012    55
4      2014    40
7      2013    84
10     2014    31

- Create a MultiIndex using columns ‘year’ and ‘month’:
>>> df.set_index(['year', 'month'])
            sale
year month
2012 1        55
2014 4        40
2013 7        84
2014 10       31
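The drop, append, and inplace parameters are not exercised by the examples above. The following sketch shows one way each might be used on the same sample data. The fallback to plain pandas is an assumption added only so the snippet runs without a Spark installation, and the variable names (kept, appended, df2, result) are illustrative.

```python
try:
    import pyspark.pandas as ps
except ImportError:
    # Assumption: plain pandas as a stand-in when pandas-on-Spark is not installed.
    import pandas as ps

df = ps.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]},
                  columns=['month', 'year', 'sale'])

# drop=False: 'month' becomes the index but also stays as a regular column.
kept = df.set_index('month', drop=False)

# append=True: 'month' is added as a second index level after 'year'.
appended = df.set_index('year').set_index('month', append=True)

# inplace=True: df2 itself is modified and the call returns None.
df2 = df.copy()
result = df2.set_index('month', inplace=True)

print(list(kept.columns))
print(list(appended.index.names))
print(result, list(df2.columns))
```

With the default inplace=False, the original DataFrame is left untouched and a new object is returned, which is usually easier to chain with further operations.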