pyspark.pandas.DataFrame.to_json#
- DataFrame.to_json(path=None, compression='uncompressed', num_files=None, mode='w', orient='records', lines=True, partition_cols=None, index_col=None, **options)#
- Convert the object to a JSON string.
- Note: pandas-on-Spark to_json writes files to a path or URI. Unlike pandas, pandas-on-Spark respects HDFS’s properties such as ‘fs.default.name’.
- Note: pandas-on-Spark writes JSON files into the directory given by path, and writes multiple part-… files in that directory when path is specified. This behavior was inherited from Apache Spark. The number of partitions can be controlled by num_files, which is deprecated; use DataFrame.spark.repartition instead.
- Note: The output JSON format differs from pandas’. It always uses orient=‘records’ for its output. This behavior might have to change soon.
- Note: Set the ignoreNullFields keyword argument to True to omit None or NaN values when writing JSON objects. It works only when path is provided.
- Note: NaN values and None will be converted to null, and datetime objects will be converted to UNIX timestamps.
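As the note above mentions, ignoreNullFields only takes effect when writing to a path. A minimal sketch (the DataFrame and output file name are illustrative, reusing the temporary path variable from the examples below):

>>> ndf = ps.DataFrame({"col 1": ["a", None]})
>>> ndf.to_json(path=r'%s/to_json/nulls.json' % path,
...             ignoreNullFields=True)  # null fields are dropped from each written record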
- Parameters
- path: string, optional
- File path. If not specified, the result is returned as a string. 
- lines: bool, default True
- If ‘orient’ is ‘records’, write out line-delimited JSON format. Raises a ValueError for any other ‘orient’, since other formats are not list-like. It must always be True for now.
- orient: str, default ‘records’
- It must always be ‘records’ for now.
- compression: {‘gzip’, ‘bz2’, ‘xz’, None}
- A string representing the compression to use in the output file, only used when the first argument is a filename. By default, the compression is inferred from the filename. 
- num_files: the number of partitions to be written in the `path` directory when `path` is specified
- This is deprecated. Use DataFrame.spark.repartition instead (see the sketch after this parameter list).
- mode: str
- Python write mode, default ‘w’.
- Note: mode can accept the strings for Spark’s writing modes, such as ‘append’, ‘overwrite’, ‘ignore’, ‘error’, or ‘errorifexists’.
- ‘append’ (equivalent to ‘a’): Append the new data to existing data.
- ‘overwrite’ (equivalent to ‘w’): Overwrite existing data. 
- ‘ignore’: Silently ignore this operation if data already exists. 
- ‘error’ or ‘errorifexists’: Throw an exception if data already exists. 
 
- partition_cols: str or list of str, optional, default None
- Names of partitioning columns.
- index_col: str or list of str, optional, default: None
- Column names to be used in Spark to represent pandas-on-Spark’s index. The index name in pandas-on-Spark is ignored. By default, the index is always lost. 
- options: keyword arguments for additional options specific to PySpark.
- Additional options passed through to PySpark’s JSON data source. Check the available options in PySpark’s API documentation for spark.write.json(…). These options take higher priority and overwrite all the others. This parameter only works when path is specified.
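Since num_files is deprecated, here is a minimal sketch of the replacement mentioned above, combined with a Spark write mode (the output path is illustrative, reusing the df and path variables from the examples below):

>>> df.spark.repartition(1).to_json(path=r'%s/to_json/out.json' % path,
...                                 mode='overwrite')  # one part file; replaces existing data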
 
- Returns
- str or None
 
- Examples

>>> df = ps.DataFrame([['a', 'b'], ['c', 'd']],
...                   columns=['col 1', 'col 2'])
>>> df.to_json()
'[{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]'

>>> df['col 1'].to_json()
'[{"col 1":"a"},{"col 1":"c"}]'

>>> df.to_json(path=r'%s/to_json/foo.json' % path, num_files=1)
>>> ps.read_json(
...     path=r'%s/to_json/foo.json' % path
... ).sort_values(by="col 1")
  col 1 col 2
0     a     b
1     c     d

>>> df['col 1'].to_json(path=r'%s/to_json/foo.json' % path, num_files=1, index_col="index")
>>> ps.read_json(
...     path=r'%s/to_json/foo.json' % path, index_col="index"
... ).sort_values(by="col 1")
      col 1
index
0         a
1         c
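A hedged sketch of partition_cols, which lays the output out in Hive-style subdirectories, one per distinct value of the partitioning column (the DataFrame and output path below are illustrative):

>>> pdf = ps.DataFrame({"group": ["a", "a", "b"], "value": [1, 2, 3]})
>>> pdf.to_json(path=r'%s/to_json/partitioned.json' % path,
...             partition_cols="group")  # creates group=a/ and group=b/ subdirectories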