pyspark.RDD.saveAsSequenceFile

RDD.saveAsSequenceFile(path, compressionCodecClass=None)
Output a Python RDD of key-value pairs (of form RDD[(K, V)]) to any Hadoop file system, using the "org.apache.hadoop.io.Writable" types that we convert from the RDD's key and value types. The mechanism is as follows:

1. Pickle is used to convert the Python RDD into an RDD of serialized Java objects.
2. Keys and values of this Java RDD are converted to Writables and written out.
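The first step of the mechanism above can be sketched in plain Python. This is a simplified illustration, not PySpark's actual serializer: it shows that each (key, value) pair is pickled to bytes before being handed to the JVM, where it is deserialized and converted to Writable types.

```python
import pickle

# Sample key-value pairs, matching the Examples section.
pairs = [(1, ""), (1, "a"), (3, "x")]

# Serialize each pair as the Python-to-JVM handoff would (simplified).
pickled = [pickle.dumps(p) for p in pairs]

# On the JVM side the bytes are unpickled back into objects; here we
# round-trip in Python to show the data survives intact.
restored = [pickle.loads(b) for b in pickled]
assert restored == pairs
```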
New in version 1.1.0.

Parameters

path : str
    path to the sequence file
compressionCodecClass : str, optional
    fully qualified classname of the compression codec class, e.g. "org.apache.hadoop.io.compress.GzipCodec" (None by default)
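To illustrate what a compression codec does to the written output, here is a hedged sketch using Python's built-in gzip module as a stand-in for org.apache.hadoop.io.compress.GzipCodec. The real codec runs on the JVM side inside the SequenceFile writer; this analogy only shows that the compression is applied to the serialized byte stream and is lossless.

```python
import gzip
import pickle

# The same key-value pairs as in the Examples section.
pairs = [(1, ""), (1, "a"), (3, "x")]

# Serialize the records, then compress the byte stream -- roughly what
# a SequenceFile writer does when a codec such as GzipCodec is configured.
raw = pickle.dumps(pairs)
compressed = gzip.compress(raw)

# Compression is lossless: decompressing and unpickling restores the data.
assert pickle.loads(gzip.decompress(compressed)) == pairs
```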
Examples

>>> import os
>>> import tempfile

>>> with tempfile.TemporaryDirectory(prefix="saveAsSequenceFile") as d:
...     path = os.path.join(d, "sequence_file")
...
...     # Write a temporary sequence file
...     rdd = sc.parallelize([(1, ""), (1, "a"), (3, "x")])
...     rdd.saveAsSequenceFile(path)
...
...     # Load this sequence file as an RDD
...     loaded = sc.sequenceFile(path)
...     sorted(loaded.collect())
[(1, ''), (1, 'a'), (3, 'x')]