pyspark.sql.functions.stack
- pyspark.sql.functions.stack(*cols)
Separates col1, …, colk into n rows. Uses column names col0, col1, etc. by default unless specified otherwise.
New in version 3.5.0.
- Parameters
- cols : Column or column name
  The first element should be a literal int giving the number of rows to separate into; the remaining elements are the input values to be separated.
Examples
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, 2, 3)], ['a', 'b', 'c'])
>>> df.select('*', sf.stack(sf.lit(2), df.a, df.b, 'c')).show()
+---+---+---+----+----+
|  a|  b|  c|col0|col1|
+---+---+---+----+----+
|  1|  2|  3|   1|   2|
|  1|  2|  3|   3|NULL|
+---+---+---+----+----+

>>> df.select('*', sf.stack(sf.lit(2), df.a, df.b, 'c').alias('x', 'y')).show()
+---+---+---+---+----+
|  a|  b|  c|  x|   y|
+---+---+---+---+----+
|  1|  2|  3|  1|   2|
|  1|  2|  3|  3|NULL|
+---+---+---+---+----+

>>> df.select('*', sf.stack(sf.lit(3), df.a, df.b, 'c')).show()
+---+---+---+----+
|  a|  b|  c|col0|
+---+---+---+----+
|  1|  2|  3|   1|
|  1|  2|  3|   2|
|  1|  2|  3|   3|
+---+---+---+----+

>>> df.select('*', sf.stack(sf.lit(4), df.a, df.b, 'c')).show()
+---+---+---+----+
|  a|  b|  c|col0|
+---+---+---+----+
|  1|  2|  3|   1|
|  1|  2|  3|   2|
|  1|  2|  3|   3|
|  1|  2|  3|NULL|
+---+---+---+----+
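A common pattern built on stack is unpivoting a wide DataFrame into key/value rows by interleaving literal labels with column references. A minimal sketch of that pattern (the 'key'/'value' aliases and the sample data are illustrative, not part of the API above):

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, 2, 3)], ['a', 'b', 'c'])
>>> # Three (label, value) pairs produce three rows of two columns each.
>>> df.select(sf.stack(
...     sf.lit(3),
...     sf.lit('a'), df.a,
...     sf.lit('b'), df.b,
...     sf.lit('c'), df.c,
... ).alias('key', 'value')).show()
+---+-----+
|key|value|
+---+-----+
|  a|    1|
|  b|    2|
|  c|    3|
+---+-----+

Recent Spark releases also offer DataFrame.unpivot for this kind of reshaping; stack remains useful when the transformation must be expressed as a single column expression.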