pyspark.sql.functions.regexp_extract_all
- pyspark.sql.functions.regexp_extract_all(str, regexp, idx=None)
Extract all strings in str that match the Java regex regexp and correspond to the regex group index idx.
New in version 3.5.0.
- Parameters
- Returns
Column
an array of all strings in str that match the Java regex and correspond to the regex group index.
Examples
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("100-200, 300-400", r"(\d+)-(\d+)")], ["str", "regexp"])
>>> df.select('*', sf.regexp_extract_all('str', sf.lit(r'(\d+)-(\d+)'))).show()
+----------------+-----------+---------------------------------------+
|             str|     regexp|regexp_extract_all(str, (\d+)-(\d+), 1)|
+----------------+-----------+---------------------------------------+
|100-200, 300-400|(\d+)-(\d+)|                             [100, 300]|
+----------------+-----------+---------------------------------------+
>>> df.select('*', sf.regexp_extract_all('str', sf.lit(r'(\d+)-(\d+)'), sf.lit(1))).show()
+----------------+-----------+---------------------------------------+
|             str|     regexp|regexp_extract_all(str, (\d+)-(\d+), 1)|
+----------------+-----------+---------------------------------------+
|100-200, 300-400|(\d+)-(\d+)|                             [100, 300]|
+----------------+-----------+---------------------------------------+
>>> df.select('*', sf.regexp_extract_all('str', sf.lit(r'(\d+)-(\d+)'), 2)).show()
+----------------+-----------+---------------------------------------+
|             str|     regexp|regexp_extract_all(str, (\d+)-(\d+), 2)|
+----------------+-----------+---------------------------------------+
|100-200, 300-400|(\d+)-(\d+)|                             [200, 400]|
+----------------+-----------+---------------------------------------+
>>> df.select('*', sf.regexp_extract_all('str', sf.col("regexp"))).show()
+----------------+-----------+----------------------------------+
|             str|     regexp|regexp_extract_all(str, regexp, 1)|
+----------------+-----------+----------------------------------+
|100-200, 300-400|(\d+)-(\d+)|                        [100, 300]|
+----------------+-----------+----------------------------------+
>>> df.select('*', sf.regexp_extract_all(sf.col('str'), "regexp")).show()
+----------------+-----------+----------------------------------+
|             str|     regexp|regexp_extract_all(str, regexp, 1)|
+----------------+-----------+----------------------------------+
|100-200, 300-400|(\d+)-(\d+)|                        [100, 300]|
+----------------+-----------+----------------------------------+
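
The group-index behaviour shown in the examples above can be reproduced without a Spark session. The following is only a rough sketch of the semantics, not Spark's implementation: `extract_all` is a hypothetical helper, and it uses Python's `re` module rather than Java regex, which is assumed to behave the same for this simple pattern.

```python
import re


def extract_all(text, pattern, idx=1):
    """Hypothetical helper: return group `idx` of every non-overlapping match."""
    return [m.group(idx) for m in re.finditer(pattern, text)]


text = "100-200, 300-400"
pattern = r"(\d+)-(\d+)"

# Group 1 is the first parenthesised group of each match (the default here,
# mirroring the ", 1" shown in the example column names above).
first_groups = extract_all(text, pattern)

# Group 2 is the second parenthesised group of each match.
second_groups = extract_all(text, pattern, 2)

# In Python's `re`, group 0 is the entire match.
whole_matches = extract_all(text, pattern, 0)

print(first_groups, second_groups, whole_matches)
```

Each match contributes exactly one element to the result, so the output has one entry per occurrence of the pattern in the input string.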