pyspark.sql.functions.log#

pyspark.sql.functions.log(arg1, arg2=None)[source]#

Returns the logarithm of the second argument, using the first argument as the base.

If there is only one argument, then this takes the natural logarithm of the argument.

New in version 1.5.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
arg1 : Column, str or float

the base of the logarithm, or, when arg2 is omitted, the input value itself (in that case the base is e)

arg2 : Column, str or float, optional

the number to compute the logarithm of.

Returns
Column

logarithm of the given value.
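For intuition, the two-argument form corresponds to the change-of-base identity log_base(x) = ln(x) / ln(base). Below is a minimal pure-Python sketch of that behavior using `math.log` rather than Spark; `log_like` is a hypothetical helper (not part of PySpark), with `None` standing in for SQL NULL and non-positive inputs mapped to `None`, matching the NULL rows in the examples below:

```python
import math

def log_like(base, value):
    # Hypothetical mirror of pyspark.sql.functions.log semantics:
    # NULL (None) and non-positive inputs yield None, as in Example 2.
    if value is None or value <= 0:
        return None
    if base is None:
        return math.log(value)                # one-argument form: natural log
    return math.log(value) / math.log(base)   # change of base: ln(x) / ln(base)

print(log_like(2.0, 4))    # ≈ 2.0
print(log_like(3.0, 2))    # ≈ 0.63093
print(log_like(3.0, -1))   # None
print(log_like(None, 2))   # ≈ 0.69315 (natural log)
```

This is only a scalar sketch; the real function operates on Columns and is evaluated by Spark's SQL engine.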

Examples

Example 1: Specify both the base and the input value

>>> from pyspark.sql import functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (1), (2), (4) AS t(value)")
>>> df.select("*", sf.log(2.0, df.value)).show()
+-----+---------------+
|value|LOG(2.0, value)|
+-----+---------------+
|    1|            0.0|
|    2|            1.0|
|    4|            2.0|
+-----+---------------+

Example 2: Return NULL for invalid input values

>>> from pyspark.sql import functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (1), (2), (0), (-1), (NULL) AS t(value)")
>>> df.select("*", sf.log(3.0, df.value)).show()
+-----+------------------+
|value|   LOG(3.0, value)|
+-----+------------------+
|    1|               0.0|
|    2|0.6309297535714...|
|    0|              NULL|
|   -1|              NULL|
| NULL|              NULL|
+-----+------------------+

Example 3: Specify only the input value (natural logarithm)

>>> from pyspark.sql import functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (1), (2), (4) AS t(value)")
>>> df.select("*", sf.log(df.value)).show()
+-----+------------------+
|value|         ln(value)|
+-----+------------------+
|    1|               0.0|
|    2|0.6931471805599...|
|    4|1.3862943611198...|
+-----+------------------+