pyspark.sql.functions.make_valid_utf8

pyspark.sql.functions.make_valid_utf8(str)

Returns a new string in which all invalid UTF-8 byte sequences, if any, are replaced by the Unicode replacement character (U+FFFD).

New in version 4.0.0.

Parameters
str : Column or column name

A column of strings, each representing a UTF-8 byte sequence.

Returns
Column

The input string with every invalid UTF-8 byte sequence replaced by U+FFFD. Input that is already valid UTF-8 is returned unchanged.

Examples

>>> import pyspark.sql.functions as sf
>>> spark.range(1).select(sf.make_valid_utf8(sf.lit("SparkSQL"))).show()
+-------------------------+
|make_valid_utf8(SparkSQL)|
+-------------------------+
|                 SparkSQL|
+-------------------------+
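
The example above only covers input that is already valid. As a rough illustration of the replacement behavior described at the top of this page, the sketch below uses plain Python rather than Spark. The helper name `make_valid_utf8_sketch` is hypothetical, and the sketch assumes that Python's `errors="replace"` decoding is a reasonable stand-in for the documented substitution. It is not Spark's implementation, and the number of replacement characters emitted for longer malformed runs may differ between the two.

```python
# Pure-Python analogue of the documented behavior; NOT Spark's implementation.
# `make_valid_utf8_sketch` is a hypothetical helper introduced for illustration.

def make_valid_utf8_sketch(raw: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for malformed sequences."""
    return raw.decode("utf-8", errors="replace")

# Valid UTF-8 passes through unchanged.
valid = make_valid_utf8_sketch("SparkSQL".encode("utf-8"))

# The byte 0xFF never occurs in well-formed UTF-8, so it is replaced.
broken = make_valid_utf8_sketch(b"Spark\xffSQL")

print(valid)                        # SparkSQL
print(broken == "Spark\ufffdSQL")   # True
```

In this sketch the single stray byte becomes one U+FFFD character and the surrounding valid text is preserved, which matches the description of `make_valid_utf8` given here.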