>>> d = [{'name': 'Alice', 'age': 1}]
>>> f = spark.createDataFrame(d)
>>> f.collect()
[Row(age=1, name=u'Alice')]
>>> from pyspark.sql import functions as F
Now we want to add a new column newName:
>>> ff = f.withColumn('newName', '===')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/spark-current/python/pyspark/sql/dataframe.py", line 1619, in withColumn
    assert isinstance(col, Column), "col should be Column"
AssertionError: col should be Column
This fails: the second argument to withColumn must be a Column, not a plain string.
>>> ff = f.withColumn('newName', F.col('name') + '===')
>>> ff.collect()
[Row(age=1, name=u'Alice', newName=None)]
No error this time, but the new column comes back as None: the + operator means arithmetic addition, and a string column plus a string literal cannot be evaluated numerically, so the result is null. For integer columns, the same pattern does work:
>>> ff = ff.withColumn('newAge', F.col('age') + 1)
>>> ff.collect()
[Row(age=1, name=u'Alice', newName=None, newAge=2)]
>>> ff = ff.withColumn('newNameV2', F.lit('==='))
>>> ff.collect()
[Row(age=1, name=u'Alice', newName=None, newAge=2, newNameV2=u'===')]
The pyspark.sql.functions.lit() function wraps a literal value in a Column, which is exactly what withColumn expects.
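If the goal is to build newName from the existing name column rather than set a constant, string concatenation needs F.concat instead of +. A minimal sketch, assuming the same spark session and DataFrame f as above:

>>> ff = f.withColumn('newName', F.concat(F.col('name'), F.lit('===')))
>>> ff.collect()
[Row(age=1, name=u'Alice', newName=u'Alice===')]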
Alternatively, convert the DataFrame to an RDD and add the column inside a map function.
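A minimal sketch of that approach, again assuming the same session and DataFrame f. Row objects are tuple subclasses, so appending a value with + and re-naming the columns in toDF is enough:

>>> rows = f.rdd.map(lambda r: r + ('===',))    # Row is a tuple, so + appends the new value
>>> ff = rows.toDF(['age', 'name', 'newName'])  # re-attach column names, including the new one
>>> ff.collect()
[Row(age=1, name=u'Alice', newName=u'===')]

Note that this round trip re-infers the schema from the data, so for anything non-trivial the withColumn route with lit() or concat() is simpler and cheaper.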