Description
Reproducing the bug with the example from the documentation:
import pyspark
from pyspark.ml.linalg import Vectors
from pyspark.ml.stat import Correlation
spark = pyspark.sql.SparkSession.builder.getOrCreate()
dataset = [[Vectors.dense([1, 0, 0, -2])],
           [Vectors.dense([4, 5, 0, 3])],
           [Vectors.dense([6, 7, 0, 8])],
           [Vectors.dense([9, 0, 0, 1])]]
dataset = spark.createDataFrame(dataset, ['features'])
df = Correlation.corr(dataset, 'features', 'pearson')
df.collect()
This produces the following stack trace:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-92-e7889fa5d198> in <module>()
     11 dataset = spark.createDataFrame(dataset, ['features'])
     12 df = Correlation.corr(dataset, 'features', 'pearson')
---> 13 df.collect()

/opt/spark/python/pyspark/sql/dataframe.py in collect(self)
    530         [Row(age=2, name=u'Alice'), Row(age=5, name=u'Bob')]
    531         """
--> 532         with SCCallSiteSync(self._sc) as css:
    533             sock_info = self._jdf.collectToPython()
    534         return list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer())))

/opt/spark/python/pyspark/traceback_utils.py in __enter__(self)
     70     def __enter__(self):
     71         if SCCallSiteSync._spark_stack_depth == 0:
---> 72             self._context._jsc.setCallSite(self._call_site)
     73         SCCallSiteSync._spark_stack_depth += 1
     74

AttributeError: 'NoneType' object has no attribute 'setCallSite'
Analysis:
Somehow the DataFrame properties `df.sql_ctx.sparkSession._jsparkSession` and `df._sc` do not match the corresponding ones on the active Spark session (`spark._jsparkSession` and `spark._sc`).
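To make the mismatch visible, something like the following sketch could be run against the `df` and `spark` objects from the reproduction. The helper name `session_handle_report` is hypothetical, and it reads the same private attributes that appear in the stack trace and in this report (`_sc`, `_jsc`, `_jsparkSession`), which are PySpark internals and may differ between versions.

```python
def session_handle_report(df, spark):
    """Compare the private session handles of a DataFrame with those of a session.

    Hypothetical diagnostic helper; it only reads private attributes.
    """
    df_session = df.sql_ctx.sparkSession
    return {
        # Identity of the Python-side handle only: two distinct proxies
        # could in principle still point at the same JVM object.
        "same_jsparkSession": df_session._jsparkSession is spark._jsparkSession,
        # Does the DataFrame hold the same SparkContext object as the session?
        "same_sc": df._sc is spark._sc,
        # The AttributeError above is raised when this handle is None.
        "df_jsc_is_none": getattr(df._sc, "_jsc", None) is None,
    }
```

Calling `session_handle_report(df, spark)` right after `Correlation.corr(...)` would show which of the handles differ before `collect()` fails.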
The following code works around the problem (I hope this helps you narrow down the root cause):
df.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
df._sc = spark._sc
df.collect()
>>> [Row(pearson(features)=DenseMatrix(4, 4, [1.0, 0.0556, nan, 0.4005, 0.0556, 1.0, nan, 0.9136, nan, nan, 1.0, nan, 0.4005, 0.9136, nan, 1.0], False))]
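If the workaround is needed in several places, it could be wrapped in a small helper such as the sketch below. The name `rebind_to_session` is hypothetical, and the function performs exactly the two assignments shown above. It overwrites private attributes, so it is a stopgap that assumes these attribute names exist in the installed PySpark version, not a fix for the root cause.

```python
def rebind_to_session(df, spark):
    """Point a DataFrame's private session handles at `spark` (workaround only).

    Hypothetical helper: mutates `df` in place and returns it for chaining.
    """
    # Reuse the live JVM session held by the active SparkSession.
    df.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
    # Reuse the active SparkContext so that `_jsc` is no longer None.
    df._sc = spark._sc
    return df
```

With the reproduction above, this would be used as `rebind_to_session(Correlation.corr(dataset, 'features', 'pearson'), spark).collect()`.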