TempView and GlobalTempView are both commonly used with Spark DataFrames. How do they differ, and when should each one be used?
The example below compares the two.
from pyspark.sql import SparkSession
import numpy as np
import pandas as pd

spark = SparkSession.builder.getOrCreate()

d = np.random.randint(1, 100, 5 * 5).reshape(5, -1)
data = pd.DataFrame(d, columns=list('abcde'))
df = spark.createDataFrame(data)
df.show()
+---+---+---+---+---+
|  a|  b|  c|  d|  e|
+---+---+---+---+---+
| 17| 30| 61| 61| 33|
| 32| 23| 24|  7|  7|
| 47|  6|  4| 95| 34|
| 50| 69| 83| 21| 46|
| 52| 12| 83| 49| 85|
+---+---+---+---+---+
# createTempView registers the view and returns None, so there is nothing to assign
df.createTempView('temp')
temp_sql = "select * from temp where a=50"
res = spark.sql(temp_sql)
res.show()
+---+---+---+---+---+
|  a|  b|  c|  d|  e|
+---+---+---+---+---+
| 50| 69| 83| 21| 46|
+---+---+---+---+---+
# createGlobalTempView also returns None; the view lives in the global_temp database
df.createGlobalTempView('glob')
glob_sql = "select * from global_temp.glob where a = 17"
res2 = spark.sql(glob_sql)
res2.show()
+---+---+---+---+---+
|  a|  b|  c|  d|  e|
+---+---+---+---+---+
| 17| 30| 61| 61| 33|
+---+---+---+---+---+
# Create a new SparkSession
spark2 = spark.newSession()
spark2 == spark
False
# The new SparkSession can read data from the global temp view
new_sql = "select * from global_temp.glob where a = 47"
temp = spark2.sql(new_sql)
temp.show()
+---+---+---+---+---+
|  a|  b|  c|  d|  e|
+---+---+---+---+---+
| 47|  6|  4| 95| 34|
+---+---+---+---+---+
# The new SparkSession cannot read data from the temp view;
# this fails because the table temp cannot be found
new_sql2 = "select * from temp where a = 47"
temp = spark2.sql(new_sql2)
temp.show()

# Adding the global_temp prefix does not help either
new_sql2 = "select * from global_temp.temp where a = 47"
temp = spark2.sql(new_sql2)
temp.show()
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
# (many lines of exception output omitted here)
AnalysisException: "Table or view not found: `global_temp`.`temp`; line 1 pos 14;\n'Project [*]\n+- 'Filter ('a = 47)\n +- 'UnresolvedRelation `global_temp`.`temp`\n"
spark.catalog.dropTempView('temp')
spark.catalog.dropGlobalTempView('glob')

# Fails: table temp not found
temp_sql2 = "select * from temp where a = 47"
temp = spark.sql(temp_sql2)

# Fails: global_temp.glob not found, in both spark and spark2
glob_sql2 = "select * from global_temp.glob where a = 47"
temp = spark.sql(glob_sql2)
temp = spark2.sql(glob_sql2)
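Spark DataFrames have four temp view methods: createTempView, createOrReplaceTempView, createGlobalTempView, and createOrReplaceGlobalTempView.
The replace variants create the view if it does not exist and replace it if it does.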
A temp view can no longer be used after it has been dropped.
There are two drop methods:
spark.catalog.dropTempView('temp')
spark.catalog.dropGlobalTempView('glob')
Similarities and differences between TempView and GlobalTempView