Running PySpark in PyCharm with a Remote Linux Python Interpreter

PySpark in PyCharm on a remote server

1. Make sure Python and Spark are installed correctly on the remote host

2. Installation and setup on the remote host

vi /etc/profile
Add a line (use the py4j version that ships under your Spark's python/lib, e.g. py4j-0.8.2.1 for Spark 1.4.0):

export PYTHONPATH=$SPARK_HOME/python/:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip

source /etc/profile
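The profile line above simply joins two entries onto PYTHONPATH: Spark's python directory and the py4j zip. A small sketch (the SPARK_HOME path is a placeholder; adjust to your host) shows exactly what the interpreter ends up seeing:

```python
import os

# Placeholder Spark install path; substitute your own.
spark_home = '/root/spark-1.4.0-bin-hadoop2.6'

# The two entries the /etc/profile line adds to PYTHONPATH:
entries = [
    os.path.join(spark_home, 'python'),
    os.path.join(spark_home, 'python', 'lib', 'py4j-0.8.2.1-src.zip'),
]

# os.pathsep is ':' on Linux, matching the colon in the export line.
pythonpath = os.pathsep.join(entries)
print(pythonpath)
```

Python can import modules directly out of a zip archive, which is why the py4j source zip can sit on PYTHONPATH unextracted.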

 # Install pip and py4j

Download pip-7.1.2.tar
tar -xvf pip-7.1.2.tar
cd pip-7.1.2
python setup.py install
pip install py4j
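After installing, a quick pure-Python check (no Spark required) confirms the modules are importable on the remote host. `check_modules` is a small helper written for this post, not part of any library:

```python
import importlib

def check_modules(names):
    """Return {module name: True/False} for importability of each module."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status

# On a correctly set-up host both should report True.
print(check_modules(['pip', 'py4j']))
```

If py4j reports False here, the PYTHONPATH or pip step above did not take effect in the current shell.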

 # Avoid the tty check over ssh

cd /etc
chmod 640 sudoers
vi /etc/sudoers
Comment out the requiretty line so it reads: #Defaults requiretty

3. Local PyCharm setup

File > Settings > Project Interpreter:

Project Interpreter > Add Remote (prerequisite: Python is installed successfully on the remote host):

Note: the Python path here is the remote interpreter path; if Python is installed somewhere else, change the path accordingly.

Run > Edit Configurations (prerequisite: the local directory has been shared with the virtual machine):

I configured the path mappings under Tools:

Tools > Deployment > Configuration

4. Testing

import os
import sys

# Point this process at the remote Spark install (adjust paths to your host).
os.environ['SPARK_HOME'] = '/root/spark-1.4.0-bin-hadoop2.6'
sys.path.append("/root/spark-1.4.0-bin-hadoop2.6/python")
# Needed if the py4j zip is not already on PYTHONPATH:
sys.path.append("/root/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip")

try:
    from pyspark import SparkContext
    from pyspark import SparkConf

    print("Successfully imported Spark Modules")

except ImportError as e:
    print("Can not import Spark Modules", e)
    sys.exit(1)
Result:

ssh://hadoop@192.168.1.131:22/usr/bin/python -u /home/hadoop/TestFile/pysparkProgram/Mainprogram.py
Successfully imported Spark Modules

Process finished with exit code 0

Or:

import sys

sys.path.append("/root/programs/spark-1.4.0-bin-hadoop2.6/python")

try:
    import numpy as np
    import scipy.sparse as sps
    from pyspark.mllib.linalg import Vectors

    # The same vector four ways: a NumPy array, a plain list,
    # an MLlib sparse vector, and a SciPy CSC column matrix.
    dv1 = np.array([1.0, 0.0, 3.0])
    dv2 = [1.0, 0.0, 3.0]
    sv1 = Vectors.sparse(3, [0, 2], [1.0, 3.0])
    sv2 = sps.csc_matrix((np.array([1.0, 3.0]), np.array([0, 2]), np.array([0, 2])), shape=(3, 1))

    print(sv2)

except ImportError as e:
    print("Can not import Spark Modules", e)
    sys.exit(1)
Result:

ssh://hadoop@192.168.1.131:22/usr/bin/python -u /home/hadoop/TestFile/pysparkProgram/Mainprogram.py
  (0, 0)    1.0
  (2, 0)    3.0

Process finished with exit code 0
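As a sanity check on the sparse output above, the same CSC triple can be built and densified with SciPy alone; this sketch assumes only NumPy and SciPy, no Spark:

```python
import numpy as np
import scipy.sparse as sps

# Same CSC construction as in the test:
# (data, row indices, column pointers), shape 3x1.
sv2 = sps.csc_matrix((np.array([1.0, 3.0]), np.array([0, 2]), np.array([0, 2])),
                     shape=(3, 1))

# Densify to see the full column vector: non-zeros 1.0 and 3.0
# sit at rows 0 and 2, matching the dense vector dv1.
dense = sv2.toarray().ravel()
print(dense)  # [1. 0. 3.]
```

This makes the correspondence explicit: the `(0, 0) 1.0` / `(2, 0) 3.0` lines in the result are just the non-zero entries of that 3x1 column.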

References:

https://edumine.wordpress.com/2015/08/14/pyspark-in-pycharm/
http://renien.github.io/blog/accessing-pyspark-pycharm/
http://www.tuicool.com/articles/MJnYJb
http://blog.csdn.net/u011196209/article/details/9934721
