Data processing power tools: Python vs. Scala object orientation compared, part 2 (big data ML sample set case study)

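Copyright notice: this technical column is a summary and distillation of the author's (秦凱新) everyday work. It draws cases from real commercial environments and shares them, together with tuning advice for commercial applications and cluster capacity planning. Please keep following this blog series. QQ mail: 1120746959@qq.com. For any academic exchange, feel free to get in touch at any time.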

1 Python vs. Scala object orientation compared

1.1 Object orientation in Scala

  • Defining a class with fields and methods

    class HelloWorld {
      private var name = "leo"
      def sayHello(): Unit = print("Hello, " + name)
      def getName = name
    }

    // Create an object of the class and call its methods
    val helloWorld = new HelloWorld

    helloWorld.sayHello()
    // A method defined without parentheses must also be called without them,
    // so print(helloWorld.getName()) does not compile
    print(helloWorld.getName)
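  • Getters and setters generated automatically by Scala

    1 For a var field declared without private, the JVM class that Scala generates
      contains a private field name plus a public getter and setter.

    2 If the field is declared private, the generated getter and setter are private too.

    3 If the field is a val, only a getter is generated.

    4 If you want no getter or setter at all, declare the field private[this].

    class Student {
      var name = "leo"
    }

    // The generated getter and setter are named name and name_= respectively
    val leo = new Student
    // getter
    print(leo.name)

    // setter, form 1: assignment syntax
    leo.name = "leo1"

    // setter, form 2: calling name_= directly
    scala> leo.name_=("leo2")
    scala> leo.name
    res3: String = leo2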
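  • Custom getters and setters

    If you only need simple getters and setters, follow Scala's rules and pick the
    right modifier for the field: var, val, private or private[this].

    If you want to control the getter and setter yourself, define them as methods.
    Mind Scala's syntax for a custom setter: its name is the field name followed
    directly by _=, written as one identifier with no spaces (name_=). The
    assignment form leo.name = ... only works when a getter named name exists as well.

    Here the underlying field is private, but the public getter and setter can be
    used from outside:
    class Student {
      private var myName = "leo"
      def name = "your name is " + myName
      def name_=(newValue: String): Unit = {
        print("you cannot edit your name!!!")
      }
    }

    scala> val leo = new Student
    leo: Student = Student@e36bc01

    scala> print(leo.name)
    your name is leo
    scala> leo.name = "leo1"
    you cannot edit your name!!!leo.name: String = your name is leo

    Once the field is private and no accessors are defined, it cannot be accessed
    from outside at all:
    class Student {
      private var myName = "leo"
    }

    // ss is an instance of this Student class (val ss = new Student)
    scala> ss.name
    <console>:26: error: value name is not a member of Student
           ss.name
              ^

    scala> ss.name_="asd"
    <console>:1: error: identifier expected but string literal found.
    ss.name_="asd"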
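For the Python side of the comparison, a rough counterpart to this getter/setter pair is the built-in `property`: a getter decorated with `@property` plus a setter registered with `@name.setter`. The sketch below mirrors the Scala `Student`. The backing attribute `_my_name` is an illustrative name. The setter only prints, rather than raising an error, so that it matches the Scala behavior.

```python
class Student:
    def __init__(self):
        self._my_name = "leo"  # backing attribute; the leading _ is only a convention

    @property
    def name(self):
        # getter: plays the role of Scala's def name
        return "your name is " + self._my_name

    @name.setter
    def name(self, new_value):
        # setter: plays the role of Scala's def name_=(newValue: String)
        print("you cannot edit your name!!!")


leo = Student()
print(leo.name)    # your name is leo
leo.name = "leo1"  # prints: you cannot edit your name!!!
print(leo.name)    # your name is leo
```

As in Scala, callers use plain attribute syntax, and the class can later swap a simple attribute for computed accessors without changing call sites.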
  • Java-style getters and setters

    Scala names its getter and setter field and field_=, which differs from Java.
    To make Scala also generate Java-style getters and setters, annotate the field
    with @BeanProperty. Four methods are then generated:
    name: String
    name_=(newValue: String): Unit
    getName(): String
    setName(newValue: String): Unit

    import scala.beans.BeanProperty  // older Scala versions used scala.reflect.BeanProperty

    class Student {
      @BeanProperty var name: String = _
    }

    // Alternatively, the annotation can go on a primary-constructor parameter:
    // class Student(@BeanProperty var name: String)

    val s = new Student
    s.setName("leo")
    s.getName()
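  • Auxiliary constructors

    In Scala a class can define several auxiliary constructors, similar to
    constructor overloading in Java. Auxiliary constructors can call one another.
    The first statement of each must call either the primary constructor or an
    auxiliary constructor defined before it.

    class Student {
      private var name = ""
      private var age = 0

      def this(name: String) = {
        this()          // calls the primary constructor
        this.name = name
      }
      def this(name: String, age: Int) = {
        this(name)      // calls the previous auxiliary constructor
        this.age = age
      }
    }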
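Python has no constructor overloading, because a class has a single `__init__`. A common way to get a similar effect is default argument values. These can be combined with `@classmethod` factories that delegate to `__init__`, much as auxiliary constructors delegate through `this(...)`. The factory names `from_name` and `from_name_and_age` are hypothetical, chosen only for illustration:

```python
class Student:
    def __init__(self, name="", age=0):
        # a single initializer; default values stand in for the "overloads"
        self.name = name
        self.age = age

    @classmethod
    def from_name(cls, name):
        # hypothetical named alternative constructor, delegating to __init__
        return cls(name)

    @classmethod
    def from_name_and_age(cls, name, age):
        return cls(name, age)


s0 = Student()
s1 = Student.from_name("leo")
s2 = Student.from_name_and_age("leo", 30)
print(s2.name, s2.age)  # leo 30
```

Default arguments cover most cases. Named factories help when two variants would otherwise take the same argument types.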
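  • The primary constructor (everything in the class body outside method definitions is executed)

    In Scala, unlike Java, the primary constructor is declared together with the
    class name. Every statement in the class body that is not inside a method
    definition is part of the primary constructor. This is arguably less clear than
    Java's explicit constructors.

    class Student(val name: String, val age: Int) {
      println("your name is " + name + ", your age is " + age)
    }

    The primary constructor can also give its parameters default values
    (shown as an alternative definition of the same class):
    class Student(val name: String = "leo", val age: Int = 30) {
      println("your name is " + name + ", your age is " + age)
    }

    Suppose a primary-constructor parameter has no modifier, e.g. name: String.
    If some method of the class uses it, it becomes a private[this] field name.
    Otherwise no field is created, and the parameter is only visible to the
    constructor code.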
  • Companion objects

    1 If a class and an object share the same name, the object is called the
      companion object of the class, and the class is the companion class of the object.
    2 A companion class and its companion object must be defined in the same .scala file.
    3 Their key property is that each can access the other's private members.

    object Person {
      private val eyeNum = 2
      def getEyeNum = eyeNum
    }

    class Person(val name: String, val age: Int) {
      def sayHello = println("Hi, " + name + ", I guess you are " + age + " years old!" + ", and usually you must have " + Person.eyeNum + " eyes.")
    }

    scala> val s = new Person("leo", 12)
    s: Person = Person@4d0abb23

    scala> s.sayHello
    Hi, leo, I guess you are 12 years old!, and usually you must have 2 eyes.
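  • The apply method

    1 apply is a very important special method of an object. Objects of a
      companion class are usually created with Class(...) rather than new Class(...).
      This implicitly calls the companion object's apply method and makes object
      creation more concise.

      For example, the apply method of Array's companion object accepts a
      variable number of arguments and creates an Array from them:
      val a = Array(1, 2, 3, 4, 5)

    2 Defining your own companion class and companion object lets you omit new:
      class Person(val name: String)
      object Person {
        def apply(name: String) = new Person(name)
      }
      scala> Person("xin")
      res7: Person = Person@484a5ddd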
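  • The main method

    In Scala the main method is declared as def main(args: Array[String]): Unit,
    and it must be defined inside an object.

    Instead of writing main yourself, you can extend the App trait. The code that
    main should run then goes directly into the object's body (its constructor
    code), and args still receives the command-line arguments.

    object HelloWorld extends App {
      if (args.length > 0) println("hello, " + args(0))
      else println("Hello World!!!")
    }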
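Python has no required entry-point method. The usual counterpart is a module-level `if __name__ == "__main__":` guard, with the command-line arguments available in `sys.argv`. The sketch below mirrors the `HelloWorld` object. Naming the function `main` is only a convention; Python does not require it.

```python
import sys


def main(args):
    # args plays the role of Scala's args: the command-line arguments
    if len(args) > 0:
        return "hello, " + args[0]
    return "Hello World!!!"


if __name__ == "__main__":
    # sys.argv[0] is the script path, so the real arguments start at index 1
    print(main(sys.argv[1:]))
```

Keeping the logic in a function, separate from the guard, lets the module be imported and tested without running the program.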
  • extends

    A subclass can override the fields and methods of its parent class. However, a
    class declared final cannot be extended, and a field or method declared final
    cannot be overridden.

    class Person {
      private var name = "leo"
      def getName = name
    }
    class Student extends Person {
      private var score = "A"
      def getScore = score
    }
  • override and super

    In Scala, a subclass that overrides a non-abstract method of its parent must use
    the override keyword. After overriding, the subclass can still call the parent's
    version explicitly with the super keyword.

    class Person {
      private var name = "leo"
      def getName = name
    }
    class Student extends Person {
      private var score = "A"
      def getScore = score
      override def getName = "Hi, I'm " + super.getName
    }
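In Python, overriding needs no `override` keyword. A subclass method with the same name simply replaces the parent's, and `super()` plays the role of Scala's `super`. The trade-off is that a misspelled method name silently defines a new method instead of producing a compile error. A rough Python counterpart to the Scala example:

```python
class Person:
    def __init__(self):
        self._name = "leo"

    def get_name(self):
        return self._name


class Student(Person):
    def __init__(self):
        super().__init__()  # run Person's initializer first
        self._score = "A"

    def get_score(self):
        return self._score

    def get_name(self):
        # no keyword needed to override; super() reaches the parent version
        return "Hi, I'm " + super().get_name()


print(Student().get_name())  # Hi, I'm leo
```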
  • Type checks

    isInstanceOf[T] is true for objects of T and of any subclass of T. Comparing
    getClass with classOf[T] instead checks for exactly T:

    class Person
    class Student extends Person

    scala> val p: Person = new Student
    p: Person = Student@683fac7e

    scala> p.isInstanceOf[Person]
    res10: Boolean = true

    scala> p.getClass == classOf[Person]
    res11: Boolean = false

    scala> p.getClass == classOf[Student]
    res12: Boolean = true
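Python draws the same distinction with two built-ins. `isinstance()` accepts subclasses, like `isInstanceOf`. `type(obj) is C` checks the exact class, like comparing `getClass` with `classOf`. A minimal sketch:

```python
class Person:
    pass


class Student(Person):
    pass


p = Student()
print(isinstance(p, Person))  # True: like isInstanceOf, subclasses count
print(type(p) is Person)      # False: exact class check, like getClass == classOf[Person]
print(type(p) is Student)     # True
```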
  • Type checks with pattern matching

    Functionally, pattern matching behaves like isInstanceOf. A case such as
    per: Person matches objects of the class and of its subclasses, so it is not
    an exact class check.

    class Person
    class Student extends Person
    val p: Person = new Student

    scala> p match {
         |   case per: Person => println("it's Person's object")
         |   case _           => println("unknown type")
         | }
    it's Person's object
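  • Calling the superclass constructor

    1 In Scala, a class has one primary constructor and any number of auxiliary
      constructors. The first statement of every auxiliary constructor must call
      another auxiliary constructor or the primary constructor. A subclass's auxiliary
      constructor therefore can never call the superclass constructor directly.

    2 Only the subclass's primary constructor can call the superclass constructor.
      In the syntax below, the subclass's primary constructor passes its arguments
      on to the superclass constructor.

    3 Note: parameters that are only passed on to the superclass, such as name and
      age, must not be marked val or var in the subclass. Otherwise Scala treats
      them as attempts to override the parent's fields.

    class Person(val name: String, val age: Int)
    class Student(name: String, age: Int, var score: Double) extends Person(name, age) {
      def this(name: String) = {
        this(name, 0, 0)
      }
      def this(age: Int) = {
        this("leo", age, 0)
      }
    }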
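  • Using traits as interfaces

    Traits are a special concept in Scala. First of all, a trait can be used as an
    interface, in which case it is very similar to a Java interface. A trait can
    declare abstract methods, just like an abstract class: simply omit the
    method body.

    A class inherits a trait with the extends keyword. Note that it is extends, not
    implements: Scala has no implements, and extends is used for both classes and traits.

    A class that extends a trait must implement its abstract methods, and the
    override keyword is not needed for that. Scala does not support multiple
    inheritance of classes, but a class can mix in several traits with the with keyword.

    trait HelloTrait {
      def sayHello(name: String): Unit
    }
    trait MakeFriendsTrait {
      def makeFriends(p: Person): Unit
    }
    class Person(val name: String) extends HelloTrait with MakeFriendsTrait with Cloneable with Serializable {
      def sayHello(name: String) = println("Hello, " + name)
      def makeFriends(p: Person) = println("Hello, my name is " + name + ", your name is " + p.name)
    }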
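Python has no separate trait construct. The closest analogue to a trait used as an interface is an abstract base class from the standard `abc` module. Because Python allows multiple inheritance of classes, the `with` chain becomes a plain list of base classes. A sketch mirroring the Scala example, with method names converted to snake_case. Unlike Scala, a missing implementation is reported only when you try to instantiate the class, as a `TypeError`, not at compile time.

```python
from abc import ABC, abstractmethod


class HelloTrait(ABC):
    @abstractmethod
    def say_hello(self, name):
        """Abstract: subclasses must implement this."""


class MakeFriendsTrait(ABC):
    @abstractmethod
    def make_friends(self, p):
        """Abstract: subclasses must implement this."""


class Person(HelloTrait, MakeFriendsTrait):
    def __init__(self, name):
        self.name = name

    def say_hello(self, name):
        print("Hello, " + name)

    def make_friends(self, p):
        print("Hello, my name is " + self.name + ", your name is " + p.name)


Person("leo").make_friends(Person("jack"))
# prints: Hello, my name is leo, your name is jack
```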
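  • Defining concrete fields in a trait

    A trait can define concrete fields, and a class that extends the trait gets
    them automatically. This works differently from inheriting a field from a class.
    A field inherited from a class is actually defined in the parent class, whereas
    a field obtained from a trait is added directly to the class itself.

    trait Person {
      val eyeNum: Int = 2
    }

    class Student(val name: String) extends Person {
      def sayHello = println("Hi, I'm " + name + ", I have " + eyeNum + " eyes.")
    }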
  • Defining abstract fields in a trait

    A trait can declare abstract fields, and its concrete methods can be written in
    terms of them. A class that extends the trait must override the abstract fields
    and provide concrete values.

    trait SayHello {
      val msg: String
      def sayHello(name: String) = println(msg + ", " + name)
    }

    class Person(val name: String) extends SayHello {
      val msg: String = "hello"
      def makeFriends(p: Person): Unit = {
        sayHello(p.name)
        println("I'm " + name + ", I want to make friends with you!")
      }
    }
  • Mixing a trait into a single instance (an instance created without with MyLogger keeps the no-op log)

    trait Logged {
      def log(msg: String): Unit = {}
    }
    trait MyLogger extends Logged {
      override def log(msg: String): Unit = println("log: " + msg)
    }
    class Person(val name: String) extends Logged {
      def sayHello: Unit = {
        println("Hi, I'm " + name)
        log("sayHello is invoked!")
      }
    }

    scala> val p1 = new Person("leo")
    p1: Person = Person@75b3ef1a

    scala> p1.sayHello
    Hi, I'm leo

    scala> val p2 = new Person("jack") with MyLogger
    p2: Person with MyLogger = $anon$1@703eead0

    scala> p2.sayHello
    Hi, I'm jack
    log: sayHello is invoked!
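Python has no syntax like `new Person("jack") with MyLogger` for mixing a class into a single object. A similar effect can be sketched by building an anonymous subclass at runtime with the built-in three-argument `type()`. The helper `with_mixin` below is hypothetical, not a standard function. The mixin is listed first among the bases so that its `log` takes precedence, which corresponds to `MyLogger` sitting to the right in the Scala expression.

```python
class Logged:
    def log(self, msg):
        pass  # no-op default, like Logged in the Scala example


class MyLogger(Logged):
    def log(self, msg):
        print("log: " + msg)


class Person(Logged):
    def __init__(self, name):
        self.name = name

    def say_hello(self):
        print("Hi, I'm " + self.name)
        self.log("sayHello is invoked!")


def with_mixin(cls, mixin):
    # hypothetical helper: create a subclass of cls that puts mixin first in the MRO
    return type(cls.__name__ + "With" + mixin.__name__, (mixin, cls), {})


p1 = Person("leo")
p1.say_hello()   # Hi, I'm leo
p2 = with_mixin(Person, MyLogger)("jack")
p2.say_hello()   # Hi, I'm jack / log: sayHello is invoked!
```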
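  • Trait call chains

    Scala lets a class mix in several traits and call the same method of each in
    turn. Each trait's version of the method just calls super.method at the end.

    When the class calls that method, execution starts with the method of the
    rightmost trait and proceeds leftwards, forming a chain of calls.

    This is a very powerful feature. It is essentially one concrete implementation
    of the Chain of Responsibility design pattern.
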
     trait Handler {
       def handle(data: String) {}
     }
     
     trait DataValidHandler2 extends Handler {
       override def handle(data: String) {
         println("check data2: " + data)
         super.handle(data)
       } 
     }
     
     
     trait DataValidHandler1 extends Handler {
       override def handle(data: String) {
         println("check data1: " + data)
         super.handle(data)
       } 
     }
     trait SignatureValidHandler extends Handler {
       override def handle(data: String) {
         println("check signature: " + data)
         super.handle(data)
       }
     }
     class Person(val name: String) extends SignatureValidHandler with DataValidHandler1 with DataValidHandler2 {
       def sayHello = { println("Hello, " + name); handle(name) }
     }
     
     scala> val p = new Person("person")
     p: Person = Person@7a85454b
     
     scala> p.sayHello
     Hello, person
     check data2: person
     check data1: person
     check signature: person
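Python's counterpart to this chain is cooperative multiple inheritance. Each class calls `super().handle(...)`, and the call travels along the method resolution order (MRO). Note that Python's MRO starts from the leftmost base class, so the Scala order has to be written in reverse to get the same output. A sketch that records the calls in a list:

```python
trace = []  # records the order in which the handlers run


class Handler:
    def handle(self, data):
        pass  # end of the chain, like Handler.handle in the Scala example


class SignatureValidHandler(Handler):
    def handle(self, data):
        trace.append("check signature: " + data)
        super().handle(data)  # pass control to the next class in the MRO


class DataValidHandler1(Handler):
    def handle(self, data):
        trace.append("check data1: " + data)
        super().handle(data)


class DataValidHandler2(Handler):
    def handle(self, data):
        trace.append("check data2: " + data)
        super().handle(data)


# Python runs bases left to right, so the Scala order
# "SignatureValidHandler with DataValidHandler1 with DataValidHandler2"
# is written here in reverse.
class Person(DataValidHandler2, DataValidHandler1, SignatureValidHandler):
    def __init__(self, name):
        self.name = name

    def say_hello(self):
        print("Hello, " + self.name)
        self.handle(self.name)


Person("person").say_hello()
print(trace)
# ['check data2: person', 'check data1: person', 'check signature: person']
```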
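  • Overriding abstract methods in a trait

    A trait can override an abstract method of its parent trait. However, if the
    override calls super.method, it does not compile. super.method refers to the
    parent trait's abstract method, so the child trait's method is still considered
    abstract. To make it compile, mark the child trait's method abstract override.
    Such a trait can then only be mixed in after a class or trait that provides a
    concrete implementation of the method.

    trait Logger {
      def log(msg: String): Unit
    }

    trait MyLogger extends Logger {
      abstract override def log(msg: String): Unit = super.log(msg)
    }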
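  • Trait construction order (the superclass first, then the traits from left to right, with a shared parent trait such as Logger constructed only once, and the class body last)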

    class Person { println("Person's constructor!") }
     trait Logger { println("Logger's constructor!") }
     trait MyLogger extends Logger { println("MyLogger's constructor!") }
     trait TimeLogger extends Logger { println("TimeLogger's constructor!") }
     
     class Student extends Person with MyLogger with TimeLogger {
       println("Student's constructor!")
     }
     
     scala> val s = new Student
     Person's constructor!
     Logger's constructor!
     MyLogger's constructor!
     TimeLogger's constructor!
     Student's constructor!
     s: Student = Student@34a99d8

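1.2 Object orientation in Python

  • Defining a class and creating instances

    By Python convention, class names start with a capital letter and are followed
    by (object), which names the class being inherited from. In Python 3 every class
    inherits from object implicitly, so (object) may be omitted.
    class Person(object):
        pass

    Instances are created without new:
    xiaoming = Person()
    xiaohong = Person()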
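  • Because Python is a dynamic language, you can assign attributes to any instance directly, adding new attributes on the fly

    xiaoming = Person()
    xiaoming.name = 'Xiao Ming'
    xiaoming.gender = 'Male'
    xiaoming.birth = '1990-1-1'

    print(xiaoming.name)
    Xiao Ming

    # Reading an attribute that was never assigned raises AttributeError,
    # so xiaohong needs a grade before it can be incremented
    xiaohong.grade = 1
    xiaohong.grade = xiaohong.grade + 1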
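  • Initializing instance attributes

    The first parameter of __init__() must be self. Another name works, but the
    convention is strongly recommended. The remaining parameters are free, exactly
    as in an ordinary function definition. Correspondingly, creating an instance
    requires all arguments except self.

    class Person(object):
        def __init__(self, name, gender, birth):
            self.name = name
            self.gender = gender
            self.birth = birth

    xiaoming = Person('Xiao Ming', 'Male', '1991-1-1')
    xiaohong = Person('Xiao Hong', 'Female', '1992-2-2')

    print(xiaoming.name)
    Xiao Ming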
  • Restricting access to instance attributes

    Python controls attribute access through attribute names. An attribute whose
    name starts with a double underscore (__) cannot be accessed from outside the
    class under that name.

    class Person(object):
        def __init__(self, name):
            self.name = name
            self._title = 'Mr'
            self.__job = 'Student'

    p = Person('Bob')
    print(p.name)
    # => Bob
    print(p._title)
    # => Mr
    print(p.__job)

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-28-13a5b5479af8> in <module>()
    ----> 1 print(p.__job)

    AttributeError: 'Person' object has no attribute '__job'
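This restriction is really name mangling rather than true privacy. Inside a class body, an attribute name such as `__job` (two leading underscores, at most one trailing) is rewritten to `_Person__job`. So `p.__job` fails, but the mangled name still works. A quick check:

```python
class Person(object):
    def __init__(self, name):
        self.name = name
        self._title = 'Mr'      # single underscore: "internal" by convention only
        self.__job = 'Student'  # double underscore: stored as _Person__job


p = Person('Bob')
print(p._title)                  # Mr
print(p._Person__job)            # Student: the mangled name is still reachable
print('__job' in vars(p))        # False
print('_Person__job' in vars(p)) # True
```

The purpose of mangling is mainly to avoid accidental name clashes with subclasses, not to enforce access control the way Scala's private does.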
  • Creating class attributes

    A class is itself an object. An attribute bound to the class can be accessed by
    every instance.
    class Person(object):
        address = 'Earth'
        def __init__(self, name):
            self.name = name

    print(Person.address)
    Earth

    p1 = Person('Bob')
    p2 = Person('Alice')
    print(p1.address)
    # => Earth
    print(p2.address)
    # => Earth

    Because Python is dynamic, class attributes can also be added and modified at runtime:
    Person.address = 'China'
    print(p1.address)
    # => China
    print(p2.address)
    # => China

    What if a class attribute and an instance attribute have the same name? The
    instance attribute takes precedence and hides the class attribute.
    class Person(object):
        address = 'Earth'
        def __init__(self, name):
            self.name = name

    p1 = Person('Bob')
    p2 = Person('Alice')

    print('Person.address = ' + Person.address)
    # => Person.address = Earth

    p1.address = 'China'
    print('p1.address = ' + p1.address)
    # => p1.address = China

    print('p2.address = ' + p2.address)
    # => p2.address = Earth
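Attribute lookup checks the instance first and then the class, which is why the instance attribute wins. Assigning through an instance never changes the class attribute. Deleting the instance attribute makes the class attribute visible again:

```python
class Person(object):
    address = 'Earth'

    def __init__(self, name):
        self.name = name


p1 = Person('Bob')
p1.address = 'China'   # creates an instance attribute that shadows the class attribute
print(p1.address)      # China
print(Person.address)  # Earth: the class attribute itself is unchanged

del p1.address         # remove the instance attribute
print(p1.address)      # Earth: lookup falls back to the class attribute again
```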
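  • Defining instance methods

    A private instance attribute is one whose name starts with __, and it cannot be
    accessed from outside. Instance methods are functions defined in the class body,
    and their first parameter is always self. Although __name is private, the
    get_name method can still be called from outside, and it returns the value.

    class Person(object):

        def __init__(self, name):
            self.__name = name

        def get_name(self):
            return self.__name

    p1 = Person('Bob')
    print(p1.get_name())
    Bob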
  • Attaching a method to an instance at runtime

    import types

    def fn_get_grade(self):
        if self.score >= 80:
            return 'A'
        if self.score >= 60:
            return 'B'
        return 'C'

    class Person(object):
        def __init__(self, name, score):
            self.name = name
            self.score = score

    p1 = Person('Bob', 90)
    p1.get_grade = types.MethodType(fn_get_grade, p1)
    print(p1.get_grade())
    A
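`types.MethodType` binds the function to one instance only, so other `Person` objects do not get `get_grade`. To add the method to every instance, existing and future, you can assign the plain function to the class. Python then binds `self` automatically whenever the attribute is looked up through an instance:

```python
def fn_get_grade(self):
    if self.score >= 80:
        return 'A'
    if self.score >= 60:
        return 'B'
    return 'C'


class Person(object):
    def __init__(self, name, score):
        self.name = name
        self.score = score


p1 = Person('Bob', 90)
p2 = Person('Alice', 65)

# Assigning a plain function to the class makes it a method for every instance,
# including instances created before the assignment.
Person.get_grade = fn_get_grade
print(p1.get_grade())  # A
print(p2.get_grade())  # B
```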
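  • Defining class methods

    A method decorated with @classmethod receives the class itself as its first
    argument (conventionally named cls) instead of an instance.

    class Person(object):
        count = 0

        @classmethod
        def how_many(cls):
            return cls.count

        def __init__(self, name):
            self.name = name
            Person.count = Person.count + 1

    print(Person.how_many())
    0

    p1 = Person('Bob')
    print(Person.how_many())
    1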
  • Inheriting from a class

    class Person(object):
        def __init__(self, name, gender):
            self.name = name
            self.gender = gender

    class Student(Person):
        def __init__(self, name, gender, score):
            super(Student, self).__init__(name, gender)
            self.score = score

    class Teacher(Person):
        def __init__(self, name, gender, course):
            super(Teacher, self).__init__(name, gender)
            self.course = course

    p = Person('Tim', 'Male')
    s = Student('Bob', 'Male', 88)
    t = Teacher('Alice', 'Female', 'English')

    isinstance(s, Student)
    True

    isinstance(p, Person)
    True

    isinstance(t, Student)
    False
  • Special methods

    1 __str__ and __repr__
    class Person(object):
        def __init__(self, name, gender):
            self.name = name
            self.gender = gender
        def __str__(self):
            return '(Person: %s, %s)' % (self.name, self.gender)

    p = Person('Bob', 'male')
    print(p)
    (Person: Bob, male)

    2 Sorting: in Python 2, sorted() compared objects with the default comparison
      function cmp, which a class could customize through __cmp__. Python 3 removed
      cmp and __cmp__. sorted() now compares elements with <, so the class defines __lt__.

    class Student(object):
        def __init__(self, name, score):
            self.name = name
            self.score = score
        def __str__(self):
            return '(%s: %s)' % (self.name, self.score)
        __repr__ = __str__

        def __lt__(self, s):
            # order students by name
            return self.name < s.name

    L = [Student('Tim', 99), Student('Bob', 88), Student('Alice', 77)]
    print(L)
    [(Tim: 99), (Bob: 88), (Alice: 77)]
    print(sorted(L))
    [(Alice: 77), (Bob: 88), (Tim: 99)]

    3 __len__
    class Students(object):
        def __init__(self, *args):
            self.names = args
        def __len__(self):
            return len(self.names)

    ss = Students('Bob', 'Alice', 'Tim')
    print(len(ss))
    3
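When only a particular sort order is needed, Python 3's `sorted()` also accepts a `key` function and a `reverse` flag. These often make comparison methods on the class unnecessary. A minimal sketch:

```python
class Student(object):
    def __init__(self, name, score):
        self.name = name
        self.score = score

    def __repr__(self):
        return '(%s: %s)' % (self.name, self.score)


L = [Student('Tim', 99), Student('Bob', 88), Student('Alice', 77)]

# A key function supplies the sort value, so Student needs no __lt__ at all.
by_name = sorted(L, key=lambda s: s.name)
by_score_desc = sorted(L, key=lambda s: s.score, reverse=True)
print(by_name)        # [(Alice: 77), (Bob: 88), (Tim: 99)]
print(by_score_desc)  # [(Tim: 99), (Bob: 88), (Alice: 77)]
```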
  • In Python, functions are actually objects

    class Person(object):
        def __init__(self, name, gender):
            self.name = name
            self.gender = gender

        def __call__(self, friend):
            print('My name is %s...' % self.name)
            print('My friend is %s...' % friend)

    p = Person('Bob', 'male')
    p('Tim')

    My name is Bob...
    My friend is Tim...
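The heading's claim can be checked directly. A function is an ordinary object of type `function` that can be assigned to a variable. Any object whose class defines `__call__` can be called in the same way. The class name `Greeter` below is illustrative:

```python
def greet(friend):
    return 'My friend is %s...' % friend


class Greeter(object):
    def __call__(self, friend):
        return 'My friend is %s...' % friend


f = greet      # a function can be assigned to a variable like any other object
g = Greeter()  # an instance whose class defines __call__

print(type(f).__name__)          # function
print(callable(f), callable(g))  # True True
print(f('Tim') == g('Tim'))      # True: both are called the same way
```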

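2 Summary

Integrating the Python tech stack with the Spark big data platform calls for a detailed comparison like this one. The article was written up roughly, mainly for the author's own review, so please bear with it.

秦凱新, Shenzhen, 2018-12-13 23:19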
