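When writing Spark jobs with the Java API, if the key of a PairRDD is a custom type, that type must override both hashCode and equals. Otherwise, keys that hold the same values will not be aggregated together.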
For example, using a User type as the key:
    public class User {
        private String name;
        private String age;

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public String getAge() {
            return age;
        }

        public void setAge(String age) {
            this.age = age;
        }

        // Hash is computed from the same fields that equals compares.
        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result + ((age == null) ? 0 : age.hashCode());
            result = prime * result + ((name == null) ? 0 : name.hashCode());
            return result;
        }

        // Two User objects are equal when both age and name match (null-safe).
        @Override
        public boolean equals(Object obj) {
            if (this == obj)
                return true;
            if (obj == null)
                return false;
            if (getClass() != obj.getClass())
                return false;
            User other = (User) obj;
            if (age == null) {
                if (other.age != null)
                    return false;
            } else if (!age.equals(other.age))
                return false;
            if (name == null) {
                if (other.name != null)
                    return false;
            } else if (!name.equals(other.name))
                return false;
            return true;
        }
    }
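Eclipse can usually generate the hashCode and equals methods for a type automatically, so no special handling is needed.
For special cases, the HashCodeBuilder and EqualsBuilder utility classes from the commons-lang3 package can be used to implement the two methods: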
    package run.aaa.spark;

    import org.apache.commons.lang3.builder.EqualsBuilder;
    import org.apache.commons.lang3.builder.HashCodeBuilder;

    public class User {
        private String name;
        private String age;

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public String getAge() {
            return age;
        }

        public void setAge(String age) {
            this.age = age;
        }

        // Builds the hash from the object's fields via reflection.
        @Override
        public int hashCode() {
            return HashCodeBuilder.reflectionHashCode(this);
        }

        // Compares the two objects field by field via reflection.
        @Override
        public boolean equals(Object obj) {
            return EqualsBuilder.reflectionEquals(this, obj);
        }
    }
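To see why the overrides matter, here is a minimal sketch that uses a plain java.util.HashMap as a local stand-in for key-based aggregation. It does not use Spark at all, and it only illustrates the general hashCode/equals behavior that key grouping depends on. The names KeyAggregationDemo, PlainKey, and ValueKey are hypothetical and exist only for this illustration; ValueKey uses java.util.Objects instead of the hand-written or commons-lang3 versions shown above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class KeyAggregationDemo {

    // Hypothetical key WITHOUT overrides: inherits identity-based
    // equals/hashCode from Object.
    static final class PlainKey {
        final String name;
        final String age;

        PlainKey(String name, String age) {
            this.name = name;
            this.age = age;
        }
    }

    // Hypothetical key WITH value-based equals/hashCode.
    static final class ValueKey {
        final String name;
        final String age;

        ValueKey(String name, String age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, age);
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof ValueKey)) return false;
            ValueKey other = (ValueKey) obj;
            return Objects.equals(name, other.name)
                    && Objects.equals(age, other.age);
        }
    }

    // Counts two records whose keys carry identical field values.
    // The two key objects are never equal, so they stay separate entries.
    static int distinctPlainKeys() {
        Map<PlainKey, Integer> counts = new HashMap<>();
        counts.merge(new PlainKey("tom", "20"), 1, Integer::sum);
        counts.merge(new PlainKey("tom", "20"), 1, Integer::sum);
        return counts.size();
    }

    // Same records, but the keys compare by value, so the counts are merged.
    static int distinctValueKeys() {
        Map<ValueKey, Integer> counts = new HashMap<>();
        counts.merge(new ValueKey("tom", "20"), 1, Integer::sum);
        counts.merge(new ValueKey("tom", "20"), 1, Integer::sum);
        return counts.size();
    }

    public static void main(String[] args) {
        System.out.println("without overrides: " + distinctPlainKeys() + " keys");
        System.out.println("with overrides:    " + distinctValueKeys() + " keys");
    }
}
```

Without the overrides the map ends up with two entries for what looks like the same key; with them, the two records collapse into one entry. This is the same symptom described at the top of this post for a PairRDD with a custom key type.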