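I read a table and wanted to apply a binarization feature transform to it. Binarization requires the input column to be of type double, so how do you convert the column's type?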
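To make the requirement concrete, here is a minimal plain-Java sketch of what binarization computes. It is not the Spark API: the class name `BinarizeSketch`, the `binarize` helper, and the threshold of 30.0 are all made up for illustration. It assumes the usual convention (which, as far as I know, Spark's Binarizer also follows) that values strictly greater than the threshold become 1.0 and everything else becomes 0.0.

```java
// Plain-Java illustration only; hypothetical names, not the Spark API.
final class BinarizeSketch {

    // Values strictly greater than the threshold map to 1.0, all others to 0.0.
    static double binarize(double value, double threshold) {
        return value > threshold ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        long age = 42L;                    // integer-typed value, like a raw age column
        double ageAsDouble = (double) age; // the widening that a column cast performs per row
        double threshold = 30.0;           // hypothetical threshold
        System.out.println(binarize(ageAsDouble, threshold)); // prints 1.0
    }
}
```

The comparison works on doubles, which is why an integer-typed column has to be widened first.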
You can do the conversion directly with a Spark Column:
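DataFrame dataset = hive.sql("select age,sex,race from hive_race_sex_bucktizer ");
/**
 * Type conversion: cast age to double and keep the other columns as they are.
 * Assumes a static import of org.apache.spark.sql.types.DataTypes.DoubleType.
 */
dataset = dataset.select(dataset.col("age").cast(DoubleType).as("age"), dataset.col("sex"), dataset.col("race"));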
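Simple, isn't it? Compare that with how I used to convert types: iterate over the rows to build another RDD that satisfies the type requirements, then create a DataFrame from that RDD. It was so complicated!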
JavaRDD<Row> parseDataset = dataset.toJavaRDD().map(new Function<Row, Row>() {
    @Override
    public Row call(Row row) throws Exception {
        System.out.println(row);
        long age = row.getLong(row.fieldIndex("age"));
        String sex = row.getAs("sex");
        String race = row.getAs("race");
        // Encode race as a numeric code; unrecognized values end up as 0.
        double raceV = -1;
        if ("white".equalsIgnoreCase(race)) {
            raceV = 1;
        } else if ("black".equalsIgnoreCase(race)) {
            raceV = 2;
        } else if ("yellow".equalsIgnoreCase(race)) {
            raceV = 3;
        } else if ("Asian-Pac-Islander".equalsIgnoreCase(race)) {
            raceV = 4;
        } else if ("Amer-Indian-Eskimo".equalsIgnoreCase(race)) {
            raceV = 3; // note: same code as "yellow"
        } else {
            raceV = 0;
        }
        // Encode sex as 1 for male, 0 otherwise.
        return RowFactory.create(age, ("male".equalsIgnoreCase(sex) ? 1 : 0), raceV);
    }
});
// The schema must match the order and types of the values passed to RowFactory.create.
StructType schema = new StructType(new StructField[]{
    createStructField("_age", LongType, false),
    createStructField("_sex", IntegerType, false),
    createStructField("_race", DoubleType, false)
});
DataFrame df = hive.createDataFrame(parseDataset, schema);
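If you do end up encoding values by hand like this, the if/else chain could also be written as a lookup table. The following is only a sketch in plain Java, independent of Spark: the class name `RaceCodeSketch` and the helpers `raceCode` and `sexCode` are hypothetical, and the codes are copied from the chain above, including the shared code 3.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; a sketch of the same encoding as a lookup table.
final class RaceCodeSketch {

    // Case-insensitive keys, mirroring the equalsIgnoreCase checks in the original chain.
    private static final Map<String, Double> RACE_CODES =
            new TreeMap<String, Double>(String.CASE_INSENSITIVE_ORDER);

    static {
        RACE_CODES.put("white", 1.0);
        RACE_CODES.put("black", 2.0);
        RACE_CODES.put("yellow", 3.0);
        RACE_CODES.put("Asian-Pac-Islander", 4.0);
        RACE_CODES.put("Amer-Indian-Eskimo", 3.0); // same code as "yellow", as in the original
    }

    // Unknown or null values fall back to 0.0, like the final else branch.
    static double raceCode(String race) {
        if (race == null) {
            return 0.0; // the case-insensitive comparator cannot handle null keys
        }
        Double code = RACE_CODES.get(race);
        return code == null ? 0.0 : code;
    }

    // 1 for "male" (any letter case), 0 for everything else.
    static int sexCode(String sex) {
        return "male".equalsIgnoreCase(sex) ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(raceCode("White"));  // prints 1.0
        System.out.println(raceCode("other"));  // prints 0.0
        System.out.println(sexCode("Male"));    // prints 1
    }
}
```

Inside the `call` method, the whole chain would then reduce to a single `raceCode(race)` call, and adding a category would mean adding one `put` line.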
Keep exploring, keep trying!