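The previous post covered the life cycle of a partition in the Hive metastore but only skimmed over alter_partition. This post fills that gap, because as the project went deeper it turned out that alter_partition is involved in many places: for example, INSERT INTO calls it when the partition path already exists, and INSERT OVERWRITE calls it as well.
The entry point is still the Hive.java class: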
    public void alterPartition(String dbName, String tblName, Partition newPart)
        throws InvalidOperationException, HiveException {
      try {
        // Remove the DDL time so that it gets refreshed
        if (newPart.getParameters() != null) {
          newPart.getParameters().remove(hive_metastoreConstants.DDL_TIME);
        }
        newPart.checkValidity();
        getMSC().alter_partition(dbName, tblName, newPart.getTPartition());

      } catch (MetaException e) {
        throw new HiveException("Unable to alter partition. " + e.getMessage(), e);
      } catch (TException e) {
        throw new HiveException("Unable to alter partition. " + e.getMessage(), e);
      }
    }
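The client removes the DDL time here, and the handler code later in this post stamps it again when it is missing or zero. Below is a minimal sketch of that round trip using a plain map in place of the partition parameters. The class and method names are hypothetical, and the key string is assumed to match the value of hive_metastoreConstants.DDL_TIME. Unlike the Hive code, the sketch parses the value with Long.parseLong rather than Integer.parseInt.

```java
import java.util.HashMap;
import java.util.Map;

class DdlTimeSketch {
    // Assumption: same string as hive_metastoreConstants.DDL_TIME.
    static final String DDL_TIME = "transient_lastDdlTime";

    /** Client side: drop the stale DDL time so the server refreshes it. */
    static void clearDdlTime(Map<String, String> params) {
        if (params != null) {
            params.remove(DDL_TIME);
        }
    }

    /** Server side: stamp the DDL time (in seconds) when it is missing or zero. */
    static Map<String, String> ensureDdlTime(Map<String, String> params, long nowSeconds) {
        Map<String, String> result = (params == null) ? new HashMap<String, String>() : params;
        String current = result.get(DDL_TIME);
        if (current == null || Long.parseLong(current) == 0L) {
            result.put(DDL_TIME, Long.toString(nowSeconds));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<String, String>();
        params.put(DDL_TIME, "1505088000");
        clearDdlTime(params);                                      // what Hive.alterPartition does
        ensureDdlTime(params, System.currentTimeMillis() / 1000);  // what the handler does
        System.out.println(params);
    }
}
```

An existing non-zero value is left alone by ensureDdlTime, which is why the client has to remove it first if the time is supposed to be refreshed.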
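HiveMetaStoreClient then sends the alter_partition request to the server, passing the new partition among the arguments, and the server in turn calls rename_partition. I won't go through that again since the previous post covered it in outline; here we start directly from alterHandler.alterPartition, where the partition is actually changed.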
    public Partition alterPartition(final RawStore msdb, Warehouse wh, final String dbname,
        final String name, final List<String> part_vals, final Partition new_part)
        throws InvalidOperationException, InvalidObjectException, AlreadyExistsException,
        MetaException {
      boolean success = false;

      Path srcPath = null;
      Path destPath = null;
      FileSystem srcFs = null;
      FileSystem destFs = null;
      Partition oldPart = null;
      String oldPartLoc = null;
      String newPartLoc = null;

      // Set DDL time to now if not specified
      if (new_part.getParameters() == null ||
          new_part.getParameters().get(hive_metastoreConstants.DDL_TIME) == null ||
          Integer.parseInt(new_part.getParameters().get(hive_metastoreConstants.DDL_TIME)) == 0) {
        new_part.putToParameters(hive_metastoreConstants.DDL_TIME, Long.toString(System
            .currentTimeMillis() / 1000));
      }

      Table tbl = msdb.getTable(dbname, name);
      //alter partition
      if (part_vals == null || part_vals.size() == 0) {
        try {
          oldPart = msdb.getPartition(dbname, name, new_part.getValues());
          if (MetaStoreUtils.requireCalStats(hiveConf, oldPart, new_part, tbl)) {
            MetaStoreUtils.updatePartitionStatsFast(new_part, wh, false, true);
          }
          updatePartColumnStats(msdb, dbname, name, new_part.getValues(), new_part);
          msdb.alterPartition(dbname, name, new_part.getValues(), new_part);
        } catch (InvalidObjectException e) {
          throw new InvalidOperationException("alter is not possible");
        } catch (NoSuchObjectException e){
          //old partition does not exist
          throw new InvalidOperationException("alter is not possible");
        }
        return oldPart;
      }
。。。。。。
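From the code we can see:
1. Table tbl = msdb.getTable(dbname, name); fetches the object that wraps all of the table's metadata.
2. Next, oldPart = msdb.getPartition(dbname, name, new_part.getValues()); looks up the partition's metadata by dbName, tableName and values, where the values are the partition values carried by the new partition, e.g. (2017-09-11). MetaStoreUtils.requireCalStats(hiveConf, oldPart, new_part, tbl) then decides whether the partition's statistics need to be recalculated, and if so updatePartitionStatsFast refreshes them. (I won't go into detail here, because I don't yet know what the StatsSetupConst settings inside it are for, hahaha, awkward. One step at a time.)
3. Then updatePartColumnStats is called to bring the partition's column statistics up to date. Let's go through it step by step; the code is as follows: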
    private void updatePartColumnStats(RawStore msdb, String dbName, String tableName,
        List<String> partVals, Partition newPart) throws MetaException, InvalidObjectException {
      dbName = HiveStringUtils.normalizeIdentifier(dbName);
      tableName = HiveStringUtils.normalizeIdentifier(tableName);
      String newDbName = HiveStringUtils.normalizeIdentifier(newPart.getDbName());
      String newTableName = HiveStringUtils.normalizeIdentifier(newPart.getTableName());

      Table oldTable = msdb.getTable(dbName, tableName);
      if (oldTable == null) {
        return;
      }

      try {
        String oldPartName = Warehouse.makePartName(oldTable.getPartitionKeys(), partVals);
        String newPartName = Warehouse.makePartName(oldTable.getPartitionKeys(), newPart.getValues());
        if (!dbName.equals(newDbName) || !tableName.equals(newTableName)
            || !oldPartName.equals(newPartName)) {
          msdb.deletePartitionColumnStatistics(dbName, tableName, oldPartName, partVals, null);
        } else {
          Partition oldPartition = msdb.getPartition(dbName, tableName, partVals);
          if (oldPartition == null) {
            return;
          }
          if (oldPartition.getSd() != null && newPart.getSd() != null) {
            List<FieldSchema> oldCols = oldPartition.getSd().getCols();
            if (!MetaStoreUtils.areSameColumns(oldCols, newPart.getSd().getCols())) {
              updatePartColumnStatsForAlterColumns(msdb, oldPartition, oldPartName, partVals, oldCols, newPart);
            }
          }
        }
      } catch (NoSuchObjectException nsoe) {
        LOG.debug("Could not find db entry." + nsoe);
        //ignore
      } catch (InvalidInputException iie) {
        throw new InvalidObjectException("Invalid input to update partition column stats." + iie);
      }
    }
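The branching in the method above can be sketched on its own: build the old and new partition names, then either drop every column statistic of the old partition or fall through to a column-by-column comparison. This is only an illustration under simplifying assumptions. The class, enum and method names are hypothetical, and the simplified makePartName does no escaping of special characters, which the real Warehouse.makePartName may do.

```java
import java.util.Arrays;
import java.util.List;

class PartNameSketch {
    enum Action { DROP_ALL_COLUMN_STATS, COMPARE_COLUMNS }

    /** Joins partition keys and values as key=value segments separated by '/'. */
    static String makePartName(List<String> keys, List<String> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException("keys and values differ in length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < keys.size(); i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(keys.get(i)).append('=').append(values.get(i));
        }
        return sb.toString();
    }

    /** Any difference in db, table or partition name invalidates all column stats. */
    static Action decide(String oldDb, String oldTable, String oldPartName,
                         String newDb, String newTable, String newPartName) {
        boolean identityChanged = !oldDb.equals(newDb)
            || !oldTable.equals(newTable)
            || !oldPartName.equals(newPartName);
        return identityChanged ? Action.DROP_ALL_COLUMN_STATS : Action.COMPARE_COLUMNS;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("dt", "hour");
        String oldName = makePartName(keys, Arrays.asList("2017-09-11", "1"));
        String newName = makePartName(keys, Arrays.asList("2017-09-11", "2"));
        System.out.println(oldName);
        System.out.println(decide("db", "t", oldName, "db", "t", newName));
    }
}
```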
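5. Table oldTable = msdb.getTable(dbName, tableName); fetches all of the old table's metadata. makePartName then builds the partition names of the old and new partitions (e.g. dt=2017-09-11/hour=1) so that the two can be compared. alterPartition can be reached through alter table, a table rename and similar operations, so if the old dbName, tableName or partition name differs from the new one, the column statistics stored in the metastore for the old partition are deleted (colName is passed as null, so all columns are covered). The new partition data is written back afterwards by the caller.
6. If they are the same, the change concerns the partition's column information, which is checked with MetaStoreUtils.areSameColumns(oldCols, newPart.getSd().getCols()) (I won't paste that helper's code).
7. updatePartColumnStatsForAlterColumns is then called to update the column statistics. This code is worth pasting so we can play with it together: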
    private void updatePartColumnStatsForAlterColumns(RawStore msdb, Partition oldPartition,
        String oldPartName, List<String> partVals, List<FieldSchema> oldCols, Partition newPart)
        throws MetaException, InvalidObjectException {
      String dbName = oldPartition.getDbName();
      String tableName = oldPartition.getTableName();
      try {
        List<String> oldPartNames = Lists.newArrayList(oldPartName);
        List<String> oldColNames = new ArrayList<String>(oldCols.size());
        for (FieldSchema oldCol : oldCols) {
          oldColNames.add(oldCol.getName());
        }
        List<FieldSchema> newCols = newPart.getSd().getCols();
        List<ColumnStatistics> partsColStats = msdb.getPartitionColumnStatistics(dbName, tableName,
            oldPartNames, oldColNames);
        assert (partsColStats.size() <= 1);
        for (ColumnStatistics partColStats : partsColStats) { //actually only at most one loop
          List<ColumnStatisticsObj> statsObjs = partColStats.getStatsObj();
          for (ColumnStatisticsObj statsObj : statsObjs) {
            boolean found = false;
            for (FieldSchema newCol : newCols) {
              if (statsObj.getColName().equals(newCol.getName())
                  && statsObj.getColType().equals(newCol.getType())) {
                found = true;
                break;
              }
            }
            if (!found) {
              msdb.deletePartitionColumnStatistics(dbName, tableName, oldPartName, partVals,
                  statsObj.getColName());
            }
          }
        }
      } catch (NoSuchObjectException nsoe) {
        LOG.debug("Could not find db entry." + nsoe);
        //ignore
      } catch (InvalidInputException iie) {
        throw new InvalidObjectException
            ("Invalid input to update partition column stats in alter table change columns" + iie);
      }
    }
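As you can see, it queries the metastore for ColumnStatistics objects, which mainly wrap the tableName, partName, colName and related information, and then compares each statistics entry against the new columns. Note that both the column name and the type are compared; if no new column matches on both, the statistics for that old column are deleted.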
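To make the name-and-type rule concrete, here is a small sketch that computes which column statistics would be deleted. The Column class and the statsToDelete method are hypothetical stand-ins for FieldSchema / ColumnStatisticsObj and the metastore calls, not Hive APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnStatsPruneSketch {
    /** Hypothetical stand-in for a column with a name and a type. */
    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Returns the names of stats columns with no new column of the same name and type. */
    static List<String> statsToDelete(List<Column> statsCols, List<Column> newCols) {
        List<String> toDelete = new ArrayList<String>();
        for (Column stats : statsCols) {
            boolean found = false;
            for (Column newCol : newCols) {
                if (stats.name.equals(newCol.name) && stats.type.equals(newCol.type)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                toDelete.add(stats.name);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<Column> stats = Arrays.asList(
            new Column("id", "int"), new Column("name", "string"), new Column("age", "int"));
        List<Column> newCols = Arrays.asList(
            new Column("id", "bigint"), new Column("name", "string"));
        // "id" changed type and "age" was dropped, so both lose their statistics.
        System.out.println(statsToDelete(stats, newCols)); // prints [id, age]
    }
}
```

A column that keeps its name but changes type is treated the same as a dropped column, which matches the comparison in the method above.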
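Good. At this point all stale statistics of the old partition have been removed. Back in alterPartition, msdb.alterPartition(dbname, name, new_part.getValues(), new_part) then registers the new partition data in the metastore. All of the above only covers what happens to oldPart when rename_partition is called with part_vals set to null. What about when it is not null? Feeling desperate yet? Let's suffer through it slowly, haha...
8. When part_vals is not null, oldPart is looked up by dbName, tableName and part_vals and then validated.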
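The split between the two cases can be sketched as a small dispatch on part_vals: null or empty means an in-place alter that looks the old partition up by the new partition's own values, anything else means a rename that looks it up by part_vals. The class, enum and method names below are hypothetical and only mirror the condition in the handler code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class AlterDispatchSketch {
    enum Mode { ALTER_IN_PLACE, RENAME }

    /** Mirrors the check part_vals == null || part_vals.size() == 0. */
    static Mode modeFor(List<String> partVals) {
        return (partVals == null || partVals.isEmpty()) ? Mode.ALTER_IN_PLACE : Mode.RENAME;
    }

    /** Values used to find the old partition in each mode. */
    static List<String> lookupValues(List<String> partVals, List<String> newPartValues) {
        return modeFor(partVals) == Mode.ALTER_IN_PLACE ? newPartValues : partVals;
    }

    public static void main(String[] args) {
        List<String> newValues = Arrays.asList("2017-09-12");
        System.out.println(modeFor(null));
        System.out.println(modeFor(Collections.<String>emptyList()));
        System.out.println(lookupValues(Arrays.asList("2017-09-11"), newValues));
    }
}
```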
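9. The table type is then checked. If the table is an external table, the old location from oldPart's storage descriptor, i.e. its HDFS path, is assigned to newPart, so the files themselves stay where they are; note that the location is the one stored in the sd. deletePartitionColumnStatistics is then called to drop the old partition's column statistics directly.
10. If the table is a managed (internal) table, the destination path is built and checked, and then the old column statistics are deleted (honestly there are parts in the middle I didn't understand, haha, and it's too late at night; I'll fill them in later). The code is as follows: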
    try {
      destPath = new Path(wh.getTablePath(msdb.getDatabase(dbname), name),
          Warehouse.makePartName(tbl.getPartitionKeys(), new_part.getValues()));
      destPath = constructRenamedPath(destPath, new Path(new_part.getSd().getLocation()));
    } catch (NoSuchObjectException e) {
      LOG.debug(e);
      throw new InvalidOperationException(
          "Unable to change partition or table. Database " + dbname + " does not exist"
          + " Check metastore logs for detailed stack." + e.getMessage());
    }
    if (destPath != null) {
      newPartLoc = destPath.toString();
      oldPartLoc = oldPart.getSd().getLocation();

      srcPath = new Path(oldPartLoc);

      LOG.info("srcPath:" + oldPartLoc);
      LOG.info("descPath:" + newPartLoc);
      srcFs = wh.getFs(srcPath);
      destFs = wh.getFs(destPath);
      // check that src and dest are on the same file system
      if (!FileUtils.equalsFileSystem(srcFs, destFs)) {
        throw new InvalidOperationException("table new location " + destPath
            + " is on a different file system than the old location "
            + srcPath + ". This operation is not supported");
      }
      try {
        srcFs.exists(srcPath); // check that src exists and also checks
        if (newPartLoc.compareTo(oldPartLoc) != 0 && destFs.exists(destPath)) {
          throw new InvalidOperationException("New location for this table "
              + tbl.getDbName() + "." + tbl.getTableName()
              + " already exists : " + destPath);
        }
      } catch (IOException e) {
        throw new InvalidOperationException("Unable to access new location "
            + destPath + " for partition " + tbl.getDbName() + "."
            + tbl.getTableName() + " " + new_part.getValues());
      }
      new_part.getSd().setLocation(newPartLoc);
      if (MetaStoreUtils.requireCalStats(hiveConf, oldPart, new_part, tbl)) {
        MetaStoreUtils.updatePartitionStatsFast(new_part, wh, false, true);
      }
      String oldPartName = Warehouse.makePartName(tbl.getPartitionKeys(), oldPart.getValues());
      try {
        //existing partition column stats is no longer valid, remove
        msdb.deletePartitionColumnStatistics(dbname, name, oldPartName, oldPart.getValues(), null);
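The location checks in the fragment above can be tried out without a Hadoop cluster. The sketch below is a simplified illustration, not the Hive implementation: it uses java.net.URI instead of Hadoop's Path and FileSystem, treats "same file system" as "same scheme and authority", and takes the result of the existence check as a boolean parameter. All class and method names are hypothetical.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Objects;

class RenameLocationSketch {
    private static String lower(String s) {
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }

    /** Assumption: two locations share a file system if scheme and authority match. */
    static boolean sameFileSystem(URI a, URI b) {
        return Objects.equals(lower(a.getScheme()), lower(b.getScheme()))
            && Objects.equals(lower(a.getAuthority()), lower(b.getAuthority()));
    }

    /** Destination = table location + "/" + new partition name. */
    static String destLocation(String tableLocation, String newPartName) {
        String base = tableLocation.endsWith("/")
            ? tableLocation.substring(0, tableLocation.length() - 1)
            : tableLocation;
        return base + "/" + newPartName;
    }

    /** Rejects cross-file-system moves and moves onto an existing, different path. */
    static void checkRename(String oldLoc, String newLoc, boolean destExists) {
        if (!sameFileSystem(URI.create(oldLoc), URI.create(newLoc))) {
            throw new IllegalStateException("new location " + newLoc
                + " is on a different file system than " + oldLoc);
        }
        if (!newLoc.equals(oldLoc) && destExists) {
            throw new IllegalStateException("new location already exists: " + newLoc);
        }
    }

    public static void main(String[] args) {
        String oldLoc = "hdfs://nn1:8020/warehouse/t/dt=2017-09-11";
        String newLoc = destLocation("hdfs://nn1:8020/warehouse/t/", "dt=2017-09-12");
        checkRename(oldLoc, newLoc, false); // passes: same file system, target is free
        System.out.println(newLoc);
    }
}
```

Passing the existence flag in as a parameter keeps the sketch free of I/O; in the real code that answer comes from destFs.exists(destPath).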
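Overall, in the code walked through here alterPartition is not coupled with physical file operations; it only looks up, updates and deletes ColumnStats metadata. For the partition metadata itself, though, a call to alterPartition does update the partition's sd information, including the all-important location.
There are quite a lot of related operations, and this is only a rough analysis written while reading the source. If anything is wrong, I'd appreciate the experts pointing it out. Thanks~ Off to sleep~~ back to the grind tomorrow~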