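Among Hadoop's FsShell commands, the ones most people probably use most often are hadoop fs -ls, -lsr, -cat and similar file-system commands that behave almost exactly like their Linux counterparts. On closer inspection, though, there are some differences. The first is scale: a single-machine file system holds relatively few files with relatively little content, whereas HDFS is a distributed system that can hold an enormous number of files and directories. Under that premise, running ls or lsr on an arbitrary directory can sometimes print a frightening number of records, and sometimes the only option is to abort the command with Ctrl+C. So, when running against a directory whose contents are unknown, could ls accept an option that limits how many entries are displayed, so the number of file records stays under control? That question is the starting point of this article.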
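To add an option, we first need to understand how the Ls command currently works. Below I give a brief analysis at the source-code level. First, there is an inheritance relationship to keep in mind:

Ls → FsCommand → Command

From left to right this goes from child to parent, so Command is the most basic class, and the entry point for running a command-line operation lives there. Open Command.java and you will see the following method: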
    /**
     * Invokes the command handler.  The default behavior is to process options,
     * expand arguments, and then process each argument.
     * <pre>
     * run
     * |-> {@link #processOptions(LinkedList)}
     * \-> {@link #processRawArguments(LinkedList)}
     *      |-> {@link #expandArguments(LinkedList)}
     *      |   \-> {@link #expandArgument(String)}*
     *      \-> {@link #processArguments(LinkedList)}
     *          |-> {@link #processArgument(PathData)}*
     *          |   |-> {@link #processPathArgument(PathData)}
     *          |   \-> {@link #processPaths(PathData, PathData...)}
     *          |        \-> {@link #processPath(PathData)}*
     *          \-> {@link #processNonexistentPath(PathData)}
     * </pre>
     * Most commands will chose to implement just
     * {@link #processOptions(LinkedList)} and {@link #processPath(PathData)}
     *
     * @param argv the list of command line arguments
     * @return the exit code for the command
     * @throws IllegalArgumentException if called with invalid arguments
     */
    public int run(String...argv) {
      LinkedList<String> args = new LinkedList<String>(Arrays.asList(argv));
      try {
        if (isDeprecated()) {
          displayWarning(
              "DEPRECATED: Please use '"+ getReplacementCommand() + "' instead.");
        }
        processOptions(args);
        processRawArguments(args);
      } catch (IOException e) {
        displayError(e);
      }
      return (numErrors == 0) ? exitCode : exitCodeForError();
    }
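The arguments are preprocessed first: the option flags are stripped out of the argument list. Subclasses are expected to override this method, so the implementation that matters here is the one in Ls.java: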
    @Override
    protected void processOptions(LinkedList<String> args)
    throws IOException {
      CommandFormat cf = new CommandFormat(0, Integer.MAX_VALUE, "d", "h", "R");
      cf.parse(args);
      dirRecurse = !cf.getOpt("d");
      setRecursive(cf.getOpt("R") && dirRecurse);
      humanReadable = cf.getOpt("h");
      if (args.isEmpty()) args.add(Path.CUR_DIR);
    }

The options are read one by one and removed from the args list, so what remains at the end is just the target files or directories to list. Execution then enters the following method:
    /**
     * Allows commands that don't use paths to handle the raw arguments.
     * Default behavior is to expand the arguments via
     * {@link #expandArguments(LinkedList)} and pass the resulting list to
     * {@link #processArguments(LinkedList)}
     * @param args the list of argument strings
     * @throws IOException
     */
    protected void processRawArguments(LinkedList<String> args)
    throws IOException {
      processArguments(expandArguments(args));
    }

expandArguments then converts the path strings into concrete PathData objects:
    /**
     * Expands a list of arguments into {@link PathData} objects.  The default
     * behavior is to call {@link #expandArgument(String)} on each element
     * which by default globs the argument.  The loop catches IOExceptions,
     * increments the error count, and displays the exception.
     * @param args strings to expand into {@link PathData} objects
     * @return list of all {@link PathData} objects the arguments
     * @throws IOException if anything goes wrong...
     */
    protected LinkedList<PathData> expandArguments(LinkedList<String> args)
    throws IOException {
      LinkedList<PathData> expandedArgs = new LinkedList<PathData>();
      for (String arg : args) {
        try {
          expandedArgs.addAll(expandArgument(arg));
        } catch (IOException e) { // other exceptions are probably nasty
          displayError(e);
        }
      }
      return expandedArgs;
    }
    /**
     * Expand the given argument into a list of {@link PathData} objects.
     * The default behavior is to expand globs.  Commands may override to
     * perform other expansions on an argument.
     * @param arg string pattern to expand
     * @return list of {@link PathData} objects
     * @throws IOException if anything goes wrong...
     */
    protected List<PathData> expandArgument(String arg) throws IOException {
      PathData[] items = PathData.expandAsGlob(arg, getConf());
      if (items.length == 0) {
        // it's a glob that failed to match
        throw new PathNotFoundException(arg);
      }
      return Arrays.asList(items);
    }

Finally, the resulting list of PathData objects is handed to the processArguments method:
    /**
     * Processes the command's list of expanded arguments.
     * {@link #processArgument(PathData)} will be invoked with each item
     * in the list.  The loop catches IOExceptions, increments the error
     * count, and displays the exception.
     * @param args a list of {@link PathData} to process
     * @throws IOException if anything goes wrong...
     */
    protected void processArguments(LinkedList<PathData> args)
    throws IOException {
      for (PathData arg : args) {
        try {
          processArgument(arg);
        } catch (IOException e) {
          displayError(e);
        }
      }
    }

Each PathData item is then processed individually:
    /**
     * Processes a {@link PathData} item, calling
     * {@link #processPathArgument(PathData)} or
     * {@link #processNonexistentPath(PathData)} on each item.
     * @param item {@link PathData} item to process
     * @throws IOException if anything goes wrong...
     */
    protected void processArgument(PathData item) throws IOException {
      if (item.exists) {
        processPathArgument(item);
      } else {
        processNonexistentPath(item);
      }
    }

Next, the processPathArgument method in Ls.java runs:
    @Override
    protected void processPathArgument(PathData item) throws IOException {
      // implicitly recurse once for cmdline directories
      if (dirRecurse && item.stat.isDirectory()) {
        recursePath(item);
      } else {
        super.processPathArgument(item);
      }
    }

This checks whether the item is a directory. If it is, the command recurses one level to show the directory's children. Let's look directly at the single-file case, whose base method is defined in Command.java:
    /**
     * This is the last chance to modify an argument before going into the
     * (possibly) recursive {@link #processPaths(PathData, PathData...)}
     * -> {@link #processPath(PathData)} loop.  Ex.  ls and du use this to
     * expand out directories.
     * @param item a {@link PathData} representing a path which exists
     * @throws IOException if anything goes wrong...
     */
    protected void processPathArgument(PathData item) throws IOException {
      // null indicates that the call is not via recursion, ie. there is
      // no parent directory that was expanded
      depth = 0;
      processPaths(null, item);
    }

processPaths is in turn overridden in the subclass:
    @Override
    protected void processPaths(PathData parent, PathData ... items)
    throws IOException {
      if (parent != null && !isRecursive() && items.length != 0) {
        out.println("Found " + items.length + " items");
      }
      adjustColumnWidths(items);
      super.processPaths(parent, items);
    }

After another similar round trip, the parent class's processPaths method runs:
    /**
     * Iterates over the given expanded paths and invokes
     * {@link #processPath(PathData)} on each element.  If "recursive" is true,
     * will do a post-visit DFS on directories.
     * @param parent if called via a recurse, will be the parent dir, else null
     * @param items a list of {@link PathData} objects to process
     * @throws IOException if anything goes wrong...
     */
    protected void processPaths(PathData parent, PathData ... items)
    throws IOException {
      // TODO: this really should be iterative
      for (PathData item : items) {
        try {
          processPath(item);
          if (recursive && isPathRecursable(item)) {
            recursePath(item);
          }
          postProcessPath(item);
        } catch (IOException e) {
          displayError(e);
        }
      }
    }

The actual display work is done in the following method:
    @Override
    protected void processPath(PathData item) throws IOException {
      FileStatus stat = item.stat;
      String line = String.format(lineFormat,
          (stat.isDirectory() ? "d" : "-"),
          stat.getPermission() + (stat.getPermission().getAclBit() ? "+" : " "),
          (stat.isFile() ? stat.getReplication() : "-"),
          stat.getOwner(),
          stat.getGroup(),
          formatSize(stat.getLen()),
          dateFormat.format(new Date(stat.getModificationTime())),
          item
      );
      out.println(line);
    }

At this point the whole ls call flow is essentially complete. Some readers may feel dizzy from all the bouncing between methods, but that does not matter. What matters is knowing where the method that ultimately controls the file display lives; a small change there is enough to achieve our goal.
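The call chain above follows the template-method pattern: the base class fixes the order of the steps, and the subclass fills in option parsing and per-path handling. The following is only a minimal sketch of that idea, not Hadoop code. MiniCommand and MiniLs are hypothetical stand-ins for Command and Ls, paths are plain strings instead of PathData, and the flag handling is a simplification of what CommandFormat does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class MiniCommandDemo {

    /** Hypothetical stand-in for Command: it only fixes the order of the steps. */
    static abstract class MiniCommand {
        protected final List<String> output = new ArrayList<>();

        public final List<String> run(String... argv) {
            LinkedList<String> args = new LinkedList<>(Arrays.asList(argv));
            processOptions(args);        // the subclass strips its own flags
            for (String path : args) {   // whatever is left is treated as a path
                processPath(path);
            }
            return output;
        }

        protected abstract void processOptions(LinkedList<String> args);

        protected abstract void processPath(String path);
    }

    /** Hypothetical stand-in for Ls. */
    static class MiniLs extends MiniCommand {
        boolean humanReadable = false;

        @Override
        protected void processOptions(LinkedList<String> args) {
            // Consume leading "-x" flags; unknown flags are silently dropped
            // in this sketch.
            while (!args.isEmpty() && args.getFirst().startsWith("-")) {
                String flag = args.removeFirst();
                if (flag.equals("-h")) {
                    humanReadable = true;
                }
            }
            // Fall back to the current directory when no path was given.
            if (args.isEmpty()) {
                args.add(".");
            }
        }

        @Override
        protected void processPath(String path) {
            output.add((humanReadable ? "[h] " : "") + path);
        }
    }

    public static void main(String[] args) {
        System.out.println(new MiniLs().run("-h", "/tmp", "/user"));
        System.out.println(new MiniLs().run());
    }
}
```

The point of the sketch is that adding a new option only touches the subclass: a new field, one more flag in processOptions, and a change in the method that consumes the paths.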
Now let me show how to add a new option to the ls command. First, define the option's documentation:
    public static final String NAME = "ls";
    public static final String USAGE = "[-d] [-h] [-R] [-l] [<path> ...]";
    public static final String DESCRIPTION =
        "List the contents that match the specified file pattern. If " +
        "path is not specified, the contents of /user/<currentUser> " +
        // ... (the middle of the description is not shown in the patch excerpt) ...
        "-d:  Directories are listed as plain files.\n" +
        "-h:  Formats the sizes of files in a human-readable fashion " +
        "rather than a number of bytes.\n" +
        "-R:  Recursively list the contents of directories.\n" +
        "-l:  The limited number of files records's info which would be " +
        "displayed, the max value is 1024.\n";
Define the related variables:
    protected int maxRepl = 3, maxLen = 10, maxOwner = 0, maxGroup = 0;
    protected int limitedDisplayedNum = 1024;
    protected int displayedRecordNum = 0;
    protected String lineFormat;
    protected boolean dirRecurse;
    protected boolean limitedDisplay = false;
    protected boolean humanReadable = false;

The default maximum number of displayed entries is 1024. Next, parse the new option in the option-parsing method:
    @Override
    protected void processOptions(LinkedList<String> args)
    throws IOException {
      CommandFormat cf = new CommandFormat(0, Integer.MAX_VALUE, "d", "h", "R", "l");
      cf.parse(args);
      dirRecurse = !cf.getOpt("d");
      setRecursive(cf.getOpt("R") && dirRecurse);
      humanReadable = cf.getOpt("h");
      limitedDisplay = cf.getOpt("l");
      if (args.isEmpty()) args.add(Path.CUR_DIR);
    }

Then comes the core change, in the processPaths method:
    @Override
    protected void processPaths(PathData parent, PathData ... items)
    throws IOException {
      if (parent != null && !isRecursive() && items.length != 0) {
        out.println("Found " + items.length + " items");
      }

      PathData[] newItems;
      if (limitedDisplay) {
        int length = items.length;
        if (length > limitedDisplayedNum) {
          // more entries than the limit: report it and truncate
          length = limitedDisplayedNum;
          out.println("Found " + items.length + " items"
              + ", more than the limited displayed num " + limitedDisplayedNum);
        }
        // copy only the first `length` entries
        newItems = new PathData[length];
        for (int i = 0; i < length; i++) {
          newItems[i] = items[i];
        }
        items = null;
      } else {
        newItems = items;
      }

      adjustColumnWidths(newItems);
      super.processPaths(parent, newItems);
    }
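The truncation step in the middle of that method can be looked at in isolation. Below is a small standalone sketch of the same idea using plain strings instead of PathData. The class LimitDemo and the method limitItems are hypothetical names chosen for illustration and are not part of the patch. Arrays.copyOf is used here as a shorter equivalent of the manual copy loop.

```java
import java.util.Arrays;

public class LimitDemo {

    /**
     * Returns at most {@code limit} leading entries of {@code items}.
     * The input array is never modified; it is returned as-is when it
     * already fits within the limit.
     */
    static String[] limitItems(String[] items, int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        return items.length > limit ? Arrays.copyOf(items, limit) : items;
    }

    public static void main(String[] args) {
        String[] items = {"a", "b", "c"};
        System.out.println(Arrays.toString(limitItems(items, 2))); // [a, b]
        System.out.println(Arrays.toString(limitItems(items, 5))); // [a, b, c]
    }
}
```

Note that truncating at this point only limits what is printed. The full listing has already been fetched by the time processPaths is called, so this approach reduces output volume rather than the cost of the listing itself.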
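The logic is not complicated. Here is a test example: in the test jar I set the default limit to 1, then ran the ls command both with and without the new option. The test screenshots are as follows: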
This code has been submitted to the open-source community as HADOOP-12641. The links are listed at the end of the article.
Issue link: https://issues.apache.org/jira/browse/HADOOP-12641
GitHub patch link: https://github.com/linyiqun/open-source-patch/blob/master/hadoop/HADOOP-12641/HADOOP-12641.001.patch