[My Experience] My First Time Contributing Open-Source Code on GitHub

Preface:

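  I have been a programmer for a little over three years now, and I feel I have made real progress since I first started; of course, it would be rather foolish not to improve in three years. Sometimes I wonder what I have left behind in those three years. It is not that I have any lofty ambitions. I just want to have something to say when I introduce myself to others. It is like going to an interview: you have to show some visible achievements. Apart from browsing Zhihu in my spare time to admire the experts there, my life has not changed at all. I particularly dislike exams because I dislike memorizing things, and I dislike memorizing things because my memory is poor. Yet however poor my memory is, I have never been able to forget her: no unforgettable experience together, but a feeling that goes bone-deep. But I digress.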

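  So I have long wanted to leave a small mark on GitHub. My recent projects use Flume heavily, so I have read a fair amount of its code and found something I thought I could propose changing. In many projects' configuration files, time and capacity settings generally cannot carry a unit, which makes large values hard to convert and hard to read. For example, to set a capacity of 2g you have to convert it to 2147483648 because the default unit is bytes, which is inconvenient. Flume relies heavily on configuration files, so I made a first pass at listing the places where these two kinds of settings are used: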

Exec Source
    ms:restartThrottle,batchTimeout
Spooling Directory Source
    ms:pollDelay,maxBackoff
Event Deserializers
    BlobDeserializer
        byte:deserializer.maxBlobLength@BlobDeserializer
Taildir Source
    ms:idleTimeout,writePosInterval,backoffSleepIncrement,maxBackoffSleep
Twitter 1% firehose Source (experimental)
    ms:maxBatchDurationMillis
Kafka Source
    ms:batchDurationMillis,backoffSleepIncrement,maxBackoffSleep
NetCat Source
    byte:max-line-length@NetcatSource
Syslog TCP Source
    byte:eventSize@SyslogTcpSource
Multiport Syslog TCP Source
    byte:eventSize@MultiportSyslogTCPSource
BlobHandler
    byte:handler.maxBlobLength
Stress Source
    byte:size@StressSource
Scribe Source
    byte:maxReadBufferBytes@ScribeSource
HDFS Sink
    ms:hdfs.callTimeout
Hive Sink
    ms:callTimeout
Avro Sink
    ms:connect-timeout,request-timeout
Thrift Sink
    ms:connect-timeout,request-timeout
File Roll Sink
    ms:sink.rollInterval(s)
AsyncHBaseSink
    ms:timeout
MorphlineSolrSink
    ms:batchDurationMillis
Kite Dataset Sink
    ms:kite.rollInterval(s)
Memory Channel
    byte:byteCapacity@MemoryChannel
Kafka Channel
    ms:pollTimeout
File Channel
    byte:maxFileSize,minimumRequiredSpace@FileChannel
    ms:checkpointInterval,keep-alive
Spillable Memory Channel
    byte:byteCapacity,avgEventSize@SpillableMemoryChannel
    ms:overflowTimeout
Pseudo Transaction Channel
    ms:keep-alive(s)
Failover Sink Processor
    ms:processor.maxpenalty
Load balancing Sink Processor
    ms:processor.selector.maxTimeOut
Avro Event Serializer
    byte:syncIntervalBytes@AvroEventSerializer
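
To make the conversion burden concrete, the small sketch below works out what "2g" means in bytes when the unit is a binary multiple (1024^3 bytes). It is only an illustration; the class and method names are made up for this post and are not part of Flume.

```java
final class SizeArithmetic {
    /** Bytes in the given number of GiB (binary gigabytes, 1024^3 bytes each). */
    static long gibToBytes(long gib) {
        // Math.multiplyExact throws ArithmeticException instead of silently wrapping on overflow.
        return Math.multiplyExact(gib, 1024L * 1024 * 1024);
    }

    public static void main(String[] args) {
        long twoGiB = gibToBytes(2);
        System.out.println(twoGiB);                      // 2147483648
        // One more than the largest int, so byte-valued settings need a long.
        System.out.println(twoGiB - Integer.MAX_VALUE);  // 1
    }
}
```

The second line of output is the reason such a value is awkward to type by hand and also why it cannot be stored in an `int`.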

May 14, 2017

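  However, so many places were involved, and some would require fairly large changes to the existing program, that I wanted to discuss it with someone first. I found one of the Flume developers and sent him an email. This is where the importance of English showed; I wrote the email with the help of a dictionary: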

Dear Attila Simon:
     I use the flume in my work,when I was in the configuration of the flume, found that some of the parameters of configuration is very troublesome, and readability is very poor like the Memory Channel's byteCapacity ,File Channel's transactionCapacity and maxFileSize etc., basic it is about the capacity configuration.Generally when I was in the configuration that, I need a special calculation of byte after transformation, such as 2g into 2147483648, and must be withing annotated, otherwise the next time I read, don't know this is 2g intuitive
    So I wrote a simple method used for converting readable capacity allocation into corresponding byte code is as follows.
public class Utils {
    private static final String REGULAR="((?<gb>\\d+(\\.\\d+)?)(g|G|gb|GB))?((?<mb>\\d+(\\.\\d+)?)(m|M|mb|MB))?((?<kb>\\d+(\\.\\d+)?)(k|K|kb|KB))?((?<b>\\d+)(b|B|byte|BYTE)?)?";
    private static final int rate=1024;
    
    public static Long string2Bytes(String in){
        return string2Bytes(in,null);
    }
    public static Long string2Bytes(String in,Long defaultValue){
        if(in==null || in.trim().length()==0){
            return defaultValue;
        }
        in=in.trim();
        Pattern pattern = Pattern.compile(REGULAR);
        Matcher matcher = pattern.matcher(in);
        if(matcher.matches()){
            long bytes=0;
            String gb=matcher.group("gb");
            String mb=matcher.group("mb");
            String kb=matcher.group("kb");
            String b=matcher.group("b");
            if(gb!=null){
                bytes+=Math.round(Double.parseDouble(gb)*Math.pow(rate, 3));
            }
            if(mb!=null){
                bytes+=Math.round(Double.parseDouble(mb)*Math.pow(rate, 2));
            }
            if(kb!=null){
                bytes+=Math.round(Double.parseDouble(kb)*Math.pow(rate, 1));
            }
            if(b!=null){
                bytes+=Integer.parseInt(b);
            }
            return bytes;
        }else{
            throw new IllegalArgumentException("the param "+in+" is not a right");
        }
    }
}
Below is the test class
@RunWith(Parameterized.class)
public class UtilsTest {
    private String param;
    private Long result;
    public UtilsTest(String param,Long result){
        this.param=param;
        this.result=result;
    }
    @Parameters
    public static Collection<Object[]> data() {
        return Arrays.asList(new Object[][]{
                {"", null},
                {"  ", null},
                {"2g", 1L*2*1024*1024*1024},
                {"2G", 1L*2*1024*1024*1024},
                {"2gb", 1L*2*1024*1024*1024},
                {"2GB", 1L*2*1024*1024*1024},
                {"2000m", 1L*2000*1024*1024},
                {"2000mb", 1L*2000*1024*1024},
                {"2000M", 1L*2000*1024*1024},
                {"2000MB", 1L*2000*1024*1024},
                {"1000k", 1L*1000*1024},
                {"1000kb", 1L*1000*1024},
                {"1000K", 1L*1000*1024},
                {"1000KB", 1L*1000*1024},
                {"1000", 1L*1000},
                {"1.5GB", 1L*Math.round(1.5*1024*1024*1024)},
                {"1.38g", 1L*Math.round(1.38*1024*1024*1024)},
                {"1g500MB", 1L*1024*1024*1024+500*1024*1024},
                {"20MB512", 1L*20*1024*1024+512},
                {"0.5g", 1L*Math.round(0.5*1024*1024*1024)},
                {"0.5g0.5m", 1L*Math.round(0.5*1024*1024*1024+0.5*1024*1024)},
        });
    }
    
    @Test
    public void testString2Bytes() {
        assertEquals(result,Utils.string2Bytes(param));
    }
}

public class UtilsTest2 {    
    @Test(expected =IllegalArgumentException.class)
    public void testString2Bytes1() {
        String in="23t";
        Utils.string2Bytes(in);
    }
    
    @Test(expected =IllegalArgumentException.class)
    public void testString2Bytes2() {
        String in="2g50m1.4";
        Utils.string2Bytes(in);
    }
    
    @Test(expected =IllegalArgumentException.class)
    public void testString2Bytes3() {
        String in="4.2g";
        Utils.string2Bytes(in);
    }
}
    I'm going to put all the reading capacity place to use this method to read, and compatible with the previous usage, namely not with units of numerical defaults to byte, why I don't fork and pull request the code, the reason is that some of the parameter name with byte or bytes, if its value is 2GB or 500MB, it is appropriate to do so, or making people confuse, so I ask for your opinion in advance.
    Parameters in the configuration of time whether to need to add the unit, I think can, do you think if capacity added to the unit, whether time synchronization also improved.
    In addition to this I also want to talk about another point, that is, when the user parameter configuration errors are handled, in the flume, the means of processing is to configure the error using the default values, and print the warn message, even in some places will not print the warn,the following codes are in the MemoryChannel.class
    try {
      byteCapacityBufferPercentage = context.getInteger("byteCapacityBufferPercentage",
                                                        defaultByteCapacityBufferPercentage);
    } catch (NumberFormatException e) {
      byteCapacityBufferPercentage = defaultByteCapacityBufferPercentage;
    }
    try {
      transCapacity = context.getInteger("transactionCapacity", defaultTransCapacity);
    } catch (NumberFormatException e) {
      transCapacity = defaultTransCapacity;
      LOGGER.warn("Invalid transation capacity specified, initializing channel"
          + " to default capacity of {}", defaultTransCapacity);
    }
     I don't think this is right, because the common user won't be too care about a warn information, he would think that the program run successfully according to configuration parameters, the results do use the default values.
     I think, if the user doesn't set a property then we should use default values, if the user is configured with a property, he certainly expect to this attribute to run the program, if the configuration errors should stop running and allow the user to modify., of course, this approach may be too radical, may give the option to the user, may be a new property  < Agent >.configuration.error = defaultValue/stopRunning, when configured to defaultValue shall, in accordance with the previous approach, configuration stopRunning will stop running the program.

    Thank you very much for reading such a long email,and most of email are machine translation, looking forward to your reply, if possible I hope to become a member of the flume contributors.

    from Lisheng Xia

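  The main point was to put forward my proposal, which touches so many places that it needed to be discussed first. Then I waited for a reply.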

May 17, 2017

  After several days of waiting, a reply finally came:

Hi Lisheng Xia,

I like this feature! I would like to add the dev list to this conversation so others can express their opinion as well. After all what community says is what really matters. We can discuss there your proposal in detail as well as whether there is a library which can help you in the unit conversion eg javax.measure.

In my opinion it is appropriate :
to have a configuration property name with "byte" and still allowing to specify the value with units, if it is clear from the documentation what would that mean (eg please see the GiB vs GB definitions here).
extending this feature with time values (Hour,Minute,Second,Milisec...)
having the conversation failures instantly and clearly visible to the user by throwing an exception. I think "< Agent >.configuration.error = defaultValue/stopRunning" would be even better but much harder to implement.

Cheers,
Attila
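
A note on the third point of this reply: failing loudly on a bad value, while still falling back to the default when a key is simply absent, might look roughly like the sketch below. `StrictConfig` and `getIntegerStrict` are hypothetical names used only for illustration and are not part of Flume's `Context` API; a plain `Map` stands in for the agent configuration.

```java
import java.util.HashMap;
import java.util.Map;

final class StrictConfig {
    private final Map<String, String> values;

    StrictConfig(Map<String, String> values) {
        this.values = new HashMap<>(values);
    }

    /**
     * Returns the default only when the key is absent or blank.
     * A value that is present but not a valid integer is treated as a
     * configuration error and reported immediately.
     */
    int getIntegerStrict(String key, int defaultValue) {
        String raw = values.get(key);
        if (raw == null || raw.trim().isEmpty()) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Invalid value '" + raw + "' for property '" + key + "'", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("transactionCapacity", "10O"); // typo: letter O instead of zero
        StrictConfig config = new StrictConfig(props);
        // Unset key: the default is used.
        System.out.println(config.getIntegerStrict("byteCapacityBufferPercentage", 20));
        try {
            config.getIntegerStrict("transactionCapacity", 100);
        } catch (IllegalArgumentException e) {
            // Present but invalid: the error is surfaced instead of being replaced by a default.
            System.out.println(e.getMessage());
        }
    }
}
```

The design choice here is that "not configured" and "configured wrongly" are different situations, and only the first one should quietly use a default.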

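  From his reply, he thought the idea was good, but the open questions still needed to be discussed by the community. Along the way he pointed out a misconception of mine, namely the relationship between GB and GiB, so I learned something new. But I did not know how or where to hold that discussion, so I sent him another email to ask: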

Hi Attila

Thank you very much for your reply.

I downloaded the javax.measure and read its source, and look at the definition of wikipedia about GiB and GB, learned a lot, and correct the wrong knowledge.Now the idea is as follows

1.Don't import jscience or other jars,write directly in the flume-ng-core project/org.apache.flume.tools package,create a new UnitUtils class for capacity and time of string parsing and transformation operations.don't introducing jscience package, is the cause of not directly provide available method, and the function of the project needs to need only a class can be completed, unnecessary modified pom.xml and import  new jars.

2.About the capacity unit,because the window unit causes the confusion,my advice to simulate the JVM startup parameters, -Xms/-Xmx use rules, the rules only allowed to use the G/M/K/B or g/m/k/b to define (actual representative GiB/MiB/KiB/Byte), without the unit is Byte by default.

The above two points, if not no problem, I will develop and test.

In addition, the definition of unit of time, I need to discuss, is to use the h/m/s/S or H/m/s/S or use the Hour/Minute/Second/Millisecend.Can you tell me the way to where can be discussed.

Cheers,
Lisheng Xia
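
The rule in point 2 of this email can be sketched as follows: a whole number followed by at most one of G/M/K/B in either case, interpreted as binary multiples, with bytes as the default. For reference, 2 GB in the decimal sense is 2,000,000,000 bytes, while 2 GiB is 2,147,483,648 bytes, a difference of about 7%. The class name `JvmStyleSize` is hypothetical, and this is an illustration of the rule rather than the code that went into the pull request. One detail worth noting: the `Utils` class quoted in the first email parses the unit-less group with `Integer.parseInt`, which cannot represent 2147483648, so the sketch uses `Long.parseLong`.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JvmStyleSize {
    // Digits followed by at most one of g/m/k/b in either case, similar to -Xms/-Xmx.
    private static final Pattern SIZE = Pattern.compile("(\\d+)([gGmMkKbB]?)");

    static long toBytes(String text) {
        Matcher m = SIZE.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a valid size: '" + text + "'");
        }
        long number = Long.parseLong(m.group(1)); // long, so 2147483648 is accepted
        long factor;
        switch (m.group(2).toLowerCase(Locale.ROOT)) {
            case "g": factor = 1024L * 1024 * 1024; break; // GiB
            case "m": factor = 1024L * 1024; break;        // MiB
            case "k": factor = 1024L; break;               // KiB
            default:  factor = 1L;                         // "b" or no unit: bytes
        }
        // Fail loudly instead of wrapping around if the result exceeds Long.MAX_VALUE.
        return Math.multiplyExact(number, factor);
    }

    public static void main(String[] args) {
        System.out.println(toBytes("2g"));          // 2147483648
        System.out.println(toBytes("2147483648"));  // 2147483648
        System.out.println(toBytes("512K"));        // 524288
    }
}
```

Unlike the composite format in the first email (for example 1g500MB), this variant accepts a single number and a single unit, which is what the JVM flags do.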

May 18, 2017

  I did not expect the reply to come so quickly this time:

Hi Lisheng Xia, 

I think the best to discuss further on dev@flume.apache.org list (added to the email recipients list already). I recommend you to join that list as that is one of the early steps towards being a contributor. In general this page describes how to contribute to Apache Flume: https://cwiki.apache.org/confluence/display/FLUME/How+to+Contribute 


Cheers,
Attila

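  But I ran into another problem. Mailing lists are an old technology, and I could not work out how to use one. I also read the page on how to contribute to Flume several times and still could not find where the discussion was supposed to happen. I felt like a complete novice; there was a lot in there I did not understand. After rereading it many times, I decided this part was probably the key: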

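  What was this JIRA? I searched and read up on it, and once again learned a lot. Then I registered a JIRA account on a site I had never seen before and filed the issue there: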

  After that I did not know what to do next, so I went back to waiting.

May 22, 2017

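  Eventually I felt that waiting was getting me nowhere, so I decided to try a pull request directly: clone, download, modify, commit, pull request. After I submitted it, a big red cross appeared: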

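  I was baffled again and had no idea what was wrong, since I had already run the basic tests locally. Clicking through to the details of the failure showed: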

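  I had written only a simple method, yet there were this many errors, and I had no idea why. So I went to find out what Travis CI was, and again learned a lot. I ran the checks myself and, sure enough, got the same errors. All I could do was fix them one by one and push again.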

May 23, 2017

  I finally resubmitted and the tests passed. Then I left a comment asking where the user guide lives, since I needed to update it:

  Then it was back to waiting.

June 2, 2017

  Today a developer finally replied and offered a suggestion of his own:

  But I did not agree with his suggestion at all, so I wrote up my reasons:

To be honest, before I write the tool class, I did not find a library that could meet my needs. Thank you for increasing my knowledge @adenes . I read the source code of config.jar and read the development documentation and did some basic Of the test, now listed under the advantages and disadvantages I know:
Advantages of using config.jar:
The solution is more mature and does not need to test the parsing parameters part of the code

Disadvantages of using config.jar:
There is no separate way to directly parse a string with a unit, you must reload the configuration file, the change is relatively large, not fully compatible with the original program
Can only parse a number and a unit of the parameters, such as 1g500m this with a number of units can not be resolved (do not know if there is such a demand)

Advantages of using custom tool classes to parse:
You can parse parameters with multiple units
Easy to use, the change is very small (read the time through a static method can be processed), and has been achieved

Disadvantages of using custom tool classes to parse:
Requires complete testing

What do you think?

For the time unit, I suggest that some of the recommendations, if I provided in the Context class getDuration () method used to resolve the time, if no unit does not need to parse the direct return number, there are units I will analyze the actual unit , Internally stored in milliseconds, but what number of units do I return?

For example, the default unit in milliseconds, timeout = 3000, I will return 3000, timeout = 3s (internal resolution into 3000ms), I will return 3000 (return the number of units in milliseconds).But in the default unit for the case of seconds, timeout = 3, I will return 3, timeout = 3s, How much should I return , if according to the above logic I will still return 3000, but the user expects to return the unit for seconds, How do I know the user's expectations unit?

Perhaps you can use the getDuration (String key, TimeUnit unit) method to let the user tell me what unit it needs to return value, but this will lead to another problem, if the default unit is seconds, timeout = 3, I can not know 3 real units and convert them into units that the user expects, because no unit can not simply return the number, and need to be converted into the unit desired by the user.

It seems a bit difficult to analyze the time unit, because I need to meet the needs of existing components for time parameter resolution. I can not provide a common way to meet all the needs of the present situation. Based on the current situation, my suggestion is , No unit number directly return the number, there are units according to the actual unit analysis, and unified return in milliseconds as a unit of the number, if the user needs seconds, then their own conversion (really, the user needs for the second unit is not the beginning support time units, resulting in too long settings to read up is not convenient)...
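
To make the rule at the end of this comment concrete, here is a minimal sketch of it: a bare number is returned unchanged, and a number with a unit is converted to milliseconds. The suffixes `ms`, `s`, `m` and `h` are an assumption for illustration, since the spelling of the time units was still undecided, and `DurationSketch.getDuration` is a hypothetical stand-in, not a method of Flume's `Context` class.

```java
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DurationSketch {
    // Assumed suffixes for illustration: ms, s, m, h.
    private static final Pattern DURATION = Pattern.compile("(\\d+)(ms|s|m|h)?");

    /**
     * A bare number is returned unchanged, in whatever unit the component
     * already uses. A number with a suffix is converted to milliseconds.
     */
    static long getDuration(String text) {
        Matcher m = DURATION.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a valid duration: '" + text + "'");
        }
        long number = Long.parseLong(m.group(1));
        String unit = m.group(2);
        if (unit == null) {
            return number; // no unit: the component's own default unit applies
        }
        switch (unit) {
            case "h": return TimeUnit.HOURS.toMillis(number);
            case "m": return TimeUnit.MINUTES.toMillis(number);
            case "s": return TimeUnit.SECONDS.toMillis(number);
            default:  return number; // "ms"
        }
    }

    public static void main(String[] args) {
        System.out.println(getDuration("3000")); // 3000
        System.out.println(getDuration("3s"));   // 3000
        System.out.println(getDuration("3"));    // 3
    }
}
```

The last two calls show the ambiguity described in the comment: for a component whose default unit is seconds, `3` and `3s` mean the same thing to the user but come back as 3 and 3000, so that component has to convert the millisecond result itself.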

  Several days passed with no reply. Still waiting.

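  There is no outcome yet, but I want to share some thoughts. First, if you can contribute to an open-source tool, do so; we should not just enjoy the convenience of using these tools but also help them keep developing. Second, the first time at anything is hard. I nearly gave up several times, probably because, as a first-time contributor, I was a novice in so many areas. Whether or not this attempt succeeds, I think that having made a start, I will keep contributing. Besides, I learned a great deal through all this. Of course, if code I contributed ends up being used, that would bring some sense of achievement.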

June 13, 2017

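  Following his request, I updated the user guide. Here I hit a snag: flume-ng-doc is not a Maven project, so I could only edit the source files directly. After the user guide, I also updated the configuration file template to add a note about using units, and then added test methods for the utility class. With that, all the work except time-unit parsing is done; I pushed it and am waiting for their next response.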

  ... still waiting
