[NLP News - 2013.06.16] Representative Reviewing

Original English article: http://nlp.hivefire.com/articles/share/40221/promise

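Translator's note: I translate NLP news only to practice technical English and broaden my horizons, so please forgive me if the translation is poor!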

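(I really could not follow this one, and my translation is a mess. If anyone understands the overall meaning of this article, please leave a comment; I would be very grateful!)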

When thinking about how best to review papers, it seems helpful to have some conception of what good reviewing is. As far as I can tell, this is almost always discussed only in the specific context of a paper (e.g. your rejected paper), or at most an area (e.g. what a “good paper” looks like for that area), rather than as general principles. Neither individual papers nor areas are sufficiently general for a large conference: every paper differs in the details, and what if you want to build a new area and/or cross areas?

An unavoidable reason for reviewing is that the community of research is too large. In particular, it is not possible for a researcher to read every paper which someone thinks might be of interest. This reason for reviewing exists independent of constraints on rooms or scheduling formats of individual conferences. Indeed, history suggests that physical constraints are relatively meaningless over the long term: growing conferences simply use more rooms and/or change formats to accommodate the growth.

This suggests that a generic test for paper acceptance should be “Are there a significant number of people who will be interested?” This question could theoretically be answered by sending the paper to every person who might be interested and simply asking them. In practice, this would be an intractable use of people’s time: we must query far fewer people and achieve an approximate answer to this question. Our goal then should be minimizing the approximation error for some fixed amount of reviewing work.
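
As a rough illustration of this tradeoff, suppose (purely as an assumption; the article does not specify a model) that each queried reader independently answers "interested" with the same probability `p`. Asking `k` readers then estimates `p` with a standard error of `sqrt(p * (1 - p) / k)`, so the error shrinks only with the square root of the reviewing effort. The function names below are hypothetical.

```python
import math
import random


def standard_error(p: float, k: int) -> float:
    """Standard error of the observed interested fraction when k readers are asked.

    Assumes independent readers who are each interested with probability p.
    """
    return math.sqrt(p * (1.0 - p) / k)


def estimate_interest(p: float, k: int, rng: random.Random) -> float:
    """Simulate asking k readers and return the observed interested fraction."""
    return sum(rng.random() < p for _ in range(k)) / k


# Tenfold more reviewers buys only about a threefold smaller error.
for k in (3, 30, 300):
    print(k, standard_error(0.3, k))
```

Under this toy model, going from 3 to 30 reviewers per paper reduces the approximation error by a factor of about 3.2, which is one way to see why reviewer load and estimate quality trade off so steeply.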

Viewed from this perspective, the first way that things can go wrong is by misassignment of reviewers to papers, for which there are two easy failure modes available.

  1. When reviewer/paper assignment is automated based on an affinity graph, the affinity graph may be low quality, or the constraint on the maximum number of papers per reviewer can easily leave papers with low affinity to all reviewers orphaned.
  2. When reviewer/paper assignments are done by one person, that person may choose reviewers who are all like-minded, simply because this is the crowd that they know. I’ve seen this happen at the beginning of the reviewing process, but the more insidious case is when it happens at the end, where people are pressed for time and low quality judgements can become common.

An interesting approach for addressing the constraint objective would be optimizing a different objective, such as the product of affinities rather than the sum. I’ve seen no experimentation of this sort.
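
To make the sum-versus-product idea concrete, here is a minimal brute-force sketch. The affinity values, the one-reviewer-per-paper setup, and the function name `best_assignment` are all assumptions for illustration; the article describes no specific algorithm, and real conference systems use far more scalable solvers.

```python
from itertools import product as cartesian
from math import prod


def best_assignment(affinity, cap, score):
    """Exhaustively pick one reviewer per paper, at most `cap` papers per reviewer.

    affinity[p][r] is the affinity of paper p to reviewer r.
    score maps the list of per-paper affinities to the number being maximized.
    Brute force, so only suitable for toy inputs.
    """
    n_reviewers = len(affinity[0])
    best, best_value = None, float("-inf")
    for choice in cartesian(range(n_reviewers), repeat=len(affinity)):
        if any(choice.count(r) > cap for r in range(n_reviewers)):
            continue  # some reviewer exceeds the load cap
        value = score([affinity[p][r] for p, r in enumerate(choice)])
        if value > best_value:
            best, best_value = choice, value
    return best


# Hypothetical affinities: rows are papers, columns are reviewers.
AFFINITY = [
    [0.9, 0.5],
    [0.6, 0.25],
]

by_sum = best_assignment(AFFINITY, cap=1, score=sum)       # (0, 1)
by_product = best_assignment(AFFINITY, cap=1, score=prod)  # (1, 0)
```

In this toy case the sum objective gives paper 0 its best reviewer and leaves paper 1 with an affinity of 0.25, while the product objective prefers the more balanced assignment (0.5 and 0.6), because a single very low affinity drags the whole product down.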

For ICML, there are about 3 levels of “reviewer”: the program chair who is responsible for all papers, the area chair who is responsible for organizing reviewing on a subset of papers, and the program committee member/reviewer who has primary responsibility for reviewing. In 2012 we tried to avoid these failure modes in a least-system-effort way using a blended approach. We used bidding to get a higher quality affinity matrix. We used a constraint system to assign the first reviewer to each paper and two area chairs to each paper. Then, we asked each area chair to find one reviewer for each paper. This obviously dealt with the one-area-chair failure mode. It also helps substantially with low quality assignments from the constrained system, since (a) the first reviewer chosen is typically higher quality than the last because that choice is the least constrained, (b) misassignments to area chairs are diagnosed at the beginning of the process by ACs trying to find reviewers, and (c) ACs can reach outside of the initial program committee to find reviewers, which existing automated systems cannot do.

 

The next way that reviewing can go wrong is via biased reviewing.

  1. Author name bias is a famous one. In my experience it is real: well-known authors automatically have their paper taken seriously, which particularly matters when time is short. Furthermore, I’ve seen instances where well-known authors can slide by with proof sketches that no one fully understands.
  2. Review anchoring is a very significant problem if it occurs. This does not happen in the standard review process, because the reviews of others are not visible to other reviewers until they are complete.
  3. A more subtle form of bias is when one reviewer is simply much louder or more charismatic than others. Reviewing without an in-person meeting is actually helpful here, as it reduces this problem substantially.

Reviewing can also be low quality. A primary issue here is time: most reviewers will submit a review within a time constraint, but it may not be high quality due to limits on time. Minimizing average reviewer load is quite important here. Staggered deadlines for reviews are almost certainly also helpful. A more subtle thing is discouraging low quality submissions. My favored approach here is to publish all submissions nonanonymously after some initial period of time.

Another significant issue in reviewer quality is motivation. Making reviewers not anonymous to each other helps with motivation as poor reviews will at least be known to some. Author feedback also helps with motivation, as reviewers know that authors will be able to point out poor reviewing. It is easy to imagine that further improvements in reviewer motivation would be helpful.

A third form of low quality review is based on miscommunication. Maybe there is a silly typo in a paper? Maybe something was confusing? Being able to communicate with the author can greatly reduce ambiguities.

The last problem is dictatorship at decision time, for which I’ve seen several variants. Sometimes this comes in the form of giving each area chair a budget of papers to “champion”. Sometimes this comes in the form of an area chair deciding to override all reviews and either accept or, more likely, reject a paper. Sometimes this comes in the form of a program chair doing this as well. The power of dictatorship is often available, but it should not be used: the wiser course is keeping things representative.

At ICML 2012, we tried to deal with this via a defined power approach. When reviewers agreed on the accept/reject decision, that was the decision. If the reviewers disagreed, we asked the two area chairs to make decisions and if they agreed, that was the decision. It was only when the ACs disagreed that the program chairs would become involved in the decision.
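
The escalation rule described here can be sketched as a small function. This is an illustrative reading, not the actual ICML 2012 tooling: votes are modeled as booleans, "agreed" is assumed to mean unanimous, and the name `decide` and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def decide(reviewer_votes: Sequence[bool],
           ac_votes: Sequence[bool],
           pc_vote: Optional[bool] = None) -> Optional[bool]:
    """Return True (accept), False (reject), or None if still unresolved.

    Power is escalated only on disagreement:
    reviewers first, then the area chairs, then the program chairs.
    """
    if reviewer_votes and len(set(reviewer_votes)) == 1:
        return reviewer_votes[0]  # unanimous reviewers settle it
    if ac_votes and len(set(ac_votes)) == 1:
        return ac_votes[0]        # reviewers split, area chairs agree
    return pc_vote                # area chairs split, program chairs step in
```

For example, `decide([True, True, True], [False, False])` returns `True`: once the reviewers agree, the area chairs' opinions are never consulted, which is the point of the defined power approach.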

The above provides an understanding of how to create a good reviewing process for a large conference. With this in mind, we can consider various proposals at the peer review workshop and elsewhere.

  1. Double Blind Review. This reduces bias, at the cost of decreasing reviewer motivation. Overall, I think it’s a significant long term positive for a conference as “insiders” naturally become more concerned with review quality and “outsiders” are more prone to submit.
  2. Better paper/reviewer matching. A pure win, with the only caveat that you should be familiar with failure modes and watch out for them.
  3. Author feedback. This improves review quality by placing a check on unfair reviews and reducing miscommunication at some cost in time.
  4. Allowing an appendix or ancillary materials. This allows authors to better communicate complex ideas, at the potential cost of reviewer time. A standard compromise is to make reading an appendix optional for reviewers.
  5. Open reviews. Open reviews means that people can learn from other reviews, and that authors can respond more naturally than in single round author feedback.

It’s important to note that none of the above are inherently contradictory. This is not necessarily obvious, as proponents of open review and double blind review have found themselves in opposition at times. These approaches can be accommodated by simply hiding authors’ names for a fixed period of 2 months while the initial review process is ongoing.

Representative reviewing seems like the real difficult goal. If a paper is rejected in a representative reviewing process, then perhaps it is just not of sufficient interest. Similarly, if a paper is accepted, then perhaps it is of real and meaningful interest. And if the reviewing process is not representative, then perhaps we should fix the failure modes.
