Translation | Placing Search in Context: The Concept Revisited

Date: 2019-12-13

Original text

Abstract

[1] Keyword-based search engines are in widespread use today as a popular means for Web-based information retrieval.

[2] Although such systems seem deceptively simple, a considerable amount of skill is required in order to satisfy non-trivial information needs.

[3] This paper presents a new conceptual paradigm for performing search in context that largely automates the search process, providing even non-professional users with highly relevant results.

[4] This paradigm is implemented in practice in the IntelliZap system, where search is initiated from a text query marked by the user in a document she views, and is guided by the text surrounding the marked query in that document ("the context").

[5] The context-driven information retrieval process involves semantic keyword extraction and clustering to automatically generate new, augmented queries.

[6] The latter are submitted to a host of general and domain-specific search engines.
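The query-augmentation step can be illustrated with a small sketch. It is a simplification under stated assumptions: the clustering stage is omitted, `augment_query`, `STOPWORDS`, and the toy `relatedness` scores are all hypothetical, and nothing here reproduces how IntelliZap actually selects keywords.

```python
import re

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "by", "for"}

def augment_query(query, context, relatedness, k=2):
    """Append the k context words most related to the marked query."""
    query_words = set(query.lower().split())
    candidates = {
        w for w in re.findall(r"[a-z]+", context.lower())
        if w not in STOPWORDS and w not in query_words
    }
    # Sort by descending relatedness; break ties alphabetically for determinism.
    ranked = sorted(candidates, key=lambda w: (-relatedness(query.lower(), w), w))
    return " ".join([query, *ranked[:k]])

# Toy relatedness scores standing in for a real semantic network (assumption).
TOY_SCORES = {"cat": 0.9, "rainforest": 0.7, "big": 0.2}

def toy_relatedness(query, word):
    return TOY_SCORES.get(word, 0.0)

context = "The jaguar is a big cat native to the rainforest of the Americas"
augmented = augment_query("jaguar", context, toy_relatedness)
print(augmented)  # jaguar cat rainforest
```

In a fuller version, the ranked keywords would be clustered first, so that each augmented query covers one aspect of the context.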

[7] Search results are then semantically reranked, using context. Experimental results show that using context to guide search effectively offers even inexperienced users an advanced search tool on the Web.
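Context-based reranking can be sketched in the same spirit. The scoring rule below (average best-match relatedness between snippet words and context words) is an assumption chosen for brevity, not the paper's reranking method, and `rerank` and `exact_match` are hypothetical names.

```python
def rerank(results, context_words, relatedness):
    """Order result snippets by their average relatedness to the context."""
    def score(snippet):
        words = snippet.lower().split()
        if not words or not context_words:
            return 0.0
        # For each snippet word, take its best match among the context words.
        best = [max(relatedness(w, c) for c in context_words) for w in words]
        return sum(best) / len(words)
    return sorted(results, key=score, reverse=True)

# Trivial stand-in for a semantic metric: 1.0 for identical words, else 0.0.
def exact_match(a, b):
    return 1.0 if a == b else 0.0

results = ["jaguar car dealership prices", "jaguar cat habitat rainforest"]
reranked = rerank(results, ["cat", "rainforest", "habitat"], exact_match)
print(reranked[0])  # jaguar cat habitat rainforest
```

Swapping `exact_match` for a real semantic distance would let related but non-identical words contribute to the score.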

Model improvements

Section 1

[1] The core of IntelliZap technology is a semantic network, which provides a metric for measuring distances between pairs of words.

[2] The basic semantic network is implemented using a vector-based approach, where each word is represented as a vector in multi-dimensional space.

[3] To assign each word a vector representation, we first identified 27 knowledge domains (such as computers, business and entertainment) roughly partitioning the whole variety of topics.

[4] We then sampled a large set of documents in these domains on the Internet. Word vectors were obtained by recording the frequencies of each word in each knowledge domain.

[5] Each domain can therefore be viewed as an axis in the multi-dimensional space.
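The construction of word vectors from per-domain frequencies can be sketched as follows. The three-domain toy corpus, the whitespace tokenization, and the name `build_word_vectors` are assumptions for illustration; the paper uses 27 domains and a large document sample.

```python
from collections import Counter

def build_word_vectors(domain_docs):
    """Map each word to a list of its frequencies, one entry per domain."""
    domains = sorted(domain_docs)  # fixed axis order
    counts = {
        d: Counter(w for doc in domain_docs[d] for w in doc.lower().split())
        for d in domains
    }
    vocab = set().union(*counts.values())
    # Counter returns 0 for words that never occur in a domain.
    vectors = {w: [counts[d][w] for d in domains] for w in vocab}
    return domains, vectors

# Hypothetical toy corpus: a few short "documents" per domain.
domain_docs = {
    "computers": ["the server runs linux", "linux kernel update"],
    "business": ["the market fell", "server sales rose"],
    "entertainment": ["the film won"],
}

domains, vectors = build_word_vectors(domain_docs)
print(domains)            # ['business', 'computers', 'entertainment']
print(vectors["linux"])   # [0, 2, 0]
print(vectors["server"])  # [1, 1, 0]
```

Here "linux" lies entirely along the computers axis, while "server" is split between business and computers.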

[6] The distance measure between word vectors is computed using a correlation-based metric:
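As an illustration of what a correlation-based measure between two word vectors can look like, the sketch below assumes the Pearson correlation coefficient. The paper's exact formula may differ, and the function name and sample vectors are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / den if den else 0.0

# Hypothetical per-domain frequency vectors for three words.
linux = [0, 20, 1]
kernel = [1, 15, 0]
film = [2, 1, 30]

print(pearson(linux, kernel) > pearson(linux, film))  # True
```

Words whose frequencies rise and fall together across domains score close to 1, while words concentrated in different domains score low or negative.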

Section 2

[1] Unfortunately, there are no accepted procedures for evaluating performance of semantic metrics.

[2] Following Resnik [1999], we evaluated different metrics by computing correlation between their scores and human-assigned scores for a list of word pairs.

[3] The intuition behind this approach is that a good metric should approximate human judgments well.
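This evaluation procedure can be sketched as a correlation between a metric's scores and averaged human ratings. The use of Pearson correlation, the word pairs, the ratings, and the lookup-table metric are all assumptions made for the example.

```python
import math
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / den if den else 0.0

def evaluate_metric(metric, pairs, ratings):
    """Correlate metric scores with the mean human rating of each word pair."""
    human = [mean(ratings[p]) for p in pairs]
    machine = [metric(a, b) for a, b in pairs]
    return pearson(machine, human)

# Hypothetical data: three word pairs, each rated by two subjects on a 0-10 scale.
pairs = [("car", "automobile"), ("coast", "shore"), ("noon", "string")]
ratings = {pairs[0]: [9, 9], pairs[1]: [8, 8], pairs[2]: [0, 1]}

# Stand-in metric backed by a lookup table instead of a semantic network.
TOY_SCORES = {pairs[0]: 0.9, pairs[1]: 0.7, pairs[2]: 0.1}

def toy_metric(a, b):
    return TOY_SCORES[(a, b)]

r = evaluate_metric(toy_metric, pairs, ratings)
print(round(r, 2))
```

A value of `r` near 1 means the metric ranks and spaces the pairs much as the human subjects did.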

[4] While Resnik used a list of 30 noun pairs from Miller and Charles [1991], we opted for a more comprehensive evaluation.

[5] To this end, we prepared a diverse list of 350 noun pairs representing various degrees of similarity, and employed 16 subjects to estimate the "relatedness" of the words in pairs on a scale from 0 (totally unrelated words) to 10 (very much related or identical words).

[6] The vector-based metric achieved 41% correlation with averaged human scores, and the WordNet-based metric achieved 39% correlation. A linear combination of the two metrics achieved 55% correlation with human scores.
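A linear combination of two metrics can be sketched as a weighted sum with a single mixing weight. The text does not say how the weights were chosen, so the grid search below, the names `combine` and `best_alpha`, and all scores are assumptions for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / den if den else 0.0

def combine(vec_scores, wn_scores, alpha):
    """Weighted sum: alpha * vector-based score + (1 - alpha) * WordNet-based score."""
    return [alpha * v + (1 - alpha) * w for v, w in zip(vec_scores, wn_scores)]

def best_alpha(vec_scores, wn_scores, human, steps=10):
    """Pick the weight on a uniform grid that maximizes correlation with human scores."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates,
               key=lambda a: pearson(combine(vec_scores, wn_scores, a), human))

# Hypothetical scores for four word pairs.
vec_scores = [0.9, 0.2, 0.6, 0.1]
wn_scores = [0.5, 0.7, 0.8, 0.0]
human = [9, 6, 8, 1]

alpha = best_alpha(vec_scores, wn_scores, human)
combined_r = pearson(combine(vec_scores, wn_scores, alpha), human)
print(alpha, combined_r)
```

Because the grid includes 0 and 1, the selected combination correlates at least as well with the human scores as either metric alone on the data used to choose the weight; held-out pairs would be needed to confirm the gain generalizes.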

[7] Currently, our semantic network is defined for the English language, though the technology can be adapted for other languages with minimal effort.

[8] This would require training the network using textual data for the desired language, properly partitioned into domains.

[9] Linguistic information can be added, subject to the availability of adequate tools for the target language (e.g., EuroWordNet for European languages [Euro WordNet] or EDR for Japanese [Yokoi 1995]).