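One defining feature of a Lucene index is the field: an index is made up of fields. This gives indexing and search a great deal of flexibility. Elasticsearch goes one step further than Lucene: it can be schema-free. The secret behind this is Mapping. A Mapping is a set of presets for each field of an index, covering how the field is indexed and analyzed, whether it is stored, and so on. Incoming data looks up its configuration in the Mapping by field name and is indexed accordingly. This post gives a brief analysis of how Mapping is implemented; how Mappings are put, updated, and applied will be covered later in the analysis of indexing.
Let's first look at the class relationships in the Mapping implementation, as shown in the figure below:
This is only part of what Mapping contains. Mapping extends Lucene's field and defines more field types: besides the string and number fields that Lucene already has, there are date, IP, byte, and geo-related fields, which is one of ES's strengths. As the figure above shows, the classes fall into two groups, Mapper and DocumentMapper. The former is the parent interface of all mappers, while DocumentMapper is a collection of Mappers and represents the mapper definition of an index.
There are three kinds of Mapper. The first is the core field structure, FieldMapper -> AbstractFieldMapper -> StringFieldMapper; each of these represents a data type such as string or int. The second is Mapper -> ObjectMapper -> RootObjectMapper, the object-type Mapper. This is a major improvement of Elasticsearch over Lucene, which only supports basic data types. The last is Mapper -> RootMapper -> IndexFieldMapper, a kind that exists only in the root Mapper, such as IdFieldMapper and the IndexFieldMapper in the figure. They are similar to index metadata and can only exist inside an index.
One of the more important methods in Mapper is parse(ParseContext context), and every Mapper subclass has its own implementation of it. Its main job is to parse the ParseContext and obtain the corresponding field, and it is used mainly at index time. The data to be indexed is wrapped in a ParseContext, and each field mapper parses the ParseContext to build the corresponding Lucene Field. Its implementation in AbstractFieldMapper is shown below: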
public void parse(ParseContext context) throws IOException {
    final List<Field> fields = new ArrayList<>(2);
    try {
        parseCreateField(context, fields); // the method that actually parses the field
        for (Field field : fields) {
            if (!customBoost()) { // set the boost
                field.setBoost(boost);
            }
            if (context.listener().beforeFieldAdded(this, field, context)) {
                context.doc().add(field); // add the parsed Field to the context
            }
        }
    } catch (Exception e) {
        throw new MapperParsingException("failed to parse [" + names.fullName() + "]", e);
    }
    // Parse multiFields. MultiFields lets the same field be defined in several ways,
    // e.g. indexed with different analyzers, so that it can be queried in different ways.
    multiFields.parse(this, context);
    if (copyTo != null) {
        copyTo.parse(context);
    }
}
Here parseCreateField is an abstract method that each data type implements in its own way. The string implementation looks like this:
protected void parseCreateField(ParseContext context, List<Field> fields) throws IOException {
    // parse the input into a value and a boost
    ValueAndBoost valueAndBoost = parseCreateFieldForString(context, nullValue, boost);
    if (valueAndBoost.value() == null) {
        return;
    }
    if (ignoreAbove > 0 && valueAndBoost.value().length() > ignoreAbove) {
        return;
    }
    if (context.includeInAll(includeInAll, this)) {
        context.allEntries().addText(names.fullName(), valueAndBoost.value(), valueAndBoost.boost());
    }
    if (fieldType.indexed() || fieldType.stored()) { // build the Lucene Field
        Field field = new Field(names.indexName(), valueAndBoost.value(), fieldType);
        field.setBoost(valueAndBoost.boost());
        fields.add(field);
    }
    if (hasDocValues()) {
        fields.add(new SortedSetDocValuesField(names.indexName(), new BytesRef(valueAndBoost.value())));
    }
    if (fields.isEmpty()) {
        context.ignoredValue(names.indexName(), valueAndBoost.value());
    }
}

// Extract the field's value and boost
public static ValueAndBoost parseCreateFieldForString(ParseContext context, String nullValue, float defaultBoost) throws IOException {
    if (context.externalValueSet()) {
        return new ValueAndBoost((String) context.externalValue(), defaultBoost);
    }
    XContentParser parser = context.parser();
    if (parser.currentToken() == XContentParser.Token.VALUE_NULL) {
        return new ValueAndBoost(nullValue, defaultBoost);
    }
    if (parser.currentToken() == XContentParser.Token.START_OBJECT) {
        XContentParser.Token token;
        String currentFieldName = null;
        String value = nullValue;
        float boost = defaultBoost;
        while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) {
            if (token == XContentParser.Token.FIELD_NAME) {
                currentFieldName = parser.currentName();
            } else {
                if ("value".equals(currentFieldName) || "_value".equals(currentFieldName)) {
                    value = parser.textOrNull();
                } else if ("boost".equals(currentFieldName) || "_boost".equals(currentFieldName)) {
                    boost = parser.floatValue();
                } else {
                    throw new ElasticsearchIllegalArgumentException("unknown property [" + currentFieldName + "]");
                }
            }
        }
        return new ValueAndBoost(value, boost);
    }
    return new ValueAndBoost(parser.textOrNull(), defaultBoost);
}
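The branching in parseCreateFieldForString is easier to see without the streaming parser. The following is only a rough sketch that works on an already-decoded value (null, a Map, or a scalar) instead of an XContentParser; the class ValueAndBoostSketch and its nested holder are made up for illustration and are not Elasticsearch classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ValueAndBoostSketch {

    /** Hypothetical stand-in for the ValueAndBoost holder used in the real code. */
    static final class ValueAndBoost {
        final String value;
        final float boost;

        ValueAndBoost(String value, float boost) {
            this.value = value;
            this.boost = boost;
        }
    }

    /**
     * Mirrors the three branches of the original method on a decoded value:
     * null -> nullValue, object -> read "value"/"boost" keys, anything else -> its text.
     */
    static ValueAndBoost parse(Object raw, String nullValue, float defaultBoost) {
        if (raw == null) {
            return new ValueAndBoost(nullValue, defaultBoost);
        }
        if (raw instanceof Map) {
            String value = nullValue;
            float boost = defaultBoost;
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) raw).entrySet()) {
                String key = String.valueOf(entry.getKey());
                if ("value".equals(key) || "_value".equals(key)) {
                    value = entry.getValue() == null ? null : entry.getValue().toString();
                } else if ("boost".equals(key) || "_boost".equals(key)) {
                    boost = ((Number) entry.getValue()).floatValue();
                } else {
                    throw new IllegalArgumentException("unknown property [" + key + "]");
                }
            }
            return new ValueAndBoost(value, boost);
        }
        return new ValueAndBoost(raw.toString(), defaultBoost);
    }

    public static void main(String[] args) {
        // Plain form: "title": "hello"
        ValueAndBoost plain = parse("hello", null, 1.0f);
        // Object form: "title": {"value": "hello", "boost": 2.0}
        Map<String, Object> object = new LinkedHashMap<>();
        object.put("value", "hello");
        object.put("boost", 2.0);
        ValueAndBoost boosted = parse(object, null, 1.0f);
        System.out.println(plain.value + " " + plain.boost);
        System.out.println(boosted.value + " " + boosted.boost);
    }
}
```

In other words, a string field can arrive either as a bare value or as a small object carrying its own boost, and any other key inside that object is rejected.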
That is how a Mapper parses a value into the corresponding Field. This is only a brief introduction; a detailed analysis will follow later.
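The split between parse and parseCreateField is a template-method arrangement: the base class fixes the common steps and each type supplies only the conversion. Here is a minimal sketch of that shape, assuming heavily simplified types. SimpleField, SimpleFieldMapper, SimpleStringMapper, and SimpleLongMapper are hypothetical names, not Elasticsearch or Lucene classes, and the sketch always applies the mapper boost, whereas the real code first checks customBoost().

```java
import java.util.ArrayList;
import java.util.List;

public class FieldMapperSketch {

    /** Hypothetical stand-in for a Lucene Field: a name, a value and a boost. */
    static final class SimpleField {
        final String name;
        final String value;
        float boost = 1.0f;

        SimpleField(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Plays the role of AbstractFieldMapper: parse() is fixed, parseCreateField() varies. */
    static abstract class SimpleFieldMapper {
        final String name;
        final float boost;

        SimpleFieldMapper(String name, float boost) {
            this.name = name;
            this.boost = boost;
        }

        /** Common steps: let the subclass build fields, set the boost, add them to the document. */
        final void parse(Object rawValue, List<SimpleField> doc) {
            List<SimpleField> fields = new ArrayList<>(2);
            parseCreateField(rawValue, fields);
            for (SimpleField field : fields) {
                field.boost = boost; // simplification: no customBoost() check
                doc.add(field);
            }
        }

        /** Type-specific step, implemented once per data type. */
        abstract void parseCreateField(Object rawValue, List<SimpleField> fields);
    }

    /** String type: honours nullValue and ignoreAbove like the string mapper above. */
    static final class SimpleStringMapper extends SimpleFieldMapper {
        final String nullValue;
        final int ignoreAbove;

        SimpleStringMapper(String name, float boost, String nullValue, int ignoreAbove) {
            super(name, boost);
            this.nullValue = nullValue;
            this.ignoreAbove = ignoreAbove;
        }

        @Override
        void parseCreateField(Object rawValue, List<SimpleField> fields) {
            String value = rawValue == null ? nullValue : rawValue.toString();
            if (value == null) {
                return; // nothing to index
            }
            if (ignoreAbove > 0 && value.length() > ignoreAbove) {
                return; // value too long, skipped
            }
            fields.add(new SimpleField(name, value));
        }
    }

    /** Numeric type: accepts numbers or numeric strings. */
    static final class SimpleLongMapper extends SimpleFieldMapper {
        SimpleLongMapper(String name, float boost) {
            super(name, boost);
        }

        @Override
        void parseCreateField(Object rawValue, List<SimpleField> fields) {
            if (rawValue == null) {
                return;
            }
            long parsed = rawValue instanceof Number
                    ? ((Number) rawValue).longValue()
                    : Long.parseLong(rawValue.toString().trim());
            fields.add(new SimpleField(name, Long.toString(parsed)));
        }
    }

    public static void main(String[] args) {
        List<SimpleField> doc = new ArrayList<>();
        new SimpleStringMapper("title", 2.0f, null, 256).parse("hello mapping", doc);
        new SimpleLongMapper("views", 1.0f).parse("42", doc);
        for (SimpleField field : doc) {
            System.out.println(field.name + "=" + field.value + " boost=" + field.boost);
        }
    }
}
```

Adding a new data type in this arrangement means writing one more subclass; the shared bookkeeping in parse stays untouched.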
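DocumentMapper is the collection of all Mappers of an index (more precisely, of one type within the index, as the type check in its parse method shows). It describes the definition of every field, so it can be seen as the definition of a Lucene Document. It also holds some index-level defaults, such as the default analyzers for indexing and for search. Some of its fields are shown below: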
private final DocumentMapperParser docMapperParser;
private volatile ImmutableMap<String, Object> meta;
private volatile CompressedString mappingSource;
private final RootObjectMapper rootObjectMapper;
private final ImmutableMap<Class<? extends RootMapper>, RootMapper> rootMappers;
private final RootMapper[] rootMappersOrdered;
private final RootMapper[] rootMappersNotIncludedInObject;
private final NamedAnalyzer indexAnalyzer;
private final NamedAnalyzer searchAnalyzer;
private final NamedAnalyzer searchQuoteAnalyzer;
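DocumentMapper's role also shows in its parse method, which parses a whole document. We saw earlier how a Mapper produces a Field; that step is actually reached only once DocumentMapper has started parsing. The whole document sent by an index request is broken into fields here, the matching field setting is looked up in the Mapping for each of them, and the value is handed to that mapper to parse. If no setting exists and dynamic mapping is allowed, ES automatically creates a field based on the value and updates the Mapping. The method is shown below: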
public ParsedDocument parse(SourceToParse source, @Nullable ParseListener listener) throws MapperParsingException {
    ParseContext.InternalParseContext context = cache.get();
    if (source.type() != null && !source.type().equals(this.type)) {
        throw new MapperParsingException("Type mismatch, provide type [" + source.type() + "] but mapper is of type [" + this.type + "]");
    }
    source.type(this.type);
    XContentParser parser = source.parser();
    try {
        if (parser == null) {
            parser = XContentHelper.createParser(source.source());
        }
        if (sourceTransforms != null) {
            parser = transform(parser);
        }
        context.reset(parser, new ParseContext.Document(), source, listener);
        // will result in START_OBJECT
        int countDownTokens = 0;
        XContentParser.Token token = parser.nextToken();
        if (token != XContentParser.Token.START_OBJECT) {
            throw new MapperParsingException("Malformed content, must start with an object");
        }
        boolean emptyDoc = false;
        token = parser.nextToken();
        if (token == XContentParser.Token.END_OBJECT) {
            // empty doc, we can handle it...
            emptyDoc = true;
        } else if (token != XContentParser.Token.FIELD_NAME) {
            throw new MapperParsingException("Malformed content, after first object, either the type field or the actual properties should exist");
        }
        // first field is the same as the type, this might be because the
        // type is provided, and the object exists within it or because
        // there is a valid field that by chance is named as the type.
        // Because of this, by default wrapping a document in a type is
        // disabled, but can be enabled by setting
        // index.mapping.allow_type_wrapper to true
        if (type.equals(parser.currentName()) && indexSettings.getAsBoolean(ALLOW_TYPE_WRAPPER, false)) {
            parser.nextToken();
            countDownTokens++;
        }
        for (RootMapper rootMapper : rootMappersOrdered) {
            rootMapper.preParse(context);
        }
        if (!emptyDoc) {
            rootObjectMapper.parse(context);
        }
        for (int i = 0; i < countDownTokens; i++) {
            parser.nextToken();
        }
        for (RootMapper rootMapper : rootMappersOrdered) {
            rootMapper.postParse(context);
        }
    } catch (Throwable e) {
        // if its already a mapper parsing exception, no need to wrap it...
        if (e instanceof MapperParsingException) {
            throw (MapperParsingException) e;
        }
        // Throw a more meaningful message if the document is empty.
        if (source.source() != null && source.source().length() == 0) {
            throw new MapperParsingException("failed to parse, document is empty");
        }
        throw new MapperParsingException("failed to parse", e);
    } finally {
        // only close the parser when its not provided externally
        if (source.parser() == null && parser != null) {
            parser.close();
        }
    }
    // reverse the order of docs for nested docs support, parent should be last
    if (context.docs().size() > 1) {
        Collections.reverse(context.docs());
    }
    // apply doc boost
    if (context.docBoost() != 1.0f) {
        Set<String> encounteredFields = Sets.newHashSet();
        for (ParseContext.Document doc : context.docs()) {
            encounteredFields.clear();
            for (IndexableField field : doc) {
                if (field.fieldType().indexed() && !field.fieldType().omitNorms()) {
                    if (!encounteredFields.contains(field.name())) {
                        ((Field) field).setBoost(context.docBoost() * field.boost());
                        encounteredFields.add(field.name());
                    }
                }
            }
        }
    }
    ParsedDocument doc = new ParsedDocument(context.uid(), context.version(), context.id(), context.type(),
            source.routing(), source.timestamp(), source.ttl(), context.docs(), context.analyzer(),
            context.source(), context.mappingsModified()).parent(source.parent());
    // reset the context to free up memory
    context.reset(null, null, null, null);
    return doc;
}
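The whole document is thus parsed into a ParsedDocument, with the per-field work delegated to rootObjectMapper.parse(context). Only this parsed form can go on to the later steps that write it into the index.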
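The lookup-or-create behaviour described for DocumentMapper can be pictured with a toy model. The sketch below is not how Elasticsearch implements it: DocumentMapperSketch, FieldType, Parsed, and guessType are hypothetical names, the document is a flat Map rather than a token stream, and unmapped fields are simply skipped when dynamic mapping is off. It only shows the idea of consulting the mapping per field name, inferring a type for unknown fields, and flagging that the mapping changed (the role played by mappingsModified in the real ParsedDocument).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DocumentMapperSketch {

    /** Hypothetical field types; the real mapping supports many more. */
    enum FieldType { STRING, LONG, DOUBLE, BOOLEAN }

    /** Result of parsing one document: typed fields plus a "mapping changed" flag. */
    static final class Parsed {
        final Map<String, String> fields = new LinkedHashMap<>();
        boolean mappingsModified;
    }

    private final Map<String, FieldType> mappings = new LinkedHashMap<>();
    private final boolean dynamic;

    DocumentMapperSketch(boolean dynamic) {
        this.dynamic = dynamic;
    }

    void addMapping(String field, FieldType type) {
        mappings.put(field, type);
    }

    Map<String, FieldType> mappings() {
        return mappings;
    }

    /** Looks up each field in the mapping; unknown fields get an inferred type if allowed. */
    Parsed parse(Map<String, Object> source) {
        Parsed parsed = new Parsed();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            Object value = entry.getValue();
            if (value == null) {
                continue; // nothing to index and nothing to infer a type from
            }
            FieldType type = mappings.get(entry.getKey());
            if (type == null) {
                if (!dynamic) {
                    continue; // assumption of this sketch: unmapped fields are skipped
                }
                type = guessType(value);
                mappings.put(entry.getKey(), type); // the mapping itself is updated
                parsed.mappingsModified = true;
            }
            parsed.fields.put(entry.getKey(), type + ":" + value);
        }
        return parsed;
    }

    /** Very rough type inference from the Java type of the decoded value. */
    static FieldType guessType(Object value) {
        if (value instanceof Boolean) {
            return FieldType.BOOLEAN;
        }
        if (value instanceof Integer || value instanceof Long) {
            return FieldType.LONG;
        }
        if (value instanceof Number) {
            return FieldType.DOUBLE;
        }
        return FieldType.STRING;
    }

    public static void main(String[] args) {
        DocumentMapperSketch mapper = new DocumentMapperSketch(true);
        mapper.addMapping("title", FieldType.STRING);

        Map<String, Object> source = new LinkedHashMap<>();
        source.put("title", "hello");
        source.put("views", 10); // not in the mapping yet

        Parsed parsed = mapper.parse(source);
        System.out.println(parsed.fields);
        System.out.println("mappingsModified=" + parsed.mappingsModified);
        System.out.println(mapper.mappings());
    }
}
```

After the first document introduces a new field, later documents find it in the mapping and no longer report a modification.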
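Summary: that covers the structure of Mapping and an overview of what it does. Mappers make Elasticsearch indexes more powerful: indexing and search support more data types and gain flexibility.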