Ace -- Syntax Highlighting

Creating a Syntax Highlighter for Ace

Creating a new syntax highlighter for Ace is extremely simple. You'll need to define two pieces of code: a new mode, and a new set of highlighting rules.

Where to Start

We recommend using the Ace Mode Creator when defining your highlighter. It lets you inspect your code's tokens and provides a live preview of the syntax highlighter in action.

Ace Mode Creator: https://ace.c9.io/tool/mode_creator.html

Defining a Mode

Every language needs a mode. A mode contains the paths to a language's syntax highlighting rules, indentation rules, and code folding rules. Without defining a mode, Ace won't know anything about the finer aspects of your language.

Here is the starter template we'll use to create a new mode:

 

define(function(require, exports, module) {
"use strict";

var oop = require("../lib/oop");
// defines the parent mode
var TextMode = require("./text").Mode;
var Tokenizer = require("../tokenizer").Tokenizer;
var MatchingBraceOutdent = require("./matching_brace_outdent").MatchingBraceOutdent;
// needed by createWorker below
var WorkerClient = require("../worker/worker_client").WorkerClient;

// defines the language specific highlighters and folding rules
var MyNewHighlightRules = require("./mynew_highlight_rules").MyNewHighlightRules;
var MyNewFoldMode = require("./folding/mynew").MyNewFoldMode;

var Mode = function() {
    // set everything up
    this.HighlightRules = MyNewHighlightRules;
    this.$outdent = new MatchingBraceOutdent();
    this.foldingRules = new MyNewFoldMode();
};
oop.inherits(Mode, TextMode);

(function() {
    // configure comment start/end characters
    this.lineCommentStart = "//";
    this.blockComment = {start: "/*", end: "*/"};

    // special logic for indent/outdent.
    // By default ace keeps indentation of previous line
    this.getNextLineIndent = function(state, line, tab) {
        var indent = this.$getIndent(line);
        return indent;
    };

    this.checkOutdent = function(state, line, input) {
        return this.$outdent.checkOutdent(line, input);
    };

    this.autoOutdent = function(state, doc, row) {
        this.$outdent.autoOutdent(doc, row);
    };

    // create worker for live syntax checking
    this.createWorker = function(session) {
        var worker = new WorkerClient(["ace"], "ace/mode/mynew_worker", "NewWorker");
        worker.attachToDocument(session.getDocument());
        worker.on("errors", function(e) {
            session.setAnnotations(e.data);
        });
        return worker;
    };

}).call(Mode.prototype);

exports.Mode = Mode;
});

What's going on here? First, you're defining the path to TextMode (more on this later). Then you're pointing the mode to your definitions for the highlighting rules, as well as your rules for code folding. Finally, you're setting everything up to find those rules, and exporting the Mode so that it can be consumed. That's it!

 

Regarding TextMode, you'll notice that it's only being used once: oop.inherits(Mode, TextMode);. If your new language depends on the rules of another language, you can choose to inherit the same rules, while expanding on them with your language's own requirements. For example, PHP inherits from HTML, since it can be embedded directly inside .html pages. You can either inherit from TextMode, or from any other existing mode, if it already relates to your language.
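The effect of inheriting from a parent mode can be pictured with a standalone sketch of prototype inheritance. The inherits helper, BaseMode, and MyMode below are hypothetical stand-ins written for illustration, not Ace's own code; the sketch only assumes that inheriting means the child's prototype chains to the parent's.

```javascript
// Hypothetical stand-in for an inherits helper: chain the child prototype
// to the parent prototype so the child sees the parent's methods.
function inherits(Child, Parent) {
    Child.prototype = Object.create(Parent.prototype, {
        constructor: { value: Child, writable: true, configurable: true }
    });
}

// Hypothetical parent mode with a default indentation rule.
function BaseMode() {}
BaseMode.prototype.getNextLineIndent = function(state, line, tab) {
    return line.match(/^\s*/)[0]; // keep the previous line's indentation
};
BaseMode.prototype.lineCommentStart = "";

// Hypothetical child mode: inherit first, then override what differs.
function MyMode() {}
inherits(MyMode, BaseMode);
MyMode.prototype.lineCommentStart = "//";

var mode = new MyMode();
console.log(JSON.stringify(mode.getNextLineIndent("start", "    foo();", "    ")));
console.log(mode.lineCommentStart);
```

The child only states what is different (here, the comment marker); everything else falls through to the parent, which is why picking a closely related parent mode saves work.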

 

All Ace modes can be found in the lib/ace/mode folder.

Defining Syntax Highlighting Rules

The Ace highlighter can be considered to be a state machine. Regular expressions define the tokens for the current state, as well as the transitions into another state. Let's define mynew_highlight_rules.js, which our mode above uses.

All syntax highlighters start off looking something like this:

define(function(require, exports, module) {
"use strict";

var oop = require("../lib/oop");
var TextHighlightRules = require("./text_highlight_rules").TextHighlightRules;

var MyNewHighlightRules = function() {

    // regexp must not have capturing parentheses. Use (?:) instead.
    // regexps are ordered -> the first match is used
    this.$rules = {
        "start" : [
            {
                token: token, // String, Array, or Function: the CSS token to apply
                regex: regex, // String or RegExp: the regexp to match
                next: next // [Optional] String: next state to enter
            }
        ]
    };
};

oop.inherits(MyNewHighlightRules, TextHighlightRules);

exports.MyNewHighlightRules = MyNewHighlightRules;

});

The token state machine operates on whatever is defined in this.$rules. The highlighter always begins at the start state, and progresses down the list, looking for a matching regex. When one is found, the matched text is wrapped within a <span class="ace_<token>"> tag, where <token> is defined as the token property. Note that all tokens are preceded by the ace_ prefix when they're rendered on the page.

 

Once again, we're inheriting from TextHighlightRules here. We could instead inherit from any other language's rule set, if our new language builds on previously defined syntaxes. For more information on extending languages, see "Extending Highlighters" below.

 

Defining Tokens

The Ace highlighting system is heavily inspired by the TextMate language grammar. Most tokens will follow the conventions of TextMate when naming grammars. A thorough (albeit incomplete) list of tokens can be found on the Ace Wiki.

Token list: https://github.com/ajaxorg/ace/wiki/Creating-or-Extending-an-Edit-Mode#commonTokens

 

For the complete list of tokens, see tool/tmtheme.js (https://github.com/ajaxorg/ace/blob/master/tool/tmtheme.js). It is possible to add new token names, but that is outside the scope of this document.

 

Multiple tokens can be applied to the same text by adding dots in the token, e.g. token: support.function wraps the text in a <span class="ace_support ace_function"> tag.
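The mapping from a dotted token name to CSS classes can be sketched in a few lines. The helper name tokenToClassName is hypothetical and is not part of Ace; it only illustrates the naming rule described above.

```javascript
// Hypothetical helper: turn a dotted token name into a CSS class list,
// one "ace_"-prefixed class per dot-separated part.
function tokenToClassName(token) {
    return token.split(".").map(function(part) {
        return "ace_" + part;
    }).join(" ");
}

console.log(tokenToClassName("support.function")); // ace_support ace_function
console.log(tokenToClassName("keyword"));          // ace_keyword
```

Because each part becomes its own class, a theme can style all ace_support tokens in one rule and still refine ace_support.ace_function separately.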

 

Defining Regular Expressions

Regular expressions can be defined either as a RegExp or as a String.

If you're using a RegExp literal, remember to start and end the expression with the / character, like this:

{
    token : "constant.language.escape",
    regex : /\$[\w\d]+/
}
 

A caveat of using string-based regular expressions is that any \ character must be escaped. That means that even an innocuous regular expression like this:

regex: "function\s*\(\w+\)"
 

Must actually be written like this:

regex: "function\\s*\\(\\w+\\)"
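The escaping rule can be checked directly in plain JavaScript, independent of Ace. This sketch compares a RegExp literal with the equivalent escaped string, and shows what happens to a string whose backslashes are not doubled.

```javascript
// The same pattern written as a RegExp literal and as an escaped string.
var literal = /function\s*\(\w+\)/;
var fromString = new RegExp("function\\s*\\(\\w+\\)");

// Both compile to the same pattern source.
console.log(literal.source === fromString.source); // true

// Without the doubled backslashes, JavaScript drops them while parsing the
// string, so the regexp engine never sees \s, \( or \w.
var unescaped = "function\s*\(\w+\)";
console.log(unescaped); // functions*(w+)

console.log(fromString.test("function (callback)")); // true
```

The string form is parsed twice, once by the JavaScript string parser and once by the regular expression engine, which is why every backslash has to be written twice.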
 

Groupings

You can also include flat regexps, such as (var), or have matching groups, such as ((a+)(b+)). There is a strict requirement whereby matching groups must cover the entire matched string; thus, (hel)lo is invalid. If you want to create a non-capturing group, simply start the group with the ?: predicate; thus, (hel)(?:lo) is okay. You can, of course, create longer non-capturing groups. For example:

{
    token : "constant.language.boolean",
    regex : /(?:true|false)\b/
},
 

For flat regular expression matches, token can be a String, or a Function that takes a single argument (the match) and returns a string token. For example, using a function might look like this:

var colors = lang.arrayToMap(
    ("aqua|black|blue|fuchsia|gray|green|lime|maroon|navy|olive|orange|" +
    "purple|red|silver|teal|white|yellow").split("|")
);

var fonts = lang.arrayToMap(
    ("arial|century|comic|courier|garamond|georgia|helvetica|impact|lucida|" +
    "symbol|system|tahoma|times|trebuchet|utopia|verdana|webdings|sans-serif|" +
    "serif|monospace").split("|")
);

...

{
    token: function(value) {
        if (colors.hasOwnProperty(value.toLowerCase())) {
            return "support.constant.color";
        }
        else if (fonts.hasOwnProperty(value.toLowerCase())) {
            return "support.constant.fonts";
        }
        else {
            return "text";
        }
    },
    regex: "\\-?[a-zA-Z_][a-zA-Z0-9_\\-]*"
}

 

If token is a function, it should take the same number of arguments as there are groups, and return an array of tokens.

 

For grouped regular expressions, token can be a String, in which case all matched groups are given that same token, like this:

{
    token: "identifier",
    regex: "(\\w+\\s*:)(\\w*)"
}
 

More commonly, though, token is an Array (of the same length as the number of groups), whereby each group is given the token at the same position in the array. For a complicated regular expression, like defining a function, that might look something like this:

{
    token : ["storage.type", "text", "entity.name.function"],
    regex : "(function)(\\s+)([a-zA-Z_][a-zA-Z0-9_]*\\b)"
}
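To make the pairing of groups and tokens concrete, here is a standalone sketch. The tokenizeWithGroups helper is hypothetical and only mirrors the idea described above (group i receives token i); it is not how Ace implements it internally.

```javascript
// Hypothetical helper: pair each capturing group with the token at the same
// index. This mirrors the idea described above, not Ace's internal code.
function tokenizeWithGroups(rule, text) {
    // Wrapping in a non-capturing group keeps the group numbering unchanged.
    var match = new RegExp("^(?:" + rule.regex + ")").exec(text);
    if (!match) return null;
    var tokens = [];
    for (var i = 0; i < rule.token.length; i++) {
        if (match[i + 1]) {
            tokens.push({ type: rule.token[i], value: match[i + 1] });
        }
    }
    return tokens;
}

var functionRule = {
    token: ["storage.type", "text", "entity.name.function"],
    regex: "(function)(\\s+)([a-zA-Z_][a-zA-Z0-9_]*\\b)"
};

console.log(tokenizeWithGroups(functionRule, "function greet"));
```

For the input "function greet", the three groups capture the keyword, the whitespace, and the name, so each piece gets its own token type. Any text matched outside a capturing group would have no token to receive, which is the reason groups must cover the whole match.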

 

Defining States

The syntax highlighting state machine stays in the start state until you define a next state for it to advance to. At that point, the tokenizer stays in that new state until it advances to another state. Afterwards, you should return to the original start state.

Here's an example:

this.$rules = {
    "start" : [ {
        token : "text",
        regex : "<\\!\\[CDATA\\[",
        next : "cdata"
    } ],

    "cdata" : [ {
        token : "text",
        regex : "\\]\\]>",
        next : "start"
    }, {
        defaultToken : "text"
    } ]
};

In this extremely short sample, we're defining some highlighting rules for when Ace detects a <![CDATA[ tag. When one is encountered, the tokenizer moves from start into the cdata state. It remains there, applying the text token to any string it encounters. Finally, when it hits a closing ]]> symbol, it returns to the start state and continues to tokenize anything else.
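The state transitions can be traced with a small standalone tokenizer. This is a deliberately simplified sketch for illustration: tokenizeLine is a hypothetical function, the use of sticky regular expressions is a choice made here, and the extra state field on each token exists only to make the transitions visible. Ace's real tokenizer is more involved.

```javascript
// Hypothetical, simplified tokenizer for illustration. Rules in the current
// state are tried in order at each position; the first match wins.
function tokenizeLine(rules, line, startState) {
    var state = startState || "start";
    var tokens = [];
    var pos = 0;
    while (pos < line.length) {
        var stateRules = rules[state];
        var fallback = "text";
        var matched = false;
        for (var i = 0; i < stateRules.length && !matched; i++) {
            var rule = stateRules[i];
            if (rule.defaultToken) { fallback = rule.defaultToken; continue; }
            var re = new RegExp(rule.regex, "y"); // sticky: match only at pos
            re.lastIndex = pos;
            var m = re.exec(line);
            if (m && m[0].length > 0) {
                tokens.push({ type: rule.token, value: m[0], state: state });
                pos += m[0].length;
                if (rule.next) state = rule.next;
                matched = true;
            }
        }
        if (!matched) {
            // nothing matched here: emit one character with the fallback token
            tokens.push({ type: fallback, value: line[pos], state: state });
            pos += 1;
        }
    }
    return { tokens: tokens, state: state };
}

var rules = {
    "start": [{ token: "text", regex: "<\\!\\[CDATA\\[", next: "cdata" }],
    "cdata": [
        { token: "text", regex: "\\]\\]>", next: "start" },
        { defaultToken: "text" }
    ]
};

var result = tokenizeLine(rules, "<![CDATA[x]]>y");
console.log(result.tokens.map(function(t) { return t.state + ":" + t.value; }));
console.log(result.state);
```

The returned state matters as much as the tokens: if a line ends inside the CDATA section, the next line has to start in the cdata state rather than in start.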

 

Using the TMLanguage Tool

There is a tool that will take an existing tmlanguage file and do its best to convert it into JavaScript for Ace to consume. Here's what you need to get started:

  1. In the Ace repository, navigate to the tool folder.
  2. Run npm install to install required dependencies.
  3. Run node tmlanguage.js <path_to_tmlanguage_file>; for example, node tmlanguage.js /Users/Elrond/elven.tmLanguage

Two files are created and placed in lib/ace/mode: one for the language mode, and one for the set of highlight rules. You will still need to add the new mode to ace/ext/modelist.js, and add a sample file for testing.

 

A Note on Accuracy

Your .tmlanguage file will be converted to the best of the converter's ability. It is an understatement to say that the tool is imperfect. Language mode creation will probably never be fully automatable. Some items cannot be determined automatically; for example:

  • The use of regular expression lookbehinds
    This is a concept that JavaScript historically did not have (lookbehind assertions were only added in ES2018), so it may need to be faked
  • Deciding which state to transition to
    While the tool does create new states correctly, it labels them with generic terms like state_2, state_10, etc.
  • Extending modes
    Many modes say something like include source.c, to mean, "add all the rules in C highlighting." That syntax does not make sense to Ace or this tool (though of course you can extend existing highlighters).
  • Rule preference order
  • Gathering keywords
    Most likely, you'll need to take keywords from your language file and run them through createKeywordMapper()

However, the tool is an excellent way to get a quick start, if you already possess a tmlanguage file for your language.
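For the keyword-gathering item in the list above, the general idea of a keyword mapper can be sketched without Ace. The makeKeywordMapper function below is a hypothetical stand-in, not the createKeywordMapper() implementation; it assumes the mapper is built from a map of token names to "|"-separated keywords plus a default token, and returns a function usable as a rule's token.

```javascript
// Hypothetical stand-in for a keyword mapper: builds a lookup from a map of
// token name -> "|"-separated keywords and returns a function usable as a
// rule's token callback.
function makeKeywordMapper(map, defaultToken) {
    var lookup = Object.create(null);
    Object.keys(map).forEach(function(tokenName) {
        map[tokenName].split("|").forEach(function(word) {
            lookup[word] = tokenName;
        });
    });
    return function(value) {
        return lookup[value] || defaultToken;
    };
}

var keywordMapper = makeKeywordMapper({
    "keyword": "if|else|while|return",
    "constant.language": "true|false|null"
}, "identifier");

// A rule could then use the mapper as its token function.
var rule = { token: keywordMapper, regex: "[a-zA-Z_][a-zA-Z0-9_]*\\b" };

console.log(rule.token("while"));   // keyword
console.log(rule.token("null"));    // constant.language
console.log(rule.token("counter")); // identifier
```

One rule with a lookup table replaces a long alternation per keyword class, which keeps the regular expression short and the keyword lists easy to edit.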

 

Extending Highlighters

Suppose you're working on a LuaPage, PHP embedded in HTML, or a Django template. You'll need to create a syntax highlighter that takes all the rules from the original language (Lua, PHP, or Python) and extends it with some additional identifiers (<?lua, <?php, {%, for example). Ace allows you to easily extend a highlighter using a few helper functions.

 

Getting Existing Rules

To get the existing syntax highlighting rules for a particular language, use the getRules() function. For example:

var HtmlHighlightRules = require("./html_highlight_rules").HtmlHighlightRules;
 
this.$rules = new HtmlHighlightRules().getRules();
 
/*
this.$rules == Same this.$rules as HTML highlighting
*/
 

Extending a Highlighter

The addRules method does one thing, and it does one thing well: it adds new rules to an existing rule set, and prefixes any state with a given tag. For example, let's say you've got two sets of rules, defined like this:

this.$rules = {
    "start": [ /* ... */ ]
};

var newRules = {
    "start": [ /* ... */ ]
};

If you want to incorporate newRules into this.$rules, you'd do something like this:

this.addRules(newRules, "new-");

/*
this.$rules = {
    "start": [ ... ],
    "new-start": [ ... ]
};
*/
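The prefixing step can be illustrated with a standalone function. addPrefixedRules is a hypothetical, simplified stand-in for the behavior described above, not Ace's addRules. It also assumes that string next targets should receive the same prefix, so that transitions stay within the newly added states.

```javascript
// Hypothetical, simplified version of the merge described above: copy every
// state from newRules into rules under a prefixed name, and prefix string
// "next" targets so transitions stay inside the prefixed set (an assumption
// of this sketch).
function addPrefixedRules(rules, newRules, prefix) {
    Object.keys(newRules).forEach(function(stateName) {
        rules[prefix + stateName] = newRules[stateName].map(function(rule) {
            var copy = Object.assign({}, rule);
            if (typeof copy.next === "string") {
                copy.next = prefix + copy.next;
            }
            return copy;
        });
    });
    return rules;
}

var rules = { "start": [{ token: "text", regex: "\\w+" }] };
var newRules = {
    "start": [{ token: "string", regex: '"', next: "string" }],
    "string": [{ token: "string", regex: '"', next: "start" }]
};

addPrefixedRules(rules, newRules, "new-");
console.log(Object.keys(rules));         // [ 'start', 'new-start', 'new-string' ]
console.log(rules["new-start"][0].next); // new-string
```

The prefix keeps the two "start" states from colliding, so both rule sets can live in one object.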

Extending Two Highlighters

The last function available to you combines both of these concepts, and it's called embedRules. It takes three parameters:

  1. An existing rule set to embed with
  2. A prefix to apply to each state in the existing rule set
  3. A set of new states to add

Like addRules, embedRules adds on to the existing this.$rules object.

To explain this visually, let's take a look at the syntax highlighter for Lua pages, which combines all of these concepts:

var HtmlHighlightRules = require("./html_highlight_rules").HtmlHighlightRules;
var LuaHighlightRules = require("./lua_highlight_rules").LuaHighlightRules;

var LuaPageHighlightRules = function() {
    this.$rules = new HtmlHighlightRules().getRules();

    for (var i in this.$rules) {
        this.$rules[i].unshift({
            token: "keyword",
            regex: "<\\%\\=?",
            next: "lua-start"
        }, {
            token: "keyword",
            regex: "<\\?lua\\=?",
            next: "lua-start"
        });
    }
    this.embedRules(LuaHighlightRules, "lua-", [
        {
            token: "keyword",
            regex: "\\%>",
            next: "start"
        },
        {
            token: "keyword",
            regex: "\\?>",
            next: "start"
        }
    ]);
};

Here, this.$rules starts off as a set of HTML highlighting rules. To this set, we add two new checks for <%= and <?lua=. We also specify that if one of these rules matches, the tokenizer should move to the lua-start state. Next, embedRules takes the already existing set of LuaHighlightRules and applies the lua- prefix to each state there. Finally, it adds two new checks for %> and ?>, allowing the state machine to return to start.
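The embedding step can also be pictured as a standalone function. embedPrefixedRules is a hypothetical, simplified stand-in for the behavior described above, not Ace's embedRules. It assumes that the escape rules are placed first in every embedded state, so that the way back to the host language is checked before anything else; the tiny rule sets are made up for the example.

```javascript
// Hypothetical, simplified embedding step for illustration only: prefix the
// embedded states, then put the escape rules first in each of them
// (placing them first is an assumption of this sketch).
function embedPrefixedRules(rules, embedded, prefix, escapeRules) {
    Object.keys(embedded).forEach(function(stateName) {
        var prefixed = embedded[stateName].map(function(rule) {
            var copy = Object.assign({}, rule);
            if (typeof copy.next === "string") copy.next = prefix + copy.next;
            return copy;
        });
        var escapes = escapeRules.map(function(rule) {
            return Object.assign({}, rule);
        });
        rules[prefix + stateName] = escapes.concat(prefixed);
    });
    return rules;
}

// Tiny made-up stand-ins for the HTML and Lua rule sets.
var pageRules = {
    "start": [{ token: "keyword", regex: "<\\%\\=?", next: "lua-start" }]
};
var luaRules = {
    "start": [{ token: "keyword", regex: "\\b(?:local|end)\\b" }]
};

embedPrefixedRules(pageRules, luaRules, "lua-", [
    { token: "keyword", regex: "\\%>", next: "start" }
]);

console.log(Object.keys(pageRules));
console.log(pageRules["lua-start"].map(function(r) { return r.regex; }));
```

After the call, the host state start can enter lua-start, and every lua- state knows how to return to start, which is the round trip the Lua page highlighter relies on.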

 

Code Folding

Adding new folding rules to your mode can be a little tricky. First, insert the following lines of code into your mode definition:

var MyFoldMode = require("./folding/newrules").FoldMode;

...
var MyMode = function() {

    ...

    this.foldingRules = new MyFoldMode();
};

 

You'll be defining your code folding rules in the lib/ace/mode/folding folder. Here's a template that you can use to get started:

define(function(require, exports, module) {
"use strict";

var oop = require("../../lib/oop");
var Range = require("../../range").Range;
var BaseFoldMode = require("./fold_mode").FoldMode;

var FoldMode = exports.FoldMode = function() {};
oop.inherits(FoldMode, BaseFoldMode);

(function() {

    // regular expressions that identify starting and stopping points
    this.foldingStartMarker;
    this.foldingStopMarker;

    this.getFoldWidgetRange = function(session, foldStyle, row) {
        var line = session.getLine(row);

        // test each line, and return a range of segments to collapse
    };

}).call(FoldMode.prototype);

});

 

Just like with TextMode for syntax highlighting, BaseFoldMode contains the starting point for code folding logic. foldingStartMarker defines your opening folding point, while foldingStopMarker defines the stopping point. For example, for a C-style folding system, these values might look like this:

this.foldingStartMarker = /(\{|\[)[^\}\]]*$|^\s*(\/\*)/;
this.foldingStopMarker = /^[^\[\{]*(\}|\])|^[\s\*]*(\*\/)/;

 

These regular expressions identify various symbols ({, [, and /*) to pay attention to. getFoldWidgetRange matches on these regular expressions, and when found, returns the range of relevant folding points. For more information on the Range object, see the Ace API documentation.
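The two C-style markers can be tried against sample lines in plain JavaScript. The classifyLine helper is hypothetical and exists only to show which lines each expression picks up; it is not part of Ace's folding API.

```javascript
var foldingStartMarker = /(\{|\[)[^\}\]]*$|^\s*(\/\*)/;
var foldingStopMarker = /^[^\[\{]*(\}|\])|^[\s\*]*(\*\/)/;

// Hypothetical helper: report what, if anything, a line opens or closes.
function classifyLine(line) {
    var start = line.match(foldingStartMarker);
    if (start) return { kind: "start", marker: start[1] || start[2] };
    var stop = line.match(foldingStopMarker);
    if (stop) return { kind: "stop", marker: stop[1] || stop[2] };
    return { kind: "none", marker: null };
}

[
    "function foo() {",
    "    /* block comment",
    "    var a = [1, 2];",
    "}",
    "     */"
].forEach(function(line) {
    console.log(JSON.stringify(line), classifyLine(line));
});
```

Note that a bracket that is opened and closed on the same line, as in var a = [1, 2];, matches neither marker, so only constructs that span several lines become fold points.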

Again, for a C-style folding mechanism, a range to return for the starting fold might look like this:

var line = session.getLine(row);
var match = line.match(this.foldingStartMarker);
if (match) {
    var i = match.index;

    if (match[1])
        return this.openingBracketBlock(session, match[1], row, i);

    var range = session.getCommentFoldRange(row, i + match[0].length);
    range.end.column -= 2;
    return range;
}

Let's say we stumble across the code block hello_world() {. Our range object here becomes:

{
    startRow: 0,
    endRow: 0,
    startColumn: 0,
    endColumn: 13
}

Testing Your Highlighter

The best way to test your tokenizer is to see it live, right? To do that, you'll want to modify the live Ace demo to preview your changes. You can find this file in the root Ace directory with the name kitchen-sink.html.

  1. Add an entry to supportedModes in ace/ext/modelist.js
  2. Add a sample file to demo/kitchen-sink/docs/ with the same name as the mode file

Once you set this up, you should be able to witness a live demonstration of your new highlighter.

Adding Automated Tests

Adding automated tests for a highlighter is trivial so you are not required to do it, but it can help during development.

In lib/ace/mode/_test, create a file named text_<modeName>.txt with some example code. (You can skip this if the document you have added in demo/kitchen-sink/docs both looks good and covers various edge cases in your language syntax.)

 

Run node highlight_rules_test.js -gen to preserve the current output of your tokenizer in tokens_<modeName>.json.

After this, running highlight_rules_test.js optionalLanguageName will compare the output of your tokenizer with the correct output you've created.

Any files ending with the _test.js suffix are automatically run by Ace's Travis CI server.
