Architecture

This document covers some of Biome's internals and how they are used inside the project.

Parser and CST

The parser's architecture is based on an internal fork of rowan, a library that implements the Green and Red tree pattern.

The CST (Concrete Syntax Tree) is a data structure very similar to an AST (Abstract Syntax Tree), except that it keeps track of all the information in a program, trivia included.

Trivia is all the information that is not important for a program to run:

  • spaces

  • tabs

  • comments

Trivia is attached to a node. A node can have leading trivia and trailing trivia. If you read code from left to right, leading trivia appears before a keyword, and trailing trivia appears after a keyword.

Leading trivia and trailing trivia are categorized as follows:

  • Every piece of trivia up to the token/keyword (including line breaks) is leading trivia;

  • Everything after the token up to the next line break (but not including it) is trailing trivia.

Given the following JavaScript snippet, // comment 1 is trailing trivia of the token ;, and // comment 2 is leading trivia of the keyword const. Below is a minimized version of the CST that Biome produces:

const a = "foo"; // comment 1
// comment 2
const b = "bar";

0: JS_MODULE@0..55
...
1: SEMICOLON@15..27 ";" [] [Whitespace(" "), Comments("// comment 1")]
1: JS_VARIABLE_STATEMENT@27..55
...
1: CONST_KW@27..45 "const" [Newline("\n"), Comments("// comment 2"), Newline("\n")] [Whitespace(" ")]
3: EOF@55..55 "" [] []
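The two categorization rules can be sketched as a tiny tokenizer. This is only an illustration of the rules, not Biome's lexer: `TOKEN_RE`, `Token`, and `tokenize` are hypothetical names, and the token grammar is deliberately simplistic (strings, words, single punctuation characters, `//` comments).

```python
import re
from dataclasses import dataclass, field

# Hypothetical piece kinds; the names mirror the CST dump above,
# but none of this is Biome's actual API.
TOKEN_RE = re.compile(
    r"(?P<Newline>\n)"
    r"|(?P<Whitespace>[ \t]+)"
    r"|(?P<Comments>//[^\n]*)"
    r"|(?P<Token>\"[^\"\n]*\"|\w+|[^\s\w])"
)


@dataclass
class Token:
    text: str
    leading: list = field(default_factory=list)
    trailing: list = field(default_factory=list)


def tokenize(source):
    """Split `source` into tokens and attach trivia to them."""
    pieces = [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]
    tokens = []
    pending = []  # trivia waiting to become the next token's leading trivia
    i = 0
    while i < len(pieces):
        kind, text = pieces[i]
        i += 1
        if kind != "Token":
            pending.append((kind, text))
            continue
        token = Token(text, leading=pending)
        pending = []
        # Trailing trivia: everything up to, but not including, the next line break.
        while i < len(pieces) and pieces[i][0] in ("Whitespace", "Comments"):
            token.trailing.append(pieces[i])
            i += 1
        tokens.append(token)
    # Whatever is left over becomes leading trivia of an empty EOF token.
    tokens.append(Token("", leading=pending))
    return tokens
```

Running `tokenize` on the snippet above attaches the whitespace and `// comment 1` to `;` as trailing trivia, and both line breaks plus `// comment 2` to the second `const` as leading trivia, which matches the dump.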

By design, the CST is never directly accessible; a developer reads its information through the Red tree, using a number of APIs that are autogenerated from the grammar of the language.
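To make the Green and Red tree pattern more concrete, here is a minimal sketch of the general idea, not of rowan's or Biome's API. In this pattern, green nodes are immutable, know only their own length, and have no parent pointer; red nodes are thin wrappers created on demand that add a parent and an absolute offset. The class names `GreenToken`, `GreenNode`, and `RedNode` and the `root` helper are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class GreenToken:
    kind: str
    text: str  # token text, including any attached trivia

    @property
    def length(self):
        return len(self.text)


@dataclass(frozen=True)
class GreenNode:
    kind: str
    children: Tuple[Union["GreenNode", GreenToken], ...]

    @property
    def length(self):
        # A green node only knows its width, never its absolute position.
        return sum(child.length for child in self.children)


@dataclass(frozen=True)
class RedNode:
    green: Union[GreenNode, GreenToken]
    parent: Optional["RedNode"]
    offset: int  # absolute start position in the source text

    @property
    def range(self):
        return (self.offset, self.offset + self.green.length)

    def children(self):
        # Red children are created lazily while walking the green children.
        offset = self.offset
        for child in getattr(self.green, "children", ()):
            yield RedNode(child, self, offset)
            offset += child.length


def root(green):
    """Wrap a green tree in a red root starting at offset 0."""
    return RedNode(green, None, 0)
```

Because green nodes carry no position or parent, identical subtrees can be shared and reused, while the red layer gives consumers the parent links and text ranges they need for navigation.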

Resilient and recoverable parser

In order to construct a CST, a parser needs to be error-resilient and recoverable:

  • resilient: a parser that is able to resume parsing after encountering syntax errors that belong to the language;

  • recoverable: a parser that is able to understand where an error occurred and to resume parsing by creating correct information.

The recoverable part of the parser is not an exact science, and no rules are set in stone. Depending on what the parser was parsing and where the error occurred, the parser may or may not be able to recover in an expected way.

The parser also uses "Bogus" nodes to protect consumers from consuming incorrect syntax. These nodes are used to mark the broken code caused by a syntax error.

In the following example, the parentheses of the while are missing, but the parser recovers well and can represent the code with a decent CST. The parentheses and condition of the loop are marked as missing, and the code block is parsed correctly:

while {}

JsModule {
interpreter_token: missing (optional),
directives: JsDirectiveList [],
items: JsModuleItemList [
JsWhileStatement {
while_token: WHILE_KW@0..6 "while" [] [Whitespace(" ")],
l_paren_token: missing (required),
test: missing (required),
r_paren_token: missing (required),
body: JsBlockStatement {
l_curly_token: L_CURLY@6..7 "{" [] [],
statements: JsStatementList [],
r_curly_token: R_CURLY@7..8 "}" [] [],
},
},
],
eof_token: EOF@8..8 "" [] [],
}

This is the error emitted during parsing:

main.tsx:1:7 parse ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✖ expected `(` but instead found `{`
> 1 │ while {}
│ ^
ℹ Remove {
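The recovery shown above can be sketched with a toy recursive-descent parser that records a required child as missing instead of aborting. This is a simplified illustration under stated assumptions, not Biome's parser: the `Parser` class, its methods, and the dictionary-based nodes are all hypothetical, and a condition is reduced to a single token.

```python
from dataclasses import dataclass, field


@dataclass
class Parser:
    tokens: list  # token texts with trivia already stripped
    pos: int = 0
    errors: list = field(default_factory=list)

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"

    def expect(self, text):
        """Consume `text` if it is next; otherwise report it and return None (missing)."""
        if self.peek() == text:
            self.pos += 1
            return text
        self.errors.append(f"expected `{text}` but instead found `{self.peek()}`")
        return None

    def parse_test(self):
        # Hypothetical simplification: a loop condition is a single token.
        if self.peek() in (")", "{", "}", "EOF"):
            self.errors.append("expected an expression")
            return None
        token = self.peek()
        self.pos += 1
        return token

    def parse_while(self):
        node = {"kind": "JsWhileStatement", "while_token": self.expect("while")}
        node["l_paren_token"] = self.expect("(")
        if node["l_paren_token"] is None:
            # Without an opening parenthesis, mark the condition and the closing
            # parenthesis as missing too, without piling up more errors.
            node["test"] = None
            node["r_paren_token"] = None
        else:
            node["test"] = self.parse_test()
            node["r_paren_token"] = self.expect(")")
        # Parsing resumes at the body, so the block is still represented.
        node["body"] = {
            "kind": "JsBlockStatement",
            "l_curly_token": self.expect("{"),
            "r_curly_token": self.expect("}"),
        }
        return node
```

For the tokens of `while {}`, `Parser(["while", "{", "}"]).parse_while()` yields a node whose parentheses and test are `None` while the body is fully parsed, and a single error is recorded for the missing `(`.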

The same can't be said for the following snippet. The parser can't properly understand the syntax during the recovery phase, so it relies on bogus nodes to mark some syntax as erroneous. Notice the JsBogusStatement:

function}

JsModule {
interpreter_token: missing (optional),
directives: JsDirectiveList [],
items: JsModuleItemList [
TsDeclareFunctionDeclaration {
async_token: missing (optional),
function_token: FUNCTION_KW@0..8 "function" [] [],
id: missing (required),
type_parameters: missing (optional),
parameters: missing (required),
return_type_annotation: missing (optional),
semicolon_token: missing (optional),
},
JsBogusStatement {
items: [
R_CURLY@8..9 "}" [] [],
],
},
],
eof_token: EOF@9..9 "" [] [],
}

This is the error we get from the parsing phase:

main.tsx:1:9 parse ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✖ expected a name for the function in a function declaration, but found none
> 1 │ function}
│ ^
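The role of the bogus node can be sketched as follows: when the parser meets tokens that cannot start any known statement, it wraps them in a bogus node so that no source text is lost and consumers can skip that region. This is a rough sketch with a made-up grammar in which the only valid statement is `function <name>`; `parse_items` and the dictionary node shapes are hypothetical and do not reflect Biome's implementation.

```python
def parse_items(tokens):
    """Parse a flat list of token texts into module items.

    Hypothetical grammar: the only valid statement is `function <name>`.
    Returns a tuple of (items, errors).
    """
    items, errors, pos = [], [], 0
    while pos < len(tokens):
        if tokens[pos] == "function":
            pos += 1
            name = None
            if (
                pos < len(tokens)
                and tokens[pos].isidentifier()
                and tokens[pos] != "function"
            ):
                name = tokens[pos]
                pos += 1
            else:
                # The required name is recorded as missing; parsing continues.
                errors.append(
                    "expected a name for the function in a function declaration, "
                    "but found none"
                )
            items.append({"kind": "FunctionDeclaration", "id": name})
        else:
            # Recovery: gather every token up to the next statement start and
            # keep them inside a bogus node. Error reporting for this branch is
            # omitted to keep the sketch short.
            start = pos
            while pos < len(tokens) and tokens[pos] != "function":
                pos += 1
            items.append({"kind": "JsBogusStatement", "items": tokens[start:pos]})
    return items, errors
```

For the tokens of `function}`, this produces a declaration with a missing `id` followed by a `JsBogusStatement` holding the stray `}`, which mirrors the shape of the tree above.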

Formatter (WIP)

Daemon (WIP)

Biome uses a server-client architecture to run its tasks.

The daemon is a long-running server that Biome spawns in the background and uses to process requests from the editor and the CLI.