From an "Algorithm" Angle: What Happens to Page Rendering After Entering the URL?

Jun 01, 2021

This article is reproduced from the WeChat official account: front-end troy students

This old interview question can be answered in a different way! This article does not cover the network part that happens after the URL is entered; it only analyzes, from an algorithmic point of view, how the page is parsed once it has been received. The process is divided into the following steps:

Build a DOM tree
Style calculation
Build a layout tree (Layout Tree)

Build a DOM tree

Because browsers cannot directly understand an HTML string, this byte stream is transformed into a meaningful, easy-to-manipulate data structure: the DOM tree. The DOM tree is essentially a multi-way tree with document as its root node.

So how is it parsed?

The nature of HTML grammar

First, let's be clear: HTML grammar is not a context-free grammar.

Here, it is necessary to discuss what a context-free grammar is.

There is a very clear definition in compiler theory:

If every production rule of a formal grammar G = (N, Σ, P, S) has the form V -> w, the grammar is called a context-free grammar, where V ∈ N and w ∈ (N ∪ Σ)*.

The meaning of each parameter in G = (N, Σ, P, S) is as follows:

N is the set of non-terminal symbols (as the name implies, symbols that cannot appear in the final string; likewise below).
Σ is the set of terminal symbols.
P is the set of production rules, such as S -> aSb.
S is the start symbol, and it must belong to N, that is, it is a non-terminal.

To put it more plainly, a context-free grammar is one in which the left side of every production is a single non-terminal symbol.

If this is a little confusing, an example will make it clear.

For example:

A -> B

In this grammar, the left side of each production is a single non-terminal, so it is a context-free grammar. In this case, xBy can always be reduced to xAy.

Let's take a look at a counter-example:

aA -> B
Aa -> B

This is not a context-free grammar. When we encounter B, we cannot tell whether it can be reduced to A, because that depends on whether an a exists on its left or right. In other words, it depends on the context.
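The rule described above (every production has exactly one non-terminal on its left side) can be checked mechanically. The following is only a minimal sketch: the tuple representation of productions and the function name is_context_free are assumptions made for illustration, not part of any parser.

```python
def is_context_free(productions, nonterminals):
    """Return True if every production has exactly one non-terminal on its left side."""
    for left, _right in productions:
        # A context-free rule has a left side made of a single symbol,
        # and that symbol must be a non-terminal.
        if len(left) != 1 or left[0] not in nonterminals:
            return False
    return True


nonterminals = {"A", "B"}

# A -> B : a single non-terminal on the left
free_rules = [(("A",), ("B",))]

# aA -> B and Aa -> B : the terminal "a" on the left acts as context
sensitive_rules = [(("a", "A"), ("B",)), (("A", "a"), ("B",))]

print(is_context_free(free_rules, nonterminals))       # True
print(is_context_free(sensitive_rules, nonterminals))  # False
```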

As for why HTML is not context-free: the first thing to note is that canonical HTML syntax does conform to a context-free grammar, and it is the non-standard syntax that reveals its non-context-free nature. One counter-example is enough to show this.

For example, when the parser scans a form tag, a context-free grammar would simply create the DOM object for the form. In a real HTML5 scenario, however, the parser looks at the context of the form: if the parent tag of the form tag is also a form, the current form tag is skipped; otherwise the DOM object is created.
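The context check described above might be sketched as follows. This is a simplification built only from the description in this article: the function name should_create_form and the list of open tag names are hypothetical, and a real parser tracks more state than this.

```python
def should_create_form(open_elements):
    """Decide whether a <form> start tag should produce a new DOM object.

    open_elements holds the names of the tags that are currently open,
    outermost first (a hypothetical representation for this sketch).
    """
    # Following the description above: look at the parent tag only.
    parent = open_elements[-1] if open_elements else None
    return parent != "form"


print(should_create_form(["html", "body"]))          # True
print(should_create_form(["html", "body", "form"]))  # False
```

The same input token (a form start tag) leads to different results depending on what surrounds it, which is exactly what a context-free grammar cannot express.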

Regular programming languages are context-free, whereas HTML is not. This means an HTML parser cannot be built with the parser of a regular programming language and requires a different approach.

The parsing algorithm

The HTML5 specification describes the parsing algorithm in detail. The algorithm is divided into two phases:

Tokenization.
Tree construction.

The two phases correspond to lexical analysis and syntax analysis.

The tokenization algorithm

The algorithm takes HTML text as input and outputs HTML tokens, so it is also called the token generator (tokenizer). It is implemented as a finite state machine: when one or more characters are received in the current state, the machine moves to the next state.

<html>
  <body>
    Hello sanyuan
  </body>
</html>

This simple example demonstrates the process of tokenization.

When a < is encountered, the state becomes tag open.

Receiving a character in [a-z] moves the machine into the tag name state.

This state is kept until a > is encountered, which means the tag name is complete, and the state becomes data.

The body tag that follows is handled in the same way.

At this point both the html and body tags have been emitted.

Now, at the > of <body>, the machine enters the data state and stays in it while receiving the following characters, Hello sanyuan.

Then it receives the < of </body>, goes back to tag open, and on receiving the next character, /, creates an end tag token.

Then it enters the tag name state, and on encountering > returns to the data state.

</html> is then processed in the same way.
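The walkthrough above can be condensed into a small state machine. This is only a sketch of the three states mentioned in the text (data, tag open, tag name); the function name tokenize and the tuple token format are assumptions, and attributes, comments, and error handling are deliberately left out.

```python
def tokenize(html):
    """Split HTML text into start-tag, end-tag and character tokens."""
    tokens = []
    state = "data"
    name = ""
    is_end_tag = False

    for ch in html:
        if state == "data":
            if ch == "<":
                state = "tag open"
            else:
                tokens.append(("char", ch))
        elif state == "tag open":
            if ch == "/":
                is_end_tag = True          # "</" starts an end tag token
            elif ch.isalpha():
                name = ch.lower()
                state = "tag name"
        elif state == "tag name":
            if ch == ">":
                # The tag name is complete: emit the token and return to data.
                tokens.append(("end" if is_end_tag else "start", name))
                name, is_end_tag = "", False
                state = "data"
            else:
                name += ch.lower()
    return tokens


for token in tokenize("<html><body>Hi</body></html>"):
    print(token)
```

Each character token carries a single character; grouping consecutive characters into one text node is left to the tree-building phase.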

Tree-building algorithm

As mentioned earlier, the DOM tree is a multi-way tree with document as its root node. So the parser first creates a document object. The tokenizer sends information about each token to the tree builder. When the tree builder receives a token, it creates the corresponding DOM object. After creating this DOM object, it does two things:

Add the DOM object to the DOM tree.
Push the corresponding token onto the stack of open elements ("open" here is the counterpart of a closed tag).

Take the same example again:

<html>
  <body>
    Hello sanyuan
  </body>
</html>

First, the state is the initial state.

When an html tag arrives from the tokenizer, the state changes to before html. At the same time a DOM element of type HTMLHtmlElement is created, added to the document root object, and pushed onto the stack.

Then the state automatically changes to before head. At this point a body tag arrives from the tokenizer, which means there is no head, so the tree builder automatically creates an HTMLHeadElement and adds it to the DOM tree.

Next the state becomes in head, and then jumps directly to after head.

Now the tokenizer delivers the body tag. An HTMLBodyElement is created, inserted into the DOM tree, and pushed onto the stack of open elements.

The state then changes to in body, and the following series of characters is received: Hello sanyuan. When the first character is received, a Text node is created to hold it and is inserted under the body element in the DOM tree. As later characters arrive, they are appended to the Text node.

Now the tokenizer passes a body end tag, and the state becomes after body.

The tokenizer finally passes an html end tag, the state becomes after after body, and the parsing process ends here.
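The core of the procedure above (a document root, a stack of open elements, an implied head, and merged text nodes) can be sketched in a few lines. This is a simplified illustration, not the specification's algorithm: the Node class, the build_tree function, and the tuple token format are assumptions, and the insertion states (before html, in body, and so on) are not modelled.

```python
class Node:
    def __init__(self, name, text=""):
        self.name = name
        self.text = text
        self.children = []


def build_tree(tokens):
    """Build a simplified DOM tree from (kind, value) tokens."""
    document = Node("document")
    open_elements = [document]          # stack of open elements
    for kind, value in tokens:
        parent = open_elements[-1]      # new nodes go under the innermost open element
        if kind == "start":
            has_head = any(child.name == "head" for child in parent.children)
            if value == "body" and parent.name == "html" and not has_head:
                parent.children.append(Node("head"))   # head was omitted: create it
            element = Node(value)
            parent.children.append(element)
            open_elements.append(element)
        elif kind == "char":
            last = parent.children[-1] if parent.children else None
            if last is not None and last.name == "#text":
                last.text += value      # later characters extend the Text node
            else:
                parent.children.append(Node("#text", value))
        elif kind == "end":
            # Pop the stack down to and including the matching open element.
            for i in range(len(open_elements) - 1, 0, -1):
                if open_elements[i].name == value:
                    del open_elements[i:]
                    break
    return document


tokens = [("start", "html"), ("start", "body"), ("char", "H"),
          ("char", "i"), ("end", "body"), ("end", "html")]
document = build_tree(tokens)
html = document.children[0]
print([child.name for child in html.children])   # ['head', 'body']
print(html.children[1].children[0].text)         # Hi
```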

Fault tolerance mechanism

When talking about the HTML5 specification, its very forgiving error-handling strategy has to be mentioned. It is highly fault tolerant, and although opinions on this are mixed, I think a senior front-end engineer should know what the HTML parser does for fault tolerance.

Next are some classic fault-tolerance examples from WebKit; feel free to add others.

1. Using </br> instead of <br>

if (t->isCloseTag(brTag) && m_document->inCompatMode()) {
  reportError(MalformedBRError);
  t->beginTag = true;
}

Everything is converted to the <br> form.

2. Stray tables

<table>
  <table>
    <tr><td>inner table</td></tr>
  </table>
  <tr><td>outer table</td></tr>
</table>

WebKit automatically converts this to:

<table>
    <tr><td>outer table</td></tr>
</table>
<table>
    <tr><td>inner table</td></tr>
</table>

3. Nested form elements

In this case, the inner form is simply ignored.

Style calculation

For CSS styles, there are generally three sources:

Style sheets referenced by link tags
Styles in style tags
The inline style attribute of an element

Format the style sheet

First, the browser cannot recognize CSS text directly, so the first thing the rendering engine does when it receives CSS text is to turn it into a structured object: styleSheets.

This formatting process is quite complex, and different browsers use different optimization strategies, so it is not covered here.

The final structure can be viewed in the browser console through document.styleSheets. It holds the style sheets that come from link and style tags, while inline styles are exposed on each element's style property. Together they provide the basis for the later style operations.

Standardized style properties

Some CSS values are not easily understood by the rendering engine, so they need to be standardized before the style is calculated, for example em -> px, red -> #ff0000, bold -> 700, and so on.
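As a rough illustration of this standardization step, the sketch below converts the three kinds of values mentioned above. The function name standardize, the small lookup tables, and the 16px reference font size are assumptions made for the example; a real engine handles far more units and keywords.

```python
NAMED_COLORS = {"red": "#ff0000", "green": "#008000", "blue": "#0000ff"}
FONT_WEIGHTS = {"normal": "400", "bold": "700"}


def standardize(prop, value, font_size_px=16.0):
    """Convert a declared CSS value into a standardized form.

    font_size_px is the reference font size used to resolve em units
    (16px is only an assumed default for this sketch).
    """
    value = value.strip().lower()
    if prop in ("color", "background-color") and value in NAMED_COLORS:
        return NAMED_COLORS[value]
    if prop == "font-weight" and value in FONT_WEIGHTS:
        return FONT_WEIGHTS[value]
    if value.endswith("em") and not value.endswith("rem"):
        try:
            number = float(value[:-2])
        except ValueError:
            return value                # not a number followed by "em": leave as is
        return f"{number * font_size_px:g}px"
    return value


print(standardize("font-size", "2em"))     # 32px
print(standardize("color", "red"))         # #ff0000
print(standardize("font-weight", "bold"))  # 700
```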

Calculate the specific style of each node

Once the styles have been formatted and standardized, the specific style information for each node can be calculated.

In fact, the calculation is not complicated. It mainly follows two rules: inheritance and cascading.

Each child node inherits its parent's style properties by default, and if a property is not found on the parent node, the browser default style, also known as the UserAgent style, is used. This is the inheritance rule, and it is very easy to understand.
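The inheritance rule can be pictured as a lookup that walks up the tree and falls back to the browser defaults. The sketch below is illustrative only: the resolve function, the dictionary-based nodes, and the default values in UA_DEFAULTS are assumptions, and it only uses color and font-size, which are inherited properties in CSS (not every CSS property is inherited).

```python
# Hypothetical browser defaults for two inherited properties.
UA_DEFAULTS = {"color": "#000000", "font-size": "16px"}


def resolve(prop, node):
    """Find the value of an inherited property for a node."""
    current = node
    while current is not None:
        if prop in current["style"]:
            return current["style"][prop]   # declared on this node or an ancestor
        current = current["parent"]
    return UA_DEFAULTS[prop]                # nothing declared: use the default


body = {"style": {"color": "#ff0000"}, "parent": None}
p = {"style": {"font-size": "20px"}, "parent": body}
span = {"style": {}, "parent": p}

print(resolve("color", span))      # #ff0000 (from body)
print(resolve("font-size", span))  # 20px (from p)
print(resolve("font-size", body))  # 16px (default)
```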

Then there are the cascading rules. The defining feature of CSS is its cascade: the final style depends on the combined effect of many declarations, which can even produce some strange cascading behavior. Readers of the book "CSS World" will know this well. The specific cascading rules belong to an in-depth study of the CSS language and are not covered here.

It is worth noting, however, that after the style has been calculated, all computed style values can be read through window.getComputedStyle, which makes it convenient for JS to obtain the calculated style.

Build a layout tree

Now that the DOM tree and the DOM styles have been generated, the next step is to determine the position of each element through the browser's layout system, that is, to build a layout tree (Layout Tree).

The general work of layout tree generation is as follows:

Traverse the generated DOM tree nodes and add them to the layout tree.
Calculate the coordinates of the layout tree nodes.

It is important to note that the layout tree contains only visible elements. The head tag and elements with display: none are not placed in it.
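The first step, traversing the DOM and keeping only visible elements, might look like the sketch below. The dictionary-based DOM nodes and the function name build_layout_tree are assumptions for illustration, and the second step (calculating coordinates) is not attempted here.

```python
def build_layout_tree(dom_node):
    """Return a layout node for dom_node, or None if it is not visible."""
    style = dom_node.get("style", {})
    if dom_node["tag"] == "head" or style.get("display") == "none":
        return None                     # the whole subtree is left out
    children = []
    for child in dom_node.get("children", []):
        layout_child = build_layout_tree(child)
        if layout_child is not None:
            children.append(layout_child)
    return {"tag": dom_node["tag"], "children": children}


dom = {
    "tag": "html",
    "children": [
        {"tag": "head", "children": [{"tag": "title"}]},
        {"tag": "body", "children": [
            {"tag": "div", "style": {"display": "none"}},
            {"tag": "p"},
        ]},
    ],
}

layout = build_layout_tree(dom)
print([child["tag"] for child in layout["children"]])                 # ['body']
print([child["tag"] for child in layout["children"][0]["children"]])  # ['p']
```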

Some people say that a Render Tree (the rendering tree) is generated first, but that was the case before 2016. The Chrome team has since done a lot of refactoring, and there is no longer a step that generates a Render Tree. The layout tree's information is already very complete, and it fully covers the function of the Render Tree.

The details of layout are not discussed here because they are too complex, and covering them one by one would bloat the article. Most of the time we only need to know what layout does. If you want to dig into the principles and see how it is done, I highly recommend the Everyone FED team's article on how the browser performs layout, as seen from the Chrome source code.

Summary

To review, the main thread of this section is: build the DOM tree, calculate styles, and build the layout tree.
