Responsible Markdown in Next.js

Publikováno: 10.8.2021

Markdown truly is a great format. It’s close enough to plain text so that anyone can quickly learn it, and it’s structured enough that it can be parsed and eventually converted to you name it.

That being said: parsing, processing, …

The post Responsible Markdown in Next.js appeared first on CSS-Tricks. You can support CSS-Tricks by being an MVP Supporter.

Celý článek

Markdown truly is a great format. It’s close enough to plain text so that anyone can quickly learn it, and it’s structured enough that it can be parsed and eventually converted to you name it.

That being said: parsing, processing, enhancing, and converting Markdown needs code. Shipping all that code in the client comes at a cost. It’s not huge per se, but it’s still a few dozens of kilobytes of code that are used only to deal with Markdown and nothing else.

In this article, I want to explain how to keep Markdown out of the client in a Next.js application, using the Unified/Remark ecosystem (genuinely not sure which name to use, this is all super confusing).

General idea

The idea is to only use Markdown in the getStaticProps functions from Next.js so this is done during a build (or in a Next serverless function if using Vercel’s incremental builds), but never in the client. I guess getServerSideProps would also be fine, but I think getStaticProps is more likely to be the common use case.

This would return an AST (Abstract Syntax Tree, which is to say a big nested object describing our content) resulting from parsing and processing the Markdown content, and the client would only be responsible for rendering that AST into React components.

I guess we could even render the Markdown as HTML directly in getStaticProps and return that to render with dangerouslySetInnerHtml but we’re not that kind of people. Security matters. And also, flexibility of rendering Markdown the way we want with our components instead of it rendering as plain HTML. Seriously folks, do not do that. 😅

export const getStaticProps = async () => {
  // Get the Markdown content from somewhere, like a CMS or whatnot. It doesn’t
  // matter for the sake of this article, really. It could also be read from a
  // file.
  const markdown = await getMarkdownContentFromSomewhere()
  const ast = parseMarkdown(markdown)

  return { props: { ast } }
}

const Page = props => {
  // This would usually have your layout and whatnot as well, but omitted here
  // for sake of simplicity of course.
  return <MarkdownRenderer ast={props.ast} />
}

export default Page

Parsing Markdown

We are going to use the Unified/Remark ecosystem. We need to install unified and remark-parse and that’s about it. Parsing the Markdown itself is relatively straightforward:

import unified from 'unified'
import markdown from 'remark-parse'

const parseMarkdown = content => unified().use(markdown).parse(content)

export default parseMarkdown

Now, what took me a long while to understand is why my extra plugins, like remark-prism or remark-slug, did not work like this. This is because the .parse(..) method from Unified does not process the AST with plugins. As the name suggests, it only parses the string of Markdown content into a tree.

If we want Unified to apply our plugins, we need Unified to go through what they call the “run” phase. Normally, this is done by using the .process(..) method instead of the .parse(..) method. Unfortunately, .process(..) not only parses Markdown and applies plugins, but also stringifies the AST into another format (like HTML via remark-html, or JSX with remark-react). And this is not what we want, as we want to preserve the AST, but after it’s been processed by plugins.

| ........................ process ........................... |
| .......... parse ... | ... run ... | ... stringify ..........|

          +--------+                     +----------+
Input ->- | Parser | ->- Syntax Tree ->- | Compiler | ->- Output
          +--------+          |          +----------+
                              X
                              |
                       +--------------+
                       | Transformers |
                       +--------------+

So what we need to do is run both the parsing and running phases, but not the stringifying phase. Unified does not provide a method to do these 2 out of 3 phases, but it provides individual methods for every phase, so we can do it manually:

import unified from 'unified'
import markdown from 'remark-parse'
import prism from 'remark-prism'

const parseMarkdown = content => {
  const engine = unified().use(markdown).use(prism)
  const ast = engine.parse(content)

  // Unified‘s *process* contains 3 distinct phases: parsing, running and
  // stringifying. We do not want to go through the stringifying phase, since we
  // want to preserve an AST, so we cannot call `.process(..)`. Calling
  // `.parse(..)` is not enough though as plugins (so Prism) are executed during
  // the running phase. So we need to manually call the run phase (synchronously
  // for simplicity).
  // See: https://github.com/unifiedjs/unified#description
  return engine.runSync(ast)
}

Tada! We parsed our Markdown into a syntax tree. And then we ran our plugins on that tree (done here synchronously for sake of simplicity, but you could use .run(..) to do it asynchronously). But we did not convert our tree into some other syntax like HTML or JSX. We can do that ourselves, in the render.

Rendering Markdown

Now that we have our cool tree at the ready, we can render it the way we intend to. Let’s have a MarkdownRenderer component that receives the tree as an ast prop, and renders it all with React components.

const getComponent = node => {
  switch (node.type) {
    case 'root':
      return React.Fragment

    case 'paragraph':
      return 'p'

    case 'emphasis':
      return 'em'

    case 'heading':
      return ({ children, depth = 2 }) => {
        const Heading = `h${depth}`
        return <Heading>{children}</Heading>
      }

    /* Handle all types here … */

    default:
      console.log('Unhandled node type', node)
      return React.Fragment
  }
}

const Node = node => {
  const Component = getComponent(node)
  const { children } = node

  return children ? (
    <Component {...node}>
      {children.map((child, index) => (
        <Node key={index} {...child} />
      ))}
    </Component>
  ) : (
    <Component {...node} />
  )
}

const MarkdownRenderer = props => <Node {...props.ast} />

export default React.memo(MarkdownRenderer)

Most of the logic of our renderer lives in the Node component. It finds out what to render based on the type key of the AST node (this is our getComponent method handling every type of node), and then renders it. If the node has children, it recursively goes into the children; otherwise it just renders the component as a final leaf.

Cleaning up the tree

Depending on which Remark plugins we use, we might encounter the following problem when trying to render our page:

Error: Error serializing .content[0].content.children[3].data.hChildren[0].data.hChildren[0].data.hChildren[0].data.hChildren[0].data.hName returned from getStaticProps in “/”. Reason: undefined cannot be serialized as JSON. Please use null or omit this value.

This happens because our AST contains keys whose values are undefined, which is not something that can be safely serialized as JSON. Next gives us the solution: either we omit the value entirely, or if we need it somewhat, replace it with null.

We’re not going to fix every path by hand though, so we need to walk that AST recursively and clean it up. I found out that this happened when using remark-prism, a plugin to enable syntax highlighting for code blocks. The plugin indeed adds a [data] object to nodes.

What we can do is walk our AST before returning it to clean up these nodes:

const cleanNode = node => {
  if (node.value === undefined) delete node.value
  if (node.tagName === undefined) delete node.tagName
  if (node.data) {
    delete node.data.hName
    delete node.data.hChildren
    delete node.data.hProperties
  }

  if (node.children) node.children.forEach(cleanNode)

  return node
}

const parseMarkdown = content => {
  const engine = unified().use(markdown).use(prism)
  const ast = engine.parse(content)
  const processedAst = engine.runSync(parsed)

  cleanNode(processedAst)

  return processedAst
}

One last thing we can do to ship less data to the client is remove the position object which exists on every single node and holds the original position in the Markdown string. It’s not a big object (it has only two keys), but when the tree gets big, it adds up quickly.

const cleanNode = node => {
  delete node.position

Wrapping up

That’s it folks! We managed to restrict Markdown handling to the build-/server-side code so we don’t ship a Markdown runtime to the browser, which is unnecessarily costly. We pass a tree of data to the client, which we can walk and convert into whatever React components we want.

I hope this helps. :)

The post Responsible Markdown in Next.js appeared first on CSS-Tricks. You can support CSS-Tricks by being an MVP Supporter.