API

This is the API reference for the md4c module, which provides the Python bindings for the MD4C C library.

Parsers and Renderers

class md4c.HTMLRenderer(parser_flags, renderer_flags, **kwargs)

A class to convert Markdown to HTML, implemented in C on top of the MD4C-HTML library. This is the fastest way to convert Markdown to HTML with PyMD4C.

Parameters:
  • parser_flags (int, optional) – Zero or more parser option flags OR’d together. See Option Flags.

  • renderer_flags (int, optional) – Zero or more HTML renderer option flags OR’d together. See Option Flags.

Option flags may also be specified in keyword-argument form for better readability. See Option Flags.

parse(markdown)

Parse a Markdown document and return the rendered HTML.

Parameters:

markdown (str or bytes) – The Markdown text to parse. If provided as bytes, it must be UTF-8 encoded.

Returns:

The generated HTML

Return type:

str or bytes

Raises:

ParseError – if there is a runtime error while parsing
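A minimal usage sketch, assuming PyMD4C is installed and importable as md4c. The helper name markdown_to_html is hypothetical and not part of the library; md4c is imported inside the function so the snippet still loads where the package is missing.

```python
def markdown_to_html(markdown):
    """Render Markdown to HTML with the default parser and renderer options."""
    import md4c  # PyMD4C; assumed to be installed

    renderer = md4c.HTMLRenderer()  # both flag arguments are optional
    try:
        return renderer.parse(markdown)
    except md4c.ParseError as exc:
        # Signals a runtime failure inside MD4C, not "invalid" Markdown.
        raise RuntimeError("MD4C failed to render the document") from exc
```

Calling markdown_to_html("# Hello") should return markup containing an h1 element. A renderer instance can be created once and reused for several documents.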

class md4c.GenericParser(parser_flags, **kwargs)

SAX-like Markdown parser, implemented in C on top of the bare MD4C parser.

Parameters:

parser_flags (int, optional) – Zero or more parser option flags OR’d together. See Option Flags.

Option flags may also be specified in keyword-argument form for better readability. See Option Flags.

parse(markdown, enter_block_callback, leave_block_callback, enter_span_callback, leave_span_callback, text_callback)

Parse a Markdown document using the provided callbacks for output.

Callbacks must all accept two parameters. The first describes the type of block, inline, or text. The second is either a dict with details about the block or inline, or a str/bytes containing the text itself. See Callbacks for more information.

If a callback raises StopParsing, parsing will abort with no error. Any other exception will abort parsing and propagate back to the caller of this method.

Parameters:
  • markdown (str or bytes) – The Markdown text to parse. If provided as bytes, it must be UTF-8 encoded.

  • enter_block_callback (function or callable) – Callback to be called when the parser enters a new block element

  • leave_block_callback (function or callable) – Callback to be called when the parser leaves a block element

  • enter_span_callback (function or callable) – Callback to be called when the parser enters a new inline element

  • leave_span_callback (function or callable) – Callback to be called when the parser leaves an inline element

  • text_callback (function or callable) – Callback to be called when the parser has text to add to the current block or inline element

Raises:

ParseError – if there is a runtime error while parsing
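The five callbacks can be sketched as methods of one small object. TextCollector and collect are hypothetical names used only for illustration; the sketch assumes PyMD4C is installed and that passing 0 as parser_flags selects the default options.

```python
class TextCollector:
    """Hypothetical helper that records parser events and gathers text."""

    def __init__(self):
        self.events = []  # ("enter" | "leave", type) in document order
        self.pieces = []  # text fragments in document order

    def enter_block(self, block_type, details):
        self.events.append(("enter", block_type))

    def leave_block(self, block_type, details):
        self.events.append(("leave", block_type))

    def enter_span(self, span_type, details):
        self.events.append(("enter", span_type))

    def leave_span(self, span_type, details):
        self.events.append(("leave", span_type))

    def text(self, text_type, text):
        # The text arrives as str or bytes; normalize to str.
        if isinstance(text, bytes):
            text = text.decode("utf-8")
        self.pieces.append(text)


def collect(markdown):
    """Run GenericParser over `markdown` and return the filled collector."""
    import md4c  # PyMD4C; assumed to be installed

    collector = TextCollector()
    md4c.GenericParser(0).parse(
        markdown,
        collector.enter_block,
        collector.leave_block,
        collector.enter_span,
        collector.leave_span,
        collector.text,
    )
    return collector
```

Joining collector.pieces gives the plain text of the document, while collector.events preserves the nesting order of blocks and inlines.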

class md4c.ParserObject(*args, **kwargs)

Object-oriented wrapper for GenericParser. Rather than providing callbacks for enter_block, leave_block, enter_span, leave_span, and text to a parse function, this base class can be subclassed to provide implementations for them.

When this class’s parse() function is called, it uses its own enter_block(), leave_block(), enter_span(), leave_span(), and text() functions as callbacks.

Arguments to the constructor are passed through to GenericParser as-is to set parser options.

enter_block(block_type, details)

Called when the parser is entering a block element. This function should be overridden in subclasses. By default, it does nothing.

Parameters:
  • block_type – An instance of the md4c.BlockType enum representing the type of block being entered

  • details – A dict that contains extra information for certain types of blocks. For example, heading blocks provide 'level'. Keys are strings. Values are either integers, strings, lists of tuples, or None. For more information, see the documentation for GenericParser.

leave_block(block_type, details)

Called when the parser is leaving a block element. This function should be overridden in subclasses. By default, it does nothing.

Parameters:
  • block_type – An instance of the md4c.BlockType enum representing the type of block being left

  • details – A dict that contains extra information for certain types of blocks. For example, heading blocks provide 'level'. Keys are strings. Values are either integers, strings, lists of tuples, or None. For more information, see the documentation for GenericParser.

enter_span(span_type, details)

Called when the parser is entering an inline element. This function should be overridden in subclasses. By default, it does nothing.

Parameters:
  • span_type – An instance of the md4c.SpanType enum representing the type of inline being entered

  • details – A dict that contains extra information for certain types of inlines. For example, links provide 'href' and 'title'. Keys are strings. Values are either integers, strings, lists of tuples, or None. For more information, see the documentation for GenericParser.

leave_span(span_type, details)

Called when the parser is leaving an inline element. This function should be overridden in subclasses. By default, it does nothing.

Parameters:
  • span_type – An instance of the md4c.SpanType enum representing the type of inline being left

  • details – A dict that contains extra information for certain types of inlines. For example, links provide 'href' and 'title'. Keys are strings. Values are either integers, strings, lists of tuples, or None. For more information, see the documentation for GenericParser.

text(text_type, text)

Called when the parser has text to add to the current block or inline element. This function should be overridden in subclasses. By default, it does nothing.

Parameters:
  • text_type – An instance of the md4c.TextType enum representing the type of text element

  • text – A string or bytes containing the actual text to add

parse(markdown)

Parse a Markdown document using this object’s enter_block(), leave_block(), enter_span(), leave_span(), and text() functions as callbacks for GenericParser.

Parameters:

markdown (str or bytes) – The Markdown text to parse.
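A subclass might look like the following sketch, which collects headings together with their levels. HeadingCollector and build_heading_collector are hypothetical names; the class is defined inside a factory function so that md4c is only imported when a parser is actually needed. The sketch assumes PyMD4C is installed and that the input is passed as str.

```python
def build_heading_collector(*args, **kwargs):
    """Return a ParserObject subclass instance that records (level, text) pairs."""
    import md4c  # PyMD4C; assumed to be installed

    class HeadingCollector(md4c.ParserObject):
        def __init__(self, *args, **kwargs):
            # Constructor arguments are passed through as parser options.
            super().__init__(*args, **kwargs)
            self.headings = []
            self._level = None
            self._parts = []

        def enter_block(self, block_type, details):
            if block_type == md4c.BlockType.H:
                self._level = details["level"]
                self._parts = []

        def leave_block(self, block_type, details):
            if block_type == md4c.BlockType.H:
                self.headings.append((self._level, "".join(self._parts)))
                self._level = None

        def text(self, text_type, text):
            # Only keep text that appears inside a heading.
            if self._level is not None:
                self._parts.append(text)

    return HeadingCollector(*args, **kwargs)
```

After calling parse() on the returned object, its headings attribute holds one tuple per heading. Methods that are not overridden, such as enter_span() here, keep their do-nothing default.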

HTML Entity Helper

md4c.lookup_entity(entity)

Translate an HTML entity to its UTF-8 representation. Returns the unmodified input if it is not a valid entity.

Parameters:

entity (str) – The HTML entity, including ampersand and semicolon

Returns:

Corresponding UTF-8 character(s)

Return type:

str
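One plausible use is inside a text callback, where entity fragments can be translated before the text is stored. The helper name resolve_entity_text is hypothetical, and the sketch assumes that text of type TextType.ENTITY carries the entity verbatim, ampersand and semicolon included.

```python
def resolve_entity_text(text_type, text):
    """Return `text`, translating it first if it is an HTML entity fragment."""
    import md4c  # PyMD4C; assumed to be installed

    # lookup_entity() takes a str, so bytes input is passed through untouched.
    if text_type == md4c.TextType.ENTITY and isinstance(text, str):
        return md4c.lookup_entity(text)
    return text
```

Because lookup_entity() returns unknown entities unmodified, the helper never loses input text.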

Option Flags

PyMD4C’s parsers and renderers accept options in two forms: as a set of flags OR’d together, or as keyword arguments set to True. All parsers and renderers accept the parser options, but renderer options are specific to each renderer.
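The two forms can be compared side by side. This is a sketch that assumes PyMD4C is installed; render_both_ways is a hypothetical helper, and the flag and keyword names are the ones listed in this section.

```python
def render_both_ways(markdown):
    """Render `markdown` with the same options given as flags and as keywords."""
    import md4c  # PyMD4C; assumed to be installed

    # Flag form: parser flags first, then HTML renderer flags.
    by_flags = md4c.HTMLRenderer(
        md4c.MD_FLAG_STRIKETHROUGH,
        md4c.MD_HTML_FLAG_XHTML,
    )
    # Keyword form: each option is named and set to True.
    by_kwargs = md4c.HTMLRenderer(strikethrough=True, xhtml=True)

    return by_flags.parse(markdown), by_kwargs.parse(markdown)
```

Both renderers are expected to produce identical output for the same input. The keyword form tends to read better when only a few options are enabled.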

Parser Option Flags

Basic option flags

md4c.MD_FLAG_COLLAPSEWHITESPACE

Keyword argument: collapse_whitespace

In normal text, collapse non-trivial whitespace into a single space.

md4c.MD_FLAG_PERMISSIVEATXHEADERS

Keyword argument: permissive_atx_headers

Do not require a space in ATX headers (e.g. ###Header).

md4c.MD_FLAG_PERMISSIVEURLAUTOLINKS

Keyword argument: permissive_url_autolinks

Convert URLs to links even without < and >.

md4c.MD_FLAG_PERMISSIVEEMAILAUTOLINKS

Keyword argument: permissive_email_autolinks

Convert email addresses to links even without <, >, and mailto:.

md4c.MD_FLAG_NOINDENTEDCODEBLOCKS

Keyword argument: no_indented_code_blocks

Disable indented code blocks (only allow fenced code blocks).

md4c.MD_FLAG_NOHTMLBLOCKS

Keyword argument: no_html_blocks

Disable raw HTML blocks.

md4c.MD_FLAG_NOHTMLSPANS

Keyword argument: no_html_spans

Disable raw HTML inlines.

md4c.MD_FLAG_TABLES

Keyword argument: tables

Enable tables extension.

md4c.MD_FLAG_STRIKETHROUGH

Keyword argument: strikethrough

Enable strikethrough extension.

md4c.MD_FLAG_PERMISSIVEWWWAUTOLINKS

Keyword argument: permissive_www_autolinks

Enable www autolinks (even without any scheme prefix, as long as they begin with www.).

md4c.MD_FLAG_TASKLISTS

Keyword argument: tasklists

Enable task lists extension.

md4c.MD_FLAG_LATEXMATHSPANS

Keyword argument: latex_math_spans

Enable $ and $$ containing LaTeX equations.

md4c.MD_FLAG_WIKILINKS

Keyword argument: wikilinks

Enable wiki links extension.

md4c.MD_FLAG_UNDERLINE

Keyword argument: underline

Enable underline extension (and disable _ for regular emphasis).

Combination option flags

These enable several related parser options, or options to match a particular dialect of Markdown as closely as possible.

md4c.MD_FLAG_PERMISSIVEAUTOLINKS

Keyword argument: permissive_autolinks

Enables all varieties of autolinks:

  • MD_FLAG_PERMISSIVEURLAUTOLINKS

  • MD_FLAG_PERMISSIVEEMAILAUTOLINKS

  • MD_FLAG_PERMISSIVEWWWAUTOLINKS

md4c.MD_FLAG_NOHTML

Keyword argument: no_html

Disables all raw HTML tags:

  • MD_FLAG_NOHTMLBLOCKS

  • MD_FLAG_NOHTMLSPANS

md4c.MD_DIALECT_GITHUB

Keyword argument: dialect_github

Parse GitHub-Flavored Markdown (GFM), which enables the following flags:

  • MD_FLAG_PERMISSIVEAUTOLINKS

  • MD_FLAG_TABLES

  • MD_FLAG_STRIKETHROUGH

  • MD_FLAG_TASKLISTS
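Since the flags are integers combined with bitwise OR, a combination flag can be checked against its members with a bitwise AND. The helpers includes and github_dialect_members below are hypothetical, and the second one assumes PyMD4C is installed.

```python
def includes(flags, flag):
    """Return True if every bit of `flag` is set in `flags`."""
    return (flags & flag) == flag


def github_dialect_members():
    """Map each flag listed for MD_DIALECT_GITHUB to whether the dialect sets it."""
    import md4c  # PyMD4C; assumed to be installed

    names = [
        "MD_FLAG_PERMISSIVEAUTOLINKS",
        "MD_FLAG_TABLES",
        "MD_FLAG_STRIKETHROUGH",
        "MD_FLAG_TASKLISTS",
    ]
    return {
        name: includes(md4c.MD_DIALECT_GITHUB, getattr(md4c, name))
        for name in names
    }
```

The same OR operator builds custom combinations, for example md4c.MD_FLAG_NOHTML | md4c.MD_FLAG_TABLES to allow tables while disabling raw HTML.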

HTML Renderer Option Flags

These options are only accepted by the HTMLRenderer.

md4c.MD_HTML_FLAG_DEBUG

Keyword argument: debug

For development use, send MD4C debug output to stderr.

md4c.MD_HTML_FLAG_VERBATIM_ENTITIES

Keyword argument: verbatim_entities

Do not replace HTML entities with the actual character (e.g. &copy; with “©”).

md4c.MD_HTML_FLAG_SKIP_UTF8_BOM

Keyword argument: skip_utf8_bom

Skip a UTF-8 byte order mark (BOM) at the start of the input.

md4c.MD_HTML_FLAG_XHTML

Keyword argument: xhtml

Generate XHTML instead of HTML.

Enums

The MD4C library uses various enums to provide data to callbacks. PyMD4C wraps these in Python Enum classes.

class md4c.BlockType(value)

Represents a type of Markdown block

DOC = 0

Document

QUOTE = 1

Block quote

UL = 2

Unordered list

OL = 3

Ordered list

LI = 4

List item

HR = 5

Horizontal rule

H = 6

Heading

CODE = 7

Code block

HTML = 8

Raw HTML block

P = 9

Paragraph

TABLE = 10

Table

THEAD = 11

Table header row

TBODY = 12

Table body

TR = 13

Table row

TH = 14

Table header cell

TD = 15

Table cell

class md4c.SpanType(value)

Represents a type of Markdown span/inline

EM = 0

Emphasis

STRONG = 1

Strong emphasis

A = 2

Link

IMG = 3

Image

CODE = 4

Inline code

DEL = 5

Strikethrough

LATEXMATH = 6

Inline math

LATEXMATH_DISPLAY = 7

Display math

WIKILINK = 8

Wiki link

U = 9

Underline

class md4c.TextType(value)

Represents a type of Markdown text

NORMAL = 0

Normal text

NULLCHAR = 1

Null character

BR = 2

Line break

SOFTBR = 3

Soft line break

ENTITY = 4

HTML entity

CODE = 5

Text inside a code block or inline code

HTML = 6

Raw HTML (inside an HTML block or simply inline HTML)

LATEXMATH = 7

Text inside an equation

class md4c.Align(value)

Represents a table cell alignment

DEFAULT = 0

Default alignment

LEFT = 1

Left alignment

CENTER = 2

Centering

RIGHT = 3

Right alignment

Exceptions

class md4c.ParseError

Raised when an error occurs during parsing, such as running out of memory. Note that there is no such thing as invalid syntax in Markdown, so this really only signals some sort of system error.

class md4c.StopParsing

A callback function can raise this to stop parsing early for non-error reasons. GenericParser (and by extension, ParserObject) will catch it and abort quietly.
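Stopping early can be sketched with a helper that returns the text of the first heading and skips the rest of the document. The name first_heading is hypothetical; the sketch assumes PyMD4C is installed and that the input is passed as str.

```python
def first_heading(markdown):
    """Return the text of the first heading in `markdown`, or None if there is none."""
    import md4c  # PyMD4C; assumed to be installed

    parts = []
    in_heading = False
    found = False

    def enter_block(block_type, details):
        nonlocal in_heading
        if block_type == md4c.BlockType.H:
            in_heading = True

    def leave_block(block_type, details):
        nonlocal found
        if block_type == md4c.BlockType.H:
            found = True
            # The first heading is complete; no need to parse further.
            raise md4c.StopParsing()

    def ignore_span(span_type, details):
        pass

    def text(text_type, text):
        if in_heading:
            parts.append(text)

    # StopParsing is caught by parse(), so no try/except is needed here.
    md4c.GenericParser(0).parse(
        markdown, enter_block, leave_block, ignore_span, ignore_span, text
    )
    return "".join(parts) if found else None
```

Any other exception raised in a callback would propagate out of parse(), so StopParsing is the way to end a parse early without signalling an error.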