API
This is the API for the md4c module, which provides the actual bindings for the MD4C C library.
Parsers and Renderers
- class md4c.HTMLRenderer(parser_flags, renderer_flags, **kwargs)
A class to convert Markdown to HTML, implemented in C on top of the MD4C-HTML library. This is the fastest way to convert Markdown to HTML with PyMD4C.
- Parameters:
parser_flags (int, optional) – Zero or more parser option flags OR’d together. See Option Flags.
renderer_flags (int, optional) – Zero or more HTML renderer option flags OR’d together. See Option Flags.
Option flags may also be specified in keyword-argument form for more readability. See Option Flags.
- parse(markdown)
Parse a Markdown document and return the rendered HTML.
- Parameters:
markdown (str or bytes) – The Markdown text to parse. If provided as bytes, it must be UTF-8 encoded.
- Returns:
The generated HTML
- Return type:
str or bytes
- Raises:
ParseError – if there is a runtime error while parsing
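The rendering call described above might be wrapped as follows. This is a minimal sketch that assumes PyMD4C is installed and importable as md4c. The helper names to_text and render_html are hypothetical and appear nowhere in the library. Decoding bytes output as UTF-8 is also an assumption, based on the requirement that bytes input be UTF-8 encoded.

```python
def to_text(result):
    """Normalize a parse() result to str.

    parse() is documented as returning str or bytes; bytes output is
    assumed here to be UTF-8 encoded, like bytes input.
    """
    if isinstance(result, bytes):
        return result.decode("utf-8")
    return result


def render_html(markdown):
    """Render Markdown (str or UTF-8 bytes) to an HTML string."""
    # Imported here so that to_text() stays usable without the extension.
    import md4c  # provided by the PyMD4C package

    renderer = md4c.HTMLRenderer()  # no option flags set
    return to_text(renderer.parse(markdown))
```

Calling render_html("# Hello") should return the rendered heading as a str. A renderer can be created once and its parse() method called for each document.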
- class md4c.GenericParser(parser_flags, **kwargs)
SAX-like Markdown parser, implemented in C on top of the bare MD4C parser.
- Parameters:
parser_flags (int, optional) – Zero or more parser option flags OR’d together. See Option Flags.
Option flags may also be specified in keyword-argument form for more readability. See Option Flags.
- parse(markdown, enter_block_callback, leave_block_callback, enter_span_callback, leave_span_callback, text_callback)
Parse a Markdown document using the provided callbacks for output.
Callbacks must all accept two parameters. The first describes the type of block, inline, or text. The second is a dict with details about the block or inline, or a string/bytes containing the text itself. See Callbacks for more information.
If a callback raises StopParsing, parsing will abort with no error. Any other exception will abort parsing and propagate back to the caller of this method.
- Parameters:
markdown (str or bytes) – The Markdown text to parse. If provided as bytes, it must be UTF-8 encoded.
enter_block_callback (function or callable) – Callback to be called when the parser enters a new block element
leave_block_callback (function or callable) – Callback to be called when the parser leaves a block element
enter_span_callback (function or callable) – Callback to be called when the parser enters a new inline element
leave_span_callback (function or callable) – Callback to be called when the parser leaves an inline element
text_callback (function or callable) – Callback to be called when the parser has text to add to the current block or inline element
- Raises:
ParseError – if there is a runtime error while parsing
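As an illustration of the five-callback interface, the sketch below collects the headings of a document. It assumes PyMD4C is installed, that the input is a str (so text chunks arrive as str), and that heading blocks carry a 'level' entry in their details dict. HeadingCollector and collect_headings are hypothetical names chosen for this example.

```python
class HeadingCollector:
    """Collect (level, text) pairs for every heading in a document."""

    def __init__(self):
        self.headings = []
        self._open = None  # [level, text parts] while inside a heading

    def enter_block(self, block_type, details):
        # Comparing by member name avoids importing md4c here;
        # `block_type is md4c.BlockType.H` is the direct equivalent.
        if block_type.name == "H":
            self._open = [details["level"], []]

    def leave_block(self, block_type, details):
        if block_type.name == "H" and self._open is not None:
            level, parts = self._open
            self.headings.append((level, "".join(parts)))
            self._open = None

    def span(self, span_type, details):
        pass  # inline elements are ignored in this sketch

    def text(self, text_type, text):
        # Only keep text that arrives while a heading is open.
        if self._open is not None:
            self._open[1].append(text)


def collect_headings(markdown):
    """Parse `markdown` (a str) and return its headings."""
    import md4c  # provided by the PyMD4C package

    collector = HeadingCollector()
    md4c.GenericParser().parse(
        markdown,
        collector.enter_block,
        collector.leave_block,
        collector.span,  # enter_span_callback
        collector.span,  # leave_span_callback
        collector.text,
    )
    return collector.headings
```

Text inside inline elements such as emphasis still reaches the text callback, so it ends up in the heading text even though the span callbacks do nothing.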
- class md4c.ParserObject(*args, **kwargs)
Object-oriented wrapper for GenericParser. Rather than providing callbacks for enter_block, leave_block, enter_span, leave_span, and text to a parse function, this base class can be subclassed to provide implementations for them.
When this class’s parse() function is called, it uses its own enter_block(), leave_block(), enter_span(), leave_span(), and text() functions as callbacks.
Arguments to the constructor are passed through to GenericParser as-is to set parser options.
- enter_block(block_type, details)
Called when the parser is entering a block element. This function should be overridden in subclasses. By default, it does nothing.
- Parameters:
block_type – An instance of the md4c.BlockType enum representing the type of block being entered
details – A dict that contains extra information for certain types of blocks. For example, heading blocks provide 'level'. Keys are strings. Values are either integers, strings, lists of tuples, or None. For more information, see the documentation for GenericParser.
- leave_block(block_type, details)
Called when the parser is leaving a block element. This function should be overridden in subclasses. By default, it does nothing.
- Parameters:
block_type – An instance of the md4c.BlockType enum representing the type of block being left
details – A dict that contains extra information for certain types of blocks. For example, heading blocks provide 'level'. Keys are strings. Values are either integers, strings, lists of tuples, or None. For more information, see the documentation for GenericParser.
- enter_span(span_type, details)
Called when the parser is entering an inline element. This function should be overridden in subclasses. By default, it does nothing.
- Parameters:
span_type – An instance of the md4c.SpanType enum representing the type of inline being entered
details – A dict that contains extra information for certain types of inlines. For example, links provide 'href' and 'title'. Keys are strings. Values are either integers, strings, lists of tuples, or None. For more information, see the documentation for GenericParser.
- leave_span(span_type, details)
Called when the parser is leaving an inline element. This function should be overridden in subclasses. By default, it does nothing.
- Parameters:
span_type – An instance of the md4c.SpanType enum representing the type of inline being left
details – A dict that contains extra information for certain types of inlines. For example, links provide 'href' and 'title'. Keys are strings. Values are either integers, strings, lists of tuples, or None. For more information, see the documentation for GenericParser.
- text(text_type, text)
Called when the parser has text to add to the current block or inline element. This function should be overridden in subclasses. By default, it does nothing.
- Parameters:
text_type – An instance of the md4c.TextType enum representing the type of text element
text – A string or bytes containing the actual text to add
- parse(markdown)
Parse a Markdown document using this object’s enter_block(), leave_block(), enter_span(), leave_span(), and text() functions as callbacks for GenericParser.
- Parameters:
markdown (str or bytes) – The Markdown text to parse.
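A subclass only needs to override the callbacks it cares about. The sketch below counts how often each block type is entered. It assumes PyMD4C is installed and that ParserObject cooperates with an ordinary Python mixin; BlockCounterMixin and make_block_counter are hypothetical names. The counting logic lives in a mixin only so that it can be exercised without the extension. Subclassing md4c.ParserObject directly and overriding enter_block() there would be the more usual form.

```python
from collections import Counter


class BlockCounterMixin:
    """Count how often each block type is entered."""

    def __init__(self, *args, **kwargs):
        # Forward parser options (flags or keywords) to the next class.
        super().__init__(*args, **kwargs)
        self.counts = Counter()

    def enter_block(self, block_type, details):
        self.counts[block_type.name] += 1


def make_block_counter(**parser_options):
    """Return a ParserObject subclass instance that counts blocks."""
    import md4c  # provided by the PyMD4C package

    class BlockCounter(BlockCounterMixin, md4c.ParserObject):
        pass

    return BlockCounter(**parser_options)
```

A typical use would be to call make_block_counter(tables=True), pass a document to the returned object's parse() method, and then inspect its counts attribute.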
HTML Entity Helper
- md4c.lookup_entity(entity)
Translate an HTML entity to its UTF-8 representation. Returns the unmodified input if it is not a valid entity.
- Parameters:
entity (str) – The HTML entity, including ampersand and semicolon
- Returns:
Corresponding UTF-8 character(s)
- Return type:
str
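One place this helper may be useful is inside a text callback, to turn entity chunks into characters. The sketch below assumes that such chunks arrive with text type ENTITY and contain the raw entity as a str, including ampersand and semicolon; that behaviour is not confirmed by this page. resolve_text is a hypothetical name, and its lookup parameter exists only so another translation function can be substituted.

```python
def resolve_text(text_type, text, lookup=None):
    """Return `text`, translating it first if it is an HTML entity.

    `lookup` defaults to md4c.lookup_entity, which returns its input
    unchanged when the entity is not valid.
    """
    if text_type.name != "ENTITY":
        return text
    if lookup is None:
        from md4c import lookup_entity as lookup  # PyMD4C package
    return lookup(text)
```

Inside a text callback this could be used as resolve_text(text_type, text) before appending the result to an output buffer.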
Option Flags
PyMD4C’s parsers and renderers accept options in two forms: as an OR’d set of
flags or as keyword arguments set to True. All parsers and renderers
accept the parser options, but renderer options are specific to the renderer.
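The two forms can be compared side by side. The sketch below assumes PyMD4C is installed and that the first two constructor arguments may be passed positionally, as the HTMLRenderer signature suggests. combine_flags and build_renderers are hypothetical helpers written for this example.

```python
from functools import reduce
import operator


def combine_flags(flags):
    """OR an iterable of integer option flags into a single value."""
    return reduce(operator.or_, flags, 0)


def build_renderers():
    """Build two renderers that should be configured identically."""
    import md4c  # provided by the PyMD4C package

    # Flag form: parser flags first, then HTML renderer flags.
    parser_flags = combine_flags(
        [md4c.MD_FLAG_TABLES, md4c.MD_FLAG_STRIKETHROUGH]
    )
    by_flags = md4c.HTMLRenderer(parser_flags, md4c.MD_HTML_FLAG_XHTML)

    # Keyword form: one keyword argument per flag.
    by_keywords = md4c.HTMLRenderer(
        tables=True, strikethrough=True, xhtml=True
    )
    return by_flags, by_keywords
```

For a fixed set of options, writing the flags with the | operator directly is just as good. A helper like combine_flags is mainly useful when the list of options is assembled at run time.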
Parser Option Flags
Basic option flags
md4c.MD_FLAG_COLLAPSEWHITESPACE
Keyword argument:
collapse_whitespace
In normal text, collapse non-trivial whitespace into a single space.
md4c.MD_FLAG_PERMISSIVEATXHEADERS
Keyword argument:
permissive_atx_headers
Do not require a space in ATX headers (e.g. ###Header)
md4c.MD_FLAG_PERMISSIVEURLAUTOLINKS
Keyword argument:
permissive_url_autolinks
Convert URLs to links even without < and >.
md4c.MD_FLAG_PERMISSIVEEMAILAUTOLINKS
Keyword argument:
permissive_email_autolinks
Convert email addresses to links even without <, >, and mailto:.
md4c.MD_FLAG_NOINDENTEDCODEBLOCKS
Keyword argument:
no_indented_code_blocks
Disable indented code blocks (only allow fenced code blocks).
md4c.MD_FLAG_NOHTMLBLOCKS
Keyword argument:
no_html_blocks
Disable raw HTML blocks.
md4c.MD_FLAG_NOHTMLSPANS
Keyword argument:
no_html_spans
Disable raw HTML inlines.
md4c.MD_FLAG_TABLES
Keyword argument:
tables
Enable tables extension.
md4c.MD_FLAG_STRIKETHROUGH
Keyword argument:
strikethrough
Enable strikethrough extension.
md4c.MD_FLAG_PERMISSIVEWWWAUTOLINKS
Keyword argument:
permissive_www_autolinks
Enable www autolinks (even without any scheme prefix, as long as they begin with www.).
md4c.MD_FLAG_TASKLISTS
Keyword argument:
tasklists
Enable task lists extension.
md4c.MD_FLAG_LATEXMATHSPANS
Keyword argument:
latex_math_spans
Enable $ and $$ containing LaTeX equations.
md4c.MD_FLAG_WIKILINKS
Keyword argument:
wikilinks
Enable wiki links extension.
md4c.MD_FLAG_UNDERLINE
Keyword argument:
underline
Enable underline extension (and disable _ for regular emphasis).
Combination option flags
These enable several related parser options, or options to match a particular dialect of Markdown as closely as possible.
md4c.MD_FLAG_PERMISSIVEAUTOLINKS
Keyword argument:
permissive_autolinks
Enables all varieties of autolinks:
MD_FLAG_PERMISSIVEURLAUTOLINKS
MD_FLAG_PERMISSIVEEMAILAUTOLINKS
MD_FLAG_PERMISSIVEWWWAUTOLINKS
md4c.MD_FLAG_NOHTML
Keyword argument:
no_html
Disables all raw HTML tags:
MD_FLAG_NOHTMLBLOCKS
MD_FLAG_NOHTMLSPANS
md4c.MD_DIALECT_GITHUB
Keyword argument:
dialect_github
Parse GitHub-Flavored Markdown (GFM), which enables the following flags:
MD_FLAG_PERMISSIVEAUTOLINKS
MD_FLAG_TABLES
MD_FLAG_STRIKETHROUGH
MD_FLAG_TASKLISTS
HTML Renderer Option Flags
These options are only accepted by the HTMLRenderer.
md4c.MD_HTML_FLAG_DEBUG
Keyword argument:
debug
For development use, send MD4C debug output to stderr.
md4c.MD_HTML_FLAG_VERBATIM_ENTITIES
Keyword argument:
verbatim_entities
Do not replace HTML entities with the actual character (e.g. &copy; with “©”).
md4c.MD_HTML_FLAG_SKIP_UTF8_BOM
Keyword argument:
skip_utf8_bom
Omit BOM from the start of UTF-8 input.
md4c.MD_HTML_FLAG_XHTML
Keyword argument:
xhtml
Generate XHTML instead of HTML.
Enums
The MD4C library uses various enums to provide data to callbacks. PyMD4C uses
Enums to encapsulate these.
- class md4c.BlockType(value)
Represents a type of Markdown block
- DOC = 0
Document
- QUOTE = 1
Block quote
- UL = 2
Unordered list
- OL = 3
Ordered list
- LI = 4
List item
- HR = 5
Horizontal rule
- H = 6
Heading
- CODE = 7
Code block
- HTML = 8
Raw HTML block
- P = 9
Paragraph
- TABLE = 10
Table
- THEAD = 11
Table header row
- TBODY = 12
Table body
- TR = 13
Table row
- TH = 14
Table header cell
- TD = 15
Table cell
- class md4c.SpanType(value)
Represents a type of Markdown span/inline
- EM = 0
Emphasis
- STRONG = 1
Strong emphasis
- A = 2
Link
- IMG = 3
Image
- CODE = 4
Inline code
- DEL = 5
Strikethrough
- LATEXMATH = 6
Inline math
- LATEXMATH_DISPLAY = 7
Display math
- WIKILINK = 8
Wiki link
- U = 9
Underline
- class md4c.TextType(value)
Represents a type of Markdown text
- NORMAL = 0
Normal text
- NULLCHAR = 1
Null character
- BR = 2
Line break
- SOFTBR = 3
Soft line break
- ENTITY = 4
HTML entity
- CODE = 5
Text inside a code block or inline code block
- HTML = 6
Raw HTML (inside an HTML block or simply inline HTML)
- LATEXMATH = 7
Text inside an equation
Exceptions
- class md4c.ParseError
Raised when an error occurs during parsing, such as running out of memory. Note that there is no such thing as invalid syntax in Markdown, so this really only signals some sort of system error.
- class md4c.StopParsing
A callback function can raise this to stop parsing early for non-error reasons. GenericParser (and by extension, ParserObject) will catch it and abort quietly.
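Stopping early can be sketched as follows: the handler gathers the text of the first paragraph and then raises the stop exception from its leave-block callback. This assumes PyMD4C is installed and that the input is a str. FirstParagraph and first_paragraph are hypothetical names, and the exception class is passed in as a parameter purely so the handler can be tried out with a stand-in exception.

```python
class FirstParagraph:
    """Collect the text of the first paragraph, then stop parsing."""

    def __init__(self, stop_exception):
        self.stop_exception = stop_exception  # e.g. md4c.StopParsing
        self.parts = []
        self._inside = False

    def enter_block(self, block_type, details):
        if block_type.name == "P":
            self._inside = True

    def leave_block(self, block_type, details):
        if block_type.name == "P":
            # Nothing after the first paragraph is needed.
            raise self.stop_exception()

    def span(self, span_type, details):
        pass  # inline elements are ignored in this sketch

    def text(self, text_type, text):
        if self._inside:
            self.parts.append(text)


def first_paragraph(markdown):
    """Return the text of the first paragraph of `markdown` (a str)."""
    import md4c  # provided by the PyMD4C package

    handler = FirstParagraph(md4c.StopParsing)
    # No try/except is needed: GenericParser catches StopParsing itself.
    md4c.GenericParser().parse(
        markdown,
        handler.enter_block,
        handler.leave_block,
        handler.span,  # enter_span_callback
        handler.span,  # leave_span_callback
        handler.text,
    )
    return "".join(handler.parts)
```

Any other exception raised from a callback would propagate out of parse(), so StopParsing is the way to signal a deliberate early exit.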