This document describes the general approach and design of redoc for developers interested in contributing.
Two-way R Markdown workflows are challenging because R Markdown and knitr workflows are lossy: the compiled document does not contain all of the information in the source. We are also limited by the information that pandoc can pass from markdown to the final formats and back again.
To produce a Reversible Reproducible Document in Word (a “redoc”), the redoc() format first pre-parses the source .Rmd file. knitr doesn’t expose its parser to developers, so I’ve lifted most of the code for this parser from knitr and rmarkdown. The parser captures YAML headers, code chunks, and inline code, gives names to unnamed chunks and inline code sections, and wraps them in named <span> tags with unique id values and the class
"redoc". The contents of those sections are stored in a file called
redoc() then knits the .Rmd file. Code output is wrapped within the same <div> tags as the original chunks.
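The wrapping step described above can be illustrated with a minimal base R sketch. This is not redoc's parser: the function name `wrap_inline_code` and the `redoc-inline-N` id scheme are assumptions made for illustration, and only inline R code is handled.

```r
# Hypothetical helper (not part of redoc): wrap inline R code in <span> tags
# of class "redoc" and record the original text under a unique id.
wrap_inline_code <- function(text) {
  pattern <- "`r [^`]+`"                       # inline code such as `r mean(x)`
  m <- gregexpr(pattern, text)
  originals <- regmatches(text, m)[[1]]
  ids <- sprintf("redoc-inline-%d", seq_along(originals))  # assumed id scheme
  wrapped <- sprintf('<span class="redoc" id="%s">%s</span>', ids, originals)
  regmatches(text, m) <- list(wrapped)         # substitute each match in place
  list(text = text, codelist = setNames(as.list(originals), ids))
}

res <- wrap_inline_code("The mean is `r mean(x)` and n is `r length(x)`.")
names(res$codelist)
```

The returned `codelist` plays the role of the stored list of captured sections: each id maps back to the exact source text that has to be restored later.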
When the knitted document is converted to a .docx by pandoc, redoc passes it through a series of pandoc Lua filters (found in inst/lua-filters). These do three things:
Convert <div> tags of class redoc to hidden custom styles with names corresponding to their unique IDs so that they are retained in the Word document.
Using the post_processor argument and functions from officer, the original .Rmd and the codelist.yml file are stored in the Word document. As .docx files are just ZIP archives, this is straightforward, except that some metadata must be added to ensure Word preserves these files when editing.
If the option diagnostics=TRUE is set, information about the R session and current software versions is also stored in the Word document for later debugging.
If highlight_output=TRUE is set, the post-processor also modifies all Word document styles to color the
When the dedoc() function is run, it extracts the *.codelist.yml file from the
Then pandoc is used to convert the docx back to markdown. A custom Lua filter converts any track-changes text to Critic Markup, and another Lua filter replaces any elements that carry the custom redoc styles with placeholders of the form
dedoc() then uses the data in the *.chunks.yml file to replace these placeholders with the original chunks (or inline code). If chunk output has been deleted or modified beyond recognition, so that no placeholder survives, redoc tries to restore the code near its original location. Depending on the policies selected via
inline_missing arguments, the restored code may be wrapped in an HTML comment or not restored at all.
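The restoration step can be sketched in base R as follows. The placeholder syntax `[[id]]`, the function name `restore_code`, and the returned structure are all assumptions for illustration; redoc's actual placeholder form and its handling of missing code may differ.

```r
# Hypothetical sketch (not redoc's implementation): swap placeholders back
# for the stored code and report which ids no longer appear in the text.
restore_code <- function(md, codelist) {
  missing_ids <- character(0)
  for (id in names(codelist)) {
    placeholder <- sprintf("[[%s]]", id)       # assumed placeholder syntax
    m <- gregexpr(placeholder, md, fixed = TRUE)
    n_hits <- sum(m[[1]] > 0)
    if (n_hits == 0) {
      # The output was deleted or edited away in Word.
      missing_ids <- c(missing_ids, id)
    } else {
      # regmatches<- inserts the code literally, with no escape processing.
      regmatches(md, m) <- list(rep(codelist[[id]], n_hits))
    }
  }
  list(text = md, missing = missing_ids)
}

codelist <- list("redoc-inline-1" = "`r mean(x)`", "redoc-chunk-1" = "plot(x)")
res <- restore_code("The mean is [[redoc-inline-1]].", codelist)
res$missing
```

The ids returned in `missing` are the ones a policy for missing code would then act on, for example by re-inserting the code, wrapping it in an HTML comment, or dropping it.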
The simplest form of extension is defining additional parts of the document to be wrapped and stored in the *.codelist.yml file. These are defined as a list of functions in the wrappers argument of redoc(). Each function captures one type of code. By default these are R chunks and inline code, HTML comments, YAML blocks, some LaTeX, pandoc-style citations, and pandoc raw spans and blocks.
You can capture other types of code by adding additional functions, which are detailed in the ?wrappers documentation. If the code is simple enough to be captured with a regular expression, these functions can be generated with
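To make the idea of a regex-based wrapper concrete, here is a small function factory in base R. It is only a sketch of the general technique: `make_regex_wrapper`, its arguments, and its return value are hypothetical and do not reflect the interface documented in ?wrappers.

```r
# Hypothetical factory (not redoc's API): build a wrapper function from a
# regular expression and a label used to generate ids.
make_regex_wrapper <- function(pattern, label) {
  function(text) {
    m <- gregexpr(pattern, text, perl = TRUE)
    found <- regmatches(text, m)[[1]]
    ids <- sprintf("redoc-%s-%d", label, seq_along(found))  # assumed id scheme
    regmatches(text, m) <- list(
      sprintf('<span class="redoc" id="%s">%s</span>', ids, found)
    )
    list(text = text, codelist = setNames(as.list(found), ids))
  }
}

# Example: capture single-line HTML comments.
wrap_comments <- make_regex_wrapper("<!--.*?-->", "htmlcomment")
res <- wrap_comments("Text <!-- note --> more")
res$codelist
```

A factory keeps each wrapper down to a pattern and a label. Anything that needs real parsing, such as nested or multi-line constructs, would still call for a hand-written function.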
When building additional formats based on redoc(), it is important to use the base_format option of rmarkdown::output_format(). rmarkdown will then merge the post_processor functions of redoc() and your format so that redoc()'s runs after your custom post-processor.
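A derived format along these lines might look like the sketch below. The name `my_redoc` and the body of its post-processor are hypothetical. The sketch assumes that rmarkdown::output_format() accepts NULL for knitr and pandoc when a base_format is supplied, and that all arguments are simply passed through to redoc().

```r
# Hypothetical derived format; `my_redoc` is illustrative, not part of redoc.
my_redoc <- function(...) {
  rmarkdown::output_format(
    knitr = NULL,    # assumed: inherit knitr options from the base format
    pandoc = NULL,   # assumed: inherit pandoc options from the base format
    post_processor = function(metadata, input_file, output_file, clean, verbose) {
      # Custom work on the generated file would go here.
      output_file    # a post-processor returns the path of the output file
    },
    base_format = redoc::redoc(...)
  )
}

# Possible usage, assuming redoc and rmarkdown are installed:
# rmarkdown::render("report.Rmd", output_format = my_redoc())
```

Because the custom post-processor is supplied through output_format() rather than by overwriting the one on the redoc() object, both post-processors are kept and merged.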