vignettes/redoc-package-design.Rmd
redoc-package-design.Rmd
This document describes the general approach and design of redoc for developers interested in contributing.
Two-way R Markdown workflows are challenging because R Markdown and knitr workflows are lossy - the compiled document does not contain all of the information in the source. Also, we are limited by information that can be passed via pandoc
from markdown to final formats and in reverse.
To produced a Reversible Reproducible Document in Word (a “redoc”), the redoc()
format first pre-parses the source .Rmd
file. knitr doesn’t expose its parser to developers, so I’ve lifted most of the code for this parser from knitr and rmarkdown. The parser captures YAML headers, code chunks, and inline code, giving names to unnamed chunks and inline code sections and wrapping them in named <div>
and <span>
tags with unique id
values and the class "redoc"
. The contents of those sections are stored in a file called filename.codelist.yml
.
redoc()
then knits the .Rmd
file. Code output is wrapped within the same <span>
and <div>
tags as the original chunks.
When the knitted document is converted to a .docx
by pandoc, redoc
passes it through a series of pandoc lua filters (found in inst/lua-filters
). These do three things:
<span>
and <div>
tags of class redoc
to hidden custom styles with names corresponding to their unique IDs so that they are retained in the Word document.Then using rmarkdown::output_format()
’s post_processor
argument and functions from officer, the original .Rmd
and the codelist.yml
file are stored in the Word document. As .docx
files are just ZIP archives, this is straightforward, except that some metadata must be added to ensure Word preserves these files when editing.
If the option diagnostics=TRUE
is set, information about the R session and current software versions is also stored in the Word document for later debugging.
If highlight_output=TRUE
is set, the post-processor also modifies all Word document styles to color the redoc
-class sections.
When the dedoc()
function is run, it extracts the *.codelist.yml
file from the .docx
file.
Then pandoc is used to convert the docx
back to markdown. A custom lua filter converts any track-changes text to Critic Markup, and another lua filter replaces any elements with the custom redoc
styles with placeholders of the form [[chunk-id]]
. dedoc()
then uses the data in the *.chunks.yml
file to replace these placeholders with original chunk (or inline code). In the event that chunk output has been deleted or modified beyond recognition, redoc tries to be smart about its placement, placing it near its original location. Depending on the policies selected via dedoc()
’s block_missing
or inline_missing
arguments, the restored code may be wrapped in an HTML comment or not restored at all.
redoc()
is based on rmarkdown::word_document()
, and can similarly be extended.
The simplest form of extension is defining additional parts of the document to be wrapped and stored in the *.codelist.yml
file. These are defined in as a list of functions in the wrappers
argument of redoc()
. Each function captures a type of code, and by default these are R chunks and inline code, HTML comments, YAML blocks, some LaTeX, pandoc-style citations, and pandoc raw spans and blocks.
You can capture other types of code by adding additional functions, which are detailed in the ?wrappers
documentation. If the code is simple enough to be captured with a regular expression, these functions can be generated with with make_wrapper()
.
When building additional formats based on redoc()
, it is important to use the base_format
option of rmarkdown::output_format()
. rmarkdown will then merge the post_processor
functions of redoc()
and your format so that redoc()'s
runs after your custom post-processor.
Future versions of redoc()
will include a reversible version of officedown::rdocx_document()
.