
Reproducible reports with Markdown, knitr, BibTex, MathJax

Mikhail Dozmorov

Virginia Commonwealth University

09-06-2023

1 / 60

Literate programming

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.

  • Donald E. Knuth, Literate Programming, 1984
2 / 60

Writing reports

  • HTML - HyperText Markup Language, used to create web pages. Developed in 1993

  • LaTeX - a typesetting system for producing technical/scientific documents, PDF output. First released in 1984; the current LaTeX2e version dates from 1994

  • Sweave - a tool that allows embedding of the R code in LaTeX documents, PDF output. Developed in 2002

  • Markdown - a lightweight markup language with a plain-text formatting syntax. Easily converted to HTML, PDF, Word, and other formats

3 / 60

HTML example

  • HTML files have .htm or .html extensions
  • Pairs of tags define content/formatting
    • <h1> Header level 1 </h1>
    • <a href="http://www..."> Link </a>
    • <p> Paragraph </p>
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
<h1>Markdown example</h1>
<p>This is a simple example of a Markdown document.</p>
<p>You can emphasize text with <strong>bold</strong> or <em>italics</em>, or <code>monospace</code> font.</p>
</body>
</html>
4 / 60

LaTeX example

  • LaTeX files usually have a .tex extension
  • LaTeX commands define appearance of text, and other formatting structures
\documentclass{article}
\usepackage{graphicx}

\begin{document}

\title{Introduction to \LaTeX{}}

\author{Author's Name}

\maketitle

\begin{abstract}

This is abstract text: This simple document shows very basic features of
\LaTeX{}.
\end{abstract}

\section{Introduction}

\end{document}

http://www.electronics.oulu.fi/latex/examples/example_1/
5 / 60

Sweave example

  • Sweave files typically have an .Rnw extension
  • Text is written in LaTeX; each code block starts with <<chunk_name>>= and ends with @
\documentclass{article}
\usepackage{amsmath}
\DeclareMathOperator{\logit}{logit}
% \VignetteIndexEntry{Logit-Normal GLMM Examples}
\begin{document}
First we attach the data.
<<gapminder>>=
library(dslabs)
data(gapminder)
attach(gapminder)
@
\end{document}
6 / 60

Markdown

  • Lightweight markup language.
  • Uses plain text.
  • Simple, human-readable syntax.
  • Used for formatting documents, including slides.
  • Popular among programmers and writers.
  • Easily converted to presentation format.
  • Simplifies content creation and editing.
7 / 60

R Markdown

  • R Markdown is a Markdown markup language extended with the ability to add R code.
  • The goal is still to separate form and content, but also to prioritize human-readability, even at the cost of fancy features
  • You can learn Markdown in about 5 minutes. If you can write an email, you can write Markdown. See "Help -> Markdown Quick Reference" (also, "Cheatsheets")
  • Can be converted into various document types, including HTML, PDF, MS Word, Beamer, HTML5 slides, Tufte-style handouts, books, dashboards, shiny applications, scientific articles, websites, and more.
  • Fully supported in RStudio. Or, use a desktop Markdown editor like MarkdownPad (Windows) or MacDown (Mac).

http://bioconnector.github.io/markdown - Markdown Reference

http://markdownpad.com/ - a full-featured Markdown editor for Windows

http://macdown.uranusjr.com/ - the open source Markdown editor for macOS

8 / 60

Basic Markdown Syntax

Regardless of your chosen output format, some basic syntax will be useful:

  • Section headers

  • Text emphasis

  • Lists

  • R code

9 / 60

Section Headers

To set up different sized header text in your document, use # for Header 1, ## for Header 2, and ### for Header 3.

# Table of content
## Chapter 1
### Introduction

Renders as

Table of content

Chapter 1

Introduction

10 / 60

Text emphasis

  • Italicize text via *Italicize* or _Italicize_

  • Bold text via **Bold** or __Bold__

11 / 60

Unordered Lists

This code

- Item 1
- Item 2
    + Item 2a
    + Item 2b

Renders these bullets (sub-lists need 1 tab or 4 spaces!)

  • Item 1
  • Item 2
    • Item 2a
    • Item 2b
12 / 60

Ordered Lists

This code

1. Item 1
2. Item 2
    + Item 2a
    + Item 2b

Renders this list (be advised - the bullets may not look great in all templates)

  1. Item 1
  2. Item 2
    • Item 2a
    • Item 2b
13 / 60

Inline R Code

  • To use R within a line of text, wrap the expression in single backticks and start it with the letter r:
`r dim(mtx)`
  • This can be useful to refer to estimates, confidence intervals, p-values, etc. in the body of an article/homework without worrying about copy errors.
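A minimal sketch of how such inline values might be prepared, using only base R and the built-in mtcars data. The model and the object names (fit, slope, ci, p_val, sentence) are illustrative assumptions, not part of the original slides.

```r
# Fit a simple model on a built-in dataset
fit <- lm(mpg ~ wt, data = mtcars)

# Extract the numbers you would otherwise copy by hand
slope <- coef(fit)[["wt"]]
ci    <- confint(fit)["wt", ]
p_val <- summary(fit)$coefficients["wt", "Pr(>|t|)"]

# Build the text fragment; each piece could equally be its own inline expression
sentence <- sprintf(
  "slope = %.2f (95%% CI %.2f to %.2f), p = %.1e",
  slope, ci[1], ci[2], p_val
)
sentence
```

In the body of the report, `r sentence` (or `r round(slope, 2)`) then inserts the current value every time the document is rendered.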
14 / 60

Markdown syntax

superscript^2^
~~strikethrough~~
Links
http://example.com
[linked phrase](http://example.com)
Images
![](http://example.com/logo.png)
![optional caption text](figures/img.png)
15 / 60

Markdown syntax

Blockquotes
A friend once said:
> It's always better to give
> than to receive.
Horizontal Rule / Page Break
*****
------
Tables
First Header | Second Header
------------- | -------------
Content Cell | Content Cell
Content Cell | Content Cell

https://www.tablesgenerator.com/markdown_tables

16 / 60

Large code chunks

Marked with triple backticks

```{r optionalChunkName, echo=TRUE, results='hide'}
# R code here
```
  • Command+Option+I (Ctrl+Alt+I on Windows) inserts R code chunk
  • Or use the Insert button in the RStudio editor toolbar
17 / 60

Creating R markdown document

  • Regular text file with .Rmd extension

  • Create manually, or use RStudio

  • "File -> New File" menu provides options for creating various R documents

18 / 60

R Markdown formats from RStudio

  • Documents: HTML, PDF, Word, ODT, RTF

  • Presentations: ioslides, beamer, reveal.js, Slidy, Xaringan, PowerPoint

  • Journals: template packages for major journals/publishers

  • Web sites: bookdown, blogdown, pkgdown, flexdashboard, Shiny, GitHub document

More

https://rmarkdown.rstudio.com/formats.html

19 / 60

YAML header (think settings)

  • YAML - YAML Ain't Markup Language

  • YAML is a simple text-based format for specifying data, like JSON

---
title: "Untitled"
author: "Your Name"
date: "Current date"
output: html_document
---

output is the critical part: it defines the output format. Alternatives include pdf_document and word_document; other formats are available

20 / 60

YAML header for a PDF presentation

---
title: "Reproducible reports with Markdown, knitr"
author: "Mikhail Dozmorov"
date: "2023-09-06"
output:
  beamer_presentation:
    # colortheme: seahorse
    colortheme: dolphin
    fig_caption: no
    fig_height: 6
    fig_width: 7
    fonttheme: structurebold
    # theme: boxes
    theme: AnnArbor
---
21 / 60

YAML header for a Word document

---
bibliography: [3D_refs.bib,brain.bib]
csl: styles.ref/genomebiology.csl
output:
  word_document:
    reference_docx: styles.doc/NIH_grant_style.docx
  pdf_document: default
  html_document: default
---
22 / 60

Modifying the behavior of R code chunks

Chunk options, comma-separated

  • echo=FALSE - hides the code, but not the results/output. Default: TRUE

  • eval=FALSE - disables code execution. Default: TRUE

  • cache=TRUE - turns on caching for a computation-intensive chunk. Default: FALSE

  • fig.width=##, fig.height=## - customize the size of a figure generated by the code chunk

  • include=FALSE - the R code is still evaluated, but neither the code nor the results appear in the output document. Default: TRUE

23 / 60

Modifying the behavior of R code chunks

  • results - controls how the results/output are shown, e.g., results="hide"
    • markup (the default) - takes the result of the R evaluation and turns it into Markdown that is rendered as usual
    • hold - holds all the output pieces and pushes them to the end of the chunk. Useful if you're running commands that produce lots of little pieces of output in the same chunk
    • hide - hides the results
    • asis - writes the raw results from R directly into the document. Mostly useful for tables

The full list of options: http://yihui.name/knitr/options/
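The effect of these options can be checked without leaving the R console. The sketch below assumes the knitr package is installed; the chunk labels and the tiny document are made up for illustration. It builds a three-chunk document as a character vector and knits it to Markdown with knitr::knit(text = ...).

```r
fence <- strrep("`", 3)  # build chunk delimiters without typing literal backticks

rmd <- c(
  "Shown with code:",
  "",
  paste0(fence, "{r shown}"),
  "1 + 1",
  fence,
  "",
  "Result only:",
  "",
  paste0(fence, "{r hidden, echo=FALSE}"),
  "2 * 21",
  fence,
  "",
  "Evaluated but silent:",
  "",
  paste0(fence, "{r silent, include=FALSE}"),
  "x <- 10",
  fence,
  "",
  "x is `r x`."
)

# knit() returns the resulting Markdown as one character string
md <- knitr::knit(text = rmd, quiet = TRUE)
cat(md)
```

The first chunk appears with its code and result, the second with its result only, and the third leaves no trace except that x is available to the inline expression.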

24 / 60

Global chunk options

  • Some options are better set globally, instead of typing them for each chunk
knitr::opts_chunk$set(fig.width=12, fig.height=8, fig.path='img/', cache.path='cache/', cache=FALSE, echo=FALSE, warning=FALSE, message=FALSE)
  • warning=FALSE and message=FALSE suppress any R warnings or messages from being included in the final document

  • fig.path='img/' - the figure files get placed in the img subdirectory. (Default: not saved at all)
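Global options are ordinary values that can be read back, which helps when a document does not behave as expected. A small sketch, assuming knitr is installed; the variable names old and now are illustrative.

```r
# Remember the current values so they can be put back afterwards
old <- knitr::opts_chunk$get(c("echo", "warning", "message"))

# Set the options globally, as a setup chunk would
knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
now <- knitr::opts_chunk$get("echo")
now

# Put the previous values back (only needed when experimenting interactively)
knitr::opts_chunk$set(old)
```

In a report, the opts_chunk$set() call usually lives in the first chunk, often labelled setup and marked include=FALSE.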

25 / 60

Caching

  • The cache option defaults to FALSE. That is, every time you render the Rmd, all the R code is run again from scratch

  • If you use cache=TRUE, for this chunk, knitr will save the results of the evaluation into a directory that you specify, e.g., cache.path='cache/'. When you re-render the document, knitr will first check if there are previously cached results under the cache directory before really evaluating the chunk

    • if cached results exist and this code chunk has not changed since the last run (verified with an MD5 digest), the cached results will be (lazy-) loaded; otherwise a new cache will be built
    • if a cached chunk depends on other chunks (see the dependson option) and any one of these chunks has changed, this chunk must be forcibly updated (old cache will be purged)
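The idea of digest-based invalidation can be sketched in a few lines of base R. This is a toy illustration, not knitr's implementation: the helpers code_md5() and run_cached() are hypothetical, and knitr's real cache also accounts for chunk options and dependencies.

```r
# Toy cache keyed by the MD5 digest of a chunk's source code
cache <- new.env()

code_md5 <- function(code) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(code, tmp)
  unname(tools::md5sum(tmp))  # md5sum() works on files, hence the temp file
}

run_cached <- function(label, code) {
  key <- code_md5(code)
  hit <- cache[[label]]
  if (!is.null(hit) && identical(hit$key, key)) {
    # Code unchanged since the last run: reuse the stored value
    return(list(value = hit$value, from_cache = TRUE))
  }
  # New or edited code: evaluate and overwrite the old entry
  value <- eval(parse(text = code), envir = new.env())
  cache[[label]] <- list(key = key, value = value)
  list(value = value, from_cache = FALSE)
}

first  <- run_cached("slow_sum", "sum(1:10)")
second <- run_cached("slow_sum", "sum(1:10)")   # unchanged code: reused
third  <- run_cached("slow_sum", "sum(1:100)")  # edited code: recomputed
```

Only the second call is served from the cache; editing the code changes the digest and forces a fresh evaluation.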

Documentation for caching

26 / 60

An example of R Markdown document

27 / 60

KnitR

  • KnitR - an R package for elegant, flexible, and fast dynamic report generation from R Markdown. PDF, HTML, DOCX output. Developed in 2012 by Yihui Xie
install.packages('knitr', dependencies = TRUE)

To render a PDF from R Markdown, you need to have a version of TeX installed on your computer. Like R, TeX is open source software. RStudio recommends the following installations by system:

  • For Macs: MacTeX
  • For PCs: MiKTeX

Links for installing both can be found at http://www.latex-project.org/ftp.html

https://github.com/yihui/knitr, http://yihui.name/knitr/

28 / 60

Displaying data as tables

  • KnitR has built-in function to display a table
data(mtcars)
knitr::kable(head(mtcars))
  • pander package allows more customization
pander::pander(head(mtcars))
29 / 60

Displaying data as tables

  • xtable package has even more options
xtable::xtable(head(mtcars))
  • DT package, an R interface to the DataTables library
DT::datatable(mtcars)
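To see what kable() hands over to Markdown, the table can be printed as plain text. A minimal sketch, assuming knitr is installed; the column selection and the digits value are illustrative choices.

```r
# Ask kable() for a pipe table, the same syntax used for hand-written Markdown tables
tbl <- knitr::kable(
  head(mtcars[, c("mpg", "cyl", "wt")], 3),
  format = "pipe",
  digits = 1
)

# The result is a character vector, one element per table line
cat(tbl, sep = "\n")
```

Inside an R Markdown chunk the format is normally picked automatically, so knitr::kable(head(mtcars)) is enough.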
30 / 60

Including figures

  • Plots may be generated by R code and displayed in the output document
  • Existing image files like *.jpg, *.png, may be inserted like:
![](http://example.com/logo.png)
![optional caption text](figures/img.png)
  • Alternatively, use knitr capabilities:
```{r, out.width = '300px', echo=FALSE}
knitr::include_graphics('img/bandThree2.png')
```
  • For PDF output, use LaTeX syntax:
\begin{center}
\includegraphics[height=170px]{img/bioinfo3.png}
\end{center}
31 / 60

Customizing Figures

The fig.cap option allows you to specify the caption for the figure generated by a given chunk:

```{r caption, fig.cap="I am the caption"}
plot(pressure)
```

The fig.height and fig.width options let you specify the dimensions of your plots:

```{r dimensions, fig.height = 4, fig.width = 8}
plot(pressure)
```
32 / 60

Creating the final report

  • Markdown documents (*.md or *.Rmd) can be converted to HTML using markdown::markdownToHTML('markdown_example.md', 'markdown_example.html')

  • Another option is to use rmarkdown::render('markdown_example.md'). Under the hood it uses the pandoc command-line tool, which is bundled with RStudio

  • In RStudio, one click on the Knit button does the same. knitr also provides the knit2html() and knit2pdf() functions

Note: KnitR compiles the document in an R environment separate from yours (think Makefile). Do not use a .Rprofile file - it loads into your environment only.

http://pandoc.org/

33 / 60

Things to include in your final report

set.seed(12345) - initialize random number generator

Include session_info() at the end - outputs all packages/versions used

```{r sessionInfo}
library(magrittr)  # provides the %>% pipe
library(pander)    # provides pander()
diagnostics <- devtools::session_info()
platform <- data.frame(diagnostics$platform %>% unlist, stringsAsFactors = FALSE)
colnames(platform) <- c('description')
pander(platform)
packages <- as.data.frame(diagnostics$packages)
pander(packages[ packages$`*` == '*', ])
```

Alternatively, call xfun::session_info() in the sessionInfo chunk
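Why the seed matters can be shown in a few lines of base R. This is a small sketch; the variable names are illustrative.

```r
set.seed(12345)
first_run <- rnorm(3)

set.seed(12345)
second_run <- rnorm(3)

# Same seed, same draws
identical(first_run, second_run)

# Without reseeding, the generator simply continues with new values
third_run <- rnorm(3)
identical(second_run, third_run)
```

Setting the seed once, near the top of the report, makes every rendering produce the same random numbers.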

34 / 60

Making your own default R Markdown document

Altering the default Rmarkdown file each time you write a homework, report, or article would be a pain.

  • Fortunately, you don't have to!
35 / 60

Templates

You can create your own templates which set up packages, fonts, default chunk options, etc.

  • https://bookdown.org/yihui/rmarkdown/document-templates.html

  • Some packages (e.g., rticles) provide templates that meet journal requirements, for example:

    • Journal of Statistical Software
    • The R Journal
    • Association for Computing Machinery
    • ACS publications (Journal of the American Chemical Society, Environmental Science & Technology)
    • Elsevier publications

https://github.com/mdozmorov/MDtemplate

36 / 60

Parameters

You may also set parameters in your document's YAML header

---
output: html_document
params:
  date: "2017-11-02"
---

or pass new values with the render function.

  • This creates a read-only list params containing the values declared

  • e.g. params$date returns 2017-11-02
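A sketch of both routes, assuming the rmarkdown package is installed. The temporary file, the md_document output format, and the replacement date are illustrative assumptions. Rendering is attempted only when pandoc is available.

```r
# Write a tiny parameterized report to a temporary file
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "output: md_document",
  "params:",
  "  date: \"2017-11-02\"",
  "---",
  "",
  "Report date: `r params$date`"
), rmd)

# Default value declared in the YAML header
declared <- rmarkdown::yaml_front_matter(rmd)$params
declared$date

# Override the default at render time
if (rmarkdown::pandoc_available()) {
  out <- rmarkdown::render(rmd, params = list(date = "2023-09-06"), quiet = TRUE)
  rendered <- readLines(out)
}
```

Inside the document, params$date then holds the overriding value instead of the default from the header.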

37 / 60

Bibliography

38 / 60

BibTex

@article{Berkum:2010aa,
Abstract = {The three-dimensional folding of chromosomes ...},
Author = {van Berkum, Nynke L and Lieberman-Aiden, Erez and Williams, Louise and Imakaev, Maxim and Gnirke, Andreas and Mirny, Leonid A and Dekker, Job and Lander, Eric S},
Date-Added = {2016-10-08 14:26:23 +0000},
Date-Modified = {2016-10-08 14:26:23 +0000},
Doi = {10.3791/1869},
Journal = {J Vis Exp},
Journal-Full = {Journal of visualized experiments : JoVE},
Mesh = {Chromosome Positioning; Chromosomes; DNA; Genomics; Nucleic Acid Conformation},
Number = {39},
Pmc = {PMC3149993},
Pmid = {20461051},
Pst = {epublish},
Title = {Hi-C: a method to study the three-dimensional architecture of genomes},
Year = {2010},
Bdsk-Url-1 = {http://dx.doi.org/10.3791/1869}}
39 / 60

BibTex managers

Save references in .bib text file

40 / 60

Convert anything to BibTex

41 / 60

BibTex and RMarkdown

Add to YAML header

bibliography: 3D_refs.bib

Insert into RMarkdown as

The 3D structure of the human genome has proven to be highly organized
[@Dixon:2012aa; @Rao:2014aa]. This organization starts from distinct
chromosome territories [@Cremer:2010aa], followed by topologically associated
domains (TADs) [@Dixon:2012aa; @Jackson:1998aa; @Ma:1998aa; @Nora:2012aa; @Sexton:2012aa],
smaller "sub-TADs" [@Phillips-Cremins:2013aa; @Rao:2014aa] and, on the
most local level, individual regions of interacting chromatin [@Rao:2014aa; @Dowen:2014aa; @Ji:2016aa].
42 / 60

Format your BibTex references

Add to YAML header

csl: genomebiology.csl

Get more styles at https://www.zotero.org/styles

43 / 60

Format your Word output

  • If knitting into Word output, you may want to have fonts, headers, margins other than default.

  • Create a Word document with the desired formatting. Change font styles by right-clicking on a style (e.g., "Normal") and selecting "Modify"

  • Include it into YAML header

output:
  word_document:
    reference_docx: styles.doc/NIH_grant_style.docx
44 / 60

Math formulas

45 / 60

Markdown Code: MathJax

  • Markdown supports MathJax JavaScript engine to render mathematical equations and formulas
  • Inline equations - use single "dollar sign" $ to specify MathJax coding
$s^{2} = \frac{\sum(x-\bar{x})^2}{n-1}$

Renders as: $s^{2} = \frac{\sum(x-\bar{x})^2}{n-1}$

46 / 60

Centering your equations

Wrapping an equation in two dollar signs $$ sets it off on its own line and centers it. Compare the inline form with the centered form (notice the double dollar signs):

$ \sum_{i=0}^n i^2 = \frac{(n^2+n)(2n+1)}{6} $
$$ \sum_{i=0}^n i^2 = \frac{(n^2+n)(2n+1)}{6} $$

Inline equation $\sum_{i=0}^n i^2 = \frac{(n^2+n)(2n+1)}{6}$ on the same line. Or, a self-standing equation on a separate line:

$$\sum_{i=0}^n i^2 = \frac{(n^2+n)(2n+1)}{6}$$

47 / 60

More Interesting Codes:

Greek Letters

$\alpha$ $\beta$ $\gamma$ $\chi$
$\Delta$ $\Sigma$ $\Omega$

Renders as (not all capital Greek letters have their own command):

α β γ χ

Δ Σ Ω

superscripts (^) and subscripts (_)

$x_i^2$ $\log_2 x$

48 / 60

Grouping with Brackets

Use brackets {...} to delimit a formula containing a superscript or subscript. Notice the difference the grouping makes:

${x^y}^z$
$x^{y^z}$
$x_i^2$
$x_{i^2}$

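In ${x^y}^z$ the exponent z is attached to the whole group x^y, while in $x^{y^z}$ it is attached to y. Likewise, $x_i^2$ squares x_i, whereas $x_{i^2}$ uses i^2 as the subscript.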

49 / 60

Scaling:

Add the scaling code \left(...\right) to make automatic size adjustments

$(\frac{\sqrt x}{y^3})$
$\left(\frac{\sqrt x}{y^3}\right)$

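In the first form the parentheses keep their normal height; with \left( ... \right) they stretch to enclose the whole fraction.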

50 / 60

Sums and Integrals

Subscript (_) designates the lower limit; superscript (^) designates upper limit:

$\sum_1^n$ $\sum_{i=0}^\infty i^2$

Renders as: $\sum_1^n$ and $\sum_{i=0}^\infty i^2$

Other notable symbols:

- $\prod$ $\infty$
- $\bigcup$ $\bigcap$
- $\int$ $\iint$

51 / 60

Radical Signs

Use the \sqrt command for radicals. The radical sign scales with the size of its argument:

1. $\sqrt{x^3}$
2. $\sqrt[3]{\frac xy}$

For complicated expressions, use braces with a fractional exponent instead:

3. ${...}^{1/2}$
52 / 60

You can also change fonts!

\mathbb or \Bbb for "blackboard bold"
\mathbf for boldface
\mathtt for "typewriter" font
\mathrm for roman font
\mathsf for sans-serif
\mathcal for "calligraphic" letters
\mathscr for script letters
\mathfrak for "Fraktur" (old German style)

Each command takes its argument in braces, e.g. $\mathbf{ABCDEFG}$.

53 / 60

You can also change fonts!

Some special functions such as "lim" "sin" "max" and "ln" are normally set in roman font instead of italic. Use \lim, \sin to make these (roman):

$\sin x$ (roman) vs $sin x$ (italics)

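With \sin the function name is set upright; without the backslash, the letters s, i, and n are set in italics as if they were variables.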

54 / 60

And, add curly brackets

$$\begin{cases}
\widehat{IF_{1D}} = IF_{1D} - f(D)/2 \\
\widehat{IF_{2D}} = IF_{2D} + f(D)/2
\end{cases} \ (1)$$

Renders as two stacked equations grouped by a single left curly brace, followed by the equation number (1).

55 / 60

RStudio bonus

Inline preview of formulas and images in an RMarkdown document

Preview online https://www.codecogs.com/latex/eqneditor.php

56 / 60

LaTeX and Markdown

  • Rendering Markdown as a pdf requires a LaTeX installation

  • You will additionally need to install Pandoc from http://pandoc.org/

  • With LaTeX, many customizations are possible

57 / 60

LaTeX Customization, 1

  • You can include additional LaTeX commands and content

  • Use the includes option as follows to add your favorite style files for the preamble, title/abstract, bibliography, etc...

---
title: "A More Organized Person's Document"
output:
  beamer_presentation:
    includes:
      in_header: header.tex
      before_body: doc_prefix.tex
      after_body: doc_suffix.tex
---
58 / 60

LaTeX Customization, 2

  • If you prefer a self-contained document, you may opt for the header-includes option over the modular approach:
---
title: 'BIOS 691: Reproducible Research Tools'
author: "Author Name"
date: "November 2, 2017"
header-includes:
  - \usepackage{graphicx}
output:
  beamer_presentation:
    theme: "Frankfurt"
---
59 / 60

Quarto

Quarto is an open-source scientific and technical publishing system built on Pandoc.

  • Python, R, Julia, and Observable support.
  • Author documents as Markdown or Jupyter notebooks.
  • Articles, reports, presentations, websites, blogs, and books in HTML, PDF, MS Word, ePub, and more.
  • Supports scientific markdown, including equations, citations, crossrefs, figure panels, callouts, advanced layout, and more.
  • Improved figure/table cross-referencing, labeling.

https://quarto.org/docs/get-started/

Quarto - J.J. Allaire 1h video

60 / 60
