Introduction to functions and R packages

Mikhail Dozmorov

Virginia Commonwealth University

09-18-2023


Package priorities

Question: What is more important?

  • Usability, solves real problem
  • Statistical (methodological) superiority
  • Documentation
  • Speed

Documenting functions: the old way

  • Originally, documentation was written in a LaTeX-like format and stored in man/*.Rd files
\name{cat_function}
\alias{cat_function}
\title{A Cat Function}
\usage{
cat_function(love.cats = TRUE)
}
\arguments{
\item{love.cats}{Do you love cats? Defaults to TRUE.}
}
\description{
This function allows you to express your love of cats.
}
\examples{
cat_function()
}
\keyword{cats}

Documenting functions: the simple way

  • The package roxygen2 greatly simplifies documentation
  • Roxygen2 docstrings start with #'
  • Keywords defining pieces of documentation start with @

    • @param - parameter description
    • @return - what the function returns
    • @export - required to make the function available to package users
    • @examples - how to use the function
  • LaTeX-like Rd markup can (and sometimes must) be used in special cases

    • \code{ <R code here> } - code highlight
    • \url{ http:// ... } - URL
    • \email{name@...} - e-mail

https://CRAN.R-project.org/package=roxygen2


Documenting functions: the simple way

  • The package roxygen2 greatly simplifies documentation
#' A Cat Function
#'
#' This function allows you to express your love of cats.
#' @param love.cats Do you love cats? Defaults to TRUE.
#' @keywords cats
#' @export
#' @examples
#' cat_function()
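
The roxygen2 block sits directly above the function it documents. A minimal sketch of the complete definition is shown here; the function body and the @return line are hypothetical illustrations, since the slide shows only the comment block.

```r
#' A Cat Function
#'
#' This function allows you to express your love of cats.
#' @param love.cats Do you love cats? Defaults to TRUE.
#' @return A character string with the message (hypothetical).
#' @keywords cats
#' @export
#' @examples
#' cat_function()
cat_function <- function(love.cats = TRUE) {
  # Hypothetical body: the slide does not show the implementation
  if (isTRUE(love.cats)) {
    "I love cats!"
  } else {
    "I am not a cat person."
  }
}

cat_function()
```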

Generating documentation

  • Run roxygen2::roxygenise() or devtools::document() to convert roxygen-formatted help to .Rd files understood by R

  • In RStudio, check "Generate documentation with Roxygen" under "Tools -> Project Options -> Build Tools" to auto-generate the .Rd files and the NAMESPACE file

https://usethis.r-lib.org/


Making your functions available

  • All packages have a NAMESPACE file: a collection of objects to be exported and imported
    • To avoid overwriting users' variables
    • To avoid ambiguity in function calls
    • To ensure the package has everything it needs to run
    • To encourage modular code
# Generated by roxygen2: do not edit by hand
S3method(t,test2)
export(TCGA_corr)
export(Venn2)
export(Venn3)
export(Venn4)
export(Venn5)
export(gene_enrichment)

Making your functions available

  • A NAMESPACE file specifies which functions are available to the user and which stay hidden (e.g., helper functions), which minimizes naming conflicts
export(function_name)
  • A minimal NAMESPACE file
# Export all names
exportPattern(".")
  • Your NAMESPACE is auto-generated from the @export, @import, and @importFrom roxygen2 tags; never edit the NAMESPACE file by hand
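
As a rough sketch of how exported and hidden functions differ, the two functions below could live in the same R/ file. The names standardize and center_values are hypothetical. Only the function tagged with @export would get an export() entry when the documentation is regenerated.

```r
#' Standardize a numeric vector
#'
#' @param x A numeric vector.
#' @return The vector centered at zero and scaled to unit standard deviation.
#' @export
standardize <- function(x) {
  center_values(x) / stats::sd(x)
}

# Internal helper: it has no @export tag, so roxygen2 writes no export()
# entry for it and it stays hidden from users. @noRd also skips the .Rd file.
#' @noRd
center_values <- function(x) {
  x - mean(x)
}

standardize(c(1, 2, 3))
```

Inside the package, standardize() can still call center_values() because both live in the package namespace.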

Making objects from other packages available

  • All or a subset of objects from another package can be imported and then called directly; without an import, they can still be used as package::object
import(randomForest)
importFrom(ModelMetrics,mcc)
importFrom(PRROC,pr.curve)
  • Your NAMESPACE is auto-generated from the @export, @import, and @importFrom roxygen2 tags; never edit the NAMESPACE file by hand
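
A minimal sketch of the two ways to use objects from another package, assuming the stats package (which ships with R) as the dependency. The function names coef_var and mad_raw are hypothetical.

```r
#' Coefficient of variation
#'
#' @param x A numeric vector.
#' @return The standard deviation divided by the mean.
#' @importFrom stats sd
#' @export
coef_var <- function(x) {
  # sd() is called without a prefix; the @importFrom tag above is what
  # produces the importFrom(stats,sd) entry in NAMESPACE
  sd(x) / mean(x)
}

#' Median absolute deviation from the median
#'
#' @param x A numeric vector.
#' @return The unscaled median absolute deviation.
#' @export
mad_raw <- function(x) {
  # Fully qualified package::object calls need no import tag
  stats::median(abs(x - stats::median(x)))
}

coef_var(c(2, 4, 6))
mad_raw(c(1, 2, 3, 4, 100))
```

Either way, a non-base dependency would also need to be listed under Imports in the DESCRIPTION file.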

Making everything available with Roxygen2

Roxygen tags in a function's documentation block are converted to NAMESPACE entries

In the preciseTAD.R file:

#' @export
#'
#' @import randomForest e1071
preciseTAD <- function(...)

In NAMESPACE, after running roxygen2::roxygenise() or devtools::document()

export(preciseTAD)
import(randomForest)
import(e1071)

Writing detailed documentation

  • Vignette – an instructive tutorial demonstrating practical uses of the software with discussion of the interpretation of the results (vignette = tutorial). Critical to get a user started with your package

  • A short introduction that explains

    • The type of data the package can be used on
    • The general purpose of the functions in the package
  • One or more example analyses with

    • A small, real data set
    • An explanation of the key functions
    • An application of these functions to the data
    • A description of the output and how it can be used

https://github.com/hadley/dplyr/tree/master/vignettes


Writing vignettes

  • Written using Markdown syntax
  • Saved in vignettes/*.Rmd files
  • Add YAML header to each vignette file
---
title: "Vignette title"
date: "2023-09-17"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Vignette title}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
  • Build your vignettes with the devtools::build_vignettes() command
  • The resulting *.html files will be in the doc folder (older devtools versions used inst/doc)

Package building pipeline using devtools

library(devtools)
create("cats") # Create package skeleton
document("cats") # Generate help (.Rd) files and NAMESPACE
build_vignettes("cats") # Build vignettes
build("cats") # Build package
install("cats") # Install package
check("cats") # Build and check a source package, using all known best practices

README.md for the package

  • Create README.Rmd with the standard RMarkdown text and code, as you would do for the vignette
  • Use the devtools::build_readme() function to render README.Rmd into README.md
    • Remember, README.md is plain Markdown and cannot run R code.
    • The devtools::build_readme() function will execute code you put in the README.Rmd file and format the code and its output properly

Package building pipeline using command line

  • R CMD build cats – will create a tarball of the package, with its version number encoded in the file name

  • R CMD INSTALL cats_0.0.0.9000.tar.gz

  • R CMD check --as-cran cats_0.0.0.9000.tar.gz


Including datasets

  • Create data folder

  • Save your data in R binary format, using save(mydata, file = "data/mydata.rda") (or use the .RData extension); name the object after the file so that data(mydata) loads an object called mydata

  • Can include .txt or .csv files

  • Add LazyData: true to the DESCRIPTION file – the data is then available as soon as the package is attached, without calling data("mydata"), and is loaded into memory on first use

  • If the data is large, also add LazyDataCompression: xz
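
The saving step can be sketched in base R as follows. This is an illustration under assumptions: a temporary directory stands in for the package root, and the mydata data frame and its contents are hypothetical.

```r
# A temporary directory stands in for the package root (assumption)
pkg_root <- file.path(tempdir(), "cats")
dir.create(file.path(pkg_root, "data"), recursive = TRUE, showWarnings = FALSE)

# Hypothetical example data set
mydata <- data.frame(
  name = c("Tom", "Felix", "Garfield"),
  weight_kg = c(4.2, 3.8, 6.5)
)

# Keep the object name and the file name the same so that data(mydata)
# and the dataset documentation refer to the same object
rda_file <- file.path(pkg_root, "data", "mydata.rda")
save(mydata, file = rda_file, compress = "xz")

# Check the round trip by loading the file into a fresh environment;
# load() returns the names of the objects it restored
env <- new.env()
loaded <- load(rda_file, envir = env)
loaded
```

Inside a real package project, usethis::use_data(mydata) performs a comparable step and writes to data/mydata.rda.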


Documenting datasets

  • Add R/mydata-data.R file
  • Document with roxygen2 syntax
#' My data brief info
#'
#' Longer description of my data
#'
#' @docType data
#' @usage data(mydata)
#' @format An object of class \code{"data.frame"}
#' @keywords datasets
#' @references Put reference here
#' @source \href{http://....org}{Link}
#' @examples
#' data(mydata)
"mydata" # No extension

Example of a dataset package

  • USDA Nutrients - an R package containing all data from the USDA National Nutrient Database, "Composition of Foods Raw, Processed, Prepared"

  • Use the devtools::install_github("hadley/usdanutrients") function to install the package from GitHub

https://github.com/hadley/usdanutrients


Updating R and packages

  • installr::updateR() - update R and the corresponding packages on Windows

  • updateR - update R on Mac


Other useful tips and tricks

  • testthat is a package by Hadley Wickham for writing unit tests

  • rm(list=ls(all=TRUE)) removes everything in the global environment

    • But does not unload packages! Use, e.g., detach("package:vegan", unload=TRUE)
    • Use "Session -> Restart R" to completely refresh your environment
  • pkgdown is a package by Hadley Wickham that auto-generates a website for your package with pkgdown::build_site()

  • blogdown - Creating Websites with R Markdown, by Yihui Xie et al.

  • bookdown - Write HTML, PDF, ePub, and Kindle books with R Markdown, by Yihui Xie et al.
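
To illustrate the testthat tip above, a unit test might look like the sketch below. It assumes testthat is installed; the function under test and its messages are hypothetical.

```r
library(testthat)

# Hypothetical function under test
cat_function <- function(love.cats = TRUE) {
  if (isTRUE(love.cats)) "I love cats!" else "I am not a cat person."
}

test_that("cat_function returns the expected message", {
  expect_equal(cat_function(), "I love cats!")
  expect_equal(cat_function(love.cats = FALSE), "I am not a cat person.")
  expect_type(cat_function(), "character")
})
```

In a package, such tests usually live in tests/testthat/test-*.R files and can be run with devtools::test().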


More references
