Managing secrets

Introduction

This document gives you the basics on securely managing secrets. Most of this document is not directly related to httr, but it’s common to have some secrets to manage whenever you are using an API.

What is a secret? Some secrets are short alphanumeric sequences:

Passwords are clearly secrets, e.g. the second argument to authenticate(). Passwords are particularly important because people (ill-advisedly) often use the same password in multiple places.
Personal access tokens (e.g. GitHub) should be kept secret: they are basically equivalent to a username and password combination, but are slightly safer because you can have multiple tokens for different purposes and it’s easy to invalidate one token without affecting the others.

Surprisingly, the “client secret” in an oauth_app() is not a secret. It’s not equivalent to a password, and if you are writing an API wrapper package, it should be included in the package. (If you don’t believe me, here are Google’s comments on the topic.)

Other secrets are files:

The JSON web token (JWT) used for server-to-server OAuth (e.g. Google) is a secret because it’s equivalent to a personal access token.
The .httr-oauth file is a secret because it stores OAuth access tokens.

The goal of this vignette is to give you the tools to manage these secrets in a secure way. We’ll start with best practices for managing secrets locally, then talk about sharing secrets with selected others (including Travis), and finish with the challenges that CRAN presents.

Here, I assume that the main threat is accidentally sharing your secrets when you don’t want to. Protecting against a committed attacker is much harder. And if someone has already hacked your computer to the point where they can run code, there’s almost nothing you can do. If you’re concerned about those scenarios, you’ll need to take a more comprehensive approach that’s outside the scope of this document.

Locally

Working with secret files locally is straightforward because it’s ok to store them in your project directory as long as you take three precautions:

Ensure the file is only readable by you, not by any other user on the system. You can use the R function Sys.chmod() to do so:
```
Sys.chmod("secret.file", mode = "0400")
```
It’s good practice to verify this setting by examining the file metadata with your local filesystem GUI tools or commands.
If you use git: make sure the files are listed in .gitignore so they don’t accidentally get included in a public repository.
If you’re making a package: make sure they are listed in .Rbuildignore so they don’t accidentally get included in a public R package.

httr proactively takes all of these steps for you whenever it creates a .httr-oauth file.
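As a rough sketch of the first precaution, the permission change can also be checked from R itself. The file below is a throwaway temporary file standing in for a real secret, and the check assumes a Unix-like system; Windows does not honour these permission bits in the same way.

```r
# Create a stand-in secret file (hypothetical; a real one would already exist)
path <- tempfile(fileext = ".json")
writeLines('{"token": "not-a-real-token"}', path)

# Restrict the file to owner read-only
Sys.chmod(path, mode = "0400")

# file.mode() returns the permission bits as an octal mode
mode <- format(file.mode(path))
if (.Platform$OS.type == "unix" && !identical(mode, "400")) {
  warning("Unexpected permissions on ", path, ": ", mode)
}
```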

The main remaining risk is that you might zip up the entire directory and share it. If you’re worried about this scenario, store your secret files outside of the project directory. If you do this, make sure to provide a helper function to locate the file and provide an informative message if it’s missing.

```
my_secrets <- function() {
  path <- "~/secrets/secret.json"
  if (!file.exists(path)) {
    stop("Can't find secret file: '", path, "'")
  }

  jsonlite::read_json(path)
}
```

Storing short secrets is harder because it’s tempting to record them as a variable in your R script. This is a bad idea, because you end up with a file that contains a mix of secret and public code. Instead, you have three options:

Ask for the secret each time.
Store in an environment variable.
Use the keyring package.

Regardless of how you store them, to use your secrets you will still need to read them into R variables. Be careful not to expose them by printing them or saving them to a file.
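One way to reduce the risk of accidental printing is to wrap the value in a small S3 class whose print method hides it. This is only a sketch: as_secret() and the "secret" class are hypothetical names, not part of httr, and the wrapper guards against auto-printing only (functions such as str() would still reveal the value).

```r
# Wrap a value so that auto-printing does not reveal it
as_secret <- function(x) {
  structure(list(value = x), class = "secret")
}

# Print a placeholder instead of the underlying value
print.secret <- function(x, ...) {
  cat("<secret: hidden>\n")
  invisible(x)
}

token <- as_secret("not-a-real-token")
print(token)

# Unwrap explicitly only at the point of use
nchar(token$value)
```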

Ask each time

For scripts that you only use every now and then, a simple solution is to ask for the password each time the script is run. If you use RStudio, an easy and secure way to request a password is with the rstudioapi package:

```
password <- rstudioapi::askForPassword()
```

If you don’t use RStudio, use a more general solution like the getPass package.

You should never type your password into the R console: this will typically be stored in the .Rhistory file, and it’s easy to accidentally share without realising it.
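The two prompting options can be combined into one helper that picks whichever is available. This is a minimal sketch: ask_secret() is a hypothetical name, and it assumes that at least one of rstudioapi or getPass is installed.

```r
# Prompt for a secret without echoing it to the console or .Rhistory
ask_secret <- function(prompt = "Password: ") {
  if (requireNamespace("rstudioapi", quietly = TRUE) &&
      rstudioapi::isAvailable()) {
    # Running inside RStudio: use its masked dialog
    rstudioapi::askForPassword(prompt)
  } else if (requireNamespace("getPass", quietly = TRUE)) {
    # Outside RStudio: fall back to getPass
    getPass::getPass(msg = prompt)
  } else {
    stop("Install 'rstudioapi' or 'getPass' to enter secrets safely")
  }
}
```

Calling `password <- ask_secret()` in an interactive session then behaves the same way in either environment.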

Environment variables

Asking each time is a hassle, so you might want to store the secret across sessions. One easy way to do that is with environment variables. Environment variables, or envvars for short, are a cross-platform way of passing information to processes.

For passing envvars to R, you can list name-value pairs in a file called .Renviron in your home directory. The easiest way to edit it is to run:

```
file.edit("~/.Renviron")
```

The file looks something like:

```
VAR1 = value1
VAR2 = value2
```

And you can access the values in R using Sys.getenv():

```
Sys.getenv("VAR1")
#> [1] "value1"
```

Note that .Renviron is only processed on startup, so you’ll need to restart R to see changes.

These environment variables will be available in every running R process, and can easily be read by any other program on your computer that accesses that file directly. For more security, use the keyring package.
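If restarting is inconvenient, base R’s readRenviron() re-reads a file in the running session. The sketch below uses a temporary file as a stand-in for ~/.Renviron, and HTTR_DEMO_SECRET is a made-up variable name used purely for illustration.

```r
# Write a throwaway file in .Renviron format (stand-in for ~/.Renviron)
renviron <- tempfile()
writeLines("HTTR_DEMO_SECRET=value1", renviron)

# Load it into the current session without restarting R
readRenviron(renviron)

# `unset = NA` distinguishes a missing variable from an empty one
secret <- Sys.getenv("HTTR_DEMO_SECRET", unset = NA)
if (is.na(secret)) {
  stop("`HTTR_DEMO_SECRET` env var has not been set")
}
```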

Keyring

The keyring package provides a way to store (and retrieve) data in your OS’s secure secret store. Keyring has a simple API:

```
keyring::key_set("MY_SECRET")
keyring::key_get("MY_SECRET")
```

By default, keyring will use the system keyring. This is unlocked by default when you log in, which means that while the password is stored securely, pretty much any process can access it.

If you want to be even more secure, you can create a custom keyring and keep it locked. That will require you to enter a password every time you want to access your secret.

```
keyring::keyring_create("httr")
keyring::key_set("MY_SECRET", keyring = "httr")
```

Note that accessing the key always unlocks the keyring, so if you’re being really careful, make sure to lock it again afterwards.

```
keyring::keyring_lock("httr")
```

You might wonder if we’ve actually achieved anything here because we still need to enter a password! However, that one password lets you access every secret, and you can control how often you need to re-enter it by manually locking and unlocking the keyring.
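To avoid forgetting the re-locking step, the read and the lock can be bundled into one helper. This is a sketch under assumptions: get_locked_secret() is a hypothetical name, and it presumes the "httr" keyring and the key were already created as described above.

```r
# Read one secret from a custom keyring, then lock the keyring again
get_locked_secret <- function(service, keyring = "httr") {
  # Re-lock on exit, even if key_get() fails part-way through
  on.exit(keyring::keyring_lock(keyring), add = TRUE)
  keyring::key_get(service, keyring = keyring)
}

# Usage (prompts for the keyring password if it is locked):
# secret <- get_locked_secret("MY_SECRET")
```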

Sharing with others

By and large, managing secrets on your own computer is straightforward. The challenge comes when you need to share them with selected others:

You may need to share a secret with me so that I can run your reprex and figure out what is wrong with httr.
You might want to share a secret amongst a group of developers all working on the same GitHub project.
You might want to automatically run authenticated tests on Travis.

To make this work, all the techniques in this section rely on public key cryptography. This is a type of asymmetric encryption where you use a public key to produce content that can only be decrypted by the holder of the matching private key.
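The idea can be illustrated with a small round trip using the openssl package. This sketch generates a throwaway key pair locally; in practice the private key would belong to the recipient and never leave their machine.

```r
library(openssl)

# Generate a throwaway RSA key pair (the recipient would normally own this)
key <- rsa_keygen(bits = 2048)
pubkey <- key$pubkey

# Anyone holding the public key can encrypt a short message ...
cipher <- rsa_encrypt(charToRaw("s3cret"), pubkey)

# ... but only the holder of the private key can decrypt it
plain <- rawToChar(rsa_decrypt(cipher, key))
```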

Reprexes

The most common place you might need to share a secret is to generate a reprex. First, do everything you can to eliminate the need to share a secret:

If it is an http problem, make sure to run all requests with verbose().
If you get an R error, make sure to include traceback().

If you’re lucky, that will be sufficient information to fix the problem.

Otherwise, you’ll need to encrypt the secret so you can share it with me. The easiest way to do so is with the following snippet:

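```
library(httr)      # GET(), stop_for_status(), content()
library(openssl)   # read_pubkey(), rsa_encrypt()
library(jsonlite)  # base64_enc()

encrypt <- function(secret, username) {
  # Public SSH keys that the GitHub user has registered
  url <- paste("https://api.github.com/users", username, "keys", sep = "/")

  resp <- httr::GET(url)
  httr::stop_for_status(resp)
  # Use the first listed key; rsa_encrypt() needs this to be an RSA key
  pubkey <- httr::content(resp)[[1]]$key

  opubkey <- openssl::read_pubkey(pubkey)
  # RSA can only encrypt short messages, so keep the secret brief
  cipher <- openssl::rsa_encrypt(charToRaw(secret), opubkey)
  jsonlite::base64_enc(cipher)
}

cipher <- encrypt("<username>\n<password>", "hadley")
cat(cipher)
```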

Then I can run the following code on my computer to access it:

```
decrypt <- function(cipher, key = openssl::my_key()) {
  # Reverse the base64 encoding, then decrypt with the private key
  cipherraw <- jsonlite::base64_dec(cipher)
  rawToChar(openssl::rsa_decrypt(cipherraw, key = key))
}

cat(decrypt(cipher))
#> <username>
#> <password>
```

Change your password before and after you share it with me or anyone else.

GitHub

If you want to share secrets with a group of other people on GitHub, use the secret or cyphr packages.

Travis

The easiest way to handle short secrets is to use environment variables. You’ll set them in your .Renviron locally and in the settings pane on Travis. That way you can use Sys.getenv() to access them in both places. It’s also possible to set encrypted env vars in your .travis.yml: see the documentation for details.

Regardless of how you set it, make sure you have a helper to retrieve the value. A good error message will save you a lot of time when debugging problems!

```
my_secret <- function() {
  val <- Sys.getenv("SECRET")
  if (identical(val, "")) {
    stop("`SECRET` env var has not been set")
  }
  val
}
```

Note that encrypted data is not available in pull requests in forks. Typically you’ll need to check PRs locally once you’ve confirmed that the code isn’t actively malicious.

To share secret files on Travis, see https://docs.travis-ci.com/user/encrypting-files/. Basically you will encrypt the file locally and check it in to git. Then you’ll add a decryption step to your .travis.yml which decrypts it for each run. See bigrquery for an example.

Be careful not to accidentally expose the secret on Travis. An easy way to do so is to print it out so that it’s captured in the log. Don’t do that!

CRAN

There is no way to securely share information with arbitrary R users, including CRAN. That means that if you’re developing a package, you need to make sure that R CMD check passes cleanly even when authentication is not available. This tends to primarily affect the documentation, vignettes, and tests.

Documentation

Like any R package, an API client needs clear and complete documentation of all functions. Examples are particularly useful but may need to be wrapped in \donttest{} to avoid challenges of authentication, rate limiting, lack of network access, or occasional API server down time.

Vignettes

Vignettes pose additional challenges when an API requires authentication, because you don’t want to bundle your own credentials with the package! However, you can take advantage of the fact that the vignette is built locally, and only checked by CRAN. In a setup chunk, do:

```
NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true")
knitr::opts_chunk$set(purl = NOT_CRAN)
```

And then use eval = NOT_CRAN in any chunk that requires access to a secret.

Testing

Use testthat::skip() to automatically skip tests that require authentication. I typically wrap this in a little helper function that I call at the start of every test requiring auth.

```
skip_if_no_auth <- function() {
  if (identical(Sys.getenv("MY_SECRET"), "")) {
    skip("No authentication available")
  }
}
```
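As an illustration of how the helper might be used, the sketch below calls it at the top of a test. The test name and the expectation are placeholders, and skip() is namespaced here so the helper also works when defined outside a testthat test file.

```r
library(testthat)

# Same helper as above, with skip() namespaced explicitly
skip_if_no_auth <- function() {
  if (identical(Sys.getenv("MY_SECRET"), "")) {
    testthat::skip("No authentication available")
  }
}

test_that("authenticated request is configured", {
  skip_if_no_auth()
  # Only reached when MY_SECRET is set to a non-empty value
  expect_true(nzchar(Sys.getenv("MY_SECRET")))
})
```

On CRAN, where MY_SECRET is not set, the test is reported as skipped rather than failed.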