install.packages("styler")
STA 9750 Week 1 In-Class Activity: R
and RStudio
Welcome!
Topics:
- Installing
R
andRStudio
- Installing
git
- Getting Started on
GitHub
- Basic Principles of “Clean Code”
R
and RStudio
The primary programming language used in this course is R
, one of the two most popular languages used in data science. R
, like its predecessor the S
language, is optimized for interactive, data-analytic work, in contrast with python
, which is optimized for general purpose computing.
R
is a programming language and runtime; we will supplement it with RStudio
, an Integrated Development Environment or, less formally, an editor. RStudio
is the software where you will write the code and then the R
runtime will execute it.
R
Students should first install R
from https://cloud.r-project.org/.
As of 2024-08-26, the most recent version of R
is 4.4.1. Using the most current version of R
will reduce the likelihood of issues later in the course.
RStudio
Next, download and install the RStudio
IDE (desktop edition).
RStudio
is highly configurable and I recommend taking advantage of all its built-in features. If you go to the Global Options
menu (accessible under Tools
), I recommend the following settings:
- General: Uncheck “Restore .RData into workspace at startup”.
- General: Set “Save workspace to .RData on exit” to “Never”
- Code / Editing: Set “Tab width” to 2
- Code / Editing: Check
- “Insert spaces for Tab”
- Auto-detect code indentation
- Insert matching parens / quotes
- Use native pipe operator
- Auto-indent code after paste
- Vertically align arguments in auto-indent
- Continue comment when inserting new line
- Code / Display: Check
- Show line numbers
- Show margin (margin column should be 80)
- Code / Diagnostics: check all “R” diagnostics.
- Appearance: Pick a color theme you enjoy. (I’m partial to light text on a dark background)
You may wish to enable GitHub Copilot. I have little experience with GH Copilot, but it seems quite popular and is allowed in this course. It is not guaranteed to be accurate at all times - and “the AI told me to” is not a valid excuse if your code is wrong - but on balance, it should be useful.
quarto
We won’t use it this week, but you will need to install Quarto before starting on Mini-Project #00.
Source Code Management
git
git
is a source-code management tool, used by developers to manage the code they write. If you’ve ever been part of a large project and struggled to coordinate all team members using the same version of a document, git
exists to solve that problem.
If you don’t have git
pre-installed, install either Git for Windows or the XCode Command Line Tools for MacOS. If not automatically prompted when you try to use git
, the Mac install can be manually triggered by running xcode-select --install
at a command line.
In this course, we will use three main functions of git
:
-
staging: telling
git
, I want you to prepare to save a certain file - committing: saving a set of related changes
- pushing: copying your committed changes to a separate server for sharing and backup
Whenever you write code you are happy with, you should use git
to save it. Saving changes with git
is cheap and easy - so do it regularly. You always want git
to have a backup of good code in case you loose power, accidentally delete a file, break something in a way you’re not sure how to undo, etc..
RStudio
comes with powerful git
integration. Once you have created a project, you should see a tab labelled “Git” in the top right corner of your IDE window that looks something like this:
To stage a file - prepare to save it - click the empty check box next to the file name. A new file shows a status of “?” - this is git
saying “I’ve never seen this file before. Do you want me to track it for you?”. Later, when you make further changes to file you have already asked git to track, a status of “M” (for Modified) will be shown.
On its own staging a file does nothing. You also need to commit it for git
to truly track it.1 The Commit
button will commit all staged changes. When you make a commit, git
requires a brief message summarizing the changes. There’s no particular formatting requirement to this message, but it should be something that future-you is able to easily understand. For instance, the commit message from the initial draft of this document reads as:
Initial draft of Lab 01 (STA9750)
- Installing R and RStudio
- Git and GitHub
- Leaflet Example for Styler
TODO: Fuller shell explainers
TODO: Link more git help
When I read this, I know the purpose of the change I made (first line), the contents of that change (list), and parts that still need more work.
Finally, after you save a change, it is only saved on your computer. The true power of git
comes from its ability to copy changes and backups across machines. This gives you an easy way to store backups in case your computer dies and makes collaboration efficient and fun. git
allows you to push
and pull
changes between machines in endlessly powerful (but sometimes complex) ways. For this course, we’ll keep things simple and only use GitHub
to share code. We discuss GitHub
in the next section.
Reference: We will not use all of the functionality of git
in this course, but you should familiarize yourself with Chapters 1, 2, and 6 of the Git Book over the next two weeks.
GitHub
GitHub
is an industry-standard code hosting and collaboration platform. In addition to hosting copies of code, GitHub
provides web hosting, bug reporting, code review, continuous integration, documentation wikis, and discussion fora. You will explore GitHub
in more detail starting in Mini-Project #00.
Code Styling
Autoformatting with the styler
Package
A major theme of this course will be sharing and co-developing code with your classmates, both for peer feedback and for the course project. Code sharing is hard! Everyone writes code a little differently and what is clear to you may not be clear at all to your reader.
To make code sharing just a bit easier, we use tools to ensure all code shared in this course is consistently formatted. By using consistent formatting, you reduce the cognitive load on your reader, making it easier for them to focus on the ideas of your code, not how you chose to write it.
A major strength of R
is its huge number of user-contributed packages. These are “add-ins” which provide additional functionality not available in the basic version of R
. As of 2024-08-26, there are over 21 thousand packages available on CRAN, the largest official repository of R
packages. Beyond all those, there are thousands more packages available on other code hosting websites like GitHub
.2
We will use the contributed styler
package to format code in this course. Run the following command to automatically download and install the styler
package:
(Use the clipboard icon on the right of code snippets to automatically copy code suitable for pasting into RStudio
.)
You should see something like this:
The styler
package has been downloaded and installed on your computer, but it is not yet “active” or “open” in R
. In general, you will only need to download packages once, but you will need to load them each time you want to use them.3
Open a R
file in RStudio
and copy the following (ugly) code:
if(!require("leaflet")) install.packages("leaflet")
if(!require("tidyverse")){
install.packages("tidyverse")
}
library(tidyverse)
library(rvest)
library(leaflet)
pAGE = read_html('https://en.wikipedia.org/wiki/Baruch_College')
pAGE |> html_element(".latitude") |> html_text2() -> BaruchLatitude
baruch_longitude <- pAGE |> html_element(".longitude") |> html_text2()
BaruchLatitude <- sum(as.numeric(strsplit(BaruchLatitude,
"[^0123456789]")[[1]]) * (1/60)^(0:2), na.rm=TRUE)
baruch_longitude <- sum(as.numeric(strsplit(baruch_longitude, "[^0123456789]")[[1]]) *
(1/60)^(0:2), na.rm=TRUE)
leaflet() %>% addTiles() %>% setView(-baruch_longitude, BaruchLatitude, zoom=17) %>%
addPopups(-baruch_longitude, BaruchLatitude, "Look! It's <b>Baruch College</b>!")
You don’t need to understand what this does just yet, but it’s hopefully clear that this is ugly code. Nothing is lined up properly, capitalization is erratic, and different coding styles are intermixed rather recklessly.
Near the top of your RStudio
pane, you will see a drop-down menu titled Addins
. If you successfully installed styler
above, one of the Addins
choices will be “style active file.” Click this and the code will be cleaned up (a bit) resulting in something like this:
if (!require("leaflet")) install.packages("leaflet")
if (!require("tidyverse")) {
install.packages("tidyverse")
}
library(tidyverse)
library(rvest)
library(leaflet)
pAGE <- read_html("https://en.wikipedia.org/wiki/Baruch_College")
pAGE |>
html_element(".latitude") |>
html_text2() -> BaruchLatitude
baruch_longitude <- pAGE |>
html_element(".longitude") |>
html_text2()
BaruchLatitude <- sum(as.numeric(strsplit(
BaruchLatitude,
"[^0123456789]"
)[[1]]) * (1 / 60)^(0:2), na.rm = TRUE)
baruch_longitude <- sum(as.numeric(strsplit(baruch_longitude, "[^0123456789]")[[1]]) *
(1 / 60)^(0:2), na.rm = TRUE)
leaflet() %>%
addTiles() %>%
setView(-baruch_longitude, BaruchLatitude, zoom = 17) %>%
addPopups(-baruch_longitude, BaruchLatitude, "Look! It's <b>Baruch College</b>!")
It’s far from perfect - and we will discuss the many issues in this example throughout the course - but it’s better! At a minimum, you should make sure to run styler
like this on all code you submit during this course.
And now that your code is cleaned up, you should run it! The Source
button in the top right corner will run all code in the open file. Running the code produces something like this:
Not too shabby! That’s an interactive, dynamic map showing the location of Baruch College obtained by parsing the Baruch Wikipedia page, getting the GPS coordinates of Baruch, downloading a map file, and locating Baruch on that map.
Challenge: Adjust this code to show Hunter college instead of Baruch.
lintr
If you want even more feedback on writing good code, install the lintr
package and use the associated RStudio
add-in. Unlike styler
, lintr
won’t make changes automatically for you, but it will highlight much more subtle possible problems.4
Extra: Welcome to $SHELL
To become a true “power user” of tools like R
and python
, you will need to become more familiar with the command line interface (CLI) and associated tools.5
The Software Carpentry Unix Shell Tutorial is a great introduction to shell usage. Check it out!
NB: MacOS and Linux systems work quite similarly under the hood, as both descend from the Unix
tradition. By contrast, Windows works somewhat differently. Learners whose personal machine runs Windows are encouraged to take advantage of the provided Linux-running virtual machines6 as they work through this section.
Looking Ahead
Next week, we will use these tools to begin coding in earnest. If you’re feeling ambitious, go ahead and get started on Mini-Project #00.
Footnotes
This two stage process is a bit cumbersome for the first stage of a small project, but it quickly becomes incredibly valuable. Instead of saving everything every time, there is great power in only saving “good” or “finished” changes to a large project, while leaving work-in-progress elsewhere unsaved. You probably won’t need this level of control until you get to the course project, but it’s better to have it than not.↩︎
If you are interested in bioinformatics, the Bioconductor project develops incredible open-source
R
packages.↩︎While this may feel cumbersome, it’s really not dissimilar to any other software you use (or
R
itself). You need to download it once, but you need to open it each time you intend to use it. There’s no harm in re-downloading–free software!–but it wastes time and bandwidth. Since we benefit so much from the free-software community, the very least we can do is not run up their internet bills unnecessarily.↩︎Some of the issues identified by
lintr
may be false positives, but the false positive rate is quite low, especially for the sort of procedural code that is the focus of this course. You should default to trying to appeaselintr
, but feel free to use the course discussion board for any questions.↩︎As an added benefit, use of the CLI also makes you look like a 90s movie hacker to all your friends.↩︎
See the Course Resources page.↩︎