R is a computer program for analyzing data. By “analyze” we mean, first, reading the data into the program and then operating on it: drawing graphs and charts, manipulating values, fitting statistical models, and so on. R is both a statistical “environment” and a programming language, and it is very widely used in both commercial and academic settings. R is free and open-source and runs on Windows, OS X, and Linux operating systems. It is maintained by a group of volunteers who release bug fixes and new features regularly.
Who Uses R and Why?
R started as a tool for statisticians, evolving from a language called S that was created in the 1970s. Today, R remains the primary language of academic statisticians, and it also has a prominent place among analysts in business and government. It is used not only for building statistical models but also for handling and cleaning data, as in my blog, as well as for developing new statistical methods, building simulations, producing visualizations, and generally for all the data-handling tasks that statisticians and data scientists need to perform. Because users can easily develop and distribute new methods, R has also become the tool of choice in certain fast-growing fields such as biostatistics and genetics. Articles on “surveys of the top tools used by data scientists” invariably name R as one of the important tools with which data scientists, as well as statisticians, should be familiar. Moreover, R is popular enough that there are extensions that let you connect it to other languages and programs, such as Python, Java, the H2O machine-learning system, the ArcGIS geographical information system, and many more.
Acquiring and Installing R
The primary way to acquire R is to download it from the Internet. The main website for R is www.r-project.org, and the www.cran.r-project.org page (“CRAN” standing for “Comprehensive R Archive Network”) is where you can download R itself. There are in fact dozens of “mirror” sites for CRAN (that is, websites that are essentially copies of the CRAN site), which reduce the load on the main CRAN site. You can probably find a mirror near you on the “mirrors” page. After you download R, install it in the way you would normally install a program on your operating system.
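Once R is installed, the mirror idea carries over into R itself: a session records which CRAN repository it will contact when it needs to download anything. The following is a minimal sketch of inspecting and setting that repository. The mirror URL and the variable name cran_repo are assumptions chosen for illustration, not something this chapter prescribes; substitute a mirror from the “mirrors” page.

```r
# Show which CRAN repository this session is currently configured to use.
# In a fresh session this is often the placeholder "@CRAN@".
print(getOption("repos"))

# Point the session at a specific mirror. The URL below is an assumption
# for illustration; replace it with a mirror listed on CRAN's "mirrors" page.
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Read the setting back to confirm it took effect.
cran_repo <- getOption("repos")[["CRAN"]]
print(cran_repo)
```

In an interactive session, chooseCRANmirror() offers a menu of mirrors instead of requiring you to type a URL.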
At any one time, users around the world will be running slightly different versions of R, since new ones are released fairly frequently. For example, at this writing the current version of R is 3.3.2, but many users are still using 3.2 or earlier versions. This will almost never cause problems, but it is a good idea to update your version of R from time to time.
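If you are not sure which version you are running, R can tell you. The sketch below prints the version string and compares the running version against a threshold. The threshold "3.3.2" is simply the version mentioned above, and the variable names are illustrative.

```r
# Print the full version string of the running R session.
print(R.version.string)

# getRversion() returns a version object that can be compared with a string.
current <- getRversion()
minimum <- "3.3.2"  # version mentioned in the text; substitute your own threshold

if (current < minimum) {
  message("Running R ", as.character(current), ", which is older than ",
          minimum, "; consider updating.")
} else {
  message("Running R ", as.character(current), ", which is at least ",
          minimum, ".")
}
```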
There are also several slightly different versions of R distributed through channels other than CRAN. Microsoft R Open is a particular version of R that uses a different set of math libraries intended to make certain computations faster. Like “regular” R, Microsoft R Open is free, although it does not run on OS X. Other versions of R are intended to communicate with relational databases or with other big-data platforms. For this book, we will assume you are running “regular” R, but in any case, for our purposes, all versions of R should behave exactly the same way.
Starting and Quitting R
The way you start R depends on your operating system. Normally, double-clicking on an R icon will be enough to get R started. In the command-line interface of many Linux systems, or in the OS X terminal window, it may be enough just to type the upper-case letter R (or, for Windows command lines, Rgui). When R has started, you will see the command prompt > in the R console, the place where commands are entered. At this point, you can start typing commands to R.
When it comes time to quit R, you can either “kill” the window in the usual way (for OS X, the red dot, the lightswitch in the top right, or via the File dialog; for Windows, the red X or File dialog) or you can type the q() command. In either case, R will then ask you if you want to “Save workspace image.” If you answer “yes” to this question, R will save to the disk the objects in your workspace, including any changes you made during the current session, whereas if you answer “no,” those changes are discarded and the saved workspace stays as it was when R was last started. We almost always want to answer “yes” to this question!
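The “Save workspace image” question can also be explored from the console. The sketch below saves the workspace by hand and reloads it, which is roughly what answering “yes” does for you. The object name scores and the use of a temporary file are assumptions for illustration; when you answer “yes” at quitting time, R normally writes to a file named .RData in the working directory.

```r
# Create an object in the workspace (the name 'scores' is purely illustrative).
scores <- c(90, 85, 77)

# ls() lists the names of the objects currently in the workspace.
print(ls())

# save.image() writes the workspace to disk. A temporary file is used here so
# the example does not overwrite an existing .RData file.
workspace_file <- tempfile(fileext = ".RData")
save.image(file = workspace_file)

# load() restores saved objects; loading into a separate environment lets us
# inspect what was saved without touching the current workspace.
restored <- new.env()
load(workspace_file, envir = restored)
print(ls(envir = restored))

# To quit without being asked the question, pass the answer to q() directly:
# q(save = "yes")   # save the workspace, then quit
# q(save = "no")    # quit without saving
```

The two q() calls are left as comments because running either one would end the session immediately.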