Getting started with R


In this post I want to get you started with R (using RStudio) as quickly as possible. R is an approachable language for working with data, including large data sets. Let's take a look at it.

What is R?

R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, data mining surveys, and studies of scholarly literature databases show substantial increases in popularity; as of July 2020, R ranks 8th in the TIOBE index, a measure of popularity of programming languages.

The R user interface

RStudio gives you a way to talk to your computer and send it commands to carry out operations. R gives you a language to speak in. To get started, open RStudio just as you would open any other application on your computer. When you do, a window should appear on your screen.

RStudio - First launch

If you don’t have RStudio installed on your machine, you can download it from the RStudio download page.

RStudio download page

The RStudio interface is simple. You type R code into the bottom line of the RStudio console pane and then press Enter to run it. The code you type is called a command, because it will command your computer to do something for you. The line you type it into is called the command line.

When you type a command at the prompt and hit Enter, your computer executes the command and shows you the result. Then RStudio displays a fresh prompt for your next command.

When do we compile?

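In some languages, like C#, you have to compile your human-readable code into machine-readable code before you can run it. If you’ve programmed in such a language before, you may wonder whether you have to compile your R code before you can use it. You don’t: R is an interpreted language, which means R automatically interprets your code as you run it.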

Playing with R

The first basic example is a sum.

> 1+1
[1] 2
>

You’ll notice that a [1] appears next to your result. R is just letting you know that this line begins with the first value in your result. Some commands return more than one value, and their results may fill up multiple lines. For example, the command 100:130 returns 31 values; it creates a sequence of integers from 100 to 130. Notice that new bracketed numbers appear at the start of the second and third lines of output. Each one is the position, within the result, of the first value on that line.

RStudio - First example

The colon operator (:) returns every integer between two integers. It is an easy way to create a sequence of numbers.
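As a small sketch of the colon operator, the snippet below builds the sequence mentioned above and inspects it. The object name x is only for illustration.

```r
# Build the sequence of integers from 100 to 130 (both ends are included)
x <- 100:130

length(x)   # 31 values in total
x[1]        # the first value, 100
x[31]       # the last value, 130

# The colon operator counts downward when the first number is larger
6:1
```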

If you type an incomplete command and press Enter, R will display a + prompt, which means it is waiting for you to type the rest of your command.

> 5-
+
+ 1
[1] 4
RStudio - Example incomplete command

If you type a command that R doesn’t recognize, R will return an error message.
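For instance, R has no single % operator, so 3 % 5 is not a valid command. The sketch below is one way to look at the resulting error message from inside a script without stopping it; the names bad_command and msg are made up for this example, and the exact wording of the message can differ between R versions and languages.

```r
# "3 % 5" is not valid R: a lone % is not an operator
bad_command <- "3 % 5"

# parse() checks the syntax of the text; tryCatch() captures the error
# instead of letting it stop the script
msg <- tryCatch(
  {
    parse(text = bad_command)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

Typing 3 % 5 directly at the prompt shows the same kind of error, after which R simply gives you a fresh prompt.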

RStudio - Error message

R treats the hashtag character # in a special way; R will not run anything that follows a hashtag on a line. This makes hashtags very useful for adding comments and annotations to your code. The hashtag is known as the commenting symbol in R.
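A minimal sketch of both ways to use a comment, on its own line and at the end of a line of code (the object name total is just an example):

```r
# This whole line is a comment, so R ignores it
total <- 1 + 1  # R runs the code on the left and ignores this note

total
```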

Cancelling commands

Some R commands may take a long time to run. You can cancel a command once it has begun by pressing Esc in the RStudio console (or Ctrl + C if you run R in a terminal). Note that it may also take R a long time to cancel the command.

Objects

R lets you save data by storing it inside an R object. What’s an object? Just a name that you can use to call up stored data. For example, you can save data into an object like a or b. Wherever R encounters the object, it will replace it with the data saved inside.

a <- 1
a
[1] 1

a + 2
[1] 3
  • to create an R object, choose a name and then use the less-than symbol < followed by a minus sign - (together they form the arrow <-) to save data into it. R will make an object, give it the name you chose, and store in it whatever follows the arrow.
  • when you ask R what’s in a, it tells you on the next line.
  • you can use your object in new R commands too.

So, for another example, the following code would create an object named die that contains the numbers one through six. To see what is stored in an object, just type the object’s name by itself.

die <- 1:6

die
## 1 2 3 4 5 6

When you create an object, the object will appear in the environment pane of RStudio. This pane will show you all the objects you’ve created since opening RStudio.

RStudio - Create an object and environment pane

You can name an object in R almost anything you want, but there are a few rules:

  1. a name cannot start with a number
  2. a name cannot use some special symbols, like ^, !, $, @, +, -, /, or *

  Good names    Names that cause errors
  a             1trial
  b             $
  FOO           ^mean
  my_var        2nd
  .day          !bad

R also understands capitalization (it is case-sensitive), so name and Name will refer to different objects.
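The short sketch below illustrates case sensitivity and, as an aside, uses the base R function make.names() to show what a syntactically valid version of a bad name looks like. The stored values are arbitrary.

```r
# name and Name are two different objects because R is case-sensitive
name <- "lower case"
Name <- "upper case"

identical(name, Name)  # FALSE: the two objects hold different values

# A name cannot start with a number; make.names() turns such a name
# into a valid one by adding a prefix
make.names("1trial")   # "X1trial"
```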

Finally, R will overwrite any previous information stored in an object without asking you for permission. So, it is a good idea to not use names that are already taken:

my_number <- 1
my_number 
## 1

my_number <- 999
my_number
## 999

You can see which object names you have already used by calling the function ls().
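As a sketch, the following lists the names in use and checks one specific name with exists() before assigning to it. The name not_created_yet_123 is deliberately made up and is assumed not to exist in your session.

```r
# Create two objects, then list the names currently in use
my_number <- 1
die <- 1:6

ls()                            # character vector of the object names in your workspace

# exists() checks for one specific name before you assign to it
exists("my_number")             # TRUE, because it was created above
exists("not_created_yet_123")   # FALSE as long as nothing has that name
```

Note that exists() also returns TRUE for names R already uses itself, such as mean, which is another reason to check before you reuse a name.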

Object examples

If you followed my examples, you now have a virtual die stored in your computer’s memory. You can access it whenever you like by typing the word die. R will replace an object with its contents whenever the object’s name appears in a command. So, for example, you can do all sorts of math with the die. Math isn’t so helpful for rolling dice, but manipulating sets of numbers will be your stock-in-trade as a data scientist. So, let’s take a look at how to do that:

die - 1
## 0 1 2 3 4 5

die / 2
## 0.5 1.0 1.5 2.0 2.5 3.0

die * die
## 1  4  9 16 25 36
RStudio - Operation with an object

If you are a big fan of linear algebra, you may notice that R does not always follow the rules for matrix multiplication. Instead, R uses element-wise execution. When you manipulate a set of numbers, R will apply the same operation to each element in the set. So, for example, when you run die - 1, R subtracts one from each element of die.

When you use two or more vectors in an operation, R will line up the vectors and perform a sequence of individual operations. For example, when you run die * die, R lines up the two die vectors and then multiplies the first element of vector 1 by the first element of vector 2, then the second element of vector 1 by the second element of vector 2, and so on, until every element has been multiplied.

Element-wise execution
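To make the idea concrete, here is a sketch that writes out by hand what die * die does, one position at a time. The loop and the name by_hand are only for illustration; in practice you would just write die * die.

```r
die <- 1:6

# What die * die does, written out one position at a time
by_hand <- numeric(length(die))
for (i in seq_along(die)) {
  by_hand[i] <- die[i] * die[i]
}

by_hand                      # 1 4 9 16 25 36
all(by_hand == die * die)    # TRUE: same values as the element-wise product
```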

If you give R two vectors of unequal lengths, R will repeat the shorter vector until it is as long as the longer vector, and then do the math. This isn’t a permanent change; the shorter vector will be its original size after R does the math. If the length of the short vector does not divide evenly into the length of the long vector, R will return a warning message. This behavior is known as vector recycling, and it helps R do element-wise operations.

> 1:2
[1] 1 2
> 1:4
[1] 1 2 3 4
> die <- 1:6
> die + 1:2
[1] 2 4 4 6 6 8
> die + 1:4
[1] 2 4 6 8 6 8
Warning message:
In die + 1:4 :
  longer object length is not a multiple of shorter object length
RStudio - Object operations

Element-wise operations are a very useful feature in R because they manipulate groups of values in an orderly way. When you start working with data sets, element-wise operations will ensure that values from one observation or case are only paired with values from the same observation or case. Element-wise operations also make it easier to write your own programs and functions in R.
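The recycling behind die + 1:2 can also be written out explicitly. The sketch below uses the base R function rep_len() to stretch the short vector by hand; the names short and recycled are made up for this example.

```r
die <- 1:6
short <- 1:2

# rep_len() repeats the short vector until it matches the length of die
recycled <- rep_len(short, length(die))
recycled                                  # 1 2 1 2 1 2

# Adding the stretched copy gives the same result as letting R recycle
identical(die + recycled, die + short)    # TRUE

# short itself is unchanged after the operation
length(short)                             # 2
```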

But don’t think that R has given up on traditional matrix multiplication. You just have to ask for it when you want it. You can do inner multiplication with the %*% operator and outer multiplication with the %o% operator.

> die <- 1:6
> die %*% die
     [,1]
[1,]   91
> die %o% die
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    4    5    6
[2,]    2    4    6    8   10   12
[3,]    3    6    9   12   15   18
[4,]    4    8   12   16   20   24
[5,]    5   10   15   20   25   30
[6,]    6   12   18   24   30   36
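
As a quick check of how these operators relate to what came before, the sketch below compares them with the element-wise product and with the base R function outer(). The names inner and outer_product are only for illustration.

```r
die <- 1:6

# Inner product: multiply element by element, then add everything up
inner <- die %*% die
inner[1, 1] == sum(die * die)               # TRUE, both give 91

# Outer product: every element of die times every element of die
outer_product <- die %o% die
dim(outer_product)                          # 6 6
identical(outer_product, outer(die, die))   # TRUE: %o% is shorthand for outer()
```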