What `R` you? (R vectors in python)

reticulate lets us switch between R and python in the same session, calling R objects from python scripts and vice versa. When R data structures are called from python, they are converted to the equivalent python structures where applicable. However, like translating English to Mandarin, translating R structures to python is not always straightforward, as we will see later.

There are 5 R data structures:

  1. vector (more specifically atomic vector)

  2. list

  3. array

  4. matrix (special kind of array which is 2 dimensional)

  5. data frame

In this post, we will look at translating R’s vector into python.

# load libraries 
library(tidyverse)
library(reticulate)

An R vector is a python …

Well, it depends on whether the R vector has a single element or multiple elements.

Single element R vector

If the R vector has only 1 element, the python structure will be a scalar. A scalar is a structure which contains a single value. The value can be any type e.g. 69, 0.07, or ‘banana’.

Let’s verify with some code. Is Rvec_1 a vector?

Rvec_1<-1
is.vector(Rvec_1) 
## [1] TRUE

Alternatively, you can print the class() of the object. If it prints an element type, you can infer that the object is a vector.

class(Rvec_1)
## [1] "numeric"

Is a single element R vector a python scalar structure?

py_run_string("import numpy as np")
py_eval("np.isscalar(r.Rvec_1)")
## [1] TRUE

Likewise, you can print the type of the object. If it prints the element type, you can infer the structure is a scalar.

py_eval("type(r.Rvec_1)")
## <class 'float'>

If you wish to do all of the above from R, you will have to convert the R object into a python object and store the converted object in R's global environment. As covered in my previous introduction to the reticulate package, you can do this with the r_to_py function.

r_to_py(Rvec_1) %>% class()
## [1] "python.builtin.float"  "python.builtin.object"

There you have it. When you convert a single element R vector into python, it becomes a float, which indicates that it is a python scalar.

Multi element R vector

If the R vector has multiple elements, the python structure will be a list. Let’s confirm this with some code. Is Rvec_multi an R atomic vector? Its class() is an element type, so it can be inferred to be an R vector.

Rvec_multi<-c(66,99, 0.07)
class(Rvec_multi) 
## [1] "numeric"

Is a multi element R vector a python list? Yes, it is.

r_to_py(Rvec_multi) %>% class()
## [1] "python.builtin.list"   "python.builtin.object"
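
To see the scalar/list split from the python side, here is a minimal sketch in plain python. The values are typed in by hand to stand in for what reticulate hands over, and `describe` is a hypothetical helper written for this post, not part of reticulate or numpy.

```python
# Hand-typed stand-ins for the converted R objects (assumed, not pulled from R)
single = 1.0                  # R: Rvec_1 <- 1
multi = [66.0, 99.0, 0.07]    # R: Rvec_multi <- c(66, 99, 0.07)


def describe(obj):
    """Label a lone int/float/str/bool as 'scalar' and a python list as 'list'."""
    if isinstance(obj, (bool, int, float, str)):
        return "scalar"
    if isinstance(obj, list):
        return "list"
    # Anything else is reported by its type name
    return type(obj).__name__


print(describe(single))
print(describe(multi))
```

The first call prints `scalar` and the second prints `list`, mirroring the single element and multi element cases above.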

Named vectors

Occasionally, you may work with named vectors in R; for instance, when calculating quantiles.

(Rvec_name<-quantile(rnorm(100)))
##         0%        25%        50%        75%       100% 
## -2.1896617 -0.8763797 -0.2375871  0.4623371  2.7550884

Named vectors are still considered vectors.

Rvec_name %>% class()
## [1] "numeric"

Do note that the names in a named vector (e.g. 0%, 25%..) are stored as characters and NOT numbers.

Rvec_name %>%  str()
##  Named num [1:5] -2.19 -0.876 -0.238 0.462 2.755
##  - attr(*, "names")= chr [1:5] "0%" "25%" "50%" "75%" ...

However, the names are dropped when a multi element named vector is translated. python treats it like any other python list.

r_to_py(Rvec_name)
## [-2.189661746762837, -0.8763796933162846, -0.23758708410549662, 0.4623370994806194, 2.7550883808673072]
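
If the names matter on the python side, one option is to carry them across separately and pair them with the values in a dict. The sketch below is plain python: `names` and `values` are illustrative variables holding rounded copies of the quantile output above, not something reticulate creates for you. reticulate's documentation also lists named R lists as converting to python dicts, so wrapping the vector in as.list() before conversion may be another route worth trying.

```python
# Names and (rounded) values copied by hand from the quantile output above
names = ["0%", "25%", "50%", "75%", "100%"]
values = [-2.19, -0.876, -0.238, 0.462, 2.755]

# Pair each name with its value so lookups by name work again
quantiles = dict(zip(names, values))

print(quantiles["50%"])
print(list(quantiles.keys()))
```

Looking up `"50%"` returns the median again, which is the behaviour a named vector gives you in R.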


Some differences between python and R

Element types

We have been using element types to infer whether an object is an R vector or a python scalar. Thus, it would be helpful to know some of the differences between R and python element types.

Element types (numbers)

By default, R treats numbers as floats/numerics regardless of whether they are whole numbers or numbers with decimals.

class(1)
## [1] "numeric"
class(0.07)
## [1] "numeric"

On the other hand, python treats whole numbers as integers.

py_eval("type(1)")
## <class 'int'>

python treats numbers with decimals just like R does, as floats/numerics.

py_eval("type(0.07)")
## <class 'float'>

The trick for R to treat whole numbers as integers in the eyes of both R and python is to add the suffix L after the number.

Rvec_1int<-1L
class(Rvec_1int)
## [1] "integer"
r_to_py(Rvec_1int) %>% class()
## [1] "python.builtin.int"    "python.builtin.object"
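
Why does the int/float distinction matter in practice? One place it bites is indexing: python lists accept an int as an index but not a float, even when the float is a whole number. A small sketch in plain python, with `from_r` standing in (by assumption) for an R `1` that arrived as a float:

```python
whole = 1        # python literal without a decimal point
from_r = 1.0     # what R's plain 1 looks like after conversion (assumed here)

print(type(whole).__name__)
print(type(from_r).__name__)
print(whole == from_r)   # the two values compare equal

fruits = ["apple", "banana", "cherry"]
print(fruits[whole])     # an int index works

# A float index is rejected with a TypeError, even for a whole number
try:
    fruits[from_r]
    float_index_ok = True
except TypeError:
    float_index_ok = False
print(float_index_ok)
```

So if an R number is destined to be a python index or a count, adding the L suffix on the R side saves a conversion later.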

Element types (coercing)

Multi element R vectors are homogeneous. In other words, elements of different types are coerced so that all elements end up with the same type.

Let’s look at an example. First, I will create 3 single element vectors of different element types.

Relement_int=2L
class(Relement_int)
## [1] "integer"
Relement_bool=TRUE
class(Relement_bool)
## [1] "logical"
Relement_char="banana"
class(Relement_char)
## [1] "character"

Next, I will combine these vectors into a multi element vector. Let’s reassess the element type for each element.

Rvec_mix<- c(Relement_int, Relement_bool, Relement_char)

class(Rvec_mix[1])
## [1] "character"
class(Rvec_mix[2])
## [1] "character"
class(Rvec_mix[3])
## [1] "character"

As you can see, the different elements have all been coerced into the same element type once they are combined in a multi element vector. R coerces towards the most accommodating type present (logical, then integer, then numeric, then character), so any vector containing a string ends up as all strings.

In contrast, python doesn’t coerce element types when lists are created. The integrity of each element type remains unchanged.

py_run_string("Plist_mix=[r.Relement_int, r.Relement_bool, r.Relement_char]")

py_eval("type(Plist_mix[0])")
## <class 'int'>
py_eval("type(Plist_mix[1])")
## <class 'bool'>
py_eval("type(Plist_mix[2])")
## <class 'str'>
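
To make R's coercion rule concrete, here is a rough python imitation of what c() does to mixed elements. `r_style_c` and `RANK` are hypothetical names made up for this sketch; it only handles bool, int, float and str, and python's number-to-string formatting will not always match R's (for example `66.0` versus `66`), so treat it as an approximation rather than R's actual algorithm.

```python
# Rank of each type in the coercion order: logical < integer < numeric < character
RANK = {bool: 0, int: 1, float: 2, str: 3}


def r_style_c(*items):
    """Combine items into a list, coercing all of them to the highest-ranked type present."""
    # Raises KeyError for unsupported types and ValueError for no items
    target = max((type(x) for x in items), key=RANK.__getitem__)
    if target is str:
        # R prints logicals as TRUE/FALSE when they become strings
        return [("TRUE" if x else "FALSE") if isinstance(x, bool) else str(x)
                for x in items]
    return [target(x) for x in items]


print(r_style_c(2, True, "banana"))
print(r_style_c(2, True))
print(r_style_c(True, 0.5))
```

The first call turns everything into strings, just like Rvec_mix above, while the second and third show the milder coercions to integer and to float.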

Indexing

Besides the differences in element types, there are differences in indexing for each language.

Indexing (zero/non-zero)

R uses one-based indexing: the first element is at index 1.

Rvec_multi[1]
## [1] 66

python uses zero-based indexing: the first element is at index 0.

py_eval("r.Rvec_multi[0]")
## [1] 66

Indexing (negative numbers)

Where the count starts is not the only difference in indexing. In R, a negative index means that the element at that position is excluded.

Rvec_multi[-1]
## [1] 99.00  0.07

In python, a negative index means that counting starts from the end of the list, so -1 is the last element.

py_eval("r.Rvec_multi[-1]")
## [1] 0.07
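
The two indexing differences can be put side by side with a small translation helper. `r_index` is a hypothetical function written for this post: it imitates R's behaviour for a single integer index on a python list. It is only a sketch and skips R features such as index vectors and NA for out-of-range positions.

```python
def r_index(vec, i):
    """Imitate R's vec[i] for a single integer i on a python list."""
    if i > 0:
        # R counts from 1, python from 0 (out-of-range raises IndexError here, unlike R's NA)
        return vec[i - 1]
    if i < 0:
        # R drops the element at position -i and keeps the rest
        drop = -i - 1
        return vec[:drop] + vec[drop + 1:]
    # In R, vec[0] gives an empty vector
    return []


vec = [66.0, 99.0, 0.07]   # hand-typed copy of Rvec_multi

print(r_index(vec, 1))     # R-style first element
print(r_index(vec, -1))    # R-style: everything except the first element
print(vec[-1])             # python-style: last element
```

The same `-1` gives `[99.0, 0.07]` under R's rule and `0.07` under python's, which is exactly the mismatch shown above.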