Data Types and Structures in R
The R programming language offers a wide range of data types and structures that are essential for robust data analysis.
Mastering them makes it possible to carry out a broad spectrum of statistical analyses and graphical representations.
Understanding the Difference: Data Types vs. Structures
Simply put, a data type describes the kind of value a single element represents (such as numeric, character, or logical), while a data structure describes how multiple elements are organized and related to one another (as in vectors, lists, and matrices). The two are linked. Some structures, such as vectors and matrices, require every element to share one type, whereas lists and data frames can combine elements of different types.
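One way to see the distinction is to compare two base R functions. typeof() reports the type of the stored elements, and class() reflects how those elements are organized. In the minimal sketch below, the same six numbers are held first in a vector and then in a matrix. The variable names are illustrative only.

```r
# The same six values, stored in two different structures
values <- c(1.5, 2, 3, 4, 5, 6)
grid <- matrix(values, nrow = 2)

# Data type: the kind of each element (identical for both objects)
typeof(values)  # "double"
typeof(grid)    # "double"

# Data structure: how the elements are organized (different)
class(values)   # "numeric"
class(grid)     # "matrix" "array" (R 4.0.0 and later)
dim(grid)       # 2 3
```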
Data Types
R's primary data types include:
- Numeric Values
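Used for real numbers. R stores numeric values as double-precision floating-point numbers, and any number typed without a suffix is numeric.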
x <- 42.5
- Integer
Used for whole numbers. The L suffix tells R to store the value as an integer rather than as a numeric (double) value.
x <- 42L
- Characters
Used for text strings, which are enclosed in double or single quotes.
x <- "Hello, R!"
- Logical
For Boolean values, TRUE or FALSE.
x <- TRUE
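The snippet below is a minimal sketch that checks each of these types with base R's typeof() and the is.*() family. The variable names are illustrative only. Note that typeof() reports numeric values as "double", and that a whole number without the L suffix is still a double.

```r
# One example value per basic type
num <- 42.5
int <- 42L
chr <- "Hello, R!"
lgl <- TRUE

# typeof() reports the internal storage type
typeof(num)   # "double" (R's numeric type)
typeof(int)   # "integer"
typeof(chr)   # "character"
typeof(lgl)   # "logical"

# Without the L suffix, a whole number is stored as a double
typeof(42)    # "double"

# The is.*() helpers test for a type and return TRUE or FALSE
is.numeric(int)    # TRUE: integers also count as numeric
is.integer(42)     # FALSE
is.character(chr)  # TRUE
```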
Data Structures
Key data structures in R are:
- Vectors
Vectors hold elements of a single type, which makes them well suited to a consistent series of measurements. For instance, a vector can store the temperature readings taken at a specific location.
temperature <- c(22, 23, 21, 20)
temperature
[1] 22 23 21 20
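Because every element of a vector shares one type, R can apply arithmetic and indexing to the whole vector at once. When inputs of different types are mixed, R silently coerces them to a common type. The following minimal sketch reuses the temperature readings from this example. The mixed vector is a made-up illustration.

```r
temperature <- c(22, 23, 21, 20)

# Indexing is 1-based; a negative index drops that element
temperature[1]        # 22
temperature[-1]       # 23 21 20

# Arithmetic and summaries apply to every element at once
temperature + 1       # 23 24 22 21
mean(temperature)     # 21.5

# Mixing types forces all elements to a common type (coercion)
mixed <- c(22, "missing", TRUE)
typeof(mixed)         # "character"
mixed                 # "22" "missing" "TRUE"
```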
- Lists
Lists are versatile. Each element can have a different type, and even a different length or structure, which makes lists well suited to grouping heterogeneous information. For example, a list might hold several kinds of student information, such as a name, a grade, and whether the student passed.
student <- list(name="Luca", grade=27, passed=TRUE)
student
$name
[1] "Luca"
$grade
[1] 27
$passed
[1] TRUE
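Elements of a list are usually retrieved by name. The difference between the two bracket forms is a common source of confusion. The minimal sketch below uses base R only, and the courses element is a hypothetical addition for illustration.

```r
student <- list(name = "Luca", grade = 27, passed = TRUE)

# $ and [[ ]] extract a single element with its own type
student$name             # "Luca"
student[["grade"]]       # 27

# [ ] returns a smaller list, not the element itself
class(student["grade"])  # "list"

# New elements can be added, and they may be vectors or other lists
student$courses <- c("Statistics", "Algebra")
length(student)          # 4

# str() prints a compact overview of each element and its type
str(student)
```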
- Matrices
Matrices are two-dimensional arrays whose elements are arranged in rows and columns and must all have the same type. They are well suited to data indexed along two dimensions, such as a grid of numeric values.
points_matrix <- matrix(1:6, nrow=2)
points_matrix
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
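The output above shows that matrix() fills values column by column by default. The minimal sketch below, using base R only, contrasts this with byrow = TRUE and shows [row, column] indexing. The last line shows how the single-type rule coerces a whole matrix. The name by_row is illustrative.

```r
points_matrix <- matrix(1:6, nrow = 2)

# matrix() fills column by column unless byrow = TRUE
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
by_row[1, ]           # 1 2 3

# Elements are addressed as [row, column]
points_matrix[2, 3]   # 6
points_matrix[, 2]    # 3 4 (the whole second column)

dim(points_matrix)    # 2 3
t(points_matrix)      # transpose: a 3 x 2 matrix

# All elements share one type: binding a text column turns everything into character
typeof(cbind(points_matrix, c("a", "b")))  # "character"
```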
- Data Frames
Data frames are similar to matrices, but each column can hold a different type of data, while the values within a single column still share one type. They are the standard structure for tabular data. They also pair well with packages such as dplyr, whose functions filter(), summarise(), and mutate() support filtering, summarizing, and transforming data. For instance, a data frame might store student names and their corresponding grades in separate columns.
student_data <- data.frame(name=c("Luca", "Marta"), grade=c(27, 30))
student_data
   name grade
1  Luca    27
2 Marta    30
- Factors
Factors are specialized for categorical data in R. A factor stores each value as one of a fixed set of levels, kept in a defined order, which simplifies grouping, counting, and modeling. For example, converting a variable such as "gender" to a factor keeps the original values ("M", "F", "F") and also records the distinct categories, M and F, as its levels.
gender <- factor(c("M", "F", "F"), levels=c("M", "F"))
gender
[1] M F F
Levels: M F
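A factor stores each value as an integer code tied to a fixed set of levels, so base R functions such as levels(), table(), and tapply() can work directly with the categories. The minimal sketch below assumes only base R. It also shows a factor used as a grouping column in a data frame. The third student ("Sara") and her grade are invented for illustration.

```r
gender <- factor(c("M", "F", "F"), levels = c("M", "F"))

# Each value is stored as an integer code pointing at a level
codes <- as.integer(gender)   # 1 2 2
levels(gender)                # "M" "F"

# table() counts observations per level, including levels with no data
counts <- table(factor(c("F", "F"), levels = c("M", "F")))
counts                        # counts: M 0, F 2

# Values not listed in levels are recorded as NA
with_na <- factor(c("M", "F", "X"), levels = c("M", "F"))
is.na(with_na)                # FALSE FALSE TRUE

# In a data frame, a factor column can serve as a grouping variable
student_data <- data.frame(
  name   = c("Luca", "Marta", "Sara"),  # "Sara" is hypothetical
  grade  = c(27, 30, 28),
  gender = factor(c("M", "F", "F"), levels = c("M", "F"))
)
mean_by_gender <- tapply(student_data$grade, student_data$gender, mean)
mean_by_gender                # means: M 27, F 29
```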