Data Types and Structures in R

The R programming language offers an array of data types and structures, critical for robust data analysis.

Mastering these elements enables a broad spectrum of statistical analyses and graphic representations.

Understanding the Difference: Data Types vs. Structures

Simply put, a data type describes the kind of value a single element holds (such as numeric, character, or logical), while a data structure describes how multiple elements are organized and related (such as vectors, lists, or matrices). The type of the data also constrains which structures can hold it: vectors and matrices require a single type, whereas lists and data frames can mix types.
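
As a rough illustration of the distinction, the sketch below stores the same numeric values in two different structures. The names `values` and `grid` are made up for this example; the type of the elements stays the same while the organization changes.

```r
# The same four numeric values, organized in two different structures
values <- c(1.5, 2.5, 3.5, 4.5)      # a vector: one dimension
grid   <- matrix(values, nrow = 2)   # a matrix: two rows, two columns

# The data type of the elements is identical in both cases
typeof(values)   # "double"
typeof(grid)     # "double"

# The structure differs: only the matrix has dimensions
is.matrix(values)  # FALSE
is.matrix(grid)    # TRUE
dim(grid)          # 2 2
```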

Data Types

R's primary data types include:

  • Numeric Values
    Ideal for representing real numbers.

    x <- 42.5

  • Integer
    Specifically for whole numbers. The "L" suffix marks the literal as an integer; without it, R stores 42 as a numeric value.

    x <- 42L

  • Characters
    Designated for text strings.

    x <- "Hello, R!"

  • Logical
    For boolean values (TRUE, FALSE).

    x <- TRUE
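
To check which type a value actually has, base R provides `typeof()` and `class()`, along with the `as.*` conversion functions. The following is a minimal sketch; the variable names `x_num`, `x_int`, `x_chr`, and `x_lgl` are chosen only for this example.

```r
x_num <- 42.5
x_int <- 42L
x_chr <- "Hello, R!"
x_lgl <- TRUE

# typeof() reports the internal storage type
typeof(x_num)   # "double"
typeof(x_int)   # "integer"
typeof(x_chr)   # "character"
typeof(x_lgl)   # "logical"

# class() reports "numeric" for double values
class(x_num)    # "numeric"

# A whole number written without the L suffix is still stored as double
typeof(42)      # "double"
is.integer(42)  # FALSE

# Explicit conversion between types
as.integer("7")    # 7
as.numeric(TRUE)   # 1
as.character(42L)  # "42"
```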

Data Structures

Key data structures in R are:

  • Vectors
    Composed of elements of a single type, vectors are essential for consistent series of measurements. For instance, a vector can store temperature readings of a specific location.

    temperature <- c(22, 23, 21, 20)

    [1] 22 23 21 20

  • Lists
    Lists are versatile, able to hold elements of various types. They are excellent for compiling diverse data sets. For example, a list might include different types of student information, such as name, grade, etc.

    student <- list(name="Luca", grade=27, passed=TRUE)

    $name
    [1] "Luca"

    $grade
    [1] 27

    $passed
    [1] TRUE

  • Matrices
    Matrices are two-dimensional arrays, suited to data organized along two dimensions. They are structured in rows and columns, and all elements must be of the same type.

    points_matrix <- matrix(1:6, nrow=2)

         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6

  • Data Frames
    Similar to matrices, but each column can hold a different type of data. Data frames are especially useful for managing tabular data. They become even more versatile when combined with packages like dplyr for data manipulation such as filtering, summarizing, and transforming. For instance, a data frame might store student names and their corresponding grades in separate columns.

    student_data <- data.frame(name=c("Luca", "Marta"), grade=c(27, 30))

       name grade
    1  Luca    27
    2 Marta    30

  • Factors
    Factors are specialized for managing categorical data in R. They categorize and order data into predefined levels, which is instrumental in both data analysis and manipulation. For example, assigning a factor to a variable like "gender" not only stores the original values ("M", "F", "F") but also organizes them into distinct categories, simplifying subsequent data operations.

    gender <- factor(c("M", "F", "F"), levels=c("M", "F"))

    [1] M F F
    Levels: M F
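
The structures above can also be queried and indexed. The sketch below recreates the objects from the examples and shows a few common base R operations on each; the choice of operations is illustrative rather than exhaustive.

```r
# Recreate the objects from the examples above
temperature   <- c(22, 23, 21, 20)
student       <- list(name = "Luca", grade = 27, passed = TRUE)
points_matrix <- matrix(1:6, nrow = 2)
student_data  <- data.frame(name = c("Luca", "Marta"), grade = c(27, 30))
gender        <- factor(c("M", "F", "F"), levels = c("M", "F"))

# Vector: index by position; functions apply to all elements at once
temperature[2]       # 23
mean(temperature)    # 21.5

# Mixing types in a vector coerces every element to one common type
mixed <- c(1, "a", TRUE)
typeof(mixed)        # "character"

# List: access elements by name, each keeps its own type
student$name         # "Luca"
student[["grade"]]   # 27

# Matrix: index by row and column (values are filled column by column)
points_matrix[2, 3]  # 6
dim(points_matrix)   # 2 3

# Data frame: select a column, or filter rows with a logical condition
student_data$grade                              # 27 30
top_students <- student_data[student_data$grade > 28, ]
top_students$name                               # the row for Marta

# Factor: inspect the levels and the underlying integer codes
levels(gender)       # "M" "F"
as.integer(gender)   # 1 2 2
table(gender)        # counts per level: M = 1, F = 2
```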

 

 



