Class notes 2025-05-12

Published

May 12, 2025

Data

The data are from the third quarter of 2024 and come from INDEC's Encuesta Permanente de Hogares (EPH, Permanent Household Survey).

# datos_indec <- readRDS("/cloud/project/data/datos_indec.rds")
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
filename <- "../data/usu_individual_T324.txt"
file.exists(filename)
[1] TRUE
my_data <- readr::read_delim(filename, delim = ";")
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 47564 Columns: 177
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
chr   (6): CODUSU, MAS_500, CH05, CH14, PP04D_COD, PP09C_ESP
dbl (169): ANO4, TRIMESTRE, NRO_HOGAR, COMPONENTE, H15, REGION, AGLOMERADO, ...
num   (1): IPCF
lgl   (1): PP09A_ESP

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
my_data <- my_data |> 
  transmute(
    salario = PP08D1, 
    sexo = CH04,
    nivel_educativo = NIVEL_ED,
    edad = CH06
  ) |>
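  # Treat negative salary values as missing
  mutate(salario = ifelse(salario < 0, NA, salario)) |>
  # Recode sex: 1 becomes "Hombre", any other value becomes "Mujer"
  mutate(sexo = ifelse(sexo == 1, "Hombre", "Mujer"))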
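
To see what the recoding does, here is a minimal sketch that applies the same pipeline to a few made-up rows. The data frame `toy` and its values are hypothetical (they are not EPH records); only the column names are taken from the code above, and the negative salary stands in for whatever negative code the survey uses.

```r
library(dplyr)

# Hypothetical toy rows that reuse the EPH column names from above
toy <- data.frame(
  PP08D1   = c(250000, -9, 0, 480000),
  CH04     = c(1, 2, 2, 1),
  NIVEL_ED = c(4, 6, 3, 5),
  CH06     = c(34, 51, 19, 42)
)

toy_clean <- toy |>
  transmute(
    salario = PP08D1,
    sexo = CH04,
    nivel_educativo = NIVEL_ED,
    edad = CH06
  ) |>
  mutate(salario = ifelse(salario < 0, NA, salario)) |>
  mutate(sexo = ifelse(sexo == 1, "Hombre", "Mujer"))

# transmute() keeps only the four renamed columns;
# the negative salary in row 2 is now NA
print(toy_clean)
```

Because `transmute()` drops every column it does not name, the original names (`PP08D1`, `CH04`, `NIVEL_ED`, `CH06`) no longer exist after this step.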

Function examples

a <- 1
b <- c(1,2,3,4,5)
b2 <- c("Uno", "dos", "tres")

add <- function(numero1, numero2){
  numero1 + numero2
}

add(1,1)
[1] 2
add(1000, 1000)
[1] 2000

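Our standard error function

error_estandar <- function(mi_variable){
  # Drop missing values so they do not turn sd() into NA or inflate N
  mi_variable <- mi_variable[!is.na(mi_variable)]
  N <- length(mi_variable)
  # Standard error of the mean: sample standard deviation over sqrt(N)
  sd(mi_variable) / sqrt(N)
}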

Testing it

# The age column was renamed from CH06 to `edad` by transmute() above
error_estandar(my_data$edad)
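
As a check on the formula, the standard error of the mean is the sample standard deviation divided by the square root of the sample size, SE = s / sqrt(n). The sketch below works it out by hand on a small made-up vector (`x` is hypothetical, not EPH data) and also shows how a single missing value affects `sd()`.

```r
# Toy values, not EPH data
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n <- length(x)
s <- sd(x)             # sample standard deviation (divides by n - 1)
se <- s / sqrt(n)

# Same quantity computed step by step from the definition
se_manual <- sqrt(sum((x - mean(x))^2) / (n - 1)) / sqrt(n)
print(c(se = se, se_manual = se_manual))

# A single NA makes sd() return NA unless it is removed
x_con_na <- c(x, NA)
print(sd(x_con_na))
print(sd(x_con_na, na.rm = TRUE))
```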

t-test example

edadHombre <- my_data %>% 
  filter(sexo == "Hombre") %>% 
  pull(edad)
edadMujer <- my_data %>% 
  filter(sexo == "Mujer") %>% 
  pull(edad)

t-test

\[
\begin{aligned}
H_0 &: \mu_m = \mu_h \\
H_1 &: \mu_m < \mu_h \\
H_2 &: \mu_m > \mu_h
\end{aligned}
\]

t.test(edadHombre,edadMujer)

    Welch Two Sample t-test

data:  edadHombre and edadMujer
t = -14.215, df = 47466, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.283811 -2.487980
sample estimates:
mean of x mean of y 
 34.96669  37.85258 
t.test(edadMujer,edadHombre)

    Welch Two Sample t-test

data:  edadMujer and edadHombre
t = 14.215, df = 47466, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 2.487980 3.283811
sample estimates:
mean of x mean of y 
 37.85258  34.96669 
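
Both calls above are two-sided, which is why swapping the arguments only flips the sign of t and of the confidence interval. The one-sided alternatives H_1 and H_2 can be requested with the `alternative` argument of `t.test()`, which is stated relative to the first sample. The sketch below uses simulated ages (`mujeres` and `hombres` are hypothetical vectors, not the survey data).

```r
set.seed(1)
# Simulated ages, for illustration only
hombres <- rnorm(200, mean = 35, sd = 20)
mujeres <- rnorm(200, mean = 38, sd = 20)

dos_colas <- t.test(mujeres, hombres)                           # mu_m != mu_h
menor     <- t.test(mujeres, hombres, alternative = "less")     # H1: mu_m < mu_h
mayor     <- t.test(mujeres, hombres, alternative = "greater")  # H2: mu_m > mu_h

print(c(dos_colas = dos_colas$p.value,
        menor = menor$p.value,
        mayor = mayor$p.value))
```

The two one-sided p-values add up to 1, and the two-sided p-value is twice the smaller of them.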

t-test: salary

salarioHombre <- my_data %>% 
  filter(sexo == "Hombre") %>% 
  pull(salario)
salarioMujer <- my_data %>% 
  filter(sexo == "Mujer") %>% 
  pull(salario)
var.test(salarioHombre,salarioMujer)

    F test to compare two variances

data:  salarioHombre and salarioMujer
F = 1.6645, num df = 10126, denom df = 8367, p-value < 2.2e-16
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 1.597668 1.734036
sample estimates:
ratio of variances 
          1.664523 

We conclude that the variances are not equal, so the classic equal-variance (Student's) t-test is not appropriate.
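
For reference, `t.test()` has a `var.equal` argument that controls the equal-variance assumption. Its default is `FALSE`, which gives the Welch version reported in the age example; `var.equal = TRUE` gives the pooled-variance test. A minimal sketch with simulated groups of clearly different spread (`grupo_a` and `grupo_b` are hypothetical):

```r
set.seed(2)
# Simulated groups with equal means but very different variances
grupo_a <- rnorm(50, mean = 10, sd = 1)
grupo_b <- rnorm(80, mean = 10, sd = 4)

welch  <- t.test(grupo_a, grupo_b)                    # default: var.equal = FALSE
pooled <- t.test(grupo_a, grupo_b, var.equal = TRUE)  # assumes equal variances

print(welch$method)
print(pooled$method)

# Degrees of freedom: pooled uses n1 + n2 - 2, Welch adjusts them downwards
print(c(welch = unname(welch$parameter), pooled = unname(pooled$parameter)))
```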

Wilcoxon rank-sum test (Mann-Whitney U test)

wilcox.test(salarioHombre,salarioMujer)

    Wilcoxon rank sum test with continuity correction

data:  salarioHombre and salarioMujer
W = 43725179, p-value = 0.0001317
alternative hypothesis: true location shift is not equal to 0
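
The W statistic reported by `wilcox.test()` can be read as a count: the number of (x, y) pairs in which the value from the first sample is larger than the value from the second, with ties counted as one half. The sketch below checks this on two tiny made-up vectors (`x` and `y` are hypothetical).

```r
# Tiny made-up samples with no ties
x <- c(1.1, 2.2, 3.3, 4.4)
y <- c(0.5, 2.0, 2.5)

res <- wilcox.test(x, y)

# Count the pairs where x is larger than y; ties would count as one half
u_manual <- sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))

print(unname(res$statistic))  # 9
print(u_manual)               # 9
```

Here 9 of the 12 possible pairs have the larger value in `x`, so W is 9.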

Assignments for next class

Readings:

  • Chapter 8

Homework

  • Choose another dichotomous variable (other than sex).

  • Use a t-test or the Mann-Whitney U test, as appropriate, to determine whether there are differences in

    • Salary

    • Age