15 Feb 2020

Solved econometrics exercise (Principles of Econometrics)


# Problem 2.10.1 from the book: Hill, R. C., Griffiths, W. E. and Lim,
# G. C. (2011). Principles of Econometrics. United States of America. Fourth edition.

2.10.1 PROBLEMS
2.1 Consider the following five observations. You are to do all the parts of this exercise
using only a calculator.

> x<-c(0,1,2,3,4)
> y<-c(6,2,3,1,0)

(a)    Complete the entries in the table. Put the sums in the last row. What are the sample means x̄ and ȳ?

> sum(x-mean(x))
[1] 0
> sum((x-mean(x))^2)
[1] 10
> sum(y-mean(y))
[1] 4.440892e-16
> sum((x-mean(x))*(y-mean(y)))
[1] -13
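
The question in (a) also asks for the sample means, which the transcript above uses but never prints. A minimal sketch that computes them and rebuilds the table with a totals row; the column names (`dx`, `dx2`, `dy`, `dxdy`) are labels chosen here for illustration, not the book's:

```r
x <- c(0, 1, 2, 3, 4)
y <- c(6, 2, 3, 1, 0)

xbar <- mean(x)  # 2
ybar <- mean(y)  # 2.4

# One row per observation: deviations from the means and their products
tab <- cbind(x, y,
             dx   = x - xbar,
             dx2  = (x - xbar)^2,
             dy   = y - ybar,
             dxdy = (x - xbar) * (y - ybar))

# Append the column sums as the last row, as the exercise requests
tab <- rbind(tab, Sum = colSums(tab))
print(tab)
```

The `Sum` row reproduces the four totals shown above, up to floating-point rounding in the deviation columns.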

(b)    Calculate b1 and b2 using (2.7) and (2.8) and state their interpretation.

> # b. Compute the coefficients from the sums (b0 here is the book's b1, the intercept; b1 here is the book's b2, the slope)
> b1<-sum((x-mean(x))*(y-mean(y)))/sum((x-mean(x))^2)
> b0<-mean(y)-b1*mean(x) 
> c(b0,b1)
[1]  5.0 -1.3

To validate the previous results:

> lm(y~x)
 
Call:
lm(formula = y ~ x)
 
Coefficients:
(Intercept)            x  
        5.0         -1.3 
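
For reference, the arithmetic behind those numbers, written with the book's naming (b1 for the intercept, b2 for the slope; the R code above calls them `b0` and `b1`). These are the standard least squares formulas, assumed here to be what the book labels (2.7) and (2.8):

```latex
b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{-13}{10} = -1.3,
\qquad
b_1 = \bar{y} - b_2 \bar{x} = 2.4 - (-1.3)(2) = 5
```

As an interpretation: each one-unit increase in x is associated with an estimated decrease of 1.3 in y, and the intercept estimate of 5 is the fitted value of y at x = 0.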

(c)    Compute sum(x^2) and sum(x*y). Using these numerical values, show that sum((x - x̄)^2) = sum(x^2) - N*x̄^2 and sum((x - x̄)*(y - ȳ)) = sum(x*y) - N*x̄*ȳ:

> sum(x^2)
[1] 30
> sum(x*y)
[1] 11
> 
> sum((x-mean(x))^2)==(sum(x^2)-(N<-length(x))*(mean(x)^2))
[1] TRUE
> sum((x-mean(x))*(y-mean(y)))==sum(x*y)-length(x)*mean(x)*mean(y)
[1] TRUE
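
The `==` checks above happen to return TRUE, but exact comparison of floating-point results is fragile (compare the 4.440892e-16 that appeared in part (a) in place of an exact zero). A hedged alternative is base R's `all.equal`, which compares within a small tolerance:

```r
x <- c(0, 1, 2, 3, 4)
y <- c(6, 2, 3, 1, 0)
N <- length(x)

# Left- and right-hand sides of the two identities in part (c)
lhs_xx <- sum((x - mean(x))^2)
rhs_xx <- sum(x^2) - N * mean(x)^2
lhs_xy <- sum((x - mean(x)) * (y - mean(y)))
rhs_xy <- sum(x * y) - N * mean(x) * mean(y)

# all.equal() tolerates rounding error; isTRUE() turns its result into a plain logical
ok_xx <- isTRUE(all.equal(lhs_xx, rhs_xx))
ok_xy <- isTRUE(all.equal(lhs_xy, rhs_xy))
print(c(ok_xx, ok_xy))
```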

(d)    Use the least squares estimates from part (b) to compute the fitted values of y, and complete the remainder of the table below. Put the sums in the last row.

> yhat=b0+b1*x
> e=y-yhat
> sum(e^2)
[1] 4.3
> sum(x*e)
[1] 1.776357e-15
> 
> cbind(x,y,yhat,e,e2=e^2,xe=x*e)
     x y yhat    e   e2   xe
[1,] 0 6  5.0  1.0 1.00  0.0
[2,] 1 2  3.7 -1.7 2.89 -1.7
[3,] 2 3  2.4  0.6 0.36  1.2
[4,] 3 1  1.1 -0.1 0.01 -0.3
[5,] 4 0 -0.2  0.2 0.04  0.8

(e)    On graph paper, plot the data points and sketch the fitted regression line.

> plot(x,y)
> abline(b0,b1)


(f)     On the sketch in part (e), locate the point of the means (x̄, ȳ). Does your fitted line pass through that point? If not, go back to the drawing board, literally.

> mean(y)==b0+b1*mean(x)
[1] TRUE
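
To see the answer to (f) on the plot itself, the point of the means can be drawn over the scatter and the fitted line. A small sketch using base graphics; the plotting symbol and colour are arbitrary choices:

```r
x <- c(0, 1, 2, 3, 4)
y <- c(6, 2, 3, 1, 0)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                      # intercept

plot(x, y)                                       # data points
abline(b0, b1)                                   # fitted regression line
points(mean(x), mean(y), pch = 19, col = "red")  # point of the means

# Vertical distance between the line and the point of the means (zero up to rounding)
gap <- mean(y) - (b0 + b1 * mean(x))
print(gap)
```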

(g)    Show that for these numerical values ȳ = b1 + b2 x̄ (in the code's naming, mean(y) = b0 + b1*mean(x)):

> mean(y)==b0+b1*mean(x)
[1] TRUE

(h)    Show that for these numerical values the mean of the fitted values equals ȳ:

> mean(y)==mean(yhat)
[1] TRUE

(i)     Compute σ̂², the estimate of the error variance:

> sigmahat<-sum(e^2)/(length(x)-2)
> sigmahat
[1] 1.433333
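
Written out with the sum of squared residuals from part (d), where N − 2 = 3 is the number of degrees of freedom:

```latex
\hat{\sigma}^2 = \frac{\sum \hat{e}_i^2}{N - 2} = \frac{4.3}{3} \approx 1.4333
```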

(j)     Compute var(b2)

> # variance of the slope (the book's b2, called b1 in this code)
> varb1<-sigmahat/sum((x-mean(x))^2)
> varb1
[1] 0.1433333
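
As a cross-check for (j), the estimated variance of the slope can be computed by hand and compared with the covariance matrix that `vcov()` returns for the fitted model. A minimal sketch; `sigma2_hat` and `var_slope` are names introduced here, and the slope is the book's b2 (called `b1` in the code of this post):

```r
x <- c(0, 1, 2, 3, 4)
y <- c(6, 2, 3, 1, 0)

fit <- lm(y ~ x)
e <- resid(fit)

# Error variance estimate: SSE / (N - 2)
sigma2_hat <- sum(e^2) / (length(x) - 2)

# Estimated variance of the slope: sigma2_hat / sum((x - xbar)^2)
var_slope <- sigma2_hat / sum((x - mean(x))^2)

# vcov() holds the same quantity on its diagonal, in the row/column named "x"
print(c(by_hand = var_slope, from_vcov = vcov(fit)["x", "x"]))
```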

(k)    Additional. Study the significance of the coefficients:

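> # estimated variances of the intercept and the slope
> varb0<- sigmahat*(sum(x^2)/(length(x)*sum((x-mean(x))^2)))
> varb0
[1] 0.86
 
> varb1<- sigmahat/sum((x-mean(x))^2)
> varb1
[1] 0.1433333 
> 
> # the standard errors are the square roots of the variances
> seb0<-sqrt(varb0)
> seb1<-sqrt(varb1)
> c(seb0,seb1)
[1] 0.9273618 0.3785939
> 
> t1<-b1/seb1
> t0<-b0/seb0
> c(t0,t1)
[1]  5.391639 -3.433759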
> 
> gl<-length(e)-2
> 
> # lower.tail: logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x].
> pvalor1<-pt(abs(t1), gl, lower.tail = FALSE)*2
> pvalor2<-pt(abs(t0), gl, lower.tail = FALSE)*2
> 
> c(pvalor1,pvalor2)
[1] 0.04142418 0.01250200
 
> #r2
> r2<-sum((yhat-mean(y))^2)/sum((y-mean(y))^2)
> r2
[1] 0.7971698
> 
> # residual standard error
> sqrt(sum(e^2)/gl)
[1] 1.197219
> 
> # check that the values are correct.
> summary(lm(y~x))
 
Call:
lm(formula = y ~ x)
 
Residuals:
   1    2    3    4    5 
 1.0 -1.7  0.6 -0.1  0.2 
 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   5.0000     0.9274   5.392   0.0125 *
x            -1.3000     0.3786  -3.434   0.0414 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 1.197 on 3 degrees of freedom
Multiple R-squared:  0.7972,  Adjusted R-squared:  0.7296 
F-statistic: 11.79 on 1 and 3 DF,  p-value: 0.04142
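
Instead of comparing the hand calculations with the printed summary by eye, the same numbers can be pulled out of the summary object and compared programmatically. A minimal sketch, assuming only base R; the object names (`by_hand`, `from_lm`, `agree`) are introduced here for illustration:

```r
x <- c(0, 1, 2, 3, 4)
y <- c(6, 2, 3, 1, 0)
N <- length(x)
sxx <- sum((x - mean(x))^2)

# Hand calculations, following the steps of the exercise
b1 <- sum((x - mean(x)) * (y - mean(y))) / sxx   # slope
b0 <- mean(y) - b1 * mean(x)                     # intercept
e <- y - (b0 + b1 * x)                           # residuals
sigma2_hat <- sum(e^2) / (N - 2)                 # error variance estimate
se <- sqrt(c(sigma2_hat * sum(x^2) / (N * sxx),  # std. error of the intercept
             sigma2_hat / sxx))                  # std. error of the slope
tval <- c(b0, b1) / se
pval <- 2 * pt(abs(tval), df = N - 2, lower.tail = FALSE)
by_hand <- cbind(c(b0, b1), se, tval, pval)

# The same four columns as reported by lm()
from_lm <- coef(summary(lm(y ~ x)))

# TRUE when every entry agrees up to rounding error
agree <- isTRUE(all.equal(unname(by_hand), unname(from_lm)))
print(agree)
```

Dropping the dimnames with `unname()` keeps the comparison about the numbers only, since the two matrices label their rows and columns differently.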

Creating grouped variables in dplyr (group_by + mutate)

  Let us simulate a household dataset that identifies the household, the sex (1 = female), the province, and the age of each member.   # Define the list ...