{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Notebook 4: K-Normal Means\n",
"_Bryan Graham - University of California - Berkeley_\n",
"\n",
"_Ec 240a: Econometrics, Fall 2015_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let $E[Y|X] = m(X) + \\sigma U$ with $U$ a standard normal random variable independent of $X$. Assume that $m(X)$ equals some linear combination of a potentially high dimensional set of basis functions (e.g., a power series in $X$). In lecture we saw how we could use Gram-Schmidt orthonormalization to generate a representation of $m(X)$ in terms of a set of orthonormal basis functions of the same dimension. This, in turn, transforms our series regression problem into a canonical K-Normal means one (cf., Wasserman (2006, Chapter 7)).\n",
"
\n",
"
\n",
"In this notebook I use K-Normal Means theory to study the conditional mean of log Earnings give years of completed schooling and AFQT scores among a sample of 733 white male respondents from the NLSY79. I focus on respondents who were 18 or younger when they took the AFQT test and restrict the analysis to high school graduates. The the population of interest, therefore, is white male high school graduates born in 1962, 1963 or 1964 and resident in the United States in 1979."
]
},
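{
"cell_type": "markdown",
"metadata": {},
"source": [
"The reduction described above can be sketched as follows. This is a minimal outline in the spirit of Wasserman (2006, Chapter 7), conditional on the observed regressors. The symbols $\\phi_k$, $\\theta_k$, $Z_k$ and $N$ are notation assumed here for illustration and need not match the lecture notes; the sketch also assumes that orthonormality holds with respect to the empirical distribution of $X$ and that the $U_i$ are independent across observations.\n",
"\n",
"Suppose the orthonormalized basis functions $\\phi_1, \\ldots, \\phi_K$ satisfy $\\frac{1}{N}\\sum_{i=1}^{N} \\phi_j(X_i)\\phi_k(X_i) = 1$ if $j = k$ and $0$ otherwise, and write $m(X) = \\sum_{k=1}^{K} \\theta_k \\phi_k(X)$. With $Y_i = m(X_i) + \\sigma U_i$, define\n",
"\n",
"$$Z_k = \\frac{1}{N}\\sum_{i=1}^{N} \\phi_k(X_i) Y_i = \\theta_k + \\frac{\\sigma}{N}\\sum_{i=1}^{N} \\phi_k(X_i) U_i.$$\n",
"\n",
"Conditional on $X_1, \\ldots, X_N$, the $Z_k$ are jointly normal and uncorrelated, hence independent, with $Z_k \\sim N(\\theta_k, \\sigma^2/N)$ for $k = 1, \\ldots, K$. Estimating $m$ therefore amounts to estimating the $K$ normal means $\\theta_1, \\ldots, \\theta_K$."
]
},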
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Direct Python to plot all figures inline (i.e., not in a separate window)\n",
"%matplotlib inline\n",
"\n",
"# Load libraries\n",
"import numpy as np\n",
"import numpy.linalg\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"from __future__ import division"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Directory where NLSY79_TeachingExtract.csv file is located\n",
"workdir = '/Users/bgraham/Dropbox/Teaching/Teaching_Datasets/'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we import the NLSY79_TeachingExtract.csv dataset as a pandas dataframe and select the appropriate subsample."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Year-of-birth distribution of target sample\n",
"62 307\n",
"63 263\n",
"64 251\n",
"dtype: int64\n"
]
},
{
"data": {
"text/html": [
"
\n", " | PID_79 | \n", "HHID_79 | \n", "Earnings | \n", "School | \n", "AFQT | \n", "LogEarn | \n", "
---|---|---|---|---|---|---|
count | \n", "733.000000 | \n", "733.000000 | \n", "733.000000 | \n", "733.000000 | \n", "733.000000 | \n", "733.000000 | \n", "
mean | \n", "3497.286494 | \n", "3496.601637 | \n", "68569.923403 | \n", "13.829468 | \n", "59.995492 | \n", "10.830661 | \n", "
std | \n", "2619.165544 | \n", "2619.228555 | \n", "57868.069739 | \n", "2.172758 | \n", "26.388355 | \n", "0.888534 | \n", "
min | \n", "7.000000 | \n", "7.000000 | \n", "8.725730 | \n", "12.000000 | \n", "0.094000 | \n", "2.166276 | \n", "
25% | \n", "1805.000000 | \n", "1804.000000 | \n", "34621.595600 | \n", "12.000000 | \n", "40.801998 | \n", "10.452233 | \n", "
50% | \n", "3171.000000 | \n", "3171.000000 | \n", "53954.796857 | \n", "13.000000 | \n", "63.183998 | \n", "10.895902 | \n", "
75% | \n", "4648.000000 | \n", "4648.000000 | \n", "80325.222143 | \n", "16.000000 | \n", "81.324997 | \n", "11.293839 | \n", "
max | \n", "12139.000000 | \n", "12137.000000 | \n", "546176.494286 | \n", "20.000000 | \n", "100.000000 | \n", "13.210697 | \n", "