Wednesday, July 20, 2016

Genius Off the Deep End: Math and Formulas and Such (Part VI)

We continue our exploration of Economic Value.  This one dives more deeply into shallow mathematics - or is it takes a shallow look at deep mathematics?  Who the hell knows?  This week I'm savoring the million dollar premium in a low-cost rental in hick country drinking glorified furniture polish, so I'm not inclined to care.

Most academic writers try to work from existing streams of thought in a particular area.  Such a task - particularly in an area like economics - often takes the work outside the realm of comprehensibility for the general reading public.  Simkovic & McIntyre steer clear of firmly placing their work within existing literature on human capital theory, but at the same time keep the work appropriately out of reach of the most unsophisticated readers, walking a tightrope that is the hallmark of geniuses like Beethoven or Stephen King.

For example, despite being written for what is mostly a non-scientific audience (let's not kid ourselves, right?), Economic Value gives rather short shrift to explaining its methods in approachable terms.  So let's start by looking at the basic statistical approach on a more introductory level.  One example of a term that goes under-explained is "Ordinary Least Squares" ("OLS").  The Wikipedia entry gives a fair, if dense, explanation. It's a standard method of regression analysis in research.

In layman's terms, regression analysis starts with what's called a dependent variable (in this particular case earnings).  You take all of the independent variable observations you have (in our case, whether there's a "law degree" and a bunch of other factors that apply to a person), and you try to find a line, represented by a mathematical formula, that best "fits" the data provided and explains the relationship between all those other variables (education, experience, skills, race, gender, etc.) and earnings.  The ultimate design is a formula where you can plug in values for the independent variables, even combinations not seen in the data, and generate a predicted output.
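For the curious, here is a minimal sketch of that line-fitting idea in Python. Everything in it is made up for illustration: the data is randomly generated, the dollar figures are invented, and the helper names (`ols`, `predict`) are mine, not anything from Economic Value. It uses plain dollar earnings and solves the textbook least-squares "normal equations" by hand so that nothing beyond the standard library is needed.

```python
import random

def ols(X, y):
    """Least-squares coefficients, found by solving (X'X) b = X'y
    with Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Zero out this column in every other row
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic observations (hypothetical numbers, not real data)
random.seed(0)
rows, earnings = [], []
for _ in range(500):
    law = random.randint(0, 1)           # 1 = "law degree", 0 = bachelor's only
    exp_years = random.uniform(0, 30)    # years of work experience
    rows.append([1.0, law, exp_years])   # leading 1.0 gives the intercept
    earnings.append(40000 + 20000 * law + 1500 * exp_years
                    + random.gauss(0, 5000))

intercept, b_law, b_exp = ols(rows, earnings)

def predict(law, exp_years):
    """Plug values into the fitted formula to get predicted earnings."""
    return intercept + b_law * law + b_exp * exp_years

print(f"fitted: {intercept:.0f} + {b_law:.0f}*law + {b_exp:.0f}*experience")
print(f"predicted, law degree and 10 years: {predict(1, 10):.0f}")
```

The fitted coefficients should land close to the invented values used to generate the data (20000 for the degree, 1500 per year of experience), which is all "best fit" really means here.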

In their data modelling, Simkovic & McIntyre use what's called an independent "dummy variable" in the formula for whether the person has a law degree or not.  What this means is that when the person has only a bachelor's degree, the dummy variable is set to 0 and the dependent variable (earnings) can be derived from all the other explanatory variables (like years of experience or some variable to represent raw ability).  When the person has a "law degree," their dummy variable is set to 1.

The coefficient that goes with the dummy variable is thus what they ultimately use to determine their earnings premium, although the precise translation from this coefficient to actual numbers isn't crystal clear to the lay reader.

Another phrase often used, and not thoroughly explained, is "log earnings."  "Log" stands for "logarithm," which can be explained here for those of you who quit before that particular math class.  Basically hit the log button on your calculator and then type an amount of annual earnings. Simkovic & McIntyre use log earnings instead of simple dollar-value earnings because there is a tradition in quantitative labor economics to use log earnings as a dependent variable (one can read the Card article cited in Economic Value for an explanation).
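Since the translation from coefficient to dollars isn't spelled out, here is a rough sketch of how it would usually go. When the left side of the formula is log earnings, a dummy coefficient gets added to the log, which means it multiplies actual earnings. The coefficient below is a made-up number, and the choice of base is an assumption: labor economics conventionally uses natural logs, while the log button on a calculator is base 10, so both are shown.

```python
import math

def earnings_ratio(coef, base=math.e):
    """Ratio of predicted earnings (dummy = 1 versus dummy = 0)
    implied by a coefficient on a log-earnings scale."""
    return base ** coef

hypothetical_coef = 0.5  # invented value, not taken from Economic Value

ratio_natural = earnings_ratio(hypothetical_coef)          # natural log
ratio_base10 = earnings_ratio(hypothetical_coef, base=10)  # base-10 log

print(f"natural log: earnings multiplied by {ratio_natural:.2f}")
print(f"base-10 log: earnings multiplied by {ratio_base10:.2f}")
```

Same coefficient, very different dollar story depending on the base, which is one reason the lay reader might want the translation made explicit.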

However, one brilliant thing about using "log earnings" instead of raw earnings is that it allows for large spans to be concealed in what otherwise appear to be short distances.  For example, Figure 5 in Economic Value is ostensibly used to show that "[r]ecent premiums for young law graduates are within historical norms."

Yet, note that as we pointed out in Part 3, the 2008-2013 grouping has an incredibly small number of observations.  Look at the differences in log earnings represented in Figure 5, and pay particular attention to the confidence intervals.  Simkovic & McIntyre try to demonstrate that there's not much variance with an "average premium" of 0.56 across the observations.  But the confidence intervals range from <0.2 in 2004-2007 to ~1.0 for 2000-2003.  Moreover, because logarithms are the inverse of exponentials, very small changes in log earnings actually mean large shifts when it comes to real numbers that people actually have to buy things with in real life.  For example, the difference represented in Figure 5 between the 2000-2003 premium and the 2004-2007 premium is 0.29.  Well, using the base-10 log on your calculator, log 100000 - log 50000 = 0.3 or so.  That's a lot of variance to claim anything is "within historical norms" or steady across a small number of observations.
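Here is that arithmetic as a quick sketch anyone can rerun. The 100000 and 50000 figures and the 0.29 gap are the ones from the paragraph above. Which log base Figure 5 uses is an assumption on my part: the calculator example is base 10, while natural logs are the convention in labor economics, so the gap is converted both ways.

```python
import math

# The calculator example: base-10 logs of 100000 and 50000
gap_in_example = math.log10(100000) - math.log10(50000)

# The 0.29 gap between the 2000-2003 and 2004-2007 premiums
figure5_gap = 0.29
ratio_if_base10 = 10 ** figure5_gap        # if the logs are base 10
ratio_if_natural = math.exp(figure5_gap)   # if the logs are natural logs

print(f"log10(100000) - log10(50000) = {gap_in_example:.3f}")
print(f"0.29 in base 10 is a factor of {ratio_if_base10:.2f}")
print(f"0.29 in natural logs is a factor of {ratio_if_natural:.2f}")
```

Under base 10 the gap is close to a doubling; under natural logs it is closer to a third. Either way it is a multiplier on real dollars rather than a rounding error.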

But still, using log earnings and a linear regression is fairly well accepted in the economic literature.  There isn't too much controversial (or ingenious) in the basic idea. 

A better question, though, is what genius path led Simkovic & McIntyre to reject what appears to be an existing common approach in the literature to human capital theories?  I ask this question not to criticize, but to comprehend.

For several decades, labor economists have used Mincer's Human Capital Earnings Function to evaluate the relationship between education, earnings, and experience.  Card - again, cited by Simkovic & McIntyre - discusses this literature at length.

The basic Mincer Formula (which has been expanded in other literature) is this:

Log(earnings) = log(baseline earnings) + a(years of schooling) + b(work experience) + c(work experience)^2
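To make the formula concrete, here is a small sketch that just evaluates it. The coefficient values are hypothetical placeholders I picked for illustration (they are not estimates from Mincer, Card, or Economic Value), and the function name is my own.

```python
def mincer_log_earnings(log_baseline, a, b, c, schooling_years, experience_years):
    """Basic Mincer earnings function:
    log(earnings) = log(baseline) + a*S + b*X + c*X^2
    where S is years of schooling and X is years of work experience."""
    return (log_baseline
            + a * schooling_years
            + b * experience_years
            + c * experience_years ** 2)

# Hypothetical coefficients, for illustration only
params = dict(log_baseline=9.0, a=0.10, b=0.04, c=-0.0007)

# Same experience, three extra years of schooling (16 vs 19 years)
bachelors = mincer_log_earnings(**params, schooling_years=16, experience_years=5)
three_more = mincer_log_earnings(**params, schooling_years=19, experience_years=5)
schooling_gap = three_more - bachelors  # equals 3 * a in this basic form

print(f"log-earnings gap from three extra years: {schooling_gap:.2f}")
```

Note that in this basic form the gap from three more years of schooling is just 3 times a, regardless of experience, because schooling enters the formula as a straight line.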

As fatally flawed as their SIPP data is, Simkovic & McIntyre had everything they needed to calculate the added earnings of three JD years using this formula or some custom derivation from it appropriately modified to the "law degree" market.

For example, one of the most common problems identified with the Mincer equation is that log earnings may not have a linear relationship with years of education (which is what results in the Mincer formula for people who have no experience).  In other words, certain years of training may "pay off" more than others in terms of developing human capital.  This possibility will be discussed more in Part 7.

Several scholars (including at least one, Card, cited by Simkovic & McIntyre) have found a convex relationship between earnings and education.  As Lemieux (2003) suggests upon reviewing Mincer (1997), log earnings may only be linear in schooling in a stable environment where labor demand meets the incoming labor supply, pointing out that the relationship was convex where the demand for skilled labor outpaced the existing supply.  Lemieux suggests adding an additional polynomial to the base model, and other researchers have played around with numerous formulations.  There are people doing integral calculus with this shit in the 2000s.
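A rough sketch of what "convex" buys you, under a loud assumption: I am treating the additional polynomial as a squared years-of-schooling term, which is my simplification and not a quote of Lemieux's actual specification. The coefficients are invented. With a positive squared term, each additional year of school adds more to log earnings than the year before it.

```python
def marginal_return_to_schooling(a1, a2, schooling_years):
    """Slope of a1*S + a2*S^2 with respect to S, i.e. the change in
    log earnings from one more (small) unit of schooling at level S."""
    return a1 + 2 * a2 * schooling_years

# Hypothetical coefficients; a2 > 0 makes the relationship convex
a1, a2 = 0.05, 0.002

return_at_12 = marginal_return_to_schooling(a1, a2, 12)  # end of high school
return_at_16 = marginal_return_to_schooling(a1, a2, 16)  # end of college
return_at_19 = marginal_return_to_schooling(a1, a2, 19)  # three more years

print(f"{return_at_12:.3f} {return_at_16:.3f} {return_at_19:.3f}")
```

Set a2 to zero and all three numbers collapse to the same value, which is the linear assumption.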

So what formula do Simkovic and McIntyre use?

Log(earnings) = a(dummy for law degree) + monomial control variables and constants

What's missing?  Well, in short, they've seemingly done away with experience being a variable in their basic model despite it being a crucial variable in decades of labor economics research on the relationship between education and earnings.  Whereas the Mincer formula would not only factor experience as a key variable in the evaluation of log earnings but also include its square, Simkovic & McIntyre seem to take a more circuitous route to addressing the role of experience in lawyer earnings.  They've also (by the lack of any polynomials) assumed log earnings and education have a fundamentally linear relationship when that remains an open question.

On one hand, the scant mention and departure from Mincer and his progeny is odd - almost like ignoring strongly persuasive authority from multiple federal appeals courts, but again we must understand what Simkovic & McIntyre are doing:  simplifying the mystical and clarifying the sloppily obfuscated.  So what if they started from scratch with a lousy dataset and ignored what might be the most important variable in explaining earnings in the labor economics literature?

That they took a deeper mathematical approach than prior studies alone should win them praise; that they managed to sidestep certain issues with a deft use of numbers should win them super-tenure.

Again, corrections are welcome.  As a lawyer, and not an economist, statistician, or professor of law, my station is inferior and I would welcome a greater understanding of Simkovic & McIntyre's keen analysis.

1 comment:

  1. Thank you for discussing log functions, and how implicit it is that small variations in logs result in huge changes in absolute terms. That's fine if you are plotting decibels against frequency, say, and means very little when you are talking about, say, money. I swear S&M uses this to sound impressive and obfuscate, not because it is mathematically necessary or helpful.

    "We've taken the second-derivative of the log-base 10 of salary data with respect to time, and one can clearly see that nothing has changed in the slope for the intervening 50 years...! Therefore, we are right, everything is just fine, so go away and stop asking questions."