We continue our exploration of Economic Value. This one dives more deeply into shallow mathematics - or is it takes a shallow look at deep mathematics? Who the hell knows? This week I'm savoring the million dollar premium in a low-cost rental in hick country drinking glorified furniture polish, so I'm not inclined to care.
Most academic writers try to work from existing streams of thought in a particular area. Such a task - particularly in an area like economics - often takes the work outside the realm of comprehensibility for the general reading public. Simkovic & McIntyre steer clear of firmly placing their work within existing literature on human capital theory, but at the same time keep the work appropriately out of reach of the most unsophisticated readers, walking a tightrope that is the hallmark of geniuses like Beethoven or Stephen King.
For example, despite being written for what is mostly a non-scientific audience (let's not kid ourselves, right?), Economic Value gives rather short shrift to explaining its methods in approachable terms. So let's start by looking at the basic statistical approach on a more introductory level. One example of a term that goes under-explained is "Ordinary Least Squares" ("OLS"). The Wikipedia entry gives a fair, if dense, explanation. It's a standard method of regression analysis in research.
In layman's terms, the goal of regression analysis is this: you have what's called a dependent variable (in this particular case, earnings), you take all of the independent variable observations you have (in our case, whether there's a "law degree" and a bunch of other factors that apply to a person), and you try to find a line, represented by a mathematical formula, that best "fits" the data and so explains the relationship between all those other variables (education, experience, skills, race, gender, etc.) and earnings. The ultimate aim is a formula where you can plug in values for the independent variables, even values not seen in the data, and generate a predicted output.
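For the curious, here is a minimal sketch of that "best fit line" idea with a single independent variable. The experience and earnings numbers are made up for illustration and have nothing to do with the data in Economic Value; `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Plug a value into the fitted formula."""
    return intercept + slope * x


# Made-up observations: years of experience vs. earnings in thousands of dollars.
experience = [1, 3, 5, 7, 9]
earnings = [42, 50, 61, 66, 78]

intercept, slope = fit_line(experience, earnings)
# Six years of experience never appears in the data, but the formula still answers.
predicted_at_6 = predict(intercept, slope, 6)
print(intercept, slope, predicted_at_6)
```

Real studies use many independent variables at once, but the principle is the same: pick the coefficients that make the squared misses as small as possible.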
In their data modelling, Simkovic & McIntyre use what's called an independent "dummy variable" in the formula for whether the person has a law degree or not. What this means is that when the person has only a bachelor's degree, the dummy variable is set to 0 and the dependent variable (earnings) can be derived from all the other explanatory variables (like years of experience or some variable to represent raw ability). When the person has a "law degree," their dummy variable is set to 1.
The coefficient that goes with the dummy variable is thus what they ultimately use to determine their earnings premium, although the precise translation from this coefficient to actual numbers isn't crystal clear to the lay reader.
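One hedged way to see that translation is the stripped-down case where the dummy is the only explanatory variable: the OLS coefficient is then simply the gap between the two groups' average log earnings. The sketch below assumes natural logarithms (the usual convention in labor economics, though that is my assumption here) and uses made-up salaries; the function names are hypothetical, and this is not the authors' actual procedure, which includes other controls.

```python
import math


def dummy_coefficient(log_earnings, has_law_degree):
    """With a single 0/1 regressor, the OLS coefficient equals the
    difference between the two groups' mean log earnings."""
    ones = [y for y, d in zip(log_earnings, has_law_degree) if d == 1]
    zeros = [y for y, d in zip(log_earnings, has_law_degree) if d == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


def premium_as_ratio(coefficient):
    """Turn a natural-log coefficient back into a ratio of raw earnings."""
    return math.exp(coefficient)


# Made-up salaries: three bachelor's-only earners, three "law degree" earners.
salaries = [40_000, 50_000, 60_000, 70_000, 90_000, 110_000]
law_dummy = [0, 0, 0, 1, 1, 1]

coef = dummy_coefficient([math.log(s) for s in salaries], law_dummy)
print(coef, premium_as_ratio(coef))

# Under the natural-log assumption, the 0.56 "average premium" discussed
# later in this post would correspond to this earnings ratio:
print(premium_as_ratio(0.56))
```

Note that a coefficient of 0.56 is not "56 percent more"; under natural logs it works out to a ratio of roughly 1.75.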
Another phrase often used, and not thoroughly explained, is "log earnings." "Log" stands for "logarithm," which can be explained here for those of you who quit before that particular math class. Basically hit the log button on your calculator and then type an amount of annual earnings.
Simkovic & McIntyre use log earnings instead of simple dollar-value earnings because there is a tradition in quantitative labor economics to use log earnings as a dependent variable (one can read the Card article cited in Economic Value for an explanation).
However, one brilliant thing that using "log earnings" instead of raw earnings does is allow large spans to be concealed in what otherwise appear to be short distances. For example, Figure 5 in Economic Value is ostensibly used to show that "[r]ecent premiums for young law graduates are within historical norms."
Yet, note that as we pointed out in Part 3, the 2008-2013 grouping has an incredibly small number of observations. Look at the differences in log earnings represented in Figure 5, and pay particular attention to the confidence intervals. Simkovic & McIntyre try to demonstrate that there's not much variance with an "average premium" of 0.56 across the observations. But the confidence intervals range from <0.2 in 2004-2007 to ~1.0 for 2000-2003. Moreover, because logarithms are the inverse of exponentials, very small changes in log earnings actually mean large shifts when it comes to real numbers that people actually have to buy things with in real life. For example, the difference represented in Figure 5 between the 2000-2003 premium and the 2004-2007 premium is 0.29. Well, on the calculator's base-10 log button, log 100000 - log 50000 = 0.3 or so. That's a lot of variance to claim anything is "within historical norms" or steady across a small number of observations.
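To check the arithmetic, here is a small sketch that converts a gap in log earnings back into a ratio of raw earnings. Which base Figure 5 uses is an assumption on my part: the calculator's log button is base 10, while labor economists conventionally use natural logs, so the sketch reports both. `ratio_from_log_gap` is a hypothetical helper name.

```python
import math


def ratio_from_log_gap(gap, base):
    """Undo the logarithm: a gap of `gap` log units means raw earnings
    differ by a factor of base ** gap."""
    return base ** gap


# The calculator example: $100,000 vs. $50,000 in base-10 logs.
gap_100k_vs_50k = math.log10(100_000) - math.log10(50_000)

# The 0.29 gap between the 2000-2003 and 2004-2007 premiums in Figure 5.
FIGURE_GAP = 0.29
ratio_if_base_10 = ratio_from_log_gap(FIGURE_GAP, 10)
ratio_if_natural = ratio_from_log_gap(FIGURE_GAP, math.e)

print(gap_100k_vs_50k, ratio_if_base_10, ratio_if_natural)
```

Read as base-10 logs, 0.29 is close to a doubling; read as natural logs, it is a swing of about a third. Either way it is not a rounding error.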
But still, using log earnings and a linear regression is fairly well accepted in the economic literature. There isn't too much controversial (or ingenious) in the basic idea.
A better question, though, is what genius path led Simkovic & McIntyre to reject what appears to be an existing common approach in the literature to human capital theories? I ask this question not to criticize, but to comprehend.
For several decades, labor economists have used Mincer's Human Capital Earnings Function to evaluate the relationship between education, earnings, and experience. Card - again, cited by Simkovic & McIntyre - discusses this literature at length.
The basic Mincer Formula (which has been expanded in other literature) is this:
Log(earnings) = log(baseline earnings) + a(years of schooling) + b(work experience) + c(work experience)^2
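To make the formula concrete, here is a minimal sketch that evaluates it. The coefficient values for a, b, and c are hypothetical placeholders chosen only for illustration, not estimates from Mincer, Card, or Economic Value, and I am assuming natural logs.

```python
import math


def mincer_log_earnings(baseline, schooling, experience,
                        a=0.10, b=0.04, c=-0.0008):
    """Basic Mincer form:
    log(earnings) = log(baseline) + a*schooling + b*experience + c*experience^2
    The default a, b, c are hypothetical placeholders."""
    return (math.log(baseline)
            + a * schooling
            + b * experience
            + c * experience ** 2)


# Compare 16 years of schooling (bachelor's) with 19 (three more years),
# holding experience fixed at 10 years and using a made-up baseline.
ba = mincer_log_earnings(20_000, schooling=16, experience=10)
jd = mincer_log_earnings(20_000, schooling=19, experience=10)

added_log_earnings = jd - ba             # equals 3 * a in this linear form
earnings_ratio = math.exp(added_log_earnings)
print(added_log_earnings, earnings_ratio)
```

In this basic form every extra year of schooling adds the same amount, a, to log earnings, and experience enters twice: once on its own and once squared.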
As fatally flawed as their SIPP data is, Simkovic & McIntyre had everything they needed to calculate the added earnings of three JD years using this formula or some custom derivation from it appropriately modified to the "law degree" market.
For example, one of the most common problems identified with the Mincer equation is that log earnings may not have a linear relationship with years of education (which is what results in the Mincer formula for people who have no experience). In other words, certain years of training may "pay off" more than others in terms of developing human capital. This possibility will be discussed more in Part 7.
Several scholars (including at least one, Card, cited by Simkovic & McIntyre) have found a convex relationship between earnings and education. As Lemieux (2003) suggests upon reviewing Mincer (1997), log earnings may only be linear in a stable environment where labor demand meets the incoming labor supply, pointing out that log earnings was convex where the demand for skilled labor outpaced the existing supply. Lemieux suggests adding an additional polynomial to the base model, and other researchers have played around with numerous formulations. There are people doing integral calculus with this shit in the 2000s.
So what formula do Simkovic and McIntyre use?
Log(earnings) = a(dummy for law degree) + monomial control variables and constants
What's missing? Well, in short, they've seemingly done away with experience being a variable in their basic model despite it being a crucial variable in decades of labor economics research on the relationship between education and earnings. Whereas the Mincer formula would not only factor experience as a key variable in the evaluation of log earnings but factor it quadratically (through the squared term), Simkovic & McIntyre seem to take a more circuitous route to addressing the role of experience in lawyer earnings. They've also (by the lack of any polynomials) assumed log earnings and education have a fundamentally linear relationship when that remains an open question.
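Why would leaving out a variable like experience matter? Here is a toy sketch on synthetic, noise-free numbers that I invented; it says nothing about what Simkovic & McIntyre's estimates would actually do. In this fake world the true premium is 0.5 log points, experience pays 0.03 per year, and the degree holders happen to have more experience. The controlled estimate uses the standard residual trick (Frisch-Waugh-Lovell): strip experience out of both the dummy and log earnings, then regress what is left.

```python
def fit_line(xs, ys):
    """One-predictor ordinary least squares: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys):
    """What is left of ys after removing its best linear fit on xs."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Synthetic world (all values invented for illustration).
TRUE_PREMIUM = 0.5
RETURN_TO_EXPERIENCE = 0.03
law = [0, 0, 0, 0, 1, 1, 1, 1]
experience = [2, 4, 6, 8, 10, 12, 14, 20]
log_earn = [10 + TRUE_PREMIUM * d + RETURN_TO_EXPERIENCE * x
            for d, x in zip(law, experience)]

# Dummy only: the coefficient soaks up the experience gap between groups.
_, naive = fit_line(law, log_earn)

# Controlling for experience via residuals recovers the true premium.
_, controlled = fit_line(residuals(experience, law),
                         residuals(experience, log_earn))
print(naive, controlled)
```

In this made-up example the dummy-only coefficient lands well above the true 0.5 because it also captures the nine extra years of average experience. Whether anything similar happens in the real data depends on how the other controls are handled.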
On one hand, the scant mention and departure from Mincer and his progeny is odd - almost like ignoring strongly persuasive authority from multiple federal appeals courts, but again we must understand what Simkovic & McIntyre are doing: simplifying the mystical and clarifying the sloppily obfuscated. So what if they started from scratch with a lousy dataset and ignored what might be the most important variable in explaining earnings in the labor economics literature?
That they took a deeper mathematical approach than prior studies alone should win them praise; that they managed to sidestep certain issues with a deft use of numbers should win them super-tenure.
Again, corrections are welcome. As a lawyer, and not an economist, statistician, or professor of law, my station is inferior and I would welcome a greater understanding of Simkovic & McIntyre's keen analysis.
Thank you for discussing log functions, and how implicit it is that small variations in logs result in huge changes in absolute terms. That's fine if you are plotting decibels against frequency, say, and means very little when you are talking about, say, money. I swear S&M uses this to sound impressive and obfuscate, not because it is mathematically necessary or helpful.
"We've taken the second-derivative of the log-base 10 of salary data with respect to time, and one can clearly see that nothing has changed in the slope for the intervening 50 years...! Therefore, we are right, everything is just fine, so go away and stop asking questions."