Factor Investors Strongly Believe in History (Part 1)

If you’re at all interested in investing, you’ve probably heard something about “factor investing”. It’s not a term I’ve used much in Mindfully Investing posts, but many of the investing questions I’ve addressed could be re-framed as questions about investing “factors”. In this two-part series, I’ll describe factor investing and discuss whether the mindful investor should pursue this strategy.

What’s a Factor?

A factor is any characteristic of a company or its stock that might have a predictable relationship to that stock’s returns over time. Although “factor” is a relatively recent term of art in the investing world, the roots of factor investing go back to at least the early 20th century, when John Burr Williams created a method to estimate stock prices based on a company’s intrinsic value. Famous investors like Benjamin Graham and Warren Buffett used characteristics like hidden cash on the books and predictable cash flows to help select stocks.

The name “factors” came from several foundational studies starting in the 1970s that identified multiple characteristics with an apparent linkage to superior stock returns. Funds that focus on stocks with similar factors now account for 10% of the U.S. stock market capitalization. Most of these funds attempt to achieve returns or risk-adjusted returns that are superior to the broader market, which is often loosely referred to as a factor “premium”.

Five of the most commonly cited factor premiums are shown in this table.

Factor Name	Hypothesis	Type	Common Metrics
Size	Small-cap stocks perform better than large-cap stocks	Fundamental	Market Capitalization
Value	Stocks with low price per value metrics perform better than those with high metrics	Fundamental	Price/Book Value, Price/Earnings, Price/Free Cash Flow
Quality	Stocks with healthy accounting metrics perform better than those with unhealthy metrics	Fundamental	Profitability, Margins, Leverage, Financial Constraints and Distress, Earnings Stability, Accounting Quality
Momentum	Stocks with recent price increases will perform better than those with recent price decreases	Price Movement	Stock Prices Over Time
Volatility	Stocks with lower price volatility perform better than those with higher volatility	Price Movement	Stock Prices Over Time

If factors like these provide an exciting premium above the boring broad-market return, why doesn’t everyone invest in factor funds? One clue is that factor premiums are also generally referred to as “risk premiums”. That is, increasing your portfolio exposure to any of these factors to obtain increased returns will also likely increase your risks, particularly in the form of increased volatility. But mindful investors aren’t particularly worried about small increases in volatility. So, wouldn’t that make factor investing ideal for the mindful investor? To answer that question, we need to consider the historical evidence supporting the existence of factor premiums.

Limited Historical Data

I’ve written in the past about the difficulty of using historical data to predict the future. For good reason, every investing brochure has the disclaimer, “past results do not guarantee future returns”. The total reliable stock market history in the U.S. is about 147 years. Nowadays, a person’s retirement investing time frame can easily exceed 50 years (for example, starting at age 30 and living to 90). So, the total U.S. stock history represents roughly three non-overlapping lifetime investing periods at most. That’s a pretty small sample size to make predictions from, but that doesn’t stop people from trying.

Further, much of the historical data has been cobbled together in ways that introduce more variables to the analysis. Consider Edward McQuarrie’s stunning example of the history of the small-cap data from the Center for Research in Security Prices (CRSP) database, which is one of the most commonly used data sets for factor analyses. Rarely mentioned facts about the CRSP small-cap data include:

Prior to 1972, 95% of small-cap stocks are not included in the data set!
Most of the 5% that are included were actually former large-cap companies that had fallen on hard times.
Much of the legendary long-term superior performance of small-cap stocks before 1945 was due to a few of these once-large companies recovering from the depths of the depression.

In other words, the data on “small caps” before 1972 is of questionable value for predicting small-cap performance in the future, and for this reason, many of the size factor analyses you’ll see go back no further than 45 years. This represents just a single lifetime investment period or a sample size of exactly one. Not much to work with.

Although I didn’t attempt to drill into the data history of every factor, I tend to agree with Jack Bogle’s view that there are inherent difficulties with assembling consistent historical factor data, particularly the further back you go in history.

Data Mining at the Factor Zoo

Faced with these data limitations and strong business incentives to “improve” investing methods, questionable factor studies have flourished in the last decade. Wesley Gray recently pointed out some of the more problematic methods that are particularly prevalent in finance and factor research:

Researchers and publishers are most excited by positive results and rarely publish negative results.
P-hacking is a broad term for various statistical manipulations that can cause factor premiums to appear more robust and pervasive.
Although these problems exist in many scientific fields, economics, business, and social science research are among the worst offenders, and most factor research falls into one or more of these categories.
Data mining is another broad term that’s sometimes used interchangeably with p-hacking. In my view, data mining is more about torturing the data until it confesses some positive relationship, without considering whether such a relationship is reasonable or not.

A recent paper explicitly engaged in large-scale data mining, in part to illustrate how absurd the results can be. The researchers evaluated two million stock factors and produced over 20,000 “significant” factors using standard statistical thresholds. Within that group, they identified 17 factors (many of which had never been described before) that showed much stronger statistical relationships, but all of which had no theoretical or common-sense underpinning. One example is a premium based on sorting stocks into common versus ordinary stocks, subtracting out retained earnings and other adjustments, and then dividing by advertising expense. If that makes sense to you as a cause of excess stock returns, please leave a comment and explain it to me. In other words, these “new factors” were just lucky high correlations between two million random combinations of historical stock metrics. There’s no underlying reason to believe these premiums really exist now or will persist into the future.
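To get a feel for why a big enough search always turns up “significant” factors, here is a minimal sketch in Python. It is not the method used in the paper. It simply generates made-up “factors” whose true premium is zero and counts how many clear a conventional t-statistic threshold by luck alone. The number of factors, the number of months, and the 1.96 threshold are all my own assumptions for illustration.

```python
import random
import statistics

def count_lucky_factors(n_factors, n_months, t_threshold=1.96, seed=0):
    """Count pure-noise 'factors' whose average premium looks significant."""
    rng = random.Random(seed)
    lucky = 0
    for _ in range(n_factors):
        # Monthly "premiums" drawn with a true mean of zero: no real factor exists.
        premiums = [rng.gauss(0.0, 1.0) for _ in range(n_months)]
        mean = statistics.fmean(premiums)
        std_err = statistics.stdev(premiums) / n_months ** 0.5
        # A simple one-sample t-statistic against a zero premium.
        if abs(mean / std_err) > t_threshold:
            lucky += 1
    return lucky

n_factors = 2000  # hypothetical search size, far smaller than two million
lucky = count_lucky_factors(n_factors, n_months=120)
print(f"{lucky} of {n_factors} random factors pass |t| > {1.96}")
```

With a threshold like this, roughly one in twenty random factors should pass the test, so scaling the search up to millions of candidates guarantees thousands of lucky “discoveries”.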

All this p-hacking and data mining is adding to an ever-growing “factor zoo”. A recent study reviewed 447 factor premiums in the research literature and found that:

286 (64%) of the 447 factors don’t meet standard statistical thresholds for validity.
Of the 161 remaining statistically significant factors, the magnitudes of the premiums are often much less than originally reported.
Using a different factor model, which the researchers claim is more robust, causes 115 of the 161 factor premiums to fail tests of statistical significance.
That leaves 46 factors (only 10%) that are replicable.
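The arithmetic behind that funnel is easy to check. This short Python sketch uses only the counts quoted above; the variable names are mine.

```python
total_factors = 447            # factors reviewed in the study
failed_standard_tests = 286    # don't meet standard statistical thresholds
failed_alternative_model = 115 # fail under the alternative factor model

significant = total_factors - failed_standard_tests
replicable = significant - failed_alternative_model

print(f"Failed standard tests: {failed_standard_tests / total_factors:.0%}")
print(f"Still significant: {significant}")
print(f"Replicable: {replicable} ({replicable / total_factors:.0%} of all factors)")
```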

Campbell Harvey is one researcher sounding the alarm at the factor zoo. He wrote in 2014 that:

“Most of the empirical research in finance is likely false. This implies that half the financial products (promising outperformance) that companies are selling to clients are false.”

Other researchers have made similar broad criticisms that much of the recent finance research and the resulting investing products are suspect.

Factor Persistence

Even when the historical data are relatively consistent and the statistical methods used are replicable, I’ve noted before that factor premiums are erratic and have waxed and waned over their short history. This table summarizes estimates from a few researchers of the percentage of time that positive factor premiums occurred during various time frames.

Study	Factor	1-Year	10-Year	Type	Span	Years
Morningstar	Value-Large Cap	50%	65%	Rolling	1990-2015	25
Morningstar	Value-Small Cap	54%	82%	Rolling	1990-2015	25
AAII	Size (Small)	56%	66%	Rolling	1926-1996	70
The BAM Alliance	Size (Small)	59%	77%	Discrete	1927-2017	90
The BAM Alliance	Value	63%	86%	Discrete	1927-2017	90
The BAM Alliance	Momentum	72%	97%	Discrete	1927-2017	90

One implication from these studies is that the longer you invest, the greater your chances of realizing the desired factor premium. However, Charlie Bilello points out that small-cap stocks outperformed large-cap stocks in about 78% of the years from 1979 to 2015, yet small caps had a lower annualized return over that same period (11.4% versus 11.7% for large caps). So, the percentage of years with positive premiums is no guarantee of better overall returns.
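A toy example shows how this can happen. In the Python sketch below, the return series are invented for illustration (they are not Bilello’s data): the “factor” beats the benchmark in three years out of four, but one bad year is enough to leave it with the lower annualized return.

```python
def cagr(returns):
    """Compound annual growth rate for a list of yearly returns (0.10 = 10%)."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(returns)) - 1.0

def win_rate(factor, benchmark):
    """Fraction of years in which the factor return beat the benchmark return."""
    wins = sum(1 for f, b in zip(factor, benchmark) if f > b)
    return wins / len(factor)

# Hypothetical yearly returns, made up for this illustration.
factor_returns = [0.10, 0.10, 0.10, -0.30]
benchmark_returns = [0.08, 0.08, 0.08, 0.00]

print(f"Factor wins in {win_rate(factor_returns, benchmark_returns):.0%} of years")
print(f"Factor CAGR:    {cagr(factor_returns):.2%}")
print(f"Benchmark CAGR: {cagr(benchmark_returns):.2%}")
```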

Chances of a Premium – Using annual returns data from Portfolio Visualizer, I calculated the annualized returns (Compound Annual Growth Rate or CAGR) for all possible investing start and stop dates from 1972 to 2018 for large-cap, small-cap, and large-cap value stocks. The resulting investing periods vary in length from 1 year to the entire 46-year span of this data set, with every possible combination of time spans in between. This resulted in annualized returns for a total of 1128 possible investing periods. I like this method of looking across many time spans because it gives a more realistic assessment of the probabilities of success. It’s easy to plan to invest for 10 or 20 years, but life often intervenes with emergencies, new business opportunities, and surprise expenses that cause well-intentioned investors to sell before they planned.
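For readers who want to try this themselves, here is a minimal sketch of the approach in Python, assuming you have one annual return per calendar year. The function names and the placeholder returns are my own; the placeholder is not the Portfolio Visualizer data. Note that 47 annual returns (1972 through 2018 inclusive) give 47 × 48 / 2 = 1128 start/stop combinations, which matches the count above.

```python
def cagr(returns):
    """Compound annual growth rate for a list of yearly returns (0.10 = 10%)."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(returns)) - 1.0

def all_window_cagrs(annual_returns, first_year):
    """Map every (start_year, end_year) pair, inclusive, to its CAGR."""
    windows = {}
    n = len(annual_returns)
    for start in range(n):
        for stop in range(start + 1, n + 1):
            key = (first_year + start, first_year + stop - 1)
            windows[key] = cagr(annual_returns[start:stop])
    return windows

# Hypothetical placeholder: a flat 5% every year, NOT real market data.
placeholder_returns = [0.05] * 47
windows = all_window_cagrs(placeholder_returns, first_year=1972)
print(len(windows))  # 1128 possible investing periods
```

Running the same function on a factor series and on a large-cap series, then subtracting the two CAGRs window by window, gives the premium for every possible investing period.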

Here are histograms of the premiums (factor annualized returns minus large-cap annualized returns) for the size factor (small caps) and the value factor. In both cases, the most likely historical result was a factor premium of between 0 and 1%. (Note these are annualized premiums over the entire investing period, not annual premiums received each year.) For small-cap, 28% of the outcomes were a negative premium, and for value, 25% of the outcomes were negative.

Put another way, a negative annualized premium between 0 and -1% was the third most likely historical outcome for value and the fourth most likely outcome for small-cap.
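If you have a list of annualized premiums (one per investing period), sorting them into 1% buckets and counting the negative outcomes takes only a few lines. This Python sketch assumes the premiums are expressed as decimals; the sample values are invented and are not the results plotted in my histograms.

```python
import math
from collections import Counter

def premium_histogram(premiums, bin_width=0.01):
    """Count premiums per bin; bin k covers k*bin_width up to (k+1)*bin_width."""
    return Counter(math.floor(p / bin_width) for p in premiums)

def share_negative(premiums):
    """Fraction of investing periods that ended with a negative premium."""
    return sum(1 for p in premiums if p < 0) / len(premiums)

# Hypothetical annualized premiums (factor CAGR minus large-cap CAGR).
sample_premiums = [0.004, 0.007, 0.015, -0.003, -0.012, 0.022, 0.009, -0.006]

histogram = premium_histogram(sample_premiums)
for bin_index in sorted(histogram):
    low, high = bin_index, bin_index + 1
    print(f"{low:+d}% to {high:+d}%: {histogram[bin_index]} periods")
print(f"Negative premiums: {share_negative(sample_premiums):.1%}")
```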

Negative Premium Durations – Another way to look at the historical persistence of premiums is to see how long past investors had to endure negative premiums. Using the same data set of annualized returns from above, I looked at how long the returns of the small-cap and value factors were less than the annualized returns for large-cap stocks (negative premiums) as shown in these two graphs. Each date on the horizontal axis of the graphs represents the results for someone buying into the market on that date and holding through 2018. So, the investing periods get shorter as you move to the left side of the graphs.

A small-cap investor who started in 1984 would have had to endure nearly 30 straight years of underperformance (or nearly 95% of their 34-year investing period) as compared to the annualized returns from the S&P 500. And for investors who started between 1985 and 1994, a negative streak of 10 years was a common occurrence. The picture is a little better for value investors. The longest streak of negative value premiums is around 10 years, but some of those negative streaks amounted to 70% or even 90% of the total investing period.
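Here is one way a negative streak could be measured, sketched in Python. This is a simplified reading of the approach, not necessarily the exact calculation behind my graphs: for an investor starting in a given year, it tracks cumulative growth of the factor and of large-cap stocks and finds the longest run of consecutive years in which the factor’s since-start result trailed. Comparing cumulative growth over the same period is equivalent to comparing annualized returns. The return series are made up.

```python
def longest_negative_streak(factor_returns, benchmark_returns):
    """Longest run of years in which the factor's growth since the start trailed the benchmark's."""
    factor_growth = 1.0
    benchmark_growth = 1.0
    longest = 0
    current = 0
    for f, b in zip(factor_returns, benchmark_returns):
        factor_growth *= 1.0 + f
        benchmark_growth *= 1.0 + b
        if factor_growth < benchmark_growth:
            # Since-start premium is negative at the end of this year.
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest

# Hypothetical yearly returns for one start date, made up for illustration.
factor = [0.05, -0.10, 0.08, 0.12, 0.30, 0.02]
benchmark = [0.07, 0.05, 0.06, 0.10, 0.05, 0.04]

streak = longest_negative_streak(factor, benchmark)
print(f"Longest negative streak: {streak} of {len(factor)} years")
```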

The short history of small-cap and value factors tells us that if you’re going to invest based on factors, you have a decent chance of disappointment, potentially for very long periods.

Selecting the “Best” Factors

Given the see-saw performance of factors, how is a factor investor supposed to decide which factors to pursue? Even using superior statistical methods and thresholds, we’re left with dozens of potentially actionable factors that have sometimes worked and sometimes not. Larry Swedroe’s book on factor investing emphasizes five criteria for prioritizing factors:

Persistent: The factor holds across long periods of time and different economic regimes.
Pervasive: The factor holds across countries, regions, sectors, and even asset classes.
Robust: The factor holds for various definitions and measures (like price/book versus price/earnings for the value factor).
Investable: A real-world investor can practically make money with the factor allowing for investing costs and other implementation issues.
Intuitive: There are logical risk-based or behavioral-based explanations for the factor and why it should continue to exist.

This provides a helpful framework to sift through the factor zoo, and Swedroe arrives at a select list of factors that are pretty similar to the table at the top of this post. But some of these criteria feel inherently subjective. How pervasive does a factor have to be? How many definitions and measures need to agree before we judge the factor as sufficiently robust?

And what’s a “long period” of time for persistence? Many of these studies assume that the on-and-off-again existence of a factor premium for one investing lifetime (around 40 or 50 years) is sufficient to be “persistent”. Is that track record really long enough?

A good illustration of this conundrum is that similar time spans are often used both to support the existence of a factor premium and to dismiss periods of factor underperformance. For example, Swedroe discussed whether the value premium is “really dead” given its poor performance over the last 10 years. Swedroe defends the value factor by saying:

The value premium has only been slightly negative for the last 10 years.
The premium is closer to zero if you look at a range of value metrics.
This is a relatively short period of underperformance that’s common in investing.
The value premium in the last 10 years was positive for most international markets.

In defense of the size, value, and momentum premiums, Cliff Asness at AQR looked at the 23-year period from 1992 to 2015. Because these three factors were mostly identified before 1992, Asness uses the returns after 1992 as an “out-of-sample” check of the original research results. Asness finds a positive but slightly lower premium for these factors in the out-of-sample period, and he asserts that a 23-year record is more than adequate to resoundingly refute any critique of data mining.

In other words, Swedroe is claiming that 10 years is too soon to declare the defeat of the value factor, and Asness is claiming that 23 years is more than enough time to declare victory for the value factor, as well as two other factors. Somewhere in a mere 13-year window, we go from cooking in the too-soon-to-tell kitchen to serving up factors on the dinner table of certainty.

In a recent interview, Asness notes for the quality factor that “there is little evidence” in AQR’s research that it’s performing differently now than in “backtests” like the ones he presented for size, value, and momentum as summarized above. But Robert Novy-Marx, a professor at the University of Rochester’s Simon Business School, counters that:

“Even if it performed really poorly, you wouldn’t know. There’s just not enough out-of-sample time to make any claim one way or another.”

Conclusion

Will even the “persistent” and “robust” factors continue into the future? Is the historical data set long enough to demonstrate the likely efficacy of factor investing in the future? Are the odds favorable enough that you should pursue factor investing? Larry Swedroe answers by saying:

“So, before investing, be sure that you believe strongly in the rationale behind the factor and the reasons why you trust it will persist in the long run. Without this strong belief, it is unlikely that you will be able to maintain discipline during the inevitable long periods of underperformance…Finally, because there is no way to know which factors will deliver premiums in the future, we recommend that you build a portfolio broadly diversified across them.”

In other words, even if some robust factors continue into the future, you’re still going to need to diversify your bets and apply all your mindful investing skills to control your behavior. And a diversified set of factor exposures is likely to dilute any premiums you receive because some factors will be doing poorly while others are doing well. Because exposure to factors most often increases total portfolio volatility, factor investing will likely require even more mindfulness than simply investing in boring S&P 500 or all-market index funds.

Personally, I believe it’s possible that some of these factors will persist into the future, but I don’t believe “strongly” in any of them for the reasons I’ve laid out here. So, Swedroe would probably say I shouldn’t pursue factor investing, even though I think I could be mindful enough to ignore most market gyrations and live with any underperformance of a portfolio tilted towards certain factors. Because the factor premiums I looked at were most often in the 1 to 2% range, it seems likely that I’d still meet most of my investing goals with either an all-market or factor-tilted portfolio.

In my next post (Part 2 of this series), I’ll compare the finance perspective on factor investing to another perspective borrowed from an entirely different field of science. You’ll probably be surprised by what scientific field that is and the results of the comparison. But you’ll have to read my next post to find out.