Understanding Data Analysis

Two months ago Donald Trump shocked the world by winning the election. Regardless of your political persuasion or whom you supported, you were likely surprised by his win. Data analysis of elections is almost a national pastime, yet nearly all of the prominent models and analyses predicted a different outcome. Even the stock market priced in a different result, and there was a period of havoc while it corrected.

Similarly, people have built predictive models of stock market prices, climate change, day-to-day weather, and even your purchasing habits. Plenty of these models do predict outcomes accurately. However, the important thing to understand is that every model is based on assumptions.

For much of my last job I managed a team of data analysts, or data scientists as they are now more commonly called. It was my team’s task to build models to predict everything from breakdown rates of instruments to buying patterns of customers. Using these analyses, we would attempt to position our organization to respond to customer needs at minimum cost. Sometimes we got it right, in which case our engineers covered the customer need in a timely manner and at the lowest cost to our company. Other times we got it badly wrong and had to employ temps and other mitigating measures.


The difference between success and failure in these analyses usually came down to whether we had made the correct assumptions as the basis for our analysis. Had we correctly assumed the customer’s usage rate of their instrument? What about the cleanliness of their lab? Did the customer regularly engage in preventative maintenance? Answering these questions started us on the path to creating a model.

Mean, Median, and Distribution

There is a popular saying: “there are lies, damned lies, and statistics”. You can make statistics say almost anything with enough effort, simply by modifying the underlying assumptions. Take statistics on the typical net worth of society as a simple example. If you assumed that wealth was spread fairly evenly, you would use the mean, which is the sum of all values divided by their count. However, if you assumed that there was significant inequality, you might take the median, the middle value when everyone is ranked, because a handful of very large fortunes pulls the mean upward. But even that might not tell the story. Imagine a stratified society where most people are either very poor or very rich with no one in the middle. In that case you would not want to rely on either the mean or the median. Instead you would want to look at the whole distribution of net worth.
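
To make the polarized case concrete, here is a minimal Python sketch. The net worth figures are invented purely for illustration (they are not real data), and the `bucket` helper and its 250-wide buckets are my own hypothetical choices.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical net worths (in thousands of dollars) for a polarized society:
# six poor households, four rich ones, and nobody in the middle.
net_worths = [5, 8, 10, 12, 15, 20, 900, 950, 1000, 1100]


def bucket(value, width=250):
    """Return the lower edge of the fixed-width bucket that contains value."""
    return (value // width) * width


average = mean(net_worths)    # pulled upward by the rich cluster
middle = median(net_worths)   # lands inside the poor cluster
histogram = Counter(bucket(v) for v in net_worths)

print(f"mean:   {average}")
print(f"median: {middle}")
# A crude text histogram: one '#' per household in each bucket.
for lower_edge in sorted(histogram):
    print(f"{lower_edge:>5}-{lower_edge + 249:<5} {'#' * histogram[lower_edge]}")
```

With these made-up numbers the mean is 402 and the median is 17.5. Nobody in the sample is anywhere near 402, and 17.5 says nothing about the rich cluster. Only the histogram shows the two separate groups with an empty gap between them.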

Statistical Sample

The other major thing you need to consider is whether your sample truly represents the population. If you are predicting an election, for example, you’re likely basing your predictions on polling, social media, or both. The issue, of course, is that a large portion of society may not be represented in either. The idea that your sample represents society is in fact another one of those assumptions.

This is also why predicting individual stock outcomes is so hard. Even for the longest-running stocks we have a data set of perhaps 180 years. Yet 180 years contains only about 150 overlapping 30-year periods of the kind that would represent your investment horizon, and just six that do not overlap. It is extremely likely that some 30-year period in the future will not look like any of those previous periods. Because the data cannot represent every possible 30-year period, individual stock prediction is not particularly reliable. Overall economic data has a longer track record, which is why it is likely at least slightly more reliable. Still, even if the statistics somehow get it right, they have not told you “Why?”.
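
The window count is simple arithmetic, and a short Python sketch makes it easy to check. The function names are my own, and I am assuming one data point per year with a new window starting each year.

```python
def rolling_windows(years_of_data, horizon):
    """Count overlapping windows, one starting in each year that leaves room for a full horizon."""
    return max(years_of_data - horizon + 1, 0)


def independent_windows(years_of_data, horizon):
    """Count back-to-back windows that share no years with each other."""
    return years_of_data // horizon


years_of_data = 180   # approximate history of the longest-running stocks
horizon = 30          # investment horizon in years

overlapping = rolling_windows(years_of_data, horizon)
independent = independent_windows(years_of_data, horizon)

print(f"overlapping {horizon}-year windows: {overlapping}")
print(f"non-overlapping {horizon}-year windows: {independent}")
```

Under those assumptions there are 151 overlapping windows, in line with the figure of about 150, but only 6 that share no years. Neighboring overlapping windows have 29 of their 30 years in common, so the number of truly separate historical samples is closer to six than to 150.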

Why, or What Does the Data Mean?

Often we look at data and see one thing happen at the same time as another. If that happens frequently enough, we call the two things correlated. Correlation is helpful because it tells us how consistently two things occur together (or don’t). What it can’t tell you is causation. What truly caused something can only be determined with tools like the scientific method and controlled testing. Why? Because even if two things occur together, they may be doing so by coincidence. They might also both be caused by some altogether different phenomenon. I saw this regularly when we looked at a data set superficially before splitting it by things like customer type and region. Before splitting, you might see an improvement that seemed to follow from an action; after splitting, you could see that the change was caused by a demographic variable, not by the action. Those scientific tests are what tell you which variables to control for. They are also what should ultimately define the assumptions I mentioned a few paragraphs ago.
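
Here is a small Python sketch of how splitting the data can make an apparent improvement disappear. The records are entirely made up for illustration; the customer types, counts, and the `breakdown_rates` helper are all hypothetical and not taken from my old team’s data.

```python
from collections import defaultdict

# Hypothetical service records: (customer_type, period, instruments, breakdowns).
# "before" and "after" refer to some action we took, such as a new service plan.
records = [
    ("academic", "before", 800, 160),
    ("academic", "after", 200, 40),
    ("industrial", "before", 200, 10),
    ("industrial", "after", 800, 40),
]


def breakdown_rates(rows, key):
    """Sum instruments and breakdowns per group, then return breakdowns / instruments."""
    totals = defaultdict(lambda: [0, 0])
    for customer_type, period, instruments, breakdowns in rows:
        group = key(customer_type, period)
        totals[group][0] += instruments
        totals[group][1] += breakdowns
    return {group: broken / total for group, (total, broken) in totals.items()}


# Superficial view: ignore customer type entirely.
overall = breakdown_rates(records, key=lambda ctype, period: period)
# Split view: control for customer type.
by_segment = breakdown_rates(records, key=lambda ctype, period: (ctype, period))

for group, rate in overall.items():
    print(f"{group}: {rate:.0%}")
for group, rate in by_segment.items():
    print(f"{group}: {rate:.0%}")
```

In this invented data the overall breakdown rate falls from 17% to 8% after the action, which looks like a big win. Split by customer type, though, the rate is 20% for academic labs and 5% for industrial labs both before and after. The only thing that changed was the customer mix.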


The final important aspect of data analysis is benchmarking. Benchmarking means choosing something comparable to measure your numbers against. Say you’re looking at a company and trying to decide whether it is using too much leverage. In this case you would compare the company to others in its industry to determine the true situation. Looking at the company in a vacuum would not take into account whether it is an asset-dependent business like a railroad. Comparing in context would tell you which company in an industry is the better bet to succeed. The risk, again, lies in the assumptions that led you to the comparison. Should Airbnb, for example, be compared to Marriott? Visa to Amex (only one of which actually provides the lending behind the card)?
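
As a rough sketch of the leverage comparison, the Python below measures a company against the median of its own industry. The company names and debt-to-equity ratios are invented, and using the peer median as the benchmark is just one reasonable assumption, not the only way to do it.

```python
from statistics import median

# Hypothetical debt-to-equity ratios, grouped by industry.
peers = {
    "railroad": {"RailCo A": 1.4, "RailCo B": 1.6, "RailCo C": 1.8},
    "software": {"SoftCo A": 0.2, "SoftCo B": 0.3, "SoftCo C": 1.5},
}


def leverage_vs_peers(industry, company):
    """Return the company's debt-to-equity ratio as a multiple of its industry median."""
    ratios = peers[industry]
    benchmark = median(ratios.values())
    return ratios[company] / benchmark


rail_multiple = leverage_vs_peers("railroad", "RailCo B")
soft_multiple = leverage_vs_peers("software", "SoftCo C")

print(f"RailCo B vs railroad peers: {rail_multiple:.1f}x")
print(f"SoftCo C vs software peers: {soft_multiple:.1f}x")
```

In a vacuum, the made-up SoftCo C (1.5) looks less leveraged than RailCo B (1.6). Against their own industries, RailCo B sits right at its peer median while SoftCo C carries about five times the leverage of its peers. The answer still depends on the assumption that these are the right peer groups.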

Data Analysis Takeaways

Ultimately, whenever you see models or predictions, you should ask what assumptions were used to make the prediction, what variables were controlled for, what the benchmark was (if any), and how the results were interpreted. Ask what evidence was used to formulate and justify those assumptions. Since this is a finance blog, these questions matter most for stock and economic predictions. As you dig you will find that the better models have significant testing behind them, but many are just gut feel. Even the market prediction models that have far more data and extensive testing behind them still go wrong from time to time. When they do go wrong, we learn one more thing about our world that allows us to iterate closer to a correct model. In the interim, until something like a financial model is perfected, you are probably better off investing in index funds. However, if you venture beyond them, always remember to ask.

Do you use data analysis to choose investments?


  1. Smart Provisions
    Smart Provisions January 16, 2017

    Great post about data, FTF.

    I work in an industry that requires a lot of data analysis, and oftentimes our data modeling and analysis comes out wrong because our client was expecting something else, or else they withheld information from us. It might simply be because we were unable to drill down on all of the hidden variables. Fewer unknowns is always better.

    For investments, I don’t quite perform data analysis, but maybe in the future, I’ll learn and try to program my own algorithm to automate my trading on a fun and small investment account.

    • January 16, 2017

      It’s very common to be missing critical data. Really, you need a good data scientist defining the collection method up front and free rein over the data to be successful. What industry, if I might ask?

  2. SomeRandomGuyOnline
    SomeRandomGuyOnline January 16, 2017

    Lots of parallels when it comes to looking at and interpreting scientific and medical studies. For example, you might read a study titled “So-and-so drug reduces your risk of heart attack!” But when you actually look at the methods and data of the study, sometimes that conclusion can’t be drawn. For example, if they only study the drug in 30 people, you can’t really say the results will apply to the general public. Same can be said if they don’t control for the study sample’s age and demographics.

    Great article. Just goes to show that you need to read studies and reports with a bit of a critical eye.

    • January 16, 2017

      Thanks for adding a different perspective. From what I remember from working in pharmaceuticals (over ten years ago), the data analysis is particularly intensive. A drug could work with a specific race or even genome and not others. Reading the makeup of the study closely is paramount.

  3. Mrs. BITA
    Mrs. BITA January 17, 2017

    This is a good succinct explanation of data analysis. One of my old favourites that clearly illustrates the problem with looking at stats in isolation is the one by G.B. Shaw “Statistics show that of those who contract the habit of eating, very few survive”

    Mean vs. median is one of my little pet peeves (I have a small stable of them. I groom them every day). A shockingly large number of people have not heard of median or have forgotten what they learned in school. Never ask mean to do a job meant for median. If you do, bad decisions are almost certain.

    • January 17, 2017

      I hadn’t seen that one before. The one I always get a kick out of is the great poison known as DiHydrogen Monoxide.

  4. Mustard Seed Money
    Mustard Seed Money January 17, 2017

    I have to admit there have been times that I used data analysis to see how defensive stocks were during the great recession to see if it made sense to add them to my portfolio. However, that was only one piece of my research and I did a ton more to ensure that it made sense to own. It’s worked out so far but then again we haven’t had a bear market for me to fully test out my theories.

    • January 18, 2017

      It has seemed like prices have followed a pretty standard pattern over the last eight years or so. However, it’s certainly not the norm. I’d be interested in hearing what your latest picks are, just for curiosity’s sake.

  5. Wall Street Physician
    Wall Street Physician January 19, 2017

    “In the interim, until something like a financial model is perfected, you are probably better off investing in index funds.”

    When someone perfects a financial model, they won’t share it with us, but start printing money!

    • January 19, 2017

      You never know it could be one of my readers that cracks the code. If so I implore them to please share 😉
