Modeling and Analysis: Learn It 1

  • Differentiate correlation from causation
  • Decide on the suitability of interpolation and extrapolation
  • Identify the appropriate way to represent data and mathematical models
  • Use multiple representations to choose a model
  • Recognize the limits of models

Mathematical models are invaluable tools for understanding phenomena ranging from climate change to economic trends. However, models are simplifications of reality, so it is crucial to recognize their limitations and the assumptions on which they are built. This section aims to equip you to evaluate models critically: to distinguish correlation from causation, to judge when interpolation and extrapolation are appropriate, to choose suitable representations for data and models, and to recognize where a model stops being reliable.

Correlation versus Causation

Understanding the relationship between variables is a cornerstone of data analysis and modeling. However, these relationships are easy to misinterpret, especially when correlation is mistaken for causation.

Correlation refers to a statistical relationship between two or more variables: when one variable changes, another tends to change in a consistent, predictable pattern. However, this does not imply that one variable causes the other to change.

correlation

A statistical relationship between two or more variables where a change in one variable is associated with a change in another variable.

Studies have found a correlation between ice cream sales and the number of drowning incidents. However, ice cream sales do not cause drownings; both are influenced by the weather, since hot days lead more people to buy ice cream and more people to swim.

Always remember that correlation does not imply causation. Just because two variables move together doesn’t mean one is causing the other to change.
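The ice cream example can be made concrete with a small simulation. The sketch below uses made-up data, not real sales or drowning figures: the coefficients, noise levels, and temperature range are all assumptions chosen for illustration. Both quantities are computed from temperature alone, so neither can influence the other, yet they still come out correlated.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical daily temperatures in degrees Celsius (invented data).
temperature = [rng.uniform(10, 35) for _ in range(365)]

# Each quantity depends on temperature plus its own independent noise.
# Neither one appears in the formula for the other.
ice_cream_sales = [20 * t + rng.gauss(0, 60) for t in temperature]
drownings = [0.2 * t + rng.gauss(0, 1) for t in temperature]

r = pearson(ice_cream_sales, drownings)
print(f"correlation(ice cream sales, drownings) = {r:.2f}")
```

The printed coefficient should be clearly positive even though the code contains no link from ice cream to drownings. The shared dependence on temperature is enough to produce it.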

Causation goes a step beyond correlation by establishing a cause-and-effect relationship between variables. In a causal relationship, changes in one variable are directly responsible for changes in another.

causation

A cause-and-effect relationship between variables, where changes in one variable are directly responsible for changes in another variable.

Smoking is causally linked to lung cancer: extensive research has shown that smoking increases the risk of developing the disease.

To establish causation, look for evidence from controlled experiments or longitudinal studies that can rule out other variables.
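One way to see why controlled experiments help is to simulate the same treatment under two assignment rules. This is a minimal sketch with invented numbers: the hidden "health" score, the assumed true effect of 2.0, and the helper `estimated_effect` are all hypothetical. In the observational case, healthier people are more likely to choose the treatment; in the experimental case, a coin flip decides.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
TRUE_EFFECT = 2.0       # assumed effect of the treatment on the outcome
N = 10_000              # simulated people per study

def mean(values):
    return sum(values) / len(values)

def estimated_effect(assign):
    """Difference in mean outcome between treated and untreated groups.

    `assign` takes a person's hidden health score and returns
    True (treated) or False (untreated).
    """
    treated, untreated = [], []
    for _ in range(N):
        health = rng.gauss(0, 1)  # hidden third variable
        is_treated = assign(health)
        outcome = 5 * health + TRUE_EFFECT * is_treated + rng.gauss(0, 1)
        (treated if is_treated else untreated).append(outcome)
    return mean(treated) - mean(untreated)

# Observational data: healthier people tend to opt in to the treatment.
observational = estimated_effect(lambda health: health + rng.gauss(0, 1) > 0)

# Controlled experiment: assignment ignores health entirely.
randomized = estimated_effect(lambda health: rng.random() < 0.5)

print(f"true effect:            {TRUE_EFFECT:.2f}")
print(f"observational estimate: {observational:.2f}")
print(f"randomized estimate:    {randomized:.2f}")
```

Under these assumptions the observational estimate should overstate the effect, because the treated group was healthier to begin with, while the randomized estimate should land near the true value. Random assignment balances the hidden variable across the two groups, which is what lets the comparison rule it out.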

One of the most common errors in data interpretation is assuming that correlation implies causation. This assumption can lead to incorrect conclusions and misguided actions. For example, if a study finds a correlation between high sugar consumption and poor academic performance, it would be a mistake to immediately conclude that sugar intake causes poor grades. There could be a third variable, such as lack of exercise, affecting both.
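The third-variable explanation can be checked numerically by removing the third variable's influence from both quantities and seeing whether any correlation remains (a partial correlation). The sketch below assumes an invented data set in which weekly exercise drives both sugar consumption and grades; every coefficient is hypothetical, and `residuals` is an illustrative helper, not a library function.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fitted on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

rng = random.Random(7)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical students: exercise (hours per week) affects both variables.
exercise = [rng.uniform(0, 10) for _ in range(n)]
sugar = [100 - 5 * e + rng.gauss(0, 10) for e in exercise]   # grams per day
grades = [60 + 3 * e + rng.gauss(0, 8) for e in exercise]    # exam score

raw = pearson(sugar, grades)
adjusted = pearson(residuals(sugar, exercise), residuals(grades, exercise))

print(f"sugar vs grades, raw:                   {raw:.2f}")
print(f"sugar vs grades, exercise accounted for: {adjusted:.2f}")
```

In this made-up scenario the raw correlation should be clearly negative, while the adjusted one should sit close to zero. That pattern is what a third variable looks like in data; it does not show that exercise is the real explanation in any actual study.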

Another misconception is that strong correlations are always meaningful. In some cases, correlations can be coincidental or spurious, meaning they occur by chance or are influenced by a third variable.

There’s a famous spurious correlation between the number of films Nicolas Cage appears in and the number of people who drown in swimming pools. [1] Clearly, one does not cause the other.
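Coincidental correlations like this one become almost inevitable when many short data series are compared against each other. The following sketch is an illustration with purely random numbers, not the data behind the example above: it generates 200 unrelated series of 10 "yearly" values each (both counts are arbitrary assumptions) and reports the strongest correlation found among all pairs.

```python
import random
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(42)  # fixed seed so the run is repeatable
NUM_SERIES = 200         # number of unrelated data series
NUM_YEARS = 10           # observations per series

# Every series is independent random noise, so no pair is truly related.
series = [[rng.gauss(0, 1) for _ in range(NUM_YEARS)]
          for _ in range(NUM_SERIES)]

# Compare every pair and keep the strongest correlation, ignoring sign.
strongest = max(abs(pearson(a, b)) for a, b in combinations(series, 2))

num_pairs = NUM_SERIES * (NUM_SERIES - 1) // 2
print(f"pairs compared: {num_pairs}")
print(f"strongest correlation found: {strongest:.2f}")
```

With nearly twenty thousand pairs and only ten points per series, some pair should show a very strong correlation by chance alone. A striking coefficient found by searching through many variables is therefore weak evidence of any real relationship.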



  1. https://www.wnycstudios.org/podcasts/otm/articles/spurious-correlations