What is data correlation?
Data correlation is any statistical relationship between two variables (e.g. states or properties), whether that relationship is causal (dependent) or not.
When correlation is used as a measure of the relationship between data, two questions need to be answered:
- How strong is the connection?
- What is the direction of the connection?
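A correlation of zero means there is no (linear) relationship. A value of one means a complete connection, i.e. complete dependency. The sign gives the direction, so a value of minus one means a complete connection in the opposite direction.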
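As a rough illustration of how strength and direction can be expressed as a single number, the sketch below computes the Pearson correlation coefficient. This is one common choice of measure; the text above does not name a specific one. The helper `pearson` and the sample lists are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
r_complete = pearson(x, [2, 4, 6, 8, 10])  # y is exactly 2 * x
r_none = pearson(x, [2, 1, 3, 1, 2])       # deviations cancel out

print(round(r_complete, 6), round(r_none, 6))
```

The first pair is a complete connection and yields a coefficient of one (up to floating-point rounding). The second pair is constructed so that the deviations cancel and the coefficient is zero.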
Positive and negative correlation
A positive correlation means that as one variable increases, so does the other. One example is the positive correlation between the time spent studying per week and the final grade: those who study a lot on a regular basis tend to achieve better results.
In the case of negative correlation (also called anti-correlation), one variable behaves inversely to the other: if one increases, the other decreases, and vice versa. For example, the fuel level in a vehicle's tank decreases as the distance driven increases.
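The two examples can be sketched in code, where the direction of the relationship shows up as the sign of the coefficient. The figures below (study hours, exam scores, distances, fuel levels) are invented purely for illustration, and `pearson` is a hypothetical helper implementing the Pearson coefficient.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented data: weekly study hours and exam score (points out of 100).
study_hours = [2, 4, 6, 8, 10]
exam_score = [55, 60, 70, 72, 85]

# Invented data: distance driven (km) and fuel left in the tank (litres).
distance_km = [0, 100, 200, 300, 400]
fuel_litres = [50, 43, 37, 29, 22]

r_study = pearson(study_hours, exam_score)   # both rise together
r_fuel = pearson(distance_km, fuel_litres)   # one rises, the other falls

print(f"study vs. score:  {r_study:+.3f}")
print(f"distance vs. fuel: {r_fuel:+.3f}")
```

With this made-up data the first coefficient is positive and the second is negative. Both are close to, but not exactly, plus or minus one because the points do not lie perfectly on a line.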
Saturation limit
Often there is a so-called saturation limit, beyond which the relationship no longer holds. For example, once a car has reached its top speed, pressing the accelerator further does not make it go any faster. In economic correlations, the following often applies: the closer you get to the saturation limit, the more the costs increase and the benefits decrease.
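A minimal sketch of saturation, assuming a simple exponential model of speed against pedal position. The model and the constants `V_MAX` and `K` are assumptions for illustration, not a description of a real car. The sketch shows each additional step on the pedal yielding a smaller gain in speed. It also shows that a coefficient measuring linear association (Pearson here) stays below one, even though speed depends entirely on pedal position in this model.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

V_MAX = 200.0  # assumed top speed in km/h (the saturation limit)
K = 5.0        # assumed steepness of the approach to the limit

# Pedal position from 0.0 (released) to 1.0 (fully pressed).
pedal = [i / 10 for i in range(11)]
speed = [V_MAX * (1 - math.exp(-K * p)) for p in pedal]

# Speed gained by each additional 10 % of pedal travel.
gains = [later - earlier for earlier, later in zip(speed, speed[1:])]

r_saturating = pearson(pedal, speed)

for p, g in zip(pedal[1:], gains):
    print(f"pedal {p:.1f}: +{g:.1f} km/h")
print(f"Pearson r = {r_saturating:.3f}")
```

The shrinking gains correspond to the falling benefit near the saturation limit. If the relationship is expected to saturate, a plot of the data is usually more informative than a single coefficient.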
Correlation does not have to mean causality
In general, a correlation is insufficient to infer a causal relationship. Just because one variable tends to change whenever another changes does not automatically mean that one variable is causing the other to change.
A high correlation can indicate causality, but there can also be other explanations for it:
- It can be pure coincidence, with no real relationship between the variables.
- There may be a third, unknown variable that makes the relationship between the first and the second seem stronger (or weaker) than it is. The two observed variables are then both linked to this third one.
Examples
Ice cream sales are correlated with the incidence of sunburn. Neither causes the other: both are driven by a third variable, namely time spent outdoors in strong sun.
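The ice cream example can be simulated as a sketch. Below, a hidden variable `sun` drives both `ice_cream` and `sunburn`, and there is no direct link between the two. All numbers, including the coefficients 10 and 5 and the noise level, are arbitrary assumptions. One simple way to check for a third variable is to remove its linear effect from both series and correlate what remains.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line in xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hidden third variable: sun exposure on each of 2000 simulated days.
sun = [rng.random() for _ in range(2000)]
# Both observed variables depend on the sun, not on each other.
ice_cream = [10 * s + rng.gauss(0, 1) for s in sun]
sunburn = [5 * s + rng.gauss(0, 1) for s in sun]

r_raw = pearson(ice_cream, sunburn)
r_adjusted = pearson(residuals(ice_cream, sun), residuals(sunburn, sun))

print(f"raw correlation:            {r_raw:.3f}")
print(f"after accounting for sun:   {r_adjusted:.3f}")
```

In this simulation the raw correlation is clearly positive, while the adjusted value is close to zero. That is the pattern expected when a third variable explains the relationship. In real data the third variable is often unknown or unmeasured, so this check is only possible once a candidate has been identified.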
Examples of interdependent phenomena are the correlation between the height of parents and their offspring and the correlation between the price of a product and the amount consumers are willing to buy, shown in the so-called demand curve.
Correlations are useful for making predictions. For example, a utility company may plan to produce less electricity on a mild day, based on the correlation between electricity demand and weather. In this example the relationship is also causal, as very cold or very warm weather causes people to use more electricity for heating or cooling.
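As a sketch of how such a prediction might be made, the example below fits a least-squares line to invented historical data and uses it to estimate demand. It assumes that demand rises with the absolute deviation of the temperature from a mild reference value. The reference `MILD_TEMP_C`, the data points, and the function `predict_demand` are all hypothetical.

```python
MILD_TEMP_C = 18.0  # assumed temperature at which demand is lowest

# Invented history: daily mean temperature (°C) and demand (MWh).
temps_c = [18, 16, 23, 10, 30, 3]
demand_mwh = [100, 104, 112, 121, 130, 140]

# Cold and hot days both raise demand, so use the distance from mild.
deviation = [abs(t - MILD_TEMP_C) for t in temps_c]

# Ordinary least-squares fit of demand = intercept + slope * deviation.
mean_x = sum(deviation) / len(deviation)
mean_y = sum(demand_mwh) / len(demand_mwh)
slope = (sum((x - mean_x) * (y - mean_y)
             for x, y in zip(deviation, demand_mwh))
         / sum((x - mean_x) ** 2 for x in deviation))
intercept = mean_y - slope * mean_x

def predict_demand(temp_c):
    """Estimated demand in MWh for a forecast temperature in °C."""
    return intercept + slope * abs(temp_c - MILD_TEMP_C)

mild_day = predict_demand(18)
hot_day = predict_demand(32)
cold_day = predict_demand(-2)

print(f"mild day: {mild_day:.1f} MWh")
print(f"hot day:  {hot_day:.1f} MWh")
print(f"cold day: {cold_day:.1f} MWh")
```

With this made-up data the fitted line predicts the lowest demand on the mild day and higher demand on both the hot and the cold day. A real forecast would use far more data and additional factors such as weekday or season.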