What is Correlation?

History and Definition

In 1888, Sir Francis Galton first defined correlation in his paper "Co-relations and their Measurement, Chiefly from Anthropometric Data", part of his wider study of human differences and heredity. Other researchers soon adopted it as a statistical measure. Although the method of calculating correlation has changed considerably since Galton's time, its original essence still holds true.

Correlation is the use of statistical tools and techniques to tell us whether, and how strongly, two variables are related. The measure of correlation between two variables is called the correlation coefficient, usually denoted by ‘r’ (for a sample) or ‘ρ’ (for a population). Pearson’s Correlation Coefficient is one of the most widely used correlation coefficients. It measures only the linear relationship between two variables, while other correlation coefficients can also capture non-linear relationships. The relationship between two variables may be causal, reciprocal, parallel or complementary, and it reflects how the values of both variables change together.
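As a minimal sketch of how Pearson’s coefficient can be computed, the function below divides the sum of the products of deviations from the mean by the square root of the product of the summed squared deviations. The function name pearson_r and the hours/scores data are made up for illustration and are not from any particular library or study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each observation from its variable's mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    co_deviation = sum(a * b for a, b in zip(dx, dy))
    sq_dev_x = sum(a * a for a in dx)
    sq_dev_y = sum(b * b for b in dy)
    if sq_dev_x == 0 or sq_dev_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return co_deviation / sqrt(sq_dev_x * sq_dev_y)

# Made-up illustrative data: hours studied vs. test score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson_r(hours, scores)
print(round(r, 3))
```

Because the scores rise almost in step with the hours, r comes out close to 1. In practice a library routine would normally be preferred over a hand-written function like this one.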

Other measures of correlation include rank correlation coefficients such as Spearman’s Rank Correlation and Kendall’s Rank Correlation. These measure the extent to which an increase in one variable is accompanied by an increase in the other, without requiring the relationship to be linear. Rank correlations are thus a different kind of measure rather than a replacement for Pearson’s Correlation Coefficient.
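The difference between the two kinds of measure can be sketched as follows. One common way to obtain Spearman’s coefficient is to apply the Pearson formula to the ranks of the data rather than to the raw values, and that is the approach assumed here. The helper names (pearson_r, average_ranks, spearman_rho) and the data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r; assumes equal lengths and non-constant inputs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_x = sum((x - mean_x) ** 2 for x in xs)
    sq_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / sqrt(sq_x * sq_y)

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Made-up data: y always increases with x, but not along a straight line
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
pearson_value = pearson_r(x, y)
spearman_value = spearman_rho(x, y)
print(round(pearson_value, 3), round(spearman_value, 3))
```

Here the rank correlation is 1 because y never decreases as x increases, while Pearson’s coefficient falls short of 1 because the points do not lie on a straight line.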

Illustration

Basic examples of correlation can be seen in our daily lives, for instance when we say that the demand and price of a commodity are correlated, or that the amount of rainfall and the crop output of an area are correlated. When the price of a commodity goes up and its demand goes down, the two are negatively correlated. On the other hand, when the rainfall received in a region is high and its crop production increases, the two are said to be positively correlated. In general, the value of r lies in the range from –1 to 1, and we can say that if the value of r is such that:

  • 0<r≤1, positive correlation exists
  • r=0, no linear correlation exists
  • –1≤r<0, negative correlation exists
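As a rough sketch, the sign rule above can be applied to the two everyday examples. All figures below are invented for illustration, and the helper names pearson_r and direction, as well as the small tolerance tol used to treat values very near zero as zero, are assumptions of this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r; assumes equal lengths and non-constant inputs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_x = sum((x - mean_x) ** 2 for x in xs)
    sq_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / sqrt(sq_x * sq_y)

def direction(r, tol=1e-12):
    """Classify r by its sign; tol is an assumed floating-point tolerance."""
    if r > tol:
        return "positive"
    if r < -tol:
        return "negative"
    return "no linear correlation"

# Invented figures: price vs. demand, and rainfall vs. crop yield
price = [10, 12, 14, 16, 18]
demand = [100, 90, 85, 70, 60]
rainfall = [50, 60, 70, 80, 90]
crop_yield = [2.0, 2.4, 2.5, 3.0, 3.3]

price_demand = direction(pearson_r(price, demand))
rain_crop = direction(pearson_r(rainfall, crop_yield))
print(price_demand, rain_crop)
```

The sign only gives the direction of the relationship. How close r is to –1 or 1 indicates how strong the linear relationship is.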

Benefits and Practical Usage of Correlation

There are several advantages of using correlation as a statistical tool. Firstly, it is simple to calculate and easy to interpret. Secondly, it can help you make better predictions. For example, if you know that a certain type of dog food is correlated with a longer life span in dogs, you can predict which dogs in a random sample are likely to live longer, and you may consider altering the diet of dogs accordingly.

On the other hand, we need to be aware of the following pitfalls when using correlation. Firstly, the most commonly used correlation coefficients measure only a linear relationship, so even if our calculations show that r=0, a non-linear relationship might still exist between the variables. Secondly, we should not use r to imply a causal relationship between variables: even if we know that two variables x and y are correlated, the measure does not tell us whether x causes y, y causes x, or neither. Thirdly, a very high correlation might appear between two variables with no plausible connection, say your height and your income. Some such correlations are spurious and should be discarded rather than explained.
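The first pitfall can be illustrated with a small sketch: below, y is completely determined by x (y is the square of x), yet Pearson’s coefficient is zero because the relationship is not linear. The data and the helper name pearson_r are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r; assumes equal lengths and non-constant inputs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_x = sum((x - mean_x) ** 2 for x in xs)
    sq_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / sqrt(sq_x * sq_y)

# x is symmetric around zero and y depends on x exactly, but not linearly
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(r)
```

The positive and negative halves of the curve cancel each other out in the calculation, so a value of r near zero should be read as "no linear relationship" rather than "no relationship". Plotting the data is a simple way to check.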

Correlation studies are widely used in various industries and sciences. Say a pharma group with hundreds of products, and distribution centers and retail shops spread across the country, wants to optimize store inventories so that products are available during peak seasons. It can collect sales transaction data from all outlets to study seasonal fluctuations in demand for its products. A correlation analysis between products and their monthly or seasonal sales can then help create a store inventory model that optimizes product availability based on seasonal demand.

The correlation coefficient is considered an important tool by investment advisors, who study market trends and the price movements of different stocks over time, and how these relate to each other, to the industry and to other factors. It helps them make better investment decisions and manage portfolios. Product companies also use correlation in pricing decisions, to predict the possible impact of a price change on demand. It is widely used as well to study the impact of economic policies, by examining how economic conditions change over time in relation to the policies applied.
