Logistic regression measures the relationship between a categorical dependent variable and one or more independent variables (usually continuous) by estimating the probability of each category of the dependent variable. A categorical variable is one that takes values from a limited set of categories rather than varying continuously.
Logistic regression predicts the outcome of a categorical dependent variable on the basis of predictor variables. The probable outcome of a single trial is modeled as a function of the explanatory variables using the logistic function. Logistic modeling is applied to categorical data of various types, including binary and nominal. For example, a variable might be binary, with two possible categories such as ‘yes’ and ‘no’; or it may be nominal, say hair color, which may be black, brown, red, gold or grey.
Another objective of logistic regression is to test whether the probability of obtaining a particular value of the dependent variable is related to the independent variable. Multiple logistic regression is used when there is more than one independent variable under study.
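The logistic function mentioned above maps any linear combination of the explanatory variables into a probability between 0 and 1. A minimal sketch in Python (the intercept and slope values here are purely illustrative, not fitted to any data):

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_prob(x, b0, b1):
    """Modeled probability of the 'yes' category for a single trial,
    given one explanatory variable x, an intercept b0 and a slope b1."""
    return logistic(b0 + b1 * x)

# Illustrative (made-up) coefficients: the output is always a valid probability.
print(predict_prob(2.0, b0=-1.0, b1=0.8))
```

With a positive slope, larger values of x push the predicted probability toward 1; with a negative slope, toward 0.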
Logistic regression’s history can be traced back to the 19th century, when Verhulst and Quetelet first used it to describe the growth rate of populations. Today, logistic regression is widely used in medicine and biology. Epidemiology is another area where it is widely used, for identifying risk factors for diseases and planning preventive medicine. Studies concerned with public health and related policy decisions also rely on logistic regression as an important statistical tool.
Say a retail chain wants to increase its sales by focusing on customer loyalty and acquiring new customers through word-of-mouth effects. It could conduct a survey among its existing customers and apply logistic regression to the collected data. Logistic regression would help identify the factors, such as product quality, service quality, brand image and reward programs, that affect customers’ loyalty and their willingness to recommend the store’s products to others. The results would help the store improve its performance on these parameters and increase customer loyalty, and would also bring down customer acquisition costs by increasing customer referrals.
Experimental studies where the independent variable is under our control (so that we can set its values) are well suited to logistic regression. For example, a cake manufacturer may want to determine whether baking at 330°F, 350°F or 380°F leads to a ‘soft’ or a ‘hard’ variety of cake (assuming he sells both the soft and hard varieties under different brands and at different prices). Logistic regression is a better choice here than statistical tests like ANOVA or the t-test. It would be meaningless to compare the mean baking temperatures between soft and hard cakes and test the difference, because the baking temperature does not depend on the cakes’ softness; if there is a relationship, it runs the other way round.
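A minimal sketch of this scenario, assuming some made-up outcomes for cakes baked at each temperature, fits the logistic model by gradient ascent on the log-likelihood (a library such as scikit-learn would do the same job; plain Python is used here only to keep the example self-contained):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical data: baking temperature (°F) and outcome (1 = hard, 0 = soft).
temps = [330, 330, 330, 350, 350, 350, 380, 380, 380]
hard  = [0,   0,   1,   0,   1,   1,   1,   1,   1]

# Center and scale the predictor so the iterative fit is numerically stable.
xs = [(t - 350) / 10 for t in temps]

# Fit intercept b0 and slope b1 by gradient ascent on the log-likelihood.
b0, b1 = 0.0, 0.0
lr = 0.01
for _ in range(5000):
    g0 = g1 = 0.0
    for x, y in zip(xs, hard):
        p = sigmoid(b0 + b1 * x)
        g0 += (y - p)
        g1 += (y - p) * x
    b0 += lr * g0
    b1 += lr * g1

# Predicted probability of a 'hard' cake at each baking temperature:
for t in (330, 350, 380):
    print(t, round(sigmoid(b0 + b1 * (t - 350) / 10), 2))
```

The fitted slope is positive, so the predicted probability of a hard cake rises with baking temperature, which is exactly the kind of question the manufacturer wants answered.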
Further, logistic regression also scores over linear regression in that linear regression does not take into account the number of observations behind each sample in the case of grouped data. For example, if the baker has baked two batches of cakes, where the first batch consisted of 10 cakes (of which 6 were hard and 4 soft) and the second batch consisted of 40 cakes (of which 21 were hard and 19 soft), a linear regression on the batch proportions would assign equal importance to both batches, irrespective of the counts being 10 vs. 40. Logistic regression, however, takes this into account and gives the second batch four times more weight than the first.
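This weighting falls out of the binomial log-likelihood: each batch’s contribution to the gradient is scaled by its number of cakes. A sketch under the assumption that the two batches differ in some predictor x (the values ±1 below are hypothetical labels, not real baking settings):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical grouped data: (predictor x, number hard, total cakes in batch).
batches = [(-1.0, 6, 10),   # first batch: 6 hard out of 10
           ( 1.0, 21, 40)]  # second batch: 21 hard out of 40

# Gradient ascent on the binomial log-likelihood. The residual (k - n*p)
# is scaled by the batch size n, so the 40-cake batch contributes four
# times as much to each step as the 10-cake batch.
b0, b1 = 0.0, 0.0
lr = 0.01
for _ in range(5000):
    g0 = g1 = 0.0
    for x, k, n in batches:
        p = sigmoid(b0 + b1 * x)
        g0 += (k - n * p)
        g1 += (k - n * p) * x
    b0 += lr * g0
    b1 += lr * g1

# The fitted probabilities reproduce the observed proportions 6/10 and 21/40:
print(round(sigmoid(b0 - b1), 3), round(sigmoid(b0 + b1), 3))
```

An unweighted least-squares fit to the two proportions 0.6 and 0.525 would treat them as equally reliable, whereas the likelihood above automatically trusts the larger batch more.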