What Is Linear Regression?

To do linear regression analysis in Excel, we first need to enable the Analysis ToolPak add-in by following these steps.

Click File – Options. This opens the “Excel Options” pop-up.

Click “Add-ins,” select “Excel Add-ins” from the “Manage” drop-down, and then click “Go.”

This opens the “Add-ins” pop-up. Select “Analysis ToolPak” and click “OK.”

The “Data Analysis” command will now appear under the “Data” tab, in the “Analysis” group.

Let us work through the following examples of linear regression analysis in Excel.

Linear Regression Analysis Examples

Example #1

Suppose we have last year’s monthly sales and marketing spend. Now, we need to predict future sales based on last year’s sales and marketing spend.

Click “Data Analysis” under the “Data” tab to open the “Data Analysis” pop-up.

Select “Regression” from the list and click “OK.”

A “Regression” pop-up will open.

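Select the sales range $C$1:$C$13 in the “Input Y Range” box, since sales is the dependent variable, and the advertising spend range $B$1:$B$13 in the “Input X Range” box, since advertising spend is the independent variable. Both ranges must contain the same number of rows.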

Check the “Labels” box if your selected ranges include the headers. Otherwise, Excel will return an error, because it cannot treat the header text as numeric data.

Next, select “Output Range” if you want the results placed at a specific location on the current worksheet. Otherwise, select “New Worksheet Ply,” which adds a new worksheet containing the results.

Then, check the “Residuals” box and click “OK.”

It will add a worksheet and give you the following result.

Let us understand the output.

Summary Output

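Multiple R: This is the multiple correlation coefficient, the square root of R Square. With a single X variable, it equals the absolute value of the correlation coefficient between X and Y. A value of 1 indicates a perfect linear relationship, and 0 indicates no linear relationship. Multiple R does not show the direction of the relationship; the sign of the X coefficient does.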

R Square: This is the coefficient of determination, calculated as 1 − Residual SS / Total SS. It tells you the proportion of the variation in the dependent variable that the regression line explains. For example, an R Square of 0.49 means that about 49% of the variation in sales is explained by marketing spend.

Adjusted R Square: This is R Square adjusted for the number of X variables in the model. It matters when you have more than one X variable: R Square never decreases when you add a variable, whereas Adjusted R Square penalizes variables that do not improve the fit enough.
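Adjusted R Square is commonly computed as 1 − (1 − R Square)(n − 1)/(n − k − 1), where n is the number of observations and k is the number of X variables. The sketch below uses hypothetical inputs, and `adjusted_r_square` is an illustrative helper; it shows how the same R Square is adjusted further downward as k grows:

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R Square for n observations and k X variables."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# Hypothetical: an R Square of 0.6 from 5 observations
print(round(adjusted_r_square(0.6, n=5, k=1), 4))  # 0.4667
print(round(adjusted_r_square(0.6, n=5, k=2), 4))  # 0.2
```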

Standard Error: This is the estimated standard deviation of the residuals, equal to the square root of Residual MS. It measures how far the observed values typically fall from the regression line, in the units of the dependent variable.

Observations: This is the number of observations you have taken in a sample.

ANOVA – df: Degrees of freedom. With n observations and k X variables, the Regression row has k, the Residual row has n − k − 1, and the Total row has n − 1.

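SS: Sum of squares. Regression SS is the variation explained by the model, Residual SS is the unexplained variation, and Total SS is their sum.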

MS: Mean squares. There are two:

  • Regression MS is Regression SS / Regression df.
  • Residual MS is the mean squared error (Residual SS / Residual df).

F: The F statistic, equal to Regression MS / Residual MS. It tests the null hypothesis that all the X coefficients are zero, that is, that the model has no predictive value.

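Significance F: The p-value associated with the F statistic. A small value (for example, below 0.05) means the regression as a whole is statistically significant.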

Coefficients: The least-squares estimates of the intercept and of the slope for each X variable.

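t Stat: The t statistic (the coefficient divided by its standard error), which tests the null hypothesis that the coefficient is zero against the alternative hypothesis that it is not.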

P-value: The p-value for this t test. A small value (for example, below 0.05) indicates that the coefficient is significantly different from zero.

Lower 95% and Upper 95%: The lower and upper bounds of the 95% confidence interval for each coefficient.

Residual Output: Our data has 12 observations. The second column shows the predicted sales, and the third shows the residuals. A residual is the actual sales value minus the predicted one.

Example #2

Select the predicted sales and marketing columns.

Go to the “Charts” group under the “Insert” tab and select the “Scatter” chart icon.

It will insert a scatter plot. See the image below.

Right-click any data point and select “Add Trendline.” Excel adds a trendline (the best-fit line) to your chart.

  • You can format the trendline by right-clicking it and selecting “Format Trendline.”
  • You can improve the chart further, for example by formatting the trendline, changing colors, and changing the title.
  • You can also display the regression equation and the R-squared value on the chart by checking “Display Equation on chart” and “Display R-squared value on chart.”

Some More Examples of Linear Regression Analysis:

  • Predicting umbrella sales based on rainfall in the area.
  • Predicting AC sales based on the temperature in summer.
  • Predicting stationery sales, especially exam guide sales, which increase during the exam season.
  • Predicting sales from advertising, based on the TRP of the serial during which the advertisement airs, the popularity of the brand ambassador, and the footfall at the location where the advertisement is displayed.
  • Predicting house sales based on locality, area, and price.

Example #3

Suppose we have nine students, along with each student’s IQ and test score.

Step 1: First, identify the dependent and independent variables. Here, the test score is the dependent variable and IQ is the independent variable, because the test score varies as IQ changes.

Step 2: Go to the “Data” tab – click “Data Analysis” – select “Regression” – click “OK.”

It will open the “Regression” window for you.

Step 3: Enter the test score range in the “Input Y Range” box and the IQ range in the “Input X Range” box. Check “Labels” if your data range includes headers, select the output options, check the residual outputs you want, and click “OK.”

You will get the summary output shown in the image below.

Step 4: Analyse the regression using the summary output.

Multiple R: Here, the correlation coefficient is 0.99, which is very close to 1, so there is a very strong linear relationship between IQ and test score.

R Square: The R Square value is 0.983, which means that 98.3% of the variation in test scores is explained by IQ.
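Because Excel’s Multiple R is the square root of R Square, the reported pairs can be cross-checked. This sketch uses only the values reported in Examples #3, #4, and #5 of this article:

```python
from math import sqrt

# (R Square, Multiple R) as reported in this article's examples
reported = {"Example #3": (0.983, 0.99), "Example #4": (0.770, 0.877), "Example #5": (0.866, 0.93)}

for name, (r_square, multiple_r) in reported.items():
    implied = sqrt(r_square)  # Multiple R implied by the reported R Square
    print(f"{name}: sqrt({r_square}) = {implied:.3f} vs. reported Multiple R {multiple_r}")
```

Each implied value agrees with the reported Multiple R to within rounding.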

P-value: Here, the p-value for IQ is 1.86881E-07, far below 0.1, which means IQ has significant predictive value for the test score.

See the chart below.

You can see that almost all the points fall on or near the trendline.

Example #4

We need to predict AC sales based on sales and temperature data for different months.

Follow the below steps to get the regression result.

Step 1: First, identify the dependent and independent variables. Sales is the dependent variable, and temperature is the independent variable, because sales vary as the temperature changes.

Step 2: Go to the “Data” tab – Click on “Data Analysis” – Select “Regression,” – click “OK.”

It will open the regression window for you.

Step 3: Enter the sales range in the “Input Y Range” box and the temperature range in the “Input X Range” box. Check “Labels” if your data range includes headers, select the output options, check the residual outputs you want, and click “OK.”

It will give you a summary output as below.

Step 4: Analyse the result.

Multiple R: Here, the correlation coefficient is 0.877, which is close to 1, so there is a strong linear relationship between temperature and AC sales.

R Square: The R Square value is 0.770, which means that 77% of the variation in AC sales is explained by temperature.

P-value: Check the p-value for temperature in the coefficients table. A value below 0.1 means that temperature has significant predictive value for AC sales.

Example #5

Now, let us do a regression analysis for multiple independent variables:

Suppose you need to predict the sales of a mobile phone that will launch next year. You have the price and population of each country, both of which affect mobile sales.

Step 1: First, identify the dependent and independent variables. Here, sales is the dependent variable, and quantity and population are the independent variables, because sales vary with each country’s quantity and population.

Step 2: Go to the “Data” tab – Click on “Data Analysis” – Select “Regression” – click “OK.”

Step 3: Enter the sales range in the “Input Y Range” box, and select the quantity and population columns together in the “Input X Range” box (the X columns must be adjacent). Check “Labels” if your data range includes headers, select the output options, check the residual outputs you want, and click “OK.”
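With more than one X variable, least squares solves the normal equations (X′X)b = X′y, where X has a column of 1s for the intercept plus one column per X variable. A minimal pure-Python sketch follows. The data are made up and constructed to lie exactly on sales = 1 + 2 × quantity + 3 × population, so the fit should recover those coefficients. Gauss-Jordan elimination is used here only to keep the example dependency-free; it is not necessarily how Excel solves the system, and `multiple_regression` is a hypothetical helper.

```python
def multiple_regression(x_columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal equations (X'X) b = X'y."""
    n = len(y)
    # Design matrix rows: a leading 1 for the intercept, then one value per X variable
    rows = [[1.0] + [float(col[i]) for col in x_columns] for i in range(n)]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[j] * r[k] for r in rows) for k in range(p)] + [sum(r[j] * yi for r, yi in zip(rows, y))]
        for j in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, p + 1):
                    a[r][c] -= factor * a[col][c]
    # The left block is now diagonal, so each coefficient is a simple division
    return [a[i][p] / a[i][i] for i in range(p)]


# Hypothetical data lying exactly on sales = 1 + 2*quantity + 3*population
quantity = [1, 2, 3, 4, 5]
population = [2, 1, 4, 3, 6]
sales = [9, 8, 19, 18, 29]

intercept, b_quantity, b_population = multiple_regression([quantity, population], sales)
print(round(intercept, 6), round(b_quantity, 6), round(b_population, 6))  # 1.0 2.0 3.0
```

With real data, the points will not lie exactly on a plane, and the same function returns the least-squares estimates that the Coefficients column reports.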

Excel will then give you the result below.

Multiple R: Here, the multiple correlation coefficient is 0.93, which is very close to 1, so there is a very strong linear relationship between sales and the two X variables.

R Square: The R Square value is 0.866, which means that 86.6% of the variation in sales is explained by quantity and population.

Significance F: Significance F is less than 0.1, which means that the regression equation as a whole has significant predictive value.

P-value: The p-values for quantity and population are both less than 0.1, which means both have significant predictive value. The smaller a variable’s p-value, the stronger the evidence that it has predictive value.

Although both quantity and population are significant, quantity has a lower p-value than population. This means the evidence that quantity has predictive value is stronger than the evidence for population.

Things to Remember

  • Whenever you select data, always check which variable is dependent and which is independent.
  • Linear regression models the mean of the dependent variable as a function of the independent variables.
  • It only models linear relationships between variables.
  • It is not always the best fit for a real-world problem. For example, wages usually increase with age, but after retirement, age keeps increasing while wages decrease, so a single straight line describes this relationship poorly.
