{"id":253,"date":"2023-02-20T17:14:04","date_gmt":"2023-02-20T17:14:04","guid":{"rendered":"https:\/\/content.one.lumenlearning.com\/introstatstest\/chapter\/line-of-best-fit-dig-deeper\/"},"modified":"2025-05-11T23:17:14","modified_gmt":"2025-05-11T23:17:14","slug":"line-of-best-fit-fresh-take","status":"publish","type":"chapter","link":"https:\/\/content.one.lumenlearning.com\/introstatstest\/chapter\/line-of-best-fit-fresh-take\/","title":{"raw":"Line of Best Fit: Fresh Take","rendered":"Line of Best Fit: Fresh Take"},"content":{"raw":"<section class=\"textbox learningGoals\">\r\n<ul>\r\n\t<li>Recognize when a linear regression model will fit a given data set.<\/li>\r\n\t<li>Use technology to create scatterplots, find the line of best fit, and find the correlation coefficient.<\/li>\r\n\t<li><span data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Find the estimated slope and y-intercept for a linear regression model&quot;}\" data-sheets-userformat=\"{&quot;2&quot;:4609,&quot;3&quot;:{&quot;1&quot;:0},&quot;12&quot;:0,&quot;15&quot;:&quot;Calibri&quot;}\">Find the estimated slope and [latex]y[\/latex]-intercept for a linear regression model.<\/span><\/li>\r\n\t<li><span data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Use the line of best fit to predict values&quot;}\" data-sheets-userformat=\"{&quot;2&quot;:4609,&quot;3&quot;:{&quot;1&quot;:0},&quot;12&quot;:0,&quot;15&quot;:&quot;Calibri&quot;}\">Use the line of best fit to predict values.<\/span><\/li>\r\n<\/ul>\r\n<\/section>\r\n<section class=\"textbox recall\">\r\n<div style=\"font-weight: 400;\">\r\n<p><strong>Bivariate<\/strong> data is data that contains two variables.<\/p>\r\n<\/div>\r\n<div style=\"font-weight: 400;\">\r\n<p>An <strong>explanatory variable<\/strong> is an <strong>independent<\/strong> variable. It may explain or cause a change in another variable.<\/p>\r\n<\/div>\r\n<div style=\"font-weight: 400;\">\r\n<p>A <strong>response variable<\/strong> is a <strong>dependent 
variable<\/strong>.\u00a0It changes in response to the explanatory variable.<\/p>\r\n<\/div>\r\n<\/section>\r\n<section class=\"textbox recall\" aria-label=\"Recall\">\r\n<div style=\"font-weight: 400;\">\r\n<h3><strong>The main idea\u00a0<\/strong><\/h3>\r\n<\/div>\r\n<div style=\"font-weight: 400;\">\r\n<p><strong>Least Squares\u00a0<\/strong><strong>Regression<\/strong>\u00a0<strong>(LSR)<\/strong>\u00a0analysis is a statistical tool that models the linear relationship between an independent (explanatory) variable and a dependent (response) variable.<\/p>\r\n<\/div>\r\n<div style=\"font-weight: 400;\">\r\n<p>A <strong>scatterplot<\/strong> displays the relationship: each data point is a pair of quantitative values, one for the independent variable and one for the dependent variable. See the image below, which plots the quarterly percent change in GDP against the quarterly percent change in the unemployment rate. Each data point tells us that, when the percent change in unemployment is some particular amount, the percent change in GDP is a particular corresponding amount.<\/p>\r\n<p><img class=\"aligncenter wp-image-6498\" src=\"https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/27\/2023\/02\/20171403\/1024px-Okuns_law_quarterly_differences.svg_.png\" alt=\"A scatterplot of the quarterly change in unemployment rate and GDP, with the line of best fit in black.\" width=\"367\" height=\"242\" \/><\/p>\r\n<\/div>\r\n<div style=\"font-weight: 400;\">\r\n<p>If we think the data on the scatterplot looks even roughly linear, as it does in the graph above, we can try to find a <strong>line of best fit<\/strong> using <strong>Least Squares Regression (LSR).\u00a0<\/strong><\/p>\r\n<p>While LSR can be performed by hand, we'll use technology.<\/p>\r\n<h3><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">Two Conditions Before Using Least 
Squares Regression<\/span><\/h3>\r\n<h3 style=\"padding-left: 40px;\"><span style=\"font-weight: 400; font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif; font-size: 16px;\">1. Both variables must be quantitative.<\/span><\/h3>\r\n<h3 style=\"padding-left: 40px;\"><span style=\"font-weight: 400; font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif; font-size: 16px;\">2.\u00a0The data must appear at least roughly linear when graphed on a scatterplot.<\/span><\/h3>\r\n<p>The Least Squares Regression analysis produces a line through the data set that best approximates the linear trend present in the data. It does so by minimizing the sum of the squared vertical distances between the data points and the line.<\/p>\r\n<h3><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">Vocabulary<\/span><\/h3>\r\n<ul>\r\n\t<li><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">Least Squares Regression, Linear Regression, and Linear Modeling are all terms for the same thing: finding a line of best fit for a data set.<\/span><\/li>\r\n\t<li><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">The line of best fit is also called the Least Squares Regression line or the regression line.<\/span><\/li>\r\n\t<li><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">The vertical distance between any data point and the line of best fit is called the\u00a0<\/span><strong style=\"font-family: 'Public Sans', 
-apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">residual<\/strong><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">, or the\u00a0<\/span><strong style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">vertical error<\/strong><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\"> of the data point.<\/span><\/li>\r\n<\/ul>\r\n<p>The line of best fit has an equation of the form [latex]\\hat{y}=a+bx[\/latex], where [latex]a[\/latex] is the [latex]y[\/latex]-intercept and [latex]b[\/latex] is the slope. The notation [latex]\\hat{y}[\/latex], read as y-hat, indicates that the output of the equation is the value of the response variable predicted by the linear model rather than an observed data value.<\/p>\r\n<p>The\u00a0<strong>correlation coefficient, [latex]r[\/latex]<\/strong>, tells us how strong the linear relationship is. Values of [latex]r[\/latex] very close to [latex]-1[\/latex] or [latex]1[\/latex] indicate a strong linear relationship, with most of the data points lying very close to the line of best fit. 
The closer [latex]r[\/latex] is to [latex]0[\/latex], the weaker the linear relationship is between the two variables.<\/p>\r\n<ul>\r\n\t<li>If [latex]r[\/latex] is close to [latex]-1[\/latex] (negative 1), we say the linear relationship is strongly decreasing.<\/li>\r\n\t<li>If [latex]r[\/latex] is close to [latex]1[\/latex] (positive 1), we say the linear relationship is strongly increasing.<\/li>\r\n\t<li>If [latex]r[\/latex] is close to [latex]0[\/latex], or equal to [latex]0[\/latex], we say there is little or no linear relationship. The variables may still be related in a nonlinear way.<\/li>\r\n<\/ul>\r\n<p>See the image below, which labels each scatterplot shape with its [latex]r[\/latex]-value.<\/p>\r\n<p><img class=\"aligncenter wp-image-1979 size-large\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5826\/2022\/11\/20200222\/Scatterplot-r-values-1024x456.jpg\" alt=\"An image showing many different scatterplot shapes that can occur with the correlation coefficient listed above it.\" width=\"1024\" height=\"456\" \/><\/p>\r\n<\/div>\r\n<\/section>\r\n<section class=\"textbox watchIt\" aria-label=\"Watch It\">\r\n<p>The videos below will introduce you to the ideas of correlation and Least Squares Regression.<\/p>\r\n<p>[embed]https:\/\/www.youtube.com\/embed\/CWnfwZRAuaY[\/embed]<\/p>\r\n<p>[embed]https:\/\/www.youtube.com\/embed\/0T0z8d0_aY4[\/embed]<\/p>\r\n<\/section>\r\n<p>&nbsp;<\/p>","rendered":"<section class=\"textbox learningGoals\">\n<ul>\n<li>Recognize when a linear regression model will fit a given data set.<\/li>\n<li>Use technology to create scatterplots, find the line of best fit, and find the correlation coefficient.<\/li>\n<li><span data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Find the estimated slope and y-intercept for a linear regression model&quot;}\" data-sheets-userformat=\"{&quot;2&quot;:4609,&quot;3&quot;:{&quot;1&quot;:0},&quot;12&quot;:0,&quot;15&quot;:&quot;Calibri&quot;}\">Find the estimated slope and [latex]y[\/latex]-intercept for a linear 
regression model.<\/span><\/li>\n<li><span data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Use the line of best fit to predict values&quot;}\" data-sheets-userformat=\"{&quot;2&quot;:4609,&quot;3&quot;:{&quot;1&quot;:0},&quot;12&quot;:0,&quot;15&quot;:&quot;Calibri&quot;}\">Use the line of best fit to predict values.<\/span><\/li>\n<\/ul>\n<\/section>\n<section class=\"textbox recall\">\n<div style=\"font-weight: 400;\">\n<p><strong>Bivariate<\/strong> data is data that contains two variables.<\/p>\n<\/div>\n<div style=\"font-weight: 400;\">\n<p>An <strong>explanatory variable<\/strong> is an <strong>independent<\/strong> variable. It may explain or cause a change in another variable.<\/p>\n<\/div>\n<div style=\"font-weight: 400;\">\n<p>A <strong>response variable<\/strong> is a <strong>dependent variable<\/strong>.\u00a0It changes in response to the explanatory variable.<\/p>\n<\/div>\n<\/section>\n<section class=\"textbox recall\" aria-label=\"Recall\">\n<div style=\"font-weight: 400;\">\n<h3><strong>The main idea\u00a0<\/strong><\/h3>\n<\/div>\n<div style=\"font-weight: 400;\">\n<p><strong>Least Squares\u00a0<\/strong><strong>Regression<\/strong>\u00a0<strong>(LSR)<\/strong>\u00a0analysis is a statistical tool that models the linear relationship between an independent (explanatory) variable and a dependent (response) variable.<\/p>\n<\/div>\n<div style=\"font-weight: 400;\">\n<p>A <strong>scatterplot<\/strong> displays the relationship: each data point is a pair of quantitative values, one for the independent variable and one for the dependent variable. See the image below, which plots the quarterly percent change in GDP against the quarterly percent change in the unemployment rate. 
Each data point tells us that, when the percent change in unemployment is some particular amount, the percent change in GDP is a particular corresponding amount.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-6498\" src=\"https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/27\/2023\/02\/20171403\/1024px-Okuns_law_quarterly_differences.svg_.png\" alt=\"A scatterplot of the quarterly change in unemployment rate and GDP, with the line of best fit in black.\" width=\"367\" height=\"242\" \/><\/p>\n<\/div>\n<div style=\"font-weight: 400;\">\n<p>If we think the data on the scatterplot looks even roughly linear, as it does in the graph above, we can try to find a <strong>line of best fit<\/strong> using <strong>Least Squares Regression (LSR).\u00a0<\/strong><\/p>\n<p>While LSR can be performed by hand, we&#8217;ll use technology.<\/p>\n<h3><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">Two Conditions Before Using Least Squares Regression<\/span><\/h3>\n<h3 style=\"padding-left: 40px;\"><span style=\"font-weight: 400; font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif; font-size: 16px;\">1. Both variables must be quantitative.<\/span><\/h3>\n<h3 style=\"padding-left: 40px;\"><span style=\"font-weight: 400; font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif; font-size: 16px;\">2.\u00a0The data must appear at least roughly linear when graphed on a scatterplot.<\/span><\/h3>\n<p>The Least Squares Regression analysis produces a line through the data set that best approximates the linear trend present in the data. 
It does so by minimizing the sum of the squared vertical distances between the data points and the line.<\/p>\n<h3><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">Vocabulary<\/span><\/h3>\n<ul>\n<li><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">Least Squares Regression, Linear Regression, and Linear Modeling are all terms for the same thing: finding a line of best fit for a data set.<\/span><\/li>\n<li><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">The line of best fit is also called the Least Squares Regression line or the regression line.<\/span><\/li>\n<li><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">The vertical distance between any data point and the line of best fit is called the\u00a0<\/span><strong style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">residual<\/strong><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">, or the\u00a0<\/span><strong style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">vertical error<\/strong><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\"> of the data point.<\/span><\/li>\n<\/ul>\n<p>The line of best fit has an equation of the form [latex]\\hat{y}=a+bx[\/latex], where [latex]a[\/latex] is the [latex]y[\/latex]-intercept and [latex]b[\/latex] is the slope. 
The notation [latex]\\hat{y}[\/latex], read as y-hat, indicates that the output of the equation is the value of the response variable predicted by the linear model rather than an observed data value.<\/p>\n<p>The\u00a0<strong>correlation coefficient, [latex]r[\/latex]<\/strong>, tells us how strong the linear relationship is. Values of [latex]r[\/latex] very close to [latex]-1[\/latex] or [latex]1[\/latex] indicate a strong linear relationship, with most of the data points lying very close to the line of best fit. The closer [latex]r[\/latex] is to [latex]0[\/latex], the weaker the linear relationship is between the two variables.<\/p>\n<ul>\n<li>If [latex]r[\/latex] is close to [latex]-1[\/latex] (negative 1), we say the linear relationship is strongly decreasing.<\/li>\n<li>If [latex]r[\/latex] is close to [latex]1[\/latex] (positive 1), we say the linear relationship is strongly increasing.<\/li>\n<li>If [latex]r[\/latex] is close to [latex]0[\/latex], or equal to [latex]0[\/latex], we say there is little or no linear relationship. The variables may still be related in a nonlinear way.<\/li>\n<\/ul>\n<p>See the image below, which labels each scatterplot shape with its [latex]r[\/latex]-value.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-1979 size-large\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5826\/2022\/11\/20200222\/Scatterplot-r-values-1024x456.jpg\" alt=\"An image showing many different scatterplot shapes that can occur with the correlation coefficient listed above it.\" width=\"1024\" height=\"456\" \/><\/p>\n<\/div>\n<\/section>\n<section class=\"textbox watchIt\" aria-label=\"Watch It\">\n<p>The videos below will introduce you to the ideas of correlation and Least Squares Regression.<\/p>\n<p><iframe loading=\"lazy\" id=\"oembed-1\" title=\"Scatter Plots : Introduction to Positive and Negative Correlation\" width=\"500\" height=\"375\" src=\"https:\/\/www.youtube.com\/embed\/CWnfwZRAuaY?feature=oembed&#38;rel=0\" frameborder=\"0\" 
allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<p><iframe loading=\"lazy\" id=\"oembed-2\" title=\"Linear Regression - Least Squares Criterion  Part 1\" width=\"500\" height=\"375\" src=\"https:\/\/www.youtube.com\/embed\/0T0z8d0_aY4?feature=oembed&#38;rel=0\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<\/section>\n<p>&nbsp;<\/p>\n","protected":false},"author":12,"menu_order":20,"template":"","meta":{"_candela_citation":"[]","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"part":225,"module-header":"fresh_take","content_attributions":[],"internal_book_links":[],"video_content":null,"cc_video_embed_content":{"cc_scripts":"","media_targets":[]},"try_it_collection":null,"_links":{"self":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/253"}],"collection":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/users\/12"}],"version-history":[{"count":8,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/253\/revisions"}],"predecessor-version":[{"id":6656,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/253\/revisions\/6656"}],"part":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/parts\/225"}],"metadata":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/253\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/media?parent=253"}],"wp:term":[{"taxonomy":"chapter-type","embe
ddable":true,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapter-type?post=253"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/contributor?post=253"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/license?post=253"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}