{"id":246,"date":"2023-02-20T17:13:57","date_gmt":"2023-02-20T17:13:57","guid":{"rendered":"https:\/\/content.one.lumenlearning.com\/introstatstest\/chapter\/lines-of-best-fit-learn-it-2\/"},"modified":"2025-05-10T03:12:42","modified_gmt":"2025-05-10T03:12:42","slug":"lines-of-best-fit-learn-it-2","status":"publish","type":"chapter","link":"https:\/\/content.one.lumenlearning.com\/introstatstest\/chapter\/lines-of-best-fit-learn-it-2\/","title":{"raw":"Line of Best Fit: Learn It 2","rendered":"Line of Best Fit: Learn It 2"},"content":{"raw":"<section class=\"textbox learningGoals\">\r\n<ul>\r\n\t<li>Recognize when a linear regression model will fit a given data set.<\/li>\r\n\t<li>Use technology to create scatterplots, find the line of best fit, and find the correlation coefficient.<\/li>\r\n\t<li><span data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Find the estimated slope and y-intercept for a linear regression model&quot;}\" data-sheets-userformat=\"{&quot;2&quot;:4609,&quot;3&quot;:{&quot;1&quot;:0},&quot;12&quot;:0,&quot;15&quot;:&quot;Calibri&quot;}\">Find the estimated slope and [latex]y[\/latex]-intercept for a linear regression model.<\/span><\/li>\r\n\t<li><span data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Use the line of best fit to predict values&quot;}\" data-sheets-userformat=\"{&quot;2&quot;:4609,&quot;3&quot;:{&quot;1&quot;:0},&quot;12&quot;:0,&quot;15&quot;:&quot;Calibri&quot;}\">Use the line of best fit to predict values.<\/span><\/li>\r\n<\/ul>\r\n<\/section>\r\n<h2>Line of Best Fit<\/h2>\r\n<p>A method we will use to make predictions about missing observations or future observations in bivariate data is called <strong>Least Squares Regression (LSR) analysis<\/strong>. The language might seem intimidating at first, but the ideas are quite straightforward, especially with examples to illustrate each new term. 
For example, LSR analysis can also be described as <strong>linear modeling<\/strong>, where we determine the equation of a <strong>line of best fit<\/strong> to make predictions based on an existing data set.<\/p>\r\n<section class=\"textbox keyTakeaway\">\r\n<h3>line of best fit<\/h3>\r\n<p>The <strong>line of best fit<\/strong> is simply the best line that describes the data points. For real data with natural deviations, the line cannot go through all of the points. In fact, very often, the line does not go through any of the data points.<\/p>\r\n<\/section>\r\n<p>Since no line will be perfect, the best we can do is minimize its error. In this course, we will do this by minimizing the sum total of the squared vertical errors from all data points to the line. This is why the line of best fit is also called the Least Squares Regression Line (LSRL).<\/p>\r\n\r\n[caption id=\"attachment_6496\" align=\"aligncenter\" width=\"662\"]<img class=\"wp-image-6496\" src=\"https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/27\/2023\/02\/20171356\/6.1.L.LineGraph.png\" alt=\"A graph with several points and a line of best fit. Each point is connected to the line of best fit vertically. Beside one of the vertical lines, it reads &quot;Residual = 4 - 10 = -6.&quot;\" width=\"662\" height=\"358\" \/> Figure 1. The line of best fit, or Least Squares Regression Line, minimizes the total squared vertical errors (residuals). Each vertical line represents a residual\u2014the difference between the actual data point and the predicted value from the line.[\/caption]\r\n\r\n<p>The <strong>vertical error<\/strong> associated with each data point is called the <strong>residual <\/strong>of that observation. 
This error, illustrated by the length of the vertical line, represents how far off a prediction calculated from the line is compared to the actual, observed value; the longer the vertical line, the greater the error associated with that particular observation.<\/p>\r\n<p><strong>Note:<\/strong> For data points that are above the line of best fit, the residuals are <strong>positive<\/strong>, and for data points that are below the line, the residuals are <strong>negative<\/strong>.<\/p>\r\n<section class=\"textbox tryIt\">[ohm2_question hide_question_numbers=1]1156[\/ohm2_question]<\/section>\r\n<section class=\"textbox tryIt\">[ohm2_question hide_question_numbers=1]1157[\/ohm2_question]<\/section>\r\n<section class=\"textbox keyTakeaway\">\r\n<h3>equation for the line of best fit<\/h3>\r\n<p>The equation for the line of best fit is very similar to one you may have seen in a previous math class:<\/p>\r\n\r\n[caption id=\"attachment_439\" align=\"aligncenter\" width=\"600\"]<img class=\"wp-image-439\" src=\"https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/27\/2023\/02\/24201537\/6.2.L-1-300x170.png\" alt=\"An equation that reads y = a + bx. The &quot;y&quot; is labeled as the predicted response or predicted value. The &quot;a&quot; is labeled as the estimated value of the y-intercept when x = 0. The &quot;b&quot; is labeled as the slope of line - constant rate of change. Lastly, the x is labeled as the explanatory variable.\" width=\"600\" height=\"341\" \/> Figure 2. The equation for the line of best fit is written as \u0177 = a + bx, where \u0177 is the predicted value, a is the estimated y-intercept, and b is the slope representing the constant rate of change.[\/caption]\r\n<\/section>\r\n<p>An important distinction between the\u00a0[latex]y=mx+b[\/latex] linear model and the [latex]\\displaystyle\\hat{{y}}={a}+{b}{x}[\/latex] linear model is that in statistics, we are estimating the equation of the line from data. 
This estimate is denoted by the \u201chat\u201d symbol (^), which signifies a value estimated by the model rather than a known quantity. Please keep this distinction in mind as we progress through the upcoming activities, where we will be interpreting the <strong>estimated or predicted slopes<\/strong> and <strong>estimated or predicted intercepts<\/strong> generated from given data sets.<\/p>\r\n<p>The calculations of the estimated slope and the estimated y-intercept come directly from the data set.\u00a0Below are the formulas for\u00a0<span style=\"font-size: 1rem; text-align: initial;\">[latex]a[\/latex]<\/span>\u00a0and <span style=\"font-size: 1rem; text-align: initial;\">[latex]b[\/latex]<\/span>:<\/p>\r\n<ul>\r\n\t<li>The estimated\u00a0<span style=\"font-size: 1rem; text-align: initial;\">[latex]y[\/latex]-intercept is\u00a0<\/span>[latex]\\displaystyle{a}=\\overline{y}-{b}\\overline{{x}}[\/latex].\r\n\r\n<ul>\r\n\t<li>Note:\u00a0The sample means of the\u00a0<span style=\"font-size: 1rem; text-align: initial;\">[latex]x[\/latex]<\/span><span style=\"font-size: 1rem; text-align: initial;\">\u00a0values and the\u00a0[latex]y[\/latex]\u00a0<\/span><span style=\"font-size: 1rem; text-align: initial;\">values are [latex]\\displaystyle\\overline{{x}}[\/latex] and [latex]\\overline{{y}}[\/latex].\u00a0<\/span><\/li>\r\n<\/ul>\r\n<\/li>\r\n\t<li><span style=\"font-size: 1rem; text-align: initial;\">The estimated slope is [latex]b=r(\\frac{{s}_{y}}{{s}_{x}})[\/latex]. 
<\/span>\r\n<ul>\r\n\t<li><span style=\"font-size: 1rem; text-align: initial;\">Note: [latex]s_y[\/latex] represents the standard deviation of the [latex]y[\/latex]\u00a0values (response variable), [latex]s_x[\/latex] represents the standard deviation of the\u00a0[latex]x[\/latex] values (explanatory variable), and [latex]r[\/latex] is the correlation coefficient.<\/span><\/li>\r\n\t<li>[reveal-answer q=\"396322\"]Different formula to find the estimated slope.[\/reveal-answer][hidden-answer a=\"396322\"][latex]{b}=\\dfrac{{\\sum{({x}-\\overline{{x}})}{({y}-\\overline{{y}})}}}{{\\sum{({x}-\\overline{{x}})}^{{2}}}}[\/latex].\u00a0 <span style=\"font-size: 1rem; text-align: initial;\">The best-fit line always passes through the point [latex]\\left ({\\overline x},{\\overline y} \\right )[\/latex].[\/hidden-answer]<\/span><\/li>\r\n<\/ul>\r\n<\/li>\r\n<\/ul>\r\n<p>Why are these formulas important?<\/p>\r\n<ul>\r\n\t<li>They demonstrate that both the estimated slope and intercept are calculated directly from the data.\u00a0If you change the data, you will almost certainly get a different line of best fit, slope, and intercept.<\/li>\r\n\t<li>The good news is that we will be relying on software for all calculations involved in the line of best fit.\u00a0However, you will need to use some common sense and appropriate units when providing interpretations.<\/li>\r\n<\/ul>\r\n<section class=\"textbox tryIt\">[ohm2_question hide_question_numbers=1]1279[\/ohm2_question]<\/section>","rendered":"<section class=\"textbox learningGoals\">\n<ul>\n<li>Recognize when a linear regression model will fit a given data set.<\/li>\n<li>Use technology to create scatterplots, find the line of best fit, and find the correlation coefficient.<\/li>\n<li><span data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Find the estimated slope and y-intercept for a linear regression model&quot;}\" 
data-sheets-userformat=\"{&quot;2&quot;:4609,&quot;3&quot;:{&quot;1&quot;:0},&quot;12&quot;:0,&quot;15&quot;:&quot;Calibri&quot;}\">Find the estimated slope and [latex]y[\/latex]-intercept for a linear regression model.<\/span><\/li>\n<li><span data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Use the line of best fit to predict values&quot;}\" data-sheets-userformat=\"{&quot;2&quot;:4609,&quot;3&quot;:{&quot;1&quot;:0},&quot;12&quot;:0,&quot;15&quot;:&quot;Calibri&quot;}\">Use the line of best fit to predict values.<\/span><\/li>\n<\/ul>\n<\/section>\n<h2>Line of Best Fit<\/h2>\n<p>A method we will use to make predictions about missing observations or future observations in bivariate data is called <strong>Least Squares Regression (LSR) analysis<\/strong>. The language might seem intimidating at first, but the ideas are quite straightforward, especially with examples to illustrate each new term. For example, LSR analysis can also be described as <strong>linear modeling<\/strong>, where we determine the equation of a <strong>line of best fit<\/strong> to make predictions based on an existing data set.<\/p>\n<section class=\"textbox keyTakeaway\">\n<h3>line of best fit<\/h3>\n<p>The <strong>line of best fit<\/strong> is simply the best line that describes the data points. For real data with natural deviations, the line cannot go through all of the points. In fact, very often, the line does not go through any of the data points.<\/p>\n<\/section>\n<p>Since no line will be perfect, the best we can do is minimize its error. In this course, we will do this by minimizing the sum total of the squared vertical errors from all data points to the line. 
This is why the line of best fit is also called the Least Squares Regression Line (LSRL).<\/p>\n<figure id=\"attachment_6496\" aria-describedby=\"caption-attachment-6496\" style=\"width: 662px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-6496\" src=\"https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/27\/2023\/02\/20171356\/6.1.L.LineGraph.png\" alt=\"A graph with several points and a line of best fit. Each point is connected to the line of best fit vertically. Beside one of the vertical lines, it reads &quot;Residual = 4 - 10 = -6.&quot;\" width=\"662\" height=\"358\" \/><figcaption id=\"caption-attachment-6496\" class=\"wp-caption-text\">Figure 1. The line of best fit, or Least Squares Regression Line, minimizes the total squared vertical errors (residuals). Each vertical line represents a residual\u2014the difference between the actual data point and the predicted value from the line.<\/figcaption><\/figure>\n<p>The <strong>vertical error<\/strong> associated with each data point is called the <strong>residual <\/strong>of that observation. 
This error, illustrated by the length of the vertical line, represents how far off a prediction calculated from the line is compared to the actual, observed value; the longer the vertical line, the greater the error associated with that particular observation.<\/p>\n<p><strong>Note:<\/strong> For data points that are above the line of best fit, the residuals are <strong>positive<\/strong>, and for data points that are below the line, the residuals are <strong>negative<\/strong>.<\/p>\n<section class=\"textbox tryIt\"><iframe loading=\"lazy\" id=\"ohm1156\" class=\"resizable\" src=\"https:\/\/ohm.one.lumenlearning.com\/multiembedq.php?id=1156&theme=lumen&iframe_resize_id=ohm1156&source=tnh\" width=\"100%\" height=\"150\"><\/iframe><\/section>\n<section class=\"textbox tryIt\"><iframe loading=\"lazy\" id=\"ohm1157\" class=\"resizable\" src=\"https:\/\/ohm.one.lumenlearning.com\/multiembedq.php?id=1157&theme=lumen&iframe_resize_id=ohm1157&source=tnh\" width=\"100%\" height=\"150\"><\/iframe><\/section>\n<section class=\"textbox keyTakeaway\">\n<h3>equation for the line of best fit<\/h3>\n<p>The equation for the line of best fit is very similar to one you may have seen in a previous math class:<\/p>\n<figure id=\"attachment_439\" aria-describedby=\"caption-attachment-439\" style=\"width: 600px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-439\" src=\"https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/27\/2023\/02\/24201537\/6.2.L-1-300x170.png\" alt=\"An equation that reads y = a + bx. The &quot;y&quot; is labeled as the predicted response or predicted value. The &quot;a&quot; is labeled as the estimated value of the y-intercept when x = 0. The &quot;b&quot; is labeled as the slope of line - constant rate of change. 
Lastly, the x is labeled as the explanatory variable.\" width=\"600\" height=\"341\" srcset=\"https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/27\/2023\/02\/24201537\/6.2.L-1-300x170.png 300w, https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/27\/2023\/02\/24201537\/6.2.L-1-1024x581.png 1024w, https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/27\/2023\/02\/24201537\/6.2.L-1-768x436.png 768w, https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/27\/2023\/02\/24201537\/6.2.L-1-65x37.png 65w, https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/27\/2023\/02\/24201537\/6.2.L-1-225x128.png 225w, https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/27\/2023\/02\/24201537\/6.2.L-1-350x199.png 350w, https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/27\/2023\/02\/24201537\/6.2.L-1.png 1032w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><figcaption id=\"caption-attachment-439\" class=\"wp-caption-text\">Figure 2. The equation for the line of best fit is written as \u0177 = a + bx, where \u0177 is the predicted value, a is the estimated y-intercept, and b is the slope representing the constant rate of change.<\/figcaption><\/figure>\n<\/section>\n<p>An important distinction between the\u00a0[latex]y=mx+b[\/latex] linear model and the [latex]\\displaystyle\\hat{{y}}={a}+{b}{x}[\/latex] linear model is that in statistics, we are estimating the equation of the line from data. This estimate is denoted by the \u201chat\u201d symbol (^), which signifies a value estimated by the model rather than a known quantity. 
Please keep this distinction in mind as we progress through the upcoming activities, where we will be interpreting the <strong>estimated or predicted slopes<\/strong> and <strong>estimated or predicted intercepts<\/strong> generated from given data sets.<\/p>\n<p>The calculations of the estimated slope and the estimated y-intercept come directly from the data set.\u00a0Below are the formulas for\u00a0<span style=\"font-size: 1rem; text-align: initial;\">[latex]a[\/latex]<\/span>\u00a0and <span style=\"font-size: 1rem; text-align: initial;\">[latex]b[\/latex]<\/span>:<\/p>\n<ul>\n<li>The estimated\u00a0<span style=\"font-size: 1rem; text-align: initial;\">[latex]y[\/latex]-intercept is\u00a0<\/span>[latex]\\displaystyle{a}=\\overline{y}-{b}\\overline{{x}}[\/latex].\n<ul>\n<li>Note:\u00a0The sample means of the\u00a0<span style=\"font-size: 1rem; text-align: initial;\">[latex]x[\/latex]<\/span><span style=\"font-size: 1rem; text-align: initial;\">\u00a0values and the\u00a0[latex]y[\/latex]\u00a0<\/span><span style=\"font-size: 1rem; text-align: initial;\">values are [latex]\\displaystyle\\overline{{x}}[\/latex] and [latex]\\overline{{y}}[\/latex].\u00a0<\/span><\/li>\n<\/ul>\n<\/li>\n<li><span style=\"font-size: 1rem; text-align: initial;\">The estimated slope is [latex]b=r(\\frac{{s}_{y}}{{s}_{x}})[\/latex]. 
<\/span>\n<ul>\n<li><span style=\"font-size: 1rem; text-align: initial;\">Note: [latex]s_y[\/latex] represents the standard deviation of the [latex]y[\/latex]\u00a0values (response variable), [latex]s_x[\/latex] represents the standard deviation of the\u00a0[latex]x[\/latex] values (explanatory variable), and [latex]r[\/latex] is the correlation coefficient.<\/span><\/li>\n<li>\n<div class=\"qa-wrapper\" style=\"display: block\"><button class=\"show-answer show-answer-button collapsed\" data-target=\"q396322\">Different formula to find the estimated slope.<\/button><\/p>\n<div id=\"q396322\" class=\"hidden-answer\" style=\"display: none\">[latex]{b}=\\dfrac{{\\sum{({x}-\\overline{{x}})}{({y}-\\overline{{y}})}}}{{\\sum{({x}-\\overline{{x}})}^{{2}}}}[\/latex].\u00a0 <span style=\"font-size: 1rem; text-align: initial;\">The best-fit line always passes through the point [latex]\\left ({\\overline x},{\\overline y} \\right )[\/latex].<\/div>\n<\/div>\n<p><\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Why are these formulas important?<\/p>\n<ul>\n<li>They demonstrate that both the estimated slope and intercept are calculated directly from the data.\u00a0If you change the data, you will almost certainly get a different line of best fit, slope, and intercept.<\/li>\n<li>The good news is that we will be relying on software for all calculations involved in the line of best fit.\u00a0However, you will need to use some common sense and appropriate units when providing interpretations.<\/li>\n<\/ul>\n<section class=\"textbox tryIt\"><iframe loading=\"lazy\" id=\"ohm1279\" class=\"resizable\" src=\"https:\/\/ohm.one.lumenlearning.com\/multiembedq.php?id=1279&theme=lumen&iframe_resize_id=ohm1279&source=tnh\" width=\"100%\" 
height=\"150\"><\/iframe><\/section>\n","protected":false},"author":12,"menu_order":13,"template":"","meta":{"_candela_citation":"[]","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"part":225,"module-header":"learn_it","content_attributions":[],"internal_book_links":[],"video_content":null,"cc_video_embed_content":{"cc_scripts":"","media_targets":[]},"try_it_collection":null,"_links":{"self":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/246"}],"collection":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/users\/12"}],"version-history":[{"count":15,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/246\/revisions"}],"predecessor-version":[{"id":6505,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/246\/revisions\/6505"}],"part":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/parts\/225"}],"metadata":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/246\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/media?parent=246"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapter-type?post=246"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/contributor?post=246"},{"taxonomy":"license","embeddable":true,"href":"https:
\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/license?post=246"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}