{"id":277,"date":"2023-02-20T17:14:22","date_gmt":"2023-02-20T17:14:22","guid":{"rendered":"https:\/\/content.one.lumenlearning.com\/introstatstest\/chapter\/assessing-the-fit-of-a-line-learn-it-4\/"},"modified":"2025-05-11T23:23:29","modified_gmt":"2025-05-11T23:23:29","slug":"assessing-the-fit-of-a-line-learn-it-4","status":"publish","type":"chapter","link":"https:\/\/content.one.lumenlearning.com\/introstatstest\/chapter\/assessing-the-fit-of-a-line-learn-it-4\/","title":{"raw":"Assessing the Fit of a Line: Learn It 4","rendered":"Assessing the Fit of a Line: Learn It 4"},"content":{"raw":"<section class=\"textbox learningGoals\">\r\n<ul>\r\n\t<li>Describe the connection between the residual and the position of a data point relative to the line of best fit.<\/li>\r\n\t<li>Create and use a residual plot to identify influential points and determine the most appropriate regression model.<\/li>\r\n\t<li><span data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Determine the reliability of predictions from the line of best fit using the residuals and standard error of the residuals&quot;}\" data-sheets-userformat=\"{&quot;2&quot;:4609,&quot;3&quot;:{&quot;1&quot;:0},&quot;12&quot;:0,&quot;15&quot;:&quot;Calibri&quot;}\">Determine the reliability of predictions from the line of best fit using the residuals and standard error of the residuals.<\/span><\/li>\r\n<\/ul>\r\n<\/section>\r\n<h2>Residual Standard Error<\/h2>\r\n<p>Using [latex]r[\/latex] and [latex]R^2[\/latex], we are able to determine whether the line of best fit is a useful model and how well the line fits the data. However, we\u2019ve also seen how the line of best fit can be used to calculate predicted values. So, how can we make a general assessment of the accuracy of predictions from the line? 
To do so, we\u2019ll look at the distribution of residuals, specifically focusing on the variability.<\/p>\r\n<section class=\"textbox keyTakeaway\">\r\n<h3>residual standard error<\/h3>\r\n<p>The<strong> residual standard error<\/strong>, [latex]s_e[\/latex], is a measure of the variability in the residuals. It is also known as the residual standard deviation. It is the typical error we expect in predictions using the line of best fit. It is a way to quantify the spread of the points around the line of best fit on the scatterplot.<\/p>\r\n<p>The formula for the residual standard error is: [latex]s_e = \\sqrt{\\dfrac{1}{n-2}\\sum_{i=1}^{n}\\left(y_i-\\hat{y}_i\\right)^{2}}[\/latex]<\/p>\r\n<\/section>\r\n<p>[reveal-answer q=\"653663\"]Explanation of the formula[\/reveal-answer]<br \/>\r\n[hidden-answer a=\"653663\"]<\/p>\r\n<p>Previously, we calculated the error in a single prediction by calculating: Residual = Observed value \u2212 Predicted value.<\/p>\r\n<p>[latex](y_i-\\hat{y}_i)[\/latex] is the residual (error) for the [latex]i^{th}[\/latex] data point, that is, the difference between the observed value of [latex]y[\/latex] ([latex]y_i[\/latex]) and the value predicted by the linear model ([latex]\\hat{y}_i[\/latex]).<\/p>\r\n<p>Because some of the residuals are positive and some are negative, simply adding them would let the errors cancel each other out. To remedy this, we square each residual, add the squared residuals for all [latex]n[\/latex] data points, divide by [latex]n-2[\/latex], and finally take the square root. We divide by the degrees of freedom [latex]n-2[\/latex] instead of the sample size [latex]n[\/latex] because two quantities, the slope and the intercept, are estimated from the data; this makes [latex]s_e^2[\/latex] an unbiased estimate of the variance of the error term. [\/hidden-answer]<\/p>\r\n<p>A large residual standard error indicates there is a lot of spread in the scatter of the points around the line of best fit and thus more variability in the residuals. 
If all the data points fit perfectly on the line, the line is a perfect fit for the data and the residual standard error will be zero. This scenario almost never occurs in practice, since real observations rarely fall exactly on a line.<\/p>\r\n<p>One thing to keep in mind is that the residual standard error has the same units as the response variable. Therefore, you want to keep the response variable, units, and context of the data in mind as you use the residual standard error to evaluate how well the line fits the data.<\/p>\r\n<p><strong>Note:<\/strong> Most statistical software computes [latex]r[\/latex], [latex]R^2[\/latex], and [latex]s_e[\/latex]. Therefore, our focus is not on calculating but on understanding and interpreting.<\/p>\r\n<p>Select the <strong>Animal Longevity<\/strong> data set in the tool below and let's investigate it again.<\/p>\r\n<p><iframe src=\"https:\/\/lumen-learning.shinyapps.io\/linear_regression\/\" width=\"100%\" height=\"850\"><\/iframe><\/p>\r\n<p><br \/>\r\n[<a href=\"https:\/\/lumen-learning.shinyapps.io\/linear_regression\/\" target=\"_blank\" rel=\"noopener\">Trouble viewing? 
Click to open in a new tab.<\/a>]<\/p>\r\n<section class=\"textbox tryIt\">[ohm2_question hide_question_numbers=1]3036[\/ohm2_question]<\/section>","rendered":"<section class=\"textbox learningGoals\">\n<ul>\n<li>Describe the connection between the residual and the position of a data point relative to the line of best fit.<\/li>\n<li>Create and use a residual plot to identify influential points and determine the most appropriate regression model.<\/li>\n<li><span data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Determine the reliability of predictions from the line of best fit using the residuals and standard error of the residuals&quot;}\" data-sheets-userformat=\"{&quot;2&quot;:4609,&quot;3&quot;:{&quot;1&quot;:0},&quot;12&quot;:0,&quot;15&quot;:&quot;Calibri&quot;}\">Determine the reliability of predictions from the line of best fit using the residuals and standard error of the residuals.<\/span><\/li>\n<\/ul>\n<\/section>\n<h2>Residual Standard Error<\/h2>\n<p>Using [latex]r[\/latex] and [latex]R^2[\/latex], we are able to determine whether the line of best fit is a useful model and how well the line fits the data. However, we\u2019ve also seen how the line of best fit can be used to calculate predicted values. So, how can we make a general assessment of the accuracy of predictions from the line? To do so, we\u2019ll look at the distribution of residuals, specifically focusing on the variability.<\/p>\n<section class=\"textbox keyTakeaway\">\n<h3>residual standard error<\/h3>\n<p>The<strong> residual standard error<\/strong>, [latex]s_e[\/latex], is a measure of the variability in the residuals. It is also known as the residual standard deviation. It is the typical error we expect in predictions using the line of best fit. 
It is a way to quantify the spread of the points around the line of best fit on the scatterplot.<\/p>\n<p>The formula for the residual standard error is: [latex]s_e = \\sqrt{\\dfrac{1}{n-2}\\sum_{i=1}^{n}\\left(y_i-\\hat{y}_i\\right)^{2}}[\/latex]<\/p>\n<\/section>\n<p><div class=\"qa-wrapper\" style=\"display: block\"><button class=\"show-answer show-answer-button collapsed\" data-target=\"q653663\">Explanation of the formula<\/button><\/p>\n<div id=\"q653663\" class=\"hidden-answer\" style=\"display: none\">\n<p>Previously, we calculated the error in a single prediction by calculating: Residual = Observed value \u2212 Predicted value.<\/p>\n<p>[latex](y_i-\\hat{y}_i)[\/latex] is the residual (error) for the [latex]i^{th}[\/latex] data point, that is, the difference between the observed value of [latex]y[\/latex] ([latex]y_i[\/latex]) and the value predicted by the linear model ([latex]\\hat{y}_i[\/latex]).<\/p>\n<p>Because some of the residuals are positive and some are negative, simply adding them would let the errors cancel each other out. To remedy this, we square each residual, add the squared residuals for all [latex]n[\/latex] data points, divide by [latex]n-2[\/latex], and finally take the square root. We divide by the degrees of freedom [latex]n-2[\/latex] instead of the sample size [latex]n[\/latex] because two quantities, the slope and the intercept, are estimated from the data; this makes [latex]s_e^2[\/latex] an unbiased estimate of the variance of the error term. <\/p><\/div>\n<\/div>\n<p>A large residual standard error indicates there is a lot of spread in the scatter of the points around the line of best fit and thus more variability in the residuals. If all the data points fit perfectly on the line, the line is a perfect fit for the data and the residual standard error will be zero. 
This scenario almost never occurs in practice, since real observations rarely fall exactly on a line.<\/p>\n<p>One thing to keep in mind is that the residual standard error has the same units as the response variable. Therefore, you want to keep the response variable, units, and context of the data in mind as you use the residual standard error to evaluate how well the line fits the data.<\/p>\n<p><strong>Note:<\/strong> Most statistical software computes [latex]r[\/latex], [latex]R^2[\/latex], and [latex]s_e[\/latex]. Therefore, our focus is not on calculating but on understanding and interpreting.<\/p>\n<p>Select the <strong>Animal Longevity<\/strong> data set in the tool below and let&#8217;s investigate it again.<\/p>\n<p><iframe loading=\"lazy\" src=\"https:\/\/lumen-learning.shinyapps.io\/linear_regression\/\" width=\"100%\" height=\"850\"><\/iframe><\/p>\n<p>\n[<a href=\"https:\/\/lumen-learning.shinyapps.io\/linear_regression\/\" target=\"_blank\" rel=\"noopener\">Trouble viewing? 
Click to open in a new tab.<\/a>]<\/p>\n<section class=\"textbox tryIt\"><iframe loading=\"lazy\" id=\"ohm3036\" class=\"resizable\" src=\"https:\/\/ohm.one.lumenlearning.com\/multiembedq.php?id=3036&theme=lumen&iframe_resize_id=ohm3036&source=tnh\" width=\"100%\" height=\"150\"><\/iframe><\/section>\n","protected":false},"author":12,"menu_order":31,"template":"","meta":{"_candela_citation":"[]","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"part":225,"module-header":"learn_it","content_attributions":[],"internal_book_links":[],"video_content":null,"cc_video_embed_content":{"cc_scripts":"","media_targets":[]},"try_it_collection":null,"_links":{"self":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/277"}],"collection":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/users\/12"}],"version-history":[{"count":14,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/277\/revisions"}],"predecessor-version":[{"id":6663,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/277\/revisions\/6663"}],"part":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/parts\/225"}],"metadata":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/277\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/media?parent=277"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/int
rostatstest\/wp-json\/pressbooks\/v2\/chapter-type?post=277"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/contributor?post=277"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/license?post=277"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}