{"id":1470,"date":"2023-06-22T02:29:01","date_gmt":"2023-06-22T02:29:01","guid":{"rendered":"https:\/\/content.one.lumenlearning.com\/introstatstest\/chapter\/transforming-data-fresh-take\/"},"modified":"2025-05-17T02:40:51","modified_gmt":"2025-05-17T02:40:51","slug":"transforming-data-fresh-take","status":"publish","type":"chapter","link":"https:\/\/content.one.lumenlearning.com\/introstatstest\/chapter\/transforming-data-fresh-take\/","title":{"raw":"Transforming Data - Fresh Take","rendered":"Transforming Data &#8211; Fresh Take"},"content":{"raw":"<section class=\"textbox learningGoals\">\r\n<ul>\r\n\t<li><span data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Decide which transformation to use for the different types of data sets and analyze the results&quot;}\" data-sheets-userformat=\"{&quot;2&quot;:6913,&quot;3&quot;:{&quot;1&quot;:0},&quot;11&quot;:4,&quot;12&quot;:0,&quot;14&quot;:{&quot;1&quot;:2,&quot;2&quot;:0},&quot;15&quot;:&quot;Calibri&quot;}\">Decide which transformation to use for the different types of data sets and analyze the results<\/span><\/li>\r\n<\/ul>\r\n<\/section>\r\n<p><strong>Data transformation<\/strong> is any operation you perform on collected raw data to make it more useful for analysis.<\/p>\r\n<p>You may add a fixed number to each data value, square or cube each value, take the square root of each value, or even apply a logarithm to each value. However, it is important to be strategic and careful when selecting which transformation to apply to the raw data.<\/p>\r\n<p>The idea is to transform the raw data so that the different variables and measurements within it are expressed on scales that can be compared with one another. 
Because this is the goal, some researchers will even split their data as their act of transformation.<\/p>\r\n<p>[footnote]https:\/\/en.wikipedia.org\/wiki\/Data_transformation_(statistics)[\/footnote]Data can also be transformed to make it easier to visualize. For example, suppose we have a scatterplot in which the points are the countries of the world, and the data values being plotted are the land area and population of each country. If the plot is made using untransformed data (e.g., square kilometers for area and the number of people for population), most of the countries would be plotted in a tight cluster of points in the lower left corner of the graph. The few countries with very large areas and\/or populations would be spread thinly around most of the graph's area. Simply rescaling units (e.g., to thousand square kilometers, or to millions of people) will not change this. However, following logarithmic transformations of both area and population, the points will be spread more uniformly in the graph.<\/p>\r\n<p>[footnote]https:\/\/en.wikipedia.org\/wiki\/Data_transformation_(statistics)[\/footnote]Another reason for applying data transformation is to improve interpretability, even if no formal statistical analysis or visualization is to be performed. For example, suppose we are comparing cars in terms of their fuel economy. These data are usually presented as \"kilometers per liter\" or \"miles per gallon\". 
However, if the goal is to assess how much additional fuel a person would use in one year when driving one car compared to another, it is more natural to work with the data transformed by applying the\u00a0reciprocal function, yielding liters per kilometer, or gallons per mile.<\/p>\r\n<section class=\"textbox watchIt\" aria-label=\"Watch It\">\r\n<p>[embed]https:\/\/youtu.be\/sK0RY-Qkug4[\/embed]<\/p>\r\n<\/section>","rendered":"<section class=\"textbox learningGoals\">\n<ul>\n<li><span data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Decide which transformation to use for the different types of data sets and analyze the results&quot;}\" data-sheets-userformat=\"{&quot;2&quot;:6913,&quot;3&quot;:{&quot;1&quot;:0},&quot;11&quot;:4,&quot;12&quot;:0,&quot;14&quot;:{&quot;1&quot;:2,&quot;2&quot;:0},&quot;15&quot;:&quot;Calibri&quot;}\">Decide which transformation to use for the different types of data sets and analyze the results<\/span><\/li>\n<\/ul>\n<\/section>\n<p><strong>Data transformation<\/strong> is any operation you perform on collected raw data to make it more useful for analysis.<\/p>\n<p>You may add a fixed number to each data value, square or cube each value, take the square root of each value, or even apply a logarithm to each value. However, it is important to be strategic and careful when selecting which transformation to apply to the raw data.<\/p>\n<p>The idea is to transform the raw data so that the different variables and measurements within it are expressed on scales that can be compared with one another. 
Because this is the goal, some researchers will even split their data as their act of transformation.<\/p>\n<p><a class=\"footnote\" title=\"https:\/\/en.wikipedia.org\/wiki\/Data_transformation_(statistics)\" id=\"return-footnote-1470-1\" href=\"#footnote-1470-1\" aria-label=\"Footnote 1\"><sup class=\"footnote\">[1]<\/sup><\/a>Data can also be transformed to make it easier to visualize. For example, suppose we have a scatterplot in which the points are the countries of the world, and the data values being plotted are the land area and population of each country. If the plot is made using untransformed data (e.g., square kilometers for area and the number of people for population), most of the countries would be plotted in a tight cluster of points in the lower left corner of the graph. The few countries with very large areas and\/or populations would be spread thinly around most of the graph&#8217;s area. Simply rescaling units (e.g., to thousand square kilometers, or to millions of people) will not change this. However, following logarithmic transformations of both area and population, the points will be spread more uniformly in the graph.<\/p>\n<p><a class=\"footnote\" title=\"https:\/\/en.wikipedia.org\/wiki\/Data_transformation_(statistics)\" id=\"return-footnote-1470-2\" href=\"#footnote-1470-2\" aria-label=\"Footnote 2\"><sup class=\"footnote\">[2]<\/sup><\/a>Another reason for applying data transformation is to improve interpretability, even if no formal statistical analysis or visualization is to be performed. For example, suppose we are comparing cars in terms of their fuel economy. These data are usually presented as &#8220;kilometers per liter&#8221; or &#8220;miles per gallon&#8221;. 
However, if the goal is to assess how much additional fuel a person would use in one year when driving one car compared to another, it is more natural to work with the data transformed by applying the\u00a0reciprocal function, yielding liters per kilometer, or gallons per mile.<\/p>\n<section class=\"textbox watchIt\" aria-label=\"Watch It\">\n<p><iframe loading=\"lazy\" id=\"oembed-1\" title=\"The Effects of Transforming Data on Spread and Centre (1.4)\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/sK0RY-Qkug4?feature=oembed&#38;rel=0\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<\/section>\n<hr class=\"before-footnotes clear\" \/><div class=\"footnotes\"><ol><li id=\"footnote-1470-1\">https:\/\/en.wikipedia.org\/wiki\/Data_transformation_(statistics) <a href=\"#return-footnote-1470-1\" class=\"return-footnote\" aria-label=\"Return to footnote 1\">&crarr;<\/a><\/li><li id=\"footnote-1470-2\">https:\/\/en.wikipedia.org\/wiki\/Data_transformation_(statistics) <a href=\"#return-footnote-1470-2\" class=\"return-footnote\" aria-label=\"Return to footnote 
2\">&crarr;<\/a><\/li><\/ol><\/div>","protected":false},"author":8,"menu_order":29,"template":"","meta":{"_candela_citation":"[]","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"part":1438,"module-header":"fresh_take","content_attributions":[],"internal_book_links":[],"video_content":null,"cc_video_embed_content":{"cc_scripts":"","media_targets":[]},"try_it_collection":null,"_links":{"self":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/1470"}],"collection":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/users\/8"}],"version-history":[{"count":3,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/1470\/revisions"}],"predecessor-version":[{"id":6904,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/1470\/revisions\/6904"}],"part":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/parts\/1438"}],"metadata":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/1470\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/media?parent=1470"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapter-type?post=1470"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/contributor?post=1470"},{"taxonomy":"license","embeddable":true,"href":"h
ttps:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/license?post=1470"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}