What are the disadvantages of having a left skewed distribution?


























I'm currently working on a classification problem and I have a numerical column which is left-skewed. I've read many posts recommending a log transformation or a Box-Cox transformation to fix the left skewness.



So I was wondering: what would happen if I left the skewness as it is and continued with my model building? Are there any advantages to fixing skewness for classification problems (kNN, logistic regression)?
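For reference, here is a minimal sketch (with made-up data, and assuming SciPy is available) of measuring skewness and applying the two transformations mentioned. Note that log and Box-Cox mainly correct right skew, so for left-skewed data a common trick is to reflect the values first:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical left-skewed column: most mass near 10, long tail to the left.
x = 10.0 - rng.exponential(scale=1.0, size=1000)
print(stats.skew(x))  # strongly negative => left-skewed

# log and Box-Cox mainly correct RIGHT skew, so a common trick for left
# skew is to reflect the values first (turning the long tail to the right).
reflected = x.max() + 1.0 - x          # strictly positive by construction
x_bc, lam = stats.boxcox(reflected)    # Box-Cox estimates lambda itself
print(stats.skew(x_bc))                # much closer to 0
```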



















      machine-learning python






      asked Apr 5 at 19:36









      user214

          2 Answers
          There are issues that will depend on specific features of your data and analytic approach, but in general skewed data (in either direction) will degrade some of your model's ability to describe more "typical" cases in order to deal with much rarer cases which happen to take extreme values.



          Since "typical" cases are more common than extreme ones in a skewed data set, you are losing some precision with the cases you'll see most often in order to accommodate cases that you'll see only rarely. Determining a coefficient for a thousand observations which are all between [0,10] is likely to be more precise than for 990 observations between [0,10] and 10 observations between [1,000, 1,000,000]. This can lead to your model being less useful overall.



          "Fixing" skewness can provide a variety of benefits, including making analysis which depends on the data being approximately Normally distributed possible/more informative. It can also produce results which are reported on a sensible scale (this is very situation-dependent), and prevent extreme values (relative to other predictors) from over- or underestimating the influence of the skewed predictor on the predicted classification.



          You can test this somewhat (in a non-definitive way, to be sure) by training models with varying subsets of your data: everything you've got, just as it is, your data without that skewed variable, your data with that variable but excluding values outside of the "typical" range (though you'll have to be careful in defining that), your data with the skewed variable distribution transformed or re-scaled, etc.



          As for fixing it, transformations and re-scaling often make sense. But I cannot emphasize enough:



          Fiddling with variables and their distributions should follow from properties of those variables, not your convenience in modelling.



          Log-transforming skewed variables is a prime example of this:




          • If you really think that a variable operates on a geometric scale,
            and you want your model to operate on an arithmetic scale, then a
            log transformation can make a lot of sense.

          • If you think the variable operates on an arithmetic scale, but you
            find its distribution inconvenient and think a log transformation
            would produce a more convenient distribution, it may still make
            sense to transform. It will change how the model is used and
            interpreted, usually making it harder to interpret clearly, but
            that may or may not be worthwhile. For example, if you take the log of a numeric outcome and the log of a numeric predictor, the result has to be interpreted as an elasticity between them, which can be awkward to work with and is often not what is desired.

          • If you think that a log transformation would be desirable for a
            variable, but it has a lot of observations with a value of 0, then
            log transformation isn't really an option for you, whether it would
            be convenient or not. Adding a "small value" to the 0 observations
            causes problems of its own: compare the logs of values in [1, 10]
            with the logs of values in (0, 1], which diverge toward negative
            infinity.
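The zero-values problem in the last bullet can be seen directly. The data below is illustrative, and `np.log1p` is mentioned as one commonly used workaround rather than a universal fix:

```python
import numpy as np

# Illustrative feature with many exact zeros.
x = np.array([0.0, 0.0, 0.0, 0.5, 1.0, 2.0, 10.0, 500.0])

# Adding an arbitrary "small value" before taking logs makes the zeros'
# transformed position depend entirely on that arbitrary constant.
for eps in (0.1, 0.001, 1e-6):
    print(eps, np.log(x + eps).min())  # the minimum moves wildly with eps

# np.log1p (log(1 + x)) maps 0 -> 0 exactly and behaves like log(x) for
# large values, sidestepping the arbitrary-constant problem (though it
# changes the interpretation for small values).
print(np.log1p(x))
```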






          answered Apr 5 at 20:53 by Upper_Case






















          • Assume I have a numeric column such as price and it's heavily
            left-skewed. I'm thinking of using a few basic classification
            algorithms. What should my approach be? Should I go for a log
            transformation or a Box-Cox transformation?
            – user214
            Apr 5 at 21:01










          • @user214 Left-skewed price information? That sounds interesting! (My research data is generally skewed hard to the right.) There is always variation between study contexts, but I generally think of money as "geometric enough" that a log transformation is appropriate (or at least strongly defensible). Whether or not that's the ideal transformation is a very difficult question to answer, but log transformation is unlikely to be a problem for you here. You'll just need to remember that anything about that predictor will be reported on a log scale, and interpret accordingly.
            – Upper_Case
            Apr 5 at 21:07
































          I agree with the main points of @Upper_Case's well-put answer. I'd like to offer a perspective that emphasizes the "machine learning" side of the question.



          For a classification task using kNN, logistic regression, kernel SVM, or non-linear neural networks, the main disadvantage we are concerned about is a decrease in model performance, e.g. a lower AUC score on a validation set.



          Other disadvantages of skewness are usually invoked when its damage to the quality of the result is hard to assess. In a classification problem, however, we can train and validate the model once with the original (skewed) feature and once with the transformed feature, and then:




          1. if performance declines, we do not transform;

          2. if performance improves, we transform.


          In other words, the damage from skewness can be assessed easily and objectively; those theoretical justifications therefore do not drive our decision, only performance does.
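A minimal sketch of this train-and-compare procedure, assuming scikit-learn and a synthetic skewed feature (right-skewed here for simplicity; the logic is identical for left skew):

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): the class depends on
# the feature's log scale, and the raw feature is heavily skewed.
n = 2000
z = rng.normal(size=n)
x = np.exp(3 * z)
y = (z + rng.normal(scale=0.5, size=n) > 0).astype(int)

def validation_auc(feature):
    """Train on one version of the feature, score AUC on a held-out set."""
    X_tr, X_va, y_tr, y_va = train_test_split(
        feature.reshape(-1, 1), y, random_state=0)
    model = KNeighborsClassifier().fit(X_tr, y_tr)
    return roc_auc_score(y_va, model.predict_proba(X_va)[:, 1])

auc_raw = validation_auc(x)          # original (skewed) feature
auc_log = validation_auc(np.log(x))  # transformed feature
print(f"raw AUC: {auc_raw:.3f}, log AUC: {auc_log:.3f}")

# The decision rule from the answer: keep the transform only if it helps.
use_transform = auc_log > auc_raw
```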



          If we take a closer look at the justifications for using, say, a log transformation, they hold when certain assumptions are made about the final features that a model or test directly works with. A final feature is a function of a raw feature; that function can be the identity. For example, a model (or test) may assume that a final feature should be normal, or at least symmetric around the mean, or should be linearly additive, etc. Then, knowing (or speculating) that a raw feature is left-skewed, we may apply a log transformation to align the final feature with the imposed assumption.



          An important subtlety here is that we do not, and cannot, change the distribution of any raw feature; we merely create a final feature (as a function of the raw feature) whose distribution is better aligned with the imposed assumptions.



          For a classification task using kNN, logistic regression, kernel SVM, or non-linear neural networks, there is no normality or symmetry assumption on the distribution of final features, so these models exert no force in this regard. We can, however, trace a shadow of the "linear addition" assumption in the logistic regression model, i.e.
          $$P(y=1\mid\boldsymbol{x})=\frac{1}{1+e^{-(w_1x_1+\dots+w_dx_d)}}$$
          and in neural networks, in the weighted sum of features in the first layer, i.e.
          $$y_i=f\left(\boldsymbol{W}_{i,\cdot}\,\boldsymbol{x}+b\right)=f\left(W_{i,1}x_1+W_{i,2}x_2+\dots+b\right)$$
          I say "a shadow" because the target variable is not directly a linear addition of the final features: the sum passes through one or more non-linear transformations, which can make these models more robust to violations of this assumption. On the other hand, the linear-addition assumption does not exist in kNN or kernel SVM, as they work with sample-sample distances rather than feature interactions.
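The distance-based point can be made concrete with a toy example: on the raw scale an extreme value dominates the sample-sample distances kNN relies on, while on a log scale the same samples are compared on a more even footing (illustrative numbers only):

```python
import numpy as np

# Three samples described by one skewed feature (illustrative values).
x = np.array([10.0, 20.0, 1_000_000.0])

def pairwise_dist(v):
    """Absolute pairwise distances between 1-D feature values."""
    return np.abs(v[:, None] - v[None, :])

# On the raw scale, samples 0 and 1 are effectively identical relative
# to sample 2, so the extreme value dictates the neighbor structure.
d_raw = pairwise_dist(x)

# On the log scale, the same samples are spread far more evenly.
d_log = pairwise_dist(np.log(x))

print(d_raw[0, 1] / d_raw[0, 2])  # tiny ratio on the raw scale
print(d_log[0, 1] / d_log[0, 2])  # much larger ratio after the log
```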



          But again, these justifications come second to the result of model evaluation: if performance suffers, we do not transform.






          share|improve this answer











          $endgroup$














            Your Answer








            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "557"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: false,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f48711%2fwhat-are-the-disadvantages-of-having-a-left-skewed-distribution%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            4












            $begingroup$

            There are issues that will depend on specific features of your data and analytic approach, but in general skewed data (in either direction) will degrade some of your model's ability to describe more "typical" cases in order to deal with much rarer cases which happen to take extreme values.



            Since "typical" cases are more common than extreme ones in a skewed data set, you are losing some precision with the cases you'll see most often in order to accommodate cases that you'll see only rarely. Determining a coefficient for a thousand observations which are all between [0,10] is likely to be more precise than for 990 observations between [0,10] and 10 observations between [1,000, 1,000,000]. This can lead to your model being less useful overall.



            "Fixing" skewness can provide a variety of benefits, including making analysis which depends on the data being approximately Normally distributed possible/more informative. It can also produce results which are reported on a sensible scale (this is very situation-dependent), and prevent extreme values (relative to other predictors) from over- or underestimating the influence of the skewed predictor on the predicted classification.



            You can test this somewhat (in a non-definitive way, to be sure) by training models with varying subsets of your data: everything you've got, just as it is, your data without that skewed variable, your data with that variable but excluding values outside of the "typical" range (though you'll have to be careful in defining that), your data with the skewed variable distribution transformed or re-scaled, etc.



            As for fixing it, transformations and re-scaling often make sense. But I cannot emphasize enough:



            Fiddling with variables and their distributions should follow from properties of those variables, not your convenience in modelling.



            Log-transforming skewed variables is a prime example of this:




            • If you really think that a variable operates on a geometric scale,
              and you want your model to operate on an arithmetic scale, then log
              transformation can make a lot of sense.

            • If you think that variable operates on an arithmetic scale, but you
              find its distribution inconvenient and think a log transformation
              would produce a more convenient distribution, it may make sense to
              transform. It will change how the model is used and interpreted,
              usually making it more dense and harder to interpret clearly, but
              that may or may not be worthwhile. For example, if you take the log of a numeric outcome and the log of a numeric predictor, the result has to be interpreted as an elasticity between them, which can be awkward to work with and is often not what is desired.

            • If you think that a log transformation would be desirable for a
              variable, but it has a lot of observations with a value of 0, then
              log transformation isn't really an option for you, whether it would
              be convenient or not. (Adding a "small value" to the 0 observations
              causes lots of problems-- take the logs of 1-10, and then 0.0 to
              1.0).






            share|improve this answer









            $endgroup$













            • $begingroup$
              Assume I've numeric column such as price and it's heavily left skewed. I'm thinking of using few basic classification algorithms. What should be my approach? Should I go for log transformation or boxcox transformation?
              $endgroup$
              – user214
              Apr 5 at 21:01










            • $begingroup$
              @user214 Left-skewed price information? That sounds interesting! (My research data is generally skewed hard to the right). There is always variation between study contexts, but I generally think of money as "geometric enough" that a log transformation is appropriate (or at least strongly defensible). Whether or not that's the ideal transformation is a very difficult question to answer, but log transformation is unlikely to be a problem for you here. You'll just need to remember that anything about that predictor will be reported on a log scale, and interpret accordingly.
              $endgroup$
              – Upper_Case
              Apr 5 at 21:07
















            4












            $begingroup$

            There are issues that will depend on specific features of your data and analytic approach, but in general skewed data (in either direction) will degrade some of your model's ability to describe more "typical" cases in order to deal with much rarer cases which happen to take extreme values.



            Since "typical" cases are more common than extreme ones in a skewed data set, you are losing some precision with the cases you'll see most often in order to accommodate cases that you'll see only rarely. Determining a coefficient for a thousand observations which are all between [0,10] is likely to be more precise than for 990 observations between [0,10] and 10 observations between [1,000, 1,000,000]. This can lead to your model being less useful overall.



            "Fixing" skewness can provide a variety of benefits, including making analysis which depends on the data being approximately Normally distributed possible/more informative. It can also produce results which are reported on a sensible scale (this is very situation-dependent), and prevent extreme values (relative to other predictors) from over- or underestimating the influence of the skewed predictor on the predicted classification.



            You can test this somewhat (in a non-definitive way, to be sure) by training models with varying subsets of your data: everything you've got, just as it is, your data without that skewed variable, your data with that variable but excluding values outside of the "typical" range (though you'll have to be careful in defining that), your data with the skewed variable distribution transformed or re-scaled, etc.



            As for fixing it, transformations and re-scaling often make sense. But I cannot emphasize enough:



            Fiddling with variables and their distributions should follow from properties of those variables, not your convenience in modelling.



            Log-transforming skewed variables is a prime example of this:




            • If you really think that a variable operates on a geometric scale,
              and you want your model to operate on an arithmetic scale, then log
              transformation can make a lot of sense.

            • If you think that variable operates on an arithmetic scale, but you
              find its distribution inconvenient and think a log transformation
              would produce a more convenient distribution, it may make sense to
              transform. It will change how the model is used and interpreted,
              usually making it more dense and harder to interpret clearly, but
              that may or may not be worthwhile. For example, if you take the log of a numeric outcome and the log of a numeric predictor, the result has to be interpreted as an elasticity between them, which can be awkward to work with and is often not what is desired.

            • If you think that a log transformation would be desirable for a
              variable, but it has a lot of observations with a value of 0, then
              log transformation isn't really an option for you, whether it would
              be convenient or not. (Adding a "small value" to the 0 observations
              causes lots of problems-- take the logs of 1-10, and then 0.0 to
              1.0).






            share|improve this answer









            $endgroup$













            • $begingroup$
              Assume I've numeric column such as price and it's heavily left skewed. I'm thinking of using few basic classification algorithms. What should be my approach? Should I go for log transformation or boxcox transformation?
              $endgroup$
              – user214
              Apr 5 at 21:01










            • $begingroup$
              @user214 Left-skewed price information? That sounds interesting! (My research data is generally skewed hard to the right). There is always variation between study contexts, but I generally think of money as "geometric enough" that a log transformation is appropriate (or at least strongly defensible). Whether or not that's the ideal transformation is a very difficult question to answer, but log transformation is unlikely to be a problem for you here. You'll just need to remember that anything about that predictor will be reported on a log scale, and interpret accordingly.
              $endgroup$
              – Upper_Case
              Apr 5 at 21:07














            4












            4








            4





            $begingroup$

            There are issues that will depend on specific features of your data and analytic approach, but in general skewed data (in either direction) will degrade some of your model's ability to describe more "typical" cases in order to deal with much rarer cases which happen to take extreme values.



            Since "typical" cases are more common than extreme ones in a skewed data set, you are losing some precision with the cases you'll see most often in order to accommodate cases that you'll see only rarely. Determining a coefficient for a thousand observations which are all between [0,10] is likely to be more precise than for 990 observations between [0,10] and 10 observations between [1,000, 1,000,000]. This can lead to your model being less useful overall.



            "Fixing" skewness can provide a variety of benefits, including making analysis which depends on the data being approximately Normally distributed possible/more informative. It can also produce results which are reported on a sensible scale (this is very situation-dependent), and prevent extreme values (relative to other predictors) from over- or underestimating the influence of the skewed predictor on the predicted classification.



            You can test this somewhat (in a non-definitive way, to be sure) by training models with varying subsets of your data: everything you've got, just as it is, your data without that skewed variable, your data with that variable but excluding values outside of the "typical" range (though you'll have to be careful in defining that), your data with the skewed variable distribution transformed or re-scaled, etc.



            As for fixing it, transformations and re-scaling often make sense. But I cannot emphasize enough:



            Fiddling with variables and their distributions should follow from properties of those variables, not your convenience in modelling.



            Log-transforming skewed variables is a prime example of this:




            • If you really think that a variable operates on a geometric scale,
              and you want your model to operate on an arithmetic scale, then log
              transformation can make a lot of sense.

            • If you think that variable operates on an arithmetic scale, but you
              find its distribution inconvenient and think a log transformation
              would produce a more convenient distribution, it may make sense to
              transform. It will change how the model is used and interpreted,
              usually making it more dense and harder to interpret clearly, but
              that may or may not be worthwhile. For example, if you take the log of a numeric outcome and the log of a numeric predictor, the result has to be interpreted as an elasticity between them, which can be awkward to work with and is often not what is desired.

            • If you think that a log transformation would be desirable for a
              variable, but it has a lot of observations with a value of 0, then
              log transformation isn't really an option for you, whether it would
              be convenient or not. (Adding a "small value" to the 0 observations
              causes lots of problems-- take the logs of 1-10, and then 0.0 to
              1.0).






            share|improve this answer









            $endgroup$



            There are issues that will depend on specific features of your data and analytic approach, but in general skewed data (in either direction) will degrade some of your model's ability to describe more "typical" cases in order to deal with much rarer cases which happen to take extreme values.



            Since "typical" cases are more common than extreme ones in a skewed data set, you are losing some precision with the cases you'll see most often in order to accommodate cases that you'll see only rarely. Determining a coefficient for a thousand observations which are all between [0,10] is likely to be more precise than for 990 observations between [0,10] and 10 observations between [1,000, 1,000,000]. This can lead to your model being less useful overall.



            "Fixing" skewness can provide a variety of benefits, including making analysis which depends on the data being approximately Normally distributed possible/more informative. It can also produce results which are reported on a sensible scale (this is very situation-dependent), and prevent extreme values (relative to other predictors) from over- or underestimating the influence of the skewed predictor on the predicted classification.



            You can test this somewhat (in a non-definitive way, to be sure) by training models with varying subsets of your data: everything you've got, just as it is, your data without that skewed variable, your data with that variable but excluding values outside of the "typical" range (though you'll have to be careful in defining that), your data with the skewed variable distribution transformed or re-scaled, etc.



            As for fixing it, transformations and re-scaling often make sense. But I cannot emphasize enough:



            Fiddling with variables and their distributions should follow from properties of those variables, not your convenience in modelling.



            Log-transforming skewed variables is a prime example of this:




            • If you really think that a variable operates on a geometric scale,
              and you want your model to operate on an arithmetic scale, then log
              transformation can make a lot of sense.

            • If you think that variable operates on an arithmetic scale, but you
              find its distribution inconvenient and think a log transformation
              would produce a more convenient distribution, it may make sense to
              transform. It will change how the model is used and interpreted,
              usually making it more dense and harder to interpret clearly, but
              that may or may not be worthwhile. For example, if you take the log of a numeric outcome and the log of a numeric predictor, the result has to be interpreted as an elasticity between them, which can be awkward to work with and is often not what is desired.

            • If you think that a log transformation would be desirable for a
              variable, but it has a lot of observations with a value of 0, then
              log transformation isn't really an option for you, whether it would
              be convenient or not. (Adding a "small value" to the 0 observations
              causes lots of problems-- take the logs of 1-10, and then 0.0 to
              1.0).







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Apr 5 at 20:53









            Upper_CaseUpper_Case

            1913




            1913












            • $begingroup$
              Assume I've numeric column such as price and it's heavily left skewed. I'm thinking of using few basic classification algorithms. What should be my approach? Should I go for log transformation or boxcox transformation?
              $endgroup$
              – user214
              Apr 5 at 21:01










            • $begingroup$
              @user214 Left-skewed price information? That sounds interesting! (My research data is generally skewed hard to the right). There is always variation between study contexts, but I generally think of money as "geometric enough" that a log transformation is appropriate (or at least strongly defensible). Whether or not that's the ideal transformation is a very difficult question to answer, but log transformation is unlikely to be a problem for you here. You'll just need to remember that anything about that predictor will be reported on a log scale, and interpret accordingly.
              $endgroup$
              – Upper_Case
              Apr 5 at 21:07


















            • $begingroup$
              Assume I've numeric column such as price and it's heavily left skewed. I'm thinking of using few basic classification algorithms. What should be my approach? Should I go for log transformation or boxcox transformation?
              $endgroup$
              – user214
              Apr 5 at 21:01










            • $begingroup$
              @user214 Left-skewed price information? That sounds interesting! (My research data is generally skewed hard to the right). There is always variation between study contexts, but I generally think of money as "geometric enough" that a log transformation is appropriate (or at least strongly defensible). Whether or not that's the ideal transformation is a very difficult question to answer, but log transformation is unlikely to be a problem for you here. You'll just need to remember that anything about that predictor will be reported on a log scale, and interpret accordingly.
              $endgroup$
              – Upper_Case
              Apr 5 at 21:07
















            $begingroup$
            Assume I've numeric column such as price and it's heavily left skewed. I'm thinking of using few basic classification algorithms. What should be my approach? Should I go for log transformation or boxcox transformation?
            $endgroup$
            – user214
            Apr 5 at 21:01




            $begingroup$
            Assume I've numeric column such as price and it's heavily left skewed. I'm thinking of using few basic classification algorithms. What should be my approach? Should I go for log transformation or boxcox transformation?
            $endgroup$
            – user214
            Apr 5 at 21:01























            4












            $begingroup$

I agree with the main points of @Upper_Case's well-put answer. I'd like to put forth a perspective that emphasizes the "machine learning" side of the question.



For a classification task using kNN, logistic regression, kernel SVM, or non-linear neural networks, the main disadvantage we are concerned about is a decrease in model performance, e.g. a drop in AUC score on a validation set.



Other disadvantages of skewness are often investigated when the damage of skewness to the quality of the result is hard to assess. However, in a classification problem, we can train and validate the model once with the original (skewed) feature and once with the transformed feature, and then:




            1. If performance declined, we do not transform,

            2. If performance improved, we transform.


In other words, the damage of skewness can be easily and objectively assessed; therefore, those justifications do not affect our decision, only performance does.
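That train-and-compare loop can be sketched directly. Everything below (the synthetic data, making the first column skewed via `np.exp`, the choice of `log1p` as the transform) is an illustrative assumption, not a prescription:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X[:, 0] = np.exp(X[:, 0])  # make the first column positive and skewed

model = LogisticRegression(max_iter=1000)

# Validate once with the original (skewed) feature ...
auc_raw = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

# ... and once with the transformed feature.
X_log = X.copy()
X_log[:, 0] = np.log1p(X_log[:, 0])  # log1p also handles zeros safely
auc_log = cross_val_score(model, X_log, y, cv=5, scoring="roc_auc").mean()

# Keep whichever scores higher; the transform has to earn its place.
print(f"AUC raw: {auc_raw:.3f}, AUC log: {auc_log:.3f}")
```

In a real project the same comparison should be done on a held-out validation set (or nested cross-validation) so the transform choice itself doesn't overfit.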



If we take a closer look at the justifications for using, let's say, the log transformation, they hold true when some assumptions are made about the final features that a model or test directly works with. A final feature is a function of a raw feature; that function can be the identity. For example, a model (or test) may assume that a final feature should be normal, or at least symmetric around the mean, or should be linearly additive, etc. Then we, with the knowledge (or a speculation) that a raw feature is left-skewed, may perform a log transformation to align the final feature with the imposed assumption.



An important intricacy here is that we do not, and cannot, change the distribution of any raw feature; we are merely creating a final feature (as a function of the raw feature) that has a different distribution, more aligned with the imposed assumptions.



For a classification task using kNN, logistic regression, kernel SVM, or non-linear neural networks, there is no normality or symmetry assumption on the distribution of final features, thus there is no force from these models in this regard. Although, we can trace a shadow of the "linear addition" assumption in the logistic regression model, i.e.
$$P(y=1|\boldsymbol{x})=\frac{1}{1+e^{-(w_1x_1+\dots+w_dx_d)}}$$
and in neural networks, for the weighted sum of features in the first layer, i.e.
$$y_i=f\left(\boldsymbol{W}_{i,\cdot}\boldsymbol{x}+b\right)=f\left(W_{i,1}x_1+W_{i,2}x_2+\dots+b\right)$$
I say "a shadow" because the target variable is not directly the linear addition of final features; the addition goes through one or more non-linear transformations, which could make these models more robust to the violation of this assumption. On the other hand, the linear addition assumption does not exist in kNN or kernel SVM, as they work with sample-sample distances rather than feature interactions.
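Even without a normality assumption, a monotone transform still matters for distance-based models like kNN, because it changes pairwise distances and can reorder neighbourhoods. A small numeric check (the three values are chosen purely for illustration):

```python
import numpy as np

x = np.array([1.0, 10.0, 100.0])
d_raw = np.abs(x[:, None] - x[None, :])                   # pairwise distances, raw scale
d_log = np.abs(np.log(x)[:, None] - np.log(x)[None, :])   # pairwise distances, log scale

# Raw scale: 10 is much closer to 1 (distance 9) than to 100 (distance 90).
# Log scale: 10 is exactly equidistant from 1 and 100.
print(d_raw[1, 0], d_raw[1, 2])  # 9.0 90.0
print(d_log[1, 0], d_log[1, 2])
```

So even for kNN, where no distributional assumption is imposed, transforming a skewed feature can change which samples count as neighbours, and hence the predictions; which scale is "right" is again an empirical question.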



But again, these justifications come second to the result of model evaluation: if the transformation hurts performance, we do not transform.

















            $endgroup$


















                edited Apr 8 at 23:07

























                answered Apr 6 at 11:51









Esmailian

                3,181320

































