Advantages and disadvantages of standardization, normalization, and robust scaling

Asked 2 years ago, Updated 2 years ago, 325 views

As the title says, I understand what standardization, normalization, and robust scaling are, but I would like specific examples of their advantages and disadvantages, and of when each kind of preprocessing is recommended.

python python3

2022-09-30 21:56

1 Answer

I am reposting this because there is a similar question on the original (English) Stack Overflow, where an answer summarized the advantages and disadvantages. (The second line under each item is my own translation. The points are well organized, but I can't vouch for their accuracy.)
Data Standardization vs Normalization vs. Robust Scaler

Advantages:

  • Standardization: scales features such that the distribution is centered around 0, with a standard deviation of 1.
    Scales each feature so that its distribution is centered on 0 and has a standard deviation of 1.
  • Normalization: shrinks the range such that the range is now between 0 and 1 (or -1 to 1 if there are negative values).
    Shrinks the range to 0 to 1 (or -1 to 1 if there are negative values).
  • Robust Scaler: similar to normalization, but it instead uses the interquartile range, so that it is robust to outliers.
    Similar to normalization, but uses the interquartile range so that it stays robust even when outliers are present.
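
To make the three definitions above concrete, here is a minimal sketch in plain Python. It is not taken from the cited answer. It assumes the usual textbook formulas, which as far as I know match the defaults of scikit-learn's StandardScaler, MinMaxScaler, and RobustScaler (the last one centers on the median and divides by the interquartile range). The function names and the sample data are hypothetical.

```python
import statistics


def standardize(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_normalize(values):
    """Map the smallest value to 0 and the largest value to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range (Q3 - Q1)."""
    # method="inclusive" uses linear interpolation between data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


# Hypothetical sample data, chosen only for illustration.
data = [1.0, 2.0, 3.0, 4.0, 5.0]

standardized = standardize(data)      # mean 0, standard deviation 1
normalized = min_max_normalize(data)  # [0.0, 0.25, 0.5, 0.75, 1.0]
robust = robust_scale(data)           # [-1.0, -0.5, 0.0, 0.5, 1.0]

print(standardized)
print(normalized)
print(robust)
```

In real projects the scikit-learn classes are usually preferable, since they remember the statistics fitted on the training data and reuse them on new data.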

Disadvantages:

  • Standardization: not good if the data is not normally distributed (i.e. no Gaussian distribution).
    Not appropriate if the data is not normally distributed (i.e. it is not Gaussian).
  • Normalization: gets heavily influenced by outliers (i.e. extreme values).
    It is strongly influenced by outliers (i.e. extreme values).
  • Robust Scaler: doesn't take the median into account and only focuses on the parts where the bulk of the data is.
    Focuses only on the region where most of the data lies, without considering the median.
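
The sensitivity of normalization to outliers can be checked with a small experiment. The sketch below is my own illustration, not part of the cited answer. It assumes min-max normalization and a median/interquartile-range robust scaling, and the sample data is hypothetical. It compares how much of the output range the ordinary values occupy under each method.

```python
import statistics

# Hypothetical sample: five ordinary values plus one extreme outlier.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]

# Min-max normalization: the outlier becomes the maximum, so the
# ordinary values are squeezed into a narrow band near 0.
lo, hi = min(data), max(data)
normalized = [(v - lo) / (hi - lo) for v in data]

# Robust scaling: the median and the interquartile range react only
# slightly to the outlier, so the ordinary values keep their spread.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
robust = [(v - median) / (q3 - q1) for v in data]

# Spread of the five ordinary values (everything except the outlier).
normalized_spread = max(normalized[:-1]) - min(normalized[:-1])
robust_spread = max(robust[:-1]) - min(robust[:-1])

print(f"min-max spread of ordinary values: {normalized_spread:.3f}")
print(f"robust  spread of ordinary values: {robust_spread:.3f}")
print(f"outlier after robust scaling:      {robust[-1]:.1f}")
```

Note that robust scaling does not remove the outlier: it still ends up far from the rest of the data, only without compressing the ordinary values.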

The similar question focuses on the effect that each preprocessing technique has on the data, so the answer concentrates on that as well.
The best preprocessing, however, depends on how you want to handle missing values, as shown in the scikit-learn page cited in that answer.


2022-09-30 21:56


