How to remove missing values from a data frame in R, or remove (or convert to 0) a column that contains only missing values

Asked 2 years ago, Updated 2 years ago, 30 views

I'm sorry to keep asking such rudimentary questions.
Thank you very much for your help with my earlier question:
How to Output Mann Whitney Test (or t-test) Calculations in R Language

Could you please let me know how to handle the following data?

A1  A2  A3  A4
31          54
56      48  69
11      13  14
16      18
        63  24
        28  22
31      33  31
36      78  31
41      43  41

The measuring instrument outputs its data as an Excel file similar to the one above. Until now, I have used Excel to delete the blank cells and shift the remaining values up, so that it looks like this:

A1  A2  A3  A4
31      48  54
56      13  69
11      18  14
16      63  24
31      28  22
36      33  31
41      78  31
        43  41

Running wilcox.exact on data like this resulted in an error (all NA) because the A2 column contains no numbers. I first deleted such columns with Data<-Data[,is.na(Data[1,])==FALSE], but I thought it would cause trouble if the columns shifted, so I instead tried Data[,is.na(Data[1,])]<-0 to set every column whose first row is NA to zero. (The table shows that this column has no data in the first place, so all I need is for wilcox.exact to run.)

However, I was wondering whether the processing I currently do in Excel could be simplified.

A1  A2  A3  A4
31          54
56      48  69
11      13  14
16      18
        63  24
        28  22
31      33  31
36      78  31
41      43  41

I am wondering whether I can apply the following processing to the data above as it is:
·If a column contains only NA, set the NA values to 0.
·Otherwise, delete the NA values and shift the remaining values up.

If all NAs are set to zero, I think it will affect the results of the calculation, but I would appreciate it if you could tell me how to set NA to zero.

Thank you for your guidance.

r

2022-09-30 11:05

1 Answer

First of all, non-rectangular data cannot be stored in a data.frame, so if you want to remove the NA values and close the gaps, the data has to take a different shape.

What I have in mind here is long (vertical) format:

library(tidyr)

# Generate sample data
df <- data.frame(
  A1 = sample(c(4:6, NA), 10, prob = c(4, 2, 2, 2), replace = TRUE),
  A2 = rep(NA, 10),
  A3 = sample(c(4:6, NA), 10, prob = c(2, 4, 2, 2), replace = TRUE),
  A4 = sample(c(4:6, NA), 10, prob = c(2, 2, 4, 2), replace = TRUE)
)

# Check the structure of the sample data
str(df)
  #> 'data.frame': 10 obs. of 4 variables:
  #>  $ A1: int 6 5 NA 4 6 5 4 5 NA
  #>  $ A2: logi NA NA NA NA NA NA ...
  #>  $ A3: int 5 6 5 5 6 5 NA 5 4
  #>  $ A4: int 6 6 NA 4 6 5 6 4 6

# Convert to long data
# using the tidyr::gather function
df_long <- gather(df, key = fac, value = value, factor_key = TRUE)
str(df_long)
  #> 'data.frame': 40 obs. of 2 variables:
  #>  $ fac  : Factor w/ 4 levels "A1","A2","A3",..: 1 1 1 1 1 1 1 1 1 1 ...
  #>  $ value: int 6 5 NA 4 6 5 4 5 NA ...

# Remove rows where the value column is NA
# using the tidyr::drop_na function
df_long_rmNA <- drop_na(df_long, value)
str(df_long_rmNA)
  #> 'data.frame': 25 obs. of 2 variables:
  #>  $ fac  : Factor w/ 4 levels "A1","A2","A3",..: 1 1 1 1 1 1 1 1 1 1 3 ...
  #>  $ value: int 6 5 4 6 5 5 5 5 5 6 ...

# Replace NA with a different value
# using the tidyr::replace_na function
df_long_repNA <- replace_na(df_long, list(value = 0))
str(df_long_repNA)
  #> 'data.frame': 40 obs. of 2 variables:
  #>  $ fac  : Factor w/ 4 levels "A1","A2","A3",..: 1 1 1 1 1 1 1 1 1 1 ...
  #>  $ value: num 6 5 0 4 6 5 5 4 5 0 ...
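
If you would rather not depend on tidyr, the same three steps (reshape to long, drop NA, replace NA with 0) can be sketched in base R. This is only a minimal sketch: it uses the values from the question instead of random samples, and the object names (Data, long, long_rmNA, long_repNA) are my own choices for illustration.

```r
# Wide data from the question; blank cells are assumed to be read in as NA
Data <- data.frame(
  A1 = c(31, 56, 11, 16, NA, NA, 31, 36, 41),
  A2 = rep(NA_real_, 9),
  A3 = c(NA, 48, 13, 18, 63, 28, 33, 78, 43),
  A4 = c(54, 69, 14, NA, 24, 22, 31, 31, 41)
)

# Reshape to long format: one row per cell, with the column name as a factor
long <- data.frame(
  fac   = factor(rep(names(Data), each = nrow(Data)), levels = names(Data)),
  value = unlist(Data, use.names = FALSE)
)

# Drop the rows whose value is NA
long_rmNA <- long[!is.na(long$value), ]

# Alternatively, replace NA with 0 and keep every row
long_repNA <- long
long_repNA$value[is.na(long_repNA$value)] <- 0

str(long_rmNA)
str(long_repNA)
```

Note that after dropping NA the A2 level has no rows left at all, while after replacing NA with 0 it has nine zeros.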

This should cover the data processing you asked about. You can also run wilcox.exact on this long-format data:

library(exactRankTests)
  #>  Package 'exactRankTests' is no longer under development.
  #>  Please consider using package 'coin' instead.

# Run the wilcox.exact function (specified in formula format)
# Use the subset argument to restrict the rows (records) used
# See the help for the subset function for how to specify subsets
wilcox.exact(value ~ fac, df_long,
             subset = df_long$fac %in% c("A1", "A3"))
  #>
  #>  Exact Wilcoxon rank sum test
  #>
  #> data:  value by fac
  #> W = 29, p-value = 0.9776
  #> alternative hypothesis: true mu is not equal to 0

# Same test on the data with the NA rows removed
wilcox.exact(value ~ fac, df_long_rmNA,
             subset = df_long_rmNA$fac %in% c("A1", "A3"))
  #>
  #>  Exact Wilcoxon rank sum test
  #>
  #> data:  value by fac
  #> W = 29, p-value = 0.9776
  #> alternative hypothesis: true mu is not equal to 0

# Same test on the data with NA replaced by 0
wilcox.exact(value ~ fac, df_long_repNA,
             subset = df_long_repNA$fac %in% c("A1", "A3"))
  #>
  #>  Exact Wilcoxon rank sum test
  #>
  #> data:  value by fac
  #> W = 47, p-value = 0.9131
  #> alternative hypothesis: true mu is not equal to 0
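
In the calls above I picked A1 and A3 by hand. If the empty column is not known in advance, the columns that actually contain data could be detected first and only those compared. The following is a rough sketch under a few assumptions: it uses the values from the question, the names usable, pairs and p_values are hypothetical, and stats::wilcox.test with exact = FALSE (the normal approximation) stands in for wilcox.exact so that no extra package is needed.

```r
# Wide data from the question; blank cells are assumed to be read in as NA
Data <- data.frame(
  A1 = c(31, 56, 11, 16, NA, NA, 31, 36, 41),
  A2 = rep(NA_real_, 9),
  A3 = c(NA, 48, 13, 18, 63, 28, 33, 78, 43),
  A4 = c(54, 69, 14, NA, 24, 22, 31, 31, 41)
)

# Names of the columns that contain at least one observed value
usable <- names(Data)[colSums(!is.na(Data)) > 0]

# Every pair of usable columns
pairs <- combn(usable, 2, simplify = FALSE)

# Compare each pair after dropping NA from both columns.
# Assumption: wilcox.test (asymptotic, exact = FALSE) replaces wilcox.exact here.
p_values <- vapply(pairs, function(p) {
  x <- Data[[p[1]]]
  y <- Data[[p[2]]]
  wilcox.test(x[!is.na(x)], y[!is.na(y)], exact = FALSE)$p.value
}, numeric(1))
names(p_values) <- vapply(pairs, paste, character(1), collapse = " vs ")

print(p_values)
```

Because A2 never enters the comparison, it does not need to be deleted or set to 0, and the column positions in Data stay unchanged.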

As the startup message above says, the exactRankTests package is no longer under development, and it recommends using the coin package instead. The version using the wilcox_test function of the coin package is also shown below:

library(coin)
  #> Loading required package: survival
  #>
  #> Attaching package: 'coin'
  #> The following objects are masked from 'package:exactRankTests':
  #>
  #>     dperm, pperm, qperm, rperm
wilcox_test(value ~ fac, df_long,
            subset = df_long$fac %in% c("A1", "A3"),
            distribution = "exact")
  #>
  #>  Exact Wilcoxon-Mann-Whitney Test
  #>
  #> data:  value by fac (A1, A3)
  #> Z = -0.35161, p-value = 0.9776
  #> alternative hypothesis: true mu is not equal to 0

I thought it would be easier to follow with the output included, but I apologize for the length of this answer.
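
For reference, the two rules in the question (a column that is entirely NA becomes 0; any other column has its NA removed and its values shifted up) could also be applied to the wide data directly. The sketch below is only one possible way, using the values from the question; all_na, cols and Data_filled are names I made up. Note that is.na(Data[1,]) from the question looks only at the first row, so it would also flag A3, whose first cell happens to be blank; checking the whole column avoids that.

```r
# Wide data from the question; blank cells are assumed to be read in as NA
Data <- data.frame(
  A1 = c(31, 56, 11, 16, NA, NA, 31, 36, 41),
  A2 = rep(NA_real_, 9),
  A3 = c(NA, 48, 13, 18, 63, 28, 33, 78, 43),
  A4 = c(54, 69, 14, NA, 24, 22, 31, 31, 41)
)

# TRUE for columns in which every value is NA (checks all rows, not only row 1)
all_na <- colSums(is.na(Data)) == nrow(Data)

# Rule 1: a column that is entirely NA becomes all zeros
# Rule 2: any other column has its NA values dropped, which closes the gaps
cols <- lapply(names(Data), function(nm) {
  x <- Data[[nm]]
  if (all_na[[nm]]) rep(0, length(x)) else x[!is.na(x)]
})
names(cols) <- names(Data)

# The columns now differ in length, so pad the shorter ones with NA at the
# bottom to get a rectangular data.frame back
n_max <- max(lengths(cols))
Data_filled <- as.data.frame(
  lapply(cols, function(x) c(x, rep(NA, n_max - length(x))))
)

print(Data_filled)
```

The list cols is the non-rectangular form; Data_filled is the same content with the NA values moved to the bottom of each column, which corresponds to what was being done by hand in Excel.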


2022-09-30 11:05



© 2024 OneMinuteCode. All rights reserved.