Question: How to filter all columns using a loop
5 days ago by
India
kritikamish9910 wrote:

I have 100 columns: column 1 holds names and columns 2 to 100 hold sample values. I want to filter each of columns 2 to 100 by a certain threshold, say >= 5, using a loop. Each iteration should write a new file named after the corresponding sample.

For example, filtering values above 5:

Col1 Col2 Col3 Col4 Col5 Col6 Col7
A 5 1 1 2 4 1    
B 6 2 2 5 3 6
C 7 3 3 8 9 3
D 8 4 6 9 1 3

Output (file name will be Col2)

Col1 Col2
B 6
C 7
D 8

This has to be repeated for all the columns.

modified 4 days ago by Martin Morgan ♦♦ 20k • written 5 days ago by kritikamish9910

Do you want all values < 5 to be replaced by NA?

written 5 days ago by hauken_heyken40
5 days ago by
hauken_heyken40 wrote:

I don't know if I understand your question correctly, but this will work if you only want to keep the columns that contain at least one value >= 5:

 

library(data.table)

a = c(1,2,3,4,5)
b = c(4,5,6,7,8)
c = c(6,7,8,9,10)

e = c(1,2,3,4,4) # <--- This column will be filtered out, because none of its values is >= 5

d = as.data.table(cbind(a,b,c,e))

indexesToRemove = lapply(1:ncol(d),function(x) ifelse(sum(d[,as.integer(x), with = F] >= 5 ) > 0,as.integer(x), NA ))

indexesToRemove = indexesToRemove[!is.na(indexesToRemove)]

d = d[,unlist(indexesToRemove), with = F] #<--- Only columns with at least one value >= 5 are kept
output:

   a b  c
1: 1 4  6
2: 2 5  7
3: 3 6  8
4: 4 7  9
5: 5 8 10
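
The same column filter can also be sketched in base R, without data.table. This is only an illustration under assumptions: the names `threshold`, `keep` and `d_kept` are made up here, and the data are the toy vectors from above.

```r
# Toy data taken from the example above
d <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(4, 5, 6, 7, 8),
  c = c(6, 7, 8, 9, 10),
  e = c(1, 2, 3, 4, 4)
)

threshold <- 5  # assumed cut-off, as in the question

# TRUE for every column that contains at least one value >= threshold
keep <- vapply(d, function(column) any(column >= threshold), logical(1))

# Drop the columns that never reach the threshold (here: e)
d_kept <- d[, keep, drop = FALSE]
print(d_kept)
```

`vapply()` with `logical(1)` guarantees one TRUE/FALSE per column, so `keep` can be used directly as a column selector.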
modified 5 days ago • written 5 days ago by hauken_heyken40

Hi hauken_heyken

My query is this:

Suppose I have a table with columns a, b, c and d:

a=c("A","B","C","D","E","F","G")

b=c(1,2,3,4,5,8,10)

c=c(1,2,3,4,6,7,10)

d=c(1,2,10,5,12,15,10)

e = as.data.table(cbind(a,b,c,d))

 

What I want is 3 files named b, c and d, after the column names.

File "b" will have E, F, G in the 1st column and the filtered values 5, 8, 10 in the 2nd column.

File "c" will have E, F, G in the 1st column and the filtered values 6, 7, 10 in the 2nd column.

File "d" will have C, D, E, F, G in the 1st column and the filtered values 10, 5, 12, 15, 10 in the 2nd column.

 

 

modified 4 days ago • written 4 days ago by kritikamish9910

Ah, okay, now it makes sense. Then this will work:

New version is:

library(data.table)

b=c(1,2,3,4,5,8,10)

c=c(1,2,3,4,6,7,10)

e=c(1,2,10,5,12,15,10)

d = as.data.table(cbind(b,c,e))

indexesToRemove = lapply(1:ncol(d),function(x) ifelse(d[,as.integer(x), with = F] >= 5,as.integer(x), NA ))

#<----Changed the lapply now, to not sum

#now save the cbind with a:

a=c("A","B","C","D","E","F","G")

# ---> remember to set setwd:  setwd("...Location of files to be saved..")

for(i in seq_len(ncol(d))){ #<--- Loop over every column index, not only the last one

  out = cbind(d[!is.na(unlist(indexesToRemove[i])),i, with = F],a =  a[!is.na(unlist(indexesToRemove[i]))])

  write.csv(x = out,file = paste0(names(out[,1]), ".csv"), row.names = F) #<--- Remove row.names

   #<--- Choose another writer if csv is not the format you want

}

First file created will be b.csv, and looks like this:

b a
5 E
8 F
10 G
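
For comparison, the per-column export can also be sketched in base R. This is a rough sketch under assumptions, not a definitive implementation: the data frame `tab`, the helper `write_filtered_columns()` and the use of `tempdir()` as output folder are all hypothetical choices made for illustration. Iterating over the column names (or over `seq_len(ncol(tab))`) visits every sample column, whereas a loop over the single number `ncol(tab)` would run only once, for the last column.

```r
# Example table from the thread; a data.frame keeps b, c and d numeric
tab <- data.frame(
  a = c("A", "B", "C", "D", "E", "F", "G"),
  b = c(1, 2, 3, 4, 5, 8, 10),
  c = c(1, 2, 3, 4, 6, 7, 10),
  d = c(1, 2, 10, 5, 12, 15, 10)
)

# Hypothetical helper: writes one CSV per sample column into `out_dir`
write_filtered_columns <- function(tab, threshold, out_dir) {
  sample_cols <- names(tab)[-1]          # every column except the name column
  for (col in sample_cols) {             # visits b, c and d in turn
    keep <- tab[[col]] >= threshold      # rows passing the cut-off in this column
    out <- tab[keep, c(names(tab)[1], col)]
    write.csv(out, file.path(out_dir, paste0(col, ".csv")), row.names = FALSE)
  }
  invisible(file.path(out_dir, paste0(sample_cols, ".csv")))
}

out_dir <- tempdir()                     # assumption: replace with your own folder
written <- write_filtered_columns(tab, threshold = 5, out_dir = out_dir)
print(read.csv(written[1]))              # contents of b.csv
```

Each file holds the name column plus one sample column, restricted to the rows where that sample reaches the threshold.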
modified 4 days ago • written 4 days ago by hauken_heyken40

Hi hauken_heyken

Thank you. The code is working!

But it's not giving me file b. The result contains only the values of column "e".

Also, how do I iterate over all the columns (files b and c)?

modified 4 days ago • written 4 days ago by kritikamish9910
4 days ago by
Martin Morgan ♦♦ 20k
United States
Martin Morgan ♦♦ 20k wrote:

Create a data.frame directly. Using cbind() causes the numeric values to be represented as character vectors, which is not desired. There is no value in using data.table in the current example.

e = data.frame(a,b,c,d)
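
A small sketch of why the coercion matters, using only columns a and b from the question. The object names `m`, `df`, `as_text` and `as_number` are assumptions introduced for this illustration.

```r
a <- c("A", "B", "C", "D", "E", "F", "G")
b <- c(1, 2, 3, 4, 5, 8, 10)

m  <- cbind(a, b)        # character matrix: every entry becomes a string
df <- data.frame(a, b)   # each column keeps its own type

print(class(m[, "b"]))   # "character"
print(class(df$b))       # "numeric"

# On strings, >= compares text, so "10" sorts before "5" and is dropped
as_text   <- m[, "b"] >= 5
as_number <- df$b >= 5

print(as_text)           # FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE
print(as_number)         # FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
```

The last row (value 10) passes the numeric filter but fails the text comparison, which is the kind of silent error the data.frame route avoids.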

A data.frame is a list of vectors, so iterate over the columns that you're interested in, i.e., all but the first

result <- lapply(e[-1], function(value) value[value >= 5])

Create files with

for (fname in names(result))
    write.csv(data.frame(result[[fname]]), fname)

but that doesn't seem like a useful thing to do.

A "tidy" approach is to gather the original data.frame and then filter on the column of values, no iteration involved.

library(tidyverse)
gather(e, "filename", "value", -1) %>% filter(value >= 5)
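
The same long-format idea can be sketched in base R, in case loading the tidyverse is not an option. This is an illustrative sketch under assumptions: the names `long`, `filtered` and `per_file` are hypothetical, and the final `split()` step is only there to show how per-column tables could be recovered if separate outputs are still wanted.

```r
e <- data.frame(
  a = c("A", "B", "C", "D", "E", "F", "G"),
  b = c(1, 2, 3, 4, 5, 8, 10),
  c = c(1, 2, 3, 4, 6, 7, 10),
  d = c(1, 2, 10, 5, 12, 15, 10)
)

# Long format: one row per (name, column) pair, columns stacked in order b, c, d
long <- data.frame(
  name     = rep(e[[1]], times = ncol(e) - 1),
  filename = rep(names(e)[-1], each = nrow(e)),
  value    = unlist(e[-1], use.names = FALSE)
)

# A single vectorised filter replaces the loop over columns
filtered <- long[long$value >= 5, ]

# Optional: one table per original column
per_file <- split(filtered[c("name", "value")], filtered$filename)
print(per_file$b)
```

Keeping everything in one long table means the threshold is applied once, and the `filename` column records which sample each value came from.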

This isn't a Bioconductor question so should be asked elsewhere, on StackOverflow or the R-help mailing list for instance (checking first that similar questions have not already been asked).

written 4 days ago by Martin Morgan ♦♦ 20k

Hi Martin Morgan

I agree it's not a Bioconductor question. Actually, I have gene expression data with FC values for 103 samples and 20000 probes.

What I wanted is to filter every sample at a cutoff of 1.5 FC, so I asked here.

 

 

written 3 days ago by kritikamish9910