Hi,
I think it's a stupid question, but I cannot find a way to do it online. I want to sort all the columns of my data frame simultaneously from the largest to the smallest number (decreasing order) while keeping the first column, which contains the gene names (characters), since I need to know which gene is which after the other columns have been reordered. My dataset:
# A tibble: 3 x 12
  gene_symbol `control 0hr` `control 24hr`
  <chr>               <dbl>          <dbl>
1 5-8S                270.           146.
2 5S                  123.           129.
3 7SK                  56.6           31.0
The only thing that works is selecting the numeric columns that I want to sort, but in doing so I lose the gene names:
CD.sorted <- apply(test[4:14],2,sort,decreasing=T)
If I run this, the order is not decreasing and the numbers are all messed up (I get either all 0 or 9.99):
CD.sorted2 <- apply(test,2,sort,decreasing=T)
If I run this code, the order is not decreasing and all the numbers are changed again, but I don't understand how:
library(dplyr)
f <- test %>% mutate_at(4:14, funs(sort(., decreasing = TRUE)))
If I run this (where I specify exactly which columns to order), nothing changes at all:
CD.sorted2 <- arrange(test, desc("control 0hr", "control 24hr","control 48hr","control 54hr", "control 60hr","control 66hr","control 72hr","control 96hr","control 120hr" ,"control 144hr", "control 168hr" ))
thank you
Camilla
Base R's order() can order by multiple columns.
However, I think that you'll struggle to find a suitable ordering across multiple samples and genes in this way. You could instead order each sample separately and store the results in a list, if you wished.
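A minimal sketch of both approaches, assuming a toy data frame named test whose values are copied approximately from the printed tibble in the question (the row order is shuffled here for illustration, and only two of the sample columns are included):

```r
# Toy data: values approximated from the tibble shown in the question
test <- data.frame(
  gene_symbol    = c("5S", "7SK", "5-8S"),
  `control 0hr`  = c(123, 56.6, 270),
  `control 24hr` = c(129, 31.0, 146),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Reorder whole rows by several columns at once.
# Later columns are only used to break ties in earlier ones.
by_multiple <- test[order(-test[["control 0hr"]], -test[["control 24hr"]]), ]

# Alternatively, rank the genes separately within each sample
# and keep one small data frame per sample in a named list.
sample_cols <- setdiff(names(test), "gene_symbol")
per_sample <- lapply(sample_cols, function(col) {
  idx <- order(test[[col]], decreasing = TRUE)
  test[idx, c("gene_symbol", col)]
})
names(per_sample) <- sample_cols
```

Because order() returns row indices, indexing the whole data frame with them moves the gene names together with the numbers. Each element of per_sample, for example per_sample[["control 24hr"]], holds the genes ranked for that one sample.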
To order by just a single column:
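As a minimal sketch, assuming the same kind of toy data frame as in the question (values approximated from the printed tibble, row order shuffled for illustration):

```r
# Toy data: values approximated from the tibble shown in the question
test <- data.frame(
  gene_symbol    = c("5S", "7SK", "5-8S"),
  `control 0hr`  = c(123, 56.6, 270),
  `control 24hr` = c(129, 31.0, 146),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# order() gives the row indices that sort one column in decreasing order;
# indexing the full data frame keeps every gene name attached to its values.
sorted_0hr <- test[order(test[["control 0hr"]], decreasing = TRUE), ]
```

This differs from apply(test, 2, sort), which sorts every column independently and so breaks the link between a gene and its values.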
If I run:
I get 1 as the result.
If I remove the column with the gene names, I can get all the columns in decreasing order. I need to find the genes that are the most abundant across all samples, not only in one specific sample.
I think that you should provide some sample data and then a desired result in your original question. This would help to clarify what you are aiming to achieve.
It seems that you first want a summary metric for each gene, like sum, mean, or median, and to then order genes based on this. For example:
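One way this could look in base R, as a sketch only: the toy values are approximated from the tibble in the question, the column name mean_expr is a hypothetical choice, and the mean is used purely as an example of a summary metric.

```r
# Toy data: values approximated from the tibble shown in the question
test <- data.frame(
  gene_symbol    = c("5S", "7SK", "5-8S"),
  `control 0hr`  = c(123, 56.6, 270),
  `control 24hr` = c(129, 31.0, 146),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# All columns except the gene names are treated as samples
sample_cols <- setdiff(names(test), "gene_symbol")

# One summary value per gene (swap rowMeans for rowSums, or use
# apply(..., 1, median), to rank by a different metric)
test$mean_expr <- rowMeans(test[, sample_cols, drop = FALSE])

# Rank genes from most to least abundant across all samples
ranked <- test[order(test$mean_expr, decreasing = TRUE), ]
```

The first rows of ranked are then the genes with the highest average value across the samples, with their per-sample values still alongside.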
I was trying to do it in Excel with a smaller number of genes and columns, but I cannot find the highest values in all columns simultaneously. Thanks for the clarification about posting (which question goes where).
There's no such thing as a stupid question, and if there is, this is not it. You wish to order by a bunch of columns, and dplyr should have a way of doing that.
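A possible dplyr version, sketched under some assumptions: the toy tibble approximates the data in the question, and across() requires dplyr 1.0.0 or later. Note that column names containing spaces are written with backticks; a quoted string such as "control 0hr" is treated as a constant rather than a column.

```r
library(dplyr)

# Toy data: values approximated from the tibble shown in the question
test <- tibble(
  gene_symbol    = c("5S", "7SK", "5-8S"),
  `control 0hr`  = c(123, 56.6, 270),
  `control 24hr` = c(129, 31.0, 146)
)

# Sort rows by explicitly named columns, each in decreasing order
sorted <- test %>%
  arrange(desc(`control 0hr`), desc(`control 24hr`))

# Same idea without typing every column name
sorted_all <- test %>%
  arrange(across(starts_with("control"), desc))
```

arrange() reorders whole rows, so gene_symbol stays matched to its values. As with order(), later columns only break ties in earlier ones.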
I tried dplyr, but if I select only the columns that I want, I lose the gene names, so at the end I don't know which gene is which. I have also tried mutate from dplyr, but I got different numbers!
Cross-posted: https://www.biostars.org/p/458192/