Sort decreasing only numeric columns (more than one) in a dataframe with character and numeric column
1
0
Entering edit mode
camillab. ▴ 10
@camillab-23642
Last seen 17 months ago
London

Hi,

I think it's a stupid question but I cannot find a way to do it online. I want to sort simultaneously all the columns of my dataframe from the biggest to the smallest number (decreasing order) keeping at the same time the first column that contains the gene names (=characters) since I need to know which gene is what after the other columns has been re-ordered. my dataset:

# A tibble: 3 x 12
gene_symbol control  0hr control 24hr
<chr>                <dbl>          <dbl>
1 5-8S                 270.           146.
2 5S                   123.           129.
3 7SK                   56.6           31.0


the only thing that works is selecting the numeric columns that I want to sort but in doing so I will lose the names of the genes:

CD.sorted <- apply(test[4:14],2,sort,decreasing=T)


and if I run this the order is not decreasing and the number are all messed up (I got either all 0 or 9.99):

CD.sorted2 <- apply(test,2,sort,decreasing=T)


and if I run this code, the order is not decreased and all the number are again changed but I don't understand how:

library(dplyr)
f <- test %>% mutate_at(4:14, funs(sort(., decreasing = TRUE)))


and if I run this (where I tell exactly which columns to order) there is no changes at all:

CD.sorted2 <- arrange(test, desc("control  0hr", "control 24hr","control  48hr","control 54hr", "control 60hr","control 66hr","control 72hr","control 96hr","control 120hr" ,"control 144hr", "control 168hr" ))


thank you

Camilla

sort R RNAseq character numeric • 720 views
1
Entering edit mode

Base R's order() can order by multiple columns like this:

order(col1, col2, ..., coln, decreasing = TRUE)


However, I think that you'll struggle to find a suitable ordering across multiple samples and genes in this way. You could order each sample separately and store the results in a list, like this, if you wished:

mat <- data.frame(gene = paste0('gene', 1:20), matrix(rexp(200, rate=.1), ncol=20))
mat <- tibble::as_tibble(mat)

apply(mat[,2:ncol(mat)], 2, function(x) {
idx <- order(x, decreasing = TRUE)
data.frame(
gene = mat[idx,'gene'],
x[idx])})


To order by just a single column:

mat[order(mat[,'X6'], decreasing = TRUE),]

0
Entering edit mode

if I run:

order("control  0hr", "control 24hr","control  48hr","control 54hr", "control 60hr","control 66hr","control 72hr","control 96hr","control 120hr" ,"control 144hr", "control 168hr" , decreasing = TRUE)


I got 1 as result.

If I remove the column with the gene names, I can get all the columns in the decreasing order. I need to find the genes thatare the most abundance across all not only in one specific sample.

1
Entering edit mode

I think that you should provide some sample data and then a desired result in your original question. This would help to clarify what you are aiming to achieve.

It seems that you first want a summary metric for each gene, like sum, mean, or median, and to then order genes based on this. For example:

idx <- order(apply(mat[,2:ncol(mat)], 1, sum), decreasing = TRUE)
mat[idx,]

1
Entering edit mode

I was trying to do it with Excel with a smaller number of genes/column but I cannot simultaneously find the highest values in all columns. Thanks for the clarification for the posting (which question goes where).

0
Entering edit mode

There's no such thing as a stupid question, and if there is, this is not it. You wish to order by a bunch of columns, and dplyr should have a way of doing that.

0
Entering edit mode

I tried dplyr but if I select only the column that I want, I lose the gene names so at the end I don't know which gene is. and I have also tried mutate from dplyr I got different number!

0
Entering edit mode
2
Entering edit mode
Ram ▴ 150
@ram
Last seen 3 days ago
United States

dplyr has awesome functions that can select a bunch of columns and perform actions across them. Using Kevin's sample dataset, here's how you'd do it:

mat %>% arrange(desc(across(-gene)))


Character limits is reached, so I'll truncate the output to the first 10 columns and 10 rows

(mat %>% arrange(across(!gene)))[1:10,1:10]

gene         X1        X2        X3         X4        X5         X6        X7         X8
1   gene4  0.7584679 11.716186  5.127271  0.1015122  1.772376 14.3669174 12.645955  0.1591437
2  gene14  0.7584679 11.716186  5.127271  0.1015122  1.772376 14.3669174 12.645955  0.1591437
3   gene6  4.2842208  4.877391  2.795409 14.7575866 17.178091  0.1438329 15.996090 14.4949145
4  gene16  4.2842208  4.877391  2.795409 14.7575866 17.178091  0.1438329 15.996090 14.4949145
5   gene5  4.3069101  3.710006 13.163678 10.7128000  2.661170  0.9162319 23.260643  2.6604723
6  gene15  4.3069101  3.710006 13.163678 10.7128000  2.661170  0.9162319 23.260643  2.6604723
7   gene1  6.6057449  1.154507  8.355614 10.1557894  7.383982  4.0959640  9.665963 13.8651378
8  gene11  6.6057449  1.154507  8.355614 10.1557894  7.383982  4.0959640  9.665963 13.8651378
9   gene9 10.4593677  8.043021 11.563477 16.9735232  5.518921 14.8537116 13.597860 29.6926161
10 gene19 10.4593677  8.043021 11.563477 16.9735232  5.518921 14.8537116 13.597860 29.6926161
X9
1  11.5794838
2  11.5794838
3   6.6068753
4   6.6068753
5  22.6667529
6  22.6667529
7   3.3119579
8   3.3119579
9   0.5391543
10  0.5391543

## descending order
(mat %>% arrange(desc(across(-gene))))[1:10,1:10]

gene       X1         X2         X3         X4         X5        X6        X7        X8
1   gene7 26.09067  0.9044992  0.1209798  0.5120584  4.9461160  3.104944  6.430786  3.200510
2  gene17 26.09067  0.9044992  0.1209798  0.5120584  4.9461160  3.104944  6.430786  3.200510
3   gene3 23.04789  1.1921839  0.3533496 44.5954627  0.9295802 15.843080  8.004870  3.632617
4  gene13 23.04789  1.1921839  0.3533496 44.5954627  0.9295802 15.843080  8.004870  3.632617
5   gene8 21.68222  1.7352600  5.3151209  0.7945908 16.3582420  2.291566  6.349197  8.602098
6  gene18 21.68222  1.7352600  5.3151209  0.7945908 16.3582420  2.291566  6.349197  8.602098
7  gene10 12.66188 17.6456263  6.6793751 16.6105792  0.2191368 30.380951 13.559502 14.531182
8  gene20 12.66188 17.6456263  6.6793751 16.6105792  0.2191368 30.380951 13.559502 14.531182
9   gene2 11.05985  0.4229906 49.9962174  6.4801072 10.8767814 40.454131  5.303173  0.351597
10 gene12 11.05985  0.4229906 49.9962174  6.4801072 10.8767814 40.454131  5.303173  0.351597
X9
1  10.640208
2  10.640208
3   4.395651
4   4.395651
5   1.989467
6   1.989467
7   1.711846
8   1.711846
9  16.567948
10 16.567948