Question: How can I run a paired RankProd analysis in R?
joker330 wrote (5 weeks ago):

Hi! I would like to run a paired RankProd analysis in R. My dataset consists of matched cancer samples and healthy controls. For example, the Colorectal Cancer data set consists of 40 cancer samples and 40 matched healthy controls. I defined the class vectors and origin vectors accordingly:

# class vector
n1 <- 40
n2 <- 40
cl <- rep(c(0, 1), c(n1, n2))

# origin vector
origin <- rep(1, n1 + n2)

# rankprod code
RP.out <- RankProducts(cancer_data, cl, origin, logged = TRUE, plot = FALSE,
                       gene.names = genenames, rand = 123)

I have two questions: (1) This calculation always runs as "Rank Product analysis for unpaired case". Is there a specific argument where I can specify the pairing of my cancer samples and healthy controls?

(2) If I run a dataset with more than 60 samples (30 cancer + 30 healthy controls), I get an error that reads: "Error: vector memory exhausted (limit reached?)" (NB: I use a Mac). Following advice from other forums, I added "R_MAX_VSIZE=100Gb" to my .Renviron file, but with these large matrices the error does not go away. Is there any solution?

rnaseq rankprod R paired • 61 views
modified 5 weeks ago by James W. MacDonald • written 5 weeks ago by joker330
Answer:

James W. MacDonald wrote (5 weeks ago):

Matching of subjects is not the same as paired data. Pairing assumes that there is a dependence between the paired samples (usually meaning that the samples came from the same subject, or within the same litter, or something like that). There is no reason to assume that two matched subjects are anything but iid.

As for the memory issue, do you really have 100 GB of RAM on your Mac? I suppose not, but if so, congratulations! If you have a 'regular' amount of RAM like 8 or 16 GB, then telling R 'hey, you can have 100 GB' doesn't actually give it that much.

I'm sort of surprised that RankProd needs that much memory. It's been years since I have been memory constrained (well, computer memory constrained, that is), so I don't know for sure, but dollars to donuts limma would chug through a linear model with 80 samples no sweat. But maybe not?

Anyway, if you don't have the memory, get a computer that does. You could rent a reasonable sized EC2 instance on AWS for cheap, maybe even for free, which is probably the best bet if you don't have other access.
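For what it's worth, a blocked linear model in limma is cheap on memory if the data do turn out to be paired. Here is a minimal sketch with simulated data; `cancer_data` and the column layout (40 healthy columns followed by their 40 matched cancer columns) are assumptions, not your actual objects:

```r
# Sketch of a paired/blocked fit in limma (assumed column layout:
# 40 healthy samples followed by their 40 matched cancer samples).
library(limma)

set.seed(123)
cancer_data <- matrix(rnorm(80 * 1000), ncol = 80)  # stand-in for real log-scale data

patient   <- factor(rep(1:40, times = 2))           # pairing (blocking) factor
condition <- factor(rep(c("healthy", "cancer"), each = 40),
                    levels = c("healthy", "cancer"))

design <- model.matrix(~ patient + condition)       # patient as a fixed block
fit <- eBayes(lmFit(cancer_data, design))
tt  <- topTable(fit, coef = "conditioncancer")      # paired cancer vs healthy
```

Including the patient term in the design is limma's standard way of handling matched samples, so each cancer/healthy comparison is made within a subject.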

joker330 replied:

Hi!

Thank you very much for your answer! I apologize, I chose the wrong term above. The data is actually paired. The healthy and cancer samples are taken from the same patient. Is there a way to specify a paired analysis? The RankProd package documentation/tutorial only describes unpaired cases.

And regarding the memory issue: I have a Mac with 16 GB. I followed the instructions from a StackOverflow post where someone had the same issue; they apparently solved it by setting "R_MAX_VSIZE=100Gb". For me, this didn't work.

I suspect the calculation is so resource-hungry because it assumes an unpaired case. Once I can specify the function correctly, so that only the matched pairs are compared, the memory issue might be solved as well.

James W. MacDonald replied:

Conventionally, for a paired analysis in a non-parametric setting you would first compute the per-gene difference between the cancer and healthy tissues (after taking logs), then do a one-sample analysis. This is essentially a sign test. It also helps with the memory issue, at least on my machine:

> fakedat <- matrix(rnorm(80 * 3e4), ncol = 80)
> z <- RankProducts(fakedat, rep(0:1, each = 40))
Rank Product analysis for unpaired case
Error: cannot allocate vector of size 9.2 Mb
> z <- RankProducts(fakedat[, 1:40] - fakedat[, 41:80], rep(1, 40))
Rank Product analysis for paired case
done

> sysinf <- system("systeminfo", intern = TRUE)
> sysinf[grep("^Total", sysinf)]
[1] "Total Physical Memory:     16,292 MB"
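As an aside, the paired-difference idea above can be run without RankProd at all, as a plain sign test in base R. This is only a sketch on simulated data (all object names are illustrative), assuming the expression values are already on the log scale:

```r
# Sign test on paired differences, base R only (illustrative sketch).
set.seed(1)
n <- 40                                   # matched pairs
healthy <- matrix(rnorm(n * 1000), ncol = n)
cancer  <- healthy + 0.3 + matrix(rnorm(n * 1000, sd = 0.5), ncol = n)

d <- cancer - healthy                     # per-gene paired differences (logged data)
npos <- rowSums(d > 0)                    # positive differences per gene
# Under "no change", #positives per gene ~ Binomial(n, 0.5)
pval <- sapply(npos, function(k) binom.test(k, n, p = 0.5)$p.value)
padj <- p.adjust(pval, method = "BH")     # multiple-testing correction
```

Since each gene reduces to a single vector of n differences, the memory footprint stays tiny no matter how many samples you have.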