Question: Why does the order of data being input matter in Timecourse.R
0
5.0 years ago by
b.curran0
New Zealand
b.curran0 wrote:

So I'm not sure whether this is a bug or I'm missing something.

I'm using timecourse to look at a bunch of genes over a period of 5 days.

Section 2.2 of the timecourse manual tells me that I either have to assign time and replicate groups orn have my data ordered in a specific way that the package will read in as default.

I select the relevant columns of data from my data frame, I set up time vector detailing the time, the replicates and treatments and run the mb.long()

rawCounts <- rawCounts[,c(2:12,39:57)]

M3 <- as.matrix(rawCounts[1:39039,])

assay<-c("rep1","rep1","rep1","rep1","rep1","rep2","rep2","rep2","rep1","rep3","rep3","rep2","rep3","rep1","rep2","rep2","rep3","rep3","rep1","rep2","rep3","rep1","rep2","rep3","rep1","rep2","rep3","rep1","rep2","rep3")

trt<-c("Control","Psa","Control","Control","Psa","Control","Psa","Control","Psa","Control","Psa","Control","Control","Psa","Psa","Psa","Psa","Control","Control","Control","Control","Psa","Psa","Psa","Control","Control","Control","Psa","Psa","Psa")

time<-c(72,24,48,24,72,72,72,48,48,48,72,24,24,24,24,48,48,72,120,120,120,120,120,120,96,96,96,96,96,96)

This is fine. It works. I get a bunch of genes out the other end. When I switch the order though like this:

rawCounts <- rawCounts[,c(39:57,2:12)]

M3 <- as.matrix(rawCounts[1:39039,])

assay<-c("rep2","rep3","rep1","rep2","rep2","rep3","rep3","rep1","rep2","rep3","rep1","rep2","rep3","rep1","rep2","rep3","rep1","rep2","rep3","rep1","rep1","rep1","rep1","rep1","rep2","rep2","rep2","rep1","rep3","rep3")

trt<-c("Control","Control","Psa","Psa","Psa","Psa","Control","Control","Control","Control","Psa","Psa","Psa","Control","Control","Control","Psa","Psa","Psa","Control","Psa","Control","Control","Psa","Control","Psa","Control","Psa","Control","Psa")
time<-c(24,24,24,24,48,48,72,120,120,120,120,120,120,96,96,96,96,96,96,72,24,48,24,72,72,72,48,48,48,72)

MB.paired <- mb.long(M3, method="paired", times=5, reps=size, condition.grp=trt, rep.grp=assay, time.grp=time)

I get a different set of genes. The first 2 or 3 are the same, then they're all different. This second set has the samples 2:12 selected last, the relevant time points/assay/treatments have also been swapped in the set up. So why are they different? Timecourse shouldn't be reading them in in the default settings as the settings have been specified, surely? Either I'm missing something (quite likely) or this is a bug.

Cheers

Ben.

timecourse • 659 views
modified 4.9 years ago by James W. MacDonald51k • written 5.0 years ago by b.curran0
Answer: Why does the order of data being input matter in Timecourse.R
0
4.9 years ago by
United States
James W. MacDonald51k wrote:

I think it is because there is an error in your 'assays' variable, although you don't say what you are using for the 'size' variable, so that might be a problem as well. Anyway, note that (using either version of your assay, time, trt vectors):

> d.f <- data.frame(assay, time, trt)
> d.f[order(d.f$time, d.f$trt, d.f\$assay),]
assay time     trt
23  rep1   24 Control
1   rep2   24 Control
2   rep3   24 Control
3   rep1   24     Psa
21  rep1   24     Psa
4   rep2   24     Psa
22  rep1   48 Control
27  rep2   48 Control
29  rep3   48 Control
28  rep1   48     Psa
5   rep2   48     Psa
6   rep3   48     Psa
20  rep1   72 Control
25  rep2   72 Control
7   rep3   72 Control
24  rep1   72     Psa
26  rep2   72     Psa
30  rep3   72     Psa
14  rep1   96 Control
15  rep2   96 Control
16  rep3   96 Control
17  rep1   96     Psa
18  rep2   96     Psa
19  rep3   96     Psa
8   rep1  120 Control
9   rep2  120 Control
10  rep3  120 Control
11  rep1  120     Psa
12  rep2  120     Psa
13  rep3  120     Psa

The bolded portion shows the error.

Thanks for the response. Also the catching of the error :)

Sadly it wasn't that though. With the amended assay, I still get different results. Again, the first few genes are ranked the same and then it varies. And whilst the first few are the same, their hotellingT2 stat changes.

For size, for both instances of the test, I was using

size <- matrix(3, nrow=366, ncol=2)

Which, as best I can tell from the manual, is how it's meant to be, one row for each variable, one column for each treatment, with the number of reps in the cells.

Cheers

Ben.