Question: edgeR subsetting DGEList by column/sample
2.2 years ago by
mnaymik10
United States
mnaymik10 wrote:

I saw this post from a while ago regarding a similar issue: edgeR: Problem with subsetting a DGEList in latest package version

> d$samples[1:6,]

    sample                        lib.size  norm.factor  type  time
    preExercise_TAGGCTGACTTGAG.1       856    1.1020236  B     pre
    preExercise_TCCATCCTCGTTAG.1      1033    1.2198739  B     pre
    pbmc001_TTGAGGACTTTCAC.1           703    1.2050717  B     pre
    pbmc001_AGTCGCCTGCTTAG.1          1230    1.0304974  B     post
    pbmc001_TACTACACAGCACT.1          1053    0.9790636  C     post
    pbmc001_TAAACAACCCTTAT.1           895    1.1032946  D     pre
    ...

I am trying to run a differential expression analysis on only the samples of type 'B', using time as the group ('post' vs 'pre'). I thought the easiest way would be to just subset d via:

    d.B = d[,grep('B',d$samples$type)]

But I get the error:

    Error in $<-.data.frame(*tmp*, "group", value = integer(0)) :
      replacement has 0 rows, data has 226

Is there a proper way of doing differential expression on just a subset of the DGEList?

I got around this by employing the method from the post I linked:

    B = grep('B', d$samples$type)
    test = DGEList(d$counts)
    test = test[,B]

Then replacing test$samples with its proper subset from d:

    test$samples = d$samples[B,]

This just seems sort of hacky...

modified 2.2 years ago by Gordon Smyth35k • written 2.2 years ago by mnaymik10

Something seems strange.

Can you start a new R session, call library(edgeR), and then update your question with the copy/pasted output of sessionInfo()?

> library(edgeR)

> sessionInfo()

R version 3.3.1 (2016-06-21)

Platform: x86_64-apple-darwin13.4.0 (64-bit)

Running under: OS X 10.11.5 (El Capitan)

locale:

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:

[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:

[1] edgeR_3.14.0  limma_3.28.14

Can you post a minimal working example of this behaviour?

2.2 years ago by
Denali
Steve Lianoglou12k wrote:

Now that you've verified you're running the latest version of edgeR, I've looked a bit more closely at your example and error.

It seems that you have somehow constructed a DGEList (d) with a $samples data.frame that doesn't have a group column. What were the commands you used to construct d?

In any event, try adding a group column, like so:

    d$samples <- transform(d$samples, group=paste(type, time, sep="_"))

Then try subsetting by columns again ...

Also, adding such a group column can be useful in your downstream analysis, since you can now analyze your experiment as a one-way layout:

    design <- model.matrix(~ 0 + group, d$samples)

You can then construct contrasts with makeContrasts that are easy-to-interpret arithmetic over the columns of design.
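As a rough sketch of what such a contrast could look like, the snippet below builds a one-way design from a small, made-up sample table (the `samples` data.frame and its values are hypothetical, chosen only to mirror the type and time columns in the question) and then asks for post vs pre within type B. Renaming the design columns to the bare group levels is an assumption made here for readability, not something the answer above prescribes.

```r
library(limma)  # provides makeContrasts()

# Hypothetical sample annotation mirroring the type/time columns in the question
samples <- data.frame(
  type = c("B", "B", "B", "B", "C", "D"),
  time = c("pre", "pre", "post", "post", "post", "pre")
)
samples$group <- factor(paste(samples$type, samples$time, sep = "_"))

# One-way layout: one column per group, no intercept
design <- model.matrix(~ 0 + group, samples)
colnames(design) <- levels(samples$group)  # B_post, B_pre, C_post, D_pre

# post vs pre within type B only; the other groups get weight 0
contr <- makeContrasts(B_post - B_pre, levels = design)
```

With this layout the C and D samples stay in the model (and contribute to dispersion estimation in an edgeR fit) while the contrast itself only compares the two B groups, which is one alternative to subsetting the object first.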

Since I was using the time column as my group, I had set d$samples$group = NULL. Later I set group = time; if I do that before subsetting, it works just fine. I did not realize group was that sensitive. Thanks!

written 2.2 years ago by mnaymik10

Or just

    d$samples$group <- with(d$samples, paste(type, time, sep="."))

would also do the job.

modified 2.2 years ago • written 2.2 years ago by Gordon Smyth35k

2.2 years ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:

A DGEList object needs to satisfy some minimum conditions to be a valid object. If you change a DGEList object so that it no longer satisfies these minimum conditions, then operations such as subsetting can no longer be guaranteed to work.

help("DGEList-class") explains what a DGEList object is assumed to contain. It explains that 'group', 'lib.size' and 'norm.factors' are compulsory columns for the d$samples data.frame, so you cannot remove them.
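To illustrate the point about compulsory columns, here is a minimal sketch using simulated counts (the matrix, sample names, and annotation values are all made up for illustration). It keeps the group column in place by passing it to DGEList() at construction, after which ordinary column subsetting works without the workaround from the question.

```r
library(edgeR)

set.seed(1)
# Toy data: 50 genes x 6 samples; all values are simulated, not from the question
counts <- matrix(rnbinom(300, mu = 20, size = 5), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), paste0("cell", 1:6)))
type <- c("B", "B", "B", "B", "C", "D")
time <- c("pre", "pre", "pre", "post", "post", "pre")

# Supplying group here means d$samples keeps its compulsory group column
d <- DGEList(counts = counts, group = time)
d$samples$type <- type
d$samples$time <- time

# Subset to type B; exact matching avoids partial hits that grep() could return
d.B <- d[, d$samples$type == "B"]
```

After this, d.B$samples still carries group, lib.size and norm.factors, so downstream edgeR functions can be applied to d.B directly.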