Question

edgeR: How to set model for glmQLFTest in time-course experiment (compared to glmLRT?

0

Entering edit mode

heikki.sarin ▴ 10

@heikkisarin-13379

Last seen 4.5 years ago

Hi,

We have an experiment model: ~ condition + time + condition:subject.nested + condition:time . We're trying find any differentially expressed genes/exons between condition groups across any of the timepoints when accounting for the within subject variability. When we setup the model.matric we get 36 coefficients in total.

We can test it with glmLRT(fit, coef=35:36). Doing so glmLRTtest tests between two models: 1) full: ~ condition + time + condition:subject.nested + condition:time, and 2) reduced: ~ condition + time + condition:subject.nested.

However we would like to do this same also with glmQLFtest. How can we do this? How does one have to setup the coefficients or contrast?

Really appreciate all the help!

edgeR edger "glmqlftest()" timecourse model • 2.8k views

ADD COMMENT • link updated 7.9 years ago by Ryan C. Thompson ★ 7.9k • written 7.9 years ago by heikki.sarin ▴ 10

score 1 · Answer 1 · 2017-08-03

1

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 9 months ago

Icahn School of Medicine at Mount Sinai…

As far as I know, the quasi-likelihood F test supports all the same designs and contrasts as the LRT, and the usage is almost identical. Just swap out glmFit for glmQLFit and swap out glmLRT for glmQLFTest. If you're having a specific problem getting it to work, please update your question with details with your code and the error it produced.

ADD COMMENT • link 7.9 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

There is no problems or errors when running the glmQLFit and glmQLFTest instead of LRT approach. The point of interest on the matter is that is the statistical background of both approaches similar in such way that if I also set the coef=35:36 in glmQLFtest the test will produce an answer to the same question as with glmLRT?

ADD REPLY • link 7.9 years ago heikki.sarin ▴ 10

1

Entering edit mode

Yes, it will.

ADD REPLY • link 7.9 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thank you very much!

Little follow-up on the matter. Is it an issue if on plotQLDisp() the Average Log2 CPM value axis and the values plotted start from -5 ? Otherwise the plot seems symmetrical and somewhat similar to the ones in the user guide.

ADD REPLY • link 7.9 years ago heikki.sarin ▴ 10

1

Entering edit mode

Average abundances of -5 suggest that the lowest-abundance genes have, on average, one read per 32 million reads in each library. This seems pretty small for typical library sizes - have you forgotten to filter out low-abundance genes?

ADD REPLY • link 7.9 years ago Aaron Lun ★ 28k

0

Entering edit mode

Yeah, I've done pre filtering before the analysis in two steps:

1) remove.zeros=TRUE (when creating the DGEList object)

2) A <- rowSums(dge$counts)
dge <- dge[A>10, ,keep.lib.sizes=FALSE]

Should I do more aggressive pre-filtering or is it possible that the plots are fine? Also on the Dispersion estimate plot the Average Log CPM values start from the -5 value.

ADD REPLY • link 7.9 years ago heikki.sarin ▴ 10

1

Entering edit mode

This seems okay; I can only assume that you must have very large libraries (>100 million reads, perhaps?).

ADD REPLY • link 7.9 years ago Aaron Lun ★ 28k

0

Entering edit mode

Well, that's good to hear. Yeah, on average around 100 million reads and highest ones a bit over 200 million.

ADD REPLY • link 7.9 years ago heikki.sarin ▴ 10