how to design a model matrix and test by edgeR
1
0
Entering edit mode
wang peter ★ 2.0k
@wang-peter-4647
Last seen 9.7 years ago
how to design a model matrix and test by edgeR dear all: my data is composed of 35 samples, have two factor time and treatment 1. The first question is i want to find DE genes (6h.treatment-0h.treatment)-(6h.control- 0h.control) how to design the model? please tell me if the design and test is right such is my coding. 2. the second question is should i use samples involved 6h and 0h to do glmFitting or use all of 35 samples to do? i think i should use all raw.data <- read.table("expression-table.txt",row.names=1) lib_size <- read.table("lib_size.txt"); lib_size <- unlist(lib_size) d <- DGEList(counts = raw.data, lib.size = lib_size) #normalization dge <- calcNormFactors(dge) treatment=factor(c(rep('control',6),rep('treated',24),rep('control',5) )) time=factor(c('0h','0h','0h','24h','24h','24h','0h','0h','0h','6h','6h ','6h','6h','12h','12h','12h','12h','18h','18h','18h','18h', '24h','24h','24h','36h','36h','36h','48h','48h','48h','6h ','12h','18h','36h','48h')) design <- model.matrix(~time*treatment) dge <- estimateGLMCommonDisp(dge, design) dge <- estimateGLMTagwiseDisp(dge, design) glmfit.dge <- glmFit(dge, design,dispersion=dge$common.dispersion) lrt.dge <- glmLRT(dge, glmfit.dge, coef=13) result <- topTags(lrt.dge, adjust.method="BH", sort.by="logFC") -- shan gao Room 231(Dr.Fei lab) Boyce Thompson Institute Cornell University Tower Road, Ithaca, NY 14853-1801 Office phone: 1-607-254-1267(day) Official email:sg839 at cornell.edu Facebook:http://www.facebook.com/profile.php?id=100001986532253
• 2.0k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia
> Date: Wed, 25 Jul 2012 22:07:35 -0400 > From: wang peter <wng.peter at="" gmail.com=""> > To: bioconductor at r-project.org > Subject: [BioC] how to design a model matrix and test by edgeR > > how to design a model matrix and test by edgeR > dear all: > my data is composed of 35 samples, have two factor > time and treatment > > 1. The first question is > i want to find DE genes (6h.treatment-0h.treatment)-(6h.control- 0h.control) > how to design the model? please tell me if the design and test is right > such is my coding. As you can see for yourself, coef=13 does not relate to 0h: colnames(design)[13] [1] "time48h:treatmenttreated" On the other hand, coef=14 does: colnames(design)[14] [1] "time6h:treatmenttreated" So you could use either glmLRT(dge, glmfit.dge, coef=14) or glmLRT(dge, glmfit.dge, coef="time6h:treatmenttreated") to find the DE genes for the interaction you want. > 2. the second question is > should i use samples involved 6h and 0h to do glmFitting > or use all of 35 samples to do? > i think i should use all Yes, it is better to use all. There are some strange aspects to your code. You use common dispersion, although the edgeR User's Guide and our published papers make it clear that we recommend tagwise dispersions. You actually estimate tagwise dispersions, but then don't use them. Are you sure you need to sort genes by logFC? Best wishes Gordon > raw.data <- read.table("expression-table.txt",row.names=1) > lib_size <- read.table("lib_size.txt"); > lib_size <- unlist(lib_size) > d <- DGEList(counts = raw.data, lib.size = lib_size) > #normalization > dge <- calcNormFactors(dge) > > treatment=factor(c(rep('control',6),rep('treated',24),rep('control', 5))) > time=factor(c('0h','0h','0h','24h','24h','24h','0h','0h','0h','6h',' 6h','6h','6h','12h','12h','12h','12h','18h','18h','18h','18h', > '24h','24h','24h','36h','36h','36h','48h','48h','48h','6 h','12h','18h','36h','48h')) > design <- model.matrix(~time*treatment) > > dge <- estimateGLMCommonDisp(dge, design) > dge <- estimateGLMTagwiseDisp(dge, design) > glmfit.dge <- glmFit(dge, design,dispersion=dge$common.dispersion) > lrt.dge <- glmLRT(dge, glmfit.dge, coef=13) > result <- topTags(lrt.dge, adjust.method="BH", sort.by="logFC") > > -- > shan gao > Room 231(Dr.Fei lab) > Boyce Thompson Institute > Cornell University > Tower Road, Ithaca, NY 14853-1801 > Office phone: 1-607-254-1267(day) > Official email:sg839 at cornell.edu > Facebook:http://www.facebook.com/profile.php?id=100001986532253 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD COMMENT
0
Entering edit mode
dear Dr Gordon: thank you very much for your reply. for the first question, it should be cof=14. i made a mistake. and for the dispersion=dge$common.dispersion i make a mistake either. but thank you for youre corrrection. another question, if you donot mind. do i need estimate the common dispersion before i estimate the tagwised dispersion thank u shan
ADD REPLY
0
Entering edit mode
On Thu, 26 Jul 2012, wang peter wrote: > dear Dr Gordon: > thank you very much for your reply. > for the first question, it should be cof=14. > i made a mistake. > > and for the dispersion=dge$common.dispersion > i make a mistake either. > > but thank you for youre corrrection. > > another question, if you donot mind. > do i need estimate the common dispersion > before i estimate the tagwised dispersion Try it, and you will have your answer. If you run estimateGLMTagwiseDisp() without having previously estimated the common dispersion, it will give an error message to tell you that you need to run estimateGLMCommonDisp() first. Gordon > thank u > shan > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD REPLY

Login before adding your answer.

Traffic: 621 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6