limma for finding differentialy expressed genes of several groups
3
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 9.7 years ago
I would like to find the differentially expressed genes for several variables using the limma package for several groups. -- output of sessionInfo(): I have the rma normalized matrix in the following format : ID_REF GSM362180 GSM362181 GSM362188 GSM362189 GSM362192 244901 5.094871713 4.626623079 4.554272515 4.748604391 4.759221647 244902 5.194528083 4.985930299 4.817426064 5.151654407 4.838741605 244903 5.412329253 5.352970877 5.06250609 5.305709079 8.365082403 244904 5.529220594 5.28134657 5.467445095 5.62968933 5.458388909 244905 5.024052699 4.714631878 4.792865831 4.843975286 4.657188246 244906 5.786557533 5.242403911 5.060605782 5.458148567 5.890061836 where the different columns correspond to four different types of promoters and each of the four promoters has a biological replicate so totally there are 8 columns. I tried using the Limma package to find the differentially expressed genes across several promoters ( with replicates) and I always get an error as Iam new to r and unable to understand it fully . This is the code that I used: Group <- factor(c("p1", "p1", "p2", "p2", "p3","p3","p3","p4","p4"), levels = c("GSM362180","GSM362181","GSM362188","GSM362189","GSM362192" ,"GSM362193","GSM362194","GSM362197","GSM362198")) design <- model.matrix(~0 + Group) colnames(design) <- c("GSM362180","GSM362181","GSM362188","GSM362189", "GSM362192","GSM362193","GSM362194","GSM362197","GSM362198") fit <- lmFit(modified, design) where modified is the rma normalized data matrix as inputted in the above format. I get the following error: Coefficients not estimable: GSM362180 GSM362181 GSM362188 GSM362189 GSM362192 GSM362193 GSM362194 GSM362197 GSM362198 Error in lm.fit(design, t(M)) : 0 (non-NA) cases -- Sent via the guest posting facility at bioconductor.org.
limma limma • 1.1k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 2 hours ago
United States
Hi Priya, On 10/17/2012 10:20 AM, priya [guest] wrote: > I would like to find the differentially expressed genes for several variables using the limma package for several groups. > > -- output of sessionInfo(): > > I have the rma normalized matrix in the following format : > > > ID_REF GSM362180 GSM362181 GSM362188 GSM362189 GSM362192 > 244901 5.094871713 4.626623079 4.554272515 4.748604391 4.759221647 > 244902 5.194528083 4.985930299 4.817426064 5.151654407 4.838741605 > 244903 5.412329253 5.352970877 5.06250609 5.305709079 8.365082403 > 244904 5.529220594 5.28134657 5.467445095 5.62968933 5.458388909 > 244905 5.024052699 4.714631878 4.792865831 4.843975286 4.657188246 > 244906 5.786557533 5.242403911 5.060605782 5.458148567 5.890061836 > > where the different columns correspond to four different types of promoters and each of the four promoters has a biological replicate so totally there are 8 columns. > > I tried using the Limma package to find the differentially expressed genes across several promoters ( with replicates) and I always get an error as Iam new to r and unable to understand it fully . > > This is the code that I used: > > Group<- factor(c("p1", "p1", "p2", "p2", "p3","p3","p3","p4","p4"), levels = c("GSM362180","GSM362181","GSM362188","GSM362189","GSM362192" ,"GSM362193","GSM362194","GSM362197","GSM362198")) This ^^^^^^^^^^^^^ doesn't make sense. If you look at the results from that, you get this: > Group [1] <na> <na> <na> <na> <na> <na> <na> <na> <na> 9 Levels: GSM362180 GSM362181 GSM362188 GSM362189 GSM362192 ... GSM362198 Plus you say you have 8 samples, yet you list 9. You most likely want something like > Group <- factor(c("p1", "p1", "p2", "p2", "p3","p3","p3","p4","p4")) > design <- model.matrix(~0+Group) > colnames(design) <- gsub("Group","", colnames(design)) > design p1 p2 p3 p4 1 1 0 0 0 2 1 0 0 0 3 0 1 0 0 4 0 1 0 0 5 0 0 1 0 6 0 0 1 0 7 0 0 1 0 8 0 0 0 1 9 0 0 0 1 attr(,"assign") [1] 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$Group [1] "contr.treatment" Then you can make your contrasts matrix and go from there. Best, Jim > design<- model.matrix(~0 + Group) > > colnames(design)<- c("GSM362180","GSM362181","GSM362188","GSM362189" ,"GSM362192","GSM362193","GSM362194","GSM362197","GSM362198") > fit<- lmFit(modified, design) > > where modified is the rma normalized data matrix as inputted in the above format. > I get the following error: > > Coefficients not estimable: GSM362180 GSM362181 GSM362188 GSM362189 GSM362192 GSM362193 GSM362194 GSM362197 GSM362198 > Error in lm.fit(design, t(M)) : 0 (non-NA) cases > > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD COMMENT
0
Entering edit mode
@sean-davis-490
Last seen 4 months ago
United States
On Wed, Oct 17, 2012 at 10:20 AM, priya [guest] <guest@bioconductor.org>wrote: > > I would like to find the differentially expressed genes for several > variables using the limma package for several groups. > > -- output of sessionInfo(): > > I have the rma normalized matrix in the following format : > > > ID_REF GSM362180 GSM362181 GSM362188 GSM362189 GSM362192 > 244901 5.094871713 4.626623079 4.554272515 4.748604391 4.759221647 > 244902 5.194528083 4.985930299 4.817426064 5.151654407 4.838741605 > 244903 5.412329253 5.352970877 5.06250609 5.305709079 8.365082403 > 244904 5.529220594 5.28134657 5.467445095 5.62968933 5.458388909 > 244905 5.024052699 4.714631878 4.792865831 4.843975286 4.657188246 > 244906 5.786557533 5.242403911 5.060605782 5.458148567 5.890061836 > > where the different columns correspond to four different types of > promoters and each of the four promoters has a biological replicate so > totally there are 8 columns. > > I tried using the Limma package to find the differentially expressed genes > across several promoters ( with replicates) and I always get an error as > Iam new to r and unable to understand it fully . > > This is the code that I used: > > Group <- factor(c("p1", "p1", "p2", "p2", "p3","p3","p3","p4","p4"), > levels = > c("GSM362180","GSM362181","GSM362188","GSM362189","GSM362192","GSM36 2193","GSM362194","GSM362197","GSM362198")) > You want: Group <- factor(c("p1", "p1", "p2", "p2", "p3","p3","p3","p4","p4")) > > design <- model.matrix(~0 + Group) > > The design matrix will not be correct. > colnames(design) <- > c("GSM362180","GSM362181","GSM362188","GSM362189","GSM362192","GSM36 2193","GSM362194","GSM362197","GSM362198") > Do not try to change the column names. > fit <- lmFit(modified, design) > > This should probably work, now. You may need to specify a contrast matrix to get something closer to what you want, though. Sean > where modified is the rma normalized data matrix as inputted in the above > format. > I get the following error: > > Coefficients not estimable: GSM362180 GSM362181 GSM362188 GSM362189 > GSM362192 GSM362193 GSM362194 GSM362197 GSM362198 > Error in lm.fit(design, t(M)) : 0 (non-NA) cases > > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 9.7 years ago
I would like to find the differentially expressed genes for several variables using the limma package for several groups. I have the rma normalized matrix in the following format : ID_REF GSM362180 GSM362181 GSM362188 GSM362189 GSM362192 244901 5.094871713 4.626623079 4.554272515 4.748604391 4.759221647 244902 5.194528083 4.985930299 4.817426064 5.151654407 4.838741605 244903 5.412329253 5.352970877 5.06250609 5.305709079 8.365082403 244904 5.529220594 5.28134657 5.467445095 5.62968933 5.458388909 244905 5.024052699 4.714631878 4.792865831 4.843975286 4.657188246 244906 5.786557533 5.242403911 5.060605782 5.458148567 5.890061836 where the different columns correspond to four different types of promoters and each of the four promoters has a biological replicate so totally there are 8 columns.There are totally 22810 genes and I would like to get a list of the genes which are differentially expressed I tried using the Limma package to find the differentially expressed genes across several promoters ( with replicates). This is the code that I used: Group <- factor(c("p1", "p1", "p2", "p2", "p3","p3","p3","p4","p4"), levels = c("GSM362180","GSM362181","GSM362188","GSM362189","GSM362192" ,"GSM362193","GSM362194","GSM362197","GSM362198")) design <- model.matrix(~0 + Group) colnames(design) <- c("GSM362180","GSM362181","GSM362188","GSM362189", "GSM362192","GSM362193","GSM362194","GSM362197") fit <- lmFit(modified, design) where modified is the rma normalized data matrix as inputted in the above format. I get the following error: Coefficients not estimable: GSM362180 GSM362181 GSM362188 GSM362189 GSM362192 GSM362193 GSM362194 GSM362197 GSM362198 Error in lm.fit(design, t(M)) : 0 (non-NA) cases I managed to get help from the mailing list prior to this and was able to correct it in the following way. -- output of sessionInfo(): Group <- factor(c("p1", "p1", "p2", "p2", "p3","p3","p3","p4","p4")) design <- model.matrix(~0+Group) colnames(design) <- gsub("Group","", colnames(design)) For creating the contrast matrix I proceeded as : fit<-lmFit(modified,design) fit<-ebayes(fit) fit<-lmFit(modified,design) contrast.matrix<-makeContrasts(p1-p2,p1-p3,p1-p4,p2-p3,p2-p4,p3-p4,lev els=design) fit2<-contrasts.fit(fit,contrast.matrix) fit2<-eBayes(fit2) topTable(fit2,coef=1,adjust="fdr") logFC AveExpr t P.Value adj.P.Val B 14865 -3.063442 11.939646 -20.85957 5.020817e-09 8.235097e-05 10.906936 15107 -3.316203 13.136888 -19.79194 8.041764e-09 8.235097e-05 10.543106 12037 2.806403 10.772050 19.10380 1.103823e-08 8.235097e-05 10.292274 15931 -3.469330 10.325303 -18.53793 1.444120e-08 8.235097e-05 10.075671 18327 3.198993 9.633795 17.57118 2.328424e-08 8.331092e-05 9.682365 7521 -2.419999 7.373064 -17.16080 2.873576e-08 8.331092e-05 9.505924 16564 3.268568 8.365454 17.09028 2.980775e-08 8.331092e-05 9.475007 3832 -2.685268 7.540418 -16.89167 3.307237e-08 8.331092e-05 9.386966 10364 2.466369 6.779762 16.71021 3.640344e-08 8.331092e-05 9.305265 4967 -2.453614 11.409188 -16.62282 3.813877e-08 8.331092e-05 9.265479 o<-order(fit2$F.p.value) fit2$genes[o[1:30],] After the above step I get as NULL. I do not know where am making the mistake. clas <- decideTests(fit2, method = "nestedF", + adjust.method = "fdr", p = 0.05) I get the following output which I know is quite wrong : Contrasts p1 - p2 p1 - p3 p1 - p4 p2 - p3 p2 - p4 p3 - p4 [1,] 0 0 0 0 0 0 [2,] 0 0 0 0 0 0 [3,] 0 0 0 0 0 0 [4,] 0 0 0 0 0 0 [5,] 0 0 0 0 0 0 [6,] 0 0 0 0 0 0 -- Sent via the guest posting facility at bioconductor.org.
ADD COMMENT
0
Entering edit mode
On Thu, Oct 18, 2012 at 4:24 AM, priya [guest] <guest@bioconductor.org>wrote: > > I would like to find the differentially expressed genes for several > variables using the limma package for several groups. > I have the rma normalized matrix in the following format : > > > ID_REF GSM362180 GSM362181 GSM362188 GSM362189 GSM362192 > 244901 5.094871713 4.626623079 4.554272515 4.748604391 4.759221647 > 244902 5.194528083 4.985930299 4.817426064 5.151654407 4.838741605 > 244903 5.412329253 5.352970877 5.06250609 5.305709079 8.365082403 > 244904 5.529220594 5.28134657 5.467445095 5.62968933 5.458388909 > 244905 5.024052699 4.714631878 4.792865831 4.843975286 4.657188246 > 244906 5.786557533 5.242403911 5.060605782 5.458148567 5.890061836 > > where the different columns correspond to four different types of > promoters and each of the four promoters has a biological replicate so > totally there are 8 columns.There are totally 22810 genes and I would like > to get a list of the genes which are differentially expressed > > I tried using the Limma package to find the differentially expressed genes > across several promoters ( with replicates). > This is the code that I used: > > Group <- factor(c("p1", "p1", "p2", "p2", "p3","p3","p3","p4","p4"), > levels = > c("GSM362180","GSM362181","GSM362188","GSM362189","GSM362192","GSM36 2193","GSM362194","GSM362197","GSM362198")) > > design <- model.matrix(~0 + Group) > > colnames(design) <- > c("GSM362180","GSM362181","GSM362188","GSM362189","GSM362192","GSM36 2193","GSM362194","GSM362197") > fit <- lmFit(modified, design) > > where modified is the rma normalized data matrix as inputted in the above > format. > I get the following error: > > Coefficients not estimable: GSM362180 GSM362181 GSM362188 GSM362189 > GSM362192 GSM362193 GSM362194 GSM362197 GSM362198 > Error in lm.fit(design, t(M)) : 0 (non-NA) cases > > > I managed to get help from the mailing list prior to this and was able to > correct it in the following way. > > > -- output of sessionInfo(): > > Group <- factor(c("p1", "p1", "p2", "p2", "p3","p3","p3","p4","p4")) > design <- model.matrix(~0+Group) > colnames(design) <- gsub("Group","", colnames(design)) > > For creating the contrast matrix I proceeded as : > fit<-lmFit(modified,design) > fit<-ebayes(fit) > fit<-lmFit(modified,design) > > contrast.matrix<-makeContrasts(p1-p2,p1-p3,p1-p4,p2-p3,p2-p4,p3-p4,l evels=design) > fit2<-contrasts.fit(fit,contrast.matrix) > fit2<-eBayes(fit2) > topTable(fit2,coef=1,adjust="fdr") > logFC AveExpr t P.Value adj.P.Val B > 14865 -3.063442 11.939646 -20.85957 5.020817e-09 8.235097e-05 10.906936 > 15107 -3.316203 13.136888 -19.79194 8.041764e-09 8.235097e-05 10.543106 > 12037 2.806403 10.772050 19.10380 1.103823e-08 8.235097e-05 10.292274 > 15931 -3.469330 10.325303 -18.53793 1.444120e-08 8.235097e-05 10.075671 > 18327 3.198993 9.633795 17.57118 2.328424e-08 8.331092e-05 9.682365 > 7521 -2.419999 7.373064 -17.16080 2.873576e-08 8.331092e-05 9.505924 > 16564 3.268568 8.365454 17.09028 2.980775e-08 8.331092e-05 9.475007 > 3832 -2.685268 7.540418 -16.89167 3.307237e-08 8.331092e-05 9.386966 > 10364 2.466369 6.779762 16.71021 3.640344e-08 8.331092e-05 9.305265 > 4967 -2.453614 11.409188 -16.62282 3.813877e-08 8.331092e-05 9.265479 > o<-order(fit2$F.p.value) > fit2$genes[o[1:30],] > > > After the above step I get as NULL. I do not know where am making the > mistake. > > > clas <- decideTests(fit2, method = "nestedF", > + adjust.method = "fdr", p = 0.05) > > > I get the following output which I know is quite wrong : > Contrasts > p1 - p2 p1 - p3 p1 - p4 p2 - p3 p2 - p4 p3 - p4 > [1,] 0 0 0 0 0 0 > [2,] 0 0 0 0 0 0 > [3,] 0 0 0 0 0 0 > [4,] 0 0 0 0 0 0 > [5,] 0 0 0 0 0 0 > [6,] 0 0 0 0 0 0 > Hi, Priya. Nice job moving forward with your analysis. The output above looks fine to me. If you read the help for decideTests, you will notice that the output is 0/1 where a 1 signifies that a given probe was "significant" for a given contrast. What makes you think your results are wrong? Sean [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 346 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6