Question

LogFC query in Limma

Roopa Subbaiaih ▴ 130

@roopa-subbaiaih-5490

Last seen 10.1 years ago

United States

Hi James,

Am I still making any silly mistake over here. I am comparing normal (21) with lesionous patient samples (34). This is what I get in R console. Is there a problem with the script? Any advice would greatly help for further analysis.

Thanks,
Roopa

> getwd()
[1] "C:/Documents and Settings/rsubbaiaih/My Documents"
> setwd(dir="/CRSP 406-11/DEMOS/GSE14905-a")
> ls()
character(0)
> #-----------------------------------------------#
> library(affy)
> eset = justRMA()
Loading required package: AnnotationDbi
> eset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 54675 features, 55 samples
  element names: exprs, se.exprs
protocolData
  sampleNames: GSM372286.CEL GSM372287.CEL ... GSM372367.CEL (55 total)
  varLabels: ScanDate
  varMetadata: labelDescription
phenoData
  sampleNames: GSM372286.CEL GSM372287.CEL ... GSM372367.CEL (55 total)
  varLabels: sample
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hgu133plus2
> f <- factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
+ 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2), labels=c("Healthy", "affected"))
> design <- model.matrix(~ 0 + f)
> design
   fHealthy faffected
1         1         0
2         1         0
3         1         0
4         1         0
5         1         0
6         1         0
7         1         0
8         1         0
9         1         0
10        1         0
11        1         0
12        1         0
13        1         0
14        1         0
15        1         0
16        1         0
17        1         0
18        1         0
19        1         0
20        1         0
21        1         0
22        0         1
23        0         1
24        0         1
25        0         1
26        0         1
27        0         1
28        0         1
29        0         1
30        0         1
31        0         1
32        0         1
33        0         1
34        0         1
35        0         1
36        0         1
37        0         1
38        0         1
39        0         1
40        0         1
41        0         1
42        0         1
43        0         1
44        0         1
45        0         1
46        0         1
47        0         1
48        0         1
49        0         1
50        0         1
51        0         1
52        0         1
53        0         1
54        0         1
55        0         1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$f
[1] "contr.treatment"
> colnames(design) <-c("Healthy","affected")
> library(limma)
> fit <- lmFit(eset, design)
> fit
An object of class "MArrayLM"
$coefficients
            Healthy affected
1007_s_at 10.081309 9.548286
1053_at    6.807501 7.482849
117_at     5.969921 6.147594
121_at     7.403842 7.733666
1255_g_at  2.804475 2.827041
54670 more rows ...

$rank
[1] 2

$assign
[1] 1 1

$qr
$qr
     Healthy  affected
1 -4.5825757  0.000000
2  0.2182179 -5.830952
3  0.2182179  0.000000
4  0.2182179  0.000000
5  0.2182179  0.000000
50 more rows ...

$qraux
[1] 1.218218 1.000000

$pivot
[1] 1 2

$tol
[1] 1e-07

$rank
[1] 2

$df.residual
[1] 53 53 53 53 53
54670 more elements ...

$sigma
1007_s_at   1053_at    117_at    121_at 1255_g_at
0.2866489 0.4801618 0.3499880 0.2332555 0.1053397
54670 more elements ...

$cov.coefficients
            Healthy   affected
Healthy  0.04761905 0.00000000
affected 0.00000000 0.02941176

$stdev.unscaled
            Healthy  affected
1007_s_at 0.2182179 0.1714986
1053_at   0.2182179 0.1714986
117_at    0.2182179 0.1714986
121_at    0.2182179 0.1714986
1255_g_at 0.2182179 0.1714986
54670 more rows ...

$pivot
[1] 1 2

$genes
[1] "1007_s_at" "1053_at"   "117_at"    "121_at"
[5] "1255_g_at"
54670 more rows ...

$Amean
1007_s_at   1053_at    117_at    121_at 1255_g_at
 9.751804  7.224988  6.079756  7.607733  2.818425
54670 more elements ...

$method
[1] "ls"

$design
  Healthy affected
1       1        0
2       1        0
3       1        0
4       1        0
5       1        0
50 more rows ...

> contrast.matrix <-makeContrasts(affected-Healthy,levels = design)
> fit2 <- contrasts.fit(fit, contrast.matrix)
> fit2 <- eBayes(fit2)
> results <- decideTests(fit2, adjust="fdr",lfc=1)
> summary(results)
   affected - Healthy
-1               1187
0               52504
1                 984
> results
TestResults matrix
1007_s_at   1053_at    117_at    121_at 1255_g_at
        0         0         0         0         0
54670 more rows ...
>

On Thu, Jan 31, 2013 at 4:48 PM, James W. MacDonald <jmacdon@uw.edu> wrote:

> If you just use the expression values from the original authors, I get
> just under 9K probesets for this comparison at an FDR of 0.05 and no fold
> change criterion. It drops to just under 900 with a 2-fold difference added
> in.
>
> So yeah, seems like a lot to me as well.
>
> Best,
>
> Jim
>
> On 1/31/2013 4:02 PM, Steve Lianoglou wrote:
>
>> ... what Jim said.
>>
>> But also, this 20k differentially expressed (likely probe sets, not
>> genes) is raising a red flag for me, no? Am I alone here?
>>
>> That's .. what's the word I'm looking for ... "a lot".
>>
>> -steve
>>
>> On Thu, Jan 31, 2013 at 3:56 PM, James W. MacDonald <jmacdon@uw.edu> wrote:
>>
>>> Hi Roopa,
>>>
>>> On 1/31/2013 3:45 PM, Roopa Subbaiaih wrote:
>>>
>>>> Hi Steve,
>>>>
>>>> This was the script I used-
>>>> getwd()
>>>> setwd(dir="/CRSP 406-11/DEMOS/GSE14905-a")
>>>> ls()
>>>> #-----------------------------------------------#
>>>> library(affy)
>>>> eset = justRMA()
>>>> f<- factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
>>>> 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
>>>> labels=c("Healthy", "unaffected"))
>>>> design<- model.matrix(~ 0 + f)
>>>> design
>>>> colnames(design)<-c("Healthy","unaffected")
>>>> design
>>>> library(limma)
>>>> fit<- lmFit(eset, design)
>>>> library(hgu133plus2.db)
>>>> fit$genes$Symbol<- getSYMBOL(fit$genes$ID,"hgu133plus2.db")
>>>> contrast.matrix<-makeContrasts(affected-Healthy,levels = design)
>>>> fit2<- contrasts.fit(fit, contrast.matrix)
>>>> fit2<- eBayes(fit2)
>>>> topTable(fit2,coef=1,p=0.05, adjust="fdr")
>>>> results<- decideTests(fit2, adjust="fdr", p=0.05)
>>>> summary(results)
>>>> write.table(results,file="myresults.txt")
>>>> write.fit().
>>>>
>>>> I had identified ~54,000 genes of which ~ 20K were differentially
>>>> expressed.
>>>>
>>>> But when I use these genes for pathway analysis the software asks for
>>>> fold change values but not p value so it is easier to analyze the data.
>>>>
>>>> What I did was - I used the differentially expressed gene table for
>>>> further analysis. That is I converted logFC values to FC(test/control)
>>>> assuming that
>>>>
>>>> FC= FCmean(test)-FCmean(blank) and LogFC is log2 of FC values.
>>>>
>>>> Once I got test/control values I converted them to fold changes using
>>>> "IF" function in excel sheet to eliminate genes with fold changes
>>>> between -2 to +2.
>>>>
>>>> Once I did this the number of significant genes drastically reduced to
>>>> ~ 2,000.
>>>>
>>>> Is this the right method?
>>>
>>> No. Note that the range of fold changes after 'unlogging' will be 0-INF,
>>> and the down-regulated genes will be in the range 0-1 whereas the
>>> upregulated genes will be in the range 1-INF. (e.g. two fold up will be 2,
>>> whereas 2 fold down will be 1/2 or 0.5).
>>>
>>> The easiest way to filter is to keep the logFC and filter on -1 and 1. Or
>>> you can use the lfc argument to decideTests().
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>> Please advice, thanks, Roopa
>>>>
>>>> On Thu, Jan 31, 2013 at 3:23 PM, Steve Lianoglou <mailinglist.honeypot@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Thu, Jan 31, 2013 at 2:54 PM, Roopa Subbaiaih <rss115@case.edu> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Thanks for the reply, I could pull out the the whole information for
>>>>>> differentially expressed genes. The criteria used was adjust="fdr",
>>>>>> p=0.05.
>>>>>>
>>>>>> I came up with ~ 20,000 genes to be differentially expressed.
>>>>>
>>>>> Hmm ... surely 20k cannot be correct?
>>>>>
>>>>>> Since I wanted to analyze these genes for deregulated pathways I had to
>>>>>> come up with fold change values for further analysis.
>>>>>>
>>>>>> I assume that for each gene FC= FCmean(test)-FCmean(blank). LogFC is
>>>>>> log2 of FC values.
>>>>>>
>>>>>> When I convert the FC values (test/blank) to foldchanges using IF
>>>>>> function I get lesser number of genes to be deregulated. The criteria
>>>>>> was =>2 foldchanges and =<-2 fold changes.
>>>>>
>>>>> I'm missing previous context to this email, so -- not sure what the
>>>>> "IF function" is, but if you're using limma, the log2fold changes are
>>>>> reported for you in the logFC column that is returned from
>>>>> `topTable(...)`
>>>>>
>>>>> -steve
>>>>>
>>>>> --
>>>>> Steve Lianoglou
>>>>> Graduate Student: Computational Systems Biology
>>>>> | Memorial Sloan-Kettering Cancer Center
>>>>> | Weill Medical College of Cornell University
>>>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099

--
---------------------------------------
Roopa Shree Subbaiaih
Post Doctoral Fellow
Department of Dermatology
School of Medicine
Case Western Reserve University
Cleveland, OH-44106
Tel:+1 216 368 0211

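Jim's point in the quoted thread about "unlogging" can be illustrated with a minimal base R sketch. The probe names and logFC values below are made up for illustration and are not taken from GSE14905, and the `signedFC` column is only one assumed convention for the signed fold change that some pathway tools ask for.

```r
# Hypothetical log2 fold changes (affected minus Healthy); not real GSE14905 values.
tab <- data.frame(
  probe = c("p1", "p2", "p3", "p4"),
  logFC = c(1, -1, 0.5, -2)
)

# Unlogged ratio: up-regulated probes are > 1, down-regulated probes fall between 0 and 1.
tab$ratio <- 2^tab$logFC

# Signed fold change (assumed convention): -2 means two-fold down, +2 means two-fold up.
tab$signedFC <- ifelse(tab$logFC >= 0, tab$ratio, -1 / tab$ratio)

# Filter on the log scale: keep probes with at least a two-fold change in either direction.
kept <- tab[abs(tab$logFC) >= 1, ]
print(kept)
```

Filtering on `abs(logFC) >= 1` avoids the asymmetry of the unlogged ratio, where a two-fold decrease shows up as 0.5 rather than -2.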
Pathways Cancer probe limma convert • 1.3k views

updated 11.9 years ago by Gordon Smyth 52k • written 11.9 years ago by Roopa Subbaiaih ▴ 130

Gordon Smyth · Answer 1 · 2013-02-03

Gordon Smyth 52k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

Dear Roopa,

Perhaps you have just misread the limma output. This thread has been mainly concerned with your statement that you found ~20,000 genes to be differentially expressed:

LogFC query in Limma

But the limma output shows 984 up-regulated and 1187 down-regulated genes, in other words about 2k total rather than 20k.

Best wishes
Gordon
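
For readers who want to reproduce this kind of tally and then export logFC values for pathway analysis, here is a minimal sketch on simulated data (not the GSE14905 arrays). It assumes limma is installed; the matrix `y`, the group sizes, the spiked-in effect, and the `pathway.input` name are all hypothetical.

```r
library(limma)

set.seed(1)
# Simulated log2 expression: 200 hypothetical probes, 5 Healthy and 5 affected arrays.
group <- factor(rep(c("Healthy", "affected"), each = 5),
                levels = c("Healthy", "affected"))
y <- matrix(rnorm(200 * 10, mean = 7, sd = 0.3), nrow = 200,
            dimnames = list(paste0("probe", 1:200), NULL))
# Spike in up-regulation for the first 10 probes in the affected group.
y[1:10, group == "affected"] <- y[1:10, group == "affected"] + 2

design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

fit <- lmFit(y, design)
contrast.matrix <- makeContrasts(affected - Healthy, levels = design)
fit2 <- eBayes(contrasts.fit(fit, contrast.matrix))

# Tally of down / not significant / up calls at FDR 0.05 with |logFC| >= 1.
results <- decideTests(fit2, adjust.method = "BH", p.value = 0.05, lfc = 1)
print(summary(results))

# Full table for every probe, including the logFC and adjusted p-value columns.
tt <- topTable(fit2, coef = 1, number = Inf, adjust.method = "BH", sort.by = "none")
keep <- tt$adj.P.Val < 0.05 & abs(tt$logFC) >= 1
pathway.input <- data.frame(probe = rownames(tt)[keep], logFC = tt$logFC[keep])
```

The `logFC` column is already on the log2 scale, so it can be passed on as is or converted with `2^logFC` if a tool wants ratios.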

written 11.9 years ago • updated 10.1 years ago by Gordon Smyth 52k

Dear Gordon,

Thanks, I could complete the analysis. Roopa

updated 10.1 years ago by Gordon Smyth 52k • written 11.9 years ago by Roopa Subbaiaih ▴ 130