Entering edit mode
Hi James,
Am I still making any silly mistake over here. I am comparing normal
(21)
with lesionous patient samples (34). This is what I get in R console.
Is
there a problem with the script?
Any advice would greatly help for further analysis.Thanks, Roopa
> getwd()
[1] "C:/Documents and Settings/rsubbaiaih/My Documents"
> setwd(dir="/CRSP 406-11/DEMOS/GSE14905-a")
> ls()
character(0)
> #-----------------------------------------------#
> library(affy)
> eset = justRMA()
Loading required package: AnnotationDbi
> eset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 54675 features, 55 samples
element names: exprs, se.exprs
protocolData
sampleNames: GSM372286.CEL GSM372287.CEL ...
GSM372367.CEL (55 total)
varLabels: ScanDate
varMetadata: labelDescription
phenoData
sampleNames: GSM372286.CEL GSM372287.CEL ...
GSM372367.CEL (55 total)
varLabels: sample
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hgu133plus2
> f <- factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
+
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
labels=c("Healthy", "affected"))
> design <- model.matrix(~ 0 + f)
> design
fHealthy faffected
1 1 0
2 1 0
3 1 0
4 1 0
5 1 0
6 1 0
7 1 0
8 1 0
9 1 0
10 1 0
11 1 0
12 1 0
13 1 0
14 1 0
15 1 0
16 1 0
17 1 0
18 1 0
19 1 0
20 1 0
21 1 0
22 0 1
23 0 1
24 0 1
25 0 1
26 0 1
27 0 1
28 0 1
29 0 1
30 0 1
31 0 1
32 0 1
33 0 1
34 0 1
35 0 1
36 0 1
37 0 1
38 0 1
39 0 1
40 0 1
41 0 1
42 0 1
43 0 1
44 0 1
45 0 1
46 0 1
47 0 1
48 0 1
49 0 1
50 0 1
51 0 1
52 0 1
53 0 1
54 0 1
55 0 1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$f
[1] "contr.treatment"
> colnames(design) <-c("Healthy","affected")
> library(limma)
> fit <- lmFit(eset, design)
> fit
An object of class "MArrayLM"
$coefficients
Healthy affected
1007_s_at 10.081309 9.548286
1053_at 6.807501 7.482849
117_at 5.969921 6.147594
121_at 7.403842 7.733666
1255_g_at 2.804475 2.827041
54670 more rows ...
$rank
[1] 2
$assign
[1] 1 1
$qr
$qr
Healthy affected
1 -4.5825757 0.000000
2 0.2182179 -5.830952
3 0.2182179 0.000000
4 0.2182179 0.000000
5 0.2182179 0.000000
50 more rows ...
$qraux
[1] 1.218218 1.000000
$pivot
[1] 1 2
$tol
[1] 1e-07
$rank
[1] 2
$df.residual
[1] 53 53 53 53 53
54670 more elements ...
$sigma
1007_s_at 1053_at 117_at 121_at 1255_g_at
0.2866489 0.4801618 0.3499880 0.2332555 0.1053397
54670 more elements ...
$cov.coefficients
Healthy affected
Healthy 0.04761905 0.00000000
affected 0.00000000 0.02941176
$stdev.unscaled
Healthy affected
1007_s_at 0.2182179 0.1714986
1053_at 0.2182179 0.1714986
117_at 0.2182179 0.1714986
121_at 0.2182179 0.1714986
1255_g_at 0.2182179 0.1714986
54670 more rows ...
$pivot
[1] 1 2
$genes
[1] "1007_s_at" "1053_at" "117_at" "121_at"
[5] "1255_g_at"
54670 more rows ...
$Amean
1007_s_at 1053_at 117_at 121_at 1255_g_at
9.751804 7.224988 6.079756 7.607733 2.818425
54670 more elements ...
$method
[1] "ls"
$design
Healthy affected
1 1 0
2 1 0
3 1 0
4 1 0
5 1 0
50 more rows ...
> contrast.matrix <-makeContrasts(affected-Healthy,levels = design)
> fit2 <- contrasts.fit(fit, contrast.matrix)
> fit2 <- eBayes(fit2)
> results <- decideTests(fit2, adjust="fdr",lfc=1)
> summary(results)
affected - Healthy
-1 1187
0 52504
1 984
> results
TestResults matrix
1007_s_at 1053_at 117_at 121_at 1255_g_at
0 0 0 0 0
54670 more rows ...
>
On Thu, Jan 31, 2013 at 4:48 PM, James W. MacDonald <jmacdon@uw.edu>
wrote:
> If you just use the expression values from the original authors, I
get
> just under 9K probesets for this comparison at an FDR of 0.05 and no
fold
> change criterion. It drops to just under 900 with a 2-fold
difference added
> in.
>
> So yeah, seems like a lot to me as well.
>
> Best,
>
> Jim
>
>
> On 1/31/2013 4:02 PM, Steve Lianoglou wrote:
>
>> ... what Jim said.
>>
>> But also, this 20k differentially expressed (likely probe sets, not
>> genes) is raising a red flag for me, no? Am I alone here?
>>
>> That's .. what's the word I'm looking for ... "a lot".
>>
>> -steve
>>
>> On Thu, Jan 31, 2013 at 3:56 PM, James W. MacDonald<jmacdon@uw.edu>
>> wrote:
>>
>>> Hi Roopa,
>>>
>>>
>>> On 1/31/2013 3:45 PM, Roopa Subbaiaih wrote:
>>>
>>>> Hi Steve,
>>>>
>>>> This was the script I used-
>>>> getwd()
>>>> setwd(dir="/CRSP 406-11/DEMOS/GSE14905-a")
>>>> ls()
>>>> #-----------------------------**------------------#
>>>> library(affy)
>>>> eset = justRMA()
>>>> f<- factor(c(1,1,1,1,1,1,1,1,1,1,**1,1,1,1,1,1,1,1,1,1,1,
>>>>
>>>> 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,**2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,**
>>>> 2,2,2,2),
>>>> labels=c("Healthy", "unaffected"))
>>>> design<- model.matrix(~ 0 + f)
>>>> design
>>>> colnames(design)<-c("Healthy",**"unaffected")
>>>> design
>>>> library(limma)
>>>> fit<- lmFit(eset, design)
>>>> library(hgu133plus2.db)
>>>> fit$genes$Symbol<- getSYMBOL(fit$genes$ID,"**hgu133plus2.db")
>>>> contrast.matrix<-**makeContrasts(affected-**Healthy,levels =
design)
>>>> fit2<- contrasts.fit(fit, contrast.matrix)
>>>> fit2<- eBayes(fit2)
>>>> topTable(fit2,coef=1,p=0.05, adjust="fdr")
>>>> results<- decideTests(fit2, adjust="fdr", p=0.05)
>>>> summary(results)
>>>> write.table(results,file="**myresults.txt")
>>>> write.fit().
>>>>
>>>> I had identified ~54,000 genes of which ~ 20K were differentially
>>>> expressed.
>>>>
>>>> But when I use these genes for pathway analysis the software asks
for
>>>> fold
>>>> change values but not p value so it is easier to analyze the
data.
>>>>
>>>> What I did was - I used the differentially expressed gene table
for
>>>> further
>>>> analysis. That is I converted logFC values to FC(test/control)
assuming
>>>> that
>>>>
>>>> FC= FCmean(test)-FCmean(blank) and LogFC is log2 of FC values.
>>>>
>>>> Once I got test/control values I converted them to fold changes
using
>>>> "IF"
>>>> function in excel sheet to eliminate genes with fold changes
between -2
>>>> to
>>>> +2.
>>>>
>>>> Once I did this the number of significant genes drastically
reduced to ~
>>>> 2,000.
>>>>
>>>> Is this the right method?
>>>>
>>>
>>> No. Note that the range of fold changes after 'unlogging' will be
0-INF,
>>> and
>>> the down-regulated genes will be in the range 0-1 whereas the
upregulated
>>> genes will be in the range 1-INF. (e.g. two fold up will be 2,
whereas 2
>>> fold down will be 1/2 or 0.5).
>>>
>>> The easiest way to filter is to keep the logFC and filter on -1
and 1. Or
>>> you can use the lfc argument to decideTests().
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>> Please advice, thanks, Roopa
>>>>
>>>> On Thu, Jan 31, 2013 at 3:23 PM, Steve Lianoglou<
>>>> mailinglist.honeypot@gmail.com**> wrote:
>>>>
>>>> Hi,
>>>>>
>>>>> On Thu, Jan 31, 2013 at 2:54 PM, Roopa
Subbaiaih<rss115@case.edu>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Thanks for the reply, I could pull out the the whole
information for
>>>>>> differentially expressed genes. The criteria used was
adjust="fdr",
>>>>>>
>>>>> p=0.05.
>>>>>
>>>>>> I came up with ~ 20,000 genes to be differentially expressed.
>>>>>>
>>>>> Hmm ... surely 20k cannot be correct?
>>>>>
>>>>> Since I wanted to analyze these genes for deregulated pathways I
had to
>>>>>> come up with fold change values for further analysis.
>>>>>>
>>>>>> I assume that for each gene FC= FCmean(test)-FCmean(blank).
LogFC is
>>>>>> log2
>>>>>> of FC values.
>>>>>>
>>>>>> When I convert the FC values (test/blank) to foldchanges using
IF
>>>>>>
>>>>> function
>>>>>
>>>>>> I get lesser number of genes to be deregulated. The criteria
was =>2
>>>>>> foldchanges and =<-2 fold changes.
>>>>>>
>>>>> I'm missing previous context to this email, so -- not sure what
the
>>>>> "IF function" is, but if you're using limma, the log2fold
changes are
>>>>> reported for you in the logFC column that is returned from
>>>>> `topTable(...)`
>>>>>
>>>>> -steve
>>>>>
>>>>> --
>>>>> Steve Lianoglou
>>>>> Graduate Student: Computational Systems Biology
>>>>> | Memorial Sloan-Kettering Cancer Center
>>>>> | Weill Medical College of Cornell University
>>>>> Contact Info: http://cbio.mskcc.org/~lianos/**contact<http: cbi="" o.mskcc.org="" ~lianos="" contact="">
>>>>>
>>>>>
>>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>>
>>>
>>
>>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
--
---------------------------------------
Roopa Shree Subbaiaih
Post Doctoral Fellow
Department of Dermatology
School of Medicine
Case Western Reserve University
Cleveland, OH-44106
Tel:+1 216 368 0211
[[alternative HTML version deleted]]
Dear Gordon,
Thanks, I could complete the analysis. Roopa