Question: Adding Batch Factor in linear model...

0

Md.Mamunur Rashid •

**260**wrote:
-------- Original Message --------
Subject: Re: [BioC] Fitting arrayweights in linear model
Date: Tue, 3 Nov 2009 14:23:52 +0000
From: Md.Mamunur Rashid <mamunur.rashid@kcl.ac.uk>
To: Wei Shi <shi@wehi.edu.au>, bioconductor
<bioconductor@stat.math.ethz.ch>
Dear Wei,
Sorry for replying late. I was on vacation and went back to home.
thanks for your suggestion.
I have tried reading some previous mails from bioc forum to find a way
to add batch factor
in the linear model. But did not clearly understand how to do that.
In my linear model I am comparing three conditions. So I have created
a contrast matrix:
fit1<- lmfit(normalized_matrix, design, array_weights)
contrast<-
make.contrast(group2-gropu1,group3-group2,group3-group1,levels=
design)
confit<- contrasts.fit(fit1 ,contrast)
efit<- eBayes()
I have tried the same R script with a new batch of 96 samples from
the same 3 groups.
These 96 sample does not really have that much variation among them .
Yet again the top
list of genes have a very high adjusted p.value.
Q1. Does this mean that there is not not enough evidence of
differential expression. ??
Q2. Is it a very good idea to analyze 96 samples in one go, or should
I subset the file( ExpressionSet)
to work with smaller set of data. Cause I have got this
impression that for this large data
set variations in one of the slides (illumina HT12 containing 12
arrays) do a lot of damage
when the whole data set is normalized. ...?????
(I have done this part to check if my algorithm is ok or not ..)
I have tested the R script for contrasting among male and female
group. and found the
evidence of differential expression among the genes found in Y
chromosome.
This finding gave me a idea that my R script is ok (anyway this is
very straight forward)
****Please any suggestion about this assumption.
contrast<- make.contrast(male-female, levels= design)
Can you please suggest me few more gene filtering methods. I will
start with the gene filter package.
Thanks in advance.
regards,
Mamun
On 09/30/2009 05:26 AM, Wei Shi wrote:
> Dear Mamun:
>
> Sorry for replying to you late. Your boxplot of raw data shows
> that there is a batch effect in your data. You might need to include
> this batch effect in your linear model fitting.
>
> Arrays with low weights will not be dropped when you include
array
> weights in your linear model fitting. The idea of giving weights to
> arrays is to keep all the arrays in the analysis.
>
> Hope this helps.
>
> Cheers,
> Wei
>
>
> Md.Mamunur Rashid wrote:
>> Dear Wei,
>>
>> Thanks for your reply. I know it's a long email, but I tried to
give details about the raw data
>> Please have a look when you have some spare time.
>>
>>
>> 1. About the raw data :
>> -----------------------
>> Yes this arrays are from the same bead chip. The experiments were
done by three individuals (each
>> person conducted experiments with 3 beadchips each consisting 12
arrays, 96 intotal). I think this
>> is the reason for high variance among 3 sets of arrays. 1-32,
33-64, 65-96. But after normalization
>> ("quantile"-lumi package) the data looks quite normal.
>>
>>
>> I have attached two snapshot of boxplot of the raw data and
normalized data.
>>
>> A. Normalized data -http://www.esnips.com/doc/ad4c8a55-2166-49
11-8510-df81eaea63df/data_96_illumina_norm
>>
>> B. Raw data -http://www.esnips.com/doc/0a965715-b5ec-44da-
b64c-4d025c63827b/96_illumina_raw
>>
>> If you look at the two snapshots above : Expression range for raw
data in most cases are below 6. But
>> after normalization it came slightly above 6. Is it possible at all
or something is going wrong during
>> the normalization.
>>
>>
>> One question here. Does illumina data require intra-array
normalization.?? To the extent of my little
>> study I know that they do not need it as it is already done. Please
let me know what do you think.
>>
>>
>> 2. About Arrayweight
>> --------------------
>>
>> As you can see below the the results have not been improved much
compare to the old topTable result.
>>
>> 1. Do you think this high variance in arrays are causing this
problem.???
>> 2. Does fitting the arrayweight in the linear model along the
normalized object drops the downweighted arrays
>> or that needed to be done manually??
>>
>>
>> 3. Pre-processing with beadarray
>> --------------------------------
>>
>> I have the done the same analysis using beadArray
package(data import and pre processing) and had the similar
>> result. So there can be two possibilities which can lead to
this problem.
>>
>> A. Problems in the raw data.
>> B. Or I am doing a wrong linear modeling of the data.(as that
is the common part in the both analysis)
>>
>> I have attached my code fragment here. please have a look:
>>
>>
>> *** What are the other bio packages/tests I can use for
identifying differentially expressed genes>?
>>
>> Thanks in advance
>>
>> best regards,
>> Md.Mamunur Rashid
>>
>>
>> Code Fragment
>> -----------------------------------
>>
>> About fitting the weights : I have done the following
>>
>> data_96_nuId_E holds the normalized object
>>
>>
>>
>> data_96_sampleType<- c("I","I","I","I","I","S","S","S","I","C","S",
"S","C","I","C","I","I","I","I","S","I","S","I","S","C","C","S","I","C
","S","I","S","S","C","C","S","S","I","I","I","S","I","S","I","I","C",
"I","C","S","C","I","I","S","C","S","S","S","C","C","I","S","S","C","I
","I","C","C","S","S","S","C","C","S","I","C","C","C","S","C","C","I",
"I","C","S","C","C","C","S","C","S","C","I","I","S","I","C")
>> data_96_design<- model.matrix(~0+data_96_sampleType)
>> colnames(data_96_design)<- c('C','I','S')
>>
>> ## filtring out the non-expressed genes
>> presentCount<- detectionCall(data_96_raw_nuID)
>> data_96_Matrix<- data_96_Matrix_old[presentCount> 0, ]
>> data_96_probeList<- rownames(data_96_Matrix)
>>
>>
>> arraw_E_96<- arrayWeights(data_96_nuID_E,data_96_design)
>>
>> data_96_contrast<- makeContrasts
(C-I,I-S,S-C,levels=data_96_design)
>>
>> fitw_nu_96<-
lmFit(data_96_Matrix,data_96_design,weights=arraw_E_96)
>> fitw_nu_con_96<- contrasts.fit(fitw_nu_96,data_96_contrast)
>> fitw_e_nu_96<- eBayes(fitw_nu_con_96)
>> topTable(fitw_e_nu_96,coef=1,adjust="BH")
>>
>>
>> ID logFC AveExpr t
P.Value adj.P.Val
>> 3564 oEptU7gl5ULB7ghAf4 -0.11681240 6.341389 -4.749578
7.204301e-06 0.1484950
>> 3900 cUSJ2.kBKqPAk0eZPY 0.31230030 6.881320 4.561405
1.514923e-05 0.1561280
>> 10705 uHk1A0FNxKMV0k7a50 -0.13640964 6.616042 -4.197525
6.070562e-05 0.2895144
>> 14598 WokXUp_QIC4pd_s1.U -0.09998362 6.342947 -4.111694
8.338387e-05 0.2895144
>> 4324 xXLgIpJ_AeI7gRNJ_U 0.11230531 6.478843 4.072272
9.634276e-05 0.2895144
>> 6399 HHteoiUdSJz9wh.e64 0.16605833 7.322017 4.034305
1.106350e-04 0.2895144
>> 13446 x65aXRiHVTIIUQiv00 -0.23601528 6.487643 -3.986742
1.314198e-04 0.2895144
>> 13445 HrlpdGIdVMghRCK.TU -0.20278431 6.470915 -3.986291
1.316337e-04 0.2895144
>> 16912 BgFUhUUgIQu3JDl_sU 0.11902363 6.815447 3.978352
1.354541e-04 0.2895144
>> 9813 NAv6twbtRAh7QuCDro 0.09999394 6.511654 3.958318
1.455733e-04 0.2895144
>> B
>> 3564 3.5478599
>> 3900 2.8790555
>> 10705 1.6341835
>> 14598 1.3504633
>> 4324 1.2214803
>> 6399 1.0980586
>> 13446 0.9445695
>> 13445 0.9431198
>> 16912 0.9176269
>> 9813 0.8534456
>>
>>
>>
>>
>>
>> About fitting the arrayweights in the linear model :
>> previously I have produced arrayweights
>>
>> On 09/21/2009 12:59 AM, Wei Shi wrote:
>>
>>> Dear Mamun:
>>>
>>> Sorry for replying to you late. I am just back from the
holiday.
>>>
>>> Some of your arrays have pretty low weights. Arrays 37-42
have
>>> weights of around 0.2. Are these arrays from the same bead chip?
You
>>> can try put array weights in the linear model fitting to see if
you
>>> can get DE genes.
>>>
>>> Cheers,
>>> Wei
>>>
>>> Md.Mamunur Rashid wrote:
>>>
>>>> Dear Wei,
>>>>
>>>> Thanks again for your reply. I have measured the array-weights
using limma's arrayweights() method. There seem to be
>>>> a good amount of variation among the raw data. But I am not
really sure whether it is acceptable or not??.
>>>>
>>>> But the normalized data have an average value of 1.022609
>>>> with min = 0.5641711 max =1.417277.
>>>>
>>>> Also 2 more question :
>>>>
>>>> 1. can you explain a bit more about how can I filter out the
genes whose detection p-values
>>>> are larger than cut-off (e.g. 0.01) in all samples??
>>>> 2. how can I check batch effect and chip effect in the raw
data.??
>>>>
>>>> Thanks in advance. Any help from anyone also is appreciated.
>>>>
>>>> regards
>>>> Mamun
>>>>
>>>>
>>>> Below is the result from arrayWeigts() method.(Table 1 (weights
for raw data) Table 2(weights of normalized data))
>>>> *** Also attached a text file with the weights.
>>>>
>>>> ArrayWeight of Raw Data:
>>>>
>>>> arrayWeights(data_96_raw,data_96_design)
>>>>
>>>>
>>>> 1 2 3 4 5 6
7 8
>>>> 3.2861102 0.6091478 2.6927503 2.2778201 2.4391841 0.7802437
1.1378995 0.5714981
>>>> 9 10 11 12 13 14
15 16
>>>> 0.8091165 0.5260511 0.7565424 0.8354660 0.2929977 0.4950789
0.4512856 0.1283279
>>>> 17 18 19 20 21 22
23 24
>>>> 0.6713553 0.6275333 0.5441217 0.2711778 0.7278274 0.3323277
0.5786381 0.7375983
>>>> 25 26 27 28 29 30
31 32
>>>> 0.5793944 0.4018671 0.6658730 1.6805341 0.7210032 1.2802383
0.8902326 0.6715769
>>>> 33 34 35 36 37 38
39 40
>>>> 0.2959048 0.8854763 0.7587111 1.5067705 0.1766820 0.1639351
0.1739882 0.1833592
>>>> 41 42 43 44 45 46
47 48
>>>> 0.2443871 0.2188281 0.2945594 0.2607380 0.3419603 0.5932775
0.3396003 0.7229979
>>>> 49 50 51 52 53 54
55 56
>>>> 0.8274153 1.7944800 0.8306203 1.0738534 1.5052515 2.9798751
2.0836544 1.5010941
>>>> 57 58 59 60 61 62
63 64
>>>> 2.3827591 2.7575819 1.5967670 3.3397084 2.0576009 2.9660262
3.8417611 1.3469622
>>>> 65 66 67 68 69 70
71 72
>>>> 1.7334743 2.7855044 4.5417142 3.2551286 3.9634064 3.8054377
3.8020479 4.2732205
>>>> 73 74 75 76 77 78
79 80
>>>> 0.2659556 0.3440704 0.6404601 0.8697107 1.0736648 0.8835502
0.9912724 0.4334270
>>>> 81 82 83 84 85 86
87 88
>>>> 0.8801461 1.0309162 2.5352693 1.8366521 2.5501944 4.5388081
4.2603196 1.7665255
>>>> 89 90 91 92 93 94
95 96
>>>> 2.5511084 2.3867529 4.0237262 2.1647555 3.3101943 2.5863558
1.8618118 0.2397050
>>>>
>>>>
>>>>
>>>>
>>>> Array Weights of normalized data:
>>>>
>>>> 1 2 3 4 5 6
7 8
>>>> 0.8908085 1.1978748 1.1454450 1.0215452 1.3605699 0.8141813
1.4172765 1.2958270
>>>> 9 10 11 12 13 14
15 16
>>>> 1.0522608 1.2562065 1.3531495 1.1489293 1.0596229 1.1217643
0.9156484 0.6705476
>>>> 17 18 19 20 21 22
23 24
>>>> 0.9643412 1.2120423 1.0672521 0.9983735 0.8096782 0.9379575
1.0493924 0.7614746
>>>> 25 26 27 28 29 30
31 32
>>>> 1.1573255 1.2735353 1.3795986 0.9499952 1.3602425 1.2549726
1.1772558 1.4158351
>>>> 33 34 35 36 37 38
39 40
>>>> 1.3659316 1.0658774 1.3185647 0.9017821 0.7915704 0.6326567
1.0325512 0.6812818
>>>> 41 42 43 44 45 46
47 48
>>>> 1.0142990 1.1921695 1.1476346 0.8255798 1.2012711 1.1893762
0.9367947 1.1594407
>>>> 49 50 51 52 53 54
55 56
>>>> 1.1072266 1.0561352 0.8488650 1.1756689 1.0554286 0.9326934
1.2836555 0.9665945
>>>> 57 58 59 60 61 62
63 64
>>>> 1.3293081 1.3027377 1.3459009 1.2834152 0.9657114 1.0041629
0.8823282 0.6843356
>>>> 65 66 67 68 69 70
71 72
>>>> 0.8883121 0.9374396 0.9799943 0.9733364 1.2045700 1.0693506
0.7730626 0.9408430
>>>> 73 74 75 76 77 78
79 80
>>>> 0.6685933 1.0492289 0.9952222 0.9690452 1.0150076 1.0148188
0.6687192 0.5641711
>>>> 81 82 83 84 85 86
87 88
>>>> 0.7872238 0.8793660 0.8553373 0.9662603 0.6143880 1.0340769
0.9842437 0.6626941
>>>> 89 90 91 92 93 94
95 96
>>>> 1.0270788 0.8590296 1.0807271 0.8162435 1.0398548 0.8993595
1.2094240 0.5715857
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor@stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:http://news.gmane.org/gmane.science.biology.inf
ormatics.conductor
>>
[[alternative HTML version deleted]]