Adding Batch Factor in linear model...

0

Entering edit mode

Md.Mamunur Rashid ▴ 260

@mdmamunur-rashid-3595

Last seen 11.4 years ago

-------- Original Message -------- Subject: Re: [BioC] Fitting arrayweights in linear model Date: Tue, 3 Nov 2009 14:23:52 +0000 From: Md.Mamunur Rashid <mamunur.rashid@kcl.ac.uk> To: Wei Shi <shi@wehi.edu.au>, bioconductor <bioconductor@stat.math.ethz.ch> Dear Wei, Sorry for replying late. I was on vacation and went back to home. thanks for your suggestion. I have tried reading some previous mails from bioc forum to find a way to add batch factor in the linear model. But did not clearly understand how to do that. In my linear model I am comparing three conditions. So I have created a contrast matrix: fit1<- lmfit(normalized_matrix, design, array_weights) contrast<- make.contrast(group2-gropu1,group3-group2,group3-group1,levels= design) confit<- contrasts.fit(fit1 ,contrast) efit<- eBayes() I have tried the same R script with a new batch of 96 samples from the same 3 groups. These 96 sample does not really have that much variation among them . Yet again the top list of genes have a very high adjusted p.value. Q1. Does this mean that there is not not enough evidence of differential expression. ?? Q2. Is it a very good idea to analyze 96 samples in one go, or should I subset the file( ExpressionSet) to work with smaller set of data. Cause I have got this impression that for this large data set variations in one of the slides (illumina HT12 containing 12 arrays) do a lot of damage when the whole data set is normalized. ...????? (I have done this part to check if my algorithm is ok or not ..) I have tested the R script for contrasting among male and female group. and found the evidence of differential expression among the genes found in Y chromosome. This finding gave me a idea that my R script is ok (anyway this is very straight forward) ****Please any suggestion about this assumption. contrast<- make.contrast(male-female, levels= design) Can you please suggest me few more gene filtering methods. I will start with the gene filter package. Thanks in advance. regards, Mamun On 09/30/2009 05:26 AM, Wei Shi wrote: > Dear Mamun: > > Sorry for replying to you late. Your boxplot of raw data shows > that there is a batch effect in your data. You might need to include > this batch effect in your linear model fitting. > > Arrays with low weights will not be dropped when you include array > weights in your linear model fitting. The idea of giving weights to > arrays is to keep all the arrays in the analysis. > > Hope this helps. > > Cheers, > Wei > > > Md.Mamunur Rashid wrote: >> Dear Wei, >> >> Thanks for your reply. I know it's a long email, but I tried to give details about the raw data >> Please have a look when you have some spare time. >> >> >> 1. About the raw data : >> ----------------------- >> Yes this arrays are from the same bead chip. The experiments were done by three individuals (each >> person conducted experiments with 3 beadchips each consisting 12 arrays, 96 intotal). I think this >> is the reason for high variance among 3 sets of arrays. 1-32, 33-64, 65-96. But after normalization >> ("quantile"-lumi package) the data looks quite normal. >> >> >> I have attached two snapshot of boxplot of the raw data and normalized data. >> >> A. Normalized data -http://www.esnips.com/doc/ad4c8a55-2166-49 11-8510-df81eaea63df/data_96_illumina_norm >> >> B. Raw data -http://www.esnips.com/doc/0a965715-b5ec-44da- b64c-4d025c63827b/96_illumina_raw >> >> If you look at the two snapshots above : Expression range for raw data in most cases are below 6. But >> after normalization it came slightly above 6. Is it possible at all or something is going wrong during >> the normalization. >> >> >> One question here. Does illumina data require intra-array normalization.?? To the extent of my little >> study I know that they do not need it as it is already done. Please let me know what do you think. >> >> >> 2. About Arrayweight >> -------------------- >> >> As you can see below the the results have not been improved much compare to the old topTable result. >> >> 1. Do you think this high variance in arrays are causing this problem.??? >> 2. Does fitting the arrayweight in the linear model along the normalized object drops the downweighted arrays >> or that needed to be done manually?? >> >> >> 3. Pre-processing with beadarray >> -------------------------------- >> >> I have the done the same analysis using beadArray package(data import and pre processing) and had the similar >> result. So there can be two possibilities which can lead to this problem. >> >> A. Problems in the raw data. >> B. Or I am doing a wrong linear modeling of the data.(as that is the common part in the both analysis) >> >> I have attached my code fragment here. please have a look: >> >> >> *** What are the other bio packages/tests I can use for identifying differentially expressed genes>? >> >> Thanks in advance >> >> best regards, >> Md.Mamunur Rashid >> >> >> Code Fragment >> ----------------------------------- >> >> About fitting the weights : I have done the following >> >> data_96_nuId_E holds the normalized object >> >> >> >> data_96_sampleType<- c("I","I","I","I","I","S","S","S","I","C","S", "S","C","I","C","I","I","I","I","S","I","S","I","S","C","C","S","I","C ","S","I","S","S","C","C","S","S","I","I","I","S","I","S","I","I","C", "I","C","S","C","I","I","S","C","S","S","S","C","C","I","S","S","C","I ","I","C","C","S","S","S","C","C","S","I","C","C","C","S","C","C","I", "I","C","S","C","C","C","S","C","S","C","I","I","S","I","C") >> data_96_design<- model.matrix(~0+data_96_sampleType) >> colnames(data_96_design)<- c('C','I','S') >> >> ## filtring out the non-expressed genes >> presentCount<- detectionCall(data_96_raw_nuID) >> data_96_Matrix<- data_96_Matrix_old[presentCount> 0, ] >> data_96_probeList<- rownames(data_96_Matrix) >> >> >> arraw_E_96<- arrayWeights(data_96_nuID_E,data_96_design) >> >> data_96_contrast<- makeContrasts (C-I,I-S,S-C,levels=data_96_design) >> >> fitw_nu_96<- lmFit(data_96_Matrix,data_96_design,weights=arraw_E_96) >> fitw_nu_con_96<- contrasts.fit(fitw_nu_96,data_96_contrast) >> fitw_e_nu_96<- eBayes(fitw_nu_con_96) >> topTable(fitw_e_nu_96,coef=1,adjust="BH") >> >> >> ID logFC AveExpr t P.Value adj.P.Val >> 3564 oEptU7gl5ULB7ghAf4 -0.11681240 6.341389 -4.749578 7.204301e-06 0.1484950 >> 3900 cUSJ2.kBKqPAk0eZPY 0.31230030 6.881320 4.561405 1.514923e-05 0.1561280 >> 10705 uHk1A0FNxKMV0k7a50 -0.13640964 6.616042 -4.197525 6.070562e-05 0.2895144 >> 14598 WokXUp_QIC4pd_s1.U -0.09998362 6.342947 -4.111694 8.338387e-05 0.2895144 >> 4324 xXLgIpJ_AeI7gRNJ_U 0.11230531 6.478843 4.072272 9.634276e-05 0.2895144 >> 6399 HHteoiUdSJz9wh.e64 0.16605833 7.322017 4.034305 1.106350e-04 0.2895144 >> 13446 x65aXRiHVTIIUQiv00 -0.23601528 6.487643 -3.986742 1.314198e-04 0.2895144 >> 13445 HrlpdGIdVMghRCK.TU -0.20278431 6.470915 -3.986291 1.316337e-04 0.2895144 >> 16912 BgFUhUUgIQu3JDl_sU 0.11902363 6.815447 3.978352 1.354541e-04 0.2895144 >> 9813 NAv6twbtRAh7QuCDro 0.09999394 6.511654 3.958318 1.455733e-04 0.2895144 >> B >> 3564 3.5478599 >> 3900 2.8790555 >> 10705 1.6341835 >> 14598 1.3504633 >> 4324 1.2214803 >> 6399 1.0980586 >> 13446 0.9445695 >> 13445 0.9431198 >> 16912 0.9176269 >> 9813 0.8534456 >> >> >> >> >> >> About fitting the arrayweights in the linear model : >> previously I have produced arrayweights >> >> On 09/21/2009 12:59 AM, Wei Shi wrote: >> >>> Dear Mamun: >>> >>> Sorry for replying to you late. I am just back from the holiday. >>> >>> Some of your arrays have pretty low weights. Arrays 37-42 have >>> weights of around 0.2. Are these arrays from the same bead chip? You >>> can try put array weights in the linear model fitting to see if you >>> can get DE genes. >>> >>> Cheers, >>> Wei >>> >>> Md.Mamunur Rashid wrote: >>> >>>> Dear Wei, >>>> >>>> Thanks again for your reply. I have measured the array-weights using limma's arrayweights() method. There seem to be >>>> a good amount of variation among the raw data. But I am not really sure whether it is acceptable or not??. >>>> >>>> But the normalized data have an average value of 1.022609 >>>> with min = 0.5641711 max =1.417277. >>>> >>>> Also 2 more question : >>>> >>>> 1. can you explain a bit more about how can I filter out the genes whose detection p-values >>>> are larger than cut-off (e.g. 0.01) in all samples?? >>>> 2. how can I check batch effect and chip effect in the raw data.?? >>>> >>>> Thanks in advance. Any help from anyone also is appreciated. >>>> >>>> regards >>>> Mamun >>>> >>>> >>>> Below is the result from arrayWeigts() method.(Table 1 (weights for raw data) Table 2(weights of normalized data)) >>>> *** Also attached a text file with the weights. >>>> >>>> ArrayWeight of Raw Data: >>>> >>>> arrayWeights(data_96_raw,data_96_design) >>>> >>>> >>>> 1 2 3 4 5 6 7 8 >>>> 3.2861102 0.6091478 2.6927503 2.2778201 2.4391841 0.7802437 1.1378995 0.5714981 >>>> 9 10 11 12 13 14 15 16 >>>> 0.8091165 0.5260511 0.7565424 0.8354660 0.2929977 0.4950789 0.4512856 0.1283279 >>>> 17 18 19 20 21 22 23 24 >>>> 0.6713553 0.6275333 0.5441217 0.2711778 0.7278274 0.3323277 0.5786381 0.7375983 >>>> 25 26 27 28 29 30 31 32 >>>> 0.5793944 0.4018671 0.6658730 1.6805341 0.7210032 1.2802383 0.8902326 0.6715769 >>>> 33 34 35 36 37 38 39 40 >>>> 0.2959048 0.8854763 0.7587111 1.5067705 0.1766820 0.1639351 0.1739882 0.1833592 >>>> 41 42 43 44 45 46 47 48 >>>> 0.2443871 0.2188281 0.2945594 0.2607380 0.3419603 0.5932775 0.3396003 0.7229979 >>>> 49 50 51 52 53 54 55 56 >>>> 0.8274153 1.7944800 0.8306203 1.0738534 1.5052515 2.9798751 2.0836544 1.5010941 >>>> 57 58 59 60 61 62 63 64 >>>> 2.3827591 2.7575819 1.5967670 3.3397084 2.0576009 2.9660262 3.8417611 1.3469622 >>>> 65 66 67 68 69 70 71 72 >>>> 1.7334743 2.7855044 4.5417142 3.2551286 3.9634064 3.8054377 3.8020479 4.2732205 >>>> 73 74 75 76 77 78 79 80 >>>> 0.2659556 0.3440704 0.6404601 0.8697107 1.0736648 0.8835502 0.9912724 0.4334270 >>>> 81 82 83 84 85 86 87 88 >>>> 0.8801461 1.0309162 2.5352693 1.8366521 2.5501944 4.5388081 4.2603196 1.7665255 >>>> 89 90 91 92 93 94 95 96 >>>> 2.5511084 2.3867529 4.0237262 2.1647555 3.3101943 2.5863558 1.8618118 0.2397050 >>>> >>>> >>>> >>>> >>>> Array Weights of normalized data: >>>> >>>> 1 2 3 4 5 6 7 8 >>>> 0.8908085 1.1978748 1.1454450 1.0215452 1.3605699 0.8141813 1.4172765 1.2958270 >>>> 9 10 11 12 13 14 15 16 >>>> 1.0522608 1.2562065 1.3531495 1.1489293 1.0596229 1.1217643 0.9156484 0.6705476 >>>> 17 18 19 20 21 22 23 24 >>>> 0.9643412 1.2120423 1.0672521 0.9983735 0.8096782 0.9379575 1.0493924 0.7614746 >>>> 25 26 27 28 29 30 31 32 >>>> 1.1573255 1.2735353 1.3795986 0.9499952 1.3602425 1.2549726 1.1772558 1.4158351 >>>> 33 34 35 36 37 38 39 40 >>>> 1.3659316 1.0658774 1.3185647 0.9017821 0.7915704 0.6326567 1.0325512 0.6812818 >>>> 41 42 43 44 45 46 47 48 >>>> 1.0142990 1.1921695 1.1476346 0.8255798 1.2012711 1.1893762 0.9367947 1.1594407 >>>> 49 50 51 52 53 54 55 56 >>>> 1.1072266 1.0561352 0.8488650 1.1756689 1.0554286 0.9326934 1.2836555 0.9665945 >>>> 57 58 59 60 61 62 63 64 >>>> 1.3293081 1.3027377 1.3459009 1.2834152 0.9657114 1.0041629 0.8823282 0.6843356 >>>> 65 66 67 68 69 70 71 72 >>>> 0.8883121 0.9374396 0.9799943 0.9733364 1.2045700 1.0693506 0.7730626 0.9408430 >>>> 73 74 75 76 77 78 79 80 >>>> 0.6685933 1.0492289 0.9952222 0.9690452 1.0150076 1.0148188 0.6687192 0.5641711 >>>> 81 82 83 84 85 86 87 88 >>>> 0.7872238 0.8793660 0.8553373 0.9662603 0.6143880 1.0340769 0.9842437 0.6626941 >>>> 89 90 91 92 93 94 95 96 >>>> 1.0270788 0.8590296 1.0807271 0.8162435 1.0398548 0.8993595 1.2094240 0.5715857 >>>> >>>> >>>> >>>> >>>> >>>> >>>> >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives:http://news.gmane.org/gmane.science.biology.inf ormatics.conductor >> [[alternative HTML version deleted]]

Normalization GO Normalization GO • 1.1k views

ADD COMMENT • link 16.3 years ago Md.Mamunur Rashid ▴ 260

Login before adding your answer.