Fitting arrayweights in linear model

0

Entering edit mode

Md.Mamunur Rashid ▴ 260

@mdmamunur-rashid-3595

Last seen 11.2 years ago

Dear Wei, Thanks for your reply. I know it's a long email, but I tried to give details about the raw data Please have a look when you have some spare time. 1. About the raw data : ----------------------- Yes this arrays are from the same bead chip. The experiments were done by three individuals (each person conducted experiments with 3 beadchips each consisting 12 arrays, 96 intotal). I think this is the reason for high variance among 3 sets of arrays. 1-32, 33-64, 65-96. But after normalization ("quantile"-lumi package) the data looks quite normal. I have attached two snapshot of boxplot of the raw data and normalized data. A. Normalized data -http://www.esnips.com/doc/ad4c8a55-2166-49 11-8510-df81eaea63df/data_96_illumina_norm B. Raw data - http://www.esnips.com/doc/0a965715-b5ec-44da- b64c-4d025c63827b/96_illumina_raw If you look at the two snapshots above : Expression range for raw data in most cases are below 6. But after normalization it came slightly above 6. Is it possible at all or something is going wrong during the normalization. One question here. Does illumina data require intra-array normalization.?? To the extent of my little study I know that they do not need it as it is already done. Please let me know what do you think. 2. About Arrayweight -------------------- As you can see below the the results have not been improved much compare to the old topTable result. 1. Do you think this high variance in arrays are causing this problem.??? 2. Does fitting the arrayweight in the linear model along the normalized object drops the downweighted arrays or that needed to be done manually?? 3. Pre-processing with beadarray -------------------------------- I have the done the same analysis using beadArray package(data import and pre processing) and had the similar result. So there can be two possibilities which can lead to this problem. A. Problems in the raw data. B. Or I am doing a wrong linear modeling of the data.(as that is the common part in the both analysis) I have attached my code fragment here. please have a look: *** What are the other bio packages/tests I can use for identifying differentially expressed genes>? Thanks in advance best regards, Md.Mamunur Rashid Code Fragment ----------------------------------- About fitting the weights : I have done the following data_96_nuId_E holds the normalized object data_96_sampleType<- c("I","I","I","I","I","S","S","S","I","C","S","S" ,"C","I","C","I","I","I","I","S","I","S","I","S","C","C","S","I","C"," S","I","S","S","C","C","S","S","I","I","I","S","I","S","I","I","C","I" ,"C","S","C","I","I","S","C","S","S","S","C","C","I","S","S","C","I"," I","C","C","S","S","S","C","C","S","I","C","C","C","S","C","C","I","I" ,"C","S","C","C","C","S","C","S","C","I","I","S","I","C") data_96_design<- model.matrix(~0+data_96_sampleType) colnames(data_96_design)<- c('C','I','S') ## filtring out the non-expressed genes presentCount<- detectionCall(data_96_raw_nuID) data_96_Matrix<- data_96_Matrix_old[presentCount> 0, ] data_96_probeList<- rownames(data_96_Matrix) arraw_E_96<- arrayWeights(data_96_nuID_E,data_96_design) data_96_contrast<- makeContrasts (C-I,I-S,S-C,levels=data_96_design) fitw_nu_96<- lmFit(data_96_Matrix,data_96_design,weights=arraw_E_96) fitw_nu_con_96<- contrasts.fit(fitw_nu_96,data_96_contrast) fitw_e_nu_96<- eBayes(fitw_nu_con_96) topTable(fitw_e_nu_96,coef=1,adjust="BH") ID logFC AveExpr t P.Value adj.P.Val 3564 oEptU7gl5ULB7ghAf4 -0.11681240 6.341389 -4.749578 7.204301e-06 0.1484950 3900 cUSJ2.kBKqPAk0eZPY 0.31230030 6.881320 4.561405 1.514923e-05 0.1561280 10705 uHk1A0FNxKMV0k7a50 -0.13640964 6.616042 -4.197525 6.070562e-05 0.2895144 14598 WokXUp_QIC4pd_s1.U -0.09998362 6.342947 -4.111694 8.338387e-05 0.2895144 4324 xXLgIpJ_AeI7gRNJ_U 0.11230531 6.478843 4.072272 9.634276e-05 0.2895144 6399 HHteoiUdSJz9wh.e64 0.16605833 7.322017 4.034305 1.106350e-04 0.2895144 13446 x65aXRiHVTIIUQiv00 -0.23601528 6.487643 -3.986742 1.314198e-04 0.2895144 13445 HrlpdGIdVMghRCK.TU -0.20278431 6.470915 -3.986291 1.316337e-04 0.2895144 16912 BgFUhUUgIQu3JDl_sU 0.11902363 6.815447 3.978352 1.354541e-04 0.2895144 9813 NAv6twbtRAh7QuCDro 0.09999394 6.511654 3.958318 1.455733e-04 0.2895144 B 3564 3.5478599 3900 2.8790555 10705 1.6341835 14598 1.3504633 4324 1.2214803 6399 1.0980586 13446 0.9445695 13445 0.9431198 16912 0.9176269 9813 0.8534456 About fitting the arrayweights in the linear model : previously I have produced arrayweights On 09/21/2009 12:59 AM, Wei Shi wrote: > Dear Mamun: > > Sorry for replying to you late. I am just back from the holiday. > > Some of your arrays have pretty low weights. Arrays 37-42 have > weights of around 0.2. Are these arrays from the same bead chip? You > can try put array weights in the linear model fitting to see if you > can get DE genes. > > Cheers, > Wei > > Md.Mamunur Rashid wrote: >> Dear Wei, >> >> Thanks again for your reply. I have measured the array-weights using limma's arrayweights() method. There seem to be >> a good amount of variation among the raw data. But I am not really sure whether it is acceptable or not??. >> >> But the normalized data have an average value of 1.022609 >> with min = 0.5641711 max =1.417277. >> >> Also 2 more question : >> >> 1. can you explain a bit more about how can I filter out the genes whose detection p-values >> are larger than cut-off (e.g. 0.01) in all samples?? >> 2. how can I check batch effect and chip effect in the raw data.?? >> >> Thanks in advance. Any help from anyone also is appreciated. >> >> regards >> Mamun >> >> >> Below is the result from arrayWeigts() method.(Table 1 (weights for raw data) Table 2(weights of normalized data)) >> *** Also attached a text file with the weights. >> >> ArrayWeight of Raw Data: >> >> arrayWeights(data_96_raw,data_96_design) >> >> >> 1 2 3 4 5 6 7 8 >> 3.2861102 0.6091478 2.6927503 2.2778201 2.4391841 0.7802437 1.1378995 0.5714981 >> 9 10 11 12 13 14 15 16 >> 0.8091165 0.5260511 0.7565424 0.8354660 0.2929977 0.4950789 0.4512856 0.1283279 >> 17 18 19 20 21 22 23 24 >> 0.6713553 0.6275333 0.5441217 0.2711778 0.7278274 0.3323277 0.5786381 0.7375983 >> 25 26 27 28 29 30 31 32 >> 0.5793944 0.4018671 0.6658730 1.6805341 0.7210032 1.2802383 0.8902326 0.6715769 >> 33 34 35 36 37 38 39 40 >> 0.2959048 0.8854763 0.7587111 1.5067705 0.1766820 0.1639351 0.1739882 0.1833592 >> 41 42 43 44 45 46 47 48 >> 0.2443871 0.2188281 0.2945594 0.2607380 0.3419603 0.5932775 0.3396003 0.7229979 >> 49 50 51 52 53 54 55 56 >> 0.8274153 1.7944800 0.8306203 1.0738534 1.5052515 2.9798751 2.0836544 1.5010941 >> 57 58 59 60 61 62 63 64 >> 2.3827591 2.7575819 1.5967670 3.3397084 2.0576009 2.9660262 3.8417611 1.3469622 >> 65 66 67 68 69 70 71 72 >> 1.7334743 2.7855044 4.5417142 3.2551286 3.9634064 3.8054377 3.8020479 4.2732205 >> 73 74 75 76 77 78 79 80 >> 0.2659556 0.3440704 0.6404601 0.8697107 1.0736648 0.8835502 0.9912724 0.4334270 >> 81 82 83 84 85 86 87 88 >> 0.8801461 1.0309162 2.5352693 1.8366521 2.5501944 4.5388081 4.2603196 1.7665255 >> 89 90 91 92 93 94 95 96 >> 2.5511084 2.3867529 4.0237262 2.1647555 3.3101943 2.5863558 1.8618118 0.2397050 >> >> >> >> >> Array Weights of normalized data: >> >> 1 2 3 4 5 6 7 8 >> 0.8908085 1.1978748 1.1454450 1.0215452 1.3605699 0.8141813 1.4172765 1.2958270 >> 9 10 11 12 13 14 15 16 >> 1.0522608 1.2562065 1.3531495 1.1489293 1.0596229 1.1217643 0.9156484 0.6705476 >> 17 18 19 20 21 22 23 24 >> 0.9643412 1.2120423 1.0672521 0.9983735 0.8096782 0.9379575 1.0493924 0.7614746 >> 25 26 27 28 29 30 31 32 >> 1.1573255 1.2735353 1.3795986 0.9499952 1.3602425 1.2549726 1.1772558 1.4158351 >> 33 34 35 36 37 38 39 40 >> 1.3659316 1.0658774 1.3185647 0.9017821 0.7915704 0.6326567 1.0325512 0.6812818 >> 41 42 43 44 45 46 47 48 >> 1.0142990 1.1921695 1.1476346 0.8255798 1.2012711 1.1893762 0.9367947 1.1594407 >> 49 50 51 52 53 54 55 56 >> 1.1072266 1.0561352 0.8488650 1.1756689 1.0554286 0.9326934 1.2836555 0.9665945 >> 57 58 59 60 61 62 63 64 >> 1.3293081 1.3027377 1.3459009 1.2834152 0.9657114 1.0041629 0.8823282 0.6843356 >> 65 66 67 68 69 70 71 72 >> 0.8883121 0.9374396 0.9799943 0.9733364 1.2045700 1.0693506 0.7730626 0.9408430 >> 73 74 75 76 77 78 79 80 >> 0.6685933 1.0492289 0.9952222 0.9690452 1.0150076 1.0148188 0.6687192 0.5641711 >> 81 82 83 84 85 86 87 88 >> 0.7872238 0.8793660 0.8553373 0.9662603 0.6143880 1.0340769 0.9842437 0.6626941 >> 89 90 91 92 93 94 95 96 >> 1.0270788 0.8590296 1.0807271 0.8162435 1.0398548 0.8993595 1.2094240 0.5715857 >> >> >> >> >> >> [[alternative HTML version deleted]]

Normalization beadarray Normalization beadarray • 1.1k views

ADD COMMENT • link updated 16.2 years ago by Wei Shi ★ 3.6k • written 16.2 years ago by Md.Mamunur Rashid ▴ 260

0

Entering edit mode

Wei Shi ★ 3.6k

@wei-shi-2183

Last seen 4 months ago

Australia/Melbourne

Dear Mamun: Sorry for replying to you late. Your boxplot of raw data shows that there is a batch effect in your data. You might need to include this batch effect in your linear model fitting. Arrays with low weights will not be dropped when you include array weights in your linear model fitting. The idea of giving weights to arrays is to keep all the arrays in the analysis. Hope this helps. Cheers, Wei Md.Mamunur Rashid wrote: > Dear Wei, > > Thanks for your reply. I know it's a long email, but I tried to give details about the raw data > Please have a look when you have some spare time. > > > 1. About the raw data : > ----------------------- > Yes this arrays are from the same bead chip. The experiments were done by three individuals (each > person conducted experiments with 3 beadchips each consisting 12 arrays, 96 intotal). I think this > is the reason for high variance among 3 sets of arrays. 1-32, 33-64, 65-96. But after normalization > ("quantile"-lumi package) the data looks quite normal. > > > I have attached two snapshot of boxplot of the raw data and normalized data. > > A. Normalized data -http://www.esnips.com/doc/ad4c8a55-2166-49 11-8510-df81eaea63df/data_96_illumina_norm > > B. Raw data - http://www.esnips.com/doc/0a965715-b5ec-44da- b64c-4d025c63827b/96_illumina_raw > > If you look at the two snapshots above : Expression range for raw data in most cases are below 6. But > after normalization it came slightly above 6. Is it possible at all or something is going wrong during > the normalization. > > > One question here. Does illumina data require intra-array normalization.?? To the extent of my little > study I know that they do not need it as it is already done. Please let me know what do you think. > > > 2. About Arrayweight > -------------------- > > As you can see below the the results have not been improved much compare to the old topTable result. > > 1. Do you think this high variance in arrays are causing this problem.??? > 2. Does fitting the arrayweight in the linear model along the normalized object drops the downweighted arrays > or that needed to be done manually?? > > > 3. Pre-processing with beadarray > -------------------------------- > > I have the done the same analysis using beadArray package(data import and pre processing) and had the similar > result. So there can be two possibilities which can lead to this problem. > > A. Problems in the raw data. > B. Or I am doing a wrong linear modeling of the data.(as that is the common part in the both analysis) > > I have attached my code fragment here. please have a look: > > > *** What are the other bio packages/tests I can use for identifying differentially expressed genes>? > > Thanks in advance > > best regards, > Md.Mamunur Rashid > > > Code Fragment > ----------------------------------- > > About fitting the weights : I have done the following > > data_96_nuId_E holds the normalized object > > > > data_96_sampleType<- c("I","I","I","I","I","S","S","S","I","C","S"," S","C","I","C","I","I","I","I","S","I","S","I","S","C","C","S","I","C" ,"S","I","S","S","C","C","S","S","I","I","I","S","I","S","I","I","C"," I","C","S","C","I","I","S","C","S","S","S","C","C","I","S","S","C","I" ,"I","C","C","S","S","S","C","C","S","I","C","C","C","S","C","C","I"," I","C","S","C","C","C","S","C","S","C","I","I","S","I","C") > data_96_design<- model.matrix(~0+data_96_sampleType) > colnames(data_96_design)<- c('C','I','S') > > ## filtring out the non-expressed genes > presentCount<- detectionCall(data_96_raw_nuID) > data_96_Matrix<- data_96_Matrix_old[presentCount> 0, ] > data_96_probeList<- rownames(data_96_Matrix) > > > arraw_E_96<- arrayWeights(data_96_nuID_E,data_96_design) > > data_96_contrast<- makeContrasts (C-I,I-S,S-C,levels=data_96_design) > > fitw_nu_96<- lmFit(data_96_Matrix,data_96_design,weights=arraw_E_96) > fitw_nu_con_96<- contrasts.fit(fitw_nu_96,data_96_contrast) > fitw_e_nu_96<- eBayes(fitw_nu_con_96) > topTable(fitw_e_nu_96,coef=1,adjust="BH") > > > ID logFC AveExpr t P.Value adj.P.Val > 3564 oEptU7gl5ULB7ghAf4 -0.11681240 6.341389 -4.749578 7.204301e-06 0.1484950 > 3900 cUSJ2.kBKqPAk0eZPY 0.31230030 6.881320 4.561405 1.514923e-05 0.1561280 > 10705 uHk1A0FNxKMV0k7a50 -0.13640964 6.616042 -4.197525 6.070562e-05 0.2895144 > 14598 WokXUp_QIC4pd_s1.U -0.09998362 6.342947 -4.111694 8.338387e-05 0.2895144 > 4324 xXLgIpJ_AeI7gRNJ_U 0.11230531 6.478843 4.072272 9.634276e-05 0.2895144 > 6399 HHteoiUdSJz9wh.e64 0.16605833 7.322017 4.034305 1.106350e-04 0.2895144 > 13446 x65aXRiHVTIIUQiv00 -0.23601528 6.487643 -3.986742 1.314198e-04 0.2895144 > 13445 HrlpdGIdVMghRCK.TU -0.20278431 6.470915 -3.986291 1.316337e-04 0.2895144 > 16912 BgFUhUUgIQu3JDl_sU 0.11902363 6.815447 3.978352 1.354541e-04 0.2895144 > 9813 NAv6twbtRAh7QuCDro 0.09999394 6.511654 3.958318 1.455733e-04 0.2895144 > B > 3564 3.5478599 > 3900 2.8790555 > 10705 1.6341835 > 14598 1.3504633 > 4324 1.2214803 > 6399 1.0980586 > 13446 0.9445695 > 13445 0.9431198 > 16912 0.9176269 > 9813 0.8534456 > > > > > > About fitting the arrayweights in the linear model : > previously I have produced arrayweights > > On 09/21/2009 12:59 AM, Wei Shi wrote: > >> Dear Mamun: >> >> Sorry for replying to you late. I am just back from the holiday. >> >> Some of your arrays have pretty low weights. Arrays 37-42 have >> weights of around 0.2. Are these arrays from the same bead chip? You >> can try put array weights in the linear model fitting to see if you >> can get DE genes. >> >> Cheers, >> Wei >> >> Md.Mamunur Rashid wrote: >> >>> Dear Wei, >>> >>> Thanks again for your reply. I have measured the array-weights using limma's arrayweights() method. There seem to be >>> a good amount of variation among the raw data. But I am not really sure whether it is acceptable or not??. >>> >>> But the normalized data have an average value of 1.022609 >>> with min = 0.5641711 max =1.417277. >>> >>> Also 2 more question : >>> >>> 1. can you explain a bit more about how can I filter out the genes whose detection p-values >>> are larger than cut-off (e.g. 0.01) in all samples?? >>> 2. how can I check batch effect and chip effect in the raw data.?? >>> >>> Thanks in advance. Any help from anyone also is appreciated. >>> >>> regards >>> Mamun >>> >>> >>> Below is the result from arrayWeigts() method.(Table 1 (weights for raw data) Table 2(weights of normalized data)) >>> *** Also attached a text file with the weights. >>> >>> ArrayWeight of Raw Data: >>> >>> arrayWeights(data_96_raw,data_96_design) >>> >>> >>> 1 2 3 4 5 6 7 8 >>> 3.2861102 0.6091478 2.6927503 2.2778201 2.4391841 0.7802437 1.1378995 0.5714981 >>> 9 10 11 12 13 14 15 16 >>> 0.8091165 0.5260511 0.7565424 0.8354660 0.2929977 0.4950789 0.4512856 0.1283279 >>> 17 18 19 20 21 22 23 24 >>> 0.6713553 0.6275333 0.5441217 0.2711778 0.7278274 0.3323277 0.5786381 0.7375983 >>> 25 26 27 28 29 30 31 32 >>> 0.5793944 0.4018671 0.6658730 1.6805341 0.7210032 1.2802383 0.8902326 0.6715769 >>> 33 34 35 36 37 38 39 40 >>> 0.2959048 0.8854763 0.7587111 1.5067705 0.1766820 0.1639351 0.1739882 0.1833592 >>> 41 42 43 44 45 46 47 48 >>> 0.2443871 0.2188281 0.2945594 0.2607380 0.3419603 0.5932775 0.3396003 0.7229979 >>> 49 50 51 52 53 54 55 56 >>> 0.8274153 1.7944800 0.8306203 1.0738534 1.5052515 2.9798751 2.0836544 1.5010941 >>> 57 58 59 60 61 62 63 64 >>> 2.3827591 2.7575819 1.5967670 3.3397084 2.0576009 2.9660262 3.8417611 1.3469622 >>> 65 66 67 68 69 70 71 72 >>> 1.7334743 2.7855044 4.5417142 3.2551286 3.9634064 3.8054377 3.8020479 4.2732205 >>> 73 74 75 76 77 78 79 80 >>> 0.2659556 0.3440704 0.6404601 0.8697107 1.0736648 0.8835502 0.9912724 0.4334270 >>> 81 82 83 84 85 86 87 88 >>> 0.8801461 1.0309162 2.5352693 1.8366521 2.5501944 4.5388081 4.2603196 1.7665255 >>> 89 90 91 92 93 94 95 96 >>> 2.5511084 2.3867529 4.0237262 2.1647555 3.3101943 2.5863558 1.8618118 0.2397050 >>> >>> >>> >>> >>> Array Weights of normalized data: >>> >>> 1 2 3 4 5 6 7 8 >>> 0.8908085 1.1978748 1.1454450 1.0215452 1.3605699 0.8141813 1.4172765 1.2958270 >>> 9 10 11 12 13 14 15 16 >>> 1.0522608 1.2562065 1.3531495 1.1489293 1.0596229 1.1217643 0.9156484 0.6705476 >>> 17 18 19 20 21 22 23 24 >>> 0.9643412 1.2120423 1.0672521 0.9983735 0.8096782 0.9379575 1.0493924 0.7614746 >>> 25 26 27 28 29 30 31 32 >>> 1.1573255 1.2735353 1.3795986 0.9499952 1.3602425 1.2549726 1.1772558 1.4158351 >>> 33 34 35 36 37 38 39 40 >>> 1.3659316 1.0658774 1.3185647 0.9017821 0.7915704 0.6326567 1.0325512 0.6812818 >>> 41 42 43 44 45 46 47 48 >>> 1.0142990 1.1921695 1.1476346 0.8255798 1.2012711 1.1893762 0.9367947 1.1594407 >>> 49 50 51 52 53 54 55 56 >>> 1.1072266 1.0561352 0.8488650 1.1756689 1.0554286 0.9326934 1.2836555 0.9665945 >>> 57 58 59 60 61 62 63 64 >>> 1.3293081 1.3027377 1.3459009 1.2834152 0.9657114 1.0041629 0.8823282 0.6843356 >>> 65 66 67 68 69 70 71 72 >>> 0.8883121 0.9374396 0.9799943 0.9733364 1.2045700 1.0693506 0.7730626 0.9408430 >>> 73 74 75 76 77 78 79 80 >>> 0.6685933 1.0492289 0.9952222 0.9690452 1.0150076 1.0148188 0.6687192 0.5641711 >>> 81 82 83 84 85 86 87 88 >>> 0.7872238 0.8793660 0.8553373 0.9662603 0.6143880 1.0340769 0.9842437 0.6626941 >>> 89 90 91 92 93 94 95 96 >>> 1.0270788 0.8590296 1.0807271 0.8162435 1.0398548 0.8993595 1.2094240 0.5715857 >>> >>> >>> >>> >>> >>> >>> > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD COMMENT • link 16.2 years ago Wei Shi ★ 3.6k

Login before adding your answer.