Question: How to import an Agilent CGH custom design XML file into limma
Sean Davis (United States) wrote 8.8 years ago:
On Thu, Oct 21, 2010 at 3:15 AM, Jarek Bryk <bryk@evolbio.mpg.de> wrote:

> Hi Sean,
>
> and thanks a lot for the answer. I still need some clarification though.
>
> On 20 Oct 2010, at 18:31, Sean Davis wrote:
>
> You definitely DO NOT want to be loess-normalizing CGH data. There is an
> expected positive correlation between average intensity and log-ratio;
> loess-normalization will effectively eliminate that signal.
>
> Oops. Thanks. I got that idea from the limma user guide (p. 24) - I assumed
> two-color normalization should be similar regardless of whether one measures
> gene expression or CGH - is this where the trick is? How would you recommend
> normalizing the data? Just to add some information, my custom design is
> concentrated on some 1000 genic regions (genes + 10 kb up- and downstream)
> selected based on our previous experiment. These regions have very high
> probe density (essentially like a tiling array - my average genomic probe
> density is 170 bp). Some of these regions may be copy-number-variable, but
> some (probably the majority) of them should not be. I have a standard set of
> Agilent's 11k normalizing probes plus 5k replicated probes on the arrays as
> well, but most of the genome is not covered with probes at all.

You will need to look at your data to determine what normalization will
work. In general and for future reference, when making custom designs like
this, we always use a backbone of 20-30k genomic probes so that we can see a
baseline. The two main issues to be concerned with are "centering" the data
and removal of "wave" artifacts. There is a fair amount of literature on the
latter and a bit on the former, but median-centering the data will likely
not be too far off as a start. The latter may be difficult to apply to your
data given the gene-centric nature of your design.

> This information is stored in the MA$genes$SystematicName above. You
> will need to split out the pieces (chromosome, position) and then use those
> for the locations. You will also probably need to remove control probes.
>
> OK, I got the MA$genes:
>
> > head(MA$genes)
>   Row Col Start                                                     Sequence ProbeUID ControlType
> 1   1   1     0                                                                     0           1
> 2   1   2     0                                                                     1           1
> 3   1   3     0                                                                     1           1
> 4   1   4     0 CCCATTCACTAATCTACATATTATCTCCATCCAACAAAAATTTCTTTCAGTAAGGTGTGG        3           0
> 5   1   5     0 AAACAATCAGGTTTTCACTCTGACAGCCCAATAATGCATTTAGTTTGAAATTACACTTGG        5           0
> 6   1   6     0   CTATTCTGTATCATATAGGGAGGATGCTGTCCTGGAATATCTCAAGATTGCTCAAGAC        7           0
>           ProbeName                GeneName          SystematicName
> 1 MmCGHBrightCorner       MmCGHBrightCorner       MmCGHBrightCorner
> 2        DarkCorner              DarkCorner              DarkCorner
> 3        DarkCorner              DarkCorner              DarkCorner
> 4    A_67_P21917383                     Msn  chrX:93323379-93323438
> 5    A_67_P17120964 chr11:18773745-18773804 chr11:18773745-18773804
> 6    A_67_P21917619                     Msn  chrX:93350785-93350842
>
> If I understand you correctly, I'd have to divide the content of
> MA$genes$SystematicName so that it becomes something like:
>
> > test <- MA$genes[10:20, ]  # to avoid control probes in this example
> > head(test)
>    Row Col Start                                                     Sequence ProbeUID ControlType
> 10   1  10     0              GTTTTAAACCATCTCTGAGGTTAATGGATGGCTTTCTGGGCCAGCAG       15           0
> 11   1  11     0 ACACAAAAGCAATTCTGGCTAGTCAAGACTTGAAAGAAACATATAAACGCACACACATTG       17           0
> 12   1  12     0 GCTTCTGTGCACTCTTATTGGAAACAAGTTCTTCAGAAGAACAAATGCAGGATAAAACTT       19           0
> 13   1  13     0   AAACATGAATACAGGACGGGGATGACAAATCCACGAGAAACAAAGTTAATTTCACTTC       21           0
> 14   1  14     0     AACAACTATCGAGGTCTAATGAAAAGAGCATGCATTTGTCAAGGAACAGCAGGAAA       23           0
> 15   1  15     0                CTCCCTCATTAGCCCATGTCGTGGAGGGTTGGCTGTGAGAAAAGT       25           0
>         ProbeName               GeneName            SystematicName
> 10 A_67_P11007835                Gm13547    chr2:29617549-29617595
> 11 A_67_P15475991 chr8:49310974-49311033    chr8:49310974-49311033
> 12 A_67_P08003555                  Arl15 chr13:114808546-114808605
> 13 A_67_P01452405                 Grin2b  chr6:136111490-136111547
> 14 A_67_P06614048               AK042807  chr8:102166914-102166969
> 15 A_67_P21149931                  Tshz1   chr18:84192155-84192199
>
> > test$Chr <- sapply(test$SystematicName, function(x) {strsplit(x, ":")[[1]][[1]]})
> > test$Position <- sapply(test$SystematicName, function(x) {strsplit(x, ":")[[1]][[2]]})
> > # there's probably an easier way of doing this...
> > head(test)  # but it gets the job done
>    Row Col Start                                                     Sequence ProbeUID ControlType
> 10   1  10     0              GTTTTAAACCATCTCTGAGGTTAATGGATGGCTTTCTGGGCCAGCAG       15           0
> 11   1  11     0 ACACAAAAGCAATTCTGGCTAGTCAAGACTTGAAAGAAACATATAAACGCACACACATTG       17           0
> 12   1  12     0 GCTTCTGTGCACTCTTATTGGAAACAAGTTCTTCAGAAGAACAAATGCAGGATAAAACTT       19           0
> 13   1  13     0   AAACATGAATACAGGACGGGGATGACAAATCCACGAGAAACAAAGTTAATTTCACTTC       21           0
> 14   1  14     0     AACAACTATCGAGGTCTAATGAAAAGAGCATGCATTTGTCAAGGAACAGCAGGAAA       23           0
> 15   1  15     0                CTCCCTCATTAGCCCATGTCGTGGAGGGTTGGCTGTGAGAAAAGT       25           0
>         ProbeName               GeneName            SystematicName   Chr            Position
> 10 A_67_P11007835                Gm13547    chr2:29617549-29617595  chr2   29617549-29617595
> 11 A_67_P15475991 chr8:49310974-49311033    chr8:49310974-49311033  chr8   49310974-49311033
> 12 A_67_P08003555                  Arl15 chr13:114808546-114808605 chr13 114808546-114808605
> 13 A_67_P01452405                 Grin2b  chr6:136111490-136111547  chr6 136111490-136111547
> 14 A_67_P06614048               AK042807  chr8:102166914-102166969  chr8 102166914-102166969
> 15 A_67_P21149931                  Tshz1   chr18:84192155-84192199 chr18   84192155-84192199

That is most of the way there. The Position column needs to be numeric, so
you'll want to do another split to get that down to a single number, I
think.

Sean

> Hope that helps.
>
> Thanks a lot!
> cheers
> jarek
>
> > > sessionInfo()
> > R version 2.12.0 (2010-10-15)
> > Platform: i386-apple-darwin9.8.0/i386 (32-bit)
> >
> > locale:
> > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> >
> > attached base packages:
> > [1] splines   stats     graphics  grDevices utils     datasets  methods   base
> >
> > other attached packages:
> > [1] snapCGH_1.20.0  DNAcopy_1.24.0  limma_3.6.0     aCGH_1.28.0     multtest_2.6.0  Biobase_2.10.0
> > [7] survival_2.35-8 cluster_1.13.1  GLAD_2.12.0
> >
> > loaded via a namespace (and not attached):
> >  [1] affy_1.28.0           affyio_1.18.0         annotate_1.28.0       AnnotationDbi_1.12.0
> >  [5] DBI_0.2-5             genefilter_1.32.0     grid_2.12.0           lattice_0.19-13
> >  [9] MASS_7.3-8            preprocessCore_1.12.0 RColorBrewer_1.0-2    RSQLite_0.9-2
> > [13] strucchange_1.4-2     tilingArray_1.28.0    tools_2.12.0          vsn_3.18.0
> > [17] xtable_1.5-6
>
> --
> Jarek Bryk | www.evolbio.mpg.de/~bryk
> Max Planck Institute for Evolutionary Biology
> August Thienemann Str. 2 | 24306 Plön, Germany
> tel. +49 4522 763 287 | bryk@evolbio.mpg.de
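
For anyone following along, here is a minimal sketch of the "another split" step, plus the control-probe removal and median-centering mentioned in the thread. It uses base R only and a toy data frame standing in for MA$genes (values copied from the rows printed in the thread). The column names ChrStart, ChrEnd and Position, the choice of the probe midpoint as the single position, and the toy log-ratio matrix M (standing in for MA$M) are all assumptions made for illustration, not something taken from limma or from the posts. It also assumes every non-control probe has a SystematicName of the form chr:start-end.

```r
# Toy stand-in for MA$genes; only the columns needed here.
genes <- data.frame(
  ProbeName      = c("DarkCorner", "A_67_P21917383",
                     "A_67_P17120964", "A_67_P11007835"),
  ControlType    = c(1, 0, 0, 0),
  SystematicName = c("DarkCorner", "chrX:93323379-93323438",
                     "chr11:18773745-18773804", "chr2:29617549-29617595"),
  stringsAsFactors = FALSE
)

# Drop control probes (ControlType is 0 for the genomic probes shown above).
probes <- genes[genes$ControlType == 0, ]

# Split "chr:start-end" on ":" or "-" and bind the pieces into a 3-column matrix.
parts <- do.call(rbind, strsplit(probes$SystematicName, "[:-]"))

probes$Chr      <- parts[, 1]
probes$ChrStart <- as.numeric(parts[, 2])   # hypothetical column name
probes$ChrEnd   <- as.numeric(parts[, 3])   # hypothetical column name

# One numeric position per probe: the midpoint (ChrStart alone would also do).
probes$Position <- (probes$ChrStart + probes$ChrEnd) / 2

# Median-centering, one column per array: subtract each column's median.
M <- matrix(c(0.2, 0.5, -0.1, 1.0, 1.3, 0.7), ncol = 2)  # toy log-ratios
M_centred <- sweep(M, 2, apply(M, 2, median, na.rm = TRUE))
```

The new columns are deliberately not called Start, because MA$genes already has a Start column and assigning to it would overwrite that. On real data the same steps would be applied to MA$genes and MA$M, with the matching rows of MA$M dropped along with the control probes.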