Nested Design (Again) & Subset WithinArray Correlation


Y. Osee Sanogo ▴ 80

@y-osee-sanogo-4183

Last seen 9.6 years ago

Hello,

I have two questions which may be really trivial, but since I am stuck, I'll appreciate any help.

Question 1: Nested design.

This has been addressed before, but I am just not sure whether I am doing it right. The experiment consisted of two groups of fish (treated and not treated) with three tanks in each group. Each tank hosted three fish (total = 18); of those fish, n = 10 (5 per treatment group) were selected for microarray. (Notice the unequal number of fish per tank!)

I am interested in:

1) Treatment effect (individual fish)
2) Treatment effect (fish nested within tanks, i.e. averaging the gene expression of fish within each tank)
3) Whether there is a tank effect

    #ExpressionSet = ES_Filt
    #targets = see below:

    Sample            Key  tank  Fish   SAMPLE_LABEL
    25407102_532.xys  CON  1     CON_3  SOM01K28
    25407202_532.xys  CON  1     CON_2  SOM01K29
    25414902_532.xys  EXP  2     EXP_1  SOM01K2D
    25407302_532.xys  CON  3     CON_1  SOM01K2C
    25406602_532.xys  EXP  4     EXP_2  SOM01K25
    25407002_532.xys  EXP  4     EXP_3  SOM01K27
    25415502_532.xys  EXP  4     EXP_4  SOM01K2E
    25405602_532.xys  CON  5     CON_4  SOM01K23
    25406702_532.xys  CON  5     CON_5  SOM01K26
    25415702_532.xys  EXP  6     EXP_5  SOM01K24

I have tried the following design based upon what I found online, but was not really sure whether this is the right way of doing it.

    design.nested_ES <- model.matrix(~Key + (tank/Fish), data=targets)
    colnames(design.nested_ES)
    # I am getting many contrasts, and I am not sure which one represents "tank/Fish"

    fit.nested_ES <- lmFit(ES_Filt, design.nested_ES)
    Fit.nested_ES <- eBayes(fit.nested_ES)
    Pred2_Nested_ES <- topTable(Fit.nested_ES, coef=2, adjust="BH", n=Inf)
    Pred2_Nested_ES[1:10,]

I will really appreciate your help.

Question 2: Testing a subset of within-array replicates with different gene names.

I have a list of "overlapping" genes [see below] and I would like to see how they correlate, to assess the hybridization efficiency on the chip. The sequences and the probes are not identical, but overlap significantly. From reading the postings, I understand I can't use duplicateCorrelation, because the probes are randomly scattered on the array, and I was not sure how to use "avedups" on a subset of genes with different names.

    GENSCAN_ID          Matched transcript ID
    GENSCAN00000010293  ENSGACT00000002218
    GENSCAN00000003508  ENSGACT00000001310
    GENSCAN00000021873  ENSGACT00000000225
    GENSCAN00000007931  ENSGACT00000000496
    GENSCAN00000022171  ENSGACT00000002296
    GENSCAN00000026278  ENSGACT00000000071
    GENSCAN00000000631  ENSGACT00000002139
    GENSCAN00000008636  ENSGACT00000002427
    GENSCAN00000008635  ENSGACT00000002432
    GENSCAN00000022111  ENSGACT00000007564

Thank you so much, and my apologies if this has been addressed before (you can point me to the discussion).

Cheers,
Osee
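One way to see why this design yields so many coefficients is to compare the design matrix with the number of arrays. The following is only a sketch in base R: it retypes the targets table from the post and assumes tank is treated as a factor (the original may have been read in as numeric). Because every array is a different fish, the Fish term by itself identifies each array.

```r
# Targets table transcribed from the post above.
targets <- data.frame(
  Key  = c("CON", "CON", "EXP", "CON", "EXP", "EXP", "EXP", "CON", "CON", "EXP"),
  tank = factor(c(1, 1, 2, 3, 4, 4, 4, 5, 5, 6)),  # assumption: tank coded as a factor
  Fish = c("CON_3", "CON_2", "EXP_1", "CON_1", "EXP_2",
           "EXP_3", "EXP_4", "CON_4", "CON_5", "EXP_5")
)

# Same formula as in the question.
design <- model.matrix(~ Key + (tank/Fish), data = targets)

n_arrays    <- nrow(design)      # one row per array
n_columns   <- ncol(design)      # one column per coefficient requested
design_rank <- qr(design)$rank   # number of coefficients that are actually estimable

# Residual degrees of freedom left over for estimating variability.
resid_df <- n_arrays - design_rank
cat("arrays:", n_arrays, " columns:", n_columns,
    " rank:", design_rank, " residual df:", resid_df, "\n")
```

If the rank equals the number of arrays, the model fits each array exactly and no residual degrees of freedom remain for testing the treatment effect.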





Jenny Drnevich ★ 2.0k

@jenny-drnevich-2812

Last seen 15 days ago

United States

Hi Everyone,

I've been helping Osee with the second question he posted today. I'll explain it a bit further, as I'd like some help on how to interpret his results.

He has an array where some of the probes (ENSGACT) were designed from known transcript sequences and other probes (GENSCAN) were designed from predicted sequences from a sequencing project. Further annotation of the predicted sequences has revealed that many of them actually overlap with the known transcript sequences. He would like to estimate how correlated the expression values from the GENSCAN probes are with those from their matching ENSGACT probes.

I thought this could be done by treating the probe pairs as technical replicates and running duplicateCorrelation() on them. On an array with true technical replication of probes, you'd hope the consensus correlation would be strongly positive, close to 1. Well, the consensus correlation for the GENSCAN:ENSGACT pairs is strongly _negative_: between -0.8 and -0.92, depending on the subset of pairs we use.

I can't quite figure out what the strong negative correlation means; it's probably something simple that I'm overlooking. We have no idea right now how much overlap there may be between the ENSGACT probe oligo sequences and their corresponding GENSCAN probe oligo sequences. Does anyone have an explanation for the strong negative correlation?

Thanks,
Jenny

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
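As a simpler cross-check, the correlation across arrays can also be computed directly for each GENSCAN:ENSGACT pair. The sketch below is only an illustration in base R: the expression matrix is simulated as a stand-in for the real data, and the `pairs` table (a hypothetical name) holds a few IDs from the list in the question. This is not the same quantity as the consensus correlation, since duplicateCorrelation() works with the residual errors after the design has been fitted, so the two need not agree.

```r
set.seed(1)

# Hypothetical pairing table, using a few IDs from the list in the question.
pairs <- data.frame(
  genscan = c("GENSCAN00000010293", "GENSCAN00000003508", "GENSCAN00000021873"),
  ensgact = c("ENSGACT00000002218", "ENSGACT00000001310", "ENSGACT00000000225"),
  stringsAsFactors = FALSE
)

# Simulated stand-in for the real expression matrix:
# probes in rows (named by probe ID), 10 arrays in columns.
n_arrays <- 10
expr <- matrix(rnorm(2 * nrow(pairs) * n_arrays), ncol = n_arrays,
               dimnames = list(c(pairs$genscan, pairs$ensgact), NULL))

# Pearson correlation across arrays for each GENSCAN:ENSGACT pair.
pair_cor <- mapply(function(g, e) cor(expr[g, ], expr[e, ]),
                   pairs$genscan, pairs$ensgact)
names(pair_cor) <- paste(pairs$genscan, pairs$ensgact, sep = ":")

print(round(pair_cor, 2))
```

With real data, a histogram of `pair_cor` over all matched pairs would show whether the negative relationship holds pair by pair or is driven by a few probes.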


Gordon Smyth 50k

@gordon-smyth

Last seen 13 hours ago

WEHI, Melbourne, Australia

Dear Osee,

Despite the name of the function, which I admit does suggest narrower applicability, duplicateCorrelation() can be used for any nested error structure.

Best wishes
Gordon

On Thu, 29 Jul 2010, Y. Osee Sanogo wrote:

> Dear Gordon,
>
> Thank you for the code. It works!!
> My only question is: does it matter that the probe sets are not
> duplicated and there are no technical replicates per se? I thought
> duplicateCorrelation is meant for duplicates or technical replicates?
> Please clarify.
>
> Thanks again.
>
> Osee
>
> On 7/28/10 6:53 PM, "Gordon K Smyth" <smyth at wehi.edu.au> wrote:
>
>> Dear Osee,
>>
>> I haven't seen anyone else try to answer your first question, so I will.
>>
>> You're trying to put too many terms in your design matrix, making the
>> experiment much more complicated than it actually is. Your experiment
>> simply compares two treatment groups. It doesn't make sense to estimate
>> effects for fish or tanks, because these are just your randomly sampled
>> experimental units. The only real complication of your experiment is that
>> some fish share the same tank, so you need to allow for possible
>> correlations within a tank. You can do this in limma by:
>>
>>    design <- model.matrix(~Key)
>>    fitcor <- duplicateCorrelation(ES,design,block=tank)
>>    fit <- lmFit(ES,design,block=tank,correlation=fitcor$consensus)
>>    fit <- eBayes(fit)
>>    topTable(fit,coef=2)
>>
>> This approach finds genes which respond to your treatment.
>>
>> Best wishes
>> Gordon
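For readers who want to try the blocking approach end to end, here is a self-contained sketch. It is an adaptation under stated assumptions, not the code as posted: the expression matrix is simulated (200 genes with an artificial tank-level component) in place of ES_Filt, the targets table is retyped from the question, and the columns are referenced explicitly through `targets` rather than relying on attached variables. It assumes the limma and statmod packages are installed.

```r
library(limma)  # duplicateCorrelation() also requires the statmod package

set.seed(1)

# Targets table transcribed from the question.
targets <- data.frame(
  Key  = c("CON", "CON", "EXP", "CON", "EXP", "EXP", "EXP", "CON", "CON", "EXP"),
  tank = c(1, 1, 2, 3, 4, 4, 4, 5, 5, 6)
)

# Simulated stand-in for exprs(ES_Filt): 200 genes x 10 arrays, with a shared
# tank-level component so that arrays from the same tank are correlated.
n_genes <- 200
tank_effect <- matrix(rnorm(n_genes * 6, sd = 0.5), nrow = n_genes)
expr <- tank_effect[, targets$tank] +
  matrix(rnorm(n_genes * nrow(targets)), nrow = n_genes)

# Two-group design: intercept plus treatment effect (EXP versus CON).
design <- model.matrix(~ Key, data = targets)

# Estimate the within-tank correlation, then fit with tank as the block.
corfit <- duplicateCorrelation(expr, design, block = targets$tank)
fit <- lmFit(expr, design, block = targets$tank,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)

# Genes ranked by evidence for a treatment effect.
top <- topTable(fit, coef = "KeyEXP", adjust.method = "BH", number = 10)
print(corfit$consensus.correlation)
print(top)
```

Here tank never appears in the design matrix; it enters only through `block`, which matches the advice that tanks are sampled experimental units rather than effects to be estimated.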
