Question

edgeR - generating an appropriate design and contrast matrix: multi-factorial experiment

0

Zack Will ▴ 30

@zack-will-5491

Last seen 9.7 years ago

United Kingdom

Dear edgeR developers and kind list members,

I have an RNA-seq experiment which I would like to analyse using edgeR, as I think it is a multi-factorial experiment.

After reading the excellent edgeR user manual as well as the wealth of design-matrix related questions on the mailing list, I am still unsure what design matrix would be appropriate for my data. Therefore I would appreciate feedback from members of the mailing list.

The RNA-seq data: 19 samples (16 tumours and 3 normal). All samples are from different individuals, and each sample was sequenced once (i.e. no replicates).

The aim: to find DE genes based on the sensitivity of tumour samples to drug A [and repeat the same analysis for drug B].

The aim (reworded for clarification): to find DE genes between tumour samples which are sensitive to drug A and tumour samples which are resistant to drug A.

Integrating previously known information on drug sensitivity, I designed my meta-data as below:

    targets
       files  samples  drug_A     drug_B
    1  T01    T01      resistant  sensitive
    2  T02    T02      resistant  resistant
    3  T03    T03      sensitive  resistant
    4  T04    T04      medium     sensitive
    5  T05    T05      medium     sensitive
    6  T06    T06      resistant  sensitive
    7  T07    T07      medium     resistant
    8  T08    T08      medium     resistant
    9  T09    T09      resistant  resistant
    10 T10    T10      medium     sensitive
    11 T11    T11      resistant  resistant
    12 T12    T12      sensitive  resistant
    13 T13    T13      resistant  resistant
    14 T14    T14      sensitive  sensitive
    15 T15    T15      sensitive  resistant
    16 T16    T16      sensitive  sensitive
    17 N01    normal   unknown    unknown
    18 N02    normal   unknown    unknown
    19 N03    normal   unknown    unknown

To clarify:
1) All RNA-seq data were from untreated samples.
2) Information on drug sensitivity was obtained from wet-lab experiments.
3) No drug sensitivity experiments were done on normal samples, hence the "unknown".

From my current understanding after reading the edgeR user manual and, to an extent, limma section 8.5, I am inclined to say the design matrix for my data should be an interaction model (limma section 8.5.1 & edgeR section 3.3.1) rather than a block model (edgeR section 3.4.2). I have not fully understood what a nested model is, so I am unsure whether nested is the better option.

Therefore my design is:

    Groups = factor(paste(targets$samples,targets$drug_A,sep="."))
    design = model.matrix(~0 + Groups)
    colnames(design) = levels(Groups)

From this design, I fail to see a way to specify a contrast that would answer the aim of the study (determine which genes are differentially expressed between tumour samples which are sensitive to drug A and tumour samples which are resistant to drug A).

Therefore my questions to the mailing list members are:

1) Is my experimental design correct to test my aim? (My gut feeling is that it is not...)

2) What design is appropriate to account for the individual variability in the tumours while addressing the aim of the experiment (tumour sensitive vs tumour resistant)? Is this possible?

Would this meta-data be the key?

    targets_2
       files  samples  type    drug_A     drug_B
    1  T01    T01      tumour  resistant  sensitive
    2  T02    T02      tumour  resistant  resistant
    3  T03    T03      tumour  sensitive  resistant
    4  T04    T04      tumour  medium     sensitive
    .
    .
    16 T16    T16      tumour  sensitive  sensitive
    17 N01    normal   normal  unknown    unknown
    18 N02    normal   normal  unknown    unknown
    19 N03    normal   normal  unknown    unknown

Following the above meta-data, I would proceed along these lines:

    Groups = factor(paste(targets_2$type,targets_2$drug_A,sep="."))
    design = model.matrix(~0 + Groups)
    colnames(design) = levels(Groups)

    my.contrast = makeContrasts(
      tumour.sensitiveVSresistant = tumour.sensitive-tumour.resistant,
      tumour_normal.sensitiveVsresitabt = (tumour.sensitive-normal.unknown)-(tumour.resistant-normal.unknown),
      levels=design)

Would the method above be more appropriate? And will it account for the variability in the tumour samples (i.e. does the design above treat the tumours as replicates)?

Thank you for taking the time to read this post, and I apologise if I included too much unnecessary information.

Zaki

limma edgeR • 3.1k views

updated 12.0 years ago by Gordon Smyth 52k • written 12.0 years ago by Zack Will ▴ 30

score 0 · Answer 1 · 2013-04-06

Dear Zaki,

The best way forward would be for you to collaborate with a statistician at your own institution, if you can possibly do that. edgeR provides the capabilities to do lots of analyses, but figuring out what analyses are appropriate for your scientific problem is another question. When I work with biologists, it often takes months or years for us to understand all the scientist's questions and to translate these into appropriate statistical analyses. So there is no way that I can tell you in a few sentences how to analyse your data appropriately.

However, your stated aim "To find DE genes between tumour samples which are sensitive to drug A and tumour samples which are resistant to drug A" seems to have an easy answer. The drug_A column splits your samples into four groups (resistant, medium, sensitive, unknown) and you want to compare the resistant and sensitive groups. This is a one-way layout, and you can follow Section 3.2 of the edgeR User's Guide.

Please don't put sample IDs into a factor like:

    Groups = factor(paste(targets$samples,targets$drug_A,sep="."))

This is in effect trying to treat each sample as its own group, and that makes no sense.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth

On Fri, 5 Apr 2013, Zaki Fadlullah wrote:

> Dear EdgeR developers and kind list members,
> I have a RNA-seq experiment which I would like to analyse using edgeR as
> i think it is a multi-factorial experiment .
>
> [...]
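The one-way layout suggested in the answer can be sketched roughly as follows. This is only an illustration in base R, not a complete analysis: the drug_A values are copied from the table in the question, while the names design_per_sample, unit, direct and via_normal are made up for this example. The edgeR calls in the trailing comments assume a DGEList of counts called y, which is hypothetical and not constructed here.

```r
# drug_A and samples columns, copied from the targets table in the question
drug_A <- c("resistant", "resistant", "sensitive", "medium", "medium",
            "resistant", "medium", "medium", "resistant", "medium",
            "resistant", "sensitive", "resistant", "sensitive",
            "sensitive", "sensitive", "unknown", "unknown", "unknown")
samples <- c(sprintf("T%02d", 1:16), rep("normal", 3))

# The design from the question: every tumour ends up in a group of its own
Groups <- factor(paste(samples, drug_A, sep = "."))
design_per_sample <- model.matrix(~0 + Groups)
ncol(design_per_sample)   # number of groups, close to the number of samples

# One-way layout: one column per drug_A group
group <- factor(drug_A,
                levels = c("resistant", "medium", "sensitive", "unknown"))
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)
nrow(design) - ncol(design)   # residual degrees of freedom

# Helper (illustrative): indicator vector for one group's coefficient
unit <- function(level) as.numeric(colnames(design) == level)

# Sensitive vs resistant, written directly ...
direct <- unit("sensitive") - unit("resistant")
# ... and written as in the question's second contrast, via the normals
via_normal <- (unit("sensitive") - unit("unknown")) -
              (unit("resistant") - unit("unknown"))
stopifnot(all(direct == via_normal))   # the "unknown" term cancels

# With edgeR loaded and a DGEList `y` (assumed, not built here), the
# contrast vector could then be used along these lines:
#   y   <- estimateDisp(y, design)
#   fit <- glmFit(y, design)
#   lrt <- glmLRT(fit, contrast = direct)
#   topTags(lrt)
```

With the per-sample factor, the 16 tumours occupy 16 of the 17 groups, so no tumour group has a replicate from which to estimate variability. With the four drug_A groups, the tumours within each sensitivity class act as the replicates. Since the normal term cancels, the second contrast proposed in the question tests the same thing as the first.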