finding differential expression pattern between paired columns of data frame.

0

Entering edit mode

Jakub Stanislaw Nowak ▴ 70

@jakub-stanislaw-nowak-6656

Last seen 9.6 years ago

Hello Bioconductor list members, I am trying to analyse output of my smallRNA sequencing data. My starting point is a table with differentially expressed miRs between two stages of different cell types. My goal is to identify miRs that are changing in LineA - Line D and not changing in LineB - LineC and reciprocal situation. Below is a head of my data.frame that I work on. > data.selected.list ID Line A Line B Line C Line D 1 mmu-miR-23a-5p -2.28461513 -2.75356783 -2.0132004 0.1397216 2 mmu-miR-22-3p 1.65924854 0.03717303 0.4909068 2.4100528 3 mmu-miR-5113 -0.91337275 -0.87022608 -1.3224488 1.3906263 4 mmu-miR-324-3p 1.44133874 -0.07767235 -0.2187064 -0.2201993 5 mmu-miR-365-1-5p -0.03644659 0.72755490 1.0445618 1.3071159 6 mmu-miR-3068-5p 0.49474466 -0.28506641 -0.2995997 1.3754639 As this very much looked for me as microarray experiment I originally was thinking to apply some tools for microarray analysis to get this done. So i did as below 1. Constructed matrix > design <- model.matrix(~-1+factor(c(1,2,3,4))) > colnames(design) <- c("LineA","LineB","LineC","LineD") > design LineA LineB LineC LineD 1 1 0 0 0 2 0 1 0 0 3 0 0 1 0 4 0 0 0 1 attr(,"assign") [1] 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$`factor(c(1, 2, 3, 4))` [1] ?contr.treatment? 2. Constructed comparison matrix > contrast.matrix <- makeContrasts(LineA-LineD, LineB-LineC, levels=design) > contrast.matrix Contrasts Levels LineA - LineD LineB - LineC LineA 1 0 LineB 0 1 LineC 0 -1 LineD -1 0 3. Perform fitting and of course got benched for my naive thinking that this is going to work > fit <- lmFit(data.selected[,(2:5)],design) > fit2 <- contrasts.fit(fit, contrast.matrix) > fit2 <- eBayes(fit2) Error in ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim, : No residual degrees of freedom in linear model fits I thought that maybe there is something wrong with the input for my fitting and I checked that limma is using normalised log data so I followed my very naive thinking and try to normalise the table > eset<-rma(data.selected.list[,2:5]) Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ?probeNames? for signature ?"data.frame?? As it hasn?t work I have two questions: 1. Can I somehow edit my data.frame that it will become a better fit for limma package and perform comparison I want. 2. I won?t be surprised if you say that this approach is not valid therefore I will welcome any suggestion about other models/methods I can use to make this comparison for my initial table. Thank you very much for help, Jakub -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

Sequencing SmallRNA Microarray limma Sequencing SmallRNA Microarray limma • 1.6k views

ADD COMMENT • link updated 9.7 years ago by James W. MacDonald 65k • written 9.7 years ago by Jakub Stanislaw Nowak ▴ 70

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 2 hours ago

United States

Hi Jakub, The problem here is that you don't have any replication, so you cannot fit a model. The only thing you can do is compute fold changes, which are not very reliable. Best, Jim On Wed, Aug 6, 2014 at 10:02 AM, Jakub Stanislaw Nowak <jakub.nowak@ed.ac.uk> wrote: > Hello Bioconductor list members, > > I am trying to analyse output of my smallRNA sequencing data. My starting > point is a table with differentially expressed miRs between two stages of > different cell types. > My goal is to identify miRs that are changing in LineA - Line D and not > changing in LineB - LineC and reciprocal situation. > > Below is a head of my data.frame that I work on. > > > data.selected.list > ID Line A Line B Line C > Line D > 1 mmu-miR-23a-5p -2.28461513 -2.75356783 -2.0132004 0.1397216 > 2 mmu-miR-22-3p 1.65924854 0.03717303 0.4909068 2.4100528 > 3 mmu-miR-5113 -0.91337275 -0.87022608 -1.3224488 1.3906263 > 4 mmu-miR-324-3p 1.44133874 -0.07767235 -0.2187064 -0.2201993 > 5 mmu-miR-365-1-5p -0.03644659 0.72755490 1.0445618 1.3071159 > 6 mmu-miR-3068-5p 0.49474466 -0.28506641 -0.2995997 1.3754639 > > As this very much looked for me as microarray experiment I originally was > thinking to apply some tools for microarray analysis to get this done. > > So i did as below > 1. Constructed matrix > > design <- model.matrix(~-1+factor(c(1,2,3,4))) > > colnames(design) <- c("LineA","LineB","LineC","LineD") > > design > LineA LineB LineC LineD > 1 1 0 0 0 > 2 0 1 0 0 > 3 0 0 1 0 > 4 0 0 0 1 > attr(,"assign") > [1] 1 1 1 1 > attr(,"contrasts") > attr(,"contrasts")$`factor(c(1, 2, 3, 4))` > [1] âcontr.treatmentâ > > 2. Constructed comparison matrix > > contrast.matrix <- makeContrasts(LineA-LineD, LineB-LineC, levels=design) > > contrast.matrix > Contrasts > Levels LineA - LineD LineB - LineC > LineA 1 0 > LineB 0 1 > LineC 0 -1 > LineD -1 0 > > 3. Perform fitting and of course got benched for my naive thinking that > this is going to work > > > fit <- lmFit(data.selected[,(2:5)],design) > > fit2 <- contrasts.fit(fit, contrast.matrix) > > fit2 <- eBayes(fit2) > Error in ebayes(fit = fit, proportion = proportion, stdev.coef.lim = > stdev.coef.lim, : > No residual degrees of freedom in linear model fits > > I thought that maybe there is something wrong with the input for my > fitting and I checked that limma is using normalised log data so I followed > my very naive thinking and try to normalise the table > > > eset<-rma(data.selected.list[,2:5]) > Error in (function (classes, fdef, mtable) : > unable to find an inherited method for function âprobeNamesâ for > signature â"data.frameââ > > As it hasnât work I have two questions: > 1. Can I somehow edit my data.frame that it will become a better fit for > limma package and perform comparison I want. > 2. I wonât be surprised if you say that this approach is not valid > therefore I will welcome any suggestion about other models/methods I can > use to make this comparison for my initial table. > > Thank you very much for help, > > Jakub > > > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099 [[alternative HTML version deleted]]

ADD COMMENT • link 9.7 years ago James W. MacDonald 65k

0

Entering edit mode

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

ADD REPLY • link 9.7 years ago Jakub Stanislaw Nowak ▴ 70

0

Entering edit mode

You could create a statistic that captures what you want. i.e: s = LineAD / (abs(LineBC) ) this number would be big if lineAD foldchange is big and/or line BC is small. It's not perfect but may somewhat capture what you want. You can alter it as well to prioritize something particular, like: s = LineAD^2 / (abs(LineBC)+1) would give more 'weight' to big AD changes and avoid the stat 'exploding' when BC change is near zero. p.s. If data is log measured ?shouldn't you compute FC by substraction? p.p.s. you may want to rank on the absolute value of the fold change as well, and later look at the sign. On Wed, Aug 6, 2014 at 11:10 AM, Jakub Stanislaw Nowak <jakub.nowak at="" ed.ac.uk=""> wrote: > Hi Jim, > > Thanks for your reply. It become clear for me. > > I computed the fold changes and added them to the data frame > >> LineA.LineD <- data.selected.list$LineA / data.selected.list$LineD >> LineB.LineC <- data.selected.list$LineB/data.selected.list$LineC >> data.merged <- cbind(data.selected.list, LineAD=LineA.LineD, LineBC=LineB.LineC) >> View(data.merged) >> data.merged > ID LineA LineB LineC LineD LineAD LineBC > 1 mmu-miR-23a-5p -2.28461513 -2.75356783 -2.0132004 0.1397216 -16.35119386 1.36775647 > 2 mmu-miR-22-3p 1.65924854 0.03717303 0.4909068 2.4100528 0.68846978 0.07572319 > 3 mmu-miR-5113 -0.91337275 -0.87022608 -1.3224488 1.3906263 -0.65680676 0.65804140 > 4 mmu-miR-324-3p 1.44133874 -0.07767235 -0.2187064 -0.2201993 -6.54561061 0.35514432 > 5 mmu-miR-365-1-5p -0.03644659 0.72755490 1.0445618 1.3071159 -0.02788321 0.69651688 > 6 mmu-miR-3068-5p 0.49474466 -0.28506641 -0.2995997 1.3754639 0.35969294 0.95149110 >> > > I want to ask for suggestion what would be the best way to sort this data frame now by: > 1. Descending by fold changes in LineAD column > 2. In the same time removing these miRNAs that fold change do not differ so much in LineBC column > > Many thanks for your great help, > > Jakub > > > On 6 Aug 2014, at 15:26, James W. MacDonald <jmacdon at="" uw.edu=""> wrote: > >> Hi Jakub, >> >> The problem here is that you don't have any replication, so you cannot fit a model. The only thing you can do is compute fold changes, which are not very reliable. >> >> Best, >> >> Jim >> >> >> >> >> On Wed, Aug 6, 2014 at 10:02 AM, Jakub Stanislaw Nowak <jakub.nowak at="" ed.ac.uk=""> wrote: >> Hello Bioconductor list members, >> >> I am trying to analyse output of my smallRNA sequencing data. My starting point is a table with differentially expressed miRs between two stages of different cell types. >> My goal is to identify miRs that are changing in LineA - Line D and not changing in LineB - LineC and reciprocal situation. >> >> Below is a head of my data.frame that I work on. >> >> > data.selected.list >> ID Line A Line B Line C Line D >> 1 mmu-miR-23a-5p -2.28461513 -2.75356783 -2.0132004 0.1397216 >> 2 mmu-miR-22-3p 1.65924854 0.03717303 0.4909068 2.4100528 >> 3 mmu-miR-5113 -0.91337275 -0.87022608 -1.3224488 1.3906263 >> 4 mmu-miR-324-3p 1.44133874 -0.07767235 -0.2187064 -0.2201993 >> 5 mmu-miR-365-1-5p -0.03644659 0.72755490 1.0445618 1.3071159 >> 6 mmu-miR-3068-5p 0.49474466 -0.28506641 -0.2995997 1.3754639 >> >> As this very much looked for me as microarray experiment I originally was thinking to apply some tools for microarray analysis to get this done. >> >> So i did as below >> 1. Constructed matrix >> > design <- model.matrix(~-1+factor(c(1,2,3,4))) >> > colnames(design) <- c("LineA","LineB","LineC","LineD") >> > design >> LineA LineB LineC LineD >> 1 1 0 0 0 >> 2 0 1 0 0 >> 3 0 0 1 0 >> 4 0 0 0 1 >> attr(,"assign") >> [1] 1 1 1 1 >> attr(,"contrasts") >> attr(,"contrasts")$`factor(c(1, 2, 3, 4))` >> [1] ?contr.treatment? >> >> 2. Constructed comparison matrix >> > contrast.matrix <- makeContrasts(LineA-LineD, LineB-LineC, levels=design) >> > contrast.matrix >> Contrasts >> Levels LineA - LineD LineB - LineC >> LineA 1 0 >> LineB 0 1 >> LineC 0 -1 >> LineD -1 0 >> >> 3. Perform fitting and of course got benched for my naive thinking that this is going to work >> >> > fit <- lmFit(data.selected[,(2:5)],design) >> > fit2 <- contrasts.fit(fit, contrast.matrix) >> > fit2 <- eBayes(fit2) >> Error in ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim, : >> No residual degrees of freedom in linear model fits >> >> I thought that maybe there is something wrong with the input for my fitting and I checked that limma is using normalised log data so I followed my very naive thinking and try to normalise the table >> >> > eset<-rma(data.selected.list[,2:5]) >> Error in (function (classes, fdef, mtable) : >> unable to find an inherited method for function ?probeNames? for signature ?"data.frame?? >> >> As it hasn?t work I have two questions: >> 1. Can I somehow edit my data.frame that it will become a better fit for limma package and perform comparison I want. >> 2. I won?t be surprised if you say that this approach is not valid therefore I will welcome any suggestion about other models/methods I can use to make this comparison for my initial table. >> >> Thank you very much for help, >> >> Jakub >> >> >> -- >> The University of Edinburgh is a charitable body, registered in >> Scotland, with registration number SC005336. >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 > > > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 9.7 years ago Federico Lasa ▴ 80

Login before adding your answer.