How to create nested data frames
Ana Conesa ▴ 130
@ana-conesa-2246
Last seen 9.2 years ago
Dear List,

This is more an R than a Bioconductor question, but I cannot post to the R list at the moment, so I apologize for using the Bioconductor list instead.

I am trying to use the pls package to compute a PLS regression of gene expression data on medical variables. I should provide my data as a data.frame (NOT A LIST) that contains the matrices of X and Y variables, i.e. if mydata is such a data frame, then mydata$expr gives the expression matrix and mydata$medical is the matrix of medical data.

If I do:

    > mydata <- data.frame(expr=expr, medical=medical)

I simply obtain a single data.frame combining the two, and I am not able to access the matrices independently. I have been searching the R help and documentation without success.

Any help appreciated.

Ana
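For illustration, the behaviour described in the question can be reproduced with small made-up matrices. This is only a minimal sketch: the names expr and medical follow the question, while the sizes and values are arbitrary assumptions. It shows that data.frame() spreads each matrix over one column per matrix column, so no single column called expr exists afterwards.

```r
# Toy stand-ins for the real data (sizes and values are assumptions):
# 4 samples in rows, 3 genes and 2 medical variables in columns.
expr    <- matrix(1:12, nrow = 4)
medical <- matrix(101:108, nrow = 4)

mydata <- data.frame(expr = expr, medical = medical)

# Each matrix column has become a separate data frame column.
print(names(mydata))
print(ncol(mydata))

# Exact-name lookup finds no column called "expr".
print(is.null(mydata[["expr"]]))
```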
Regression Regression • 2.0k views
Oleg Sklyar ▴ 260
@oleg-sklyar-1882
Last seen 9.2 years ago
AFAIK it should be impossible, at least directly: a data.frame is essentially a list of vectors of equal length. However, a matrix in R is essentially a vector with the dim attribute set. So what you can do is something like this:

    # function that uses such a crazy data.frame, x
    f = function(x) {
        a = x$a
        dim(a) = attr(x, "matrixdim")
        b = x$b
        dim(b) = dim(a)
        # use matrices, e.g.
        print(dim(b))
    }

    m1 = matrix(runif(10), 2, 5)
    m2 = matrix(runif(10), 2, 5)

    df = data.frame(a=as.numeric(m1), b=as.numeric(m2))
    attr(df, "matrixdim") = dim(m1)

    f(df)

This should print 2 5, as those are the dimensions of the matrices:

    > f(df)
    [1] 2 5

indeed!

Well, you need to consider that when you call as.numeric and when you set 'dim' on a vector, you copy the data! But honestly, you can always find a way to pass another object, and a list would be more reasonable as you do not need to copy data. And although the above example works, I cannot think of a situation where it would be justified to use it; also, data.frames are slow!

-
Dr Oleg Sklyar * EMBL-EBI, Cambridge CB10 1SD, UK * +441223494466
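The remark that a matrix is essentially a vector with a dim attribute can be checked directly. The following is a small sketch with arbitrary values (the names v and m are chosen here for illustration and do not come from the thread):

```r
# A plain numeric vector of length 10 (values are arbitrary).
v <- as.numeric(1:10)
print(is.matrix(v))   # no dim attribute yet

# Setting dim turns the same values into a 2 x 5 matrix, filled column by column.
m <- v
dim(m) <- c(2, 5)
print(is.matrix(m))
print(m[2, 3])        # element in row 2, column 3

# as.numeric() drops the attribute again and recovers the original vector.
print(identical(as.numeric(m), v))
```

This also makes the limitation of the flattening trick visible: both flattened vectors must have the same length to sit side by side in one data.frame.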
Ana Conesa ▴ 130
@ana-conesa-2246
Last seen 9.2 years ago
Dear Oleg,

Thanks for your help. I have tried it, but it seems that the method you indicated works only when the two matrices have the same length, which is not my case. I cannot construct the data.frame as you indicated if a and b have different lengths...

Ana
Hi Ana,

I don't think you need anything special to use the pls package. All you need is a data.frame containing your data, or alternatively to have your data in your .GlobalEnv (which IMO is easier to do anyway).

Note that the pls package expects your data to be in the conventional format of subjects in rows and observations in columns, so you will have to transpose your matrix of expression data.

Does

    results <- mvr(medical ~ t(expr), other args)

not work for you?

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
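Jim's suggestion can be sketched as follows, assuming the pls package is installed (the fit is skipped otherwise). The simulated matrices, their sizes, and ncomp = 2 are illustrative assumptions, not values from the thread. The sketch also builds the data-frame form asked about in the question, with each matrix protected by I() so that it stays a single column.

```r
set.seed(1)

# Simulated stand-ins (assumptions): 20 genes x 10 samples, 10 samples x 2 variables.
expr    <- matrix(rnorm(20 * 10), nrow = 20)
medical <- matrix(rnorm(10 * 2), nrow = 10)

# Samples must be in rows on both sides of the formula, hence t(expr).
stopifnot(nrow(t(expr)) == nrow(medical))

# Data-frame form: I() keeps each matrix as one column.
mydata <- data.frame(expr = I(t(expr)), medical = I(medical))
print(dim(mydata$expr))
print(dim(mydata$medical))

if (requireNamespace("pls", quietly = TRUE)) {
  # Variables looked up in the calling environment.
  fit_env <- pls::mvr(medical ~ t(expr), ncomp = 2)
  # Same model, with variables taken from the data frame.
  fit_df  <- pls::mvr(medical ~ expr, ncomp = 2, data = mydata)
  print(dim(fitted(fit_df)))
}
```

Either form should describe the same model; the data-frame form mainly helps when the data have to be passed around as one object.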
The data.frame can accommodate a matrix, provided that the number of rows of the matrix matches that of the data.frame. The matrix must be wrapped in I(); otherwise data.frame() splits it into one column per matrix column:

    m <- matrix(1:20, ncol=2)
    dataf <- data.frame(m=I(m), letter=letters[1:10])

Hoping this helps,

Laurent
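As a complementary sketch, a matrix can also be attached to an existing data frame by assignment, which likewise stores it as one column. The object names m and dataf reuse those in the reply above; the checks at the end simply confirm that the column comes back as a matrix with its dimensions intact.

```r
m <- matrix(1:20, ncol = 2)

# Start from the non-matrix column, then add the matrix by assignment.
dataf <- data.frame(letter = letters[1:10])
dataf$m <- m

print(ncol(dataf))         # number of top-level columns
print(dim(dataf$m))        # dimensions of the stored matrix
print(is.matrix(dataf$m))
```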