sum of columns

chris Jhon ▴ 260

Hi all,

I have a data frame like this:

    gene   symbol  sample1  sample2  sample3  sample4
    gene1  A       0        0        0        0
    gene2  B       0        10       2        0
    gene3  C       0        0        0        0

and I would like to subset the data frame to keep only the genes whose sum across all samples is greater than zero. How can I do this in R?

Thank you for any help.
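A minimal sketch of one way to do this, assuming the data frame is called `D` and that every count column name starts with `sample` (both are assumptions for illustration, taken from the example table): sum only the sample columns row by row and keep the rows whose total is greater than zero.

```r
# Hypothetical toy data frame with the values from the question.
D <- data.frame(
  gene    = c("gene1", "gene2", "gene3"),
  symbol  = c("A", "B", "C"),
  sample1 = c(0, 0, 0),
  sample2 = c(0, 10, 0),
  sample3 = c(0, 2, 0),
  sample4 = c(0, 0, 0),
  stringsAsFactors = FALSE
)

# Assumption: the count columns are exactly those whose names start with "sample".
sample_cols <- grep("^sample", names(D))

# rowSums() needs numeric input, so pass only the sample columns.
keep <- rowSums(D[, sample_cols, drop = FALSE]) > 0

filtered <- D[keep, ]
print(filtered)
```

Selecting columns by name rather than by position keeps the filter correct if annotation columns are added or reordered later.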

chris Jhon ▴ 260

Hi Alex,

Thank you. However, I got an error due to the memory limit:

    Error: memory exhausted (limit reached?)

In addition, I have one column that has no numerical values (e.g. the gene name). Will rowSums only work on numeric columns?

On 7/11/13, Alessandro Brozzi <alessandro.brozzi at gmail.com> wrote:
> Let's call your data frame D; then:
>
>     D[ rowSums(D) > 0 , ]
>
> alex
>
> On Thu, Jul 11, 2013 at 10:56 AM, chris Jhon <cjhon217 at gmail.com> wrote:
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
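On the second question: `rowSums()` expects a numeric matrix or a data frame whose columns are all numeric, so a text column such as the gene name makes it stop with an error. A minimal sketch of one workaround, selecting the numeric columns with `is.numeric` instead of by position (the toy data frame below is hypothetical, modelled on the question):

```r
# Hypothetical toy data frame: two text columns plus numeric counts.
D <- data.frame(
  gene    = c("gene1", "gene2", "gene3"),
  symbol  = c("A", "B", "C"),
  sample1 = c(0, 0, 0),
  sample2 = c(0, 10, 0),
  stringsAsFactors = FALSE
)

# rowSums() on the whole data frame fails because of the text columns;
# capture the error message instead of stopping.
err <- tryCatch(rowSums(D), error = function(e) conditionMessage(e))
print(err)

# Keep only the numeric columns, whatever their position.
is_num <- vapply(D, is.numeric, logical(1))
totals <- rowSums(D[, is_num, drop = FALSE])
print(totals)
```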

chris Jhon ▴ 260

Thank you very much. However, I got an error:

    Error: cannot allocate vector of size 290 Kb

    > sessionInfo()
    R version 2.14.0 (2011-10-31)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
    [1] C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    loaded via a namespace (and not attached):
    [1] tools_2.14.0

On 7/11/13, Alessandro Brozzi <alessandro.brozzi at gmail.com> wrote:
>     D[ rowSums( D[ , -c(1,2) ] ) > 0 , ]
>
> where 1 and 2 are the indices of the non-numerical columns
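Failing to allocate only 290 Kb suggests the session was already nearly out of memory before this command ran, rather than the subset itself being large. A minimal sketch for checking how much memory the data frame and the rest of the session are using, with base R only; the toy `D` and the `big_tmp` object are hypothetical stand-ins:

```r
# Hypothetical stand-in for the real data frame.
D <- data.frame(gene    = paste0("gene", 1:1000),
                symbol  = "X",
                sample1 = rep(0:4, 200),
                sample2 = rep(4:0, 200),
                stringsAsFactors = FALSE)

# Approximate memory footprint of D.
size_D <- object.size(D)
print(format(size_D, units = "Kb"))

# Size in bytes of every object in the global environment, largest first.
obj_sizes <- sort(vapply(ls(globalenv()),
                         function(nm) as.numeric(object.size(get(nm, envir = globalenv()))),
                         numeric(1)),
                  decreasing = TRUE)
print(head(obj_sizes))

# Remove objects that are no longer needed, then let R release memory.
big_tmp <- numeric(1e6)   # hypothetical leftover object
rm(big_tmp)
print(gc())
```

If the largest objects are much smaller than the machine's RAM, the problem is more likely the environment (other processes, a `ulimit`) than the data itself.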

chris Jhon ▴ 260

Hi Alex,

Thank you. I used this:

    pos = rep(NA, nrow(D))
    for (i in 1:nrow(D)) {
      if (sum(D[i, -c(1,2)]) > 0) pos[i] = i
    }

and it works; I can collect all the non-NA positions into a vector P:

    P = pos[!is.na(pos)]

But the subset does not work; I get the error

    Error: memory exhausted (limit reached?)

Any idea? Thank you very much.

On 7/11/13, Alessandro Brozzi <alessandro.brozzi at gmail.com> wrote:
> You might try to first allocate an empty vector of NA and then use a for loop:
>
>     pos = rep(NA, nrow(D))
>     for (i in 1:nrow(D)) {
>       if (sum(D[i, -c(1,2)]) > 0) pos[i] = i
>     }
>
> The subset:
>
>     D[ pos[!is.na(pos)] , ]
>
> should be what you are seeking.
>
> hth,
> alex
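For comparison, a minimal sketch of a vectorised equivalent of the loop: `rowSums()` over the numeric columns gives every row total in one call, and `which()` turns the logical test into the row indices that the loop collects into `P`. It assumes, as in the thread, that columns 1 and 2 are the non-numeric ones and that the intended test is a total greater than zero; the toy `D` is hypothetical.

```r
# Hypothetical toy data frame with the values from the question.
D <- data.frame(
  gene    = c("gene1", "gene2", "gene3"),
  symbol  = c("A", "B", "C"),
  sample1 = c(0, 0, 0),
  sample2 = c(0, 10, 0),
  sample3 = c(0, 2, 0),
  sample4 = c(0, 0, 0),
  stringsAsFactors = FALSE
)

# Loop version from the thread (one sum() call per row).
pos <- rep(NA, nrow(D))
for (i in seq_len(nrow(D))) {
  if (sum(D[i, -c(1, 2)]) > 0) pos[i] <- i
}
P_loop <- pos[!is.na(pos)]

# Vectorised version: all row totals at once, then the matching indices.
P_vec <- which(rowSums(D[, -c(1, 2)]) > 0)

print(P_loop)
print(P_vec)

# Either index vector subsets the data frame the same way.
print(D[P_vec, ])
```

Both versions produce the same indices; the vectorised one avoids building a one-row data frame on every iteration, which is slow for large tables, although it does not by itself reduce peak memory.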

Hi,

On Fri, Jul 12, 2013 at 1:40 AM, chris Jhon <cjhon217 at gmail.com> wrote:
> Hi Alex,
> Thank you, I used this
>
>     pos = rep(NA, nrow(D))
>     for (i in 1:nrow(D)) {
>       if (sum(D[i, -c(1,2)]) > 0) pos[i] = i
>     }
>
> and it is working, and I can collect all the non-NA positions into a vector P:
>
>     P = pos[!is.na(pos)]
>
> The subset does not work; I got the error
>
>     Error: memory exhausted (limit reached?)
>
> Any idea?
> Thank you very much

It's not clear what you are expecting to get out of this ... it seems pretty obvious that you do not have enough memory to process this data, yet you keep asking if anybody has "any idea". The simple idea is that you should use a machine with more RAM. If that is not sufficient advice, could you please specify a bit more clearly what you want help with?

You might consider reading in the file line by line. If the numbers/counts in the current line are not sufficient to keep that row of data for the next step of the analysis, simply ignore the line; otherwise, append the line to a new file that will contain the reduced set of data you want to work with. Once that's done, you can restart R, load in just the filtered data, and move on.

Still, if you keep running out of RAM trying to take a subset of a data.frame, then I suspect that actually *processing* the data, even after it has been filtered, will be problematic with the limited resources of the machine you are working with.

Finally, ask yourself whether it makes sense that you are running out of memory. How big is the data you are trying to process? How much RAM do you have?

Also, while you are sorting all of these things out, you might as well upgrade to the latest version of R (3.0.1). It seems you are working with R 2.14, which is a bit outdated, and if you'd like help with later parts of your analysis, you will be asked to upgrade to the latest and greatest version of R anyway.

-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech
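The line-by-line filtering suggested above could look roughly like the following sketch, which uses only base R connections. It assumes (hypothetically) a tab-separated text file with a header line, two non-numeric leading columns and numeric counts in the rest; `filter_file` and `chunk_size` are illustrative names, and the input file is a temporary file created for the example. Reading a block of lines at a time keeps memory use bounded by the chunk size rather than by the size of the file.

```r
# Hypothetical input file standing in for the real data.
infile  <- tempfile(fileext = ".tsv")
outfile <- tempfile(fileext = ".tsv")
writeLines(c("gene\tsymbol\tsample1\tsample2\tsample3\tsample4",
             "gene1\tA\t0\t0\t0\t0",
             "gene2\tB\t0\t10\t2\t0",
             "gene3\tC\t0\t0\t0\t0"),
           infile)

# Hypothetical helper: copy the header, then only rows whose counts sum to > 0.
filter_file <- function(infile, outfile, chunk_size = 10000L) {
  con_in  <- file(infile, open = "r")
  con_out <- file(outfile, open = "w")
  on.exit({ close(con_in); close(con_out) })

  # Copy the header line unchanged.
  writeLines(readLines(con_in, n = 1L), con_out)

  repeat {
    lines <- readLines(con_in, n = chunk_size)
    if (length(lines) == 0L) break
    fields <- strsplit(lines, "\t", fixed = TRUE)
    # Sum the count fields (everything after the first two columns).
    totals <- vapply(fields, function(f) sum(as.numeric(f[-(1:2)])), numeric(1))
    # which() keeps rows with a positive total and drops any NA totals.
    writeLines(lines[which(totals > 0)], con_out)
  }
  invisible(outfile)
}

filter_file(infile, outfile)

# In a fresh session, only the reduced file needs to be loaded.
D <- read.delim(outfile, stringsAsFactors = FALSE)
print(D)
```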
