basic R question
Jing Huang ▴ 380
@jing-huang-4737
Last seen 9.7 years ago
Hi Expert!

I am trying to get rid of the features that contain more than half of samples with NA data. Could you help me?

Here is the example.

> head(exprs(eset.vsn))
           GSM48598  GSM48617  GSM48600  GSM48601  GSM48602  GSM48604  GSM48607  GSM48608  GSM48614  GSM48616  GSM48599  GSM48603  GSM48605  GSM48606  GSM48609  GSM48615
1000_at   11.324097 10.881484 11.004342 10.649591 11.196320 11.405016 11.159627 11.144816 11.280698 11.008774 11.158076 10.083978 11.024338 10.641091 10.508528 10.836564
1001_at    6.503407  6.940207  7.776485  6.744207  8.393132  7.994422  7.417291  8.383466  8.278285  8.476460  7.702632  7.811951  6.955951  8.490921  6.632979  7.751188
1002_f_at  6.682602  6.320622        NA  7.503875  5.969647        NA  5.394164  6.293754  7.140539  5.791176  5.493847  8.379308  8.163210  6.900236  6.384235  6.620342
1003_s_at  8.113777  7.298421        NA        NA        NA        NA        NA        NA        NA        NA        NA  8.243218        NA        NA        NA        NA
1004_at    7.133844  7.052989  6.986067        NA        NA        NA        NA        NA  6.712877  7.176983        NA        NA  7.252336        NA        NA        NA
1005_at    8.600065 13.149781  8.636922  8.862644 11.790418  6.276165 10.805382  6.908298 12.894008 10.353165  8.762901  8.135442        NA  9.235085        NA 10.925639

Many many thanks

Jing
OHSU
Tim Triche ★ 4.2k
@tim-triche-3561
Last seen 3.6 years ago
United States
R> eset.vsn[ which(rowSums(is.na(exprs(eset.vsn))) < (0.5*dim(eset.vsn)[1])), ]

On Mon, Apr 23, 2012 at 3:55 PM, Jing Huang wrote:
> I am trying to get rid of the features that contain more than half of
> samples with NA data. Could you help me?
@steve-lianoglou-2771
Last seen 14 months ago
United States
Hi,

On Mon, Apr 23, 2012 at 6:55 PM, Jing Huang wrote:
> I am trying to get rid of the features that contain more than half of samples with NA data. Could you help me?
>
> Here is the example.
>
>> head(exprs(eset.vsn))

How about:

R> ae <- exprs(eset.vsn)
R> good <- ae[rowSums(is.na(ae)) / ncol(ae) < 0.5, ]

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
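For readers without the original data, the filter above can be tried on a small matrix. This is only a sketch: the matrix `ae` below is a hypothetical stand-in for `exprs(eset.vsn)` (made-up values, 4 features by 8 samples), and the names `na_frac` and `keep_half` are introduced here for illustration.

```r
# Hypothetical stand-in for exprs(eset.vsn): rows = features, columns = samples.
ae <- matrix(
  c(11.3, 10.9, 11.0, 10.6, 11.2, 11.4, 11.2, 11.1,   # no NA
     6.7,  6.3,   NA,  7.5,  6.0,   NA,  5.4,  6.3,   # 2 of 8 NA
     8.1,  7.3,   NA,   NA,   NA,   NA,   NA,   NA,   # 6 of 8 NA
     7.1,  7.1,  7.0,  6.7,   NA,   NA,   NA,   NA),  # 4 of 8 NA (exactly half)
  nrow = 4, byrow = TRUE,
  dimnames = list(c("1000_at", "1002_f_at", "1003_s_at", "1004_at"),
                  paste0("S", 1:8))
)

# Fraction of samples that are NA for each feature.
na_frac <- rowSums(is.na(ae)) / ncol(ae)

# Same test as in the reply above: keep features with fewer than half NA.
# drop = FALSE keeps the result a matrix even if only one row survives.
good <- ae[na_frac < 0.5, , drop = FALSE]

# Variant that also keeps features with exactly half NA.
keep_half <- ae[na_frac <= 0.5, , drop = FALSE]

rownames(good)
rownames(keep_half)
```

Note the boundary case: with `< 0.5`, a feature that is NA in exactly half of the samples is dropped. If "more than half" is meant literally, `<= 0.5` may be the closer match.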
Oops, you are right, I meant ncol(eset.vsn), a.k.a. dim(eset.vsn)[2], in the above. Not sure how the [2] became a [1] en route from R to Gmail. derp

On Mon, Apr 23, 2012 at 4:08 PM, Steve Lianoglou wrote:
> How about:
>
> R> ae <- exprs(eset.vsn)
> R> good <- ae[rowSums(is.na(ae)) / ncol(ae) < 0.5, ]
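A small sketch of why the [1] versus [2] distinction matters. The matrix `x` is hypothetical (20 features by 4 samples, random values), chosen so that there are more rows than columns, as is typical for expression data.

```r
# Hypothetical matrix: 20 features (rows) x 4 samples (columns).
set.seed(1)
x <- matrix(rnorm(80), nrow = 20, ncol = 4)
x[1, ]    <- NA   # feature 1: all 4 samples missing
x[2, 1:3] <- NA   # feature 2: 3 of 4 samples missing
x[3, 1]   <- NA   # feature 3: 1 of 4 samples missing

n_na <- rowSums(is.na(x))   # NA count per feature, at most ncol(x)

# Threshold from the number of rows: 0.5 * 20 = 10, which no per-row
# NA count can reach here, so no feature is filtered out.
keep_rows_threshold <- n_na < 0.5 * dim(x)[1]

# Threshold from the number of columns: 0.5 * 4 = 2, which removes
# features 1 and 2.
keep_cols_threshold <- n_na < 0.5 * ncol(x)

sum(keep_rows_threshold)
sum(keep_cols_threshold)
```

The per-feature NA count is bounded by the number of samples, so the cutoff has to be derived from `ncol()`, i.e. `dim()[2]`.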
Many THANKS!! Works great.

Jing

From: "Tim Triche, Jr."
Date: Mon, 23 Apr 2012 16:49:05 -0700
Subject: Re: [BioC] basic R question

> Oops, you are right, I meant ncol(eset.vsn) a.k.a. dim(eset.vsn)[2] in the above.