Bioconductor Digest, Vol 115, Issue 22
@weng-khong-lim-5513
Last seen 9.6 years ago
Help

"bioconductor-request@r-project.org" <bioconductor-request@r-project.org> wrote:

>Send Bioconductor mailing list submissions to
>	bioconductor@r-project.org
>
>To subscribe or unsubscribe via the World Wide Web, visit
>	https://stat.ethz.ch/mailman/listinfo/bioconductor
>or, via email, send a message with subject or body 'help' to
>	bioconductor-request@r-project.org
>
>You can reach the person managing the list at
>	bioconductor-owner@r-project.org
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of Bioconductor digest..."
>
>
>Today's Topics:
>
>   1. Open-rank faculty position, Dept of Biostatistics, Virginia
>      Commonwealth University (Kellie J Archer/FS/VCU)
>   2. Re: question on easyRNASeq developer version (Yanju Zhang)
>   3. Re: question on easyRNASeq developer version (Nicolas Delhomme)
>   4. Re: question on easyRNASeq developer version (Yanju Zhang)
>   5. Re: Error of GTF Annotation in easyRNASeq (Nicolas Delhomme)
>   6. Feature request in readVcf (Sean Davis)
>   7. Re: Feature request in readVcf (Tim Triche, Jr.)
>   8. Re: GO annotation (Marc Carlson)
>   9. Re: GO annotation (Srinivasan, Sathish K)
>  10. Is normalization in edgeR required for small RNA sequencing
>      data? (Daniela Lopes Paim Pinto)
>  11. NGS public data analysis (Jill Pleasance)
>  12. Re: GO annotation (KJ Lim)
>  13. Analysis of public GEO datasets - NGS (Jill [guest])
>  14. Re: Is normalization in edgeR required for small RNA
>      sequencing data? (Mark Robinson)
>  15.
Euro Bioc Devel 2012 Zurich CH -- Dec 13-14 2012 -- > registration open (Mark Robinson) > > >--------------------------------------------------------------------- - > >Message: 1 >Date: Fri, 21 Sep 2012 09:45:27 -0400 >From: Kellie J Archer/FS/VCU <kjarcher@vcu.edu> >To: bioconductor@r-project.org >Subject: [BioC] Open-rank faculty position, Dept of Biostatistics, > Virginia Commonwealth University >Message-ID: > <of6c640745.b02aba54-on85257a80.004b92b2-85257a80.004b92c1@vcu.edu> >Content-Type: text/plain; charset="ISO-8859-1" > > >The Department of Biostatistics at Virginia Commonwealth University >(VCU) is >seeking to fill a tenured/tenure-eligible faculty position at the level >of >assistant, associate, or full professor. We are seeking applicants with >training and research interest in the design and statistical analysis >of >high-throughput genomic data (e.g., next generation sequencing, >microarray, >proteomic technologies), bioinformatics, computational biology, or >closely >related area. Additionally, applicants should have collaborative >research >experience. Primary responsibilities include teaching and advising >graduate > students as well as conducting independent methodological research. In >addition, the successful applicant will be expected to collaborate with >other VCU investigators in related fields in obtaining extramural grant > support. > >The Department of Biostatistics has a 40+ year history in the VCU >School of >Medicine and is committed to excellence in both biostatistical research >and >graduate education. The department offers both M.S. and Ph.D. programs >in >Biostatistics, including a concentration in Genomic Biostatistics, a >M.S. >in Clinical Research in Biostatistics, and a Master of Public Health. 
>Our > biostatistics faculty, students, and staff collaborate with clinical >investigators on the Medical College of Virginia Campus (which includes >the >Schools of Medicine, Dentistry, Pharmacy, Nursing, and Allied Health) >in a >wide variety of biomedical research projects. Located in Richmond, >Virginia, >VCU has established relationships with the Virginia Department of >Health as > well as local and regional health departments. > > Qualifications: For all levels, candidates should have a Ph.D. in >biostatistics, statistics or related field, demonstrated experience in >the >analyses of high-throughput genomic or proteomic data, familiarity with >statistical programming environments for analyzing such data, and >excellent > oral and written communication skills. > > By Level of Appointment: > > Full Professor: Applicants should have an established track record > publishing in peer-reviewed journals, have national or international >prominence in their area of expertise, and have demonstrated experience > obtaining extramural research support. > >Associate Professor: Applicants should have an established track record > publishing in peer-reviewed journals and have demonstrated experience > obtaining extramural research support. > > Assistant Professor: Applicants should have at least two years of >experience beyond completion of their degree program and must >demonstrate > excellent oral and written communication skills. > >All candidates should have demonstrated experience working in and >fostering >a diverse faculty, staff, and student environment or commitment to do >so as >a faculty member at VCU. Potential candidates can submit >applications, >including a statement of research, teaching philosophy, curriculum >vitae and >contact information for three professional references, via mail ??? to >Yvonne >Hargrove, Department of Biostatistics, Virginia Commonwealth >University, >P.O. Box 980032, Richmond, VA 23298-0032 ??? or by e-mail >to > yfhargro@vcu.edu. 
> >Virginia Commonwealth University is an equal opportunity/affirmative >action >employer. Women, minorities and persons with disabilities are >encouraged to > apply. > Kellie J. Archer, Ph.D. > Associate Professor, Department of Biostatistics > Director, VCU Massey Cancer Center Biostatistics Shared Resource > Virginia Commonwealth University > 830 East Main St., 718 > Richmond, VA 23298-0032 > phone: (804) 827-2039 > fax: (804) 828-8900 > e-mail: kjarcher@vcu.edu > website: www.people.vcu.edu/~kjarcher > > >------------------------------ > >Message: 2 >Date: Fri, 21 Sep 2012 16:32:58 +0200 >From: Yanju Zhang <hollandorange.yanju@gmail.com> >To: Nicolas Delhomme <delhomme@embl.de> >Cc: bioconductor@r-project.org >Subject: Re: [BioC] question on easyRNASeq developer version >Message-ID: > <cabnzwf6nfm0eqs_ht3n5mcqcn=mbfkt+gutycobsm2gwyaycow@mail.gmail.com> >Content-Type: text/plain > >Hi Nico >As mentioned in SEQAnswers, I also encountered this problem: > >> "Error in mk_singleBracketReplacementValue(x, value) : >> 'value' must be a CompressedIntegerList object" > >In my bam files, the reads are with different length. > >I am expecting the solution. If you need more information, please let >me know. > >Best wishes >Yanju > > [[alternative HTML version deleted]] > > > >------------------------------ > >Message: 3 >Date: Fri, 21 Sep 2012 16:37:13 +0200 >From: Nicolas Delhomme <delhomme@embl.de> >To: Yanju Zhang <hollandorange.yanju@gmail.com> >Cc: bioconductor@r-project.org >Subject: Re: [BioC] question on easyRNASeq developer version >Message-ID: <aafb721c-86af-49bf-acf8-47ae5dba320d@embl.de> >Content-Type: text/plain; charset=us-ascii > >Hi Yanju, > >Would you be OK with uploading the file that creates the problem on my >dropbox? If that's OK, I'll send you a link to it. That would be best >for me to reproduce the error. 
> >Cheers, > >Nico > >--------------------------------------------------------------- >Nicolas Delhomme > >Genome Biology Computational Support > >European Molecular Biology Laboratory > >Tel: +49 6221 387 8310 >Email: nicolas.delhomme@embl.de >Meyerhofstrasse 1 - Postfach 10.2209 >69102 Heidelberg, Germany >--------------------------------------------------------------- > > > > > >On Sep 21, 2012, at 4:32 PM, Yanju Zhang wrote: > >> Hi Nico >> As mentioned in SEQAnswers, I also encountered this problem: >> > "Error in mk_singleBracketReplacementValue(x, value) : >> > 'value' must be a CompressedIntegerList object" >> >> In my bam files, the reads are with different length. >> >> I am expecting the solution. If you need more information, please let >me know. >> >> >> Best wishes >> Yanju >> >> > > > >------------------------------ > >Message: 4 >Date: Fri, 21 Sep 2012 17:54:21 +0200 >From: Yanju Zhang <hollandorange.yanju@gmail.com> >To: Nicolas Delhomme <delhomme@embl.de> >Cc: bioconductor@r-project.org >Subject: Re: [BioC] question on easyRNASeq developer version >Message-ID: > <cabnzwf45nimkwg3guqjwbyqqqhxbw7qmpd4dx_s9wxjh13vo2w@mail.gmail.com> >Content-Type: text/plain > >Hi Nico, > >It is fine with me to upload my bam file. Please give me the link. > >Best wishes >Yanju > >Code + error + sessionInfo >> chr.sizes=as.list(seqlengths(Hsapiens)) >> bamfiles=dir(getwd(),pattern="*.sorted.bam$") >> RNASeq<- easyRNASeq(filesDirectory=getwd(), >+ organism="Hsapiens", >+ chr.sizes=chr.sizes, >+ #readLength=80L, >+ annotationMethod="biomaRt", >+ format="bam", >+ count="genes", >+ summarization="geneModels", >+ filenames=bamfiles[1], >+ outputFormat="RNAseq" >+ ) > > > >Checking arguments... >Fetching annotations... >Computing gene models... >Summarizing counts... >Processing test.sorted.bam >Updating the read length information. >The reads have been trimmed. >Minimum length of 50 bp. >Maximum length of 80 bp. 
>Error in mk_singleBracketReplacementValue(x, value) : > 'value' must be a CompressedIntegerList object >In addition: Warning messages: >1: The use of the list for providing chromosome sizes has been >deprecated. >Use a named numeric vector instead. >2: In easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", >chr.sizes >= chr.sizes, : >There are 16696 synthetic exons as determined from your annotation that >overlap! This implies that some reads will be counted more than once! >Is >that really what you want? >3: In fetchCoverage(rnaSeq, format = format, filename = filename, >filter = >filter, : >You enforce UCSC chromosome conventions, however the provided >alignments >are not compliant. Correcting it. >4: In fetchCoverage(rnaSeq, format = format, filename = filename, >filter = >filter, : >Not all the chromosome names in your chromosome size list 'chr.sizes' >are >present in your read file(s) (aln or bam). >5: In fetchCoverage(rnaSeq, format = format, filename = filename, >filter = >filter, : > The available chromosomes in both your read file(s) (aln or bam) and >'chr.sizes' list were restricted to their common term. >These are: chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, >chr17, >chr18, chr19, chr2, chr20, chr21, chr22, chr3, chr4, chr5, chr6, chr7, >chr8, chr9, chrM, chrX, chrY. 
> >> sessionInfo() >R version 2.15.1 (2012-06-22) >Platform: x86_64-unknown-linux-gnu (64-bit) > >locale: >[1] C > >attached base packages: >[1] parallel stats graphics grDevices utils datasets methods >[8] base > >other attached packages: > [1] BSgenome.Hsapiens.UCSC.hg19_1.3.19 easyRNASeq_1.3.14 > [3] ShortRead_1.15.11 latticeExtra_0.6-24 > [5] RColorBrewer_1.0-5 Rsamtools_1.9.30 > [7] DESeq_1.9.14 lattice_0.20-6 > [9] locfit_1.5-8 BSgenome_1.25.8 >[11] GenomicRanges_1.9.65 Biostrings_2.25.12 >[13] IRanges_1.15.44 edgeR_2.99.8 >[15] limma_3.12.1 biomaRt_2.13.2 >[17] Biobase_2.17.7 genomeIntervals_1.13.3 >[19] BiocGenerics_0.3.1 intervals_0.13.3 > >loaded via a namespace (and not attached): > [1] AnnotationDbi_1.18.1 DBI_0.2-5 RCurl_1.91-1 > [4] RSQLite_0.11.1 XML_3.9-4 annotate_1.34.1 > [7] bitops_1.0-4.1 genefilter_1.38.0 geneplotter_1.35.1 >[10] grid_2.15.1 hwriter_1.3 splines_2.15.1 >[13] stats4_2.15.1 survival_2.36-14 xtable_1.7-0 >[16] zlibbioc_1.2.0 > > > > >On 21 September 2012 16:37, Nicolas Delhomme <delhomme@embl.de> wrote: > >> Hi Yanju, >> >> Would you be OK with uploading the file that creates the problem on >my >> dropbox? If that's OK, I'll send you a link to it. That would be best >for >> me to reproduce the error. >> >> Cheers, >> >> Nico >> >> --------------------------------------------------------------- >> Nicolas Delhomme >> >> Genome Biology Computational Support >> >> European Molecular Biology Laboratory >> >> Tel: +49 6221 387 8310 >> Email: nicolas.delhomme@embl.de >> Meyerhofstrasse 1 - Postfach 10.2209 >> 69102 Heidelberg, Germany >> --------------------------------------------------------------- >> >> >> >> >> >> On Sep 21, 2012, at 4:32 PM, Yanju Zhang wrote: >> >> > Hi Nico >> > As mentioned in SEQAnswers, I also encountered this problem: >> > > "Error in mk_singleBracketReplacementValue(x, value) : >> > > 'value' must be a CompressedIntegerList object" >> > >> > In my bam files, the reads are with different length. 
>> > >> > I am expecting the solution. If you need more information, please >let me >> know. >> > >> > >> > Best wishes >> > Yanju >> > >> > >> >> > > [[alternative HTML version deleted]] > > > >------------------------------ > >Message: 5 >Date: Fri, 21 Sep 2012 17:59:35 +0200 >From: Nicolas Delhomme <delhomme@embl.de> >To: Nicolas Delhomme <delhomme@embl.de> >Cc: Dadi Gao <dgao3450@uni.sydney.edu.au>, bioconductor@r-project.org >Subject: Re: [BioC] Error of GTF Annotation in easyRNASeq >Message-ID: <e8f88d62-00ca-45b9-97b9-6b8dfa8cc0a7@embl.de> >Content-Type: text/plain; charset=us-ascii > >Hi Dadi, > >The error comes from a change of API that affects a package I depend >upon. I've contacted the maintainer and will let you know once it gets >fixed. I might take some time though (~ 1 week). > >Cheers, > >Nico > >--------------------------------------------------------------- >Nicolas Delhomme > >Genome Biology Computational Support > >European Molecular Biology Laboratory > >Tel: +49 6221 387 8310 >Email: nicolas.delhomme@embl.de >Meyerhofstrasse 1 - Postfach 10.2209 >69102 Heidelberg, Germany >--------------------------------------------------------------- > > > > > >On Sep 21, 2012, at 10:26 AM, Nicolas Delhomme wrote: > >> Moreover, to make sure that this is not a package conflict can you >please NOT load the library(RnaSeqTutorial). You do not need it to run >easyRNASeq. So your script should read: >> >> library(easyRNASeq) >> library(BSgenome.Mmusculus.UCSC.mm9) >> >> setwd("/home/gao/RNA") >> >> ## the "." is your current directory. 
>> count.table <- easyRNASeq(".", >> pattern=".sorted.bam$", >> organism="MMusculus", >> annotationMethod="gtf", >> annotationFile="mm9gene.gtf", >> count="genes", >> summarization="geneModels", >> normalize=TRUE >> ) >> >> >> Cheers, >> >> Nico >> >> --------------------------------------------------------------- >> Nicolas Delhomme >> >> Genome Biology Computational Support >> >> European Molecular Biology Laboratory >> >> Tel: +49 6221 387 8310 >> Email: nicolas.delhomme@embl.de >> Meyerhofstrasse 1 - Postfach 10.2209 >> 69102 Heidelberg, Germany >> --------------------------------------------------------------- >> >> >> >> >> >> On Sep 21, 2012, at 10:19 AM, Nicolas Delhomme wrote: >> >>> Dear Dadi, >>> >>> I will need a little more information from you. In addition, it's >best if you post such emails to the bioconductor mailing list (which >I've Cced, so please "answer to all" when you reply.). See there for >subscribing: http://www.bioconductor.org/help/mailing-list/. What I >need to know from you first is what is described in that page: >http://www.bioconductor.org/help/mailing-list/posting-guide/ mainly >under the sections preparing and composing. In essence I need to know >what version of R and bioconductor packages you are using. >>> >>> Then, installing you package in the installation directory of an >existing package is not the safest. You might a) disrupt that package >functionality b) possibly lose your data if that package gets updated. >You'd rather move your RNA folder to you home directory and use that >directory, e.g. /home/gao/RNA instead. Using the setwd command, you can >make that your current working dir. >>> >>> So the following two blocks results in the same: >>> >>> setwd("/home/gao/RNA") >>> >>> ## the "." is your current directory. 
>>> count.table <- easyRNASeq(".", >>> pattern=".sorted.bam$", >>> organism="MMusculus", >>> annotationMethod="gtf", >>> annotationFile="mm9gene.gtf", >>> count="genes", >>> summarization="geneModels", >>> normalize=TRUE >>> ) >>> >>> Or: >>> >>> count.table <- easyRNASeq("/home/gao/RNA", >>> pattern=".sorted.bam$", >>> organism="MMusculus", >>> annotationMethod="gtf", >>> annotationFile="/home/gao/RNA /mm9gene.gtf", >>> count="genes", >>> summarization="geneModels", >>> normalize=TRUE >>> ) >>> >>> Now, for the error, can you please tell me more about what aligner >you used for you data , whether it is Paired-End or not and finally >whether the reads have been dynamically trimmed (i.e. if reads of >variable length are expected ) or not? >>> >>> What actually bothers me in your error is that it mentions: >>> >>> easyRNASeq(system.file("miRNA", package = "RnaSeqTutorial"), >>> >>> instead of >>> >>> easyRNASeq(system.file("RNA", package="RnaSeqTutorial"), >>> >>> i.e. miRNA instead of RNA. So to make sure that the error is >reproducible can you move your RNA folder to a different directory and >re-run the command as above? I don't expect this to solve the error >though, but at least we'd have a "cleaner" setup for reproducing it. >>> >>> Best, >>> >>> Nico >>> >>> --------------------------------------------------------------- >>> Nicolas Delhomme >>> >>> Genome Biology Computational Support >>> >>> European Molecular Biology Laboratory >>> >>> Tel: +49 6221 387 8310 >>> Email: nicolas.delhomme@embl.de >>> Meyerhofstrasse 1 - Postfach 10.2209 >>> 69102 Heidelberg, Germany >>> --------------------------------------------------------------- >>> >>> >>> >>> >>> >>> On Sep 21, 2012, at 2:41 AM, Dadi Gao wrote: >>> >>>> Dear Dr. Delhomme, >>>> >>>> I'm currently study gene expression pattern from deep sequencing >data of mouse blood cell using easyRNASeq. >>>> I created a folder called "RNA" under R package RnaSeqTutorial >path. 
>>>> Within this folder, I put 3 RNA-seq data files called >"N1.sorted.bam", "N2.sorted.bam" and "N3.sorted.bam", with their bam >index files. >>>> It also contains a GTF file for mouse gene annotation downloaded >from UCSC, called "mm9gene.gtf". >>>> >>>> I'm using the following code to normalize the gene expression: >>>> >>>> library(easyRNASeq) >>>> library(RnaSeqTutorial) >>>> library(BSgenome.Mmusculus.UCSC.mm9) >>>> >>>> count.table <- easyRNASeq(system.file("RNA", >package="RnaSeqTutorial"), >>>> pattern=".sorted.bam$", >>>> organism="MMusculus", >>>> annotationMethod="gtf", >>>> annotationFile=system.file("RNA", "mm9gene.gtf", >package="RnaSeqTutorial"), >>>> count="genes", >>>> summarization="geneModels", >>>> normalize=TRUE >>>> ) >>>> >>>> But this runs with an error as: >>>> >>>> Checking arguments... >>>> Fetching annotations... >>>> Read 962651 records >>>> Warning message: >>>> In easyRNASeq(system.file("miRNA", package = "RnaSeqTutorial"), : >>>> You enforce UCSC chromosome conventions, however the provided >chromosome size list is not compliant. Correcting it. >>>> Error in all.annotation[all.annotation$type %in% annotation.type, ] >: >>>> error in evaluating the argument 'i' in selecting a method for >function '[': Error in all.annotation$type %in% annotation.type : >>>> error in evaluating the argument 'x' in selecting a method for >function '%in%': Error in function (classes, fdef, mtable) : >>>> unable to find an inherited method for function "annotation", for >signature "Genome_intervals_stranded" >>>> >>>> Did I do something wrong? 
>>>> >>>> Sincerely yours, >>>> Dadi Gao >>>> >>>> Bioinformatics Group >>>> Centenary Institute >>>> Building 93, Royal Prince Alfred Hospital >>>> Missenden Rd, Camperdown, NSW 2050 >>>> Australia >>>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > >------------------------------ > >Message: 6 >Date: Fri, 21 Sep 2012 13:51:54 -0400 >From: Sean Davis <sdavis2@mail.nih.gov> >To: bioconductor@r-project.org >Subject: [BioC] Feature request in readVcf >Message-ID: > <caneavbmmjpjryjx8axvcdsd1xw4qbajd=s7b3oaep_ypr- mxwa@mail.gmail.com=""> >Content-Type: text/plain > >Hi, Val. > >Is there in interest in simply ignoring unknown INFO and GENOTYPE >fields >when parsing VCF files, perhaps by issuing a warning instead of an >error? >There are LOTS of malformed VCF files out there. In some cases, they >are >not useable, but in this case, they can be perfectly useable if these >unknown fields are simply ignored. > >> dat = readVcf('tmp.gatk.vcf',genome='hg19') >Error: scanVcf: record 22 INFO 'KGPilot123' not found > path: >/Volumes/CCRBioinfo/projects/RosenbergImmuneStudy/staging/tmp.gatk.vc f > >Thanks, >Sean > > [[alternative HTML version deleted]] > > > >------------------------------ > >Message: 7 >Date: Fri, 21 Sep 2012 13:50:29 -0700 >From: "Tim Triche, Jr." <tim.triche@gmail.com> >To: Sean Davis <sdavis2@mail.nih.gov> >Cc: bioconductor@r-project.org >Subject: Re: [BioC] Feature request in readVcf >Message-ID: > <cac+n9bvynua_smpcsfwwxmmu3ovzqnc314bq6+c=eyeozmgpmw@mail.gmail.com> >Content-Type: text/plain > >+1 > >thanks, > >--t > > > >On Fri, Sep 21, 2012 at 10:51 AM, Sean Davis <sdavis2@mail.nih.gov> >wrote: > >> Hi, Val. 
>> >> Is there in interest in simply ignoring unknown INFO and GENOTYPE >fields >> when parsing VCF files, perhaps by issuing a warning instead of an >error? >> There are LOTS of malformed VCF files out there. In some cases, >they are >> not useable, but in this case, they can be perfectly useable if these >> unknown fields are simply ignored. >> >> > dat = readVcf('tmp.gatk.vcf',genome='hg19') >> Error: scanVcf: record 22 INFO 'KGPilot123' not found >> path: >> >/Volumes/CCRBioinfo/projects/RosenbergImmuneStudy/staging/tmp.gatk.vc f >> >> Thanks, >> Sean >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > >-- >*A model is a lie that helps you see the truth.* >* >* >Howard >Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> > > [[alternative HTML version deleted]] > > > >------------------------------ > >Message: 8 >Date: Fri, 21 Sep 2012 17:49:26 -0700 >From: Marc Carlson <mcarlson@fhcrc.org> >To: bioconductor@r-project.org >Subject: Re: [BioC] GO annotation >Message-ID: <505D0B16.80804@fhcrc.org> >Content-Type: text/plain; charset=ISO-8859-1; format=flowed > >Hi Lim, > >First of all it all depends on what you have for gene identifiers. If >you are like most people you will have entrez gene IDs. So for now I >will assume you have those. 
> >## So lets further assume you are working with humans and just choose >the 1st two entrez gene IDs so that we can make a (hopefully >meaningful) >example >ids = c("1","2") >## now load the org library for humans >library(org.Hs.eg.db) >## then you can call select to extract your GO IDs like this: >select(org.Hs.eg.db, keys = ids, cols = "GO", keytype = "ENTREZID") > >Now one thing to notice is that if you have some other kind of >identifier, then your keytype argument will have to be set to a >different value. And hopefully, the kind of ID you are using, is >present in the package that you have to search... See the manual page >for select for more information. > >?select > > From your question, I also recognize that you may not be able to do >this because it sounds like you might be using a more unusual organism >and not be using something commonplace like human. Well don't give up >just yet, because we may be able to help you there too. You can look >at >the manual page for the function makeOrgPackageFromNCBI to learn how >you >can try to generate an org package from just the taxonomy ID (which you >can look up on NCBIs website). If the data is available at NCBI, then >you should be able to generate a package from NCBI that will match your >organism of choice. > >?makeOrgPackageFromNCBI > > >Does that answer your question? > > > Marc > > > >On 09/21/2012 02:35 AM, KJ Lim wrote: >> Dear Bioconductor community, >> >> Good day. >> >> I did the differential expression analysis for my RNA-Seq data with >edgeR >> package. I have a list of differentially expressed genes now and I >would >> like to find the GO terms for the genes. >> >> I have been reading and searching around for the right package. But, >I >> found that several packages are developed based on model species. >Could the >> community kindly please suggest me what GO annotation package I can >use for >> non-model species; plant RNA-Seq data? >> >> Thank you very much and have a nice weekend. 
>> >> Best regards, >> KJ Lim >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor > > > >------------------------------ > >Message: 9 >Date: Sat, 22 Sep 2012 01:21:29 -0400 >From: "Srinivasan, Sathish K" <ssrinivasan@med.miami.edu> >To: Marc Carlson <mcarlson@fhcrc.org>, "bioconductor@r-project.org" > <bioconductor@r-project.org> >Subject: Re: [BioC] GO annotation >Message-ID: ><4EB41664F8279A4CA870A740E038CFEA3E1E7EB0CB@MEDEXMB05.ad.med.miami.ed u> > >Content-Type: text/plain; charset="us-ascii" > >Hi Marc, >Could you suggest a go-to literature reference on annotating genomic >data using bioconductor packages, probably a book or any of its kind. >Thanks > >~Sathish > >-----Original Message----- >From: bioconductor-bounces@r-project.org >[mailto:bioconductor-bounces@r-project.org] On Behalf Of Marc Carlson >Sent: Friday, September 21, 2012 8:49 PM >To: bioconductor@r-project.org >Subject: Re: [BioC] GO annotation > >Hi Lim, > >First of all it all depends on what you have for gene identifiers. If >you are like most people you will have entrez gene IDs. So for now I >will assume you have those. > >## So lets further assume you are working with humans and just choose >the 1st two entrez gene IDs so that we can make a (hopefully >meaningful) example ids = c("1","2") ## now load the org library for >humans >library(org.Hs.eg.db) >## then you can call select to extract your GO IDs like this: >select(org.Hs.eg.db, keys = ids, cols = "GO", keytype = "ENTREZID") > >Now one thing to notice is that if you have some other kind of >identifier, then your keytype argument will have to be set to a >different value. And hopefully, the kind of ID you are using, is >present in the package that you have to search... 
See the manual page >for select for more information. > >?select > >From your question, I also recognize that you may not be able to do >this because it sounds like you might be using a more unusual organism >and not be using something commonplace like human. Well don't give up >just yet, because we may be able to help you there too. You can look >at the manual page for the function makeOrgPackageFromNCBI to learn how >you can try to generate an org package from just the taxonomy ID (which >you can look up on NCBIs website). If the data is available at NCBI, >then you should be able to generate a package from NCBI that will match >your organism of choice. > >?makeOrgPackageFromNCBI > > >Does that answer your question? > > > Marc > > > >On 09/21/2012 02:35 AM, KJ Lim wrote: >> Dear Bioconductor community, >> >> Good day. >> >> I did the differential expression analysis for my RNA-Seq data with >> edgeR package. I have a list of differentially expressed genes now >and >> I would like to find the GO terms for the genes. >> >> I have been reading and searching around for the right package. But, >I >> found that several packages are developed based on model species. >> Could the community kindly please suggest me what GO annotation >> package I can use for non-model species; plant RNA-Seq data? >> >> Thank you very much and have a nice weekend. 
>> >> Best regards, >> KJ Lim >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >_______________________________________________ >Bioconductor mailing list >Bioconductor@r-project.org >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor > > > >------------------------------ > >Message: 10 >Date: Sat, 22 Sep 2012 00:23:09 +0200 >From: Daniela Lopes Paim Pinto <d.lopespaimpinto@sssup.it> >To: bioconductor@r-project.org >Subject: [BioC] Is normalization in edgeR required for small RNA > sequencing data? >Message-ID: > <cahk- ra1aw+tcnkajrax7wfjychh22whn2sfwhat2wmu-="WWdsA@mail.gmail.com"> >Content-Type: text/plain > >Dear All, > >I am PhD student, currently working on differential expression analysis >of >my smallRNA library deep sequencing data and trying to identify >differentially expressed miRNAs, using edgeR package. I have 24 >different >samples with 2 biological replicates (48 libraries). I am performing >multiple group comparison using GLM method and also Anova-like test to >idetify DE miRNAs among the different groups of my samples. >My question is : > >Do I need to normalize my input data using *calcNormFactors() *once I >set >my DGE list or I could proceed without any normalization? I assume in >this >case that edgeR performs a default normallization when it is >"calculating >library sizes from column totals"? > > >I would really appreciate any suggestion on this! 
>
>
>Thanks in advance,
>
>
>Daniela
>
>	[[alternative HTML version deleted]]
>
>
>
>------------------------------
>
>Message: 11
>Date: Sat, 22 Sep 2012 08:03:32 +0100
>From: Jill Pleasance <jpleasance@gmail.com>
>To: bioconductor@r-project.org
>Subject: [BioC] NGS public data analysis
>Message-ID:
>	<cajhs7oj_ebg2vgbx=fpyvxxkbdq-tdzzmdbmsymnau=r6Gp6Lw@mail.gmail.com>
>Content-Type: text/plain
>
>Hi
>
>I am writing as I am trying to analyse NGS data from public data (GEO)
>specifically datasets such as one sample per time point. The raw (somewhat
>processed data) is 3 samples at different time points where "The read count
>at exon, splice-junction, transcript and gene levels were summarized and
>normalized to relative abundance in Fragments Per Kilobase of exon model
>per Million (FPKM) in order to compare transcription level among samples."
>
>The authors of this paper then used The differentially expressed
>transcripts were identified using M-A based random sampling method
>implemented in DEGseq package in BioConductor (
>http://bioconductor.org/packages/2.5/bioc/html/DEGseq.html). The
>transcripts were further filtered at > 2-fold change and a minimum read
>count of 50 in either condition.
>
>I have read through some of your posts where Gordon suggested using a
>simple excel formula to achieve fold changes when you don't have replicates
>
>*lib.size1 <- sum(y1)*
>
>>>* lib.size2 <- sum(y2)*
>
>>>* logFC <- log2((y1+0.5)/(lib.size1+0.5)/(y2+0.5)*(lib.size2+0.5))*
>
>* *
>
>Is this something I could apply to the current analysis? I have 3 files -
>with gene ID and counts (one for each sample) and if genes are not listed
>in the sample files - I assume the counts are zero. Would you have any
>suggestions as to what to do with these zero count reads?
>
>
>I am trying to avoid learning how to script write at the moment to see if
>this analysis works and obviously when

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
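For the fold-change question quoted in Message 11 of the digest, here is a minimal base-R sketch of the formula as it appears in that message. The two data frames, the gene names and the counts are made up for illustration. It also assumes the per-sample files hold raw read counts rather than FPKM values, and that a gene missing from a file really does mean zero reads, as the poster supposes.

```r
# Hypothetical per-sample tables (gene ID + raw read count); in practice these
# would be read from the per-sample files, e.g. with read.delim().
s1 <- data.frame(gene = c("geneA", "geneB", "geneC"), count = c(100, 20, 5))
s2 <- data.frame(gene = c("geneA", "geneC", "geneD"), count = c(50, 10, 40))

# Full outer join on gene ID: genes absent from one sample come back as NA ...
counts <- merge(s1, s2, by = "gene", all = TRUE, suffixes = c(".1", ".2"))
# ... and are set to zero here (an assumption, see above).
counts[is.na(counts)] <- 0

y1 <- counts$count.1
y2 <- counts$count.2

lib.size1 <- sum(y1)
lib.size2 <- sum(y2)

# Formula as quoted in the message; the 0.5 offset keeps the ratio finite
# when one of the two counts is zero.
logFC <- log2((y1 + 0.5) / (lib.size1 + 0.5) / (y2 + 0.5) * (lib.size2 + 0.5))
names(logFC) <- counts$gene
logFC
```

With the offset in place, zero counts need no special handling beyond the join. Without replicates this only gives a descriptive fold change, not a test of significance.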
Transcription Sequencing miRNA SmallRNA Annotation Normalization GO Cancer Organism edgeR • 1.8k views
@steve-lianoglou-2771
Last seen 14 months ago
United States
On Saturday, September 22, 2012, Weng Khong LIM wrote:

> Help

With what?

(Also, please don't include all of the contents of your digest emails when you send messages to the list.)

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact