Bioconductor Digest, Vol 115, Issue 22

Weng Khong LIM ▴ 10

@weng-khong-lim-5513

Last seen 9.6 years ago

Help "bioconductor-request@r-project.org" <bioconductor-request@r-project.org> wrote: >Send Bioconductor mailing list submissions to > bioconductor@r-project.org > >To subscribe or unsubscribe via the World Wide Web, visit > https://stat.ethz.ch/mailman/listinfo/bioconductor >or, via email, send a message with subject or body 'help' to > bioconductor-request@r-project.org > >You can reach the person managing the list at > bioconductor-owner@r-project.org > >When replying, please edit your Subject line so it is more specific >than "Re: Contents of Bioconductor digest..." > > >Today's Topics: > > 1. Open-rank faculty position, Dept of Biostatistics, Virginia > Commonwealth University (Kellie J Archer/FS/VCU) > 2. Re: question on easyRNASeq developer version (Yanju Zhang) > 3. Re: question on easyRNASeq developer version (Nicolas Delhomme) > 4. Re: question on easyRNASeq developer version (Yanju Zhang) > 5. Re: Error of GTF Annotation in easyRNASeq (Nicolas Delhomme) > 6. Feature request in readVcf (Sean Davis) > 7. Re: Feature request in readVcf (Tim Triche, Jr.) > 8. Re: GO annotation (Marc Carlson) > 9. Re: GO annotation (Srinivasan, Sathish K) > 10. Is normalization in edgeR required for small RNA sequencing > data? (Daniela Lopes Paim Pinto) > 11. NGS public data analysis (Jill Pleasance) > 12. Re: GO annotation (KJ Lim) > 13. Analysis of public GEO datasets - NGS (Jill [guest]) > 14. Re: Is normalization in edgeR required for small RNA > sequencing data? (Mark Robinson) > 15. 
Euro Bioc Devel 2012 Zurich CH -- Dec 13-14 2012 -- > registration open (Mark Robinson) > > >--------------------------------------------------------------------- - > >Message: 1 >Date: Fri, 21 Sep 2012 09:45:27 -0400 >From: Kellie J Archer/FS/VCU <kjarcher@vcu.edu> >To: bioconductor@r-project.org >Subject: [BioC] Open-rank faculty position, Dept of Biostatistics, > Virginia Commonwealth University >Message-ID: > <of6c640745.b02aba54-on85257a80.004b92b2-85257a80.004b92c1@vcu.edu> >Content-Type: text/plain; charset="ISO-8859-1" > > >The Department of Biostatistics at Virginia Commonwealth University >(VCU) is >seeking to fill a tenured/tenure-eligible faculty position at the level >of >assistant, associate, or full professor. We are seeking applicants with >training and research interest in the design and statistical analysis >of >high-throughput genomic data (e.g., next generation sequencing, >microarray, >proteomic technologies), bioinformatics, computational biology, or >closely >related area. Additionally, applicants should have collaborative >research >experience. Primary responsibilities include teaching and advising >graduate > students as well as conducting independent methodological research. In >addition, the successful applicant will be expected to collaborate with >other VCU investigators in related fields in obtaining extramural grant > support. > >The Department of Biostatistics has a 40+ year history in the VCU >School of >Medicine and is committed to excellence in both biostatistical research >and >graduate education. The department offers both M.S. and Ph.D. programs >in >Biostatistics, including a concentration in Genomic Biostatistics, a >M.S. >in Clinical Research in Biostatistics, and a Master of Public Health. 
>Our > biostatistics faculty, students, and staff collaborate with clinical >investigators on the Medical College of Virginia Campus (which includes >the >Schools of Medicine, Dentistry, Pharmacy, Nursing, and Allied Health) >in a >wide variety of biomedical research projects. Located in Richmond, >Virginia, >VCU has established relationships with the Virginia Department of >Health as > well as local and regional health departments. > > Qualifications: For all levels, candidates should have a Ph.D. in >biostatistics, statistics or related field, demonstrated experience in >the >analyses of high-throughput genomic or proteomic data, familiarity with >statistical programming environments for analyzing such data, and >excellent > oral and written communication skills. > > By Level of Appointment: > > Full Professor: Applicants should have an established track record > publishing in peer-reviewed journals, have national or international >prominence in their area of expertise, and have demonstrated experience > obtaining extramural research support. > >Associate Professor: Applicants should have an established track record > publishing in peer-reviewed journals and have demonstrated experience > obtaining extramural research support. > > Assistant Professor: Applicants should have at least two years of >experience beyond completion of their degree program and must >demonstrate > excellent oral and written communication skills. > >All candidates should have demonstrated experience working in and >fostering >a diverse faculty, staff, and student environment or commitment to do >so as >a faculty member at VCU. Potential candidates can submit >applications, >including a statement of research, teaching philosophy, curriculum >vitae and >contact information for three professional references, via mail - to >Yvonne >Hargrove, Department of Biostatistics, Virginia Commonwealth >University, >P.O. Box 980032, Richmond, VA 23298-0032 - or by e-mail >to > yfhargro@vcu.edu. 
> >Virginia Commonwealth University is an equal opportunity/affirmative >action >employer. Women, minorities and persons with disabilities are >encouraged to > apply. > Kellie J. Archer, Ph.D. > Associate Professor, Department of Biostatistics > Director, VCU Massey Cancer Center Biostatistics Shared Resource > Virginia Commonwealth University > 830 East Main St., 718 > Richmond, VA 23298-0032 > phone: (804) 827-2039 > fax: (804) 828-8900 > e-mail: kjarcher@vcu.edu > website: www.people.vcu.edu/~kjarcher > > >------------------------------ > >Message: 2 >Date: Fri, 21 Sep 2012 16:32:58 +0200 >From: Yanju Zhang <hollandorange.yanju@gmail.com> >To: Nicolas Delhomme <delhomme@embl.de> >Cc: bioconductor@r-project.org >Subject: Re: [BioC] question on easyRNASeq developer version >Message-ID: > <cabnzwf6nfm0eqs_ht3n5mcqcn=mbfkt+gutycobsm2gwyaycow@mail.gmail.com> >Content-Type: text/plain > >Hi Nico >As mentioned in SEQAnswers, I also encountered this problem: > >> "Error in mk_singleBracketReplacementValue(x, value) : >> 'value' must be a CompressedIntegerList object" > >In my bam files, the reads are with different length. > >I am expecting the solution. If you need more information, please let >me know. > >Best wishes >Yanju > > [[alternative HTML version deleted]] > > > >------------------------------ > >Message: 3 >Date: Fri, 21 Sep 2012 16:37:13 +0200 >From: Nicolas Delhomme <delhomme@embl.de> >To: Yanju Zhang <hollandorange.yanju@gmail.com> >Cc: bioconductor@r-project.org >Subject: Re: [BioC] question on easyRNASeq developer version >Message-ID: <aafb721c-86af-49bf-acf8-47ae5dba320d@embl.de> >Content-Type: text/plain; charset=us-ascii > >Hi Yanju, > >Would you be OK with uploading the file that creates the problem on my >dropbox? If that's OK, I'll send you a link to it. That would be best >for me to reproduce the error. 
> >Cheers, > >Nico > >--------------------------------------------------------------- >Nicolas Delhomme > >Genome Biology Computational Support > >European Molecular Biology Laboratory > >Tel: +49 6221 387 8310 >Email: nicolas.delhomme@embl.de >Meyerhofstrasse 1 - Postfach 10.2209 >69102 Heidelberg, Germany >--------------------------------------------------------------- > > > > > >On Sep 21, 2012, at 4:32 PM, Yanju Zhang wrote: > >> Hi Nico >> As mentioned in SEQAnswers, I also encountered this problem: >> > "Error in mk_singleBracketReplacementValue(x, value) : >> > 'value' must be a CompressedIntegerList object" >> >> In my bam files, the reads are with different length. >> >> I am expecting the solution. If you need more information, please let >me know. >> >> >> Best wishes >> Yanju >> >> > > > >------------------------------ > >Message: 4 >Date: Fri, 21 Sep 2012 17:54:21 +0200 >From: Yanju Zhang <hollandorange.yanju@gmail.com> >To: Nicolas Delhomme <delhomme@embl.de> >Cc: bioconductor@r-project.org >Subject: Re: [BioC] question on easyRNASeq developer version >Message-ID: > <cabnzwf45nimkwg3guqjwbyqqqhxbw7qmpd4dx_s9wxjh13vo2w@mail.gmail.com> >Content-Type: text/plain > >Hi Nico, > >It is fine with me to upload my bam file. Please give me the link. > >Best wishes >Yanju > >Code + error + sessionInfo >> chr.sizes=as.list(seqlengths(Hsapiens)) >> bamfiles=dir(getwd(),pattern="*.sorted.bam$") >> RNASeq<- easyRNASeq(filesDirectory=getwd(), >+ organism="Hsapiens", >+ chr.sizes=chr.sizes, >+ #readLength=80L, >+ annotationMethod="biomaRt", >+ format="bam", >+ count="genes", >+ summarization="geneModels", >+ filenames=bamfiles[1], >+ outputFormat="RNAseq" >+ ) > > > >Checking arguments... >Fetching annotations... >Computing gene models... >Summarizing counts... >Processing test.sorted.bam >Updating the read length information. >The reads have been trimmed. >Minimum length of 50 bp. >Maximum length of 80 bp. 
>Error in mk_singleBracketReplacementValue(x, value) : > 'value' must be a CompressedIntegerList object >In addition: Warning messages: >1: The use of the list for providing chromosome sizes has been >deprecated. >Use a named numeric vector instead. >2: In easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", >chr.sizes >= chr.sizes, : >There are 16696 synthetic exons as determined from your annotation that >overlap! This implies that some reads will be counted more than once! >Is >that really what you want? >3: In fetchCoverage(rnaSeq, format = format, filename = filename, >filter = >filter, : >You enforce UCSC chromosome conventions, however the provided >alignments >are not compliant. Correcting it. >4: In fetchCoverage(rnaSeq, format = format, filename = filename, >filter = >filter, : >Not all the chromosome names in your chromosome size list 'chr.sizes' >are >present in your read file(s) (aln or bam). >5: In fetchCoverage(rnaSeq, format = format, filename = filename, >filter = >filter, : > The available chromosomes in both your read file(s) (aln or bam) and >'chr.sizes' list were restricted to their common term. >These are: chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, >chr17, >chr18, chr19, chr2, chr20, chr21, chr22, chr3, chr4, chr5, chr6, chr7, >chr8, chr9, chrM, chrX, chrY. 
> >> sessionInfo() >R version 2.15.1 (2012-06-22) >Platform: x86_64-unknown-linux-gnu (64-bit) > >locale: >[1] C > >attached base packages: >[1] parallel stats graphics grDevices utils datasets methods >[8] base > >other attached packages: > [1] BSgenome.Hsapiens.UCSC.hg19_1.3.19 easyRNASeq_1.3.14 > [3] ShortRead_1.15.11 latticeExtra_0.6-24 > [5] RColorBrewer_1.0-5 Rsamtools_1.9.30 > [7] DESeq_1.9.14 lattice_0.20-6 > [9] locfit_1.5-8 BSgenome_1.25.8 >[11] GenomicRanges_1.9.65 Biostrings_2.25.12 >[13] IRanges_1.15.44 edgeR_2.99.8 >[15] limma_3.12.1 biomaRt_2.13.2 >[17] Biobase_2.17.7 genomeIntervals_1.13.3 >[19] BiocGenerics_0.3.1 intervals_0.13.3 > >loaded via a namespace (and not attached): > [1] AnnotationDbi_1.18.1 DBI_0.2-5 RCurl_1.91-1 > [4] RSQLite_0.11.1 XML_3.9-4 annotate_1.34.1 > [7] bitops_1.0-4.1 genefilter_1.38.0 geneplotter_1.35.1 >[10] grid_2.15.1 hwriter_1.3 splines_2.15.1 >[13] stats4_2.15.1 survival_2.36-14 xtable_1.7-0 >[16] zlibbioc_1.2.0 > > > > >On 21 September 2012 16:37, Nicolas Delhomme <delhomme@embl.de> wrote: > >> Hi Yanju, >> >> Would you be OK with uploading the file that creates the problem on >my >> dropbox? If that's OK, I'll send you a link to it. That would be best >for >> me to reproduce the error. >> >> Cheers, >> >> Nico >> >> --------------------------------------------------------------- >> Nicolas Delhomme >> >> Genome Biology Computational Support >> >> European Molecular Biology Laboratory >> >> Tel: +49 6221 387 8310 >> Email: nicolas.delhomme@embl.de >> Meyerhofstrasse 1 - Postfach 10.2209 >> 69102 Heidelberg, Germany >> --------------------------------------------------------------- >> >> >> >> >> >> On Sep 21, 2012, at 4:32 PM, Yanju Zhang wrote: >> >> > Hi Nico >> > As mentioned in SEQAnswers, I also encountered this problem: >> > > "Error in mk_singleBracketReplacementValue(x, value) : >> > > 'value' must be a CompressedIntegerList object" >> > >> > In my bam files, the reads are with different length. 
>> > >> > I am expecting the solution. If you need more information, please >let me >> know. >> > >> > >> > Best wishes >> > Yanju >> > >> > >> >> > > [[alternative HTML version deleted]] > > > >------------------------------ > >Message: 5 >Date: Fri, 21 Sep 2012 17:59:35 +0200 >From: Nicolas Delhomme <delhomme@embl.de> >To: Nicolas Delhomme <delhomme@embl.de> >Cc: Dadi Gao <dgao3450@uni.sydney.edu.au>, bioconductor@r-project.org >Subject: Re: [BioC] Error of GTF Annotation in easyRNASeq >Message-ID: <e8f88d62-00ca-45b9-97b9-6b8dfa8cc0a7@embl.de> >Content-Type: text/plain; charset=us-ascii > >Hi Dadi, > >The error comes from a change of API that affects a package I depend >upon. I've contacted the maintainer and will let you know once it gets >fixed. I might take some time though (~ 1 week). > >Cheers, > >Nico > >--------------------------------------------------------------- >Nicolas Delhomme > >Genome Biology Computational Support > >European Molecular Biology Laboratory > >Tel: +49 6221 387 8310 >Email: nicolas.delhomme@embl.de >Meyerhofstrasse 1 - Postfach 10.2209 >69102 Heidelberg, Germany >--------------------------------------------------------------- > > > > > >On Sep 21, 2012, at 10:26 AM, Nicolas Delhomme wrote: > >> Moreover, to make sure that this is not a package conflict can you >please NOT load the library(RnaSeqTutorial). You do not need it to run >easyRNASeq. So your script should read: >> >> library(easyRNASeq) >> library(BSgenome.Mmusculus.UCSC.mm9) >> >> setwd("/home/gao/RNA") >> >> ## the "." is your current directory. 
>> count.table <- easyRNASeq(".", >> pattern=".sorted.bam$", >> organism="MMusculus", >> annotationMethod="gtf", >> annotationFile="mm9gene.gtf", >> count="genes", >> summarization="geneModels", >> normalize=TRUE >> ) >> >> >> Cheers, >> >> Nico >> >> --------------------------------------------------------------- >> Nicolas Delhomme >> >> Genome Biology Computational Support >> >> European Molecular Biology Laboratory >> >> Tel: +49 6221 387 8310 >> Email: nicolas.delhomme@embl.de >> Meyerhofstrasse 1 - Postfach 10.2209 >> 69102 Heidelberg, Germany >> --------------------------------------------------------------- >> >> >> >> >> >> On Sep 21, 2012, at 10:19 AM, Nicolas Delhomme wrote: >> >>> Dear Dadi, >>> >>> I will need a little more information from you. In addition, it's >best if you post such emails to the bioconductor mailing list (which >I've Cced, so please "answer to all" when you reply.). See there for >subscribing: http://www.bioconductor.org/help/mailing-list/. What I >need to know from you first is what is described in that page: >http://www.bioconductor.org/help/mailing-list/posting-guide/ mainly >under the sections preparing and composing. In essence I need to know >what version of R and bioconductor packages you are using. >>> >>> Then, installing you package in the installation directory of an >existing package is not the safest. You might a) disrupt that package >functionality b) possibly lose your data if that package gets updated. >You'd rather move your RNA folder to you home directory and use that >directory, e.g. /home/gao/RNA instead. Using the setwd command, you can >make that your current working dir. >>> >>> So the following two blocks results in the same: >>> >>> setwd("/home/gao/RNA") >>> >>> ## the "." is your current directory. 
>>> count.table <- easyRNASeq(".", >>> pattern=".sorted.bam$", >>> organism="MMusculus", >>> annotationMethod="gtf", >>> annotationFile="mm9gene.gtf", >>> count="genes", >>> summarization="geneModels", >>> normalize=TRUE >>> ) >>> >>> Or: >>> >>> count.table <- easyRNASeq("/home/gao/RNA", >>> pattern=".sorted.bam$", >>> organism="MMusculus", >>> annotationMethod="gtf", >>> annotationFile="/home/gao/RNA /mm9gene.gtf", >>> count="genes", >>> summarization="geneModels", >>> normalize=TRUE >>> ) >>> >>> Now, for the error, can you please tell me more about what aligner >you used for you data , whether it is Paired-End or not and finally >whether the reads have been dynamically trimmed (i.e. if reads of >variable length are expected ) or not? >>> >>> What actually bothers me in your error is that it mentions: >>> >>> easyRNASeq(system.file("miRNA", package = "RnaSeqTutorial"), >>> >>> instead of >>> >>> easyRNASeq(system.file("RNA", package="RnaSeqTutorial"), >>> >>> i.e. miRNA instead of RNA. So to make sure that the error is >reproducible can you move your RNA folder to a different directory and >re-run the command as above? I don't expect this to solve the error >though, but at least we'd have a "cleaner" setup for reproducing it. >>> >>> Best, >>> >>> Nico >>> >>> --------------------------------------------------------------- >>> Nicolas Delhomme >>> >>> Genome Biology Computational Support >>> >>> European Molecular Biology Laboratory >>> >>> Tel: +49 6221 387 8310 >>> Email: nicolas.delhomme@embl.de >>> Meyerhofstrasse 1 - Postfach 10.2209 >>> 69102 Heidelberg, Germany >>> --------------------------------------------------------------- >>> >>> >>> >>> >>> >>> On Sep 21, 2012, at 2:41 AM, Dadi Gao wrote: >>> >>>> Dear Dr. Delhomme, >>>> >>>> I'm currently study gene expression pattern from deep sequencing >data of mouse blood cell using easyRNASeq. >>>> I created a folder called "RNA" under R package RnaSeqTutorial >path. 
>>>> Within this folder, I put 3 RNA-seq data files called >"N1.sorted.bam", "N2.sorted.bam" and "N3.sorted.bam", with their bam >index files. >>>> It also contains a GTF file for mouse gene annotation downloaded >from UCSC, called "mm9gene.gtf". >>>> >>>> I'm using the following code to normalize the gene expression: >>>> >>>> library(easyRNASeq) >>>> library(RnaSeqTutorial) >>>> library(BSgenome.Mmusculus.UCSC.mm9) >>>> >>>> count.table <- easyRNASeq(system.file("RNA", >package="RnaSeqTutorial"), >>>> pattern=".sorted.bam$", >>>> organism="MMusculus", >>>> annotationMethod="gtf", >>>> annotationFile=system.file("RNA", "mm9gene.gtf", >package="RnaSeqTutorial"), >>>> count="genes", >>>> summarization="geneModels", >>>> normalize=TRUE >>>> ) >>>> >>>> But this runs with an error as: >>>> >>>> Checking arguments... >>>> Fetching annotations... >>>> Read 962651 records >>>> Warning message: >>>> In easyRNASeq(system.file("miRNA", package = "RnaSeqTutorial"), : >>>> You enforce UCSC chromosome conventions, however the provided >chromosome size list is not compliant. Correcting it. >>>> Error in all.annotation[all.annotation$type %in% annotation.type, ] >: >>>> error in evaluating the argument 'i' in selecting a method for >function '[': Error in all.annotation$type %in% annotation.type : >>>> error in evaluating the argument 'x' in selecting a method for >function '%in%': Error in function (classes, fdef, mtable) : >>>> unable to find an inherited method for function "annotation", for >signature "Genome_intervals_stranded" >>>> >>>> Did I do something wrong? 
>>>> >>>> Sincerely yours, >>>> Dadi Gao >>>> >>>> Bioinformatics Group >>>> Centenary Institute >>>> Building 93, Royal Prince Alfred Hospital >>>> Missenden Rd, Camperdown, NSW 2050 >>>> Australia >>>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > >------------------------------ > >Message: 6 >Date: Fri, 21 Sep 2012 13:51:54 -0400 >From: Sean Davis <sdavis2@mail.nih.gov> >To: bioconductor@r-project.org >Subject: [BioC] Feature request in readVcf >Message-ID: > <caneavbmmjpjryjx8axvcdsd1xw4qbajd=s7b3oaep_ypr-mxwa@mail.gmail.com> >Content-Type: text/plain > >Hi, Val. > >Is there in interest in simply ignoring unknown INFO and GENOTYPE >fields >when parsing VCF files, perhaps by issuing a warning instead of an >error? >There are LOTS of malformed VCF files out there. In some cases, they >are >not useable, but in this case, they can be perfectly useable if these >unknown fields are simply ignored. > >> dat = readVcf('tmp.gatk.vcf',genome='hg19') >Error: scanVcf: record 22 INFO 'KGPilot123' not found > path: >/Volumes/CCRBioinfo/projects/RosenbergImmuneStudy/staging/tmp.gatk.vcf > >Thanks, >Sean > > [[alternative HTML version deleted]] > > > >------------------------------ > >Message: 7 >Date: Fri, 21 Sep 2012 13:50:29 -0700 >From: "Tim Triche, Jr." <tim.triche@gmail.com> >To: Sean Davis <sdavis2@mail.nih.gov> >Cc: bioconductor@r-project.org >Subject: Re: [BioC] Feature request in readVcf >Message-ID: > <cac+n9bvynua_smpcsfwwxmmu3ovzqnc314bq6+c=eyeozmgpmw@mail.gmail.com> >Content-Type: text/plain > >+1 > >thanks, > >--t > > > >On Fri, Sep 21, 2012 at 10:51 AM, Sean Davis <sdavis2@mail.nih.gov> >wrote: > >> Hi, Val. 
>> >> Is there in interest in simply ignoring unknown INFO and GENOTYPE >fields >> when parsing VCF files, perhaps by issuing a warning instead of an >error? >> There are LOTS of malformed VCF files out there. In some cases, >they are >> not useable, but in this case, they can be perfectly useable if these >> unknown fields are simply ignored. >> >> > dat = readVcf('tmp.gatk.vcf',genome='hg19') >> Error: scanVcf: record 22 INFO 'KGPilot123' not found >> path: >> >/Volumes/CCRBioinfo/projects/RosenbergImmuneStudy/staging/tmp.gatk.vcf >> >> Thanks, >> Sean >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > >-- >*A model is a lie that helps you see the truth.* >* >* >Howard >Skipper<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf> > > [[alternative HTML version deleted]] > > > >------------------------------ > >Message: 8 >Date: Fri, 21 Sep 2012 17:49:26 -0700 >From: Marc Carlson <mcarlson@fhcrc.org> >To: bioconductor@r-project.org >Subject: Re: [BioC] GO annotation >Message-ID: <505D0B16.80804@fhcrc.org> >Content-Type: text/plain; charset=ISO-8859-1; format=flowed > >Hi Lim, > >First of all it all depends on what you have for gene identifiers. If >you are like most people you will have entrez gene IDs. So for now I >will assume you have those. 
> >## So lets further assume you are working with humans and just choose >the 1st two entrez gene IDs so that we can make a (hopefully >meaningful) >example >ids = c("1","2") >## now load the org library for humans >library(org.Hs.eg.db) >## then you can call select to extract your GO IDs like this: >select(org.Hs.eg.db, keys = ids, cols = "GO", keytype = "ENTREZID") > >Now one thing to notice is that if you have some other kind of >identifier, then your keytype argument will have to be set to a >different value. And hopefully, the kind of ID you are using, is >present in the package that you have to search... See the manual page >for select for more information. > >?select > > From your question, I also recognize that you may not be able to do >this because it sounds like you might be using a more unusual organism >and not be using something commonplace like human. Well don't give up >just yet, because we may be able to help you there too. You can look >at >the manual page for the function makeOrgPackageFromNCBI to learn how >you >can try to generate an org package from just the taxonomy ID (which you >can look up on NCBIs website). If the data is available at NCBI, then >you should be able to generate a package from NCBI that will match your >organism of choice. > >?makeOrgPackageFromNCBI > > >Does that answer your question? > > > Marc > > > >On 09/21/2012 02:35 AM, KJ Lim wrote: >> Dear Bioconductor community, >> >> Good day. >> >> I did the differential expression analysis for my RNA-Seq data with >edgeR >> package. I have a list of differentially expressed genes now and I >would >> like to find the GO terms for the genes. >> >> I have been reading and searching around for the right package. But, >I >> found that several packages are developed based on model species. >Could the >> community kindly please suggest me what GO annotation package I can >use for >> non-model species; plant RNA-Seq data? >> >> Thank you very much and have a nice weekend. 
>> >> Best regards, >> KJ Lim >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor > > > >------------------------------ > >Message: 9 >Date: Sat, 22 Sep 2012 01:21:29 -0400 >From: "Srinivasan, Sathish K" <ssrinivasan@med.miami.edu> >To: Marc Carlson <mcarlson@fhcrc.org>, "bioconductor@r-project.org" > <bioconductor@r-project.org> >Subject: Re: [BioC] GO annotation >Message-ID: ><4EB41664F8279A4CA870A740E038CFEA3E1E7EB0CB@MEDEXMB05.ad.med.miami.edu> > >Content-Type: text/plain; charset="us-ascii" > >Hi Marc, >Could you suggest a go-to literature reference on annotating genomic >data using bioconductor packages, probably a book or any of its kind. >Thanks > >~Sathish > >-----Original Message----- >From: bioconductor-bounces@r-project.org >[mailto:bioconductor-bounces@r-project.org] On Behalf Of Marc Carlson >Sent: Friday, September 21, 2012 8:49 PM >To: bioconductor@r-project.org >Subject: Re: [BioC] GO annotation > >Hi Lim, > >First of all it all depends on what you have for gene identifiers. If >you are like most people you will have entrez gene IDs. So for now I >will assume you have those. > >## So lets further assume you are working with humans and just choose >the 1st two entrez gene IDs so that we can make a (hopefully >meaningful) example ids = c("1","2") ## now load the org library for >humans >library(org.Hs.eg.db) >## then you can call select to extract your GO IDs like this: >select(org.Hs.eg.db, keys = ids, cols = "GO", keytype = "ENTREZID") > >Now one thing to notice is that if you have some other kind of >identifier, then your keytype argument will have to be set to a >different value. And hopefully, the kind of ID you are using, is >present in the package that you have to search... 
See the manual page >for select for more information. > >?select > >From your question, I also recognize that you may not be able to do >this because it sounds like you might be using a more unusual organism >and not be using something commonplace like human. Well don't give up >just yet, because we may be able to help you there too. You can look >at the manual page for the function makeOrgPackageFromNCBI to learn how >you can try to generate an org package from just the taxonomy ID (which >you can look up on NCBIs website). If the data is available at NCBI, >then you should be able to generate a package from NCBI that will match >your organism of choice. > >?makeOrgPackageFromNCBI > > >Does that answer your question? > > > Marc > > > >On 09/21/2012 02:35 AM, KJ Lim wrote: >> Dear Bioconductor community, >> >> Good day. >> >> I did the differential expression analysis for my RNA-Seq data with >> edgeR package. I have a list of differentially expressed genes now >and >> I would like to find the GO terms for the genes. >> >> I have been reading and searching around for the right package. But, >I >> found that several packages are developed based on model species. >> Could the community kindly please suggest me what GO annotation >> package I can use for non-model species; plant RNA-Seq data? >> >> Thank you very much and have a nice weekend. 
>> >> Best regards, >> KJ Lim >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >_______________________________________________ >Bioconductor mailing list >Bioconductor@r-project.org >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: >http://news.gmane.org/gmane.science.biology.informatics.conductor > > > >------------------------------ > >Message: 10 >Date: Sat, 22 Sep 2012 00:23:09 +0200 >From: Daniela Lopes Paim Pinto <d.lopespaimpinto@sssup.it> >To: bioconductor@r-project.org >Subject: [BioC] Is normalization in edgeR required for small RNA > sequencing data? >Message-ID: > <cahk-ra1aw+tcnkajrax7wfjychh22whn2sfwhat2wmu-=WWdsA@mail.gmail.com> >Content-Type: text/plain > >Dear All, > >I am PhD student, currently working on differential expression analysis >of >my smallRNA library deep sequencing data and trying to identify >differentially expressed miRNAs, using edgeR package. I have 24 >different >samples with 2 biological replicates (48 libraries). I am performing >multiple group comparison using GLM method and also Anova-like test to >identify DE miRNAs among the different groups of my samples. >My question is : > >Do I need to normalize my input data using *calcNormFactors() *once I >set >my DGE list or I could proceed without any normalization? I assume in >this >case that edgeR performs a default normalization when it is >"calculating >library sizes from column totals"? > > >I would really appreciate any suggestion on this! 
> > >Thanks in advance, > > >Daniela > > [[alternative HTML version deleted]] > > > >------------------------------ > >Message: 11 >Date: Sat, 22 Sep 2012 08:03:32 +0100 >From: Jill Pleasance <jpleasance@gmail.com> >To: bioconductor@r-project.org >Subject: [BioC] NGS public data analysis >Message-ID: > <cajhs7oj_ebg2vgbx=fpyvxxkbdq-tdzzmdbmsymnau=r6Gp6Lw@mail.gmail.com> >Content-Type: text/plain > >Hi > > > >I am writing as I am trying to analyse NGS data from public data (GEO) >specifically datasets such as one sample per time point. The raw >(somewhat >processed data) is 3 samples at different time points where "The read >count >at exon, splice-junction, transcript and gene levels were summarized >and >normalized to relative abundance in Fragments Per Kilobase of exon >model >per Million (FPKM) in order to compare transcription level among >samples." > > > >The authors of this paper then used The differentially expressed >transcripts were identified using M-A based random sampling method >implemented in DEGseq package in BioConductor ( >http://bioconductor.org/packages/2.5/bioc/html/DEGseq.html). The >transcripts were further filtered at > 2-fold change and a minimum read >count of 50 in either condition. > > > >I have read through some of your posts where Gordon suggested using a >simple excel formula to achieve fold changes when you don't have >replicates > >*lib.size1 <- sum(y1)* > >>>* lib.size2 <- sum(y2)* > >>>* logFC <- log2((y1+0.5)/(lib.size1+0.5)/(y2+0.5)*(lib.size2+0.5))* > >* * > >Is this something I could apply to the current analysis? I have 3 >files - >with gene ID and counts (one for each sample) and if genes are not >listed >in the sample files - I assume the counts are zero. Would you have any >suggestions as to what to do with these zero count reads? > > >I am trying to avoid learning how to script write at the moment to see >if >this analysis works and obviously when -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. 
[[alternative HTML version deleted]]

Transcription Sequencing miRNA SmallRNA Annotation Normalization GO Cancer Organism edgeR • 1.8k views

updated 11.6 years ago by Steve Lianoglou ★ 13k • written 11.6 years ago by Weng Khong LIM ▴ 10

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 14 months ago

United States

On Saturday, September 22, 2012, Weng Khong LIM wrote:
> Help

With what?

(Also, please don't include all of the contents of your digest emails when you send messages to the list.)

-- Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

11.6 years ago Steve Lianoglou ★ 13k
