Bioconductor Digest, Vol 106, Issue 21


Jing Huang ▴ 380

@jing-huang-4737


-----Original Message-----
From: bioconductor-bounces@r-project.org [mailto:bioconductor-bounces@r-project.org] On Behalf Of bioconductor-request@r-project.org
Sent: Thursday, December 22, 2011 3:00 AM
To: bioconductor at r-project.org
Subject: Bioconductor Digest, Vol 106, Issue 21

Send Bioconductor mailing list submissions to
    bioconductor at r-project.org

To subscribe or unsubscribe via the World Wide Web, visit
    https://stat.ethz.ch/mailman/listinfo/bioconductor
or, via email, send a message with subject or body 'help' to
    bioconductor-request at r-project.org

You can reach the person managing the list at
    bioconductor-owner at r-project.org

When replying, please edit your Subject line so it is more specific than "Re: Contents of Bioconductor digest..."

Today's Topics:

   1. Re: problems with pd.genomewidesnp.6 (MacDonald, James)
   2. Re: Affymetrix Mouse Gene 1.0 ST - Number of probes (MacDonald, James)
   3. Limma question (asas asasa)
   4. Re: "reverse" a set of nucleotides: from reverse to direct sense (Jane Merlevede)

----------------------------------------------------------------------

Message: 1
Date: Wed, 21 Dec 2011 08:56:44 -0500
From: "MacDonald, James" <jmacdon@med.umich.edu>
To: Sebastian Thieme <thieme at mi.fu-berlin.de>
Cc: bioconductor at r-project.org
Subject: Re: [BioC] problems with pd.genomewidesnp.6
Message-ID: <4EF1E59C.9040901 at med.umich.edu>
Content-Type: text/plain; charset="iso-8859-1"; format="flowed"

Hi Sebastian,

On 12/20/11 6:01 PM, Sebastian Thieme wrote:
> Hi at all,
>
> I have some problems with the pd.genomewidesnp.6 package and I hope
> some one can help me.
> The info with
> get(objects("package:pd.genomewidesnp.6")) is
>
> #Class........: AffySNPCNVPDInfo
> #Manufacturer.: Affymetrix
> #Genome Build.: HG19
> #Chip Geometry: 2572 rows x 2680 columns
>
> I want to match the man_fsetid of each probe to one gene, so I look
> in the gene_assoc part and take the gene with the minimum distance to
> the respective probe as the corresponding gene. My commands to get the
> raw information are:
>
> snp.f <- dbGetQuery(con6, "select * from featureSet")
> snp.f <- snp.f[, c("fsetid","man_fsetid","chrom","physical_pos","strand","cytoband","gene_assoc")]
>
> cn.f <- dbGetQuery(con6, "select * from featureSetCNV")
> cn.f <- cn.f[, c("fsetid","man_fsetid","chrom","chrom_start","strand","cytoband","gene_assoc")]
>
> snp6.f <- rbind(snp.f, cn.f)
>
> and then I process the gene_assoc part. The problem within the
> gene_assoc part is that there are genes which are not on the same
> chromosome as the respective probes, e.g.
>
> fsetid  man_fsetid  chrom  physical_pos  strand  cytoband
> 650443  CN_618877   12     93793083      -       q22
>
> gene_assoc
> ENST00000358888 // upstream // 315610 // Hs.112553 // RPL41 // 6171
> // ribosomal protein L41 /// ENST00000318066 // downstream // 8981 //
> Hs.524630 // UBE2N // 7334 // ubiquitin-conjugating enzyme E2N (UBC13
> homolog, yeast) /// NR_002212 // exon // 0 // --- // NUDT4P1 // 440672
> // nudix (nucleoside diphosphate linked moiety X)-type motif 4
> pseudogene 1 /// NM_199040 // CDS // 0 // Hs.506325 // NUDT4 // 11163
> // nudix (nucleoside diphosphate linked moiety X)-type motif 4
> /// NM_019094 // CDS // 0 // Hs.506325 // NUDT4 // 11163 // nudix
> (nucleoside diphosphate linked moiety X)-type motif 4
>
> The gene "NUDT4P1" is annotated on chromosome 1, not 12, and this is
> only one case. Another example is

In what build is that true? UCSC claims that NUDT4 and NUDT4P1 are overlapping, on chr12 (hg19).

Anyway, the larger point here is a discussion of what a SNP is, and how they are localized.
Essentially, a SNP is a single base that has been found to vary with a certain frequency in a population. They are localized by the flanking sequence, which means that in the case of a pseudogene (which may or may not be on the same chromosome), you will see the same flanking sequence and cannot reliably say where the SNP is really located. Since DNA chips work by binding to the SNP and its flanking sequence, you cannot say whether you have measured the gene, the pseudogene, or some combination thereof.

Listing all possibilities for the SNP location is therefore not a 'problem'; it just reflects our lack of precision.

Best,

Jim

> fsetid  man_fsetid     chrom  physical_pos  strand  cytoband
> 186938  SNP_A-4227519  12     31784081      -       p11.21
>
> gene_assoc
> ENST00000294419 // upstream // 14576 // Hs.10862 // AK3L1 // 205 //
> adenylate kinase 3-like 1 /// ENST00000412352 // upstream // 16012 //
> Hs.585084 // C12orf72 // 254013 // chromosome 12 open reading frame 72
> /// NM_013410 // upstream // 14564 // Hs.10862 // AK3L1 // 205 //
> adenylate kinase 3-like 1 /// NM_001135864 // upstream // 16012 //
> Hs.585084 // C12orf72 // 254013 // chromosome 12 open reading frame 72
>
> AK3L1 is annotated on chromosome 9, not 12. The corresponding Ensembl
> ID (ENST00000294419) is mapped to AK4-201, which is annotated on
> chromosome 1. These are only two examples; there are a lot more. Can
> someone help?
>
>
> best regards
>
> Basti
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues

------------------------------

Message: 2
Date: Wed, 21 Dec 2011 09:00:18 -0500
From: "MacDonald, James" <jmacdon@med.umich.edu>
To: "Sophie LAMARRE [guest]" <guest at bioconductor.org>
Cc: bioconductor at r-project.org, sophie.lamarre at insa-toulouse.fr
Subject: Re: [BioC] Affymetrix Mouse Gene 1.0 ST - Number of probes
Message-ID: <4EF1E672.7000905 at med.umich.edu>
Content-Type: text/plain; charset="iso-8859-1"; format="flowed"

Hi Sophie,

On 12/21/11 5:38 AM, Sophie LAMARRE [guest] wrote:
> Hello,
>
> I work on Affymetrix Mouse Gene 1.0 ST arrays.
>
> I used two methods to match my database with my probes. I compared the unique probes from the two methods after doing an RMA normalization:
>
> -> there were 34 760 probes (control probes and main probes) when I used R/Bioconductor. I downloaded the Unsupported Mouse Gene 1.0 ST Array CDF (Technical documentation -> Library Files) from the Affymetrix website in order to have the CDF files and to make my own CDF package.
> -> there were 35 556 probes (control probes and main probes) when I used Expression Console. I downloaded the Mouse Gene 1.0 ST Array, Analysis (Technical documentation -> Library Files) in order to have the files that Expression Console needs.
>
> => So I lost 796 probes. It's boring!
>
> Next, when I kept only main probes (after matching my database with the Affymetrix annotation file available on the Affymetrix website), I had:
> -> 28 104 probes with Bioconductor
> -> 28 856 probes with Expression Console
>
> => So there were 752 main probes that I did not have when I did my data analysis with Bioconductor. I'm worried because sometimes I am asked not to summarize probes, so I can't use Expression Console; I have to use Bioconductor. I lost a lot of probes.
>
> I asked my question to Affymetrix support and they answered:
>
> This difference can be due to a number of reasons.
>
> Firstly, the CDF file is the array layout information designed for 3' IVT array analysis, and is therefore not optimal for a WT array (the WT arrays use different library files, CLF and PGF). This is the reason why it is given an unsupported status (as seen in the name). This could explain the difference you see.
>
> Secondly, Bioconductor and Expression Console are different software, so the RMA algorithm may not work identically. Things like background correction, filtering and such might differ between these two programs.
>
> What do you think of Affymetrix support's answer? Personally, I don't think that the summarization (median polish) removes some probes. How could you explain the difference I found? What can I do to keep all the probes I need (main probes)?

The short answer is that Affy technical support is correct. There are a host of problems associated with using the affy package to analyze WT arrays, which is why the oligo and xps packages exist. You will be better served by switching to either oligo or xps for the analysis of these data.

Best,

Jim

> Thank you,
>
> Sophie LAMARRE
> Biostatistician - Toulouse (FRANCE)
>
> -- output of sessionInfo():
>
> R version 2.13.0 (2011-04-13)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252
> [4] LC_NUMERIC=C LC_TIME=French_France.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] affy_1.30.0 Biobase_2.12.2
>
> loaded via a namespace (and not attached):
> [1] affyio_1.20.0 preprocessCore_1.14.0 tools_2.13.0
>
> --
> Sent via the guest posting facility at bioconductor.org.

------------------------------

Message: 3
Date: Wed, 21 Dec 2011 17:09:29 +0200
From: asas asasa <asssaaaaf@gmail.com>
To: bioconductor at r-project.org
Subject: [BioC] Limma question
Message-ID: <caeq1hok5wmhj9edk8xfadc52no4crl8phmrugqow9dxzd5am=w at mail.gmail.com>
Content-Type: text/plain

Hello limma people,

I use limma in the separate channel analysis for two-color data, but there are some small problems:

1) While my differential expression analysis is fine after background correction ( backgroundCorrect(...) ), when this step is omitted the following error occurs:

Error in intraspotCorrelation(MA2, design) :
  Missing or infinite values found in M or A

So, MA$A,$G include NA values, but I don't understand why they appear and how to deal with them to avoid this error. My raw data doesn't include NA values.

2) I need the normalized intensities of my microarray results, so that they can be exported to other programs. Is the following formula a correct way to extract log intensities from MA data?

logR <- MA2$A + (0.5*MA2$M) # red = Cy5
logG <- MA2$A - (0.5*MA2$M) # green = Cy3

Thanks for your previous help,

Assaf

------------------------------

Message: 4
Date: Thu, 22 Dec 2011 11:46:53 +0100
From: Jane Merlevede <jane.merlevede@gmail.com>
To: Michael Lawrence <lawrence.michael at gene.com>
Cc: bioconductor at r-project.org
Subject: Re: [BioC] "reverse" a set of nucleotides: from reverse to direct sense
Message-ID: <cade5-ot-uz2yr7hpwfwufqwpet5awn42=dn+hcq1ypjjnvi3na at mail.gmail.com>
Content-Type: text/plain

Thanks for your answer!

I'm interested in using the package that you developed, VariantAnnotation. I will try it after installing R 2.14.

At the beginning of your vignette, you show your data; there is a column "strand". I would like to know if it takes the strand into account, because that is what I need. I went through your paper and I haven't seen that you consider both the direct and the reverse strand. Does your package handle both strands? Or do I need to use the reverseComplement function first and then use your method on the direct strand only?

Jane Merlevede

2011/12/20 Michael Lawrence <lawrence.michael at gene.com>

> I think you are looking for the reverseComplement function in Biostrings.
> Also, the VariantAnnotation package provides much of the functionality of
> Annovar.
>
> Michael
>
> On Mon, Dec 19, 2011 at 2:13 AM, Jane Merlevede <jane.merlevede at gmail.com> wrote:
>
>> Hello,
>>
>> I am looking for "interesting" mutations among a set of mutations. To
>> reduce the number of mutations, I am using Annovar. This software takes as
>> input a file which contains the following information: chromosome, wild
>> and mutated nucleotide(s), and the start and end position of the variant(s).
>> It seems that this software uses only information from the "direct" sense,
>> but I have information on the reverse strand too.
>> I wrote an R script to "reverse" the mutated variants, but I was told that
>> there is probably a solution to do that in Bioconductor.
>> I haven't found it yet; that's why I would like your help to know if it
>> exists.
>>
>> Thanks in advance,
>> Jane

------------------------------

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor

End of Bioconductor Digest, Vol 106, Issue 21
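
Note on Message 1: the gene_assoc field quoted there separates records with "///" and the fields inside a record with "//". The following is only a minimal base-R sketch of how such a string could be split and the nearest gene(s) picked. The helper name parse_gene_assoc and the column names are hypothetical (they are not part of pd.genomewidesnp.6), and the seven-field layout is assumed from the quoted example.

```r
# Hypothetical helper (not part of any pd.* package): parse one gene_assoc
# string into a data frame with one row per annotated transcript.
parse_gene_assoc <- function(x) {
  # Records are separated by "///", fields within a record by "//".
  records <- strsplit(x, "///", fixed = TRUE)[[1]]
  fields <- lapply(strsplit(records, "//", fixed = TRUE), trimws)
  # Keep only records with the seven fields seen in the quoted example.
  fields <- fields[lengths(fields) == 7]
  if (length(fields) == 0) return(NULL)
  out <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
  names(out) <- c("accession", "relationship", "distance", "unigene",
                  "symbol", "entrez_id", "description")
  out$distance <- as.integer(out$distance)
  out
}

# Shortened copy of the CN_618877 annotation quoted in Message 1.
x <- paste(
  "ENST00000358888 // upstream // 315610 // Hs.112553 // RPL41 // 6171 // ribosomal protein L41",
  "ENST00000318066 // downstream // 8981 // Hs.524630 // UBE2N // 7334 // ubiquitin-conjugating enzyme E2N (UBC13 homolog, yeast)",
  "NR_002212 // exon // 0 // --- // NUDT4P1 // 440672 // nudix (nucleoside diphosphate linked moiety X)-type motif 4 pseudogene 1",
  "NM_199040 // CDS // 0 // Hs.506325 // NUDT4 // 11163 // nudix (nucleoside diphosphate linked moiety X)-type motif 4",
  sep = " /// ")

assoc <- parse_gene_assoc(x)

# All gene symbols tied for the minimum distance to the probe.
nearest_symbols <- unique(assoc$symbol[assoc$distance == min(assoc$distance)])
print(nearest_symbols)
```

For this probe both NUDT4P1 and NUDT4 sit at distance 0, so a "minimum distance" rule does not yield a single gene. That is consistent with Jim's point that the annotation lists every candidate location rather than one definitive answer.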

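Note on Message 3, question 2: assuming the usual two-colour definitions M = log2(R) - log2(G) and A = (log2(R) + log2(G)) / 2, the formula in the question recovers the log intensities exactly. The sketch below checks this in base R on made-up intensities (the values are illustrative and not from the thread). It also shows one possible way a missing value can appear in M and A even when the raw data contain no NA: a spot whose intensity is not positive has no finite log2. Whether this is what happened in the poster's data is not known.

```r
# Toy two-colour intensities (made-up values, not from the thread).
R <- c(1200, 350, 80,  0)   # red   (Cy5)
G <- c( 600, 700, 80, 40)   # green (Cy3)

# Non-positive intensities have no finite log2; mark them missing up front.
R[R <= 0] <- NA
G[G <= 0] <- NA

# Assumed definitions of the M and A values.
M <- log2(R) - log2(G)
A <- (log2(R) + log2(G)) / 2

# Back-transformation as written in the question.
logR <- A + M / 2   # red = Cy5
logG <- A - M / 2   # green = Cy3

# Both comparisons hold wherever M and A are defined.
print(all.equal(logR, log2(R)))
print(all.equal(logG, log2(G)))

# Index of the spot whose M value is missing (the spot with R = 0).
print(which(is.na(M)))
```

Note that logG is also missing for the last spot even though its green intensity is valid, because the back-transformation needs both M and A.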