edgeR: effect of 'outlier' tags on differential expression calls
@alessandroguffantigenomniacom-4436
Dear colleagues,

I am using edgeR to examine differential expression in small RNA data. I noticed the same problem when working with SAGE datasets: when just one sample is clearly an outlier, as sample 7 is below (the comparison is samples 1-4 versus 5-8), edgeR makes a call of significant differential expression that seems inappropriate, or at least should be re-examined. How can we diagnose these situations without manually checking the tag counts for every significant differential expression call? Please note that these are tumor samples, so high sample-to-sample variability is expected in principle.

Thanks a lot in advance,
Alessandro

Tag counts:

miRNA_ID          1.mirna  2.mirna  3.mirna  4.mirna  5.mirna  6.mirna  7.mirna  8.mirna
hsa-miR-515-3p          3        1        1        1        1        7     1601        3
hsa-miR-518e            4        0        1        0        1        2     1715        2
hsa-miR-520d-3p         0        0        0        0        0        1      243        0
hsa-miR-519c-3p         0        0        0        0        0        1      248        0
hsa-miR-520f            0        0        0        0        0        0      163        0
hsa-miR-519d           12        1        0        1        1        4     1754        1
hsa-miR-520h            0        0        0        0        0        0      189        2
hsa-miR-519c-5p         0        0        0        0        0        0      123        0
hsa-miR-520g           16        1        1        4        2        4     1917        2
hsa-miR-518b            5        0        0        1        1        3      686        1
hsa-miR-517a          100        5        4        2        6       45    10024        3

Differential expression results:

miRNA_ID            logConc      logFC    P.Value  adj.P.Val
hsa-miR-515-3p    -15.09154   -8.61753    0.00000    0.00082
hsa-miR-518e      -15.30278   -9.22926    0.00000    0.00110
hsa-miR-520d-3p   -18.23592   -9.46747    0.00001    0.00201
hsa-miR-519c-3p   -17.98705   -9.01722    0.00002    0.00338
hsa-miR-520f      -32.04992  -35.93228    0.00002    0.00338
hsa-miR-519d      -14.46073   -7.61177    0.00003    0.00338
hsa-miR-520h      -18.02925   -8.34496    0.00003    0.00338
hsa-miR-519c-5p   -32.25620  -35.51970    0.00004    0.00382
hsa-miR-520g      -14.16219   -7.27220    0.00005    0.00382
hsa-miR-518b      -15.70611   -7.39997    0.00006    0.00382
hsa-miR-517a      -11.74423   -7.21374    0.00006    0.00382

R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] edgeR_2.6.0  limma_3.12.0

--
Alessandro Guffanti - Head, Bioinformatics, Genomnia srl
Via Nerviano, 31 - 20020 Lainate, Milano, Italy
Ph: +39-0293305.702  Fax: +39-0293305.777
http://www.genomnia.com
"When you're curious, you find lots of interesting things to do." (Walt Disney)
@gordon-smyth
WEHI, Melbourne, Australia
Dear Alessandro,

You seem to be giving examples of miRs that are expressed at a high level in just one sample. The easiest way to deal with such miRs, if you really don't want to detect them, is to filter out miRs that fail to be expressed to a reasonable degree in at least four samples (since your groups are of size four). See, for example, pages 24-25 of the edgeR user's guide, where this is done for the Dclk1 mouse case study. We often suggest cpm > 1 in at least m samples, where m is the minimum group size.

Another obvious thing to do is to examine an MDS plot to identify outlier samples.

I also have to point out that the output in your email cannot be correct, or at least it cannot all come from the same R session. The sessionInfo() output says edgeR 2.6.0, but the column headings show that the results being presented are actually from an earlier version of edgeR.

I'd much rather see continuous code and output from one session than output snippets, without quite knowing how they were obtained.

Best wishes
Gordon

In reply to: [BioC] edgeR: effect of 'outlier' tags on differential expression calls (alessandro.guffanti at genomnia.com, Tue Apr 24 12:48:22 CEST 2012)
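The filtering rule Gordon describes (keep a tag only if its counts per million exceed 1 in at least m samples, m being the smallest group size) can be sketched in plain Python as below. This only illustrates the arithmetic and is not edgeR's own code; in edgeR the same idea is usually written with the package's cpm() function, along the lines of keep <- rowSums(cpm(y) > 1) >= 4 for a DGEList y. The library sizes here are hypothetical (one million reads per sample) because the thread does not report them, so which miRs survive on the real data may differ.

```python
# Minimal sketch of the "cpm > 1 in at least m samples" filter.
# ASSUMPTION: library sizes are hypothetical; the thread does not report them.

def cpm(counts, lib_sizes):
    """Counts per million for one tag across samples."""
    return [c * 1e6 / n for c, n in zip(counts, lib_sizes)]

def keep_tag(counts, lib_sizes, min_cpm=1.0, min_samples=4):
    """True if the tag exceeds min_cpm in at least min_samples samples.

    min_samples=4 matches the smallest group size (samples 1-4 vs 5-8).
    """
    return sum(v > min_cpm for v in cpm(counts, lib_sizes)) >= min_samples

# Counts for samples 1-8, copied from the question.
counts = {
    "hsa-miR-515-3p":  [3, 1, 1, 1, 1, 7, 1601, 3],
    "hsa-miR-520d-3p": [0, 0, 0, 0, 0, 1, 243, 0],
    "hsa-miR-520f":    [0, 0, 0, 0, 0, 0, 163, 0],
    "hsa-miR-517a":    [100, 5, 4, 2, 6, 45, 10024, 3],
}
lib_sizes = [1_000_000] * 8  # hypothetical: one million reads per sample

for mir, row in counts.items():
    print(mir, keep_tag(row, lib_sizes))
```

Under these assumed library sizes the filter drops miR-520d-3p and miR-520f, which exceed 1 cpm only in sample 7, but keeps miR-515-3p and miR-517a, because several other samples also pass the threshold. Filtering therefore removes the most extreme single-sample tags but is not guaranteed to remove every one.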
Dear List Members,

My experimental design contains two groups of RNA-seq samples, and within each group there are two subgroups (each subgroup with 8 replicates). Our goal is to identify differentially expressed genes between the two groups and also between the two subgroups. I read through the case studies in the edgeR user's guide and have not found such a case. How can we achieve these goals with edgeR? My current idea is to do the analysis between subgroups first and then between groups. In that case, we can find the shared and different DE genes. I do not know if we can do the analysis in one run. Any comments and suggestions are very much appreciated.

Thanks in advance!

Best wishes
Li
Thanks for the useful suggestion, I will chase this up!

I have to say that MDS plots are not very useful in these cases, because there is no general strong deregulation in one specific sample, but rather one localized to a few genes and possibly occurring in more than one sample. These 'tag bursts' are not infrequent in miRNA cancer analyses, and they seem to be 'private' to some samples.

Yes, the DEG calls and the sessionInfo() output came from two different sessions; I forgot there had been an update in between. Thanks for pointing this out.

Regards
Alessandro

(In reply to Gordon K Smyth, 4/25/2012 11:09 AM)
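Because, as Alessandro notes, these bursts affect a few tags rather than whole samples, a per-tag screen may say more than an MDS plot. The sketch below is a hypothetical heuristic, not an edgeR function: for each tag it scales counts by library size, finds the sample holding the largest share of the tag's total, and flags the tag when that share exceeds an arbitrary threshold (0.9 here). Library sizes are again assumed equal because the thread does not give them.

```python
# Hypothetical "tag burst" screen (not part of edgeR): flag tags where a
# single sample carries most of the library-size-scaled expression.

def max_sample_share(counts, lib_sizes):
    """Return (index, share) of the sample with the largest share of scaled counts."""
    scaled = [c * 1e6 / n for c, n in zip(counts, lib_sizes)]
    total = sum(scaled)
    if total == 0:
        return None, 0.0  # tag not observed at all
    idx = max(range(len(scaled)), key=scaled.__getitem__)
    return idx, scaled[idx] / total

def flag_bursts(table, lib_sizes, threshold=0.9):
    """Map tag -> 1-based sample number for tags whose top sample exceeds threshold.

    The 0.9 threshold is an arbitrary illustrative choice.
    """
    flagged = {}
    for tag, row in table.items():
        idx, share = max_sample_share(row, lib_sizes)
        if idx is not None and share > threshold:
            flagged[tag] = idx + 1
    return flagged

# Three rows copied from the question's count table.
table = {
    "hsa-miR-515-3p": [3, 1, 1, 1, 1, 7, 1601, 3],
    "hsa-miR-518e":   [4, 0, 1, 0, 1, 2, 1715, 2],
    "hsa-miR-517a":   [100, 5, 4, 2, 6, 45, 10024, 3],
}
lib_sizes = [1_000_000] * 8  # hypothetical equal library sizes

print(flag_bursts(table, lib_sizes))
# → {'hsa-miR-515-3p': 7, 'hsa-miR-518e': 7, 'hsa-miR-517a': 7}
```

Because the screen reports the dominant sample for each tag, bursts private to different samples show up separately, and flagged tags can be looked at before their differential expression calls are trusted.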
