Question: edgeR - multiple comparisons

Davis McCarthy wrote:
Hi Sridhara
The function exactTest() has an argument 'pair'. This argument can be
used to define the two groups you wish to compare. The order in which
you put in the groups in the pair will determine the direction of the
log-fold change.
For instance if you called
> exactTest(d1, common.disp=FALSE, pair=c("con3","dca3"))
then you will get the comparison dca3 - con3. If you put in
pair=c("dca3","con3") then you will get the comparison con3-dca3.
This is a much better approach to making different comparisons between
groups than renaming the columns of your count matrix.
In principle, ?exactTest should have been sufficient to answer this
question for you. It's considered good form to read the documentation
thoroughly before posting to the list.
plotSmear() operates on DGEExact objects (the output from exactTest), so
this avoids the final issue you're worried about entirely: the
direction of the log-fold change on the plot will match that in your
exact test. Again, ?plotSmear has details.
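As a minimal sketch of the point about 'pair' (simulated counts, made-up gene names, and assuming a recent edgeR release in which estimateDisp() is available and exactTest() picks up the estimated dispersions on its own rather than through common.disp), swapping the order of the pair should simply flip the sign of logFC:

```r
# Sketch with simulated counts; gene names and effect sizes are made up.
# Assumes a recent edgeR release (estimateDisp(), exactTest(pair = ...)).
if (requireNamespace("edgeR", quietly = TRUE)) {
  library(edgeR)
  set.seed(42)

  # 500 genes x 6 libraries: two replicates each of con3, dca3 and c33
  group  <- factor(rep(c("con3", "dca3", "c33"), each = 2))
  counts <- matrix(rnbinom(500 * 6, mu = 50, size = 10), nrow = 500,
                   dimnames = list(paste0("gene", 1:500),
                                   paste0(group, "-", rep(1:2, times = 3))))
  # Make the first 10 genes roughly 8-fold higher in dca3 (columns 3 and 4)
  counts[1:10, 3:4] <- counts[1:10, 3:4] * 8L

  d <- DGEList(counts = counts, group = group)
  d <- estimateDisp(d)  # no design matrix supplied, so classic estimates are used

  # pair = c(reference, treatment): logFC is treatment relative to reference
  de_fwd <- exactTest(d, pair = c("con3", "dca3"))  # dca3 - con3
  de_rev <- exactTest(d, pair = c("dca3", "con3"))  # con3 - dca3

  print(de_fwd$comparison)
  print(head(cbind(fwd = de_fwd$table$logFC, rev = de_rev$table$logFC)))
}
```

The column order of the count matrix is never touched here; only the 'pair' argument changes, which is why this is safer than rearranging the input file.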
Cheers
Davis
On Jun 1, 2011, at 4:35 AM, Sridhara Gupta Kunjeti wrote:
> Hello Davis,
> Yes, this helped me to solve the problem. On the other hand, I have
> a different kind of question, which is related to exactTest. The first
> two columns in my input files are the counts for the control group.
> =============================================
> example 1: Textfile1
> Gene con3-1 con3-2 dca-1 dca-2.
> when I run the exactTest
>
> > de1.tgw <- exactTest(d1, common.disp = FALSE)
> Comparison of groups: dca3 - con3
>
> So, if the logFC is positive, it means the gene is up-regulated in dca3,
> and these dots are plotted above '0' in the plotSmear.
>
> ================================================
> example 2:
> When I swap the columns
> Gene dca-1 dca-2 con3-1 con3-2
>
> > de1.tgw <- exactTest(d1, common.disp = FALSE)
> Comparison of groups: con3 - dca3
>
> Here if the logFC is negative, it means the gene is up-regulated in dca3,
> and these are plotted below '0' in the plotSmear.
>
> The bottom line is that if I swap the columns and then run exactTest,
> the order of the pair changes. In other words, the comparison changes
> from dca3 - con3 to con3 - dca3.
>
> This worked absolutely fine with 6 pairs. But for four pairs, even
> when I swap the columns in the input data, the order in exactTest
> does not change, i.e., con3 - c33 does not change to c33 - con3.
>
> My worry is that when I look at the logFC values, for some of the pairs
> a "+" value means up-regulated in the treatment and for others it is
> "-". I am assuming this inconsistency is going to be a problem when I
> generate the plotSmear.
>
> Any help in generating consistent logFC values (positive for upregulation
> in treatment) will be appreciated.
>
> Thanks,
> Sridhara
>
>
>
> On Mon, May 30, 2011 at 2:33 AM, Davis McCarthy <dmccarthy@wehi.edu.au> wrote:
> Hi Sridhara
>
> I'm not sure I completely follow what you're saying about the FDRs
> being 0.5, 0.6 etc. Can you show us a top table, i.e. the output of
> topTags()? Actually it would be good to see all of your edgeR function
> calls to get a better idea of how you're carrying out your analysis. In
> principle I don't think that "0" in the data will have any adverse
> effects on your analysis, so I'm not really sure what the results are
> that you're trying to describe.
>
> If you are in an R 2.13 session and enter the commands:
> source("http://www.bioconductor.org/biocLite.R")
> biocLite("edgeR")
> then edgeR version 2.2.5 will be installed on your system. I would
> recommend following the latest version of the edgeR User's Guide,
> which was released with edgeR 2.2.x. You can get it from edgeR's
> Bioconductor page:
> http://www.bioconductor.org/packages/2.8/bioc/html/edgeR.html
>
> Hope that helps.
>
> Cheers
> Davis
>
>
> On May 27, 2011, at 9:36 PM, Sridhara Gupta Kunjeti wrote:
>
>> Hello Davis,
>> Thank you very much for your email. After looking at one of my
>> comparisons, the p-values make total sense. But I did notice
>> that out of 10827 genes, most of them (10820) had an FDR of 1 and the
>> rest had an FDR of 0.5, 0.6, 0.7, 0.8 and so on. I was wondering
>> whether "0" in the data could cause this FDR?
>>
>> I will also install the latest version of R (2.13) and of edgeR.
>> Could you please let me know the latest version of edgeR that is
>> available for me to download? I am assuming I can still follow the
>> same manual (from version 2.0.3) for the new version of edgeR.
>>
>> Many thanks!
>> Sridhara
>>
>>
>> On Thu, May 26, 2011 at 10:52 PM, Davis McCarthy <dmccarthy@wehi.edu.au> wrote:
>> Hi Sridhara
>>
>> I do not think that genes with all-zero counts for group A and
>> group C are causing the results you see.
>>
>> I just tested this on a dataset with 9 groups: comparing two
>> groups, A and B, with 285 genes with all-zero counts in groups A and B
>> yielded "expected" p-values and FDRs. Therefore I do not think that
>> your p-values all being 1 is driven by these all-zero genes.
>>
>> Is there truly very little difference in expression between groups
>> A and C relative to biological variability in your data? You could
>> have a look at the counts (raw, normalized or counts per million) for
>> the top-ranked (even if not significant) genes for your group A -
>> group C comparison.
>>
>> If you see little difference in expression between the groups for
>> the top genes then you may have no differential expression between
>> these groups. If, on the other hand, there do look to be large
>> differences in expression between the groups then you may have found a
>> bug in the p-values that are being output and we can go ahead and try
>> to fix the issue.
>>
>> I notice that you are using R 2.12 and edgeR version 2.0.3. I would
>> recommend updating to R 2.13 and the latest release of edgeR. There
>> have been many improvements made to the package since version 2.0.3,
>> and any bug fixes (if required) will roll out to the current release
>> and devel versions, not legacy versions of the package.
>>
>> Cheers
>> Davis
>>
>>
>>
>> On May 26, 2011, at 6:16 AM, Sridhara Gupta Kunjeti wrote:
>>
>> > Hello Mark,
>> > Thank you very much for your email. It greatly helped me to export the FDR,
>> > p-value, logFC and logConc into csv format.
>> > I have one quick question, which is more of a statistical question.
>> > After exporting the FDR, I started analyzing pair by pair. In the example
>> > below, I noticed that when comparing group A - group B, I got p-values and
>> > FDRs that make sense. But when I checked the group A - group C
>> > comparison, all the 10,000 genes had an FDR and p-value of 1. I then counted
>> > the number of genes that had "0" in both groups for both replicates,
>> > and it turned out to be about 400 genes. So, my question is why the other genes
>> > (9600) had an FDR and p-value of "1". Do you think the 400 genes with "0"
>> > counts would affect the analysis? Do I need to delete these 400 genes for
>> > the pair (gp A - gp C) comparison and then run an edgeR analysis
>> > individually?
>> >
>> >         Group A      Group B      Group C
>> > Genes   A1    A2     B1    B2     C1    C2
>> > 1       0     0      11    12     0     0
>> > 2       120   102    45    38     30    40
>> >
>> >
>> > Any help or comments will be appreciated.
>> >
>> > Many thanks!
>> > Sridhara
>> >
>> >
>> > On Sun, May 22, 2011 at 4:24 PM, Mark Robinson <mrobinson@wehi.edu.au> wrote:
>> >
>> >> Hi Sridhara,
>> >>
>> >> The problem here is that the output of topTags() (your 'fdr06') is not a
>> >> data.frame or matrix, which is what write.table() works best on. Instead,
>> >> try:
>> >>
>> >> fdr06 <- topTags(de06.tgw, n = nrow(de06.tgw), adjust.method = "BH",
>> >>                  sort.by = "p.value")
>> >> write.table(fdr06$table, file = "FDR06.csv", sep = ",")
>> >>
>> >> Cheers,
>> >> Mark
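The error quoted further down the thread comes from write.table() trying to coerce the whole object into a single data.frame. A small base-R sketch of that behaviour, using a toy list (`tt`, hypothetical) that only mimics the shape of a topTags() result and is not produced by edgeR:

```r
# Toy stand-in for a topTags() result: a list whose $table is a data.frame.
# The values and gene names are invented for illustration.
tt <- list(
  table = data.frame(logFC  = c(2.1, -0.3, 0.8),
                     PValue = c(1e-6, 0.40, 0.02),
                     row.names = c("geneA", "geneB", "geneC")),
  adjust.method = "BH",
  comparison = c("A", "B")
)

# Combining components of length 3, 1 and 2 into one data.frame fails,
# which is the kind of error write.table() reported on the whole object.
err <- tryCatch(
  data.frame(table = tt$table, adjust.method = tt$adjust.method,
             comparison = tt$comparison),
  error = function(e) conditionMessage(e)
)
print(err)

# Writing only the data.frame component works and keeps every row.
out_file <- tempfile(fileext = ".csv")
write.table(tt$table, file = out_file, sep = ",", col.names = NA,
            qmethod = "double")
back <- read.csv(out_file, row.names = 1)
print(back)
```

Reading the file back, as in the last two lines, is a cheap way to confirm that the number of exported rows matches the number of genes tested.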
>> >>
>> >> On May 22, 2011, at 11:02 PM, Sridhara Gupta Kunjeti wrote:
>> >>
>> >>> Hello Mark,
>> >>> Thanks for your email. I have one quick question. Is it possible to
>> >>> export all the 10,427 genes after running exactTest()? What argument do I
>> >>> need to use to do that? Basically I want the complete list of genes with
>> >>> the following info:
>> >>> > topTags(de06.tgw, n = 10, adjust.method="BH", sort.by="p.value")
>> >>> Comparison of groups: T6-P18
>> >>>                                                               logConc     logFC       PValue          FDR
>> >>> PITG_08841 | Pi conserved hypothetical protein (129 nt)     -28.79463 42.442850 1.032735e-11 1.076833e-07
>> >>> PITG_08845 | Pi mannitol dehydrogenase, putative (1065 nt)  -12.93992  9.148329 1.288618e-09 6.193586e-06
>> >>>
>> >>> If I use the following call, it shows an error message.
>> >>>
>> >>> fdr06 <- topTags(de06.tgw, n = 10,427, adjust.method = "BH", sort.by = "p.value")
>> >>> write.table(fdr06, file = "FDR06.csv", sep=",", col.names = NA, qmethod="double")
>> >>> Error in data.frame(table = list(logConc = c(-28.7946, -12.93992, :
>> >>>   arguments imply differing number of rows: 10427, 1, 2
>> >>>
>> >>> If I do the same with n = 10426, it executes without any error,
>> >>> except that I am missing one row.
>> >>>
>> >>> Any suggestions on how to export all the columns for all the rows will be
>> >>> a great help.
>> >>>
>> >>> Many thanks!
>> >>> Sridhara
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Sun, May 22, 2011 at 5:34 AM, Mark Robinson <mrobinson@wehi.edu.au> wrote:
>> >>> Hi Sridhara,
>> >>>
>> >>> If you haven't already, you might have a solid read of the edgeR user's
>> >>> guide; it has answers to some of your questions.
>> >>>
>> >>>
>> >>> On May 21, 2011, at 11:20 PM, Sridhara Gupta Kunjeti wrote:
>> >>>
>> >>>> Hello,
>> >>>> I have used edgeR for DGE analysis and I have a few questions regarding
>> >>>> the model and comparisons.
>> >>>>
>> >>>> 1) What kind of statistical model is taken into account to analyze treatment
>> >>>> structure and conduct analysis of variance?
>> >>>
>> >>> For the example you show below (a 2-group comparison), the 'Negative
>> >>> binomial models' Section in the user's guide covers this. Of course, the
>> >>> package has facility for more complicated "treatment structure" through
>> >>> generalized linear models (see the 'Experiment with multiple factors'
>> >>> Section, for example).
>> >>>
>> >>>
>> >>>> 2) How does edgeR correct for multiple comparisons?
>> >>>
>> >>> See ?topTags; it's also mentioned in the user's guide.
>> >>>
>> >>> ----
>> >>> topTags(object, n=10, adjust.method="BH", sort.by="p.value")
>> >>> ...
>> >>> adjust.method: character string stating the method used to adjust
>> >>>                p-values for multiple testing, passed on to p.adjust
>> >>> ...
>> >>> ----
>> >>>
>> >>>
>> >>>> 3) I am assuming that the calculated p-values in the output after
>> >>>> performing the tagwiseDispersion are after adjusting for multiple testing.
>> >>>> Please correct me if I am wrong. If so, what kind of multiple testing
>> >>>> correction is taken into account?
>> >>>
>> >>> exactTest() doesn't do the multiple testing correction, but topTags() does.
>> >>>
>> >>> HTH,
>> >>> Mark
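To make the distinction between raw and adjusted p-values concrete, here is a base-R sketch of the "BH" adjustment that topTags() is documented above as passing on to p.adjust. The p-values are invented for illustration and do not come from the data in this thread:

```r
set.seed(1)
# Toy raw p-values standing in for the p-value column of an exactTest() result:
# three small p-values among 100 tests, the rest uniform noise.
p_raw <- c(1e-6, 4e-4, 0.003, runif(97))

# Benjamini-Hochberg adjustment as done by base R
fdr_bh <- p.adjust(p_raw, method = "BH")

# The same adjustment written out by hand: scale each sorted p-value by
# m / rank, then enforce monotonicity from the largest p-value downwards.
m <- length(p_raw)
o <- order(p_raw)
adj_sorted <- pmin(1, rev(cummin(rev(p_raw[o] * m / seq_len(m)))))
fdr_manual <- numeric(m)
fdr_manual[o] <- adj_sorted

# Show the five smallest raw p-values next to their adjusted values
print(head(data.frame(p_raw = p_raw, fdr_bh = fdr_bh)[o, ], 5))
```

Because each adjusted value is at least as large as the raw p-value, a comparison with no strong signal can end up with almost every FDR close to 1 even when no individual raw p-value is exactly 1.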
>> >>>
>> >>>
>> >>>>
>> >>>> The arguments that I passed are as follows:
>> >>>>> raw.data <- read.delim("c33_con3.txt")
>> >>>>> raw.data.2a <- read.delim ("2c33_con3.txt")
>> >>>>> d2a <- raw.data.2a[, 2:5]
>> >>>>> rownames(d2a) <- raw.data.2a[,1]
>> >>>>> group2a <- c(rep("c33", 2), rep("con3", 2))
>> >>>>> d2a <- DGEList(counts = d2a, group = group2a)
>> >>>>> d2a <- estimateCommonDisp(d2a)
>> >>>>> d2a <- estimateTagwiseDisp(d2a, prior.n = 10, grid.length = 500)
>> >>>>> prior.n2a <- estimateSmoothing(d2a)
>> >>>>> de2a.tgw <- exactTest(d2a, common.disp = FALSE)
>> >>>>> de2a.tgw
>> >>>> An object of class "DGEExact"
>> >>>> $table
>> >>>>                                                                   logConc       logFC   p.value
>> >>>> MGG_00005 | Mo hypothetical protein (1014 nt)                    -16.67772  0.05248378 0.9394668
>> >>>> MGG_00015 | Mo catechol O-methyltransferase (1102 nt)            -14.68066  0.36189877 0.2786389
>> >>>> MGG_00016 | Mo 2-epi-5-epi-valiolone synthase (1739 nt)          -13.50677  0.32379041 0.3759259
>> >>>> MGG_00017 | Mo L-aminoadipate-semialdehyde dehydrogenase (3472 nt) -14.28686 -0.35747999 0.3040601
>> >>>> MGG_00018 | Mo integral membrane protein (2504 nt)               -14.56791  0.45187243 0.1701996
>> >>>> 11452 more rows ...
>> >>>> $comparison
>> >>>> [1] "c33" "con3"
>> >>>> $genes
>> >>>> NULL
>> >>>>
>> >>>>
>> >>>>> sessionInfo()
>> >>>> R version 2.12.1 (2010-12-16)
>> >>>> Platform: i386-pc-mingw32/i386 (32-bit)
>> >>>> locale:
>> >>>> [1] LC_COLLATE=English_United States.1252
>> >>>> [2] LC_CTYPE=English_United States.1252
>> >>>> [3] LC_MONETARY=English_United States.1252
>> >>>> [4] LC_NUMERIC=C
>> >>>> [5] LC_TIME=English_United States.1252
>> >>>> attached base packages:
>> >>>> [1] stats graphics grDevices utils datasets methods base
>> >>>> other attached packages:
>> >>>> [1] edgeR_2.0.3
>> >>>> loaded via a namespace (and not attached):
>> >>>> [1] limma_3.6.9 tools_2.12.1
>> >>>>
>> >>>> I would really appreciate your comments or suggestions.
>> >>>>
>> >>>> Many thanks!
>> >>>>
>> >>>> Sridhara
>> >>>>
>> >>>> --
>> >>>> Sridhara G Kunjeti
>> >>>> PhD Candidate
>> >>>> University of Delaware
>> >>>> Department of Plant and Soil Science
>> >>>> email- sridhara@udel.edu
>> >>>> Ph: 832-566-0011
>> >>>>
>> >>>>
>> >>>> _______________________________________________
>> >>>> Bioconductor mailing list
>> >>>> Bioconductor@r-project.org
>> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >>>
>> >>> ------------------------------
>> >>> Mark Robinson, PhD (Melb)
>> >>> Epigenetics Laboratory, Garvan
>> >>> Bioinformatics Division, WEHI
>> >>> e: mrobinson@wehi.edu.au
>> >>> e: m.robinson@garvan.org.au
>> >>> p: +61 (0)3 9345 2628
>> >>> f: +61 (0)3 9347 0852
>> >>> ------------------------------
>> >>>
>> >>>
>> >>>
>> >>> ______________________________________________________________________
>> >>> The information in this email is confidential and intended solely for the addressee.
>> >>> You must not disclose, forward, print or use it without the permission of the sender.
>> >>> ______________________________________________________________________
>>
>> ------------------------------------------------------------------------
>> Davis J McCarthy
>> Research Technician
>> Bioinformatics Division
>> Walter and Eliza Hall Institute of Medical Research
>> 1G Royal Parade, Parkville, Vic 3052, Australia
>> dmccarthy@wehi.edu.au
>> http://www.wehi.edu.au
