DEXSeq: building exon count set
@vincenzo-capece-4556
Last seen 10.3 years ago
Dear all,

I am using DEXSeq to run a splicing analysis on my data, but I have a problem with the function read.HTSeqCounts. Its countfiles argument takes a vector of paths to the exon count files, but what I have (as in the DESeq package) is a data.frame like this, computed by my pipeline:

    > head(countTable)
                           X498_A X479_D X480_D X509_B X524_A X515_A X496_B X508_C
    ENSMUSG00000000001:001    338    397    404    350    404    381    485    521
    ENSMUSG00000000001:002     47     48     55     26     40     39     35     41
    ENSMUSG00000000001:003     23     12     37     17     14     18     27     15
    ENSMUSG00000000001:004     17      9     14     18      8      9     27     15
    ENSMUSG00000000001:005     16     15     16      9     13     18     16     24
    ENSMUSG00000000001:006     18     14     11     20     12     22     20     24
                           X517_D X478_D X497_A X533_D X487_B X516_D X531_B X513_D
    ENSMUSG00000000001:001    514    513    387    620    579    627    467    616
    ENSMUSG00000000001:002     48     65     31     65     58     75     39     85
    ENSMUSG00000000001:003     19     35     26     32     25     34     22     29
    ENSMUSG00000000001:004     19     23      4     18     23     25     13     24
    ENSMUSG00000000001:005     16     17     11     12     20     24      8     18
    ENSMUSG00000000001:006     26     26      9     22     24     24     19     25
                           X507_A X505_C X506_C X525_C X514_A X546_C X547_C X488_B
    ENSMUSG00000000001:001    569    725    621    488    534    575    639   1037
    ENSMUSG00000000001:002     52     66     57     56     42     70     63     89
    ENSMUSG00000000001:003     30     40     33     26     19     24     27     62
    ENSMUSG00000000001:004     23     20     25     22     22     18     19     30
    ENSMUSG00000000001:005     22     19     18      7     20     19     21     35
    ENSMUSG00000000001:006     22     25     19     21     29     25     19     32

My question is: is there a way to use the function read.HTSeqCounts with my table as input? I used all the DEXSeq Python scripts to obtain my outputs.

Thanks in advance.

Regards,
Vincenzo
DESeq DEXSeq • 1.3k views
Alejandro Reyes ★ 1.9k
@alejandro-reyes-5124
Last seen 5 months ago
Novartis Institutes for BioMedical Rese…
Dear Vincenzo,

The function newExonCountSet will do the job. Have a look at the pasilla vignette; it explains how to generate an ExonCountSet object either from the outputs of the Python script or from basic R objects.

Best regards,
Alejandro
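As a rough illustration of building the inputs from basic R objects, the sketch below splits row names of the form geneID:exonID, as in the countTable shown in the question, into the gene and exon identifiers that newExonCountSet asks for. The toy counts cover only the first two samples; the helper name split_exon_ids and the reading of the column-name suffix (_A to _D) as the experimental condition are assumptions made for illustration. The final call is left commented out because newExonCountSet only exists in older DEXSeq releases and its exact arguments should be checked in the installed version.

```r
# Toy stand-in for the poster's countTable (first two samples, three exon bins).
countTable <- data.frame(
  X498_A = c(338L, 47L, 23L),
  X479_D = c(397L, 48L, 12L),
  row.names = c("ENSMUSG00000000001:001",
                "ENSMUSG00000000001:002",
                "ENSMUSG00000000001:003")
)

# Hypothetical helper: split "gene:exon" row names into two character vectors.
split_exon_ids <- function(ids) {
  parts <- strsplit(ids, ":", fixed = TRUE)
  stopifnot(all(lengths(parts) == 2L))  # every ID must contain exactly one ":"
  list(geneIDs = vapply(parts, `[`, character(1), 1L),
       exonIDs = vapply(parts, `[`, character(1), 2L))
}

ids <- split_exon_ids(rownames(countTable))

# Assumption: the letter after the underscore in each column name is the condition.
design <- data.frame(
  condition = factor(sub("^.*_", "", colnames(countTable))),
  row.names = colnames(countTable)
)

# With an older DEXSeq release that still provides newExonCountSet, the call
# would look roughly like this (see ?newExonCountSet for your version):
# ecs <- newExonCountSet(countData = as.matrix(countTable), design = design,
#                        geneIDs = factor(ids$geneIDs), exonIDs = ids$exonIDs)
```

Keeping the ID splitting in a small function makes it easy to fail early if some row names do not follow the gene:exon pattern.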
Dear Alejandro,

Thanks a lot. I have already used that function and it works perfectly; the problem is how to supply the transcript data (the "transcripts" argument of newExonCountSet). For instance, I need to know the start and end positions of my exons.

With read.HTSeqCounts this is very easy: I just need to add the path to the GTF file. But in the case of newExonCountSet, the "transcripts" argument is not very clear in the manual:

"A character vector of the same length as the rows of the count data containing, for each row in countData, a concatenation of transcript IDs separated by the character ";". This means that if an exon is contained in the transcripts "A", "B" and "C", the field of the row corresponding to that exon should contain "A;B;C". This information is only needed for the plotDEXSeq function, not for the actual tests."

I hope my question is clear.

Thanks a lot.

Regards,
Capece Vincenzo

On 31 January 2013 09:26, Alejandro Reyes <alejandro.reyes@embl.de> wrote:
> Dear Vincenzo,
>
> The function newExonCountSet will do the job.
> Have a look into the pasilla vignette, it explains how to generate an
> ExonCountSet object both from the outputs of the python script or from
> basic R objects.
>
> Best regards,
> Alejandro
Dear Vincenzo,

> I have already used that function and it works perfectly; the problem
> is to add to this function the transcripts data(with argument
> "transcripts" in newExonCountSet. For instance, I need to know the
> start and end position of my exons).

If you used the DEXSeq Python scripts, you can simply pass the flattened GTF file to the function read.HTSeqCounts.

> With read.HTSeqCount is very easy, I need just add the path to gtf
> file; but in the case of newExonCountSet the argument "transcripts" is
> not very clear in the manual:
>
> "A character vector of the same length as the rows of the count data
> containing, for each row in countData, a concatenation of transcript IDs
> separated by the character ";". This means that if an exon is contained
> in the transcripts "A", "B" and "C", the field of the row corresponding
> to that exon should contain "A;B;C". This information is only needed for
> the plotDEXSeq function, not for the actual tests."

Sorry about that, I will make it clearer. In the meantime, here is the pasillaExons example:

    > fData(pasillaExons)[c("FBgn0000256:E011", "FBgn0000256:E012"), c("geneID", "transcripts")]
                          geneID                                     transcripts
    FBgn0000256:E011 FBgn0000256 FBtr0077511;FBtr0077512;FBtr0290080;FBtr0290081
    FBgn0000256:E012 FBgn0000256                         FBtr0290080;FBtr0290081

In this example, we have two exon bins from the gene FBgn0000256. The transcripts FBtr0077511, FBtr0077512, FBtr0290080 and FBtr0290081 use the exon bin E011. The transcripts FBtr0290080 and FBtr0290081 use the exon bin E012.

The "transcripts" argument requires this information for each exon bin as a character vector with one element per exon bin; within each element, the transcript IDs are separated by ";", e.g.:

    > head(fData(pasillaExons)$transcripts, 5)
    [1] "FBtr0077511;FBtr0077513;FBtr0077512;FBtr0290077;FBtr0290079;FBtr0290078;FBtr0290082;FBtr0290080;FBtr0290081"
    [2] "FBtr0077511;FBtr0077513;FBtr0077512;FBtr0290077;FBtr0290079;FBtr0290078;FBtr0290082;FBtr0290080;FBtr0290081"
    [3] "FBtr0077511;FBtr0077513;FBtr0077512;FBtr0290077;FBtr0290079;FBtr0290078;FBtr0290082;FBtr0290080;FBtr0290081"
    [4] "FBtr0077511;FBtr0077513;FBtr0077512;FBtr0290077;FBtr0290079;FBtr0290078;FBtr0290082;FBtr0290080;FBtr0290081"
    [5] "FBtr0077511;FBtr0077513;FBtr0077512;FBtr0290077;FBtr0290079;FBtr0290078;FBtr0290082;FBtr0290080;FBtr0290081"

Best wishes,
Alejandro
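To connect the two points in this reply (exon positions and the "A;B;C" format), here is a minimal base-R sketch that pulls both out of flattened-annotation lines. It assumes that the exonic_part lines written by dexseq_prepare_annotation.py carry attributes of the form transcripts "T1+T2"; exonic_part_number "001"; gene_id "G". The example lines, coordinates and IDs are made up, and the helper names get_attr and parse_flattened are hypothetical, so check the layout against your own file before relying on it.

```r
# Two made-up exonic_part lines in the assumed flattened-GTF layout
# (tab-separated; column 9 holds the attributes).
gtf_lines <- c(
  paste(c("chr3", "dexseq_prepare_annotation.py", "exonic_part", "1000", "1200",
          ".", "-", ".",
          'transcripts "T1+T2+T3"; exonic_part_number "001"; gene_id "G1"'),
        collapse = "\t"),
  paste(c("chr3", "dexseq_prepare_annotation.py", "exonic_part", "1300", "1450",
          ".", "-", ".",
          'transcripts "T2+T3"; exonic_part_number "002"; gene_id "G1"'),
        collapse = "\t")
)

# Hypothetical helper: extract the quoted value of one attribute key.
get_attr <- function(attrs, key) {
  sub(paste0('.*', key, ' "([^"]*)".*'), "\\1", attrs)
}

# Hypothetical helper: one row per exon bin, with positions and a
# ";"-separated transcript string in the format described in the manual.
parse_flattened <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  stopifnot(all(lengths(fields) == 9L))
  tab <- do.call(rbind, fields)  # character matrix, one row per input line
  tab <- tab[tab[, 3] == "exonic_part", , drop = FALSE]
  attrs <- tab[, 9]
  data.frame(
    id          = paste(get_attr(attrs, "gene_id"),
                        get_attr(attrs, "exonic_part_number"), sep = ":"),
    chr         = tab[, 1],
    start       = as.integer(tab[, 4]),
    end         = as.integer(tab[, 5]),
    strand      = tab[, 7],
    # Assumed "+" separator in the file becomes the ";" separator.
    transcripts = gsub("+", ";", get_attr(attrs, "transcripts"), fixed = TRUE),
    stringsAsFactors = FALSE
  )
}

annot <- parse_flattened(gtf_lines)
print(annot)
```

Before passing anything on, the rows would need to be put in the same order as the count table, for example with annot[match(rownames(countTable), annot$id), ], assuming the IDs in both objects follow the same gene:exon convention.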