goseq and transcript length data
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
Dear all,

I appreciate your time and would value your expertise. I am trying to use goseq for GO analysis of my RNA-seq experiments. I used TopHat -> Cufflinks for differential expression, with mouse mm10 as the annotation. I am trying to build the gene length database myself, because the current version of goseq does not seem to support the mm10 build.

1. Cufflinks seems to ignore the original gene identifiers that come with mm10 and assigns its own, but it does keep the gene name in each record, so I will use the gene name as the identifier. I have already built both the assayed-gene vector and the DE-gene vector using gene names.

2. Then comes the transcript length issue. I noticed that one of the Cufflinks output files, genes.fpkm_tracking, contains both the gene name and gene length information. The length column has this format: chr1:4807892-4846735 (this is for the Lypla1 gene). But this range includes introns too, so I cannot get the transcript length simply by subtracting the first number from the second. I went through every output file of Cufflinks/Cuffdiff and could not find one containing the transcript length. Where can I get the transcript length information?

3. In my experiment I have only 39 DE genes. Do you think it is even worth using goseq, or should I simply go to DAVID?

Best,
Tom

-- output of sessionInfo(): goseq

-- Sent via the guest posting facility at bioconductor.org.
Annotation GO PROcess goseq
@nadia-davidson-5739
Last seen 5.0 years ago
Australia
> I am trying to build the gene length database by myself, given that the
> current version of goseq does not support the mm10 build.
>
> 2, Then it comes to the transcript length issue, I noticed one of cufflink
> output file genes.fpkm_tracking contains both the gene name and gene
> length information. The length column has this format:
> chr1:4807892-4846735. This is for Lypla1 gene. But this sequence range
> include introns too. So I can not simply get the transcript length
> by subtracting the second number by the first one. I went into every
> output file of cufflinks/cuffdiff and could not find a file containing the
> transcript length information. Where can I get the transcript length
> information?
>
> 3, In my experiment, I only have 39 DE genes, do you think it is even worthy
> for me to use goseq? Or should I simply go to DAVID?

Hi Tom,

If you use goseq 1.12 or later, it should fetch the mm10 lengths. Which annotation are you using?

Getting the lengths from Cufflinks genes can be fiddly in my experience. You can do it by reading the annotation file into R and calculating intervals with GRanges.

39 genes is not many, though. It probably wouldn't hurt to run the data through DAVID just to see if anything comes up. I've found DAVID pretty user friendly.

Cheers,
Nadia.
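The interval calculation mentioned above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the GRanges approach from the answer: it assumes a GTF-style annotation with a `gene_name` attribute on each exon line, and it computes the union of exon intervals per gene (overlapping exons counted once). The function names `parse_gtf_exons`, `merged_length` and `gene_lengths`, and the toy gene `GeneA` with its coordinates, are made up for the example. In R, the equivalent merging step would likely be done with `reduce()` and `width()` from GenomicRanges.

```python
from collections import defaultdict


def parse_gtf_exons(lines):
    """Collect (start, end) exon intervals per gene name from GTF lines."""
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        start, end = int(fields[3]), int(fields[4])
        # Column 9 holds attributes such as: gene_id "X"; gene_name "Y";
        attrs = {}
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                key, _, value = item.partition(" ")
                attrs[key] = value.strip('"')
        # Assumption: fall back to gene_id when gene_name is absent.
        name = attrs.get("gene_name") or attrs.get("gene_id")
        if name:
            exons[name].append((start, end))
    return exons


def merged_length(intervals):
    """Bases covered by 1-based inclusive intervals, overlaps counted once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def gene_lengths(lines):
    """Map each gene name to its merged exonic length."""
    return {gene: merged_length(iv) for gene, iv in parse_gtf_exons(lines).items()}


# Toy annotation (invented coordinates): two overlapping exons plus a third.
toy_gtf = [
    'chr1\ttoy\texon\t100\t199\t.\t+\t.\tgene_id "XLOC_000001"; gene_name "GeneA";',
    'chr1\ttoy\texon\t150\t249\t.\t+\t.\tgene_id "XLOC_000001"; gene_name "GeneA";',
    'chr1\ttoy\texon\t400\t449\t.\t+\t.\tgene_id "XLOC_000001"; gene_name "GeneA";',
]
print(gene_lengths(toy_gtf))
```

For the toy gene, the locus spans 100-449 (350 bases including the intron), while the merged exons cover 200 bases, which is the distinction raised in the question. Note that this gives one union-of-exons length per gene, not a length per transcript. A named vector of such lengths, ordered to match the assayed-gene vector, could then be supplied to goseq's `nullp()` through its `bias.data` argument in place of the built-in length database.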