Question: Issue during transcript variant analysis using rnaseqDTU-DRIMseq-DEXseq.
0
gravatar for saddamhusain77
7 months ago by
saddamhusain770 wrote:

HI, I am analysing differential transcript usage on rna seq data and I am stuck at this point- It seems to me that issue is in the ctsgenco object where a single row contains multiple transcript ids but dont know how to correct it.

> filesgenco <- list.files( pattern = ".txt",full.names = TRUE)
> filesgenco
[1] "./hesalc1.txt" "./hesalc2.txt" "./hesalt1.txt" "./hesalt2.txt"
> names(filesgenco)
NULL
> mycols
       condition sample_id
HECON1 control      HECON1
HECON2 control      HECON2
HETRE1 treated      HETRE1
HETRE2 treated      HETRE2
> names(filesgenco) <- mycols$sampleID
> filesgenco
[1] "./hesalc1.txt" "./hesalc2.txt" "./hesalt1.txt" "./hesalt2.txt"
> txigenco <- tximport(filesgenco, type="salmon", txOut=TRUE,
+                      countsFromAbundance="scaledTPM")
reading in files with read_tsv
1 2 3 4 
> 
> ctsgenco <- txigenco$counts
> ctsgenco <- ctsgenco[rowSums(ctsgenco) > 0,]
> gtf <- "gencode.v29.annotation.gtf"
> txdfgenco <- select(txdb, keys(txdb, "GENEID"), "TXNAME", "GENEID")
'select()' returned 1:many mapping between keys and columns
> tabgenco <- table(txdfgenco$GENEID)
> txdfgenco$ntx <- tabgenco[match(txdfgenco$GENEID, names(tabgenco))]
> head(txdfgenco)
              GENEID            TXNAME ntx
1 ENSG00000000003.14 ENST00000612152.4   5
2 ENSG00000000003.14 ENST00000373020.8   5
3 ENSG00000000003.14 ENST00000614008.4   5
4 ENSG00000000003.14 ENST00000496771.5   5
5 ENSG00000000003.14 ENST00000494424.1   5
6  ENSG00000000005.5 ENST00000373031.4   2
> ctsgenco[1:3,1:3]
[,1]
ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|               0.6308597

ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene|  0.0000000

ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|WASH7P-201|WASH7P|1351|unprocessed_pseudogene|              28.3130559

[,2]
ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|               0.6636265

ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene|  0.0825407

ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|WASH7P-201|WASH7P|1351|unprocessed_pseudogene|              18.5500264

[,3]
ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|               0.00000

ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene|  0.00000

ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|WASH7P-201|WASH7P|1351|unprocessed_pseudogene|              54.07508

> range(colSums(cts)/1e6)
[1] 16.26968 26.21900
> range(colSums(ctsgenco)/1e6)
[1] 12.64907 17.67470
> all(rownames(ctsgenco) %in% txdfgenco$TXNAME)
[1] FALSE

Any help/suggestion would be appreciated.

ADD COMMENTlink modified 7 months ago • written 7 months ago by saddamhusain770
Answer: Issue during rnaseqDTU analysis
0
gravatar for Michael Love
7 months ago by
Michael Love26k
United States
Michael Love26k wrote:

Please post all of your code. See the posting guide for suggestions on how to formulate a post. Also you may want to tag the name of the Bioconductor package you are using, DRIMSeq? DEXSeq?

ADD COMMENTlink written 7 months ago by Michael Love26k

Thank you Michael. I have just corrected my post and have added entire code I am using.

ADD REPLYlink written 7 months ago by saddamhusain770

I can see by eye why the rownames in cts are not contained in the TXNAME column of txdf. Can you see it too?

You should chop off the extra characters.

Try using this function:

chop <- function(x) sub("\\|.*","",x)
ADD REPLYlink modified 7 months ago • written 7 months ago by Michael Love26k

The gene symbol and description. Please tell me if I have used the function properly as I am getting following output.

> chop1 <-function(x) {
+     sub("\\|.*","",x)
+ }
> ctsgenco4 <- chop1(ctsgenco)
> head(ctsgenco4)

[1] "0.630859660016231"       "0"                  "28.3130559180533"          "0.0143187459425427"
[5]  "0"                   "0.288752943795375" 
ADD REPLYlink modified 7 months ago • written 7 months ago by saddamhusain770

That is not correct. You are applying it to the counts. I suggested to remove the extra characters from the rownames of the counts.

Do you see the problem I am pointing out? You have extra characters on the rownames of the counts, and this doesn't match what's in TXNAME. You may want to collaborate with a bioinformatician to help you with this step, if the manipulation of objects in R is very difficult.

ADD REPLYlink written 7 months ago by Michael Love26k

Yeah sure. I will consider your advice. Thank you for your time.

ADD REPLYlink written 7 months ago by saddamhusain770

I am sorry to bother you again but the dmDSdata output does'nt look like what is there in the vignettes. It shows just 1 genes.

> all(rownames(ctsgenco) %in% txdfgenco$TXNAME)

[1] TRUE

> txdfgenco <- txdfgenco[match(rownames(ctsgenco),txdfgenco$TXNAME),]

> all(rownames(ctsgenco) == txdfgenco$TXNAME)

[1] TRUE

> countsgenco <- data.frame(gene_id=txdfgenco$GENEID,
+                      feature_id=txdfgenco$TXNAME,
+                      ctsgenco)

> head(countsgenco, 3)

gene_id        feature_id     HECON1     HECON2   HETRE1     HETRE2

ENST00000456328.2 ENSG00000223972.5 ENST00000456328.2  0.6308597  0.6636265  0.00000 0.08355679

ENST00000450305.2 ENSG00000223972.5 ENST00000450305.2  0.0000000  0.0825407  0.00000 0.00000000

ENST00000488147.1 ENSG00000227232.5 ENST00000488147.1 28.3130559 18.5500264 54.07508 3.89251328



> library(DRIMSeq)
> dgenco <- dmDSdata(counts=countsgenco, samples=samples)
> dgenco

An object of class dmDSdata 

with 1 genes and 4 samples

* data accessors: counts(), samples()


> session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.2 (2018-12-20)
 os       macOS High Sierra 10.13.6   
 system   x86_64, darwin15.6.0        
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       Asia/Kolkata                
 date     2019-04-11
ADD REPLYlink modified 7 months ago • written 7 months ago by saddamhusain770

I'm not sure.

But I also can't tell from what you have posted above, what you are providing as countsgenco. You could show for example the dim of this. Are you providing it with the counts for all the genes?

ADD REPLYlink written 7 months ago by Michael Love26k
> dim(countsgenco)
[1] 174318      6

The row names of countgenco is feature_id as you could notice above. Countsgenco is nothing but the counts and ctsgenco is cts; I have just added "genco" acronym.

It might be relevant to ask, since I have run the code beyond this point just to see how things work and got 900 genes after stageR testing. Do i need to run the DEXseq codes as the stageR output has already given me the differential used transcript along with the p-adj values and it appears to work similar to DRIMseq?

ADD REPLYlink modified 7 months ago • written 7 months ago by saddamhusain770

You don't need to run both methods. You can just run one or the other.

If you are having continued difficulty with the DRIMSeq set up above, I'd make a new post with all your code and tag the DRIMSeq package so those authors are notified. (I think if you add a tag to an existing post it doesn't email the package maintainers).

ADD REPLYlink written 7 months ago by Michael Love26k

Thank you for helping me on this issue.

ADD REPLYlink written 7 months ago by saddamhusain770
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 340 users visited in the last hour