Error loading sample and control data using Ballgown due to different i_id in i_data.ctab
pushgct • 0
@pushgct-24124
Last seen 4.2 years ago

I am trying to find differentially expressed genes between sample and control data using Ballgown. Below is my directory structure:


  • extdata
    • ALL control1
      • t_data.ctab
      • i2t.ctab
      • i_data.ctab
      • e2t.ctab
      • e_data.ctab
    • ALL control2
    • ALL control3
    • ALL sample1
    • ALL sample2
    • ALL sample3
    • ALL sample4
    • ALL sample5

I want to find differentially expressed genes between the sample (n=5) and control (n=3) groups using the stattest() function in R, to identify those with the most variance. However, I am unable to load the sample and control data together because the intron i_id values differ between samples and controls, and I get the error below. Could you please advise how to proceed? Should I process the samples and controls in separate ballgown objects? If so, how can I calculate differentially expressed genes between the groups? If I need to process them together, I might have to delete the extra intron data in the samples, which would cause a loss of data. Please help.

Code and Output:

    pheno_data = read.csv(file = "pheno_data.csv")
    pheno_data
                id   state
    1 ALL control1 control
    2 ALL control2 control
    3 ALL control3 control
    4  ALL sample1  sample
    5  ALL sample2  sample
    6  ALL sample3  sample
    7  ALL sample4  sample
    8  ALL sample5  sample

    bg=ballgown(dataDir = "C:/Users/lak/Documents/extdata", samplePattern = "ALL", pData= pheno_data)

    Wed Sep 09 17:21:18 2020
    Wed Sep 09 17:21:19 2020: Reading linking tables
    Wed Sep 09 17:21:29 2020: Reading intron data files
    Wed Sep 09 17:24:12 2020: Merging intron data
    Error in ballgown(dataDir = "C:/Users/lak/Documents/extdata", samplePattern = "ALL",  :
      intron ids were either not the same or not in the same order across samples. double check i_data.ctab for each sample.
    In addition: Warning messages:
    1: In x$i_id != intronAll[[1]]$i_id :
      longer object length is not a multiple of shorter object length
    2: In x$i_id != intronAll[[1]]$i_id :
      longer object length is not a multiple of shorter object length
    3: In x$i_id != intronAll[[1]]$i_id :
      longer object length is not a multiple of shorter object length
    4: In x$i_id != intronAll[[1]]$i_id :
      longer object length is not a multiple of shorter object length
    5: In x$i_id != intronAll[[1]]$i_id :
      longer object length is not a multiple of shorter object length
ballgown i_data.ctab differential gene expression • 1.4k views
Alyssa Frazee ▴ 210
@alyssa-frazee-6710
Last seen 4.0 years ago
San Francisco, CA, USA

Hi! Sorry you're having this issue.

My guess is that the problem is upstream of ballgown. Did you create all of the .ctab files using the same run of your assembly program? If the IDs don't match, ballgown can't know how to compare expression of the same intron between the conditions (since the "same intron" is not well-defined). So I recommend figuring out how to get the same transcriptome assembly (with the same IDs) for all of your samples. Good luck!!
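
To see which samples disagree before going back to the assembly step, a quick check along these lines might help. This is only a sketch in base R: `check_intron_ids` is a hypothetical helper, not a ballgown function, and it assumes each sample folder contains a tab-delimited `i_data.ctab` with a header row that includes an `i_id` column.

```r
# Hypothetical helper (not part of ballgown): summarise the intron table of
# every sample folder and flag those whose i_id column differs from the first.
check_intron_ids <- function(data_dir, sample_pattern) {
  # Folders whose names match the pattern, in alphabetical order
  sample_dirs <- list.files(data_dir, pattern = sample_pattern, full.names = TRUE)

  # Read each sample's intron table (assumed tab-delimited with a header)
  introns <- lapply(sample_dirs, function(d) {
    read.delim(file.path(d, "i_data.ctab"), stringsAsFactors = FALSE)
  })

  ref <- introns[[1]]  # the first sample is used as the reference
  data.frame(
    sample = basename(sample_dirs),
    n_introns = vapply(introns, nrow, integer(1)),
    same_as_first = vapply(introns, function(x) {
      # Same number of rows and identical ids in identical order
      nrow(x) == nrow(ref) && all(x$i_id == ref$i_id)
    }, logical(1))
  )
}
```

With the paths from the question, the call would be `check_intron_ids("C:/Users/lak/Documents/extdata", "ALL")`. Rows where `same_as_first` is `FALSE`, or where `n_introns` differs, point to samples that were probably quantified against a different assembly.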
