Input data from CLC bio edge test results
tarun2 • 0
@tarun2-11885
United States

To the developers,

We ran an initial RNA-Seq analysis in CLC Bio and exported the results as Excel files. For each individual sample, we get the following columns:

Each column name starts with the sample label (here "NIL_D - Nil-95_Drought_Panicle-3_1 (paired) trimmed (paired) (TE) - "), followed by one of:

1. Expression values
2. Normalized expression values
3. Gene name
4. Transcripts annotated
5. Detected transcripts
6. Transcript length
7. Unique transcript reads
8. Total transcript reads
9. Ratio of unique to total (transcript reads)
10. Exons
11. RPKM
12. Relative RPKM
13. Chromosome
14. Chromosome region start
15. Chromosome region end

We're trying to use the DESeq2 workflow for the next step of the analysis, but we're not sure which of these columns we can use as input. The expression values (column 1) are integers and are identical to our total transcript reads (column 8). The rows are the differentially expressed genes across all the chromosomes in rice.

Can we use this as input data?

Please advise.

Sincerely,

Asher

deseq2 • 1.3k views
@mikelove
United States

You need to obtain a matrix of the count or estimated count of fragments that can be assigned to each gene in each sample. And you need this data over all the genes, not just the DE genes. From this point, you can simply follow the DESeq2 vignette or the workflow. You should request information from whoever processed your data on how to obtain this matrix for use with DESeq2.
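
As a rough illustration of the matrix described above, here is a minimal base-R sketch that combines one table per sample into a single genes-by-samples matrix of integer counts. The sample names, the column names `gene` and `total_reads`, and the helper `build_count_matrix` are all hypothetical placeholders, not real CLC export headers. Filling genes that are missing from a sample with 0 is also an assumption that should be checked against how the upstream export treats unobserved genes.

```r
# Hypothetical per-sample tables, one row per gene. The column names "gene"
# and "total_reads" are placeholders, not the real CLC export headers.
sample_tables <- list(
  NIL_Drought_1 = data.frame(gene = c("geneA", "geneB", "geneC"),
                             total_reads = c(10, 0, 250)),
  NIL_Control_1 = data.frame(gene = c("geneC", "geneA"),
                             total_reads = c(190, 12)),
  Swarna_Drought_1 = data.frame(gene = c("geneA", "geneB", "geneC"),
                                total_reads = c(7, 1, 300))
)

# Hypothetical helper: build a genes x samples integer matrix over ALL genes
# seen in any sample (not only the differentially expressed ones).
build_count_matrix <- function(tables, gene_col = "gene", count_col = "total_reads") {
  genes <- sort(unique(unlist(lapply(tables, function(tab) as.character(tab[[gene_col]])))))
  columns <- lapply(tables, function(tab) {
    counts <- tab[[count_col]][match(genes, as.character(tab[[gene_col]]))]
    counts[is.na(counts)] <- 0  # ASSUMPTION: a gene absent from a table had zero reads
    as.integer(counts)
  })
  count_matrix <- do.call(cbind, columns)  # list names become the column (sample) names
  rownames(count_matrix) <- genes
  count_matrix
}

count_matrix <- build_count_matrix(sample_tables)
print(count_matrix)
```

A matrix of this shape, with genes as rows and samples as columns, is what gets passed as the count data argument of `DESeqDataSetFromMatrix`.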


Thank you very much for your response. 

I actually used the total transcript reads (the expression value) from the CLC bio edge test results as the countData input. A dumb follow-up question: does this mean the total transcript reads or the expression value cannot be used as the input file? This is the code I ran:

# Load DESeq2 (provides DESeqDataSetFromMatrix, DESeq and colData)
library(DESeq2)

# Read the table containing the NIL Drought vs Control data; the first column holds the row names
countData <- read.table("Trial_RNASeq_NIL_DroughtxControl_all.txt", header = TRUE, row.names = 1)

# Inspect the first rows of countData
head(countData)

# Build colData with one row per sample (i.e. per column of countData).
# "condition" is parsed from the column names of countData;
# "genotype" assumes the first 8 columns are NIL and the last 7 are Swarna.
colData <- data.frame(condition = factor(ifelse(grepl("Drought", colnames(countData)), "Drought", "Control")),
                      genotype = factor(c(rep("NIL", 8), rep("Swarna", 7))))

# Use the column names of countData as the row names of colData
rownames(colData) <- colnames(countData)

# Construct the DESeq2 data set from countData and colData, specifying the design formula here
dds <- DESeqDataSetFromMatrix(countData, colData, design = ~ genotype + condition + genotype:condition)

# Relevel so that "Drought" is the first (reference) level of condition
colData(dds)$condition <- relevel(colData(dds)$condition, "Drought")

# Run DESeq2 (differential expression analysis) on the data set.
# DESeq assesses the statistical significance of expression differences measured by RNA-seq.
dds <- DESeq(dds)

Thanks and best regards.

Sincerely,

Asher
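
The code above assigns genotype by column position (`rep("NIL",8)`, `rep("Swarna",7)`), which silently mislabels samples if the column order ever changes. A hedged alternative, sketched below in base R, parses both factors from the sample names and tabulates the design. The sample names and the helper `make_col_data` are hypothetical; the sketch assumes every column name starts with the genotype and contains "Drought" for drought samples, which may not match the real file.

```r
# Hypothetical sample names; the real column names of countData may differ.
sample_names <- c("NIL_Drought_1", "NIL_Drought_2", "NIL_Control_1", "NIL_Control_2",
                  "Swarna_Drought_1", "Swarna_Drought_2", "Swarna_Control_1", "Swarna_Control_2")

# Hypothetical helper: derive both factors from the names instead of positions.
make_col_data <- function(sample_names) {
  condition <- ifelse(grepl("Drought", sample_names), "Drought", "Control")
  genotype <- ifelse(grepl("^NIL", sample_names), "NIL", "Swarna")  # ASSUMPTION: names start with the genotype
  data.frame(
    # "Drought" is listed first to mirror the relevel() call in the post
    condition = factor(condition, levels = c("Drought", "Control")),
    genotype = factor(genotype),
    row.names = sample_names
  )
}

col_data <- make_col_data(sample_names)

# Cross-tabulate the design: every genotype x condition cell should contain
# replicates before fitting a model with an interaction term.
design_cells <- table(col_data$genotype, col_data$condition)
print(design_cells)

# The row names of the sample table must match the count matrix columns, in order.
stopifnot(identical(rownames(col_data), sample_names))
```

With real data, `sample_names` would be `colnames(countData)`, so the sample table and the count matrix cannot drift out of order.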

I don't know about the output of the upstream software. That's for you to make sure of by reading its documentation, or if in doubt you need to contact the developers.

Thank you again for responding.

I checked the CLC Genomics manual and found this part.

The Expression value parameter describes how expression per gene or transcript can be defined in different ways on both levels:
- Total counts. When the reference is annotated with genes only, this value is the total number of reads mapped to the gene. For un-annotated references, this value is the total number of reads mapped to the reference sequence. For references annotated with transcripts and genes, the value reported for each gene is the number of reads that map to the exons of that gene. The value reported per transcript is the total number of reads mapped to the transcript.
- Unique counts. This is similar to the above, except that only uniquely mapped reads are counted: it is the number of reads that match uniquely to the gene or its transcripts.

Another dumb question: does this look like the counts or estimated counts of fragments that can be assigned to each gene?

Thank you so much for clarifying.

Best regards,

Asher
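
One quick, partial way to examine the question above is to check whether a candidate column at least has the form of raw counts: non-negative whole numbers with no missing values. The base-R sketch below uses made-up values and hypothetical column names (`total_transcript_reads`, `unique_transcript_reads`, `rpkm`). Passing the check does not prove that a column holds per-gene fragment counts; that still has to be confirmed from the upstream software's documentation. Failing it (as RPKM does) shows the column is not raw counts.

```r
# Hypothetical helper: TRUE only for non-negative whole numbers without NAs.
looks_like_raw_counts <- function(x) {
  is.numeric(x) && !anyNA(x) && all(x >= 0) && all(x == round(x))
}

# Made-up values standing in for three columns of one sample's export.
export <- data.frame(
  total_transcript_reads = c(0, 15, 1203, 42),
  unique_transcript_reads = c(0, 12, 1100, 42),
  rpkm = c(0, 1.37, 88.2, 3.05)
)

# Apply the check to each column; the result is a named logical vector.
checks <- vapply(export, looks_like_raw_counts, logical(1))
print(checks)

# Consistency check suggested by the manual excerpt: unique counts are a
# subset of total counts, so they should never exceed them.
unique_within_total <- all(export$unique_transcript_reads <= export$total_transcript_reads)
print(unique_within_total)
```

Normalized values such as RPKM fail the whole-number test, which is one reason they are not suitable as DESeq2 input; DESeq2 expects unnormalized counts.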
