Page 248 - PC2019 Program & Proceedings
PLANT CANADA 2019
TOPIC 5: Bioinformatics and Systems Biology (Posters P69-P73)
P69. Custom selected reference genes outperform pre-defined reference genes in transcriptomic
analysis
Goncalves dos Santos, K.*; I. Desgagné-Penix; H. Germain
Université du Québec à Trois-Rivières
RNA sequencing measures gene expression at a resolution unmatched by expression arrays or
RT-qPCR. It is, however, necessary to normalize sequencing data before comparing expression levels. The
use of internal control genes or spike-ins is advocated in the literature for scaling read counts, but
methods for choosing reference genes are mostly targeted at RT-qPCR studies and require a set of pre-
selected candidate controls or pre-selected target genes. We report an R-based script to select internal
control genes based solely on read counts and gene sizes. This novel method first normalizes the read
counts and then excludes weakly expressed genes. It then selects as references the genes with the lowest
Transcripts per Million (TPM) covariance. We picked custom reference genes for the differential expression
analysis of three transcriptome sets from transgenic Arabidopsis plants expressing fungal effector proteins
tagged with GFP (using GFP alone as the control). The custom reference genes showed lower covariance
and fold change as well as a broader range of expression levels than commonly used reference genes.
When analyzed with NormFinder and geNorm, the custom selected genes were more stable than the
typical references. The proposed method is innovative, rapid and simple. Since it does not depend on
genome annotation, it can be used with any organism, and does not require pre-selected reference
candidates or target genes, which are not always available.
Karen Goncalves dos Santos (cris.kgs@gmail.com)
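The selection steps described in the P69 abstract (normalize read counts by gene size, drop weakly expressed genes, keep the most stable ones) can be illustrated with a minimal sketch. This is not the authors' R-based script. It is a Python toy under stated assumptions: the function names, the `min_mean_tpm` cutoff, the `n_refs` parameter and the toy data are all hypothetical, and stability is measured here as the coefficient of variation of TPM across samples, which is only one possible reading of the abstract's "covariance".

```python
from statistics import mean, pstdev


def tpm(counts, lengths_bp):
    """Convert raw read counts to Transcripts per Million, one sample at a time."""
    genes = list(counts)
    n_samples = len(next(iter(counts.values())))
    out = {g: [] for g in genes}
    for s in range(n_samples):
        # Reads per kilobase of gene length, then rescale so each sample sums to 1e6.
        rates = {g: counts[g][s] / (lengths_bp[g] / 1000) for g in genes}
        total = sum(rates.values())
        for g in genes:
            out[g].append(rates[g] / total * 1e6)
    return out


def select_references(counts, lengths_bp, min_mean_tpm=10.0, n_refs=2):
    """Return the n_refs expressed genes whose TPM varies least across samples."""
    tpms = tpm(counts, lengths_bp)
    # Hypothetical filter: drop genes whose mean TPM is below the cutoff.
    expressed = {g: v for g, v in tpms.items() if mean(v) >= min_mean_tpm}
    # Assumption: stability = coefficient of variation (sd / mean) of TPM.
    cv = {g: pstdev(v) / mean(v) for g, v in expressed.items()}
    return sorted(cv, key=cv.get)[:n_refs]


# Toy data (invented): read counts for five genes in three samples, gene sizes in bp.
counts = {
    "ref1": [100, 200, 300],
    "ref2": [400, 800, 1200],
    "varA": [300, 200, 1200],
    "varB": [200, 800, 300],
    "low": [0, 1, 0],
}
lengths_bp = {"ref1": 1000, "ref2": 2000, "varA": 1500, "varB": 1000, "low": 500}

# The cutoff is raised because a five-gene toy set inflates every TPM value.
refs = select_references(counts, lengths_bp, min_mean_tpm=1000, n_refs=2)
print(sorted(refs))  # → ['ref1', 'ref2']
```

Only read counts and gene sizes are needed as input, which matches the abstract's point that no annotation or pre-selected candidate list is required.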
P70. Redundancy removal in de novo transcriptomes of Piper nigrum (black pepper)
Doering, M.*; J. Stout
University of Manitoba
The computationally complex problem of de novo
transcriptome assembly, without a reference genome, has led to numerous algorithms to reliably and
intelligently piece together short reads. Each method has its own strengths, and tuning with different k-
mers can further improve the assembled transcriptome. Over-assembling, using multiple methods to take
advantage of unique strengths, is gaining recognition. Redundancy must then be removed from the
combined assemblies. CD-HIT-EST is commonly used to select the longest assembled isoforms, as is the
EvidentialGene (evigene) pipeline, which incorporates CD-HIT-EST; however, long, incorrectly
assembled contigs can be selected erroneously. The homology BLAST method (HBM) removes redundancy by
comparison with known protein-coding sequences. Although it was proposed four years ago, its use has
been reported only three times (once in plants) and never with over-assembled RNA-Seq data. We extracted
RNA from Piper nigrum roots and sequenced 125 bp paired-end reads on the Illumina HiSeq 2500
platform. Quality-trimmed reads were assembled with SOAPdenovo-Trans, Trans-ABySS, and BinPacker
with multiple k-mers. Redundancy was removed with evigene and HBM. The same assembly and
redundancy removal methods were applied to an Arabidopsis read set from the NCBI Sequence Read Archive
to allow for comparison to the Araport11 gene models. To our knowledge, this is the first
over-assembled transcriptome of the polyploid P. nigrum and the first application of HBM in an
over-assembly context.
Matthew Doering (umdoeri0@myumanitoba.ca)
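The idea of removing redundancy by comparison with known protein-coding sequences, as described in the P70 abstract, can be illustrated with a small sketch. It is not the published HBM procedure or the authors' pipeline, only one plausible reading of it: each contig is assigned to its best-scoring protein hit, and for each protein only the best-scoring contig is kept. The input is assumed to be standard 12-column BLAST tabular output (query ID in column 1, subject ID in column 2, bit score in column 12). The function name, contig names and protein names are hypothetical.

```python
import csv
import io


def best_contig_per_protein(blast_tsv):
    """Keep one contig per reference protein, chosen by highest bit score."""
    best_hit = {}  # contig -> (protein, bitscore) for that contig's best hit
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        contig, protein, bitscore = row[0], row[1], float(row[11])
        if contig not in best_hit or bitscore > best_hit[contig][1]:
            best_hit[contig] = (protein, bitscore)

    kept = {}  # protein -> (contig, bitscore) for the best contig seen so far
    for contig, (protein, bitscore) in best_hit.items():
        if protein not in kept or bitscore > kept[protein][1]:
            kept[protein] = (contig, bitscore)
    return {protein: contig for protein, (contig, _) in kept.items()}


# Toy BLAST hits (invented) for contigs from three assemblers.
rows = [
    ("soap_k25_c1", "protA", "98.0", "300", "6", "0", "1", "900", "1", "300", "1e-150", "550"),
    ("trans_k31_c7", "protA", "99.0", "310", "3", "0", "1", "930", "1", "310", "1e-160", "600"),
    ("binp_c3", "protB", "95.0", "200", "10", "0", "1", "600", "1", "200", "1e-90", "350"),
    ("binp_c3", "protA", "40.0", "80", "48", "0", "1", "240", "5", "85", "1e-5", "50"),
]
blast_tsv = "\n".join("\t".join(r) for r in rows)

nonredundant = best_contig_per_protein(blast_tsv)
print(nonredundant)  # → {'protA': 'trans_k31_c7', 'protB': 'binp_c3'}
```

Unlike a longest-isoform rule, this choice depends on similarity to a known protein rather than contig length, so a long misassembled contig is not favored simply for being long.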