Application of dynamic topic models to toxicogenomics data.



BACKGROUND: All biological processes are inherently dynamic. After perturbation by environmental insults, drugs, or chemicals, biological systems change transiently or persistently across sequential time points. Investigating the temporal behavior of molecular events is therefore important for understanding the mechanisms that govern a biological system's response to, for example, drug treatment. The intrinsic complexity of time-series data requires appropriate computational algorithms for interpretation. In this study, we propose, for the first time, applying dynamic topic models (DTM) to the analysis of time-series gene expression data.

RESULTS: A large time-series toxicogenomics dataset was studied. It contains over 3144 microarrays of gene expression data from rat livers treated with 131 compounds (mostly drugs) at two dose levels (control and high dose) on a repeated-dosing schedule with four time points (4, 8, 15 and 29 days). Using DTM, we analyzed the topics (each consisting of a set of genes) and their biological interpretations over these four time points and identified hidden patterns embedded in the time-series gene expression profiles. Based on the topic distributions of the compound-time conditions, a number of drugs were successfully clustered by shared mode of action, such as PPARα agonists and COX inhibitors. The biological meaning of each topic was interpreted using diverse sources of information, such as functional analysis of pathways and the therapeutic uses of the drugs. Additionally, we found that the sample clusters produced by DTM were much more coherent in terms of functional categories than those produced by traditional clustering algorithms.

CONCLUSIONS: We demonstrated that DTM, a text mining technique, can be a powerful computational approach for clustering time-series gene expression profiles, with a probabilistic representation of their dynamic features along sequential time frames. The method offers an alternative way to uncover hidden patterns embedded in time-series gene expression profiles and to gain an enhanced understanding of the dynamic behavior of gene regulation in biological systems.
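As an illustration only, and not the authors' implementation: once each compound-time condition has a topic distribution, conditions can be compared by a distance between those distributions. The minimal Python sketch below uses the Hellinger distance for this. The compound labels, the three-topic probability vectors, and the helper names `hellinger` and `nearest_neighbor` are hypothetical placeholders, not values or methods from the study.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions.

    Returns 0.0 for identical distributions and 1.0 for distributions
    with no overlapping support.
    """
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


# Hypothetical topic distributions (3 topics) keyed by (compound, time point).
# These numbers are made up purely for illustration.
topic_dist = {
    ("compound_A", "4-day"): [0.80, 0.15, 0.05],
    ("compound_B", "4-day"): [0.75, 0.20, 0.05],
    ("compound_C", "4-day"): [0.10, 0.10, 0.80],
}


def nearest_neighbor(key, dists):
    """Return the condition whose topic distribution is closest to `key`."""
    others = [k for k in dists if k != key]
    return min(others, key=lambda k: hellinger(dists[key], dists[k]))


if __name__ == "__main__":
    query = ("compound_A", "4-day")
    print(query, "is closest to", nearest_neighbor(query, topic_dist))
```

Under this kind of comparison, conditions dominated by the same topic end up close together, which is one plausible way that compounds sharing a mode of action could be grouped. Any other distance on probability vectors, or a standard clustering routine, could be substituted.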


Lee, Mikyung; Liu, Zhichao; Huang, Ruili; Tong, Weida


  • Algorithms
  • Animals
  • Cluster Analysis
  • Computational Biology/ methods
  • Data Mining/ methods
  • Hepatocytes/ drug effects
  • Hepatocytes/ metabolism
  • Humans
  • Metabolic Networks and Pathways/ genetics
  • Models, Biological
  • Oligonucleotide Array Sequence Analysis/ methods
  • Rats
  • Toxicogenetics/ methods
  • Toxins, Biological/ pharmacology
  • Toxins, Biological/ toxicity
  • Transcriptome
