Monday, May 14, 2012

Reverse Engineering Biochemical Networks and Compressive Sensing

Reverse Engineering on graphs using compressive sensing already has some history for man made systems (see [1-6]) but reverse engineering biochemical networks (an important aspect of synthetic biology) is really at its infancy. One of the underlying reason is the lack of data and the fact that in most cases, the chemical reactions are not linear. A few papers, some very recent, are trying to cover this central aspect of making biology an engineering endeavor. I note that none of them seem to be using the recent matrix factorization techniques to help out. Anyway, here they are:





Evolutionary games model a common type of interactions in a variety of complex, networked, natural systems and social systems. Given such a system, uncovering the interacting structure of the underlying network is key to understanding its collective dynamics. Based on compressive sensing, we develop an efficient approach to reconstructing complex networks under game-based interactions from small amounts of data. The method is validated by using a variety of model networks and by conducting an actual experiment to reconstruct a social network. While most existing methods in this area assume oscillator networks that generate continuous-time data, our work successfully demonstrates that the extremely challenging problem of reverse engineering of complex networks can also be addressed even when the underlying dynamical processes are governed by realistic, evolutionary-game type of interactions in discrete time.



The next paper was featured last week but I am putting it here to show that not everybody is on par with respect to dealing with different issues such as hidden nodes: Reconstruction of Arbitrary Biochemical Reaction Networks: A Compressive Sensing Approach by Wei Pan, Ye Yuan, Guy-Bart Stan. The abstract reads:
Reconstruction of biochemical reaction networks is a central topic in systems biology which raises crucial theoretical challenges in system identification. Nonlinear Ordinary Differential Equations (ODEs) that involve polynomial and rational functions are typically used to model biochemical reaction networks. Such nonlinear models make the problem of determining the connectivity of biochemical networks from time-series experimental data quite difficult. In this paper, we present a network reconstruction algorithm that can deal with model descriptions under the form of polynomial and rational functions. Rather than identifying the parameters of linear or nonlinear ODEs characterised by pre-defined equation structures, our methodology allows us to determine the nonlinear ODEs structure together with their associated reaction constants. To solve the network reconstruction problem, we cast it as a Compressive Sensing (CS) problem and use Bayesian Sparse Learning (BSL) algorithms as an efficient way to obtain its solution.

there is also:Inferring Gene Regulatory Networks from Multiple Time Course Gene Expression Datasets by Bo-Lin Chen, Li-Zhi LiuFang-Xiang Wu. The abstract reads:
We proposed a scheme to infer gene regulatory networks from multiple time course gene expression datasets. As the scarcity of time course data, most current methods  usually making the inferred gene regulatory network structure  as an ill-posed one, and typically cannot handle multiple  experimental datasets directly. On the other hand, gene expression data generated by different groups worldwide are  increasingly accumulated. In this paper, we first formulate the inference of sparse and stable gene regulatory networks as a constraint optimization problem, which can be easily solved by a given single dataset. Then, two methods of network combination  are proposed, which can combine structures inferred from various experimental datasets. After that, the parameters in gene regulatory network with that structure are estimated by solving another optimization problem. Finally, we test and validate our methods on synthetic datasets in a series of numerical experiments in terms of the structure accuracy and  the model error.
The next paper tries to infer the actual "hidden"node by using a regularizer on the graph: Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows by Julien Mairal and Bin Yu. The abstract reads:

We consider supervised learning problems where the features are embedded in a graph, such as gene expressions in a gene network. In this context, it is of much interest to take into account the problem structure, and automatically select a subgraph with a small number of connected components. By exploiting prior knowledge, one can indeed improve the prediction performance and/or obtain better interpretable results. Regularization or penalty functions for selecting features in graphs have recently been proposed but they raise new algorithmic challenges. For example, they typically require solving a combinatorially hard selection problem among all connected subgraphs. In this paper, we propose computationally feasible strategies to select a sparse and “well connected” subset of features sitting on a directed acyclic graph (DAG). We introduce structured sparsity penalties over paths on a DAG called “path coding” penalties. Unlike existing regularization functions, path coding penalties can both model long range interactions between features in the graph and be tractable. The penalties and their proximal operators involve path selection problems, which we efficiently solve by leveraging network flow optimization. We experimentally show on synthetic, image, and genomic data that our approach is scalable and lead to more connected subgraphs than other regularization functions for graphs.


there is also an older paper: Genetic Network Identification Using Convex Programming by A. Agung JuliusMichael Zavlanos, Stephen Boyd, and George J. Pappas. The abstract reads:

Gene regulatory networks capture interactions between genes and other cell substances, resulting in various models for the fundamental biological process of transcription and translation. The expression levels of the genes are typically measured as mRNA concentration in micro-array experiments. In a so called genetic perturbation experiment, small perturbations are applied to equilibrium states and the resulting changes in expression activity are measured. One of the most important problems in systems biology is to use these data to identify the interaction pattern between genes in a regulatory network, especially in a large scale network. In this paper, we develop a novel algorithm for identifying the smallest genetic network that explains genetic perturbation experimental data. By construction, our identification algorithm is able to incorporate and respect any a priori knowledge known about the network structure. A priori biological knowledge is typically qualitative, encoding whether one gene affects another gene or not, or whether the effect is positive or negative. Our method is based on a convex programming relaxation of the combinatorially hard problem of L0 minimization, so it can efficiently handle large scale problems. We apply the proposed method to the identification of a subnetwork of the SOS pathway in Escherichia coli, the segmentation polarity network in Drosophila melanogaster, and a larger artificial network for measuring the performance of the method. In all cases, we show that our method performs better than prior methods.

A summary is here and the actual paper article is here.

The last paper is not about biochemical networks but about mixture reconstruction and bio related as well:Bacterial Community Reconstruction Using A Single Sequencing Reaction by Amnon Amir, Or Zuk. The abstract reads:
Bacteria are the unseen majority on our planet, with millions of species and comprising most of the living protoplasm. While current methods enable in-depth study of a small number of communities, a simple tool for breadth studies of bacterial population composition in a large number of samples is lacking. We propose a novel approach for reconstruction of the composition of an unknown mixture of bacteria using a single Sanger-sequencing reaction of the mixture. This method is based on compressive sensing theory, which deals with reconstruction of a sparse signal using a small number of measurements. Utilizing the fact that in many cases each bacterial community is comprised of a small subset of the known bacterial species, we show the feasibility of this approach for determining the composition of a bacterial mixture. Using simulations, we show that sequencing a few hundred base-pairs of the 16S rRNA gene sequence may provide enough information for reconstruction of mixtures containing tens of species, out of tens of thousands, even in the presence of realistic measurement noise. Finally, we show initial promising results when applying our method for the reconstruction of a toy experimental mixture with five species. Our approach may have a potential for a practical and efficient way for identifying bacterial species compositions in biological samples.

References:
[1] DISTROY: Detecting IC Trojans with Compressive Measurements by Youngjune Gwon, H. T Kung, and DarioVlah (attendant paper is here)
[3] Network Tomography via Compressed Sensing by Mohammad H. Firooz, Sumit Roy
[4] Network Link Tomography and Compressive Sensing by Rhys Bowden, Matthew Roughan, Nigel Bean
[5] Compressive Sensing over Graphs by Weiyu Xu Enrique Mallada Ao Tang
[6] Spatio-Temporal Compressive Sensing and Internet Traffic Matrices, Yin Zhang, Matthew Roughan, Walter Willinger, Lili Qiu






No comments:

Printfriendly