Nuit Blanche: 09/01/2013

Monday, September 30, 2013

Internet of Things #2 Paris Meetup summary

A few days ago I attended the second Internet of Things meetup in Paris. The meeting took place at KissKissBankBank, a crowdfunding platform company. A big thank you to them and to the organizers, Marc Chareyron, Olivier Mével, and Pierre-Rudolf Gerlach, for organizing the meet-up; those things are always a challenge to set up and animate. Once again they pulled it off.

There were three talks of equal interest but with different flavors:

First, Romain, whom I have mentioned here before, presented with some of his colleagues from Smiirl (Gauthier Nadaud) his first product: Fliike, a physical counter resembling those you find in airports and train stations, which displays in real time the number of Likes on a Facebook page. It is a way to make the connection between a physical business (a bar, ...) and the number of "likes" on the business' Facebook page more lively. The object is sleek and is meant to be displayed inside the establishment. It was interesting to see how much thought they had put into the user experience. For instance, they noticed the very short latency between somebody hitting a Facebook like on a smartphone and the display moving as a result. To make the experience more playful, they had to add a time delay so that customers who hit the "like" button inside the establishment would also have the satisfaction of seeing the counter move as a result. This is smart thinking. My question revolved around drinking games in bars. While those are probably not that common in France, I foresee much fun to be had in countries where bars are more central to social life. More precisely, it could in fact be disruptive to the gaming-in-bars industry. The object is not for bars only, and one can imagine a large number of different implementations...

The second talk was by Rafi Haladjian. Rafi is a well-known geek in France, mostly because of his very early involvement with connected objects in the late 1990s. His current company, sen.se, provides support for communities around internet-connected objects. The talk was about a mystery product to be unveiled on October 11th. Since the meeting took place before that date, we were served general ideas about what that product might look like and the reasoning as to why it should exist in the first place. On a personal note, I liked the fact that one of the key concepts of that thing is to be forgettable by the person who wears it. Indeed, there will be no ubiquitous computing if people have to think about dealing with sensors. In compressive sensing, for instance, one of the goals is to have dumb sensors so that they can be very low power while still producing worthwhile and even actionable data. It looks like the sen.se product will share this notion of not requiring much power. I am curious about how they will make sense of that data. They also mentioned that current communication infrastructures could not handle the type of data they were transferring and that they were developing a proprietary solution that would allow them to tailor their bandwidth requirements to their traffic. We'll see.

The third talk went in a somewhat different direction. Uros Petrevski and Drasko Draskovic presented WeIO by the Nodesign.net company. From what I understood, WeIO is a moving server allowing conversation between multiple connected objects. What was fascinating is that they hit the sweet spot in terms of programming languages: HTML, JavaScript and Python, with the browser used as the IDE. Instead of one general loop as on Arduino, different scripts can talk to each other. They showed a demo where, through different wifi connections and different devices (an iPad and an Android phone), people could use web interfaces to communicate, via JS/HTML and Python scripts, with that server. The server could then broadcast some relevant information to all the connected objects. For instance, they used the accelerometers of an iPad to send a signal to the server, which converted the data into a color scheme that was then sent back to all the connected objects. The connected objects, in turn, used that information to display that color in their background. In short, moving the iPad around produced a very rapid series of changing colors on the tablet and smartphone displays. It was pretty impressive, as one could foresee kids playing with this and enjoying it. The product will only cost $69, much less than a Lego Mindstorms set, and is potentially more fun. The server uses Tornado and essentially takes much of the Linux hacking out of the users' hands (yes, my Linux hacking skills are pretty rusty).
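To make the demo a little more concrete, here is a minimal sketch of the accelerometer-to-color step followed by a broadcast. This is not WeIO's actual code: the function names, the clamping range `g_max` and the one-axis-per-channel mapping are all assumptions of mine, and the "clients" are plain Python callables standing in for the connected browsers.

```python
def accel_to_hex_color(ax, ay, az, g_max=1.0):
    """Map accelerometer readings (in g) to an RGB hex string.

    Hypothetical mapping: each axis is clamped to [-g_max, g_max]
    and rescaled linearly to one color channel in 0..255.
    """
    def channel(value):
        clamped = max(-g_max, min(g_max, value))
        return round((clamped + g_max) / (2 * g_max) * 255)

    return "#{:02x}{:02x}{:02x}".format(channel(ax), channel(ay), channel(az))


def broadcast(color, clients):
    """Send the same color to every connected client (stand-in callables)."""
    for send in clients:
        send(color)


# Usage: two fake "displays" that simply record the last color received.
received_a, received_b = [], []
broadcast(accel_to_hex_color(-1.0, 1.0, 5.0), [received_a.append, received_b.append])
```

In a real setup the callables would be replaced by whatever the server uses to push messages to browsers; the point is only that one sensor reading fans out to every display.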

In all a very lively set of presentations and questions.

Side conversations included trying to build a sensor that can sense mood. I noted that one of the references in that area, at least for me, is Seth Roberts, who seems to have found a connection between watching faces and his mood 36 hours later. I think Seth's experiments are very enlightening inasmuch as they provide a larger view of how much data needs to be used to make sense of them. An electronic scale gives you your weight right now and does not require much thinking. In fact, it makes the user quite powerless. A 36-hour delay requires a combination of machine learning and attention to how the "thing" communicates with its owner. This reminds me of an aspect of robotics that is sometimes missing. Back when we were building an autonomous car to be fielded in DARPA's Grand Challenge, we needed rapid operational feedback between the algorithm being trained and what the driver was doing. Since the driver couldn't watch the computer monitor at the same time as the road, we enabled the algorithm to "talk" to the driver. Then something peculiar happened: forget the large number of hours spent designing a tracker and a real-time trajectory computation engine or calibrating sensors; with two lines of Python code, the sister of one of our teammates went from being totally oblivious to this somewhat bulky 1.5-ton piece of autonomous machinery (it's just a car) to a "Wow! This thing talks!"...

Pierre Metivier, a French blogger, wrote a summary (in French) of the meeting. It's here: Highlights de la 2ème édition du Meetup de Paris dédié à l’Internet des Objets

I also mentioned to a few attendees the next Paris Machine Learning Applications Meet-up on October 16th.

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

2497..2498..2499..

The Compressive Sensing group on LinkedIn now has 2499 members. Who will be the 2500th ?


Saturday, September 28, 2013

Videos: Big Data Boot Camp Day 2,3 and 4, Simons Institute, Berkeley

Following up on Thursday's Videos from the Big Data Boot Camp Day 1 at the Simons Institute at Berkeley, here are the videos of Days 2, 3 and 4 of the Big Data Boot Camp, held at the Simons Institute at the beginning of September:

Wednesday, September 4th, 2013

9:00 am – 10:30 am

Algorithmic High-Dimensional Geometry II (slides, References)

Alex Andoni, Microsoft Research

11:00 am – 12:30 pm

Past, Present and Future of Randomized Numerical Linear Algebra I (slides)

Petros Drineas, Rensselaer Polytechnic Institute & Michael Mahoney, Stanford University

2:00 pm – 3:30 pm

Past, Present and Future of Randomized Numerical Linear Algebra II

Petros Drineas, Rensselaer Polytechnic Institute & Michael Mahoney, Stanford University

4:00 pm – 5:30 pm

Optimization I (slides)

Ben Recht, UC Berkeley

Thursday, September 5th, 2013

9:00 am – 10:30 am

Optimization II (slides)

Ben Recht, UC Berkeley

11:00 am – 12:30 pm

High-Dimensional Statistics I

Martin Wainwright, UC Berkeley

2:00 pm – 3:30 pm

High-Dimensional Statistics II

Martin Wainwright, UC Berkeley

4:00 pm – 5:30 pm

Streaming, Sketching and Sufficient Statistics I (slides)

Graham Cormode, University of Warwick

Friday, September 6th, 2013

9:00 am – 10:30 am

Streaming, Sketching and Sufficient Statistics II (slides)

Graham Cormode, University of Warwick

11:00 am – 12:30 pm

Some Geometric Perspectives on Combinatorics: High-Dimensional, Local and Local-to-Global I

Nati Linial, Hebrew University of Jerusalem

These lectures are based on joint papers with several coauthors, e.g.

2:00 pm – 3:30 pm

Some Geometric Perspectives on Combinatorics: High-Dimensional, Local and Local-to-Global II

Nati Linial, Hebrew University of Jerusalem

4:00 pm – 5:30 pm

Theory and Big Data

Ravi Kannan, Microsoft Research India


Friday, September 27, 2013

Accurate Profiling of Microbial Communities from Massively Parallel Sequencing using Convex Optimization - implementation -

Or Zuk just sent me the following:

Dear Igor,

We've just uploaded to the arxiv a manuscript which might be of interest to Nuit Blanche's readers,

http://arxiv.org/abs/1309.6919

Best,

Or

Thanks Or ! We've heard of the microbiome before; see the recent Saturday Morning Videos on Human Microbiome Science.

Here is, no pun intended, a piece of the puzzle and an implementation: Accurate Profiling of Microbial Communities from Massively Parallel Sequencing using Convex Optimization by Or Zuk, Amnon Amir, Amit Zeisel, Ohad Shamir, Noam Shental

We describe the Microbial Community Reconstruction (MCR) Problem, which is fundamental for microbiome analysis. In this problem, the goal is to reconstruct the identity and frequency of species comprising a microbial community, using short sequence reads from Massively Parallel Sequencing (MPS) data obtained for specified genomic regions. We formulate the problem mathematically as a convex optimization problem and provide sufficient conditions for identifiability, namely the ability to reconstruct species identity and frequency correctly when the data size (number of reads) grows to infinity. We discuss different metrics for assessing the quality of the reconstructed solution, including a novel phylogenetically-aware metric based on the Mahalanobis distance, and give upper-bounds on the reconstruction error for a finite number of reads under different metrics. We propose a scalable divide-and-conquer algorithm for the problem using convex optimization, which enables us to handle large problems (with $\sim10^6$ species). We show using numerical simulations that for realistic scenarios, where the microbial communities are sparse, our algorithm gives solutions with high accuracy, both in terms of obtaining accurate frequency, and in terms of species phylogenetic resolution.

I note:

In the spirit of reproducible research, we have implemented all of our algorithms in the Matlab package COMPASS (Convex Optimization for Microbial Profiling by Aggregating Short Sequence reads), which is freely available at github: https://github.com/NoamShental/COMPASS


Thursday, September 26, 2013

Videos: Big Data Boot Camp Day 1, Simons Institute, Berkeley

Here are the videos of Day 1 of the Big Data Boot Camp, held at the Simons Institute at the beginning of September:

Tuesday, September 3rd, 2013

8:30 am – 8:50 am

9:00 am – 10:30 am
Big Data: The Computation/Statistics Interface, Slides
Michael Jordan, UC Berkeley

A scalable bootstrap for massive data. A. Kleiner, A. Talwalkar, P. Sarkar and M. I. Jordan. Journal of the Royal Statistical Society, Series B, in press.
On statistics, computation and scalability. M. I. Jordan. Bernoulli, 19, 1378-1390, 2013.
Computational and statistical tradeoffs via convex relaxation. V. Chandrasekaran and M. I. Jordan. Proceedings of the National Academy of Sciences, 110, E1181-E1190, 2013.

11:00 am – 12:30 pm
Algorithmic High-Dimensional Geometry I, slides
Alex Andoni, Microsoft Research

2:00 pm – 5:30 pm
User-Friendly Tools for Random Matrices I (slides)
Joel Tropp, California Institute of Technology

J. A. Tropp, "User-Friendly Tools for Random Matrices: An Introduction", 2012.
J. A. Tropp, "User-Friendly Tail Bounds for Sums of Random Matrices", FOCM 2012.
L. Mackey et al., "Matrix concentration inequalities via the method of exchangeable pairs," 2012.


Wednesday, September 25, 2013

Around the blogs in 78 Summer hours: Dina Katabi MacArthur Fellow, Big Data at the Simons Institute and more ....

MacArthur Fellows are out, and one of the winners is Dina Katabi, whom we have mentioned here for her work on sFFT (see a recent use of it in this entry: Slides of the Workshop on Sparse Fourier Transform Etc.) and on using wifi signals to see through walls (she was mentioned on Nuit Blanche as early as 2008). Congratulations Dina !

In a different direction, since the last Around the blogs in 78 Summer hours: Big Data boot camp at the Simons Institute and more..., several bloggers have continued describing their experience at the Simons Institute at Berkeley. Among them we had:

Sebastien

Suresh

Andrew: Is it a blog? Or is it epsilon-far from being a blog?
who mentioned
Gil:

and Moritz

The Zen of Gradient Descent
What should a theory of Big Data say?

Andrew

Hein

Greg

Dustin

Danny

Bob

Dirk: Post-it for convex optimization for optimal transport, value functions, accelerated forward backward and Bayesian inversion

Larry

Sage

IPython Notebooks in the Cloud with Realtime Synchronization and Support for Collaborators

Chapter Zero

I miss Mathematica

Vladimir

Victoria

Changes in the Research Process Must Come From the Scientific Community, not Federal Regulation

John

Thomas

Gael: Publishing scientific software matters

Muthu

Greg

Maxim

Josh: Falcon UAV Air Support in Colorado

and on Nuit Blanche:


Tuesday, September 24, 2013

When Buffon’s needle problem meets the Johnson-Lindenstrauss Lemma

If there is one thing that is changing our views of high dimensional data, it is the Johnson-Lindenstrauss lemma, a concentration of measure result from 1984 that is only now being brought to bear on our daily life as we are slowly being swallowed by the tsunami of data around us. So when Laurent wrote some additional insight on the subject on his blog and then put out an arxiv preprint, I paid attention and you should too. The very well written blog entry is entitled When Buffon’s needle problem meets the Johnson-Lindenstrauss Lemma, while the more formal arxiv preprint goes with: A Quantized Johnson Lindenstrauss Lemma: The Finding of Buffon's Needle by Laurent Jacques

In 1733, Georges-Louis Leclerc, Comte de Buffon in France, set the ground of geometric probability theory by defining an enlightening problem: What is the probability that a needle thrown randomly on a ground made of equispaced parallel strips lies on two of them? In this work, we show that the solution to this problem, and its generalization to N dimensions, allows us to discover a quantized form of the Johnson-Lindenstrauss (JL) Lemma, i.e., one that combines a linear dimensionality reduction procedure with a uniform quantization of precision \delta>0. In particular, given a finite set S in R^N of |S| points and a distortion level \epsilon>0, as soon as M > M_0 = O(\log |S|/\epsilon^2), we can (randomly) construct a mapping from (S, \ell_2) to ((\delta Z)^M, \ell_1) that approximately preserves the pairwise distances between the points of S. Interestingly, compared to the common JL Lemma, the mapping is quasi-isometric and we observe both an additive and a multiplicative distortions on the embedded distances. These two distortions, however, decay as O(\sqrt{\log |S|/M}) when M increases. Moreover, for coarse quantization, i.e., for high \delta compared to the set radius, the distortion is mainly additive, while for small \delta we tend to a Lipschitz isometric embedding. Finally, we show that there exists "almost" a quasi-isometric embedding of (S, \ell_2) in ((\delta Z)^M, \ell_2). This one involves a non-linear distortion of the \ell_2-distance in S that vanishes for distant points in this set. Noticeably, the additive distortion in this case decays slower as O((\log |S|/M)^{1/4}).
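For readers who like to see such a mapping run, here is a rough sketch of a quantized random projection in the spirit of the abstract. Several ingredients are my assumptions and are not spelled out in the abstract: a Gaussian random matrix, a uniform random dither added before quantization, a floor-type quantizer of step \delta, and the sqrt(pi/2) rescaling of the averaged \ell_1 distance (which comes from E|g| = sqrt(2/pi) for a standard Gaussian g). The function names are hypothetical.

```python
import math
import random


def make_quantized_embedding(n_dims, m_dims, delta, seed=0):
    """Return a map from R^n_dims to (delta * Z)^m_dims.

    Assumed construction: Gaussian projection + uniform dither in
    [0, delta) + uniform quantization with step delta.
    """
    rng = random.Random(seed)
    phi = [[rng.gauss(0.0, 1.0) for _ in range(n_dims)] for _ in range(m_dims)]
    dither = [rng.uniform(0.0, delta) for _ in range(m_dims)]

    def embed(x):
        out = []
        for row, xi in zip(phi, dither):
            projection = sum(r * v for r, v in zip(row, x))
            out.append(delta * math.floor((projection + xi) / delta))
        return out

    return embed


def estimated_distance(u, v):
    """Estimate the original l2 distance from the l1 distance of two embeddings."""
    l1 = sum(abs(a - b) for a, b in zip(u, v))
    return math.sqrt(math.pi / 2.0) * l1 / len(u)


# Usage: compare the true l2 distance with the estimate from quantized data.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(20)]
y = [rng.gauss(0.0, 1.0) for _ in range(20)]
embed = make_quantized_embedding(n_dims=20, m_dims=2000, delta=0.5)
true_distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
approx_distance = estimated_distance(embed(x), embed(y))
```

The link with Buffon's needle is the dither: for a scalar gap t, the expected number of quantization thresholds crossed is t/\delta, so the quantized absolute difference is unbiased for |t|.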


Monday, September 23, 2013

GraphLab Internships

One of the start-ups we care for here at Nuit Blanche is GraphLab. Danny Bickson wants me to advertise ten internships at GraphLab for the Summer:

Hi Igor!

....We will have about 10 open positions for ML phds this year in hot companies....

The blog entry you really want to read to know more about those opportunities and companies is at:
http://bickson.blogspot.com/2013/09/graphlab-internship-program-machine.html


Bayesian Robust Matrix Factorization for Image and Video Processing - implementation -

Naiyan Wang just sent me the following:

Dear Igor,

We just published a new paper titled "Bayesian Robust Matrix Factorization for Image and Video Processing" in ICCV13'. In this paper, we give a full Bayesian formulation to robust matrix factorization (a.k.a Robust PCA) problem, and extend the model to handle contiguous outliers which are often encountered in computer vision applications. The paper, supplemental material and codes are all available at: http://winsty.net/brmf.html

Thanks

Naiyan

Thanks Naiyan !

The full page presenting this impressive implementation is here and the paper is: Bayesian Robust Matrix Factorization for Image and Video Processing by Naiyan Wang and Dit-Yan Yeung (Supplemental Material here)

Matrix factorization is a fundamental problem that is often encountered in many computer vision and machine learning tasks. In recent years, enhancing the robustness of matrix factorization methods has attracted much attention in the research community. To benefit from the strengths of full Bayesian treatment over point estimation, we propose here a full Bayesian approach to robust matrix factorization. For the generative process, the model parameters have conjugate priors and the likelihood (or noise model) takes the form of a Laplace mixture. For Bayesian inference, we devise an efficient sampling algorithm by exploiting a hierarchical view of the Laplace distribution. Besides the basic model, we also propose an extension which assumes that the outliers exhibit spatial or temporal proximity as encountered in many computer vision applications. The proposed methods give competitive experimental results when compared with several state-of-the-art methods on some benchmark image and video processing tasks.

The implementation is here. Maybe Cable and I should use it in our adventures, on the Lana del Rey dataset. Anyway, the solver will be added shortly to the Advanced Matrix Factorization Jungle page.


Sunday, September 22, 2013

Sunday Morning Insight: A conversation on Nanopore Sequencing and Signal Processing

From [8]

As some of you have noticed, nanopore sequencing is a subject that comes back often here. One of the latest instances is another Sunday Morning Insight on Thinking about a Compressive Genome Sequencer. All the entries on the subject can be found under the nanopore tag. Because I wanted to be more informed about the technology and see how compressive sensing could be inserted in it, I reached out to a few people to get some conversation going.

The following is the result of a long email exchange with someone also interested in this area of nanopore sequencing, in which we tried to clarify some of the potential issues for nanopore signal analysis, based on the published results. The letter "I" is for my remarks and questions while "A" is for the person with whom I had this conversation and who wishes to remain anonymous. A big thank you to this person for the great conversation!

I: Here are two or three things that bug me and I was wondering if you could provide some enlightenment.

From an outsider's point of view, the various reports I see about nanopore engineering seem contradictory. On the one hand, the voltage curves I see seem to have pretty low noise, yet I also see somewhere that these techniques are only about 96% accurate (where one would expect 99.99% or better). These two facts do not fit. The only way I can reconcile them, so far, is as follows:

We see only the good traces with low noise, it's OK, it's PR. In reality, noise is actually much worse in general.
Voltage drift issues are not minimal in any sense of the word especially when reading long strands.
A false twin issue? As I previously mentioned in the blog (see http://nuit-blanche.blogspot.fr/2013/04/structural-information-in-nanopore.html ), a potential issue involving knots might be at play. If one were to assume that the distances between voltage step changes are not uniform, then it becomes difficult to tell the difference between, say, a nucleotide G that has a knot behind it (and which takes a little while to get through the nanopore) and two perfectly fine Gs following each other in the DNA strand. That makes for a pretty difficult classification issue.
Finally, for reasons that are still unknown, we see voltage step readings that cannot be classified as any of the A, G, T and C nucleotides, a situation that might be related to some of the issues mentioned above, a combination thereof, or others (such as voltage change across the nanopore).

What is your feeling about this outsider's analysis, or am I way off base? Is there a simpler explanation for the low 96% accuracy?

By the way, the problems mentioned above are not insurmountable; it's just that we need to be more serious about taking a stab at them. One could, for instance, remove the slowly moving drift using an "analysis based dictionary learning" approach. There are actually other methods as well.

A: The first thing to note is that the paper you mention at:

http://nuit-blanche.blogspot.com/2013/04/structural-information-in-nanopore.html [2]

is an analysis of single bases free in solution, not strands of DNA. I don't believe any work showing a protein nanopore producing clear, single base resolution reads has been published (as a peer-reviewed paper) to date.

The current measurements shown there do look quite clean. So, yes I can see why you'd expect a low error rate. There are probably a few sources of error you might want to consider:

All reports I've seen show that the dwell time of a molecule in a protein nanopore is exponentially distributed. This is why people in the ion channel literature are mostly happy using HMMs, because it does appear to be a Markov process [1]. Given that the time is exponentially distributed, there's a high probability that you might not see a base, or see it only for a short time. This is one possible source of error, in this case deletions.
Most research talks about the need to control the motion of the DNA through the nanopore. This could be a significant source of error [3] [4] [5] [6].
The diagram on your blog shows 4 bases in a current range of ~15pA, but in the literature no one has presented single base resolution on strands from a protein nanopore that I'm aware of. The work has shown “that several nucleotides contribute to the recorded signal” [6] [7].
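To put a number on the first item of that list, here is a small sketch of how an exponentially distributed dwell time translates into missed bases. The numerical values (a 1 ms mean dwell time and a 0.1 ms shortest detectable event) are purely illustrative assumptions of mine, not figures from the exchange, and the function name is hypothetical.

```python
import math


def prob_dwell_shorter_than(t_min, mean_dwell):
    """P(T < t_min) when the dwell time T is exponential with the given mean."""
    return 1.0 - math.exp(-t_min / mean_dwell)


# Illustrative values only: mean dwell of 1 ms, events under 0.1 ms are missed.
p_missed = prob_dwell_shorter_than(t_min=0.1e-3, mean_dwell=1.0e-3)
```

With these made-up numbers, a bit under 10% of the bases would stay in the pore for too short a time to be seen. Because the exponential density is largest at zero, very short events are the most likely ones, which is why deletions come up so naturally as an error mode.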

I: I think the sentence in ref [6] makes it plainly obvious why a low pass sensor might be interesting. From [6]:

“thus far the reading of the bases from a DNA molecule in a nanopore has been hampered by the fast translocation speed of DNA together with the fact that several nucleotides contribute to the recorded signal”.

I: What is the actual purpose of these "motors" that control the motion of the strand? Is it that:

with them the process is slowed down so that we have enough electrons per base (as you mentioned earlier, if it goes faster we might be electron-starved for the signal) [we talked before about 1pA being 6 electrons per time interval at 1MHz sample rate].
without them there would be no strand going through the pore?
without them the nominal dwell time would be not very well defined?
they make sure that the knotty situation mentioned in the blog entry does not unduly influence the dwell time in the pore?
any or all of these explanations?
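As a quick check of the "1pA is about 6 electrons per sample at 1MHz" figure quoted in the first item, here is the arithmetic as a tiny sketch. Only the elementary charge is a physical constant; the function name is mine.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of one electron, in coulombs


def electrons_per_sample(current_amps, sample_rate_hz):
    """Average number of elementary charges flowing during one sample period."""
    charges_per_second = current_amps / ELEMENTARY_CHARGE_C
    return charges_per_second / sample_rate_hz


# 1 pA sampled at 1 MHz: roughly 6 electrons per sample.
n_electrons = electrons_per_sample(1e-12, 1e6)
```

With so few charges per sample, any faster translocation leaves very little signal per base, which is the "electron starved" regime alluded to above.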

A: Possibly all of the above, depending on the system. Slowing down the strand (1) is probably the most significant contribution. If you think about the default case where there are no forces at play the DNA would be moving around under Brownian motion. There might be other local forces at play that make it move faster or slower. You might be able to control that motion with a "motor", but that might not always work very well.

In addition to this, some work shows that several nucleotides might contribute to the signal [7].

I: Please explain this last sentence. If it is what I think it is, it is very interesting.

A: So, to quote the wikipedia page on nanopore sequencing:

"In the early papers methods, a nucleotide needed to be repeated in a sequence about 100 times successively in order to produce a measurable characteristic change"

If you slow the strand down, or make other changes, you might still be faced with the problem that “several nucleotides contribute to the recorded signal” [6] [7].

I: Going back to the motor. With no motor, the strand can go up or down, and since you have access only to the current, you have really no idea which direction the strand is going.

A: This is correct.

I: Which brings me to a different type of question: is there other information gathered during those experiments that could be used to detect what direction the strand is taking, and at what speed? Are there some additional measurements made during those experiments?

A: No, I don't know of any additional measurements that could be made. I think some people have talked about using fluorescence to detect the motion of the strand through a pore, but I don't know how far that work has gone.

If you slow the strand down, or make other changes, you might still be faced with the problem that “several nucleotides contribute to the recorded signal” [6] [7], i.e. signal does not come from a single position. Cherf et al.[7] is probably a good reference to look at for some example traces and information on this.

I: A-ah! So the measurements seem to fall in this category of group measurements/group testing. This is really what compressive sensing projects well into. I guess the main issues are:

the strand going through is a stochastic process with a Poisson distribution (the motor makes that distribution more peaked)
we do not seem to have other measurements that could directly or indirectly provide some side information about the actual speed of the strand going through. In another blog entry (http://nuit-blanche.blogspot.com/2013/03/of-well-logging-and-nanopores.html) I made the parallel between nanopore and well logging/drilling issues. In the drilling case, though, the probes have accelerometers on them so that a relatively simple Kalman filter on top of the other information (akin to the current sensing in the nanopore) allows a much cleaner picture to emerge.

Is there anything else I am missing from that picture?

A: I think that's pretty accurate!

I: A final question on motors, do all nanopore systems need a motor?

A: Having a way of controlling the motion is desirable.

I: Are you telling me there are other ways of controlling the motion that do not require motors?

A: Speed can be controlled by various factors including:

Viscosity of the buffer
Applied voltage
Salt concentration
Temperature

(from Nanopores - Sensing and Fundamental Biological Interactions - page 272)

They also suggest there that optical and magnetic tweezers and "DNA Transistors" could be used to control the actual motion so there are a bunch of options I think.

I: Ah! This is interesting, maybe I should get my hands on this book (Nanopores - Sensing and Fundamental Biological Interactions). That the voltage across the pore also changes the dynamics is worth investigating too.

I: Thank you very much

Using the commenter's feedback, I went ahead and read this very well written 2011 review of the technology [8] (Nanopore sensors for nucleic acid analysis by Bala Murali Venkatesan and Rashid Bashir) with the following abstract:

Abstract: Nanopore analysis is an emerging technique that involves using a voltage to drive molecules through a nanoscale pore in a membrane between two electrolytes, and monitoring how the ionic current through the nanopore changes as single molecules pass through it. This approach allows charged polymers (including single-stranded DNA, double-stranded DNA and RNA) to be analysed with subnanometre resolution and without the need for labels or amplification. Recent advances suggest that nanopore-based sensors could be competitive with other third-generation DNA sequencing technologies, and may be able to rapidly and reliably sequence the human genome for under $1,000. In this article we review the use of nanopore technology in DNA sequencing, genetics and medical diagnostics

In the context of the discussion above, here are some excerpts of interest from the review:

"... A structural drawback with α-haemolysin is that the cylindrical β-barrel can accommodate up to ~10 nucleotides at a time, all of which significantly modulate the pore current [25]: this dilutes the ionic signature of the single nucleotide in the 1.4 nm constriction, thus reducing the overall signal-to-noise ratio in sequencing applications…”

I: So in this instance, we have a group measurement and the signal-to-noise ratio definition is really about sensing a single nucleotide within a larger group.

“... Moreover, in experiments involving immobilized ssDNA, as few as three nucleotides within or near the constriction contributed to the pore current [27] compared with the ten or so nucleotides that modulate the current in native α-haemolysin [25]....”

I: Again the concept of group measurements.
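To illustrate what I mean by a group measurement, here is a toy model in which each current reading is the average contribution of the k nucleotides sitting in the pore at that moment. The per-base levels below are invented for illustration (they are not measured values), and both the averaging model and the window size k=3 are assumptions of mine.

```python
# Hypothetical per-base contributions to the pore current (arbitrary units).
BASE_LEVEL = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def window_signal(sequence, k=3):
    """One reading per strand position: mean level of the k bases in the pore."""
    levels = [BASE_LEVEL[base] for base in sequence]
    return [sum(levels[i:i + k]) / k for i in range(len(levels) - k + 1)]


# Each reading mixes k bases together ...
readings = window_signal("ACGT", k=3)

# ... so different groups of bases can produce the very same reading.
same_reading = window_signal("AGC", k=3) == window_signal("CCC", k=3)
```

A single reading is therefore ambiguous, and it is only the succession of overlapping windows that can pin down the sequence. This is the sort of structure one is used to dealing with in group testing and compressive sensing.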

“...Unidirectional transport of dsDNA through this channel (from amino-terminal entrance to carboxyl-terminal exit) was also observed [29], suggesting a natural valve mechanism in the channel that assists dsDNA packaging during bacteriophage phi29 virus maturation. The capabilities of this protein nanopore will become more apparent in years to come....”

I: The review highlights a possible mechanism to constrain the strand in only one direction.

“...The first reports of DNA sensing using solid-state nanopores emerged in early 2001 when Golovchenko and co-workers used a custom-built ion-beam sculpting tool with feedback control to make nanopores with well-defined sizes in thin SiN membranes [42]...”

I: This is one element I had not really understood: the possibility of having solid state nanopores (and potentially using Moore’s law).

“....Indeed, we observed that DNA translocation was slower in Al2O3 nanopores than in SiN nanopores with similar diameters, which was attributed to the strong electrostatic interactions between the positively charged Al2O3 surface and the negatively charged dsDNA [45]. Enhancing these interactions, either electrostatically or chemically, could reduce DNA velocities even more....”

I: or even control it **during** the analysis!

“...Translocation velocities were between about 10 and 100 nucleotides per microsecond, which is too fast for the electronic measurement of individual nucleotides…”

I: And this is where the idea of A2I comes in (see Sunday Morning Insight: Thinking about a Compressive Genome Sequencer at http://nuit-blanche.blogspot.com/2013/08/sunday-morning-insight-thinking-about.html): use the architecture developed for these low-pass sensors to get an idea of what passes through the solid-state nanopore.

“...This result suggests that if the translocation speed could be reduced to roughly one nucleotide per millisecond, single-nucleotide detection should be possible, which could potentially lead to DNA sequencing with electronic readout…”

So this is, in my mind, a signal processing issue. Much research goes into developing hardware and materials to slow down the phenomenon, when one could probably observe it at current speeds with a different signal processing approach.
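
The gap between the two speeds quoted above is easy to put in numbers. The short Python sketch below only converts units: the 10 to 100 nucleotides per microsecond range and the one nucleotide per millisecond target come from the review [8], while the constant and function names are hypothetical.

```python
# Figures quoted from the review [8]; everything else is plain unit conversion.
FAST_NT_PER_US = (10, 100)   # reported range, nucleotides per microsecond
TARGET_NT_PER_MS = 1         # suggested target, nucleotides per millisecond

US_PER_S = 1_000_000
MS_PER_S = 1_000

def slowdown_factor(fast_nt_per_us, target_nt_per_ms=TARGET_NT_PER_MS):
    """Ratio between the reported speed and the target speed (both in nt/s)."""
    fast_nt_per_s = fast_nt_per_us * US_PER_S
    target_nt_per_s = target_nt_per_ms * MS_PER_S
    return fast_nt_per_s // target_nt_per_s

def dwell_time_ns(nt_per_us):
    """Time one nucleotide spends at the sensing point, in nanoseconds."""
    return 1_000 // nt_per_us

for speed in FAST_NT_PER_US:
    print(speed, "nt/us ->", dwell_time_ns(speed), "ns per nucleotide,",
          "slowdown needed:", slowdown_factor(speed))
```

In other words, the hardware route asks for a slowdown of four to five orders of magnitude, while a nucleotide currently spends only 10 to 100 ns at the sensing point.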

“....For example, is single-nucleotide resolution possible in the presence of thermodynamic fluctuations and electrical noise? And will the chemical and structural similarity of the purines (A and G) and the pyrimidines (C and T) inherently limit the identification of individual nucleotides using ionic current?...”

Looks like even the specialists are asking themselves good questions!

“.....SNPs and point mutations have been linked to a variety of mendelian diseases as well as more complex disease phenotypes [67]. In proof-of-principle experiments, SNPs have been detected using ~2-nm-diameter SiN nanopores [68]. Using the nanopore as a local force actuator, the binding energies of a DNA binding protein and its cognate sequence relative to a SNP sequence could be discriminated (Fig. 4b). This approach could be extended to screen mutations in the cognate sequences of various other DNA binding proteins, including transcription factors, nucleases and histones.....”

I: This is an interesting use of side information.

“.....Similarly, given the progress with solid-state nanopores, if the translocation velocity could be reduced to a single nucleotide (which is ~3Å long) per millisecond, and if nucleotides could be identified uniquely with an electronic signature (an area of intense research), it would be possible to sequence a molecule containing one million bases in less than 20 minutes....”

I: Again, the reduction of speed to get “pure” signals.
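
The 20-minute figure in the quote follows from simple arithmetic, sketched below. The only inputs are the one million bases and the one nucleotide per millisecond rate from the review [8]. The function name is hypothetical, and the estimate assumes a constant rate with no overhead.

```python
def sequencing_time_minutes(n_bases, ms_per_base=1):
    """Read time in minutes at a constant translocation rate, ignoring any overhead."""
    total_ms = n_bases * ms_per_base
    return total_ms / 1000 / 60

minutes = sequencing_time_minutes(1_000_000)
print(f"{minutes:.1f} minutes")
```

One million milliseconds is 1000 seconds, or about 16.7 minutes, which is consistent with the “less than 20 minutes” claim.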

“....There have been preliminary reports on the use of embedded planar gate electrodes in nanopores [40] and nano-channels [81,82] to electrically modulate the ionic pore current, and the integration of single-walled carbon nanotubes for the translocation of ssDNA [83]. …”

I: It looks to me like one of the principal elements of the low-pass sensors described above in the A2I philosophy. Other mechanical changes or side information for the current device include:

“.....Recent experiments with scanning tunnelling microscopes suggest that it might be possible to identify nucleotides with electron tunnelling [89] (because the energy gaps between the highest occupied and lowest unoccupied molecular orbitals of A, C, G and T are unique [90]), and partially sequence DNA oligomers [91]....”

“.....Efforts to fabricate nanopore sensors that contain nanogap-based tunnelling detectors are currently underway [93,94], but thermal fluctuations and electrical noise present major challenges.....”

“.....Another challenge is the fact that tunnelling currents vary exponentially with both the width and the height of the barriers that electrons have to tunnel through, which in turn depends on the effective tunnel distance and on molecule orientation.....”

“....A four-point-probe measurement could therefore reveal significantly more information than the two-probe measurements attempted so far, but reliably fabricating such a four-probe structure with subnanometre precision will be a formidable challenge. It should also be noted that it is not necessary to uniquely identify all four bases for certain applications. Some researchers have used a binary conversion of nucleotide sequences (A or T = 0, and G or C = 1), to discover biomarkers and identify genomic alterations in short fragments of DNA and RNA [95,96]...”

From [8]
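
The binary conversion mentioned in the last quote (A or T = 0, G or C = 1) can be sketched in a few lines of Python. The example sequence and the function name are illustrative assumptions, and only DNA symbols are handled.

```python
# Mapping quoted in the review: A or T = 0, G or C = 1.
BINARY_CODE = {"A": 0, "T": 0, "G": 1, "C": 1}

def to_binary(sequence):
    """Convert a DNA string into a list of 0/1 values; raises KeyError on other symbols."""
    return [BINARY_CODE[base] for base in sequence.upper()]

example = "GATTACA"  # arbitrary illustrative sequence, not from the review
print(to_binary(example))
```

Such a reduced alphabet only asks the sensor to tell two classes apart instead of four.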

[1] Markov, fractal diffusion and related models for ion channel gating, MSP Sansom et al. 1989.

[2] James Clarke, Hai-Chen Wu, Lakmal Jayasinghe, Alpesh Patel, Stuart Reid, Hagan Bayley (2009). Continuous base identification for single-molecule nanopore DNA sequencing. Nature Nanotechnology.

[3] Controlled translocation of individual DNA molecules through protein nanopores with engineered molecular brakes, Marcela Rincon-Restrepo, Ellina Mikhailova, Hagan Bayley, and Giovanni Maglia.

[4] Nanopore Analysis of Nucleic Acids Bound to Exonucleases and Polymerases, David Deamer.

[5] Translocation of double stranded DNA through membrane adapted phi29 motor protein nanopore, David Wendell, Peng Jing, [...], and Peixuan Guo

[6] “thus far the reading of the bases from a DNA molecule in a nanopore has been hampered by the fast translocation speed of DNA together with the fact that several nucleotides contribute to the recorded signal”. DNA sequencing with nanopores, Grégory F Schneider & Cees Dekker, Nature Biotechnology. http://ceesdekkerlab.tudelft.nl/wp-content/uploads/Nature.pdf

[7] Automated forward and reverse ratcheting of DNA in a nanopore at 5-Å precision, Cherf et al. Nat. Biotechnol. 30, 344–348 (2012).

[8] Nanopore sensors for nucleic acid analysis by Bala Murali Venkatesan and Rashid Bashir, Nature Nanotechnology, 6, 615–624 (2011). Published online 18 September 2011 also at: http://libna.mntl.illinois.edu/pdf/publications/127_venkatesan.pdf

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Friday, September 20, 2013

Sparse Localized Deformation Components - implementation -

Kiran Varanasi decided to post his paper and code on the Google+ group, a group of 622 readers. Smart move! Enjoy.

Sparse Localized Deformation Components by Thomas Neumann, Kiran Varanasi, Stephan Wenger, Markus Wacker, Marcus Magnor, and Christian Theobalt

We propose a method that extracts sparse and spatially localized deformation modes from an animated mesh sequence. To this end, we propose a new way to extend the theory of sparse matrix decompositions to 3D mesh sequence processing, and further contribute with an automatic way to ensure spatial locality of the decomposition in a new optimization framework. The extracted dimensions often have an intuitive and clear interpretable meaning. Our method optionally accepts user-constraints to guide the process of discovering the underlying latent deformation space. The capabilities of our efficient, versatile, and easy-to-implement method are extensively demonstrated on a variety of data sets and application contexts. We demonstrate its power for user friendly intuitive editing of captured mesh animations, such as faces, full body motion, cloth animations, and muscle deformations. We further show its benefit for statistical geometry processing and biomechanically meaningful animation editing. It is further shown qualitatively and quantitatively that our method outperforms other unsupervised decomposition methods and other animation parameterization approaches in the above use cases.

The source code is on this page.
