As part of our ongoing project on "turning the scientific method into math", Marc Harper and I have written a paper on expectation potential information as the key measure of information yield from a proposed experiment. Take a look at the paper; we are eager for feedback (e.g. add a comment on this post). The basic idea is:

- *empirical information* (\(I_e\)) measures prediction power on observables.
- *potential information* (\(I_p\)) measures the maximum additional prediction power possible for a given set of observables, relative to the current model; in other words, the theoretical increase in empirical information achievable by the best possible model. The key point is that \(I_p\) can be estimated without searching model space in any way.
- The value of any experiment is its ability to *surprise us*, i.e. to demonstrate that our current model is inadequate. Potential information provides a general measure of this, so the value of an experimental dataset is simply its potential information measure. For more details on this previous work, see here.
- *expectation potential information* (\(E(I_p)\)) forecasts the expected information value of an experiment, under our current beliefs (uncertainty) about its likely outcomes. That is, adopting the view that our "current model" is always a mix of competing models, the \(E(I_p)\) for a proposed experiment measures its ability to resolve major uncertainties in that mixture.

- We used an interesting "test problem", RoboMendel: a robot scientist tasked with proposing experiments to discover the laws of genetics. It's been fun working through how the basic \(E(I_p)\) metric addresses not only fine details of experiment planning (e.g. the value of including a specific control) but also the big question of "what should we look at?".
- Note that all these metrics are defined *strictly* in terms of prediction power on *observable* variables, in contrast to the usual focus in statistical inference on our ability to infer *hidden* variables. Yet the \(E(I_p)\) metric comes full circle: you can prove that as the mixture probabilities converge to the true marginal probabilities of possible "outcomes", the expectation potential information converges to the mutual information, \(E(I_p) \to I(X;\Omega)\), i.e. the classic information theory metric of how "informative" the observable \(X\) is about the true hidden state of the system \(\Omega\).
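
To make the limiting quantity \(I(X;\Omega)\) concrete, here is a minimal Python sketch that computes the mutual information for a small discrete problem. It is not code from the paper, and it does not implement the paper's \(E(I_p)\) estimator. It only uses the standard identity that \(I(X;\Omega)\) equals the prior-weighted average of the KL divergence between each hypothesis's predicted distribution \(p(x|\omega)\) and the mixture \(p(x) = \sum_\omega p(\omega)\,p(x|\omega)\). The function name `mutual_information`, the two-hypothesis setup, and the toy phenotype ratios are all hypothetical, chosen for illustration.

```python
import math

def mutual_information(prior, likelihoods):
    """Return I(X; Omega) in bits for discrete Omega and X.

    prior[w]          = p(omega = w), the mixture weights (assumed to sum to 1)
    likelihoods[w][x] = p(x | omega = w), each row assumed to sum to 1
    """
    n_x = len(likelihoods[0])
    # Mixture (marginal) distribution over observables: p(x) = sum_w p(w) p(x|w)
    mixture = [
        sum(p_w * lik[x] for p_w, lik in zip(prior, likelihoods))
        for x in range(n_x)
    ]
    total = 0.0
    for p_w, lik in zip(prior, likelihoods):
        if p_w == 0:
            continue  # a hypothesis with zero weight contributes nothing
        # KL divergence D(p(x|w) || p(x)), skipping zero-probability outcomes
        kl = sum(p * math.log2(p / m) for p, m in zip(lik, mixture) if p > 0)
        total += p_w * kl
    return total

# Hypothetical toy experiment: two competing models of a cross, one predicting
# a 3:1 phenotype ratio and the other 1:1, held with equal belief.
prior = [0.5, 0.5]
likelihoods = [[0.75, 0.25], [0.5, 0.5]]
print(mutual_information(prior, likelihoods))
```

In this sketch, an experiment whose competing models all predict the same outcome distribution scores zero, while one whose models make perfectly distinguishable predictions scores the full entropy of the prior (1 bit for two equally weighted hypotheses). That matches the intuition above that a valuable experiment is one able to resolve uncertainty in the mixture.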
