Posted by Gunnar Cedersund Mon, March 04, 2019 00:51:15
In my work as a scientist, one of the tasks I can take upon myself is to be a reviewer. This is important, but almost completely unvalued: there is no pay, and no funding agency cares very much whether you do it or not. However, the review process is the most important place where scientific discussions take place, because currently - and unfortunately - this is virtually the only criterion for what constitutes a scientific finding today: has it been published in a peer-reviewed journal? I therefore do spend some time doing this. And much of that time I fight with papers and authors who want to publish mathematical modelling works without any comparison with experimental data. I strongly believe that such works are not science, and that they should not be published. Today, I submitted just such a response, and since the reply is written in a completely non-specific manner - it could have been written to any paper with the same problem - I also post it here. My principle for the next decade of my life, which I just entered, is "going public, going deep", and publishing this here on the blog is part of me following that new principle.
Here is the review reply that I wrote:
"Thank you for your comments.
I do recognize the fact that you and others have published similar papers in the past, where models have been developed and presented with no comparison with data. There is nothing I can do about that. However, that fact does not transform such works into science. Modern science is, in my very firm opinion, the truth-seeking tool that was established by Galileo, many hundreds of years ago: it builds on i) the mathematical formalization of mechanistic hypotheses of the system that you study, and then ii) the use of *data* to distinguish between those hypotheses. The hypothesis that best describes the data - in the first round estimation data, and in the second round independent validation data, based on predictions made *before* the experiments - is the superior hypothesis. It is this formula for truth-seeking that distinguishes the science that started with Galileo from the church-driven epistemology that ruled before him (note that the prior Aristotelian worldview also involved mathematics, and data, but not in the same hypothesis-testing manner). If Galileo's formula is not followed, it is not science. That a paper has been published in a scientific journal does not make that work science. It does mean, however, that such a work *should not* have been published. The only exception to that principle exists within the field of mathematics, which has other criteria for judging a paper: e.g. that what is presented should be i) previously unproven and ii) hold true for a large family of equations/examples. Another type of paper in mathematics can be that of a new method, e.g. for optimization, that is proven to be superior to existing methods.
Unfortunately, this conception of what science is was lost in the field of modelling of biological systems during a large part of the 20th century. During this time, the field was called mathematical biology, complexity theory, etc. This was to a large extent rectified at the beginning of the 21st century, with the conception of systems biology. However, unfortunately, much old-school data-free modelling is still done. This has to stop! It is giving, and has been giving, the field of modelling in biology a bad reputation, with the impression that it has nothing to do with reality or biology - and rightly so, for such modelling has nothing to do with reality! At least not in any way that has been demonstrated by science.
Two further clarifications and responses to your reply are in order:
i) You say that your model is based on data. That is true: the model structure is based on data. Your manuscript does therefore function as a review of existing biology. But that is something different than publishing an original research paper with novel results, and it is fundamentally different from the kind of comparison between simulations of the *entire model structure* and data that I am referring to above. It is a bit like saying that the Ptolemaic worldview (with the earth in the middle) is based on data because it includes the sun, the planets, and the earth, which are observed in experiments. The question is not whether they are present. The question is which way of connecting them to each other is the correct one. To go beyond what can be said with biology alone - i.e. to do mathematical modelling - requires that one puts the structure together using competing hypotheses (e.g. one with the sun in the middle, and one with the earth in the middle), and then sees which of the two corresponding models produces simulations that best agree with data (existing data and future data). That is how science has functioned since Galileo, and that is how it should still function today.
ii) You say that a model component in your model - one that has not been validated in any fashion whatsoever - produces a prediction that a specific component is important; you then also point to some papers that claim the same thing. That could, on the surface, seem like a comparison with data. However, with the model structure that you have put together - containing the most well-known and most often considered main players in beta cell etiology - you could identify any component in your model as the most important one, and then find many papers that claim that that component is the most important one. That is, unfortunately, how biology is allowed to work today: many hypotheses are allowed to co-exist indefinitely, each lab focusing on one of the components is allowed to point to limited results as to why their particular component is the most important one, and nobody is forced to challenge these claims against each other, or to find out what the big picture looks like. That is where systems biology can and should come in and make a difference: by putting up alternative hypotheses regarding which component(s) are the most important, and then letting data - systems-level data - judge which of the hypotheses is the most compelling. This is how systems biology has worked in many/most of the papers that are cited in the review paper that I gave you. That way requires a model that produces simulations (time-curves, typically) that agree with estimation data, and with validation data.
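To make this concrete, here is a minimal sketch of what I mean by letting data judge between hypotheses. The numbers and model forms are entirely made up for illustration; the point is only the procedure: each mechanistic hypothesis is formalized as a model, each model is simulated, and the simulations are compared against the *same* dataset.

```python
import math

# Hypothetical time-series data (illustrative numbers, not from any real experiment)
t_data = [0, 1, 2, 3, 4]
y_data = [10.0, 6.1, 3.6, 2.2, 1.3]

def hypothesis_a(t):
    # Hypothesis A: exponential decay, y(t) = 10 * exp(-0.5 * t)
    return 10.0 * math.exp(-0.5 * t)

def hypothesis_b(t):
    # Hypothesis B: linear decline, y(t) = 10 - 2.2 * t
    return 10.0 - 2.2 * t

def cost(model):
    """Sum of squared residuals between simulation and data."""
    return sum((model(t) - y) ** 2 for t, y in zip(t_data, y_data))

# The hypothesis whose simulation best agrees with the data is the superior one
best = min([hypothesis_a, hypothesis_b], key=cost)
print(best.__name__, cost(hypothesis_a), cost(hypothesis_b))
```

In a real systems biology setting, the hypotheses would of course be ODE models with estimated parameters, and the comparison would account for overfitting; this sketch only shows the logical skeleton of hypothesis testing against data.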
In summary, for me to judge this or any paper as publishable, you need to produce (at least) these two things: i) at least one curve, e.g. a simulation of a variable as it progresses in time, that agrees with corresponding data; ii) a prediction that is validated by another dataset, not used for estimating the parameters in the model. In fact, beyond that you should also demonstrate that your model is superior to other models, i.e. that it can describe all the data that one of the currently most important and realistic models can, plus additional data beyond that.
In other words, there are many published models for beta cells, including for their etiology. These models can describe a lot of data, in the above manner. Why not take one of those models, find a feature or dataset that it cannot explain (there are many), and then go ahead and improve the model to make it able to explain those data (while still retaining the ability to explain all old data)? If you then also show that this is not due to overfitting, i.e. if you show that your new model can also describe some validation data not used for model fitting, then you will have contributed an improvement that follows the tradition of science. Then, and only then, will I judge your - or any scientist's - paper as publishable.
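The estimation/validation distinction above can be sketched in a few lines. All numbers below are made up for illustration: a simple exponential-decay model is fitted to estimation data (via linear least squares on the log-transformed measurements), and its *predictions* are then checked against held-out validation data that played no role in the fit.

```python
import math

# Hypothetical estimation data (illustrative numbers, not from any real experiment)
t_est = [0, 1, 2, 3, 4]
y_est = [10.0, 6.1, 3.6, 2.2, 1.3]

# Hypothesis: y(t) = A * exp(-k*t), so log(y) = log(A) - k*t.
# Fit k and A by ordinary least squares on the log-transformed data.
n = len(t_est)
mean_t = sum(t_est) / n
mean_ly = sum(math.log(y) for y in y_est) / n
k = -sum((t - mean_t) * (math.log(y) - mean_ly)
         for t, y in zip(t_est, y_est)) / sum((t - mean_t) ** 2 for t in t_est)
A = math.exp(mean_ly + k * mean_t)

def predict(t):
    """Simulate the fitted model at time t."""
    return A * math.exp(-k * t)

# Hypothetical validation data: held out, never used in the fit above.
# The model is only credible if its predictions agree with these points too.
t_val = [5, 6]
y_val = [0.8, 0.5]
for t, y in zip(t_val, y_val):
    print(f"t={t}: predicted {predict(t):.2f}, observed {y:.2f}")
```

The same two-step logic - agree with estimation data, then pass on independent validation data - is what I am asking for, just with a mechanistic ODE model and real measurements in place of this toy example.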