
Annotated Publications List

This list broadly tracks our R&D milestones with an approximate 2–3 year delay.

Annotated publication list, chronological

 

2008 -- (JCC) :  Not necessary to read; this summary is sufficient.  One of the predecessor papers (our analytical model director is a co-author). Michael Levitt supervised this group and is a co-author on some of the publications.  It uses physics-inspired (analytical) expressions to fit a (too economical) training set of QM data. Polarization was used to make the models transferable to bulk, but also to claim that the interactions A <-> B and A <-> C would correctly determine B <-> C (not true).  It achieved neither predictability nor transferability, but it was a great start.

 

2018 -- (PNAS) :  Formally, this paper shows that a QM-faithful classical Hamiltonian produces significant errors unless light atom motion is treated quasi-classically. This actually matters less for drug design, because the error partially cancels out.  Informally, and more importantly, this paper announces that we have achieved (really in 2016) an accurate and descriptive energetic and thermodynamic representation of alkanes and water, and that we have built much of our molecular dynamics (MD) and analytical model stacks.
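A textbook way to see why light atoms are the problem is to compare the classical and quantum free energy of a single harmonic mode. The sketch below is only an illustration under stated assumptions, not the treatment used in the paper: the 3600 cm^-1 stretch and 100 cm^-1 soft mode are round illustrative numbers, and all function names are ours.

```python
import math

H_PLANCK = 6.62607015e-34    # J s
C_LIGHT_CM = 2.99792458e10   # cm / s
K_BOLTZMANN = 1.380649e-23   # J / K


def reduced_frequency(wavenumber_cm, temperature_k):
    """Return x = h*c*nu / (k*T) for a mode given in wavenumbers (cm^-1)."""
    return H_PLANCK * C_LIGHT_CM * wavenumber_cm / (K_BOLTZMANN * temperature_k)


def classical_free_energy_kt(x):
    """Classical harmonic oscillator: F / kT = ln(x)."""
    return math.log(x)


def quantum_free_energy_kt(x):
    """Quantum harmonic oscillator: F / kT = ln(2 sinh(x / 2))."""
    return math.log(2.0 * math.sinh(x / 2.0))


T = 298.15  # K
# Illustrative modes (assumptions): a stiff X-H stretch and a soft heavy-atom mode.
x_stretch = reduced_frequency(3600.0, T)
x_soft = reduced_frequency(100.0, T)

gap_stretch = quantum_free_energy_kt(x_stretch) - classical_free_energy_kt(x_stretch)
gap_soft = quantum_free_energy_kt(x_soft) - classical_free_energy_kt(x_soft)

print(f"stiff mode: x = {x_stretch:.1f}, quantum - classical = {gap_stretch:.2f} kT")
print(f"soft mode:  x = {x_soft:.2f}, quantum - classical = {gap_soft:.3f} kT")
```

The stiff mode is off by several kT when treated classically, while the soft mode is essentially unaffected, which is why the correction matters mainly for hydrogen motion.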

 

2022 -- (Nature Comm.) :  Aptly named, and the first and still the only one of its kind: Accurate determination of solvation free energies of neutral organic compounds from first principles.  An analytical model, trained on dimer calculations only, is an accurate and descriptive thermodynamic representation of any ensemble of neutral molecules.  

This figure shows (blue) excellent agreement with experiment, also much better than (non-transferable) models specifically fit to reproduce such results; the yellow dots illustrate a point from the previous publication.

3b_error copy.png

This figure shows that, indeed, dimer calculations can be used as training sets for ensembles of arbitrary size (as do the final results of course).

This paper deals with neutral molecules only because charged interactions did not work (and prompted the inclusion of the NN terms, next paper). 

 

2022 -- (JCC) :  This paper definitively separates sampling issues from model issues in protein-ligand systems, and its real point is to conclude that strong intermolecular interactions cannot be sufficiently modeled by tractable analytical expressions. 

 

2023 -- (JACS) :  A key paper where everything comes together (referee quote: right answers for the right reasons).  The addition of an NN interaction term (finally) makes the agreement with QM accurate enough for predictive calculations for all systems, including ions.  The neural network correction term targets the 2-body interaction, which contributes the largest share of both the total energy and the modeling error (as you recall, the many-body terms are already sufficient). There is no practical way to achieve the needed accuracy with analytical expressions, and attempts to do so stalled a decade ago at a water model (which is good). Any tractable analytical expression will produce very large errors (Fig. 4 in the paper). 

nn pic.png

(Top) The diagram of the NN term: a minimal GNN, geometrically symmetrized and specific to a pair of atom types (e.g. aliphatic C <-> aromatic N; determined by the local intermolecular environment, 'book-kept' by a database (DB)). (Bottom) An illustration of the generation of the fingerprints and the NN structure for the interaction of two water molecules.

The pair-specific NN term (i.e. a property of the pair) is not decomposable into single-atom properties. The analytical term is: essentially, the analytical terms extract all the 'nice', 'understandable' properties (we even give them names, e.g. Van der Waals / dispersion) along with the short-, intermediate- and long-range behavior. Another key technique here is the explicit decoupling of the non-bonded (intermolecular) interactions from the (much stronger) bonded (intramolecular) ones.
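The split between an analytical pair term and a pair-type-specific NN correction can be sketched roughly as follows. This is a toy illustration, not our production model: the Lennard-Jones placeholder, the two-number "fingerprints", the random weights, the exponential damping and every name in the code are assumptions made for the example.

```python
import math
import random


def analytical_term(r, epsilon=0.1, sigma=3.4):
    """Placeholder analytical pair term (Lennard-Jones 12-6), built from simple parameters."""
    x = (sigma / r) ** 6
    return 4.0 * epsilon * (x * x - x)


class PairNN:
    """Tiny one-hidden-layer network owned by one PAIR of atom types."""

    def __init__(self, n_inputs, n_hidden=4, seed=0):
        rng = random.Random(seed)  # random weights stand in for trained ones
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def _forward(self, inputs):
        hidden = [math.tanh(sum(w * v for w, v in zip(row, inputs)))
                  for row in self.w1]
        return sum(w * h for w, h in zip(self.w2, hidden))

    def correction(self, fp_a, fp_b, r):
        # Symmetrize over the order of the two atoms so that E(a, b) == E(b, a).
        ab = self._forward(fp_a + fp_b + [r])
        ba = self._forward(fp_b + fp_a + [r])
        # Damp with distance so the correction stays short-ranged (assumption).
        return 0.5 * (ab + ba) * math.exp(-r)


def pair_key(type_a, type_b):
    """Order-independent lookup key for a pair of atom types."""
    return tuple(sorted((type_a, type_b)))


def pair_energy(models, atom_a, atom_b, r):
    """Analytical term plus the NN correction looked up by atom-type pair."""
    type_a, fp_a = atom_a
    type_b, fp_b = atom_b
    nn = models[pair_key(type_a, type_b)]
    return analytical_term(r) + nn.correction(fp_a, fp_b, r)


# One network per pair of atom types; fingerprints here are made-up numbers.
models = {pair_key("C_aliphatic", "N_aromatic"): PairNN(n_inputs=5)}
carbon = ("C_aliphatic", [0.2, 0.7])
nitrogen = ("N_aromatic", [0.9, 0.1])

e_cn = pair_energy(models, carbon, nitrogen, r=3.5)
e_nc = pair_energy(models, nitrogen, carbon, r=3.5)
print(e_cn, e_nc)
```

The point of the design is visible in the dictionary key: the correction belongs to the pair of types, so there is no per-atom parameter that could reproduce it.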

diagrams 3.png

This figure is a summary of the final results: electrolyte / ionic systems, neutral molecular ensembles (of course) and protein-ligand interactions (literally very complex) are modelled and predicted accurately, with the models created from dimer QM calculations only.

The accuracy is as desired: the model is virtually identical in energy to the QM calculations (in this work 'gold standard' coupled-cluster; generally 'silver'; DFT would not make it to 'bronze'). A possible exception to applicability is very large aromatic systems, for which QM itself sometimes creaks.

 

The models run at approximately 0.5X the speed of a normal polarizable analytical model (atomistic AI models are ~10^3X slower because they need to sample and digest a large neighborhood for each atom at every step).

 

Further papers are a little bit of a victory lap.

 


An example:


2024 -- (JCC) : This paper uses the models to predict the pH of water from first principles and without any approximations, and it does so correctly. It also settles the debate on which ionic species of water is prevalent at temperature. This is a warm-up for simulating and analyzing catalytic and enzymatic reactions with full explicit solvent.
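For orientation, the link between an autoionization free energy and the pH of neutral water is the standard thermodynamic relation pKw = ΔG / (RT ln 10), with pH = pKw / 2. The sketch below only illustrates that arithmetic; the input of about 79.9 kJ/mol is the familiar experimental value near 25 °C, used here as an assumption and not a number taken from the paper.

```python
import math

R_GAS = 8.314462618  # J / (mol K)


def pkw_from_free_energy(delta_g_j_per_mol, temperature_k):
    """pKw = dG / (R T ln 10) for the autoionization of water."""
    return delta_g_j_per_mol / (R_GAS * temperature_k * math.log(10.0))


def neutral_ph(pkw):
    """In pure water [H+] = [OH-], so pH = pKw / 2."""
    return 0.5 * pkw


# Illustrative input (assumption): ~79.9 kJ/mol at 298.15 K.
pkw = pkw_from_free_energy(79.9e3, 298.15)
print(f"pKw = {pkw:.2f}, neutral pH = {neutral_ph(pkw):.2f}")
```

Because the free energy enters through an exponential, an error of about 5.7 kJ/mol in ΔG at room temperature already shifts pKw by a full unit.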

 

Current R&D is subordinate to generating and testing models (protein-protein interactions next), refactoring and streamlining the code stack(s), templatizing common in-silico tasks, servicing customers, producing POC studies for prospective clients (e.g. antibody maturation benchmarks), and fundraising. While energetic accuracy has largely been solved, the remaining barrier to predictive modeling is sampling, i.e. ensuring that the system explores the relevant conformational states; that is the aim of our current R&D efforts. We are working to achieve routine 0.5 µs / day simulation times with explicit solvent. We employ equivariant graph networks and flow-based NNs. Earlier attempts to do this (Nature 2019) have stalled, but we have understood the main obstacles and have devised several multi-scale techniques to bypass them; this project is currently going well.
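To put the 0.5 µs / day target in per-step terms, a back-of-the-envelope conversion helps. The 2 fs timestep below is a common MD choice used purely as an assumption (it is not stated above), and the function name is ours.

```python
SECONDS_PER_DAY = 86400.0
FS_PER_US = 1.0e9  # femtoseconds in one microsecond


def steps_per_second(us_per_day, timestep_fs):
    """MD steps per wall-clock second needed to reach a given throughput."""
    steps_per_day = us_per_day * FS_PER_US / timestep_fs
    return steps_per_day / SECONDS_PER_DAY


# Assumed timestep of 2 fs; the throughput target is the one quoted above.
rate = steps_per_second(us_per_day=0.5, timestep_fs=2.0)
budget_ms = 1000.0 / rate
print(f"{rate:.0f} steps/s, i.e. about {budget_ms:.2f} ms per step")
```

Under that assumption the whole force evaluation, NN terms included, has to fit in roughly a third of a millisecond per step.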

 


Notes:


The manuscripts are not just theoretical proofs: all the results shown in these publications were produced by parameterization stack(s) and molecular dynamics and free energy stack(s) that we wrote ourselves. (In retrospect, we would not have written the full MD stack. However, this does show that we have the expertise and capability to do so; at present, such a capability can be matched only by a handful of teams 10X the size.)

 

If we had the option, we would have preferred cleaner and more direct manuscript titles; but such titles have a hard time surviving the peer review process. The annotation helps with this.

 

Much of the work deals with solvation and hydration. These processes are much more than that: they demonstrate (up to sampling) a complete and faithful representation of both the energy (all relevant sub-components) and the free energy (e.g. the entropy at a given temperature) of the molecular ensemble being modelled. For example, it is perplexing to us that all the 'benchmarks' of current molecular AI models are simply energy comparisons to QM for sets of molecules (we do love the fact that their training sets are from first principles, which has been our guiding principle for quite some time). Their free energy results, when that capability is enabled, will be seriously off.
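The gap between an energy comparison and a free energy can be made concrete with the textbook Zwanzig (exponential averaging) estimator, ΔF = -kT ln <exp(-ΔU / kT)>. The sketch below is generic and is not our free energy stack: the Gaussian ΔU samples, their mean and width, and the function names are all assumptions chosen for illustration.

```python
import math
import random

KT = 0.5925  # kcal/mol, roughly k*T at 298 K


def mean_energy_difference(delta_u):
    """What a pure energy benchmark compares: the average of dU."""
    return sum(delta_u) / len(delta_u)


def zwanzig_free_energy(delta_u, kt):
    """dF = -kT ln < exp(-dU / kT) >, evaluated with a log-sum-exp shift."""
    exponents = [-du / kt for du in delta_u]
    shift = max(exponents)
    total = sum(math.exp(e - shift) for e in exponents)
    return -kt * (shift + math.log(total / len(exponents)))


# Synthetic samples (assumption): dU ~ Normal(mean 2.0, sd 0.5) kcal/mol.
rng = random.Random(1)
samples = [rng.gauss(2.0, 0.5) for _ in range(200000)]

mean_du = mean_energy_difference(samples)
delta_f = zwanzig_free_energy(samples, KT)
# For Gaussian dU the exact result is mean - variance / (2 kT).
analytic = 2.0 - 0.5 ** 2 / (2.0 * KT)

print(f"<dU> = {mean_du:.3f}, dF = {delta_f:.3f}, analytic dF = {analytic:.3f} kcal/mol")
```

Even in this idealized case the free energy sits below the mean energy difference by a term set by the fluctuations, so a model can match average energies and still miss the thermodynamics.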

 

The comparisons with structural data-mining generative AI methods (e.g. AlphaFold) would take another white paper and are probably best left to a discussion during further DD.

 

The discussion of Sutton's bitter lesson, and of whether it is worthwhile to embed physics structure into an AI model, is a deep one. Briefly: physics is still very necessary and will likely remain so for a decade, and why not use it if we have it?

© Freecurve Labs, 2025
