We've finally got the first beta release of GROMACS 4.6 ready for you to try out! We've put a lot of very hard work into it, and we hope you'll like the good things we've done. Things won't be perfect yet, so we look forward to your help in finding the things we haven't done well enough! Remember, if you want the big performance gains that will be available in 4.6, then you'll want to know that things build and work well on your hardware, and the best way of ensuring that is to help us over the next few weeks. At the same time, we discourage you from using this code for work whose scientific reliability you need to trust - this is very much a draft version of the software!
It would be great for us if some of you tried out the new code on lots of different hardware and operating systems and reported build problems, inconsistencies, strange or missing documentation and, in the worst case, outright bugs. To tempt you, here's a bit of a carrot in the form of the new features:
* A brand-new native GPU implementation layer. Gromacs now does heterogeneous parallelization using both CPUs and modern NVIDIA GPUs at the same time. The GPU port also works in parallel, using either multiple cards in a node or multiple nodes, and it's smoking fast. There's lots of heroic work by Szilard Pall and Berk Hess here, and special thanks to NVIDIA and Mark Berger for their assistance in making this happen.
* Gromacs can now use OpenMP parallelization for better scaling inside nodes, in particular when doing the FFT part on the CPU while the GPU does the normal nonbonded interactions.
* Automatic load balancing between direct-space and PME nodes, and lots of improvements in domain decomposition load balancing and scaling.
* We have a brand-new set of classical nonbonded interaction kernels, and Gromacs can now use SSE2, SSE4.1, 128-bit AVX with FMA support (AMD) or 256-bit AVX (Intel), all of them in both single and double precision. The performance difference depends on your system and parallelization, but it is quite large in many cases - we have seen >40% improvement on ion channels running on modern AMD machines! Did we mention that the classical C kernels are faster too, since we can now do force-only interactions for most steps?
* There are new kernels using analytical switch/shift functions that are quite a bit faster, and a new CPU implementation of Verlet kernels that guarantee buffered interactions (no atoms drifting in or out of the neighbor-list range) and conserve energy extremely well.
* There is a large new module to do advanced free energy calculations, thanks to Michael Shirts. Trust us, you need the full manual to decipher all the possibilities…
* Gromacs has switched completely to CMake for configuration and building. To be honest, we do expect some hiccups from this, but it has enabled us to provide much more automation and advanced features as part of the setup - and Gromacs now works on Windows out of the box. Please test as many parts of the build system as you can!
* All raw assembly has been replaced by machine intrinsics in C. This does wonders for readability, but it means the compiler and compiler flags matter. On x86, you will typically get 5-10% better performance from icc than from gcc.
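
For readers who haven't met intrinsics before, here is a minimal sketch of what "machine intrinsics in C" means in practice. It is not taken from the Gromacs source: the function name `squared_lengths_x4` and the data layout are made up for illustration. It computes the squared lengths of four displacement vectors at once with SSE intrinsics, and falls back to a plain C loop when the compiler does not advertise SSE2 support.

```c
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE/SSE2 intrinsics; also provides the __m128 float type */
#endif

/* Hypothetical helper, not a Gromacs function: given four displacement
 * vectors stored as separate x, y and z arrays, write their squared
 * lengths dx*dx + dy*dy + dz*dz into r2. */
void squared_lengths_x4(const float dx[4], const float dy[4],
                        const float dz[4], float r2[4])
{
#if defined(__SSE2__)
    /* Unaligned loads: each register holds four single-precision values. */
    __m128 x = _mm_loadu_ps(dx);
    __m128 y = _mm_loadu_ps(dy);
    __m128 z = _mm_loadu_ps(dz);

    /* Each intrinsic operates on all four lanes at once. */
    __m128 sum = _mm_add_ps(_mm_add_ps(_mm_mul_ps(x, x), _mm_mul_ps(y, y)),
                            _mm_mul_ps(z, z));

    _mm_storeu_ps(r2, sum);
#else
    /* Portable scalar fallback producing the same result. */
    for (int i = 0; i < 4; ++i)
    {
        r2[i] = dx[i] * dx[i] + dy[i] * dy[i] + dz[i] * dz[i];
    }
#endif
}
```

The code reads like ordinary C, but register allocation and instruction scheduling are now left to the compiler rather than fixed by hand-written assembly, which is why the choice of compiler and flags shows up in performance.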