ABSTRACT: Far from the static, idealized conformations deposited into structural databases, proteins are highly dynamic molecules that undergo conformational changes on temporal and spatial scales that may span several orders of magnitude. These conformational changes, often intimately connected to the functional roles that proteins play, may be obscured by traditional biophysical techniques. Over the past 40 years, molecular dynamics (MD) simulations have complemented these techniques by providing the "hidden" atomistic details that underlie protein dynamics. However, there are limitations of the degree to which molecular simulations accurately and quantitatively describe protein motions. Here we show that although four molecular dynamics simulation packages (AMBER, GROMACS, NAMD, and ilmm) reproduced a variety of experimental observables for two different proteins (engrailed homeodomain and RNase H) equally well overall at room temperature, there were subtle differences in the underlying conformational distributions and the extent of conformational sampling obtained. This leads to ambiguity about which results are correct, as experiment cannot always provide the necessary detailed information to distinguish between the underlying conformational ensembles. However, the results with different packages diverged more when considering larger amplitude motion, for example, the thermal unfolding process and conformational states sampled, with some packages failing to allow the protein to unfold at high temperature or providing results at odds with experiment. While most differences between MD simulations performed with different packages are attributed to the force fields themselves, there are many other factors that influence the outcome, including the water model, algorithms that constrain motion, how atomic interactions are handled, and the simulation ensemble employed. Here four different MD packages were tested each using best practices as established by the developers, utilizing three different protein force fields and three different water models. Differences between the simulated protein behavior using two different packages but the same force field, as well as two different packages with different force fields but the same water models and approaches to restraining motion, show how other factors can influence the behavior, and it is incorrect to place all the blame for deviations and errors on force fields or to expect improvements in force fields alone to solve such problems.