A new study from Oxford University has uncovered why the deep neural networks (DNNs) that power modern artificial intelligence are so effective at learning from data. The new findings demonstrate that DNNs have an inbuilt 'Occam's razor,' meaning that when presented with multiple solutions that fit training data, they tend to favour those that are simpler. What is special about this version of Occam's razor is that the bias exactly cancels the exponential growth of the number of possible solutions with complexity. The study has been published today (14 Jan) in Nature Communications.
The researchers hypothesised that, in order to make good predictions on new, unseen data - even when a network has millions or even billions more parameters than there are training data points - DNNs would need a kind of 'built-in guidance' to help them choose the right patterns to focus on.
"Whilst we knew that the effectiveness of DNNs relies on some form of inductive bias towards simplicity – a kind of Occam's razor – there are many versions of the razor. The precise nature of the razor used by DNNs remained elusive" said theoretical physicist Professor Ard Louis (Department of Physics, Oxford University), who led the study.
To uncover the guiding principle of DNNs, the authors investigated how these networks learn Boolean functions – fundamental rules in computing where a result can only have one of two possible values: true or false. They discovered that even though DNNs can technically fit any function to the data, they have a built-in preference for simpler functions that are easier to describe. This means DNNs are naturally biased towards simple rules over complex ones.
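As a rough illustration of this idea (a sketch of our own, not the paper's exact experiment), one can sample random weights for a small network, record which Boolean truth table each sample implements, and use compressed length as a crude stand-in for descriptional complexity. If a simplicity bias is present, the most compressible truth tables should turn up far more often than the incompressible ones:

```python
# A minimal sketch, assuming a small random-weight ReLU network; zlib is used
# as a rough proxy for the Lempel-Ziv-style complexity measure in the paper.
import itertools
import zlib
from collections import Counter
import numpy as np

n = 5                                                  # number of Boolean inputs (our choice)
X = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)  # all 2^n inputs

def random_mlp_truth_table(rng, hidden=16):
    """Draw one random-weight MLP and return its truth table as a '0'/'1' string."""
    W1 = rng.normal(size=(n, hidden)); b1 = rng.normal(size=hidden)
    W2 = rng.normal(size=(hidden, 1)); b2 = rng.normal(size=1)
    h = np.maximum(0.0, X @ W1 + b1)                   # ReLU hidden layer
    out = (h @ W2 + b2).ravel() > 0                    # threshold output -> Boolean
    return ''.join('1' if o else '0' for o in out)

def complexity(tt):
    """Crude description-length proxy: compressed size of the truth table."""
    return len(zlib.compress(tt.encode()))

rng = np.random.default_rng(0)
counts = Counter(random_mlp_truth_table(rng) for _ in range(20000))

# Simpler (more compressible) functions should dominate the samples.
for tt, c in counts.most_common(5):
    print(f"freq={c:5d}  complexity~{complexity(tt):3d}  table={tt}")
```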
Furthermore, the authors discovered that this inherent Occam's razor has a unique property: it exactly counteracts the exponential increase in the number of complex functions as the system size grows. This allows DNNs to identify the rare, simple functions that generalise well (making accurate predictions on both the training data and unseen data), while avoiding the vast majority of complex functions that fit the training data but perform poorly on unseen data.
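Schematically, the cancellation can be written as follows (the symbols K, N(K) and P(f) are illustrative notation of ours, not the paper's exact statement):

```latex
% N(K): number of Boolean functions with descriptional complexity K
% P(f): probability that a randomly initialised DNN implements function f
\begin{align*}
  N(K) &\sim 2^{K}
    && \text{(functions proliferate exponentially with complexity)} \\
  P(f) &\lesssim 2^{-K(f)}
    && \text{(the inbuilt bias suppresses them exponentially)} \\
  N(K)\, P(f)\big|_{K(f)=K} &\sim 2^{K} \cdot 2^{-K} = \mathcal{O}(1)
    && \text{(the two effects cancel)}
\end{align*}
```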
This emergent principle helps DNNs do well when the data follows simple patterns. However, when the data is more complex and does not fit simple patterns, DNNs do not perform as well, sometimes no better than random guessing. Fortunately, real-world data is often fairly simple and structured, which aligns with the DNNs' preference for simplicity. This helps DNNs avoid overfitting (where the model gets too 'tuned' to the training data) when working with simple, real-world data.
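A toy check of this contrast (our own sketch using scikit-learn, not the paper's setup) is to train the same small network on a simple Boolean rule and on random, patternless labels, then compare accuracy on held-out inputs:

```python
# Sketch: the same architecture generalises from a simple rule but should do
# no better than chance on random labels. Hyperparameters are our choice.
import itertools
import numpy as np
from sklearn.neural_network import MLPClassifier

n = 7
X = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)  # 128 inputs
rng = np.random.default_rng(1)

y_simple = (X.sum(axis=1) >= n / 2).astype(int)   # majority vote: a simple rule
y_random = rng.integers(0, 2, size=len(X))        # random labels: no simple pattern

for name, y in [("simple rule", y_simple), ("random labels", y_random)]:
    idx = rng.permutation(len(X))
    train, test = idx[:64], idx[64:]
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
    clf.fit(X[train], y[train])
    print(f"{name}: held-out accuracy = {clf.score(X[test], y[test]):.2f}")
```

On the simple rule the network should score well above chance on unseen inputs; on random labels it can still fit the training set, but held-out accuracy should hover near 0.5.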
To delve deeper into the nature of this razor, the team investigated how the network's performance changed when they modified its activation functions – the mathematical functions that decide whether a neuron should 'fire' or not.
They found that even though these modified DNNs still favour simple solutions, even slight adjustments to this preference significantly reduced their ability to generalise (make accurate predictions) on simple Boolean functions. The same problem occurred in other learning tasks, demonstrating that having the correct form of Occam's razor is crucial for the network to learn effectively.
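The flavour of this ablation can be sketched by re-running the random-weight sampling with different activation functions (the particular functions below are our choice, not necessarily those tested in the study) and seeing how concentrated the induced distribution over truth tables is; a flatter distribution suggests a weaker simplicity bias:

```python
# Sketch: sample random-weight networks under different activations and
# compare how peaked the resulting distribution over Boolean functions is.
import itertools
from collections import Counter
import numpy as np

n = 5
X = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)

ACTIVATIONS = {
    "relu": lambda z: np.maximum(0.0, z),
    "tanh": np.tanh,
    "sin":  np.sin,   # a deliberately 'unnatural' choice, to perturb the bias
}

def sample_prior(act, samples=20000, hidden=16, seed=0):
    """Count which truth table each random-weight network implements."""
    rng = np.random.default_rng(seed)
    counts = Counter()
    for _ in range(samples):
        W1 = rng.normal(size=(n, hidden)); b1 = rng.normal(size=hidden)
        W2 = rng.normal(size=(hidden, 1)); b2 = rng.normal(size=1)
        out = (act(X @ W1 + b1) @ W2 + b2).ravel() > 0
        counts[out.tobytes()] += 1
    return counts

for name, act in ACTIVATIONS.items():
    counts = sample_prior(act)
    top_freq = counts.most_common(1)[0][1] / 20000
    print(f"{name:>4}: distinct functions = {len(counts):5d}, "
          f"most common function frequency = {top_freq:.3f}")
```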
The new findings help to 'open the black box' of how DNNs arrive at their conclusions – an opacity that currently makes it difficult to explain or challenge decisions made by AI systems. However, while these findings apply to DNNs in general, they do not fully explain why some specific DNN models work better than others on certain types of data.
Christopher Mingard (Department of Physics, Oxford University), co-lead author of the study, said: "This suggests that we need to look beyond simplicity to identify additional inductive biases driving these performance differences."
According to the researchers, the findings suggest a strong parallel between artificial intelligence and fundamental principles of nature. Indeed, the remarkable success of DNNs on a broad range of scientific problems indicates that this exponential inductive bias must mirror something deep about the structure of the natural world.
"Our findings open up exciting possibilities." said Professor Louis. "The bias we observe in DNNs has the same functional form as the simplicity bias in evolutionary systems that helps explain, for example, the prevalence of symmetry in protein complexes . This points to intriguing connections between learning and evolution, a connection ripe for further exploration."