
University Researchers Create New Type of Interpretable Neural Network


Researchers from Massachusetts Institute of Technology, California Institute of Technology, and Northeastern University created a new type of neural network: Kolmogorov–Arnold Networks (KAN). KAN models outperform larger perceptron-based models on physics modeling tasks and provide a more interpretable visualization.

KANs were inspired by the Kolmogorov-Arnold representation theorem, which states that any continuous function of multiple variables can be rewritten as a composition of sums of functions of a single variable. While today's neural networks are based on the perceptron, which learns a set of weights used to form a linear combination of its inputs that is then passed through a fixed activation function, KANs instead learn an activation function for each input and sum the outputs of those functions. The researchers compared the performance of KANs with traditional multilayer perceptron (MLP) neural networks on modeling several problems in physics and mathematics and found that KANs achieved better accuracy with fewer parameters; in some cases, 100x better accuracy with 100x fewer parameters. They also showed that visualizing a KAN's activation functions helped users discover symbolic formulas that could represent the physical process being modeled. According to the research team:

The reason why large language models are so transformative is because they are useful to anyone who can speak natural language. The language of science is functions. KANs are composed of interpretable functions, so when a human user [works with] a KAN, it is like communicating with it using the language of functions.
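For reference, the theorem behind the name can be stated compactly: any continuous function f of n variables on a bounded domain can be written as a two-layer composition of sums of univariate functions,

f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)

where each \Phi_q and \phi_{q,p} is a continuous function of a single variable. KANs generalize this fixed two-layer form to networks of arbitrary width and depth.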

KANs have a structure similar to MLPs, but instead of learning a weight for each input, they learn a spline function. The research team showed that, because of their layered structure, KANs can not only learn features in the data but, because of the splines, "also optimize these learned features to great accuracy." The team also showed that KANs follow scaling laws similar to those of MLPs, with accuracy improving as parameter count grows, and they found that they could increase a trained KAN's number of parameters, and thus its accuracy, "by simply making its spline grids finer."
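To make the architectural difference concrete, below is a minimal, illustrative sketch in PyTorch. It is not the paper's implementation: the KAN-style layer parameterizes each edge's learned function with a small Gaussian basis rather than the B-splines used in the paper, and the class names are invented for this example.

import torch
import torch.nn as nn

class MLPLayer(nn.Module):
    """Classic perceptron layer: learned weights, one fixed activation."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.act = nn.SiLU()  # fixed, shared nonlinearity

    def forward(self, x):
        return self.act(self.linear(x))  # activation(W x + b)

class KANLayer(nn.Module):
    """KAN-style layer: a learned univariate function on every input-output
    edge; the edge outputs are summed, with no weight matrix and no fixed
    activation."""
    def __init__(self, in_dim, out_dim, num_basis=8):
        super().__init__()
        # Fixed Gaussian basis; only the per-edge coefficients are learned.
        self.register_buffer("centers", torch.linspace(-2.0, 2.0, num_basis))
        self.width = 4.0 / num_basis
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, num_basis) * 0.1)

    def forward(self, x):  # x: (batch, in_dim)
        # Evaluate every basis function at every input value: (batch, in_dim, num_basis)
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        # phi[b, o, i] = learned function on edge (i -> o), evaluated at x[b, i]
        phi = torch.einsum("bik,oik->boi", basis, self.coef)
        return phi.sum(dim=-1)  # sum the edge functions for each output

x = torch.randn(32, 3)
print(MLPLayer(3, 5)(x).shape, KANLayer(3, 5)(x).shape)  # both: torch.Size([32, 5])

The point of the contrast is that the MLP layer learns a weight matrix and applies one fixed nonlinearity, while the KAN-style layer learns a separate univariate function for every input-output edge and simply sums their values.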

The researchers created an interface that allows human users to interpret and edit a KAN. The visualization "fades out" activation functions with small magnitude, letting users focus on the important ones. Users can simplify the KAN by pruning unimportant nodes, and they can also examine the spline functions and, if desired, replace them with symbolic forms such as trigonometric or logarithmic functions.
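This prune-then-symbolize workflow roughly corresponds to the following sketch, based on the examples published with the pykan repository. The exact method names (train, prune, auto_symbolic, symbolic_formula) follow the project's early documentation and should be treated as assumptions, since the API has continued to evolve.

from kan import KAN, create_dataset
import torch

# Target function from the paper's "hello world" example: f(x, y) = exp(sin(pi*x) + y^2)
f = lambda x: torch.exp(torch.sin(torch.pi * x[:, [0]]) + x[:, [1]] ** 2)
dataset = create_dataset(f, n_var=2)

model = KAN(width=[2, 5, 1], grid=5, k=3)    # 2 inputs, 5 hidden nodes, 1 output
model.train(dataset, opt="LBFGS", steps=20)  # later pykan versions renamed this to fit()

model.plot()           # edges with small magnitude are drawn faintly
model = model.prune()  # remove unimportant nodes
model.train(dataset, opt="LBFGS", steps=20)

# Replace the remaining splines with symbolic forms and read out a formula
model.auto_symbolic(lib=['sin', 'exp', 'x^2'])
print(model.symbolic_formula())

The final call prints the symbolic formula recovered from the fitted splines, which is the "language of functions" the researchers describe.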

In a Hacker News discussion about KANs, one user shared his own experience comparing KANs to traditional neural networks (NN):

My main finding was that KANs are very tricky to train compared to NNs. It's usually possible to get per-parameter loss roughly on par with NNs, but it requires a lot of hyperparameter tuning and extra tricks in the KAN architecture. In comparison, vanilla NNs were much easier to train and worked well under a much broader set of conditions. Some people commented that we've invested an incredible amount of effort into getting really good at training NNs efficiently, and many of the things in ML libraries (optimizers like Adam, for example) are designed and optimized specifically for NNs. For that reason, it's not really a good apples-to-apples comparison.

The KAN source code is available on GitHub.
