University Researchers Create New Type of Interpretable Neural Network

Researchers from Massachusetts Institute of Technology, California Institute of Technology, and Northeastern University created a new type of neural network: Kolmogorov–Arnold Networks (KAN). KAN models outperform larger perceptron-based models on physics modeling tasks and provide a more interpretable visualization.

KANs were inspired by the Kolmogorov-Arnold representation theorem, which states that any complex function of multiple variables can be rewritten as a sum of functions of a single variable. While today's neural networks are based on the perceptron, which learns a set of weights used to create a linear combination of its inputs that is passed to an activation function, KANs learn an activation function for each input, and the outputs of those functions are summed. The researchers compared the performance of KANs with traditional multilayer perceptron (MLP) neural networks on the task of modeling several problems in physics and mathematics and found that KANs achieved better accuracy with fewer parameters; in some cases, 100x better accuracy with 100x fewer parameters. The researchers also showed that visualizing the KAN's activation functions helped users discover symbolic formulas that could represent the physical process being modeled. According to the research team:
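
To make the contrast concrete, the following minimal Python sketch (not the researchers' implementation) compares a perceptron unit, which applies a fixed nonlinearity to a learned weighted sum, with a KAN-style unit, which applies a learned univariate function to each input and sums the results. The toy functions standing in for learned splines are illustrative assumptions.

# Illustrative sketch only: contrasts a perceptron unit with a KAN-style unit.
# The toy univariate functions below stand in for the learnable splines a real
# KAN would train; they are assumptions for illustration.
import numpy as np

def perceptron_unit(x, w, b):
    """MLP unit: fixed nonlinearity applied to a learned linear combination."""
    return np.tanh(w @ x + b)          # the weights w and bias b are the learned parameters

def kan_unit(x, edge_fns):
    """KAN unit: a learned univariate function per input, then a plain sum."""
    return sum(f(xi) for f, xi in zip(edge_fns, x))

# Toy "learned" univariate functions (in a real KAN these are trainable splines).
edge_fns = [np.sin, lambda t: t ** 2]

x = np.array([0.5, -1.0])
print(perceptron_unit(x, w=np.array([0.3, -0.7]), b=0.1))
print(kan_unit(x, edge_fns))
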

The reason why large language models are so transformative is because they are useful to anyone who can speak natural language. The language of science is functions. KANs are composed of interpretable functions, so when a human user [works with] a KAN, it is like communicating with it using the language of functions.

KANs have a structure similar to MLPs, but instead of learning weights for each input, they learn a spline function. The research team showed that, because of their layered structure, KANs can not only learn features in the data but, thanks to their splines, "also optimize these learned features to great accuracy." The team also showed that KANs follow the same scaling laws as MLPs, such as increasing parameter count to improve accuracy, and they found that they could increase a trained KAN's number of parameters, and thus its accuracy, "by simply making its spline grids finer."
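
The grid-refinement idea can be sketched as follows. This is a hedged illustration that uses piecewise-linear interpolation rather than the B-splines used in the paper, and the class and method names are hypothetical: refining the grid adds knots, and therefore learnable parameters, while keeping the curve learned so far.

# Hedged sketch of a spline on a KAN edge and of grid refinement.
# Piecewise-linear interpolation is an assumption for illustration; the
# actual implementation uses B-splines and transfers coefficients on refinement.
import numpy as np

class EdgeSpline:
    def __init__(self, grid_size, lo=-1.0, hi=1.0):
        self.grid = np.linspace(lo, hi, grid_size + 1)   # knot positions
        self.coef = np.zeros_like(self.grid)             # learnable values at the knots

    def __call__(self, x):
        return np.interp(x, self.grid, self.coef)        # evaluate the spline

    def refine(self, factor=2):
        """Make the grid finer; the current curve is kept, the parameter count grows."""
        new_grid = np.linspace(self.grid[0], self.grid[-1],
                               factor * (len(self.grid) - 1) + 1)
        self.coef = np.interp(new_grid, self.grid, self.coef)
        self.grid = new_grid

spline = EdgeSpline(grid_size=5)
print(len(spline.coef))   # 6 parameters
spline.refine()
print(len(spline.coef))   # 11 parameters: finer grid, same function so far
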

The researchers created an interface that allows human users to interpret and edit the KAN. The visualization will "fade out" activation functions with small magnitude, allowing users to focus on important functions. Users can simplify the KAN by pruning unimportant nodes. Users can also examine the spline functions and if desired replace them with symbolic forms, such as trigonometric or logarithmic functions.
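
The pruning step can be approximated with a simple magnitude test, as in the hedged sketch below; the threshold and helper functions are illustrative assumptions rather than the paper's actual interface. Edges whose learned function stays close to zero over the data contribute little and can be dropped.

# Hedged sketch of magnitude-based pruning: edges whose learned function has a
# small average magnitude over sample inputs are dropped. The threshold and
# helper names are assumptions for illustration.
import numpy as np

def edge_importance(edge_fn, x_samples):
    """Average |output| of an edge's learned function over sample inputs."""
    return np.mean(np.abs(edge_fn(x_samples)))

def prune_edges(edge_fns, x_samples, threshold=1e-2):
    """Keep only edges whose importance exceeds the threshold."""
    return [f for f in edge_fns if edge_importance(f, x_samples) > threshold]

x_samples = np.linspace(-1, 1, 100)
edge_fns = [np.sin, lambda t: 1e-4 * t]        # the second edge is nearly inactive
kept = prune_edges(edge_fns, x_samples)
print(len(kept))                               # 1: the weak edge is pruned
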

In a Hacker News discussion about KANs, one user shared their own experience comparing KANs to traditional neural networks (NNs):

My main finding was that KANs are very tricky to train compared to NNs. It's usually possible to get per-parameter loss roughly on par with NNs, but it requires a lot of hyperparameter tuning and extra tricks in the KAN architecture. In comparison, vanilla NNs were much easier to train and worked well under a much broader set of conditions. Some people commented that we've invested an incredible amount of effort into getting really good at training NNs efficiently, and many of the things in ML libraries (optimizers like Adam, for example) are designed and optimized specifically for NNs. For that reason, it's not really a good apples-to-apples comparison.

The KAN source code is available on GitHub.
