Is Julia Production Ready? Q&A with Bogumił Kamiński

On the heels of JuliaCon 2020, SGH Warsaw School of Economics professor and DataFrames.jl maintainer Bogumił Kamiński summarized the status of the language and its ecosystem and stated that Julia is finally production-ready.

Professor Kamiński's article sparked some reactions on Hacker News. Some commenters doubted that Julia can be considered production-ready as a general-purpose language, citing in particular the state of its documentation, packages, tools, and support.

InfoQ has taken the chance to speak with professor Kamiński to better understand his position.

InfoQ: Could you please describe your background and your involvement with Julia?

Bogumił Kamiński: I am a professor at SGH Warsaw School of Economics, Poland, doing most of my research in the area of operations research and simulation modeling.

I am in the top 5% of contributors to the Julia language by number of commits, a significant contributor to the Julia data ecosystem, and in particular one of the core maintainers of DataFrames.jl.

I am ranked second among answerers for the [julia] tag on Stack Overflow. Before switching full time to academia, I spent over 10 years managing a team of several hundred developers and analysts deploying BI/DWH/data science projects for the largest Polish corporations and institutions.

InfoQ: The main argument in your article seems to be that the Julia ecosystem has now reached a maturity level that makes it ready for production. Could you elaborate on this point? What hampered Julia's adoption in production, and what are the most significant advancements that removed those blocking points?

Kamiński: Here it is crucial to define what I understand by Julia being ready for production.

I would define it as: the language and core packages having reached a level of stability such that things do not change in a major way "every six months". That means when you start a project, you can safely expect that in the long term (several years) things should just work.

In the past this was a major problem. The language and core packages tended to change their APIs often; tutorials created a year earlier would not work without being updated, and so on. This was a natural state while the language and the ecosystem were being developed. Now, I see that these things have significantly changed, especially for the core Julia language, but similar things are happening in the package ecosystem.

This does not mean that new packages are not in this "flux" state, but this is something that is witnessed by any package ecosystem, since new things evolve fast.

On top of that, the package manager is really mature and I would say it is currently one of the best in class. 

What this means is that there has been a lot of effort to make things "just work", especially for packages that have binary dependencies. By contrast, a few years ago it could happen that you tried to install a package and it failed because some external dependency did not compile. So you had to manually tweak the source code of that package to make it run, provided you knew how to do it, which was not always obvious.

In particular, since Julia 1.5 it has been possible to have a seamless enterprise deployment of Julia. Earlier there were problems due to the package manager protocol used to synchronize with GitHub which often clashed with firewall settings in corporate environments.

Why is this relevant? Well, you can easily "ship" a Julia project and expect that anyone in any environment should be able to run it relatively effortlessly. Of course there will be corner cases when things do not work; but my experience of using Linux and Windows 10 on a daily basis is that things just work on both platforms.

If you do a project in Julia, you can expect that either there is a package that does the thing you want or that you can easily use code written in C or Python and make it work. In my post I wanted to emphasize that we are in this precise state. As an example based on something I was working on this week, Julia has an excellent LightGraphs.jl package for working with graphs, but my collaborators use Python and prefer to use igraph. Here is some sample code using igraph (adapted from a tutorial for igraph in Python):

import igraph as ig
g = ig.Graph()
g.add_vertices(3)
g.add_edges([(0,1), (1,2)])
g.add_edges([(2, 0)])
g.add_vertices(3)
g.add_edges([(2, 3), (3, 4), (4, 5), (5, 3)])
g.pagerank()

Now you might ask what the equivalent Julia code would be. Here it is:

using PyCall
ig = pyimport("igraph")
g = ig.Graph()
g.add_vertices(3)
g.add_edges([(0,1), (1,2)])
g.add_edges([(2, 0)])
g.add_vertices(3)
g.add_edges([(2, 3), (3, 4), (4, 5), (5, 3)])
g.pagerank()

As you can see, it is identical. Of course, it is not that simple in all cases, since, for example, dictionaries in Julia and Python have different syntaxes, but this is the general rule for how things work. You even get tab-completion and direct access to docstrings.
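For readers unfamiliar with what the final `pagerank()` call computes, the following is a minimal sketch in plain Python, using only the standard library. It is not igraph's implementation; it is a simple power iteration, and it assumes the graph is undirected (as `ig.Graph()` is by default) and a damping factor of 0.85. The function name `pagerank` and its parameters are illustrative choices, not part of any library API.

```python
# Stdlib-only sketch of PageRank by power iteration (not igraph's implementation).
def pagerank(n, edges, damping=0.85, tol=1e-12, max_iter=1000):
    """Return PageRank scores for an undirected graph with vertices 0..n-1."""
    # Build adjacency lists; each undirected edge can be walked in both directions.
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    rank = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # Rank held by isolated vertices is spread evenly over all vertices.
        dangling = sum(rank[u] for u in range(n) if not neighbors[u])
        new_rank = [(1.0 - damping) / n + damping * dangling / n] * n
        for u in range(n):
            if neighbors[u]:
                # Each vertex passes an equal share of its rank to every neighbor.
                share = damping * rank[u] / len(neighbors[u])
                for v in neighbors[u]:
                    new_rank[v] += share
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:  # stop once the scores have stabilized
            break
    return rank


# Same vertices and edges as in the igraph example above.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
ranks = pagerank(6, edges)
print([round(r, 4) for r in ranks])
```

The graph consists of two triangles joined by the edge between vertices 2 and 3, so by symmetry those two vertices receive equal scores, and those scores are higher than the scores of the other four vertices.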

What does this mean in practice? If you are doing a project, you are not stuck thinking, "Can I use Julia, as in three months maybe I will need something in the project that is not available in Julia yet?" Rather, you know, "I can relatively safely use Julia, as currently many general-purpose packages are already there, and even if something is missing I can just use it from another language and it is going to be relatively painless no matter if it is C/Python/R/...".

Another part of production-readiness is that with PackageCompiler.jl you can create "apps which are a bundle of files including an executable that can be sent and run on other machines without Julia being installed on that machine." I do not see it as an essential part of production readiness (many scripting languages considered ready for production do not provide this option), but in many scenarios it is a nice-to-have feature.

Now let me clarify what I do not see as a part of the definition of "production-ready". I do not see Julia as a language that is best suited for any kind of project. Each programming language has its niche, and the niche of Julia is high performance computing/data science (or whatever you call it). If you want binaries with a very small footprint - for sure Julia is not the language of choice. If you want to develop apps for Android - it is not either.

What I believe is that if you are doing a project that Julia is well suited for and you want to take it to production, you are likely to need to satisfy many general requirements. Those requirements are just needed to make your core features inter-operate well with the rest of the ecosystem in which you are deploying your code. And I believe that Julia has reached a maturity level where this is easy to attain, either via existing packages or via integration with external tools, which is really easy in Julia.

InfoQ: Your claim sounded a bit too far-fetched to some commenters on Hacker News, specifically when it comes to considering Julia ready for production as a general-purpose language. Would you like to add some additional insights into this?

Kamiński: To quote what I have written at the beginning of my post:

"I have spent 20 years now deploying data science-related projects in corporate environments (back then it was not called data science, but we were already training neural networks to make predictions) and have many colleagues who are deeply into enterprise software development."

This is the frame of my post, i.e. doing data science, not in "your backyard" or an "academic research laboratory," but rather in the "business environment". As I already explained above, neither I nor, probably, any of the developers involved in Julia claims that it is a "one stop shop" for any kind of development project.

If you look at the Julia developer survey, slide 28, it is clear that people are using Julia for computing - and these are the areas where I believe Julia is production-ready.

Now, regarding things like documentation, packages, tools, and support - sure, this should, can, and will be improved. And I agree that more mature ecosystems like R/Python/Java have on average better coverage here. For example, as a maintainer of DataFrames.jl, I can tell you that most of the recent PRs are documentation-related. But I would not underestimate the Julia community here. In general, if you have any question and post it on SO, Julia Discourse or Julia Slack, you can expect to get an answer usually within minutes, at most hours. Here is a true story of this kind: people are usually very responsive and bugs are fixed quite fast.

If I were to name a major showstopper for Julia, it would be the availability of enough people skilled in this language within the general developer community. I can understand product owners/project managers and their fear of not being able to find enough people to work on their projects once they commit to Julia. However, here I am convinced that things are improving rapidly. JuliaCon 2020 was attended by over 20,000 participants. Also, there are a lot of resources already available for free online.

InfoQ: Besides the question of whether Julia is production-ready or not, or for which domains it is, what are in your view the language's main strengths? Do you see it as a replacement for Python, R, or any other language, at least in the area of scientific computing and data science?

Kamiński: I think that here again it is best to quote the latest Julia developer survey, slides 8 to 11. I would focus on the top three things from slide 8:

  • Speed - here the situation is relatively straightforward. Take any mature package like TensorFlow or PyTorch that requires performance; they are written mostly in C++. And Python is just a thin wrapper around a C++ core. Now take Flux.jl or Knet.jl. They are essentially implemented in pure Julia. So the bottom line is: if you need your code to run fast, while at the same time taking advantage of a high level language, then Julia is a natural choice. Also, as explained above, if there is an external library that is very fast and you want to use it, usually that is relatively easy to do.

  • Ease of use - there are numerous aspects of this, but my experience is that once someone gets a handle on Julia's principles, it compares very well in terms of syntax and design with e.g. R/Python when doing numerical computing. The language was designed around supporting these kinds of tasks well. And it is not only syntax - it is also a choice of when things happen implicitly and when you have to be explicit (e.g. with broadcasting). Based on my experience, this makes Julia code easy to maintain, and well-written code is self-documenting to a large extent.

  • Code is open-source and can be modified - this aspect has two dimensions. First of all, most Julia packages are MIT licensed - which is often very welcome in enterprise environments. Second - as most of the packages are written in Julia, if you do not like how something works - you just modify it yourself (and it is much easier than doing it in R/Python where most likely the things that you need to modify are written in, e.g., C, C++, Fortran).

People often ask me if I see Julia as a replacement for R/Python. And my thinking about it is the following:

  • If you are doing complex computations that require performance - for sure I would choose Julia as a tool to develop these parts of your project.

  • If you have a new task that is not performance-critical - just use the language that you know best (and if you do a lot of the work described in the first point above - as I do - you can safely choose Julia for this).

  • If you have a lot of legacy R/Python code and you are happy with it - just stick to it and remember that if you have some performance-critical parts in it, they can be rewritten in Julia relatively easily and integrated back with your original code base (I have done many such projects).

InfoQ: What do you see in Julia's evolution?

Kamiński: First I would say that I agree with what is implicit in your question: this will be an "evolution," not a "revolution". The design of Julia has proven robust and I do not think it is going to change radically; rather, gradual improvements are going to be seen in many areas.

If I were to name some major dimensions here these would be:

  • Improvements in multi-threading support. I think it is really relevant for the core use cases of Julia. Julia already provides support for this, but still many things can be improved here. Similarly, improvements in GPU/TPU handling are to be expected.

  • Improvements in compiler latency. Again, with every release it gets much better, but everyone is aware that this must be improved.

  • Having a more mature package ecosystem. Here I mainly mean the quality, stability, and documentation of the packages, as in terms of functionality coverage there are already thousands of packages available. Just to stress what I have said above, I believe that many core packages are already quite mature, but I agree with people commenting that things should improve here.

  • Growth of the community - I believe that the number of people interested in Julia is increasing significantly, as demonstrated e.g. by the number of JuliaCon 2020 participants. This means that: a) it will be easier in the future to recruit quality Julia developers if they are needed, and b) it will create a positive feedback loop for the quality of the Julia ecosystem, as more people report issues and get involved. Again, as a maintainer of DataFrames.jl I observe this shift: people who have never been in the "core" of the development of the package open issues, submit PRs, and discuss functionality on social media.

If you are interested in Julia, all talks from JuliaCon 2020 are available on YouTube.
