Interview with Jay Jay Billings on the Eclipse Science Working Group


1. I'm here at Eclipse 2015 with Jay Jay Billings who is a researcher at the Oak Ridge National Laboratory and a member of the science working group at Eclipse. I wonder if you could start off by just saying what the science working group at Eclipse actually does.

I am glad to. Absolutely! Thank you for speaking with me. So, the science working group is one of Eclipse’s newest working groups, and it is really focused on two things: first, building a community to foster the development of scientific software on the Eclipse platform; and second, working on standardization, interface descriptions and things like that. At the end of the day, this is all about building a place where scientists who use Eclipse can come together, share our technology and build something greater than we could on our own.

   

2. And presumably, this involves data processing and data visualization on the client side from results that get generated over experiments and stored in a central cluster.

It does. We have several different projects that have different types of data, different scales of data and different ways of visualizing them, kind of everything. We have some projects where there is small data and we have other projects with maybe a petabyte of data.

   

3. So what would small data include, then, on that scale?

Small data could be megabytes and big data – petabytes.

   

4. In terms of processing these things, do you use off the shelf software, like Hadoop or Spark for doing this processing? Or is there something where you have custom software to process them?

It depends on the project, but we lean heavily towards high performance computing technologies, such as MPI, the message passing interface. There are different implementations of that; MPICH is one, Open MPI is another; and these are off the shelf, but they do have to be compiled on your hardware. So it is not quite as easy as Hadoop or Spark to install them and work with them, but they are very powerful and MPI scales to millions of processes.
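Editor's illustration: in an MPI program, each process usually asks for its rank (`MPI_Comm_rank`) and the total process count (`MPI_Comm_size`) and then works on its own share of the problem. The sketch below shows only that partitioning arithmetic in plain Python. It does not call MPI, and `block_range` is a hypothetical helper name chosen for this example, not part of any MPI implementation.

```python
def block_range(rank, size, n_items):
    """Return the half-open range [start, stop) of items owned by `rank`.

    Hypothetical helper: splits n_items as evenly as possible across
    `size` processes, giving the first (n_items % size) ranks one extra.
    """
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


if __name__ == "__main__":
    # Stand-in for a real launch: loop over ranks instead of starting processes.
    for rank in range(4):
        print(rank, block_range(rank, 4, 10))
```

With a real MPI launch, every process would call the same function with its own rank, so no coordination is needed to agree on who owns which items.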

   

5. And do these processes run over a cloud farm of machines, or do you have a set of supercomputers that process them?

For Eclipse ICE and for PTP we deal with supercomputers that sit in a room that is an eighth of an acre. So, for example, at Oak Ridge National Laboratory, we have the Leadership Computing Facility and we have about a half-acre or more of machine room space, and in there we have really big computers that are basically rack after rack, just sitting there. They are all on the same interconnect and it is custom hardware. So, we run on that type of thing. But, with some of the other projects, maybe they use AWS and they will use Hadoop or Spark or something like that.

   

6. So, on these big scale-outs, you can fire up processes and then run millions of instances across all of these nodes to do the processing in parallel?

Yes, it depends on the machine. In some cases, you may run 300,000, 400,000, 500,000 – that is kind of typical at Oak Ridge*; at Argonne [National Laboratory] with something like Mira – you can run a million processes** and the next generation of machines will have an even higher degree of parallelism.
Jay Jay's additional notes:
*When I say 300k-500k for ORNL, I am including all of the CPUs and GPUs, which have 16 cores and 14 cores respectively. http://en.wikipedia.org/wiki/Titan_%28supercomputer%29
**The number of processes turns out to be exactly 786,432. http://en.wikipedia.org/wiki/IBM_Mira
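Editor's illustration: the figures in the notes above can be checked with a few multiplications. The per-node counts of 16 and 14 for Titan come from the note itself; the node counts and Mira's 16 cores per node are assumptions taken from the linked Wikipedia pages, not from the interview.

```python
# Assumed machine sizes (from the Wikipedia pages linked in the notes).
MIRA_NODES = 49_152
MIRA_CORES_PER_NODE = 16

TITAN_NODES = 18_688
TITAN_CPU_CORES_PER_NODE = 16   # "16 cores" in the note
TITAN_GPU_UNITS_PER_NODE = 14   # "14 cores" in the note

mira_total = MIRA_NODES * MIRA_CORES_PER_NODE
titan_cpu_total = TITAN_NODES * TITAN_CPU_CORES_PER_NODE
titan_gpu_total = TITAN_NODES * TITAN_GPU_UNITS_PER_NODE
titan_total = titan_cpu_total + titan_gpu_total

if __name__ == "__main__":
    print("Mira cores:", mira_total)
    print("Titan CPU cores:", titan_cpu_total)
    print("Titan CPU + GPU units:", titan_total)
```

Under these assumptions the Mira product matches the 786,432 quoted in the note, and the Titan CPU count alone lands at roughly 300k.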

   

7. So, how does that data get crunched and then presented back to the users who are presumably using Eclipse to then see the end results?

Well, they are not using Eclipse at the moment for that. That is what we are trying to change in the working group. We have the Eclipse Integrated Computational Environment on one end, we have PTP, we have some other projects, and we are all trying to figure out how to deal with visualization. Outside of Eclipse, there are lots of tools that can look at really large visualizations, such as VisIt, which is a DOE product that has been worked on by several national laboratories: Lawrence Livermore National Lab, Lawrence Berkeley National Lab, Oak Ridge.

The last I heard – I think this number has been updated – they used Hopper at NERSC, which is a DOE supercomputing facility to render a mesh that was over, I think, six trillion elements in size, which is really massive. It took the whole machine and I think they had 150,000 processes to do it. So, it is a very tough job. So there are technologies like that: there are things like ParaView that can run big visualizations. Then some people would actually make custom libraries to do it. So, they will link it into their code and while they are running, as soon as the data comes off line, they will just immediately analyze it and dump it to file and it's ready to go.

Even at that point, you might have terabytes of post-processed data. You still have to get it to the user in some easy-to-consume way. We're trying to work on that by maybe saying “Well, they do not need to look at all of this data. They only need to look at the little piece”.

Alex: And say “Select that piece” and then just show that information that they are looking for.

Probably after some more computing, yes.
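Editor's illustration: one simple way to show a user only "the little piece" is to seek straight to the region of interest in a file instead of loading everything. The sketch below assumes a flat binary file of 64-bit floats, which is an assumption made for this example and not a format named in the interview. `read_window` and `write_demo_file` are hypothetical helper names.

```python
import array
import os
import tempfile

ITEM_SIZE = array.array("d").itemsize  # bytes per stored float


def read_window(path, start, count):
    """Read `count` floats beginning at element index `start`.

    Only the requested bytes are read; the rest of the file is never loaded.
    """
    values = array.array("d")
    with open(path, "rb") as f:
        f.seek(start * ITEM_SIZE)
        values.fromfile(f, count)
    return values.tolist()


def write_demo_file(n_items):
    """Create a temporary file holding the floats 0.0 .. n_items-1 and return its path."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        array.array("d", (float(i) for i in range(n_items))).tofile(f)
    return path


if __name__ == "__main__":
    demo_path = write_demo_file(1000)
    print(read_window(demo_path, 500, 3))
    os.remove(demo_path)
```

Real post-processed data at the terabyte scale would more likely sit in a structured scientific format with its own subsetting support, so treat this only as a picture of the idea.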

   

8. So who else is working as part of the science working group?

We have a lot of really great people. On the steering committee, we have Oak Ridge National Lab and we have IBM – that is Greg Watson with the PTP project, the Parallel Tools Platform from Eclipse. We have Diamond Light Source also on the steering committee – that is Matt Gerring and all the folks from the DawnSci project. Then we have a lot of smaller companies and projects: we have OpenChrom, which is also ChemClipse, which was just proposed as an Eclipse project. So that is OpenChrom, and we have Open Flow. I am afraid I know the names of the codes and not necessarily where everyone is from.

So, we have OpenChrom, we have Open Flow, we have some kind of French medical research that's being done, we have system in a cloud stuff. Let us see, what else do we have? I should really know this because I gave a talk on it last night. We also have people in the science working group* that are working on things like Control System Studio, which is another really big project that is Eclipse-based, and it is used all around the world at accelerators and different types of facilities that need to interact with the hardware and visualize those results. So, there are really people from all over that are involved with the group and it is an international collaboration: North America, Canada, Europe and maybe a few other places.
Jay Jay's additional note:
*The full list of members of the Science Working Group can be found at http://science.eclipse.org/members

   

9. So what are the goals for the science working group over the coming year?

Over the coming year? To be honest, I think we are still working on that. One of the big areas where collaboration is key is 3D visualization. We are all doing it and each of us is doing something that the others want. Right now we are wondering how to get this to each other. This is one of the things that we need to look at: “How can we do visualization better?” For example, at Oak Ridge, we are connecting to these really big visualization engines like VisIt; other people might have custom visualizations that might be really cool if we have some small data.

So, can we horse trade on those a little bit and share? I think, right in line with that, we are going to do some work with plotting, just for looking at regular line plots. Other areas of collaboration after that start to get into how we each approach a particular problem. Maybe Oak Ridge does it with neutrons and Diamond does it with a light source. Are there things that we could share as part of that, and are there other people in the group that could use it?

   

10. Presumably, one of the benefits of coming to an industry- and country-neutral organization like the Eclipse Foundation is that it allows you to be that melting pot to bring together all those ideas.

Yes, absolutely. We have a great collaboration thanks to the Eclipse community.

Alex: It will be interesting to see what you come up with over the next year.

Oh, absolutely.

Alex: Jay Jay Billings, thank you very much.

Thank you.

Jun 10, 2015
