Codice Software, maker of Plastic SCM, have added Java support to the beta of their 3-way code-aware merge tool, SemanticMerge, which InfoQ first covered in April.
The tool can be used for both merging and diffing. It works at the structure level rather than the text-block level: it parses the code into intermediate trees, which it then uses as the basis of the diff. A significant advantage of this approach is that the comparison is not confused by structural changes to source files, such as changing the order of methods. The tool also has a degree of "code-awareness". For example, if one developer adds an import (or a using directive in C#) for java.sql.ResultSet at line 1, and another adds the same import at line 10, SemanticMerge knows that they are the same import and adds the line only once.
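The import example can be pictured with a small sketch. The Java below is a hypothetical illustration, not SemanticMerge's code. It assumes imports have already been parsed into lists of imported names, and it merges them by name rather than by line number, so an import that two developers added at different lines appears only once. The names `ImportMerge` and `mergeImports` are invented for this example.

```java
import java.util.*;

final class ImportMerge {
    // Hypothetical 3-way merge of import lists. Imports are compared by the
    // name they import, not by the line they sit on, so the same import added
    // at line 1 by one developer and at line 10 by another is kept only once.
    static List<String> mergeImports(List<String> base, List<String> ours, List<String> theirs) {
        Set<String> baseSet = new LinkedHashSet<>(base);
        Set<String> result = new LinkedHashSet<>(); // insertion-ordered, drops duplicates

        // Keep base imports that neither side removed.
        for (String imp : base) {
            if (ours.contains(imp) && theirs.contains(imp)) {
                result.add(imp);
            }
        }
        // Add imports introduced by either side; the set ignores repeats.
        for (String imp : ours) {
            if (!baseSet.contains(imp)) result.add(imp);
        }
        for (String imp : theirs) {
            if (!baseSet.contains(imp)) result.add(imp);
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        List<String> base = List.of("java.util.List");
        List<String> ours = List.of("java.sql.ResultSet", "java.util.List");            // added at the top
        List<String> theirs = List.of("java.util.List", "java.io.File", "java.sql.ResultSet"); // added further down
        System.out.println(mergeImports(base, ours, theirs));
    }
}
```

Using an insertion-ordered set keeps the output stable while making the "add it only once" behaviour fall out of set semantics rather than line comparison.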
There are products that offer similar capabilities for other languages (Altova's DiffDog does a similar job for XML, and the Java DaisyDiff library can compare HTML documents down at the code level), but Pablo Santos Luaces, Principal Software Engineer at Codice Software, told us, "We didn't really find a polished product able to deal with source code like SemanticMerge does. There were tools trying to achieve the same, but in my opinion they are either unfinished or have tried to deal with too complex scenarios."
As well as the expected support for Plastic SCM, it can be configured to work with Git, Subversion, Perforce, ClearCase, Team Foundation Server and Mercurial, as well as with the IntelliJ IDE.
At the moment SemanticMerge is Windows-only and depends on .NET Framework 4.0. Codice plans to port it to OS X and Linux, using Xamarin Studio on OS X and Gtk# on Linux. Both UIs will be native, but both will run on C#/Mono.
InfoQ interviewed Pablo Santos Luaces to find out more.
InfoQ: Your Java support followed pretty quickly from C# and VB.NET. Are your other planned languages (C, C++, Objective-C, JavaScript) likely to follow on as rapidly, or are they more difficult to do?
We were developing Java support almost in parallel with C# and VB.NET, which is why it only took a month to release it. Now we will focus on C and then C++. We have already started on both, but we expect C++ to be harder. We are using libclang to parse the code, which greatly simplifies things, but we will need to change some parts of our merge engine to deal with C++ features not present in the languages we currently support. We'd also like to get to JavaScript as soon as possible, since it is one of the top requests.
InfoQ: What algorithm are you using to do the diff?
Both the diff and the merge algorithms are the core of our development effort with SemanticMerge. They work on the "code structure" instead of on blocks of text, which is how our Xdiff/Xmerge algorithms work. So the tool first parses the code and then calculates the "semantic differences" between the base and one of the contributors, and then between the base and the other contributor. These two difference collections are used to calculate the merge.
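A minimal sketch of the two-collection idea follows. It assumes, as a hypothetical simplification, that each parsed file has already been reduced to a map from a member's signature to its body text; the real tool works on parse trees. The names `SemanticDiff`, `Change` and `diff` are invented for illustration, and the record syntax needs Java 16 or later. Calling `diff` once against each contributor yields the two difference collections the merge works from.

```java
import java.util.*;

final class SemanticDiff {
    enum Kind { ADDED, DELETED, CHANGED }

    // One entry in a difference collection.
    record Change(Kind kind, String member) {}

    // Hypothetical model: a parsed file is a map from each member's signature
    // (e.g. "Foo.a()") to its body text, so where a member sits in the file
    // plays no part in the comparison.
    static List<Change> diff(Map<String, String> base, Map<String, String> contributor) {
        List<Change> changes = new ArrayList<>();
        for (Map.Entry<String, String> e : base.entrySet()) {
            String other = contributor.get(e.getKey());
            if (other == null) {
                changes.add(new Change(Kind.DELETED, e.getKey()));
            } else if (!other.equals(e.getValue())) {
                changes.add(new Change(Kind.CHANGED, e.getKey()));
            }
        }
        for (String member : contributor.keySet()) {
            if (!base.containsKey(member)) {
                changes.add(new Change(Kind.ADDED, member));
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> base = new LinkedHashMap<>();
        base.put("Foo.a()", "return 1;");
        base.put("Foo.b()", "return 2;");

        Map<String, String> ours = new LinkedHashMap<>(base);
        ours.put("Foo.a()", "return 10;"); // one developer edits a()

        Map<String, String> theirs = new LinkedHashMap<>(base);
        theirs.remove("Foo.b()");          // the other deletes b()

        // The two difference collections that a merge step would consume.
        System.out.println(diff(base, ours));
        System.out.println(diff(base, theirs));
    }
}
```

Keying members by signature is what makes reordering invisible. It also means that, in this simplified model, a method moved to another class shows up as a delete plus an add, which is the gap the similarity check described in the next answer is meant to close.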
The really interesting part of the differences algorithm is that it is able to track moved methods. So if a method changed location, it is able to track it as a difference (which is going to be the basis for the merge calculation). The "parsing" stops at the body level, so method bodies are handled as text. In order to see if a "method is still the same" even when it was modified, we run our previous Xdiff algorithm, which is based on a heavily modified Levenshtein distance calculation. If the bodies are "similar enough" and the rest of the method's elements match (parameters and so on), it is the same method.
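The interview says only that SemanticMerge uses a "heavily modified" Levenshtein distance, so the sketch below substitutes the textbook Levenshtein algorithm and normalizes it into a similarity score between 0 and 1. The matching rule (same name and parameter list, body similarity at or above a threshold) and the 0.7 threshold in the usage are assumptions for illustration, not Codice's actual values; `MethodMatcher` and its methods are hypothetical names.

```java
final class MethodMatcher {
    // Textbook Levenshtein edit distance (insert, delete, substitute each cost 1),
    // computed with two rolling rows instead of a full matrix.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the row just filled becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; 1.0 means the two bodies are identical.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) return 1.0;
        return 1.0 - (double) levenshtein(a, b) / longest;
    }

    // Hypothetical rule: the method is "still the same" if its name and parameter
    // list match and its body is at least `threshold` similar, wherever it now lives.
    static boolean sameMethod(String nameA, String paramsA, String bodyA,
                              String nameB, String paramsB, String bodyB,
                              double threshold) {
        return nameA.equals(nameB)
                && paramsA.equals(paramsB)
                && similarity(bodyA, bodyB) >= threshold;
    }

    public static void main(String[] args) {
        String before = "return a + b;";
        String after = "return a + b + c;"; // edited after being moved to another class
        System.out.println(similarity(before, after));
        System.out.println(sameMethod("sum", "int a, int b", before, "sum", "int a, int b", after, 0.7));
    }
}
```

Because bodies are compared as plain text, the score tolerates small edits, which is what lets a method that was both moved and modified still be recognized as the same member.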
The merge then compares the two difference collections. If a method was modified on both sides, you have a conflict. The same applies to moved/moved (which can be a divergent move conflict if, for instance, the method was moved to different classes), change/delete, move/delete and many more.
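The conflict cases listed above can be sketched as a comparison of the two difference collections, member by member. This is a hypothetical illustration, not SemanticMerge's engine: the `Diff` record, the three change kinds and the `conflicts` method are invented, and treating change/move as mergeable is an assumption the interview does not cover. Records require Java 16 or later.

```java
import java.util.*;

final class ConflictDetector {
    enum Kind { CHANGED, DELETED, MOVED }

    // Hypothetical entry in a difference collection: which member was touched,
    // how, and (for MOVED only) the class it was moved to.
    record Diff(String member, Kind kind, String movedTo) {}

    // Compares the base->ours and base->theirs collections and returns a
    // description of every member touched on both sides in incompatible ways.
    static List<String> conflicts(List<Diff> ours, List<Diff> theirs) {
        Map<String, Diff> theirsByMember = new HashMap<>();
        for (Diff d : theirs) theirsByMember.put(d.member(), d);

        List<String> result = new ArrayList<>();
        for (Diff a : ours) {
            Diff b = theirsByMember.get(a.member());
            if (b == null) continue; // touched on one side only: apply it, no conflict
            Kind x = a.kind();
            Kind y = b.kind();
            boolean conflict;
            if (x == Kind.DELETED && y == Kind.DELETED) {
                conflict = false; // both sides agree the member is gone
            } else if (x == Kind.MOVED && y == Kind.MOVED) {
                conflict = !Objects.equals(a.movedTo(), b.movedTo()); // divergent move
            } else if (x == Kind.DELETED || y == Kind.DELETED) {
                conflict = true; // change/delete or move/delete
            } else if (x == Kind.CHANGED && y == Kind.CHANGED) {
                conflict = true; // modified on both sides
            } else {
                conflict = false; // change/move: assumed mergeable in this sketch
            }
            if (conflict) result.add(a.member() + ": " + x + "/" + y);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Diff> ours = List.of(
                new Diff("Foo.a()", Kind.CHANGED, null),
                new Diff("Foo.b()", Kind.MOVED, "Bar"),
                new Diff("Foo.c()", Kind.CHANGED, null));
        List<Diff> theirs = List.of(
                new Diff("Foo.a()", Kind.CHANGED, null),
                new Diff("Foo.b()", Kind.MOVED, "Baz"),
                new Diff("Foo.c()", Kind.DELETED, null));
        conflicts(ours, theirs).forEach(System.out::println);
    }
}
```

Indexing one collection by member keeps the check linear in the number of differences; a fuller engine would also compare the edited bodies in a changed/changed pair before declaring a conflict.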
InfoQ: When do you expect to leave beta?
We expect to leave the beta phase for C# quite soon. Java was only just launched, so it will have to wait. We'll try to have C# ready for commercial use before July. That being said, we still have work to do in many areas.
First, we need to improve launch time. It currently takes a few seconds to launch, which is an issue if you have to merge a few hundred files, as you will in real production scenarios. We've already been using it internally for months, and this is something we need to fix.
We also need to add an "automatic resolution mode", because right now we ask the user to review all merges, even when SemanticMerge can resolve them fully automatically. We did this on purpose for the beta phase, but we need to add the --auto option soon.
In this release we added a better text editor, which was one of the features users had requested.
And, you know, we have to stay focused, because we want to get the Mac version finished and then jump into the Linux one... tons of things to do.
Pricing hasn't been set yet, but Pablo told us he hopes to keep it low, with a subscription of around $3/month and an unlimited plan at around $60. The beta is available as a free download.