I think you are giving a little too much praise. There are a lot of things that are not designed or built by one person, but by the whole team. In a previous life I was called "the Banana man" because I did some crazy things with functional programming, but what I'm trying to do these days is to make the lives of programmers easier in general. If you are programming, there is a lot of noise that you have to deal with, and that's what I'm trying to take away. That is my current goal: to create peace and quiet for developers.
Definitely. The thing that I like about LINQ is that it has really deep roots in functional programming and in category theory, which means that it has a lot of mileage. There are a lot of things that people can do with it; it's not just some idea that will soon be obsolete. You can already see people picking it up: many people are writing LINQ providers - LINQ to SharePoint, LINQ to Flickr, LINQ to Amazon. I think that will increase when people discover the actual power of LINQ.
The other thing that I think is quite cool, but not many people have discovered, is the fact that hidden inside LINQ is the ability to do metaprogramming, where you get data that represents your code. So far we've used that to generate SQL from a LINQ query, but of course, you can use that to generate all kinds of other target code. You can imagine that if you have some special purpose machine that has its own query language or its own little language, you can now write a LINQ query, take that expression tree and compile it to that kind of target. I think that's completely unexplored territory, so there will be a lot of interesting developments when people out in the wild discover that capability.
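To make that concrete, here is a minimal sketch of my own (not code from the interview): in C# 3.0, changing a lambda's declared type from a delegate type to Expression<Func<...>> is all it takes for the compiler to hand you data instead of code.

```csharp
// Assigning a lambda to Expression<Func<...>> makes the compiler build
// a data structure describing the code instead of compiling it.
using System;
using System.Linq.Expressions;

class ExpressionDemo
{
    static void Main()
    {
        // As a delegate: opaque, executable code.
        Func<int, int> asCode = x => x + 1;

        // As an expression tree: data you can inspect, rewrite, or
        // translate to another target (SQL, or your own little language).
        Expression<Func<int, int>> asData = x => x + 1;

        Console.WriteLine(asData);        // prints: x => (x + 1)
        Console.WriteLine(asData.Body);   // prints: (x + 1)

        // You can also compile the tree back into executable code.
        Func<int, int> compiled = asData.Compile();
        Console.WriteLine(compiled(41));  // prints: 42
    }
}
```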
I don't think that's actually true. If you look at Haskell, there are new things added to the language all the time as well; it's no different in that respect. On the other hand, you can even argue that object orientation is more powerful than functional programming. For example, in an object-oriented language you can do virtual methods and things like that - if you want to encode that in a functional language you'll have a hard time. I don't see that as a real problem.
Actually, if you look at where I'm going, I'm getting more and more into old school object orientation with interfaces and so on, moving away from functional programming. If you are passing a higher-order function, you are passing one thing, one closure. If you're passing an interface or a class, you are passing many, because you are passing a whole vtable, and these can all be recursive and so on. I'm rediscovering object-oriented programming, and I feel that maybe I wasted some of my time doing functional programming and missed out on objects.
Again, I think the reality is not that clear. The reason that we had to use extension methods and this sequence operator pattern is that generics in C# and VB are not powerful enough: you don't have higher-kinded type variables, so you cannot abstract over type constructors, you can only abstract over types. That's the reason why I had to do it like that.
I think type classes, in some sense, have a lot of problems too, because once you've introduced a name inside a type class you cannot reuse it in another type class: once I've used Foo, the compiler associates Foo with that class. I don't think type classes are a solution to everything. What I would like for interfaces is something like intersection types, where you don't have to declare an interface up front, but can just say "I assume that this parameter satisfies this interface and that interface and that interface." You can do a little bit of that with constraints, but it would be nice to have it fully, and I think that would solve a lot of problems. It just shows that there is always evolution, so there is always a need to add stuff to a language. If a language were ever truly finished, we language designers would be out on the streets.
It's good that we discover new things that we need, all the time. Another thing is that you cannot really predict what you need in a language. The way I look at it is: you are writing some piece of code and you see patterns that appear all the time, so you are writing the same stuff over and over again. That is a sign that you should add language support for it; but as we change the things that we write, there will be other patterns that we see, so there will be a need for new language constructs. The catch, maybe, is that it's hard to take things out of a language. They only grow; we never seem to take stuff out, but that's the way it is. At some point maybe you have to just start from scratch.
Definitely. I think people have not really discovered the real power. A lot of people, when they hear LINQ, think "It's something that has to do with SQL and databases." I don't think people realize that you can write queries over any type that implements these sequence operators - just like in Haskell, where as long as you implement the monad type class - bind, return and so on - you get comprehensions. It's exactly the same here. It will just take a little time before people discover that, and then they will write the most crazy LINQ implementations.
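A hedged sketch of what that looks like in practice - the Maybe<T> type below is invented for illustration, not from the interview: once a type exposes Select and SelectMany (return and bind, in monad terms, in the shapes the C# compiler looks for), query syntax works over it unchanged.

```csharp
using System;

// A home-grown option type, standing in for "any monad".
struct Maybe<T>
{
    public readonly bool HasValue;
    public readonly T Value;
    public Maybe(T value) { HasValue = true; Value = value; }
    public static readonly Maybe<T> Nothing = default(Maybe<T>);
}

static class MaybeExtensions
{
    // "return"
    public static Maybe<T> Just<T>(T value) { return new Maybe<T>(value); }

    // functorial map, used by "select"
    public static Maybe<R> Select<T, R>(this Maybe<T> m, Func<T, R> f)
    {
        return m.HasValue ? Just(f(m.Value)) : Maybe<R>.Nothing;
    }

    // monadic bind, used by nested "from" clauses
    public static Maybe<R> SelectMany<T, U, R>(
        this Maybe<T> m, Func<T, Maybe<U>> k, Func<T, U, R> s)
    {
        if (!m.HasValue) return Maybe<R>.Nothing;
        Maybe<U> u = k(m.Value);
        return u.HasValue ? Just(s(m.Value, u.Value)) : Maybe<R>.Nothing;
    }
}

class MaybeDemo
{
    static void Main()
    {
        var result = from x in MaybeExtensions.Just(1)
                     from y in MaybeExtensions.Just(2)
                     select x + y;   // Just 3; Nothing would short-circuit
        Console.WriteLine(result.HasValue ? result.Value.ToString() : "Nothing");
    }
}
```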
These things take time, and I think it's good that people can get used to it in its current shape, for lists or for collections that they understand. Then they will go on and do state monads and exception monads - I don't know what - because if you look at the sequence operators, they are the monadic operators. What is also interesting is that, besides "select" and "where", we also have "group by" and "order by", which are not in Haskell's comprehensions. The funny thing is that the Haskell guys saw that and said "That's quite useful": Simon Peyton Jones and Phil Wadler wrote a paper where they extended Haskell's comprehensions with things from LINQ. So there is a nice circle going on, where the languages influence each other and get better.
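For reference, here is a small illustrative query (the data is made up) showing the two operators he mentions that Haskell's comprehensions originally lacked:

```csharp
using System;
using System.Linq;

class GroupByDemo
{
    static void Main()
    {
        var words = new[] { "monad", "list", "maybe", "state", "io" };

        // "order by" and "group by" inside the comprehension syntax.
        var byLength = from w in words
                       orderby w
                       group w by w.Length into g
                       orderby g.Key
                       select new { Length = g.Key, Words = g.ToArray() };

        foreach (var g in byLength)
            Console.WriteLine(g.Length + ": " + string.Join(", ", g.Words));
        // prints: 2: io / 4: list / 5: maybe, monad, state
    }
}
```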
There is definitely stuff going on there, and I think what you will see more of to start with is a LINQ-friendly interface on top of some API. That's more in the sense of LINQ to XML, where you are not providing a different implementation of the sequence operators, but adapting the API so that you can use LINQ to construct instances of classes or take things apart. I just did an interview with Bart De Smet on Channel 9, and he has been doing quite a lot of alternative LINQ implementations. His favorite example is "LINQ to Simpsons" - his name is Bart - a kind of crazy implementation.
He also has a LINQ to SharePoint, and I think he has an implementation for PowerShell, so he is doing some interesting things. In my talk on LINQ, I have an example of a really weird implementation that shows you don't even need lambda expressions: if you implement the sequence operators correctly, you can say "from x in default(int) select x + 1", and that means the lambda x => x + 1. You can do weird things like that. But again, I think we are very early in the development, and when people discover that LINQ is not just for collections, you will see many other applications of it.
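A hedged reconstruction of that trick - not the exact code from the talk: if Select simply returns the selector it was given, query syntax turns into lambda notation.

```csharp
using System;

static class LambdaLinq
{
    // Select ignores its "source" and hands back the selector itself.
    public static Func<T, R> Select<T, R>(this T source, Func<T, R> selector)
    {
        return selector;
    }
}

class LambdaDemo
{
    static void Main()
    {
        // The query compiles to default(int).Select(x => x + 1),
        // which is just the lambda x => x + 1.
        Func<int, int> succ = from x in default(int) select x + 1;
        Console.WriteLine(succ(41)); // prints: 42
    }
}
```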
In some sense, that is what methods are. When you create a delegate from an instance, you get a function that closes over the instance, so I don't think there is that much difference. The other thing that you have to really understand is that methods are special because of virtual dispatch, where you can override things. It's not that easy - it's not just properties of function type - because in a subclass you can override a certain method.
It's quite easy to encode both ways around: by creating a delegate from an instance you can take a method and turn it into a first class object, and you can also define a property of function type and then call a method via that. In some sense, events in C# are like that - you expose an event, but it's really like a couple of properties that you get and set. You see that pattern, for example, in the async design pattern using events, where people can use an event to dynamically change the method on a class and then call it asynchronously - people are doing things basically like that.
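A minimal sketch of both directions (the Counter class is invented for illustration): a delegate created from an instance method closes over the instance, and a variable of function type can be swapped at runtime and then called like a method.

```csharp
using System;

class Counter
{
    int count;
    public void Increment() { count++; }
    public int Count { get { return count; } }
}

class DelegateDemo
{
    static void Main()
    {
        var c = new Counter();

        // Direction 1: a delegate from an instance method closes over
        // the instance, much like a closure.
        Action bump = c.Increment;
        bump();
        bump();
        Console.WriteLine(c.Count); // 2

        // Direction 2: a variable (or property) of function type can be
        // replaced at runtime and called like a method.
        Func<int, int> op = x => x + 1;
        Console.WriteLine(op(1));   // 2
        op = x => x * 2;            // dynamically swap the "method"
        Console.WriteLine(op(1));   // 2
    }
}
```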
Life is more complicated than that, because you cannot really do these things with higher-order functions. For example, suppose you say "We don't need 'for' loops, because I can just pass a lambda for the loop body" - that's not really the same. If I have a return inside the for loop, I return from the method.
If you pass a delegate or something that has a return, you only return from that delegate, so you don't have non-local returns. And what happens with exception handling? I definitely don't believe that we can take control structures and replace them with delegates or lambda expressions. The same goes for the "using" statement, because inside the "using" statement I can throw an exception or do all kinds of other things. For simple cases you can do it, but for real, more complicated things you have to be really careful.
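A small sketch of the non-local return problem (the ForEach helper is invented for illustration): "return" inside a real foreach exits the enclosing method, while "return" inside a delegate passed to a ForEach helper only exits that delegate.

```csharp
using System;
using System.Collections.Generic;

class NonLocalReturn
{
    // With a real foreach, "return" exits the whole method immediately.
    static int FirstNegative(IEnumerable<int> xs)
    {
        foreach (var x in xs)
            if (x < 0) return x;   // non-local: leaves FirstNegative itself
        return 0;
    }

    // A delegate-based "loop" in the style the paragraph warns about.
    static void ForEach<T>(IEnumerable<T> xs, Action<T> body)
    {
        foreach (var x in xs) body(x);
    }

    static int FirstNegativeLambda(IEnumerable<int> xs)
    {
        int result = 0;
        ForEach(xs, x =>
        {
            if (x < 0 && result == 0)
            {
                result = x;
                return;            // only exits this delegate invocation;
            }                      // the iteration over xs keeps running
        });
        return result;
    }

    static void Main()
    {
        var xs = new[] { 3, -1, -7, 5 };
        Console.WriteLine(FirstNegative(xs));       // -1, stops at the -1
        Console.WriteLine(FirstNegativeLambda(xs)); // -1, but visits every element
    }
}
```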
I don't want to make sweeping statements that this is dangerous for enterprise applications, but it is true that if you have effects that are implicit, it's hard to reason about them because they are invisible: they hide in dark corners and they can come out and bite you. What I would like to see is that effects become part of the signatures of methods. Just as you say "This thing takes an integer and returns a boolean", you would also say "It can throw an exception and it will take a lock" and all these things, such that when you look at the signature of a function or a method, you can understand what it does. Then you know you can reason about it.
In some sense, even current type systems - even if you say "It takes an integer, returns a bool" - are not expressive enough. You want to go to a full contract system where you can say "This takes an integer, but it's really a prime number greater than 13, and it returns a boolean that says whether the input is bigger than 128", or something like that. You want to be able to be really precise about the contract that your function implements. Effects are part of that, but definitely not the only part. If you have very precise information about your methods, you can start to reason about them, and then the compiler can start reasoning about them. But you can never be precise enough, because at some point all your knowledge goes into these contracts and these types, and there can be bugs in there too, because you've made a mistake in your contract.
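A sketch of what such a contract might look like, using a hypothetical Check.Requires helper invented for this example; it checks at runtime, whereas a system like Spec# aims to verify such conditions statically.

```csharp
using System;

// Hypothetical runtime contract helper, for illustration only.
static class Check
{
    public static void Requires(bool condition, string message)
    {
        if (!condition) throw new ArgumentException(message);
    }
}

class ContractDemo
{
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    // The signature says int -> bool; the contract pins down *which* ints
    // are allowed and what the bool means, per the paragraph above.
    static bool IsBiggerThan128(int n)
    {
        Check.Requires(IsPrime(n) && n > 13, "n must be a prime greater than 13");
        return n > 128;
    }

    static void Main()
    {
        Console.WriteLine(IsBiggerThan128(131)); // True: 131 is prime and > 128
        Console.WriteLine(IsBiggerThan128(17));  // False: prime, but not > 128
    }
}
```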
When you say it's a dangerous one - programming is inherently dangerous; building is inherently dangerous. That's what humans do: we create things, and we can try to make sure that what we create is safe and so on, but you can never really be certain. You always make some sort of assumptions. When you are constructing a building, you assume there will not be an earthquake bigger than a certain strength, but maybe something else happens - a really strong wind that you didn't think of. It's the same with code: you define your contract, but maybe your program will be used in different circumstances. That's a really hard question that you asked. You can always prove a program correct with respect to certain assumptions, but not with respect to everything.
One of the things that I am quite concerned about is that all our languages are really designed to deal with sequential, synchronous computation. If you look at the basic building blocks with which we build programs, it's sequential composition - semicolon: do this, then that. When you are making a method call or a procedure call, it's synchronous: I'm calling this, I'm blocked until this thing computes, and then it returns. You usually don't think about what goes wrong, so you write your code for the normal case and then you put in exception handling for the error case, but you assume that's quite uncommon.
If you look at a more distributed setting, first of all there will be errors all the time, because the connection might be down, or you don't have permission, or the machine on the other side blows up. You have to deal with errors much more seriously, and you have to deal with latency and things like that, so you cannot assume that you're making synchronous calls. All the assumptions that we use to build simple "Hello world" style imperative programs don't really apply to this distributed setting. While I still believe that at the leaves of your computation you'll have imperative programs that are stateful, to compose them we need something that can deal with asynchronous computation, that can deal with errors, maybe even deal with mobility - something we cannot reason very well about - and with programs that evolve over time.
I always make this analogy: if you have to have surgery, I'm not going to kill you, do the surgery and then revive you, reboot you - no! I sedate you, do the surgery and then wake you up; you are never completely gone. With a program, or with a system, we usually have to completely shut it down, change it and then bring it up again. There are lots and lots of problems that I do see, but I don't yet know what language constructs to write or what we should do. Definitely, looking at distributed and parallel programs, the field is quite young.
If you look at engineering or maths, we've been doing that for thousands of years, so we now know how to build a building and make it solid. With code, we've been doing computer science for 70-75 years, so we are still scratching the surface. We don't have a real theory like physics has, with a good foundation: we have the Turing machine, which doesn't really reflect distributed computation; the lambda calculus captures certain parts; then there is a lot of process algebra. But it's not yet clear that we have really understood everything, which I think is fantastic, because it means there is opportunity to discover new things. This is quite a long-winded answer, but I hope it makes sense to you.
You can look at a question like that from two points of view. You can say "Yes, the semantics is different, and that's dangerous or troublesome", but the semantics is often not really that different, because in some sense - I don't know who coined that quote - all the current imperative languages are really variations of Fortran. Fundamentally, they are not that different. There are certain differences - how variables are captured, how exceptions are handled - but really, at the core, they are very similar; they are more similar than different.
A good programmer should be aware of these differences. It says more about the quality of the programmer than of the languages. If you are a good programmer, you know that these languages are more or less the same but differ in details. Having the syntax look similar is OK, because it's easier to transfer your knowledge - the semantics are very close even where the syntax is a little bit different. Do you write "function", or do you just write "class" or "class C:", "base class" or "extends"? These are syntactic differences. Be aware that they exist and then it's OK. It makes for fun puzzles: "This thing looks very similar, but it behaves quite differently" - that keeps us awake.
First of all, all the stuff that we did for LINQ didn't require any changes at the VM level. We wrote a paper about that, "Lost in Translation", where we show that you can take everything in VB 9 and C# 3.0 and compile it to the unmodified CLR; it doesn't require any changes there. Now you ask me, am I interested in doing this on the JVM? Personally, why would I do that? It would be interesting if something like it happened in other languages, but it doesn't seem logical that I would do it myself, given who my employer is. In some sense - as I mentioned before with the comprehensions - we got inspired by Haskell with LINQ, and then Haskell got inspired again by LINQ, so I can imagine that other languages get inspired by these things.
That's how things evolve - people have ideas and they cross-influence each other. In some sense, everybody is trying to solve the same problems; the problems don't change. We all have to access data, we all have to write distributed programs. If you look at the dynamic languages, or at C# 4.0, they are adding support for dynamic typing there. Things evolve - that's good - there are always new things coming up. There are so many problems that we have to tackle that things cannot stop. Look at everything in society: cars change, there are new models coming out. It's the same with programming languages - there will always be developments and they will evolve. That's a natural thing.
You are asking tough questions. With type inference you have to be really careful. For example, the kind of Hindley-Milner based type inference in Haskell or ML is quite powerful, but the downside is that, because it's so powerful, when something goes wrong you get these weird error messages: "cannot unify" followed by five pages of types that look exactly the same, except that deep inside one there is an int and in the other there is a bool. Because these things are often unification based, the place where this thing goes wrong is far removed from where the actual error originates. Yes, the type inference is very powerful, but the usability goes down. It's a delicate balance between making things predictable and usable and making them more powerful.
One of the design principles in C# type inference that I like is that we never cook up a type out of thin air. You can see that in the typing of the conditional operator. One way to do it is to look at the "then" part and the "else" part and say "Let's find a common supertype of the two", but then the compiler is cooking up a type that the programmer never wrote, so you might get weird error messages about a type you never mentioned. Plus, that common supertype will often be object - there is an int and a string, so the common supertype is object - which is basically a sign that something is wrong.
What we do instead is say "There must be a conversion from one branch's type to the other's"; we never try to invent a type. When we infer the type of a variable declaration, we only look at the initializer; we are not looking elsewhere, because then you get this problem with unification where the error might appear here because you assumed something there but violated it somewhere else. Which one was wrong? It's hard to pinpoint. Whenever you pick a type inference algorithm, you have to be very careful that it's predictable; otherwise you change your program slightly and suddenly nothing can be inferred any more - it becomes highly non-monotonic.
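Two small illustrations of those rules (my own examples, not from the interview): the conditional operator demands a conversion between the branches rather than inventing a supertype, and "var" looks only at the initializer.

```csharp
using System;

class ConditionalTyping
{
    static void Main()
    {
        // The conditional operator: one branch must already convert to
        // the other's type; the compiler never invents a common supertype.
        object a = true ? 1 : (object)"one"; // fine: int converts to object
        // var b = true ? 1 : "one";         // error: no conversion either way

        // "var" looks only at the initializer, nowhere else.
        var n = 42;            // n : int, decided right here
        var s = n.ToString();  // s : string
        Console.WriteLine(a + " " + s);
    }
}
```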
It's careful engineering to get that right, and I don't think there is one right answer. Another thing people often ask is "Why can't I infer the types of fields inside a class?" The thing there is that fields can be mutually recursive - I can have int x = y, where y depends back on x, and a field in one class can even depend on a field in another class. You get very complicated systems of equations that you have to solve. Usually things are more complicated than you think at first sight.
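A sketch of why that is (illustration only): if the explicit types on the fields below could be replaced by "var", the initializers would form mutually recursive equations the compiler would have to solve.

```csharp
class FieldInference
{
    // If these could say "var", x's type would depend on y's initializer,
    // y's could depend back on x, and fields in other classes could join
    // the system too. C# sidesteps this by requiring explicit field types.
    static int x = y + 1;
    static int y = 2;

    static void Main()
    {
        var z = x + y;   // locals are the easy case: only the initializer counts
        System.Console.WriteLine(z);
    }
}
```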
If you try to make it too complicated, first of all there is a high chance that you introduce bugs in your type checker, which is something you don't want; plus, it becomes hard to report errors and so on. There should be a good balance between the two.
For example, in C# 3.0 and VB 9 we have these anonymous types, and an anonymous type is not really anonymous - it's a compiler-generated type - so you cannot write it down. That's one place where I would have liked to see something more, because sometimes you are forced to rely on type inference and there are cases where you cannot. For example, I cannot return one of those types from a method, so there is a discontinuity where you suddenly have to jump paradigms.
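A small illustration of that discontinuity (my own example):

```csharp
using System;

class AnonymousTypeDemo
{
    static void Main()
    {
        // The compiler generates a type for this; we have no name for it,
        // so "var" is the only way to declare the variable.
        var person = new { Name = "Erik", Hobby = "Haskell" };
        Console.WriteLine(person.Name + " likes " + person.Hobby);

        // The discontinuity: there is no type name to put here, so the
        // anonymous type cannot be returned from a method as itself:
        //   static ??? MakePerson() { return new { Name = "Erik" }; }
    }
}
```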
With a lot of these things, it's hard to make it all work if the underlying runtime doesn't support it. It's all a matter of engineering, and you only have a fixed amount of dollars or time to spend, so you do it and then later you can improve it. The beauty of language design is that it's engineering and art and everything mixed together, where you are searching for an optimum, but it is not clear that there is one best way to do things.
With a lot of these things, I see the problem, but I wish I knew the answer. Then I would probably retire - I could win the Turing Award or something like that. The thing with parallel programming: first of all, it's a very diverse field, because people sometimes make a distinction between parallel and concurrent. There are certain things, like certain algorithms, where we know how to do it in parallel, because they are well-structured; there are design patterns, so you can break a big problem into small things that can be solved independently. We have more problems when it's unstructured, and when there is asynchrony involved, and error cases and so on.
In some sense, our brains are not really wired to think about these things. We like to break things up into smaller things, in a compositional way, whereas a lot of this asynchronous programming is not at all compositional - it's very messy. You cannot take a big problem and say "Oh, that's composed out of two things", but that's how traditional programming works: you write an arithmetic expression A+B, and I can look at A and understand what it does, look at B, and then reason about the whole.
We already mentioned this with side effects - side effects muddy the waters, because you cannot think compositionally: maybe if I do something here, that changes some global state and affects that thing over there. I think imperative programming is inherently non-compositional, but usually we can wave our hands; or, if you reason with pre- and postconditions, you make the state explicit: you reason about how each statement changes one state into another, so that if some predicate holds for the state before, some other predicate holds for the state afterwards, and you can use invariants.
People have found all kinds of ways to reason about state and still make it compositional, by making the state explicit. Let me give you another example: power consumption. Everybody is building these giant data centers and they use a lot of power. Power is a resource that your program uses, just like space or memory, but it's something that we've abstracted over. How do you know whether using a conditional consumes more power than using a while loop, or whether a foreach loop uses less power?
There is no way we can reason about the power consumption of our programs just by looking at them. Yes, we can reason - though we don't use this often - about computational complexity: you can say it's n-squared or cubic or something like that, and there is some theoretical work that talks about power consumption in circuit design and so on. But you look at your C# program and ask "Does this one use more or less power than that one?" We don't know. Now, you ask me what the language constructs are and whether we can do that. I say "No, I see the problem, but I don't know how to do it".
The other thing is that notation is often designed to be compositional - that's how you write expressions, for example A+B: you write smaller things that you can glue together. With asynchronous programming it's the opposite: you wait for values, things happen independently, and it's inherently non-compositional. The question is "Can we do the same thing that we did for state, and turn something that looks non-compositional into something that's compositional?" I don't know the answer, but that's the Holy Grail.
I do believe that compositionality is essential for humans to understand things, and that is what makes parallel programming so hard. With sequential programming you can say "OK, I look at this piece of code and I can follow it in my head: it does this and then that", so you can break it up into smaller pieces. In parallel programming, when things run in parallel, the combinatorial explosion in your head is just too big to understand, so we have to find other ways to keep it in our minds. That will be one of the biggest challenges that we are facing.
Did you give up on Haskell?
I never gave up on Haskell. The nice thing about Haskell is that, because it's pure, you cannot cheat - you have to be pure; the language forces that. You cannot put a print statement in your code to do debugging, because that's a side effect, and everything that depends on that expression has a side effect too, so it forces you to be absolutely honest - there is no cheating. On the other hand, in reality you don't always need such strict rules.
It's like in normal society: there are a lot of laws, or rules, and if everybody strictly followed the rules, the whole society would come to a standstill. Sometimes people strike without striking: they just start following the rules precisely, saying "We can only work for 3 hours, then we have to have 30 minutes rest". If you are really that strict, then nothing works. It's the same with programming - in certain situations you don't have to be that strict about purity, but it's good to look at it.
The other thing with Haskell is that, because it's pure - Simon Peyton Jones says it's like a Petri dish - you can do lots of experiments in the language that influence other languages. But you have to be careful that, by taking one idea from here and putting it there, you don't get yourself into trouble. For example - that was one of your previous questions - everything is pure and lazy over here, and when I put it in an imperative context there can be unintended consequences, because a side effect can happen and mess things up.
You have to be a little careful about that. Haskell is a very good inspiration, a very good experimental platform and I do believe that being precise about contracts of functions is really important.
Haskell is definitely one example, but there are other examples too. Look at Eiffel, or Spec#, or programming by contract - contracts that go beyond what a type system gives you; people have been doing that in Scheme, too. Those are things that Haskell doesn't have, although there are some proposals for doing contracts in Haskell. It's not only Haskell that will give us inspiration; we should look at all kinds of languages and all kinds of other approaches. I always have Hugs or GHC installed on my laptop; every month I write a little Haskell program, and I do a lot of little prototypes in Haskell.
That's a tough question, because I'm not a big believer in user-defined syntax. Let me give you a simple example: in Haskell you can define your own operators, with precedence and associativity and things like that. You can build domain-specific things with that, but then it's like you have to learn a new language every time. On the other hand, the syntax of one language resembling the syntax of a different language, even though the semantics is different - that's a good thing. The nice thing about certain languages using the same alphabet is that it becomes slightly easier to learn a new language, because you don't have to learn a new alphabet. If you are trying to learn Russian or Chinese or Japanese, not only do you have to learn the new language, but you also have to learn a new notation, so you have two moving targets.
You mentioned Haskell's monad comprehensions - the thing there is that this works for any monad: even though the underlying monad is different, the notation is the same, and because all monads satisfy the monad laws, you can still understand the program without having to understand the particular monad. In some sense, the way I always explain it to people: you just pretend it's the List monad, or maybe the Maybe monad - you take the simplest monad, you understand the program for that one, and then you understand it for whatever monad it actually was, because they all satisfy the same laws.
You can look at the syntax, take one interpretation, and it carries over to another interpretation - that's what helps. So it's one syntax with different semantics, but all these semantics are essentially the same, because they embody the same concept. The same goes for comprehensions in C# and VB, with "select", "where" and so on. Then you say "Can't we do the same for blocks and so on, like asynchronous workflows in F#?" But there you are making a quantum jump, because if you look at the semantics of blocks in C# or VB, it's very complicated - exceptions, non-local returns, break, continue, all these things - so you cannot just lift that out and overload the control structures.
We can overload only one statement, foreach, because it boils down to "GetEnumerator", "MoveNext" and "Current"; the rest it borrows from the normal statements, including exceptions and non-local returns. Overloading the full notation is really hard, because there are so many hidden assumptions inside normal sequential code that the contract you would have to satisfy would be quite complicated, whereas with monads, or with LINQ expressions, you implement a few methods and then it works.
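A minimal sketch of that pattern (the CountDown type is invented for illustration): no interface is implemented, yet foreach works, because the compiler only looks for GetEnumerator, MoveNext and Current.

```csharp
using System;

// foreach compiles down to GetEnumerator/MoveNext/Current, so any type
// providing that shape works - no IEnumerable required.
class CountDown
{
    readonly int start;
    public CountDown(int start) { this.start = start; }
    public CountDownEnumerator GetEnumerator() { return new CountDownEnumerator(start); }
}

class CountDownEnumerator
{
    int current;
    public CountDownEnumerator(int start) { current = start + 1; }
    public bool MoveNext() { current--; return current >= 0; }
    public int Current { get { return current; } }
}

class ForeachDemo
{
    static void Main()
    {
        foreach (var i in new CountDown(3))
            Console.Write(i + " ");   // prints: 3 2 1 0
    }
}
```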
In some sense, workflows in F# are similar: there is a "while" statement in there, but they don't have unstructured control flow - no "switch" statement, no "break" or anything like that. You can write in a very limited subset, which really is more or less the same as writing LINQ comprehensions, which have a conditional, and "SelectMany" is really a for loop. I don't think there is a fundamental difference between the two - the syntax looks a bit different - but it's a very restricted set of operators that you implement, and then you get this special syntax that you can overload.
In C++ people try to do that, where you can overload assignment and dereferencing and things like that. Apparently people find that useful and want to do it - they want to build smart pointers and things like that - but if that set of operators is big, or has very subtle semantics, then you cannot use the conceptual reading of the program any more, because you cannot really understand it without knowing the implementation.
My answer to the previous question applies here, too. If you look at async workflows, or Parallel.ForEach() in ParallelFX, they are definitely steps in the right direction, but they are still small steps, because it's still really hard to write an asynchronous program that maintains its state across several asynchronous calls. You can write an async workflow if you use a very restricted set of control structures, but you cannot use arbitrary statements in there, so we are definitely not on par with sequential programming.
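For reference, a hedged sketch in the Parallel.ForEach shape he mentions, written against the System.Threading.Tasks API as it later shipped (the example itself is invented):

```csharp
using System;
using System.Threading.Tasks;

class ParallelDemo
{
    static void Main()
    {
        var inputs = new[] { 1, 2, 3, 4, 5, 6, 7, 8 };

        // The parallelism is declared rather than hand-coded, but the body
        // runs concurrently, so shared state still needs explicit protection.
        object gate = new object();
        long sumOfSquares = 0;
        Parallel.ForEach(inputs, x =>
        {
            long sq = (long)x * x;
            lock (gate) { sumOfSquares += sq; }
        });
        Console.WriteLine(sumOfSquares); // 204
    }
}
```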
It's good that people are starting to think about it - it's way overdue. We saw this coming, and now people are starting to think about it, but we are just scratching the surface. Then you can say "Our current programming model is not the right one", but I'd say that it, too, evolved for a good reason. We had this question before - why do languages add new things? Because there are certain problems that you want to solve, so we add language features.
You can complain that a language is big, but if you look at every feature, there was a good reason it ended up in the language, because it's not cheap to add something to a language. For every feature that ends up in a language, there are ten that get played with a little but don't make it in. At some point there was a good reason to put each one in, so these features are probably useful in some way for the asynchronous case as well. As long as you don't have that kind of full symmetry between the two worlds, there is still a lot of work to do.