So it’s a very simple rule; it’s really very intuitive. What it means is that if you have a type that is a subtype of another type, and you use an object of that subtype in a context where you expect an object of the supertype, then the object of the subtype ought to behave the way you expect. In other words, you’re depending on the specification of the supertype, and the object should meet that specification even though it might belong to a subtype.
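A minimal sketch of that rule in Java (the types here are invented for illustration, not from the interview): the totalArea code depends only on the Shape specification, so any subtype it is given must honor that specification.

```java
// Behavioral subtyping: the caller relies only on what Shape promises,
// so every subtype must meet Shape's specification.
interface Shape {
    /** Returns the area; the specification promises a result >= 0. */
    double area();
}

class Rectangle implements Shape {
    private final double width, height;
    Rectangle(double width, double height) { this.width = width; this.height = height; }
    public double area() { return width * height; }
}

class Circle implements Shape {
    private final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}

class Totals {
    // Written against the supertype's specification; it behaves correctly
    // for Rectangle, Circle, or any future subtype that meets the spec.
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) total += s.area();
        return total;
    }
}
```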
Absolutely. It means the semantics; the language and the compiler will typically give you the syntax, but what really matters is what the different methods do, how they behave.
Alex: In your keynote, you gave an example of a stack and a queue; how would that apply in this case?
So you might define a stack and a queue to have exactly the same methods, with the same names and the same arguments, but a stack has LIFO behavior and a queue has FIFO behavior, and that’s a big difference semantically. So if you are using a queue, you expect FIFO behavior, and you would be very surprised if you got the other kind. So that’s the behavioral part; that’s the important semantics of the definition.
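To make that concrete, here is a small Java sketch (invented for illustration) in which the two types have identical method names and signatures; only the behavioral specification distinguishes them, and no compiler checks it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Identical signatures, different behavioral specifications: the type
// system cannot tell these apart; only the semantics of remove() differs.
class IntStack {
    private final Deque<Integer> items = new ArrayDeque<>();
    public void insert(int x) { items.addLast(x); }
    public int remove() { return items.removeLast(); }   // LIFO: newest element
}

class IntQueue {
    private final Deque<Integer> items = new ArrayDeque<>();
    public void insert(int x) { items.addLast(x); }
    public int remove() { return items.removeFirst(); }  // FIFO: oldest element
}
```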
Our compilers today are not powerful enough to support or enforce semantic constraints, so we have to handle the semantics outside the language. The typical way we do this is by writing specifications in English; for example, you might use JavaDoc to explain the meaning of your type and your methods. And then it’s up to the programmer to make sure that they do the right thing, which means it’s on not the firmest ground and requires training and understanding on the part of the programmers. We might hope that in the future the compilers or the verification systems will get better and will be able to do more enforcement.
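In that style, the behavioral requirements live in the comment, and nothing but programmer discipline enforces them; a small illustration (the class is invented for the example):

```java
/** A counter that may never exceed its maximum. */
class BoundedCounter {
    private final int max;
    private int count = 0;

    BoundedCounter(int max) { this.max = max; }

    /**
     * Increments the counter.
     *
     * @throws IllegalStateException if the counter is already at its maximum;
     *         in that case the counter is left unchanged.
     */
    void increment() {
        if (count == max) throw new IllegalStateException("counter is full");
        count++;
    }
}
```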
My understanding of design by contract is that there is enforcement, but it’s not really compiler enforcement; it’s runtime enforcement, which is a lot better than not having anything. But what you’d ideally like is to be able to write a specification that could be proved at compile time, so that you wouldn’t have to worry about misbehavior at runtime. And I think verification systems are getting better, but they are not at a point yet where we can do this on a widespread basis.
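Roughly, the distinction is between checks like the following, which fire only while the program is running, and a verifier that could prove before execution that no caller ever violates the precondition. This is a generic sketch of the runtime-checking style, not any particular design-by-contract tool:

```java
class Checked {
    /**
     * Returns the square root of x.
     * Precondition: x >= 0 (checked at runtime, not proved at compile time).
     */
    static double sqrt(double x) {
        if (x < 0) throw new IllegalArgumentException("precondition violated: x >= 0 required");
        return Math.sqrt(x);
    }
}
```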
Well, I don’t know when these techniques will become practical, but if you think about what’s going on in most languages, or rather programs, typically you know what you’re dealing with. If you are allowing for a set of possibilities, then you start out with something that represents the set of possibilities and you explicitly refine; the refinement step is dynamic, but the specifications on both sides are not. So if you are able to refine from a type to some subtype, then after you’ve done that the compiler knows that you’re working with the subtype, and if you could do verification, you could then verify that you’re using it properly; and before you refine, you can verify that you are using it according to the supertype definition. So I think you can have what’s basically a static analysis to handle these things, merged with a little bit of dynamic mechanism where it’s needed.
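Reusing the Shape and Circle types from the earlier sketch, the refinement step might look like this: the instanceof test is the dynamic part, and on either side of it the usage could in principle be verified statically against the relevant specification.

```java
class Refinement {
    static void describe(Shape s) {
        double a = s.area();                      // only Shape's spec applies here
        if (s instanceof Circle) {                // the dynamic refinement step
            Circle c = (Circle) s;
            // from here on, usage could be verified against Circle's spec
            System.out.println("a circle of area " + c.area());
        } else {
            System.out.println("some shape of area " + a);
        }
    }
}
```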
Alex's full question: You also mentioned modularity, which is really about being able to plug together different subsystems or modules, with a way of separating them out. I think in the descriptions you were using earlier, the initial point was to separate out the teams that were working on them, but subsequently, having a clean separation of the interface makes it easier for other modules to use. Do you think there is a higher-level specification for how a module behaves, versus how the types within it behave, maybe with a separate notion of the entry points, if you like, to the module as a means of aggregating types? Is the module just an example of a bigger type?
I think the module is actually an example of a bigger type, or maybe more specifically, if you think about putting distributed systems together, it’s actually an object belonging to a bigger type. It has an interface of its own with its own specification, and other types might show up in that specification; for example, you might take an object of type T as an argument and return an S as a result. But still, it’s all specifications, and what’s going on inside should be invisible.
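A toy rendering of that idea in Java (all names invented for illustration): a whole subsystem presented as one object of a "bigger type," whose specification mentions other types but hides everything else.

```java
// The types that show up in the module's specification.
record Order(String item, int quantity) {}
record Receipt(String confirmation) {}

// The whole subsystem, seen from outside, is just an object of this type:
// clients get the interface and its specification; the implementation,
// possibly spread across many machines, stays invisible.
interface OrderingSubsystem {
    /** Places the order and returns a receipt; takes a T, returns an S. */
    Receipt place(Order order);
}
```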
Well yes; a different way of saying it is that you would like to protect the implementation of the module from misbehavior by its users, so that other users who aren’t misbehaving can actually rely on what’s going on.
I’m not actually familiar with those; I’ve heard the names, but I don’t really know what they are, so I’m not sure how to answer. I really do believe that functions should be first-class, so to that extent, passing modules as parameters seems OK. But modules to me are just a way of grouping a whole bunch of code, so they don’t seem to correspond so much to abstractions. I think the currency of our programs is actually abstractions, and modules are just a way of putting the code together; at least that’s my understanding of what we’re talking about when we’re talking about modules.
It seems like this is a lesson that has to be learned over and over again. I feel like we talked about it 40 years ago, we talked about it 30 years ago; Perl, after all, has been around for quite a long time. But it just happens over and over again, and it seems to be hard for people to keep in mind that readability matters more than writability, and that cleverness on the part of the compiler is not necessarily a good thing. It may be technically very interesting, but that doesn’t mean it’s a good thing from the point of view of the code that results.
I think you shouldn’t have to think about what the compiler should do. In other words, you should be able to read a manual for the language, understand the language and read the code based on that understanding, without having to think about the fancy stuff that the compiler is doing.
I’ve mostly been working in distributed computing for the last 20 years, and distributed computing is a bit different from what we’re encountering on multi-core machines, because in the distributed world there is no shared state. Every computer has its own state; if you want to share, you have to explicitly pass stuff back and forth. If you are working on a multi-core machine, there is a shared memory, and so it isn’t clear that the paradigms we use for thinking about distributed systems are going to carry over directly to multi-core machines. You can think of multi-core machines as being distributed systems, but it’s not clear that’s the right way to think about them, because then you’re not taking advantage of the shared memory, which can have a big impact on performance. So I think we’re right now learning how to use those machines, and where we’ll end up is not so clear.
Well, the interesting thing is I actually don’t think eventual consistency is the right model for distributed systems, because the problem is that it doesn’t give you good semantics. It works fine for certain kinds of applications: it’s fine for Facebook, it’s fine when you’re building your shopping cart; it’s not fine when you’re paying your money. So it isn’t a model that’s adequate for all applications, and if you try to use it as the basis for programming those applications, you end up reinventing how to build real consistency on top of eventual consistency. Now, in the case of a parallel machine, I absolutely do not understand right now what the right memory model ought to be, because if you use sequential consistency, which is a very strong memory model, there’s a big loss in performance. So it’s actually something I’m thinking about right now: how far down the road can you go with using weaker semantics without ending up having a lot of trouble building your applications at the next level up?
Alex's full question: Is there any issue that databases really enforce some state, rather than a transition between states, and that if you could encode a transition and compose those transitions, then perhaps it wouldn’t matter in which order you do them? You used bank accounts as your example. If your bank account could apply two transactions in either order, then could there be some reasoning about the programs where you’re looking at the deltas, if you like, the changes rather than the absolute values, and then apply them?
Well, first of all, if you have operations that are commutative, so it doesn’t matter what order they’re in, then this gives you a lot more flexibility in how you implement things. But many operations are not commutative, and in that case the order in which they happen matters. Now, whether you capture that by means of a log, and you run the log later to achieve the state, or whether you operate on them right away and produce the state, I don’t think that matters. What matters is the semantics, and if the semantics depends on an order, you have to have a way of enforcing that order, one way or the other.
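A minimal sketch of the bank-account case (the class is invented for the example): deposits commute with each other, so their order is irrelevant, but a balance-checking withdrawal does not commute with a deposit, so that order has to be enforced one way or another.

```java
public class Account {
    private long balance;

    public Account(long initial) { balance = initial; }

    // Commutative: deposits applied in any order yield the same balance.
    public void deposit(long amount) { balance += amount; }

    // Not commutative with deposit: whether it succeeds depends on
    // which operations have already been applied.
    public boolean withdraw(long amount) {
        if (amount > balance) return false;   // insufficient funds
        balance -= amount;
        return true;
    }

    public static void main(String[] args) {
        Account a = new Account(0);
        System.out.println(a.withdraw(100));  // false: runs before the deposit
        a.deposit(100);                       // swapping these two lines flips the result
        System.out.println(a.withdraw(100));  // true
    }
}
```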
So if I map this into what I’ve been doing in distributed systems: we think of partitioning the database, so that you say “that table is over there, that table is over there,” and everything works beautifully as long as your transactions only use one of those tables. But you always have to take account of the fact that some transactions will use multiple tables, and then you have to have an answer for how that works. So it’s really a kind of performance game that they’re playing: if you mostly have transactions that work on just one partition, everything works beautifully, and if it’s just once in a while that you have to have multiple sites involved, then you pay the price for that when it happens. So it depends on whether the data allows that kind of distribution or not.
Alex: And then presumably you have to include some sort of third-party observer or entity or transaction system when you have the different systems involved.
Yes, absolutely, you have to have transactions to really get true consistency.
Well, I think the work on distributed computing has been all over the map. There certainly has been work looking at whether we could off-load some of the processing onto the client machines, and this depends on what the client machines are, whether they’re available for other stuff, and so forth. There has certainly also been a lot of work on disconnected operation: if you start to think about people putting stuff on their cell phones, you have to worry that if they’re storing something essential, nobody else will be able to get hold of it for a while. So I think the model is very flexible, and I think there’s been a lot of thought about these kinds of issues over the past 20 years. What’s been changing is the technology, so every time the technology changes, you have to rethink things to see whether something that didn’t make sense before makes sense now with the new technology.
Alex's full question: Some time ago, there were discussions of agent-based processes, where you could have an agent go out on your behalf and execute on one machine, then move to a different machine and execute there to get data, and then come back to you. Has there been much progress in that as a viewpoint, or do we think that the agents themselves almost violate modularity, in that they need to have internal knowledge of how other systems work?
Well, they could be visiting and still using stuff at an interface level, so you could imagine an agent that sort of flows around the world, and as long as it runs at an abstract level, it’s not going to be violating modularity. Whether it’s better than “here I am on my client machine, and first I go there, and then I go there, and then I go there,” that’s less clear. Sometimes it doesn’t seem to really matter; it’s more like an implementation detail, you just change the implementation a little bit, but it’s basically the same thing.
Alex: I think the real key in those things, as you say, is the interfaces, and how you know where those services are and how you talk to them; so in some sense, service discoverability comes into that as well.
Absolutely, but you always need that when you’re dealing with a distributed system, and typically you always put some code onto the client machines that’s able to help figure out where stuff is.
Barbara Liskov, thank you very much.
You’re welcome!