Generally AI - Season 2 - Episode 4: Coordinate Systems in AI and the Physical World

Oct 23, 2024

Podcast with

Anthony Alford
Senior Director, Development at Genesys Cloud Services
Roland Meertens
ML engineer building self-driving cars at Wayve

In this podcast, Roland Meertens and Anthony Alford discuss coordinate systems, both in AI and the physical world. They explore how a library's classification system mirrors the concept of embeddings in AI, where documents are organized by similarity, and how AI tools like RAG use vector spaces to efficiently retrieve the right content.

Key Takeaways

Retrieval-Augmented Generation (RAG) helps large language models (LLMs) access external content efficiently by finding relevant information using vector similarity.
Cosine similarity measures how close two embeddings are, and nearest neighbor search (especially approximate nearest neighbor search) makes retrieval efficient, which matters most in high-dimensional vector spaces.
Different map projections, like Mercator and Mollweide, balance the trade-off between preserving size, angles, or distances, each serving specific purposes such as navigation or area accuracy.
Historical coordinate systems, like the UK's Ordnance Survey and the Netherlands' triangle system, have technical inconsistencies that can still cause confusion when overlaid with modern GPS.
Legacy mapping systems are often retained to preserve land ownership records and avoid re-measuring boundaries, leading to interesting technical adaptations, such as correction grids in the Netherlands.

Transcript

Roland Meertens: Anthony, did you ever have to program a turtle robot when you were learning to program?

Anthony Alford: I've never programmed a turtle robot, no.

Roland Meertens: Okay, so I had to do this when I was learning Java. In robotics, the concept of a turtle bot is usually that you have some kind of robot you can move across the screen, and it has a pen, so it leaves a trace. So you can start programming: go forward by one meter, turn right by 90 degrees, go forward by one meter, turn right by 90 degrees, and that way you trace a pen over a virtual canvas.
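Under the hood, a turtle is just a position and a heading in a 2D coordinate system. The following is a minimal Python sketch of that state, assuming the heading is measured in degrees counterclockwise from the positive x-axis; the `Turtle` class is hypothetical and is not the API of any particular robot or of Python's built-in `turtle` module.

```python
import math


class Turtle:
    """A pen-carrying turtle: a position (x, y) plus a heading in degrees."""

    def __init__(self) -> None:
        self.x = 0.0
        self.y = 0.0
        self.heading = 90.0  # start pointing "up" (positive y)
        self.trace = [(self.x, self.y)]  # every point the pen has visited

    def forward(self, distance: float) -> None:
        # Move along the current heading and record the new point.
        rad = math.radians(self.heading)
        self.x += distance * math.cos(rad)
        self.y += distance * math.sin(rad)
        self.trace.append((self.x, self.y))

    def right(self, degrees: float) -> None:
        # Turning right is a clockwise rotation, i.e. the heading decreases.
        self.heading = (self.heading - degrees) % 360


# Trace a 1 m square: forward 1, turn right 90 degrees, four times.
t = Turtle()
for _ in range(4):
    t.forward(1.0)
    t.right(90)
print([(round(x, 6), round(y, 6)) for x, y in t.trace])
```

After the four sides, the turtle is back at its starting point (up to floating-point rounding), which is a handy first test when programming one.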

Anthony Alford: The Logo language was based on that, right?

Roland Meertens: Yes, indeed. So the history is that the computer scientist Seymour Papert co-created the programming language Logo in 1967, and apparently (this is something I didn't know) they used this language to direct a big robot with a pen in the middle, which would let you make drawings on actual paper.

Anthony Alford: Okay.

Roland Meertens: It's pretty cool, right?

Anthony Alford: It's a bit of a plotter, a printer.

Roland Meertens: So apparently in 1967 people learned to program with a physical moving plotter. They immediately started with a robot-

Anthony Alford: That's pretty cool.

Roland Meertens: Yes, instead of using a virtual canvas. It was round and crawled like a turtle. But the other thing I found is that turtle robots were first mentioned in the 1940s: they were invented by Grey Walter, who used analog circuits as brains, and his more advanced model could already go back to a docking station when its battery ran low.

Anthony Alford: That's pretty cool. In the '40s.

Roland Meertens: In the 1940s, yes. I will put the video in the show notes, and I will also put two articles in the show notes: one on the history of turtle robots, which someone wrote for Weekly Robotics, and another on the history of turtle robots as a programming paradigm.

Anthony Alford: Very cool.

Roland Meertens: Yes.

Anthony Alford: Slow and steady.

Roland Meertens: Yes, slow and steady, and a great way to get started with programming.

At the Library [02:15]

Roland Meertens: All right, welcome to Generally AI, Season 2, Episode 4, and in this InfoQ Podcast, I, Roland Meertens, will be discussing coordinate systems with Anthony Alford.

Anthony Alford: How's it going, Roland?

Roland Meertens: Doing well. Do you want to get started with your coordinate system research?

Anthony Alford: Let's go for it. So I decided to go with an AI theme of coordinates and perhaps you can guess where I'm going. We'll see.

Roland Meertens: Tell me more.

Anthony Alford: Well, in the olden days, a teacher, for example, in a history class would often ask me and other students to write a paper about some topic. So let's say the topic is the Great Pyramid of Egypt. Now probably most students don't know everything about the Great Pyramid and the teacher says anyway, "You have to cite sources", so you can't just write anything you want.

Roland Meertens: I always hate this part. Yes, I always say, "I found this on the internet. These people can't lie".

Anthony Alford: Well, I'm talking about the days before the internet. But in the 20th century, let's say, we would go to the library, an actual physical building, and there would be a big drawer full of small cards, the card catalog. These are in alphabetical order, and so we'd scroll through till we get to the Ps and then P-Y, pyramid. Great. Pyramid of Egypt, right?

Roland Meertens: Yes.

Anthony Alford: This card has a number on it. This is the call number for books that are about the Great Pyramid of Egypt. In the US, a lot of libraries use a catalog system called the Dewey Decimal system for nonfiction books. It's a hierarchical classification system.

Books about history and geography in general, they have a call number in the range from 900 to 999. Within that, books about ancient history are in the range of 930 to 939. Books about Ancient Egypt specifically have call numbers that begin with the number 932. And then depending on what ancient Egyptian topic, there will be further numbers after the decimal point.
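The ranges described here form a hierarchy that you can walk from a call number. A minimal Python sketch that knows only the three levels mentioned in this conversation (the labels paraphrase the conversation, not the official Dewey captions, and real Dewey tables are far larger):

```python
# Only the levels mentioned in the conversation, broadest first.
DEWEY_LEVELS = [
    ((900, 999), "History and geography"),
    ((930, 939), "Ancient history"),
    ((932, 932), "Ancient Egypt"),
]


def classify(call_number: float) -> list[str]:
    """Return the path from the broadest to the narrowest matching class."""
    integer_part = int(call_number)  # digits after the decimal point refine further
    return [label for (low, high), label in DEWEY_LEVELS if low <= integer_part <= high]


print(classify(932.35))
```

Each step down the hierarchy narrows the range, which is exactly what lets the stacks be laid out as rows, cabinets, and shelves.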

Roland Meertens: And maybe a weird question, but were you allowed to go through these cards yourself or did you ask someone else like, where can I find information about Ancient Egypt?

Anthony Alford: Both methods do work. If you're young and adventurous, perhaps you'll go to the card catalog and start rifling through. But yes, in fact, a lot of libraries had a person whose job was to answer questions like that: the reference librarian.

Roland Meertens: Yes, because I'm too young, I never saw these cards. My librarians would already have a computer they would use to search.

Anthony Alford: Right. But the point there is that the card catalog, speaking of search, is an index. It maps keywords like "Great Pyramid of Egypt" to a call number, or maybe to multiple call numbers.
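In information-retrieval terms, a card catalog like this is an inverted index from keywords to call numbers. A toy Python version follows; the titles and call numbers are invented for illustration and only borrow the 932 range from the conversation.

```python
from collections import defaultdict

# Hypothetical catalog cards: (title, call number). Titles and numbers are invented.
cards = [
    ("The Great Pyramid of Egypt", 932.35),
    ("Daily Life in Ancient Egypt", 932.01),
    ("Pharaohs of Egypt", 932.02),
]

# Build the index: every keyword on a card points to that card's call number.
index: dict[str, set[float]] = defaultdict(set)
for title, call_number in cards:
    for word in title.lower().split():
        index[word].add(call_number)


def lookup(keyword: str) -> list[float]:
    """Return the call numbers filed under a keyword, in shelf order."""
    return sorted(index.get(keyword.lower(), set()))


print(lookup("Egypt"))
print(lookup("pyramid"))
```

Note that `lookup("pyramids")` would find nothing here: like the physical catalog, a naive index only matches the exact keyword, which is why you might try "pyramid", "great", and "Egypt" in turn.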

Actually, university libraries, in my experience in the US, they don't use Dewey Decimal, they use a different classification, but the idea is the same. Anyway, it's a hierarchy and it assigns a call number to each book.

So to go actually get the physical book, it's hopefully on a shelf that's in a cabinet. We call these stacks. That's the lingo. So the classification hierarchy is itself mapped physically to these stacks. There will be a row of cabinets for the books that are in the 900 to 999 range and maybe one cabinet for the 930 to 939, and then maybe one shelf for 932 and so on. Now that I think of it, this structure is itself somewhat like a pyramid.

Roland Meertens: Perfect example.

Anthony Alford: Hopefully if nobody's messed with them, the physical order of the books matches the numeric order. So you're doing an index scan or index search if we're thinking about it in terms of a database or information retrieval. Because that's what this is, it's literally information.

Roland Meertens: Yes. And it is good that it's indexed by topic because otherwise you don't know if you're searching for P for pyramids or G for great pyramids or E for Egyptian great pyramids.

Anthony Alford: Right. If you're not talking to the reference librarian, you might try all those keyword searches in the index. So now that I've got a couple of books, I can use that content in those books to help me produce my essay about the Great Pyramid.

Now that was the bad old days of the 20th century. Here in the 21st century, it's like you said: you do an internet search or maybe you read Wikipedia. That's just the first quarter of the 21st century. Now we're into the second quarter of the 21st century, and we're in a golden age of AI. We don't have to even do that. We just go to ChatGPT and copy and paste the assignment from the syllabus web page as a prompt and ChatGPT writes the essay.

Roland Meertens: Quite nice, quite neat.

RAG Time [07:35]

Anthony Alford: Well, in theory. But there are a couple of problems. First, the teacher said, "Cite your sources," and you have to do that: wherever you quote something or refer to something, you need to put a reference in the content. Another thing is that ChatGPT is good, but maybe it's not always a hundred percent historically accurate.

Roland Meertens: Yes, it sometimes makes up things.

Anthony Alford: And it really only knows things that are in its training data, which is large, but there may be some really good books or papers that are not on the internet and so might not be in that training data. So I think you know what the answer is.

Roland Meertens: Are we going to retrieve some data before we're processing it?

Anthony Alford: Yes, it is RAG time. The key technology now is retrieval-augmented generation, also known as RAG, R-A-G. The general idea: we know that if we give an LLM some text along with our prompt, like the content of a history book, LLMs are quite good at answering questions about that content or creating summaries of it.

Now, set aside the problem of limited context length, which is a real problem. The other problem is: how do you know what content to provide?

Roland Meertens: Yes, you can't give it the entire stack of books.

Anthony Alford: Exactly. And even if you had the content electronically, and you had picked it out, you want to automate this, right? You don't want to have to go hunt down the content to give to the LLM.

So finding the right history book, the right content in an electronic database of content, well, we already said it. This is information retrieval. And again, in the old days we'd use natural intelligence: we would use the reference librarian or go look up some keywords in the card catalog.

Roland Meertens: It is too bad that the librarian is not very scalable.

Anthony Alford: Exactly right. We want to automate this and scale it. So let's take an analogy. The key idea of RAG is: take your LLM prompt and automatically assign it a call number. You go directly from your prompt, your instructions for writing the essay, to a call number, and then you automatically go get those books and add them to your prompt.

Roland Meertens: Sounds pretty good.

Anthony Alford: Yes, more precisely: we have an encoder that can map an arbitrary blob of text into a point in some vector space with the requirement that points that are near each other in this space represent texts that have similar meanings. So typically we call this an embedding.

So we take an encoder, we apply the encoder to the prompt that turns that into a vector. Then we have all of our books in the universe, we have encoders applied to them, and we get vectors for them. We find the vectors that are close to the vector for our prompt. So easy-peasy, right?
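The retrieval steps just described (encode every document once, encode the prompt, pick the closest vectors, prepend them to the prompt) can be sketched in a few lines of Python. This is only a toy: `toy_encode` is a hypothetical stand-in for a real encoder such as BERT. It builds a hashed bag-of-words vector, so texts that share words land near each other, but unlike a trained encoder it does not capture meaning.

```python
import math
import zlib

DIM = 256  # toy dimension; BERT-style encoders produce 768


def toy_encode(text: str) -> list[float]:
    """Hypothetical encoder: hashed bag of words, scaled to unit length."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % DIM] += 1.0  # deterministic hash bucket
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# One-time job: encode every document in the "library".
library = [
    "the great pyramid of egypt was built for the pharaoh khufu",
    "the mercator projection preserves angles but distorts area",
    "logo was a programming language for turtle robots",
]
library_vectors = [toy_encode(doc) for doc in library]


def retrieve(prompt: str, k: int = 1) -> list[str]:
    """Encode the prompt and return the k documents whose vectors are closest."""
    q = toy_encode(prompt)
    ranked = sorted(range(len(library)), key=lambda i: dot(q, library_vectors[i]), reverse=True)
    return [library[i] for i in ranked[:k]]


prompt = "write an essay about the great pyramid of egypt"
context = retrieve(prompt)
augmented_prompt = "Context:\n" + "\n".join(context) + "\n\nTask: " + prompt
print(augmented_prompt)
```

The augmented prompt, not the bare one, is what would be sent to the LLM. In a real system the encoder, the vector store, and the prompt template would all be swapped for production components.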

Roland Meertens: Easy-peasy.

Anthony Alford: Right. Well, here's the problem. So the encoder-

Roland Meertens: Encoding your data.

Anthony Alford: Well, there's that. I'm just going to assume somebody encoded all the books in the library; that's a one-time job. The problem is that people usually use BERT as the encoder, and the embedding vector you get is 768-dimensional. So the question is: what does it mean to be near something in a 768-dimensional space?

Roland Meertens: Yes, that depends on what distance function you want to use.

Anthony Alford: That's exactly right. With call numbers, it was easy because they’re scalars. So the distance function is: subtract.

Roland Meertens: Oh, it's quite interesting. I never even realized that call numbers could be subtracted.

Anthony Alford: Well, that's how you do it, right? If you go to find your book 932.35, you probably don't do a scan. You probably do some kind of bisecting search, or you know you need to go over to the 900s, and then you jump to the middle of the 900s and scan back and forth depending on the number that you're at.
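Anthony's bisecting search over the shelves is a binary search on a sorted list, and because call numbers are scalars, the nearest book is always next to the point where your number would be inserted. A minimal sketch using Python's standard `bisect` module, with hypothetical shelf contents:

```python
import bisect
from typing import Optional

# Hypothetical shelf, kept in call-number order (as the stacks are).
shelf = [930.1, 931.04, 932.01, 932.35, 932.7, 935.2, 938.0]


def find_book(call_number: float) -> Optional[int]:
    """Binary search: O(log n) comparisons instead of scanning every spine."""
    i = bisect.bisect_left(shelf, call_number)
    if i < len(shelf) and shelf[i] == call_number:
        return i
    return None


def nearest_book(call_number: float) -> float:
    """1-D nearest neighbor: distance is just subtraction, so only two candidates matter."""
    i = bisect.bisect_left(shelf, call_number)
    candidates = shelf[max(i - 1, 0): i + 1]  # the books on either side of the insertion point
    return min(candidates, key=lambda c: abs(c - call_number))


print(find_book(932.35), nearest_book(932.3))
```

This one-dimensional shortcut is exactly what stops working for 768-dimensional embeddings, where there is no single sort order that keeps all nearby vectors adjacent.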

Roland Meertens: And also for the library, it of course makes sense that they put books which are similar close together.

Anthony Alford: Yes, well, you physically store them in order of their call number.

Roland Meertens: Yes.

Cosine Similarity [12:04]

Anthony Alford: More or less. Anyway, like you said, the closer this distance is to zero, that is, the closer the two call numbers are, the closer the books are physically.

So anyway, we need a distance function, or the opposite of a distance function, which is a similarity: the smaller the distance, the more similar. In the case of these embeddings, people typically use a similarity measure called cosine similarity. Now, if you've ever worked with vectors, you probably remember the inner product, sometimes called the dot product.

To explain this without a whiteboard, let's say we're in 3D space. So each vector has X, Y, and Z. The dot product of two vectors is you take the X from the first one, multiply by the X from the second one. Then you do that for the two Y components, and then the two Z components, you add those all up. That's the dot product. And that's a single number, a scalar.
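The component-by-component recipe above, written out in Python for 3D vectors:

```python
def dot(a: tuple[float, float, float], b: tuple[float, float, float]) -> float:
    """Multiply matching components (x with x, y with y, z with z) and add them up."""
    ax, ay, az = a
    bx, by, bz = b
    return ax * bx + ay * by + az * bz


print(dot((1.0, 2.0, 3.0), (4.0, 5.0, 6.0)))  # 1*4 + 2*5 + 3*6 = 32.0
```

The same sum extends unchanged to 768 components; only the number of terms grows.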

Roland Meertens: Yes.

Anthony Alford: The geometric interpretation of the dot product is: it's the length of the first one times the length of the second, and then times the cosine of the angle between them. So you could divide the dot product by the length of the two vectors, and what you're left with is the cosine of the angle. And if they're pointing in the same direction, that means the angle is zero and the cosine is 1. If they point in the opposite directions, the cosine is -1. And in between there, if it's zero, they're at right angles.
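Dividing the dot product by the two lengths gives the cosine similarity. A short Python sketch that checks the three cases just listed (same direction, opposite directions, right angles); the results are rounded only to hide floating-point noise:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|): the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(round(cosine_similarity([1, 2, 3], [2, 4, 6]), 6))   # same direction, different lengths
print(round(cosine_similarity([1, 0, 0], [-3, 0, 0]), 6))  # opposite directions
print(round(cosine_similarity([1, 0, 0], [0, 5, 0]), 6))   # right angles
```

The first pair shows the key property: doubling a vector's length does not change its cosine similarity to anything.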

Roland Meertens: Yes, intuitively, you always think that it doesn't really matter what the magnitude of the interest is, as long as the interests are at least in the same direction, it is probably fine in your library.

Anthony Alford: Yes, and I'm going to explain why. Anyway, the cosine similarity is a number between -1 and +1. And the closer that is to +1, the nearer the two embeddings are for our purposes. And you may wonder why cosine similarity. So again, with 3D space, X, Y, and Z, there's a distance called the Euclidean distance, which is our normal "The distance between two points is a straight line", right?

Roland Meertens: Yes.

Anthony Alford: So you basically take the differences between the Xs, the Ys, and the Zs, square each of them, add them up, and take the square root.
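The Euclidean distance as code. It is also the length (magnitude) of the difference vector between the two points, which is the vector view discussed in the conversation.

```python
import math


def euclidean_distance(p: tuple[float, ...], q: tuple[float, ...]) -> float:
    """Square the per-axis differences, add them up, take the square root."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


print(euclidean_distance((0.0, 0.0, 0.0), (1.0, 2.0, 2.0)))  # sqrt(1 + 4 + 4) = 3.0
# The standard library offers the same computation as math.dist (Python 3.8+).
print(math.dist((0.0, 0.0, 0.0), (1.0, 2.0, 2.0)))
```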

Roland Meertens: As long as we are in a Euclidean space, that's the case.

Anthony Alford: And in vector terms, that's just the magnitude of the vector drawn between those two points. If you wonder why you don't use that, and instead use cosine similarity, Wikipedia will tell you it's because of something called the curse of dimensionality.

Basically, in these really high-dimensional spaces, even if you spread the points uniformly, they don't end up where you'd expect. The middle of the space and the corners of the space are empty-ish, and most of the points are actually concentrated near the surface of a sphere in the space.

So when all the points are on a sphere, their magnitudes are more or less all the same, so the magnitude doesn't tell them apart. What makes them different points is their different angles relative to some reference. That means we don't care about the magnitude of vectors in the space, we care about the direction, and that's why we use cosine similarity.
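You can see this concentration effect numerically. The sketch below assumes points drawn from a standard Gaussian distribution (a common stand-in for "spread-out" points; real embeddings are not necessarily distributed this way) and measures how much vector lengths vary relative to their average length in 3 versus 768 dimensions.

```python
import math
import random
import statistics

random.seed(0)


def relative_spread_of_norms(dim: int, n_points: int = 500) -> float:
    """Standard deviation of vector lengths divided by their mean length."""
    norms = [
        math.sqrt(sum(random.gauss(0.0, 1.0) ** 2 for _ in range(dim)))
        for _ in range(n_points)
    ]
    return statistics.pstdev(norms) / statistics.mean(norms)


spread = {dim: relative_spread_of_norms(dim) for dim in (3, 768)}
print(spread)
```

For this distribution the average length grows roughly like the square root of the dimension while the absolute spread of lengths stays roughly constant, so in 768 dimensions nearly all points sit in a thin shell and only their directions distinguish them.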

Roland Meertens: Is there any reason that the magnitude of the vectors tends to be the same?

Anthony Alford: It's just the way that these sparse high-dimensional spaces…it's just how the math works out. And in fact, because the magnitudes are all more or less the same, or at least very…you can take a shortcut: you can just use the dot product instead of computing the full cosine similarity. That's a nice shortcut because GPUs are very good at calculating dot products.
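The shortcut becomes exact if you normalize every embedding to unit length once, when it is stored: then the plain dot product equals the cosine similarity. A small check with made-up vectors:

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to length 1 (done once, when the embedding is stored)."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


a, b = [3.0, 4.0, 0.0], [1.0, 1.0, 1.0]
cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
print(cosine, dot(normalize(a), normalize(b)))  # equal up to floating-point rounding
```

Normalizing up front moves the division out of the query path, leaving only multiply-and-add work for each comparison.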

So let's back up. We take our prompt, we encode it, and we've already encoded the content of the whole library. We just find the vectors in the library that have the largest dot product with our prompt vector. In the original RAG paper they did that; it's called maximum inner product search. Basically, you take your query's vector, compute its dot product with the vector of every document, and take the ones with the biggest values.
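A brute-force maximum inner product search is a single pass over every document vector. This sketch uses random, hypothetical 768-dimensional embeddings (not real BERT output) and the standard library's `heapq.nlargest`. The point is the cost: one full dot product per document, for every query.

```python
import heapq
import random

random.seed(42)
DIM, N_DOCS = 768, 1000

# Hypothetical pre-computed document embeddings (random numbers, not real encoder output).
doc_vectors = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N_DOCS)]


def mips(query: list[float], k: int = 3) -> list[int]:
    """Return the indices of the k documents with the largest inner product with query.

    Every call computes N_DOCS dot products: linear in the size of the library.
    """
    scores = ((sum(q * d for q, d in zip(query, doc)), i) for i, doc in enumerate(doc_vectors))
    return [i for _, i in heapq.nlargest(k, scores)]


# A query that is a slightly perturbed copy of document 7 should retrieve it first.
query = [x + random.gauss(0.0, 0.1) for x in doc_vectors[7]]
print(mips(query))
```

In practice this scan is done as one matrix-vector product on a GPU, which is fast, but the work still grows linearly with the number of documents.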

What's the problem now? I bet you know.

Roland Meertens: What is the problem?

Anthony Alford: Well, the problem is that every time you have a new prompt, you have to go and calculate the dot product against every document.

Roland Meertens: If only there was a better way to store your data.

Who Is My Neighbor? [16:50]

Anthony Alford: Well, it turns out there's a better way to search. The default way has linear complexity. For a small library that may be no big deal, but if we're talking about every book ever written, compare it with index search in a database, which has complexity around log(n). Linear is way worse. It's terrible. So again, it turns out this is a well-studied problem, and it's called nearest neighbor search.

Roland Meertens: Approximate nearest neighbors or exact nearest neighbors?

Anthony Alford: Well, one is a subset of the other. If you go back to the database search, that's log(n), and you can actually use a tree structure for nearest neighbor search too: something called a space-partitioning tree, searched with a branch-and-bound algorithm. With this strategy you're not guaranteed log(n), but the average complexity is log(n). This usually works better in lower-dimensional spaces, though.
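Here is a compact sketch of one such space-partitioning tree, a k-d tree, with a branch-and-bound nearest-neighbor query, checked against brute force. It is a teaching version (median splits, no rebalancing), and as noted in the conversation it works best in low dimensions: in hundreds of dimensions the pruning test rarely succeeds and the search degrades toward a full scan.

```python
import math
import random
from typing import NamedTuple, Optional

Point = tuple[float, ...]


class Node(NamedTuple):
    point: Point
    axis: int
    left: Optional["Node"]
    right: Optional["Node"]


def build(points: list[Point], depth: int = 0) -> Optional[Node]:
    """Split on one axis per level, using the median point as the pivot."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis, build(points[:mid], depth + 1), build(points[mid + 1:], depth + 1))


def nearest(node: Optional[Node], target: Point,
            best: Optional[tuple[Point, float]] = None) -> Optional[tuple[Point, float]]:
    """Branch and bound: search the near side first, skip the far side when it cannot win."""
    if node is None:
        return best
    dist = math.dist(node.point, target)
    if best is None or dist < best[1]:
        best = (node.point, dist)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Bound: any point on the far side is at least |diff| away from the target.
    if abs(diff) < best[1]:
        best = nearest(far, target, best)
    return best


random.seed(1)
points = [(random.random(), random.random()) for _ in range(500)]
tree = build(points)
query = (0.5, 0.5)
found, dist = nearest(tree, query)
brute = min(points, key=lambda p: math.dist(p, query))
print(found == brute, round(dist, 4))
```

The "bound" step is where the average-case log(n) comes from: whole subtrees are skipped whenever the splitting plane is already farther away than the best match found so far.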

Roland Meertens: Okay, so why is it on average? Do you keep searching or-

Anthony Alford: Well, I think it is just like you're not guaranteed, but based on the statistics, you can mathematically show that on average you get a log(n) complexity. But remember your favorite algorithm-

Roland Meertens: My favorite algorithm.

Anthony Alford: What was your favorite algorithm?

Roland Meertens: HyperLogLog.

Anthony Alford: Right. So, you already said it: approximate nearest neighbor. When you want to do things at scale, you approximate. It turns out that a lot of RAG applications use an approximate nearest neighbor search, or ANN, which also stands for "artificial neural network", but that's just a coincidence.

So there are several algorithms for ANN and they have different trade-offs between speed and quality and memory usage. Now, quality here is some kind of metric like recall. So with information retrieval, you want to get a high recall, which means that of all the relevant results that exist, your query gives you a high percentage of those.
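Recall, as defined here, is straightforward to compute once you know the full set of relevant results; for ANN benchmarks that set is typically the exact nearest neighbors found by brute force. A minimal sketch with hypothetical document IDs:

```python
def recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of all relevant items that the query actually returned."""
    if not relevant:
        return 1.0  # nothing to find, so nothing was missed
    return len(set(retrieved) & relevant) / len(relevant)


exact_neighbors = {"doc3", "doc8", "doc11", "doc42"}  # e.g. from brute-force search
ann_result = ["doc3", "doc11", "doc42", "doc99"]      # hypothetical ANN output
print(recall(ann_result, exact_neighbors))  # 3 of the 4 true neighbors: 0.75
```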

One of the popular algorithms lately for ANN is called hierarchical navigable small world, or HNSW. HNSW is a graph-based approach that's used in a lot of vector databases. I actually wrote an InfoQ news piece about Spotify's ANN library, which uses HNSW.

Roland Meertens: Oh, is it Voyager?

Anthony Alford: That's correct, yes. You must have read it.

Roland Meertens: Oh, I tried it. It's pretty cool.

Anthony Alford: Oh, okay. Well, you know all about this stuff.

Roland Meertens: Oh, I love vector searching.

Anthony Alford: So I found a nice tutorial about HNSW, which we'll put in the show notes, and it gives a very nice, concise definition:

Small world, referring to a unique graph with a low average shortest path length and a high clustering coefficient; navigable, referring to the search complexity of the subgraphs which achieve logarithmic scaling using a decentralized greedy search algorithm; and hierarchical, referring to stacked subgraphs of exponentially decaying density.
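The "decentralized greedy search" part of that definition is simple to sketch: start at an entry node and keep hopping to whichever neighbor is closest to the query until no neighbor improves. The sketch below runs it on a single-layer graph in which each node is linked to its few nearest neighbors. It is a simplification, not HNSW itself (no stacked layers, no candidate list, no insertion heuristics), and plain greedy search on such a graph can stop at a local minimum.

```python
import math
import random

random.seed(3)
points = [(random.random(), random.random()) for _ in range(300)]


def knn_graph(pts: list[tuple[float, float]], m: int = 6) -> list[list[int]]:
    """Link every node to its m nearest neighbors (built by brute force, for simplicity)."""
    graph = []
    for i, p in enumerate(pts):
        others = sorted((j for j in range(len(pts)) if j != i), key=lambda j: math.dist(p, pts[j]))
        graph.append(others[:m])
    return graph


def greedy_search(graph: list[list[int]], pts: list[tuple[float, float]],
                  query: tuple[float, float], entry: int = 0) -> int:
    """Hop to the closest neighbor until no neighbor is closer to the query."""
    current = entry
    while True:
        best = min(graph[current], key=lambda j: math.dist(pts[j], query))
        if math.dist(pts[best], query) >= math.dist(pts[current], query):
            return current  # no neighbor improves: a (possibly local) minimum
        current = best


graph = knn_graph(points)
query = (0.9, 0.1)
found = greedy_search(graph, points, query)
exact = min(range(len(points)), key=lambda j: math.dist(points[j], query))
print(found, exact)
```

HNSW's upper layers, with their long-range links, exist largely to give this greedy walk a good starting point so that it reaches the right region in few hops.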

So all of this to find out: who is my neighbor?

Roland Meertens: Who is your neighbor, and in which space are they your neighbor?

Anthony Alford: Yes. So I think I've filled up my context window for today. And for homework, I will let our listeners work out analogies between this topic and library stacks and pyramids.

Roland Meertens: For library stacks, I'm just hearing that they could have multiple boxes with the stacks and you just move from box to box, from room to room.

Anthony Alford: So here's a very interesting thing. Here in my hometown there's a university, North Carolina State University, whose engineering library has a robot that will go and get books out of the stacks. It's basically an XYZ robot: it moves around and gets books out of the stacks for you.

Roland Meertens: Oh, nice. That's pretty cool.

Anthony Alford: Yes, it looks really cool.

Roland Meertens: Always add an extra dimension; then you can represent way more knowledge.

Anthony Alford: So that's my fun fact.

Roland Meertens: That's a pretty good fun fact.

Real World Coordinate Systems [22:02]

Roland Meertens: All right. For my fun fact today, since the topic is coordinate systems: in software there are many ways to represent a location on a map. This can be important for your user data, for example for helping people find where they are or finding interesting locations close by, and the most popular format here is WGS84.
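For "finding interesting locations close by" with WGS84 latitude and longitude, a common first approximation is the great-circle (haversine) distance. This sketch treats the Earth as a sphere with a mean radius of 6,371 km, which is an assumption: WGS84 itself is an ellipsoid, so precise work uses ellipsoidal methods such as Vincenty's or Karney's. The place coordinates below are approximate.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean radius; a spherical simplification of WGS84


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def nearby(places: dict[str, tuple[float, float]], here: tuple[float, float],
           radius_m: float) -> list[str]:
    """Names of places within radius_m of here, closest first."""
    dists = {name: haversine_m(*here, *latlon) for name, latlon in places.items()}
    return sorted((n for n, d in dists.items() if d <= radius_m), key=dists.get)


places = {  # approximate coordinates
    "Amersfoort": (52.156, 5.388),
    "Utrecht": (52.091, 5.122),
    "Paris": (48.857, 2.352),
}
print(nearby(places, places["Amersfoort"], 50_000))
```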

But what I wanted to dive into is the history of coordinate systems, especially how different countries chose them, and some of the legacy systems that are still in place. The history of coordinate systems is of course much longer than the history of computers: people have wanted to know who owned what land, and how to get somewhere, for a very long time.

First of all, there are different ways to project a map. You want a map in 2D, and our Earth is (roughly) a sphere. You can project the sphere onto a cylinder, onto a cone, or onto a flat plane touching the sphere, and you always end up with some kind of compromise.

So you can choose to keep the angles of the map accurate. An example is the Mercator projection used by Google Maps: if you go on Google Maps and zoom out, all the angles are preserved, but the sizes are far from true.
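The trade-off is visible in the formulas. On a sphere of radius R, the Mercator projection maps longitude λ and latitude φ to x = Rλ and y = R ln tan(π/4 + φ/2). It preserves angles but stretches lengths by a factor of 1/cos φ in every direction, so small areas are inflated by 1/cos² φ. A minimal sketch, assuming a spherical Earth:

```python
import math

R = 6_371_000.0  # sphere radius in meters (an assumption; the real Earth is an ellipsoid)


def mercator(lat_deg: float, lon_deg: float) -> tuple[float, float]:
    """Project latitude/longitude (degrees) to spherical Mercator x, y (meters)."""
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    return R * lam, R * math.log(math.tan(math.pi / 4 + phi / 2))


def area_inflation(lat_deg: float) -> float:
    """How many times larger a small area looks on the map than on the globe."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


for lat in (0, 25, 60, 72):
    print(lat, round(area_inflation(lat), 2))
```

At 60° latitude, areas already look four times too large. Greenland spans roughly 60 to 83°N, while mainland Australia lies roughly between 11 and 39°S, which is why Greenland is inflated so much more on a Mercator map.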

One fun question, by the way, maybe you can help me out with this, Anthony: I always ask people which is bigger, Greenland or Australia, and by how much?

Anthony Alford: Oh, Australia is quite large. And again, I think the Mercator projection distorts our view of Greenland for those of us who are familiar with it. Australia is much larger, but I couldn't tell you like by a factor or whatever.

Roland Meertens: Yes, I like to ask this to people because first I ask them what they would estimate and then I show them the Google Maps projection and I ask them if they want to change their guess. And sometimes people change it in the wrong direction, even though they know that Mercator doesn't preserve size, even though they know that the map is lying. They just can't get around the fact that Greenland looks really big on the map.

So if you want to fix this, you can use the Mollweide equal-area projection, which ensures that all areas on the map have the same proportional relationships as the corresponding areas on the Earth. And if you want, for example, to keep distances correct, there are equidistant projections that preserve the distance from the center of the map.
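The Mollweide projection keeps areas in proportion by using an auxiliary angle θ defined by 2θ + sin 2θ = π sin φ. That equation has no closed-form solution and is usually solved iteratively. A sketch using Newton's method on a sphere (the iteration limit and tolerance are arbitrary choices):

```python
import math


def mollweide(lat_deg: float, lon_deg: float, radius: float = 1.0) -> tuple[float, float]:
    """Project latitude/longitude (degrees) with the equal-area Mollweide projection."""
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    if abs(abs(phi) - math.pi / 2) < 1e-12:
        theta = math.copysign(math.pi / 2, phi)  # poles: the Newton step would divide by zero
    else:
        theta = phi  # initial guess
        for _ in range(50):
            # Newton's method on f(theta) = 2*theta + sin(2*theta) - pi*sin(phi).
            f = 2 * theta + math.sin(2 * theta) - math.pi * math.sin(phi)
            f_prime = 2 + 2 * math.cos(2 * theta)
            step = f / f_prime
            theta -= step
            if abs(step) < 1e-12:
                break
    x = radius * (2 * math.sqrt(2) / math.pi) * lam * math.cos(theta)
    y = radius * math.sqrt(2) * math.sin(theta)
    return x, y


print(mollweide(45, 90))
```

A quick sanity check on the equal-area property: the whole globe maps to an ellipse with semi-axes 2√2R and √2R, whose area π · 2√2R · √2R = 4πR² is exactly the surface area of the sphere.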

This is useful for navigation: for example, if you center the map on the UK, you at least know that two places at the same distance from the center on the map are equally far away in reality. And here's another fun fact for you: the azimuthal equidistant projection is the one used for the emblem of the United Nations. In that emblem you see the map from the North Pole, and it's an azimuthal equidistant projection, so distances from the center are preserved.
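Centered on the North Pole, as in the UN emblem, the azimuthal equidistant projection is especially simple: a point's distance from the center of the map is its true great-circle distance from the pole, R(π/2 − φ), and its direction from the center is given by its longitude. A sketch assuming a spherical Earth; which way the Greenwich meridian points on the page is a convention chosen here.

```python
import math

R_KM = 6371.0  # mean Earth radius, spherical assumption


def polar_azimuthal_equidistant(lat_deg: float, lon_deg: float) -> tuple[float, float]:
    """North-pole-centered azimuthal equidistant projection, output in km."""
    rho = R_KM * (math.pi / 2 - math.radians(lat_deg))  # true distance from the pole
    lam = math.radians(lon_deg)
    return rho * math.sin(lam), -rho * math.cos(lam)  # Greenwich meridian points "down"


print(polar_azimuthal_equidistant(51.48, 0.0))  # roughly Greenwich
```

Only distances measured from the center are true; the distance between two arbitrary points on the map is still distorted.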

Anthony Alford: Okay, nice.

UK Ordnance Survey Maps [25:27]

Roland Meertens: But as I said, I wanted to talk a bit about other systems in the world, which projections they picked, and perhaps some of the technical debt and incredibly smart choices they made when doing so.

And, first off, in the UK they have the Ordnance Survey Maps. It's basically the national mapping agency for Great Britain. And in a previous episode of Generally AI, I already told you about multiple telescopes in the observatory in Greenwich, right?

Anthony Alford: Right. Yes.

Roland Meertens: And I think I also told you that they have multiple telescopes, each with a different prime meridian line, which indicates zero or used to indicate zero. I discovered that the Ordnance Survey meridian was picked in 1801, which is 50 years before this newer prime meridian was established. And nowadays with GPS, the prime meridian has moved again. So the Ordnance Survey maps are basically two prime meridian switches behind.

Anthony Alford: I don't know, but I'm guessing from the name that they would, in the worst case scenario, use these maps to choose targets for artillery. So hopefully they don't miss.

Roland Meertens: No, actually what I think is probably a good reason to keep the Ordnance Survey Maps the same is that they probably use it to determine whose land belongs to whom.

Anthony Alford: Sure.

Roland Meertens: So you want to be able to keep measuring in the old way as you already determined who owns what land.

Anthony Alford: Makes sense.

Roland Meertens: Otherwise, and we will see this later in this episode, you start publishing error maps like the Netherlands does. But it's interesting that since 1801, when they picked this survey meridian, they were for a long time simply six meters to the east of what people started to call zero.

I can also imagine that this is still confusing nowadays if people use their own GPS device, compare it to some older document from the 1800s, and discover that their place is farther away from where they thought it should be. I'll post an article about this Ordnance Survey zero meridian in the show notes.

Netherlands Triangle Coordinate System [27:49]

Roland Meertens: Anyway, moving to a different country: in the Netherlands, the national coordinate system used in geographic information systems (GIS) is called Rijksdriehoekscoördinaten, the "national triangle coordinate system". As you can already guess, this mapping is accurate in angles, and Wikipedia says it approaches being accurate in distances, so it's only approximately accurate there.

Anthony Alford: Oh, I see. And so I guess it's basically you need to orient in the right direction, but the distance is approximate? Is that-

Roland Meertens: Well, the thing is that if you have these coordinates, the angles between your coordinates are the same as the angles in the real world.

Anthony Alford: Cosine distance!

Roland Meertens: Yes. The coordinates are in kilometers and then meters, right? It's just that one kilometer in coordinates isn't necessarily one kilometer in the real world. The center of the map is a church in Amersfoort, so basically in the center of the Netherlands. Around there, the scale is 10 centimeters per kilometer too small.

Anthony Alford: Interesting.

Roland Meertens: Yes, I mean, it's not a big error, it's just only 10 centimeters.

Anthony Alford: This reminds me again of the last season where the king found out that his land was smaller than the map said it was.

Roland Meertens: Yes. So if you would take the Dutch triangle coordinate system and then determine that you're going to walk 10 kilometers in the center of the Netherlands, you would have walked one meter too little after walking 10 kilometers.

Anthony Alford: Would you even notice though, right?

Roland Meertens: Indeed, you probably wouldn't. At the edges, so if you go towards the coastal areas or towards Germany, it's 18 centimeters per kilometer too large.
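The arithmetic behind "one meter after 10 kilometers" fits in a tiny helper. The per-kilometer figures are the ones quoted in this conversation (about 10 cm/km in the center and 18 cm/km at the edges), not official RD scale factors, and the helper only gives the size of the discrepancy, not its direction.

```python
def scale_error_m(distance_km: float, error_cm_per_km: float) -> float:
    """Accumulated map-versus-ground discrepancy, in meters, over a given distance."""
    return distance_km * error_cm_per_km / 100.0


print(scale_error_m(10, 10))  # center of the Netherlands: 10 km * 10 cm/km = 1.0 m
print(scale_error_m(10, 18))  # near the edges: 10 km * 18 cm/km = 1.8 m
```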

Anthony Alford: So you could wind up in Germany and not know it…or would you know it? You might know.

Roland Meertens: You will find out that you're crossing the border because it says you're crossing a border.

Anthony Alford: Well, wait, Schengen, you guys are all…you just walk, right?

Roland Meertens: Yes, from where my parents live, you can very easily cycle to Germany. But it's interesting that because you have such a small country, you can project things in a flat way and-

Anthony Alford: And the country is rather flat as well, I believe.

Roland Meertens: The country is rather flat as well. Yes, indeed. I will get to the height of the Netherlands actually, because that's also interesting because they use different landmarks than the landmarks used for the triangle coordinate system.

Anthony Alford: Okay.

Roland Meertens: So, as I said, the center of the triangle coordinate system is a church in Amersfoort. Let me tell you a fun fact about the coordinates first. There's an X and a Y component, where X goes from west to east and Y goes from south to north. That's relatively simple.

But the X coordinates are between zero and 280 kilometers. The Y coordinates in the Netherlands are between 300 and 625. So (0,0) is basically somewhere to the north of Paris. And the nice trick here, which I think is just genius, is that all the coordinates in the Netherlands are positive and the Y coordinates in the Netherlands are always larger than the X coordinates-

Anthony Alford: Interesting.

Roland Meertens: ... within the continental Netherlands. So this removes all possible confusion about which coordinate is which. If I give you two coordinates, I don't even have to tell you which is X and which is Y.

Anthony Alford: Got it.

Roland Meertens: I can turn them around, I can flip them around. Because as a software engineer, whenever something says "coordinates", you get two numbers, and I always end up plotting them as latitude and longitude, trying out combinations to make sure everything is correct. Here in the Netherlands, if only people would use the national triangle coordinate system, there would be no confusion in your software.
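Roland's trick translates directly into code: within the continental Netherlands, RD x runs from about 0 to 280 km and y from about 300 to 625 km, so the larger value of any valid pair must be y. A sketch using the ranges quoted here as approximate bounds; `rd_xy` is a hypothetical helper, not part of any GIS library.

```python
def rd_xy(a_m: float, b_m: float) -> tuple[float, float]:
    """Order an RD coordinate pair (in meters) as (x, y), whichever way it was given."""
    x, y = sorted((a_m, b_m))  # inside the Netherlands, y is always the larger number
    # Sanity-check against the approximate ranges quoted above (in meters).
    if not (0 <= x <= 280_000 and 300_000 <= y <= 625_000):
        raise ValueError("not an RD coordinate in the continental Netherlands")
    return x, y


print(rd_xy(463_000, 155_000))  # Amersfoort, the origin of the projection
```

The same check cannot work for latitude/longitude pairs in general, which is why swapped lat/lon bugs are so common in location software.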

Anthony Alford: Is that a thing that most Netherlanders are aware of?

Roland Meertens: Probably not. I must also say that this coordinate system is not used a lot. Probably mostly for people who are doing navigation challenges or scouting or something.

Although I must say that it is quite nice to use one of those maps, because they are divided in a very nice way. It's very clear how far everything is, because with latitude and longitude, the distance covered by one degree, especially one degree of longitude, depends on where you are on Earth, right?

Anthony Alford: Yes. There's a conversion to nautical miles, but I can't remember it off the top of my head.
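On the latitude/longitude side: a nautical mile is defined as exactly 1,852 m and was historically chosen to be about one minute of latitude, so one degree of latitude is roughly 60 nautical miles everywhere, while one degree of longitude shrinks with the cosine of the latitude. A sketch assuming a spherical Earth:

```python
import math

NAUTICAL_MILE_M = 1852.0      # international definition
EARTH_RADIUS_M = 6_371_000.0  # spherical assumption


def km_per_degree_latitude() -> float:
    """North-south length of one degree of latitude on the sphere."""
    return 2 * math.pi * EARTH_RADIUS_M / 360 / 1000


def km_per_degree_longitude(lat_deg: float) -> float:
    """East-west length of one degree of longitude at a given latitude."""
    return km_per_degree_latitude() * math.cos(math.radians(lat_deg))


print(round(km_per_degree_latitude(), 1))          # about 111.2 km
print(round(km_per_degree_longitude(52.15), 1))    # roughly Amersfoort's latitude
print(round(60 * NAUTICAL_MILE_M / 1000, 1))       # 60 nautical miles = 111.1 km
```

At Amersfoort's latitude a degree of longitude covers only about 60% of the east-west distance it spans at the equator, which is exactly the kind of variation a national grid avoids.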

Roland Meertens: That's a good point. I wanted to say in the Netherlands it's fixed, but we just learned that it's 10 centimeters per kilometer too small in the center and 18 centimeters per kilometer too large in the edges.

Anthony Alford: But part of the original development of the metric system was to take the circumference of the Earth and define the meter as a fraction of it. I don't think it worked out.

Roland Meertens: I think there's also a map system where they try to keep the patches the same area, but then you get problems when you want to move from patch to patch. So if you have coordinates or if you have a route which crosses multiple patches, one point on one patch doesn't necessarily map to the same place on another patch.

Anthony Alford: It's a tough problem.

Roland Meertens: Yes, and that's why I like to talk about it. There's a lot of technical debt, and it becomes more difficult once you start doing things with software or self-driving cars or things like that.

In terms of technical debt, the original map of the Netherlands was made between 1896 and 1926. As you can imagine, we now have far more accurate mapping tools. But as I already alluded to, once you have mapped out a place and said this is your property, you can't really say, oh, there's a new coordinate system, let's measure everything again and reassign it.

So what they do in the Netherlands is this: I think on three different occasions, they published a correction grid with corrections of up to 25 centimeters. You can take an original coordinate and apply the correction grid to get a coordinate that matches what is actually measured.
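A correction grid like this is typically applied by interpolating between the grid nodes that surround a point. The Python sketch below uses bilinear interpolation of (dx, dy) offsets on a regular grid. The grid values, spacing, sign convention, and function names are all hypothetical. We are not claiming this reproduces the official Dutch transformation procedure or its published grid files.

```python
import math


def interpolate_correction(x, y, grid, x0, y0, step):
    """Bilinearly interpolate a (dx, dy) correction at point (x, y).

    grid[i][j] is the correction at node (x0 + j * step, y0 + i * step);
    the grid must have at least 2 rows and 2 columns.
    """
    rows, cols = len(grid), len(grid[0])
    fx, fy = (x - x0) / step, (y - y0) / step
    if not (0 <= fx <= cols - 1 and 0 <= fy <= rows - 1):
        raise ValueError("point lies outside the correction grid")
    # Lower-left node of the cell; clamp so points on the last row/column still work.
    j = min(math.floor(fx), cols - 2)
    i = min(math.floor(fy), rows - 2)
    tx, ty = fx - j, fy - i
    weights = {
        (0, 0): (1 - tx) * (1 - ty),
        (0, 1): tx * (1 - ty),
        (1, 0): (1 - tx) * ty,
        (1, 1): tx * ty,
    }
    dx = sum(w * grid[i + di][j + dj][0] for (di, dj), w in weights.items())
    dy = sum(w * grid[i + di][j + dj][1] for (di, dj), w in weights.items())
    return dx, dy


def apply_correction(x, y, grid, x0, y0, step):
    """Shift an original coordinate by the interpolated correction (hypothetical sign convention)."""
    dx, dy = interpolate_correction(x, y, grid, x0, y0, step)
    return x + dx, y + dy


# Hypothetical 2 x 2 grid with 1 km spacing; corrections in metres.
grid = [
    [(0.00, 0.00), (0.20, 0.00)],
    [(0.00, 0.20), (0.20, 0.20)],
]
print(apply_correction(500.0, 250.0, grid, x0=0.0, y0=0.0, step=1000.0))
```

Bilinear interpolation keeps the corrected coordinates continuous across cell boundaries, which matters when neighbouring parcels fall in different grid cells.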

Anthony Alford: Gotcha. Well, not to derail your talk, but here, again, in North Carolina we have a border with another state, South Carolina, and about 10 years ago they had to adjust it. Basically the border had become ambiguous. It was unclear where it actually was. And so they fixed it and agreed on where the border is. And there were some people who woke up one morning in a different state without having to move.

Roland Meertens: I can tell you one other fun fact, about the border between the Netherlands and Germany. After World War II, there were some proposals along the lines of: can we maybe have some part of Germany to make up for the Second World War?

So they got a few parts of Germany, but those were super small regions, like a village or something. And this wasn't really working out: it took a long time to move people, make sure everything was working well, build schools, et cetera.

So at some point they gave it back. But weeks before the handover, big trucks loaded with goods started moving in. They would find places in the village to park, and hours before the transition, more big trucks showed up loaded with butter. So at 12 o'clock at night, the territory switched countries, and these goods never crossed a border, so they didn't have to pay taxes.

Anthony Alford: Loophole.

Roland Meertens: Yes. So they found a loophole that could only be used for one night, because some parts changed country overnight.

Anthony Alford: Interesting.

Roland Meertens: One last fun fact about coordinate systems. You already said the Netherlands is quite flat. Good point. But this grid only gives you X and Y coordinates, and it's mostly based on measuring the angles between church towers. So it's quite neat: those are relatively fixed landmarks, and you can see one from the other.
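Measuring angles between towers whose positions are already known lets you locate a new one. This is called forward intersection. The Python sketch below shows a single step for one triangle. The real national network combined many triangles with an adjustment over all of them. The tower coordinates are hypothetical, and we assume the new point lies to the left of the line from A to B.

```python
import math


def forward_intersection(a, b, alpha_deg, beta_deg):
    """Locate P from two known stations A and B and the angles measured there.

    alpha_deg: angle at A between the sight lines A->B and A->P.
    beta_deg:  angle at B between the sight lines B->A and B->P.
    Assumes P lies to the left of the direction A->B (counterclockwise side).
    """
    (ax, ay), (bx, by) = a, b
    alpha, beta = math.radians(alpha_deg), math.radians(beta_deg)
    if alpha + beta >= math.pi:
        raise ValueError("the two sight lines do not meet")
    baseline = math.hypot(bx - ax, by - ay)
    # Law of sines in triangle ABP: the angle at P is pi - alpha - beta.
    dist_ap = baseline * math.sin(beta) / math.sin(alpha + beta)
    # Rotate the A->B direction by alpha to get the A->P direction.
    bearing_ap = math.atan2(by - ay, bx - ax) + alpha
    return ax + dist_ap * math.cos(bearing_ap), ay + dist_ap * math.sin(bearing_ap)


# Two hypothetical church towers 2 km apart, each sighting a third tower at 45 degrees.
print(forward_intersection((0.0, 0.0), (2000.0, 0.0), 45.0, 45.0))
```

This prints a point very close to (1000, 1000). Once the third tower is fixed, it can serve as a known station for the next triangle, which is how a network spreads out from a few measured baselines.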

There's a separate reference for height above sea level, the Amsterdam Ordnance Datum (Normaal Amsterdams Peil, NAP), and it is actually used in a lot of Western European countries. These reference points are marked by screws on specific buildings. I know this because once in high school, we had to make an accurate map of a field close to our school, and I was tasked with propagating the height from one of these screws to the rest of the field.

Anthony Alford: Wow.

Roland Meertens: We actually had the same kind of equipment they use in professional surveying.

Anthony Alford: The surveying tools…a transit.

Roland Meertens: There was an instrument that was set up perfectly level, and then someone would stand somewhere with a height meter. We would measure the difference in height, move the instrument somewhere else, have the person with the height meter stand somewhere else, and so on.

We also had to do it twice, because the first time we made a mistake. I don't remember anymore what we did wrong, but it was just teenagers trying to come up with a way to measure a field.
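The procedure Roland describes is differential leveling. At each instrument setup, you read the staff on a point of known height (the backsight) and then on the next point (the foresight). The difference between the two readings carries the height forward. The Python sketch below assumes "height meter" means a leveling staff, and the starting height and staff readings are made up. Running a loop that ends back on the starting screw gives a misclosure that should be close to zero. This is a common way to catch mistakes.

```python
def propagate_heights(start_height: float, readings: list[tuple[float, float]]) -> list[float]:
    """Carry a known height forward through a chain of level setups.

    Each reading is (backsight, foresight) in metres: the staff reading on the
    point whose height is known, then on the next point.
    """
    heights = [start_height]
    for backsight, foresight in readings:
        heights.append(heights[-1] + backsight - foresight)
    return heights


def misclosure(start_height: float, loop: list[tuple[float, float]]) -> float:
    """For a loop that ends back on the starting point, this should be near zero."""
    return propagate_heights(start_height, loop)[-1] - start_height


START_HEIGHT = 1.234  # hypothetical NAP height of the reference screw, in metres
out_and_back = [(1.512, 1.207), (1.388, 1.693)]  # made-up staff readings
print(propagate_heights(START_HEIGHT, out_and_back))
print(f"misclosure: {misclosure(START_HEIGHT, out_and_back) * 1000:.1f} mm")
```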

Anthony Alford: Very cool.

Words of Wisdom [37:40]

Roland Meertens: All right. Words of wisdom. Did you learn anything in this podcast or did you learn anything yourself recently?

Anthony Alford: The fact that all the points in a high dimensional space are on a sphere was new to me. Maybe not all, but the fact that they all more or less have similar magnitude. That was an interesting fact that I was not aware of.
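Anthony's observation is usually called concentration of measure. The Python sketch below draws random vectors from a standard Gaussian distribution. Real embedding vectors are not Gaussian, so treat this as an illustration rather than a property of any particular model. As the dimension grows, vector lengths cluster tightly around the square root of the dimension, so the points sit near a thin spherical shell.

```python
import math
import random


def norm_stats(dim: int, samples: int = 200, seed: int = 0) -> tuple[float, float]:
    """Mean norm of random Gaussian vectors in `dim` dimensions, and its relative spread."""
    rng = random.Random(seed)
    norms = [
        math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim)))
        for _ in range(samples)
    ]
    mean = sum(norms) / samples
    std = math.sqrt(sum((n - mean) ** 2 for n in norms) / samples)
    return mean, std / mean


for dim in (2, 10, 100, 1000):
    mean, rel_spread = norm_stats(dim)
    print(f"dim={dim:>4}  mean norm={mean:7.2f}  sqrt(dim)={math.sqrt(dim):7.2f}  relative spread={rel_spread:.3f}")
```

The relative spread shrinks roughly like one over the square root of the dimension. In two dimensions the lengths vary a lot, while in a thousand dimensions nearly every vector has about the same length.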

Roland Meertens: You would think that means there is space left over in the high-dimensional space. The middle and the corners could be used to store more information.

Anthony Alford: One would think, but then that would mess up the assumption of the cosine distance.
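Anthony's point about cosine distance can be shown directly. Cosine similarity only looks at direction, so moving a vector toward the middle of the space by shrinking it does not change its score. Any information stored in a vector's length would therefore be invisible to a cosine-based search. The Python sketch below uses made-up three-dimensional vectors as stand-ins for embeddings.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v; undefined (ZeroDivisionError) for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


query = [0.2, 0.9, 0.4]              # hypothetical query embedding
doc = [0.3, 0.8, 0.5]                # hypothetical document embedding
near_center = [0.01 * x for x in doc]  # same direction, much shorter vector

print(cosine_similarity(query, doc))
print(cosine_similarity(query, near_center))  # same value up to rounding
```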

Roland Meertens: Yes, but more space to store. It's free. It's free storage.

Anthony Alford: Just add another dimension.

Roland Meertens: Yes, that's why I always throw all my stuff on the floor in my room. I pay for it, I can store it wherever I want, everywhere in the space.

Anthony Alford: Definitely.

Roland Meertens: One thing from my side in terms of learning things, and one recommendation I want to give you: have you heard of the Post Office scandal in the UK?

Anthony Alford: No. Tell me.

Roland Meertens: It's quite interesting. The Post Office in the UK adopted a bookkeeping system by Fujitsu called Horizon, and it was plagued with bugs. Sometimes the system would duplicate transactions, and sometimes it would change a balance when users pressed Enter multiple times on a frozen screen. So you're like, oh, it's frozen…let’s press Enter.

And every time, something would happen to your balance.

And it was possible to log into the system remotely, so Fujitsu could change the balances in Horizon without the postmasters knowing. And I learned last week that rather than these bugs being acknowledged, the postmasters were sued for the shortfalls, because the system would say, you owe us £30,000.

Anthony Alford: Oh, wow.

Roland Meertens: Yes. And so these postmasters were prosecuted, got criminal convictions, and this is still going on and still not fully resolved today.

Anthony Alford: That's terrible.

Roland Meertens: It is absolutely insane. I watched the drama series Mr Bates vs The Post Office, and I can definitely recommend it, because it tells you a lot about the impact your software can have on individuals, and to what lengths companies are willing to go to hide the impact of bugs or systems like this.

Anthony Alford: Goodness gracious.

Roland Meertens: Yes, it's insane. We can do a whole episode about the post office scandal I think.

Anthony Alford: That would be depressing.

Roland Meertens: Yes, but I must say it's very interesting. Every time you read about this, you think, surely by now they will acknowledge that there can be problems like this. But the Post Office just doubled down, hired more lawyers, brought bigger lawsuits, and absolutely ruined the lives of people who were postmasters over the last 20 years.

Anthony Alford: Wow.

Roland Meertens: As I said, I can recommend this as something to watch.

Anthony Alford: Sounds good.

Roland Meertens: Anyway, speaking of recommendations: if you enjoyed this podcast, please like it and tell your friends about it. If you want to learn more about technology, go to InfoQ.com. Take a look at our other podcasts, our articles, and the conference talks we recorded. Thank you very much for listening, and thank you very much, Anthony, for joining me again.

Anthony Alford: Fun time as always.

Roland Meertens: Fun time as always. Thank you very much.

Anthony Alford: So long.

Roland Meertens: Any last fun facts you want to share?

Anthony Alford: Well, I don't know if we want to put this one on the air, but I was looking at how property is described here in the US in a legal document. So you may know, you may not, that we have a system called Township and Range, and I think it was invented by our President Thomas Jefferson.

After our Revolution, we had all this land that legally speaking was not owned by anyone. So they divided it up into a grid. They laid a grid out over it. So here's a description of a piece of property:

Township four north, range 12 west. The south half of the north half of the west half of the northeast quarter of the northeast quarter of the north half of the south half of section six.

Roland Meertens: Okay. Yes. So they made a grid and then they went really, really, really, really deep.

Anthony Alford: Subdividing the grid. Yep.
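Anthony's property description can be decoded mechanically. The Python sketch below assumes an idealized section that is exactly one mile square (640 acres). In practice, sections along a township's north and west edges, which include section 6, absorb survey corrections and are often irregular, so this is arithmetic on the ideal grid only. The part codes and the function name are our own. Legal descriptions read from the smallest piece outward, so the code applies the parts in reverse order.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
# Each aliquot part as (x0, y0, x1, y1) within its parent, on a unit square
# with x increasing to the east and y increasing to the north.
PARTS = {
    "N": (0, HALF, 1, 1), "S": (0, 0, 1, HALF),
    "E": (HALF, 0, 1, 1), "W": (0, 0, HALF, 1),
    "NE": (HALF, HALF, 1, 1), "NW": (0, HALF, HALF, 1),
    "SE": (HALF, 0, 1, HALF), "SW": (0, 0, HALF, HALF),
}
SECTION_ACRES = 640  # idealized section: 1 mile x 1 mile


def locate_aliquot(parts: list[str]) -> tuple[tuple[Fraction, ...], Fraction]:
    """Return the parcel's box (as fractions of the section) and its area in acres.

    `parts` is in the order written, smallest piece first.
    """
    x0, y0, x1, y1 = Fraction(0), Fraction(0), Fraction(1), Fraction(1)
    for part in reversed(parts):  # start from the whole section and zoom in
        px0, py0, px1, py1 = PARTS[part]
        w, h = x1 - x0, y1 - y0
        x0, y0, x1, y1 = x0 + px0 * w, y0 + py0 * h, x0 + px1 * w, y0 + py1 * h
    return (x0, y0, x1, y1), (x1 - x0) * (y1 - y0) * SECTION_ACRES


# "The south half of the north half of the west half of the northeast quarter of
#  the northeast quarter of the north half of the south half of section six."
box, acres = locate_aliquot(["S", "N", "W", "NE", "NE", "N", "S"])
print(box)    # (Fraction(3, 4), Fraction(15, 32), Fraction(7, 8), Fraction(31, 64))
print(acres)  # 5/4
```

Under these assumptions, the parcel is 1/512 of the section, or 1.25 acres. It is a strip one-eighth of a mile wide east to west and one sixty-fourth of a mile deep north to south, sitting just below the section's east-west midline.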

Roland Meertens: Yes, I do like that. When people started mapping this, they were probably like, ah, there's so much land, it doesn't really matter how accurate this is. "North US, South US" is probably enough.

Anthony Alford: Well, what's interesting, a surveyor was sort of a high status job in the colonial days. George Washington was a surveyor, and Thomas Jefferson amused himself by designing buildings. So these guys took it pretty seriously. That was the age of the Enlightenment and Renaissance men and all that.

Roland Meertens: But if you are not good at mapping, you don't come home on your ship.

Anthony Alford: Yes, exactly.

Roland Meertens: And if there are no maps of roads, or you don't know where you are, you don't reach the village you wanted to get to.

Anthony Alford: Exactly.

Roland Meertens: Yes. Interesting.


About the Authors

Anthony Alford

Anthony is a Senior Director, Development at Genesys where he is working on several AI and ML projects related to customer experience. He has over 20 years of experience in designing and building scalable software. Anthony holds a Ph.D. degree in Electrical Engineering with specialization in Intelligent Robotics Software and has worked on various problems in the areas of human-AI interaction and predictive analytics for SaaS business optimization.


Roland Meertens

Roland is an ML engineer working on computer vision for self-driving cars. Previously he worked on social media platforms, deep learning approaches for natural language processing (NLP) problems, social robotics, and computer vision for drones.


More about our podcasts

You can keep up-to-date with the podcasts via our RSS Feed, and they are available via SoundCloud, Apple Podcasts, Spotify, Overcast and YouTube. From this page you also have access to our recorded show notes. They all have clickable links that will take you directly to that part of the audio.