question

GaryWheeler asked:

Finding Distances between Lat/Long using STDistance

I have two tables that each have a latitude and longitude. One table (TableA) has 1,691 records and the other (TableB) has 5,778 records. Each record has a unique latitude and longitude. I need to find the distance between every point in TableA and every point in TableB, and display only the pairs that are 10 meters or less apart. I'm currently using T-SQL's geography method STDistance to do this. The whole process takes about 50 minutes. Is there a faster way? I'm using a cross join between the two tables, so the resulting set is 9,770,598 record pairs that need to be processed. Any help would be appreciated.
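
For context, the cross-join approach described above looks roughly like the sketch below. The table and column names (TableA, TableB, Id, GeoPoint) are placeholders for illustration, not taken from the actual schema:

    -- Rough sketch of the cross-join comparison described in the question.
    -- TableA/TableB/Id/GeoPoint are assumed names; GeoPoint is assumed to be a
    -- geography column (SRID 4326), so STDistance returns metres.
    SELECT  a.Id AS TableA_Id,
            b.Id AS TableB_Id,
            a.GeoPoint.STDistance(b.GeoPoint) AS DistanceMeters
    FROM    dbo.TableA AS a
    CROSS JOIN dbo.TableB AS b
    WHERE   a.GeoPoint.STDistance(b.GeoPoint) <= 10;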
stored-procedures · tsql · geography · geometry · latitude-longitude
4 comments


GaryWheeler commented:
The numbers listed here are a very limited test set. TableA actually has close to 100,000 records and TableB over 3,000,000. This process will need to run every night to compare the new records added to TableB against TableA. When a record within 10 meters is found in TableB, the matching record in TableA is marked as Verified. New records can be added to TableA as well, and those will need to be checked against TableB. So as you can see, this is a long-term process. Again, any help would be appreciated.
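
A rough sketch of what that nightly verification pass might look like. All names here (Verified, GeoPoint, LoadDate) are assumptions, since the post doesn't show the schema:

    -- Hypothetical sketch of the nightly verification pass described above.
    -- Verified, GeoPoint, and LoadDate are assumed column names.
    UPDATE a
    SET    a.Verified = 1
    FROM   dbo.TableA AS a
    WHERE  a.Verified = 0
      AND  EXISTS (SELECT 1
                   FROM   dbo.TableB AS b
                   WHERE  b.LoadDate >= CAST(GETDATE() - 1 AS date)   -- only newly added rows
                     AND  b.GeoPoint.STDistance(a.GeoPoint) <= 10);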
GPO commented:
Interesting question! If this processing is largely a one-off exercise, then 50 minutes is arguably not too high a price to pay. What I mean is that once you've calculated the distance between point x and point y, you theoretically never need to calculate it again; they'll always be the same distance apart (plate tectonics notwithstanding). Store the output of your priming round of calcs (the IDs of pairs less than 10m apart) in a permanent table. Would it be fair to say that there are a lot fewer than 10 million pairs that meet the 10m criterion? As new rows get added to TableA and TableB, they can have their calcs done incrementally and the output added to the third table. Obviously I suggest this without having any idea how many (if any) rows are going to be added to TableA and TableB over time.
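
A sketch of the permanent pairs table GPO describes, plus an incremental load. The IsNew flag and all table/column names are assumptions for illustration only:

    -- Hypothetical permanent table of pairs already known to be within 10 metres.
    CREATE TABLE dbo.NearbyPairs
    (
        TableA_Id      int   NOT NULL,
        TableB_Id      int   NOT NULL,
        DistanceMeters float NOT NULL,
        CONSTRAINT PK_NearbyPairs PRIMARY KEY (TableA_Id, TableB_Id)
    );

    -- Incremental pass: only compare rows added since the last run
    -- (IsNew is an assumed flag; it could equally be a load date).
    INSERT INTO dbo.NearbyPairs (TableA_Id, TableB_Id, DistanceMeters)
    SELECT a.Id, b.Id, a.GeoPoint.STDistance(b.GeoPoint)
    FROM   dbo.TableA AS a
    JOIN   dbo.TableB AS b
      ON   b.GeoPoint.STDistance(a.GeoPoint) <= 10
    WHERE  a.IsNew = 1 OR b.IsNew = 1;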
GPO commented:
@Grant-Fritchey How do we get this moved out of the Meta-Askssc questions? Seems to be in the wrong spot.
GaryWheeler commented:
I would love to move this to the correct area. This area was the only choice I was given when I added the post.

1 Answer

GPO answered:
**Disclaimer** Until 2 hours ago I'd never heard of, let alone used, the ST* geometry functions. What follows might be useless to you; apologies in advance if it is.

What I think we have to do is limit the amount of unnecessary use of the geometry functions. Without seeing your code, my hunch is that that's where your bottleneck is. Comparing everything to everything else when you're only interested in locations that are less than 10m apart seems excessive. If there were a way to do a preliminary pass over the data to narrow down the focus and knock out, say, 90% (or whatever) of the comparisons, that would obviously speed things up. And perhaps there is.

From what I can gather, latitudes range from -90 to +90 and longitudes from -180 to +180. There seems to be about 110 or 111 km between one line of latitude and the next. Lines of longitude converge as they approach the poles, but appear to be about 100 km apart at the equator and about 47 km apart at the arctic and antarctic circles. A fair bit more than 10m in any event.

If you store the integer component of each set of coordinates (as smallint columns, for example), then you can join your tables and limit the pairs compared with the geometry functions to only those that are +/- 1 degree latitude and +/- 1 degree longitude of each other. I'll go out on a limb here and say smallint comparisons in an inner join are going to be a bit faster than geometry functions in a cross join.

Without a representative sample of your TableA and TableB data I can't tell whether my suggested solution should be taken literally, but the idea might nevertheless be refined and customised to your specific circumstances. For instance, say your entire area of interest will only ever be within the lat/long rectangle (20 20, 20 0, 0 0, 0 20); then you might want to store the numeric versions of your coordinates as numeric(5,2) or whatever, and narrow down your comparisons as required.

I had a play around with AdventureWorks2012, using Person.Address to create two temp tables with the same number of rows as yours, and the number of geometry-function comparisons required dropped from about 9.5 million to about 650,000 at +/- 1 degree. The following code (excluding the temp table creation) took about 2 minutes 45 seconds to run on my laptop, but that's possibly of no relevance to you because it's not clear what other processing you have to do.

    IF OBJECT_ID('tempdb..#addresses_a') IS NOT NULL DROP TABLE #addresses_a;

    SELECT TOP 1691
         a.AddressID
        ,a.SpatialLocation
        ,CAST(a.SpatialLocation.Lat  AS smallint) AS latitude_integer
        ,CAST(a.SpatialLocation.Long AS smallint) AS longitude_integer
    INTO #addresses_a
    FROM Person.[Address] a
    ORDER BY a.AddressID ASC;

    IF OBJECT_ID('tempdb..#addresses_b') IS NOT NULL DROP TABLE #addresses_b;

    SELECT TOP 5778
         b.AddressID
        ,b.SpatialLocation
        ,CAST(b.SpatialLocation.Lat  AS smallint) AS latitude_integer
        ,CAST(b.SpatialLocation.Long AS smallint) AS longitude_integer
    INTO #addresses_b
    FROM Person.[Address] b
    ORDER BY b.AddressID DESC;

    SELECT aa.SpatialLocation.STDistance(ab.SpatialLocation)
    FROM #addresses_a AS aa
    JOIN #addresses_b AS ab
      ON  aa.latitude_integer  BETWEEN ab.latitude_integer  - 1 AND ab.latitude_integer  + 1
      AND aa.longitude_integer BETWEEN ab.longitude_integer - 1 AND ab.longitude_integer + 1;

Note that the above probably won't work properly for the line that runs up the Pacific (-180/+180), so you'll miss coordinates <10m apart where they span that line, but you could write a specific test for that pass.
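
Not part of the original answer, but to tie it back to the 10-metre requirement in the question: the filter could be layered onto the same narrowed join along these lines. This is a sketch only; it assumes SpatialLocation is a geography column, so STDistance returns metres:

    -- Same narrowed join as above, keeping only the pairs within 10 metres.
    SELECT aa.AddressID AS addressid_a,
           ab.AddressID AS addressid_b,
           aa.SpatialLocation.STDistance(ab.SpatialLocation) AS distance_metres
    FROM #addresses_a AS aa
    JOIN #addresses_b AS ab
      ON  aa.latitude_integer  BETWEEN ab.latitude_integer  - 1 AND ab.latitude_integer  + 1
      AND aa.longitude_integer BETWEEN ab.longitude_integer - 1 AND ab.longitude_integer + 1
    WHERE aa.SpatialLocation.STDistance(ab.SpatialLocation) <= 10;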
5 comments


Grant Fritchey commented:
I agree with @GPO's basic assessment: reduce the number of places where you need to run the spatial functions as a first step. The second step would be to make sure you have a spatial index in place and that it's being used. You'll have to experiment with grid sizes and distribution for your data.
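
For reference, a minimal example of creating a spatial index on a geography column. The table and column names are assumptions, and the grid levels and CELLS_PER_OBJECT are exactly the sort of thing Grant suggests experimenting with:

    -- Hypothetical spatial index on an assumed geography column dbo.TableB(GeoPoint).
    CREATE SPATIAL INDEX IX_TableB_GeoPoint
    ON dbo.TableB (GeoPoint)
    USING GEOGRAPHY_GRID
    WITH (
        GRIDS = (LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM, LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM),
        CELLS_PER_OBJECT = 16
    );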
GaryWheeler commented:
That is actually an interesting concept that I hadn't thought of. I will see what I can do with that.
GPO commented:
Let us know how you get on. Best of luck.
GaryWheeler commented:
Thank you Grant Fritchey. I had never heard of "Spatial Indexes" until I read your post. That took my run time from 59 minutes down to just 12 seconds. Yes, I said seconds. We are good to go now and thank you all for the help.
Grant Fritchey commented:
Excellent. Be sure to experiment with the different grid sizes as I mentioned. You'll find that you may need different indexes depending on whether the query is including or excluding more data. Spatial indexes get pretty weird.
