question

Scot Hauder 3 avatar image
Scot Hauder 3 asked

The 'Nearest Store' SQL Problem

Here ya go Bob!

Black Friday is here and the marketing director for ZYX company wants to create a mass mail campaign to promote its new Wizbang2010 product. She wants to target 100,000 of her customers in the US and Canada whom, through data mining, are most likely to buy. A brochure along with a letter listing the three nearest stores will be mailed to each customer. You are already formulating a query in your head and think: no problem--but of course she has additional requirements.

Every one of her targeted customers are fiscally prudent (cheap) and will travel to adjacent states/provinces or even across the border (either direction, US <->Canada) to take advantage of lower sales tax rates. Customers will drive at most 10 miles to an adjacent state/province that has a lower sales tax than the state/province in which they reside. In addition, customers will drive at most 30 miles to a state with no sales tax.

Knowing this, the marketing director would like to accommodate each customer by listing these stores on the letter, even if there is a store closer within their own state. Most customers will simply have all three of their physically nearest retailers on the letter, those along borders may not.

Business Rules:

The precedence for consideration of a customer's three "closest" stores for imprinting on the letter is as follows:

  1. One or more stores in a state with no sales tax and are within 30 miles of the customer (whom resides in a state/province with sales tax)
  2. One or more stores in a state/province with lower sales tax and are within 10 miles of the customer (whom resides in a state/province with a higher sales tax)
  3. One or more stores physically closest to the customer

Given this, it could be the case where there exists a store 1 mile away in the same state as the customer and 3 stores 9 miles away in (one or more states) with no sales tax (or lower sales tax) Those 3 stores should be on the letter not the one(s) physically closest.

The EUR/USD is currently trading at $1.06, normally ZYX's US customers take advantage of the loonie's buying power but most Canadian provinces levy taxes greater than 10% so we will not consider that for this promotion.

Everything is going First Class mail (ZYX does not have a mail management app and delivery point data is not available for these customers) so the ordering should be by PostalCode, CustomerID, Nearest StoreID, TaxRate asc.

The result set should have 300,000 rows and the following columns: CustomerID,Customer.City,Customer.PostalCode,Customer.Country,StoreID,Store.City, Store.PostalCode,Store.Country,Distance(miles). For those customers who travel to a store in their own postal code, assume they are exactly 1 mile away from the store.

No calling Google maps api or the like. Use the geography/geospatial functionality of SQL Server or create your own Haversine function. We only care about great circle distance, not road distance. Assume the Earth is perfectly spherical.

The db tables are owned by a third-party app so the existing columns cannot be altered (you may add any columns needed). You may add any non-clustered indexes needed.

Finally, we would like to do some analysis on sales tax revenue and fuel consumption. We need one report on total revenue by state/province and the net gain/loss from customers crossing borders. So the report should have three columns: [State/Province],[TotalSalesTaxRevenue],[Net Gain/Loss] Ordered by Net Gain/Loss descending. The Wizbang2010 goes for $1000 and every customer will purchase one (give that data miner a raise)

To get a feel for the impact on the environment, as a direct result of this promotion, a second query should return one row showing the grand total miles traveled(round trip), total gallons of fuel burned(round trip), and the postal code & distance(one way) of the poor soul(s) who had to travel the farthest. Those with vehicles getting >=35MPG (the environmentally righteous elite) will always travel to the physically nearest store [printed on their letter] everyone else will opt for the first store listed.

You know you are the best at your job and you would never tell marketing something can’t be done, right? This is a toy problem so don’t get distracted by the incomplete postal code list, imprecise tax rates or horrid db design.

Thank you in advance for doing my homework and have a great Thanksgiving! You can download the data and table scripts Here 7.62MB

Cheers,

Scot

spatialchallenge
5 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Scot Hauder 1 avatar image Scot Hauder 1 commented ·
Thanks for the 1 up vote who ever you are!! N.B. not really my homework--we never did anything this fun in school.
0 Likes 0 ·
Gustavo avatar image Gustavo commented ·
This is indeed a challange, Maybe if you split this challange into parts would make it easier. Never worked with geo-spatial data, so this could be a good start.
0 Likes 0 ·
Scot Hauder 3 avatar image Scot Hauder 3 commented ·
Hi Bart, glad to have you aboard! You'll have fun with this one. I had to solve this for a client years ago, before the geography types. I can honestly say it was the only time I've used Trig outside of a classroom setting. I thought it would be a good time to revisit this using SQL2008. In addition I think others will benefit from seeing how it is done.
0 Likes 0 ·
Scot Hauder 3 avatar image Scot Hauder 3 commented ·
0 Likes 0 ·
RBarryYoung avatar image RBarryYoung commented ·
You should ask one question, not three or four.
0 Likes 0 ·
Scot Hauder 4 avatar image
Scot Hauder 4 answered

It's a challenge, not a question. Not urgent, no time limit, just for learning. ps if your favorite vehicle isn't in the list, reply back, I'd like to know what it is!

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

lmu92 avatar image
lmu92 answered

Hi Scot,

I'd like to give it a try but there are a couple of things that confuse me: "the ordering should be by PostalCode, CustomerID, Nearest StoreID, TaxRate asc" This will cause the nearest store to be shown at position 1.

This will make your requirement "...will always travel to the physically nearest store [printed on their letter] everyone else will opt for the first store listed." become obsolete since the physically nearest store always will be the first one.

Is it intentionally to have PostalCodes data in there with latitude = longitude = Zero?? That totally messes up any results... or why should someone from Miami, FL or Fort Worth, TX travel to Canada (Cappahayden, NL, "nearest" with distance of more than 4000 miles) as the nearest store?

What is the business rule if more than one shop have exactly the same distance (e.g. the 1751 Postalcodes for latitude = 45.959060 AND longitude = -66.667370)? Which ones should be selected?

Are there any special requirements regarding the method to calculate the distance? (I know about three different version having different accuracy)

What is the business rule to select the store the customer will go to to purchase the Wizbang2010?

Overall, it's a nice challenge idea but it needs some fine tuning. And, to second Barry and Gustavo - Bart: the number of questions should be reduced.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Scot Hauder 5 avatar image
Scot Hauder 5 answered

Hi Imu92, glad to have you on board!

Q:I'd like to give it a try but there are a couple of things that confuse me: "the ordering should be by PostalCode, CustomerID, Nearest StoreID, TaxRate asc" This will cause the nearest store to be shown at position 1.

This will make your requirement "...will always travel to the physically nearest store [printed on their letter] everyone else will opt for the first store listed." become obsolete since the physically nearest store always will be the first one.

A:"Nearest StoreID" is not always the physically closest: Business Rules:

The precedence for consideration of a customer's three "closest" stores for imprinting on the letter is as follows:

1 One or more stores in a state with no sales tax and are within 30 miles of the customer (whom resides in a state/province with sales tax) 2 One or more stores in a state/province with lower sales tax and are within 10 miles of the customer (whom resides in a state/province with a higher sales tax) 3 One or more stores physically closest to the customer...

Q:s it intentionally to have PostalCodes data in there with latitude = longitude = Zero?? That totally messes up any results... or why should someone from Miami, FL or Fort Worth, TX travel to Canada (Cappahayden, NL, "nearest" with distance of more than 4000 miles) as the nearest store?

A:No, these 2 postal code records and store 2077 can be deleted. I was not aware of them, they should not be zero. My apologies.

Q:What is the business rule if more than one shop have exactly the same distance (e.g. the 1751 Postalcodes for latitude = 45.959060 AND longitude = -66.667370)? Which ones should be selected?

A:Select the first postal code when ordering by postal code, so in your example select the store #6467 in E3A 6B4. The other Canadian stores with the same distance should not be selected.

Q:Are there any special requirements regarding the method to calculate the distance? (I know about three different version having different accuracy)

A:No, and I realize this probably will produce slightly different result sets depending on the accuracy of the function, even using a more accurate radius for the Earth might produce slight variances. That is ok. I've already solved this using a couple different functions so what I'm really looking for is a solution using SQL server's geospatial functionality. This will be a good learning tool for those that do not get exposure to this aspect of SS on a day-to-day basis.

Q:What is the business rule to select the store the customer will go to to purchase the Wizbang2010?

A: ...Those with vehicles getting >=35MPG (the environmentally righteous elite) will always travel to the physically nearest store [printed on their letter] everyone else will opt for the first store listed... Basically this means that the environmentally conscience will always travel to the physically nearest store (regardless if it is listed 1st, 2nd or 3rd on the letter) in order to conserve fuel no matter how much they might save in taxes by traveling to a more distant store. Everyone else will go to the first store listed, which may be the physically closest or it may be a little further if they can save on taxes and it is within the distance stated in the problem (up to 10 miles for lower sales tax and up to 30 miles for zero sales tax)

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.