The Suggested Shopping List Problem...

Imagine a web-based shop where users define product filters like: "its red", "its in the kitchen" or "has the word 'auto' on the name"; and every criteria has a weight that contributes to the final order of the suggested products.

After every criteria added/removed, the system should recalculate and report the top products to any online participant of that list.

This app is usefull when friends dont know what gift to give, and based on general tips the system try to guess the most suited gifts.

Right now i am saving a copy of the product key into a table ( that cant be #temporary or @variable ) and for each new criteria i am adding a new column, which is filled according to the matches it find. ( Upon removal of the criteria, i just update those column with zeros ).

Criterias can be added by multiple users on several days, and that shopping "suggestion" list stays on the database until its finalized ( or after 90 days of inactivity ).

The product list can grow up to millions of products ( slow insert/update statements ), and we have several Full Text Search enabled ( slow select statements ). Scale this search for up to a hundred concurrent users per list and thousands of lists enabled in paralel.

Unfortunatelly this approach is high cpu/io intensive and requires a LOT of disk space. I have spent nights trying to find ways to implement a better solution. Then i find some material about "Long Transactions" and become curious.

A "Long Transaction" can be defined as a standard transaction that can be "paused" and continued after a long period of time, even if that user disconnected and reconnected. It also enables someone to start a transaction and let someone else to complete it.

This concept is interesting in several aspects, but basically it can transform any web-based application into a client-server one, thus avoiding tricks like cookies, "fat" session variables, lots of temporary tables and several others.

  • Does SQL Server ( or any other database ) implements "Long Transactions" ? If not, what could be done to implement anything similar to this in the most automatic way.

  • What other solutions could you provide to solve The Suggested Shopping List Problem ?

Thanks. ( and sorry for my english, i am not a native speaker ).

more ▼

asked Oct 20 '09 at 03:55 PM in Default

Gustavo gravatar image

592 4 4 7

(comments are locked)
10|1200 characters needed characters left

1 answer: sort voted first

Edit: added weight to suggestion list criteria cross table and provided report example

First, the long transaction concept sounds scary. Transactions are used to isolate a batch of changes by performing all necessary locks, making the changes and then deciding to commit or roll back the entire batch. Keeping the transaction open for a long period of time or allowing a different user to continue a transaction both sound like they would cause much bigger problems than they would solve.

What about this database design:

  product_id int, name varchar(100), description varchar(max)

  criteria_id int, criteria_text varchar(100)

CREATE TABLE suggestion_list
  suggestion_list_id int, list_owner_id int, name varchar(100), description varchar(max)

CREATE TABLE product_X_criteria
  product_id int, criteria_id int

CREATE TABLE suggestion_list_X_criteria
  suggestion_list_id int, criteria_id int, weight decimal(4,2)

The last 2 tables are cross-tables for many-to-many relationships. A criteria administrator would be responsible for putting together the list of valid criteria. A product administrator UI would allow them to add or remove criteria that a product meets. An administrator for a given suggestion list (determined by the list owner id referencing back to a users table) would be able to add or remove criteria for that list.

Here's an example of a report:

SELECT TOP(10) p.name AS [Product Name], SUM(slxc.weight) AS [Weighted Score] FROM products p  INNER JOIN product_X_criteria pxc  ON pxc.product_id = p.product_id  INNER JOIN suggestion_list_X_criteria slxc  ON slxc.criteria_id = pxc.criteria_id  INNER JOIN suggestion_list sl  ON sl.suggestion_list_id = slxc.suggestion_list_id GROUP BY p.name ORDER BY SUM(slxc.weight) DESC 
more ▼

answered Oct 20 '09 at 04:33 PM

Tom Staab gravatar image

Tom Staab
5.8k 6 8 10

Thanks for your answer Tom,

Indeed a normalized aproach seems a reasonable solution, Up for that solution.

But i predict some future problems like: - the criteria list could really grow incredible fast, since anything the user types could be a new criteria. - It would require a product_X_criteria_X_list table to store the weight of that analysis to be used on the report. - Every new list still requires all available product keys to be inserted into the cross-tables. - the product_X_criteria_X_list could easily reach 500Gb today, making it the next very intense bottleneck.
Oct 20 '09 at 05:54 PM Gustavo

...and BTW, a "Long Transaction" seems scary but reading more about it i started to think its not impossible.

Obviously the name maybe shouldn't refer to transactions, but imagine a user that owns his a list, he can manage it like he wants, add, remove, undo and whatever as he wish isolated from the rest of the system.

And its because of this isolation that his data could be frozen in time, no one else could touch it.
Oct 20 '09 at 06:03 PM Gustavo

I recommend controlling the criteria list via moderators or a limited group of users who can edit the list.

A new list would not need every product. In fact, you could limit lists to a maximum product count. Do you really care what the 68th best product is for the list criteria?

I agree you might want weighted criteria for the lists, but the weight could be added to suggestion_list_X_criteria. Maybe I'll post a new answer elaborating on that.
Oct 20 '09 at 06:28 PM Tom Staab

All products should be used on the analysis because a new criteria could make the last product to become the top 1 at any time. But indeed, the report just need a few products.

But i will implement your idea on a development environment and let you know the results.
Oct 21 '09 at 08:24 AM Gustavo
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.



asked: Oct 20 '09 at 03:55 PM

Seen: 1602 times

Last Updated: Oct 20 '09 at 03:55 PM