question

siera_gld avatar image
siera_gld asked

Deduplication of records with continous dates

enter code here I need to remove records from a price history where there is not a gap in dates and have the same price. ITEM PRICE EFF_DT END_DT ACTV_IND 1234 10.00 2011-01-01 00:00:00.000 2011-03-31 00:00:00.000 N 1234 10.00 2011-04-01 00:00:00.000 2011-07-31 00:00:00.000 N 1234 10.00 2011-08-01 00:00:00.000 2099-01-01 00:00:00.000 Y 5678 20.00 2011-01-01 00:00:00.000 2011-03-31 00:00:00.000 N 5678 20.00 2011-04-01 00:00:00.000 2011-07-31 00:00:00.000 N 5678 20.00 2011-08-01 00:00:00.000 2099-01-01 00:00:00.000 Y 9876 5.00 2011-01-01 00:00:00.000 2011-03-31 00:00:00.000 N 9876 5.00 2011-04-01 00:00:00.000 2011-07-31 00:00:00.000 N 9876 5.00 2011-08-01 00:00:00.000 2099-01-01 00:00:00.000 Y So in this record set I need to just have one line that shows item 1234 with a price of 10.00 from 2011-01-01 to 2099-01-01 and an Active indicator of 'Y'. This table also have normal one record items so it's just these cases where there are multiple records. Create table #Price_Hist ( ITEM CHAR (4), PRICE MONEY, EFF_DT DATETIME, END_DT DATETIME, ACTIVE_IND CHAR (1) ) INSERT INTO #Price_Hist VALUES ('1234', 10.00, '2011-01-01', '2011-03-31', 'N') INSERT INTO #Price_Hist VALUES ('1234', 10.00, '2011-04-01', '2011-07-31', 'N') INSERT INTO #Price_Hist VALUES ('1234', 10.00, '2011-08-01', '2099-01-01', 'Y') INSERT INTO #Price_Hist VALUES ('5678', 20.00, '2011-01-01', '2011-03-31', 'N') INSERT INTO #Price_Hist VALUES ('5678', 20.00, '2011-04-01', '2011-07-31', 'N') INSERT INTO #Price_Hist VALUES ('5678', 20.00, '2011-08-01', '2099-01-01', 'Y') INSERT INTO #Price_Hist VALUES ('9876', 5.00, '2011-01-01', '2011-03-31', 'N') INSERT INTO #Price_Hist VALUES ('9876', 5.00, '2011-04-01', '2011-07-31', 'N') INSERT INTO #Price_Hist VALUES ('9876', 5.00, '2011-08-01', '2099-01-01', 'Y')
t-sqldeduplication
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Magnus Ahlkvist avatar image
Magnus Ahlkvist answered
I don't have SQL Server on this machine, so this is untested. ;WITH ActiveItems (ITEM, PRICE) AS( SELECT ITEM, PRICE FROM #PRICE_HIST WHERE ACTIVE_IND='Y' ) SELECT ITEM, PRICE, MIN(EFF_DT_DATETIME) AS StartDate, MAX(END_DT) AS EndDate, 'Y' AS ACTIVE_IND FROM #PRICE_HIST PH WHERE EXISTS(SELECT * FROM ActiveItems AI WHERE AI.ITEM=PH.ITEM AND AI.PRICE=PH.PRICE) GROUP BY ITEM, PRICE
3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Magnus Ahlkvist avatar image Magnus Ahlkvist commented ·
As far as I could read the question, all that was to be preserved was the active prices. But you're right in a way - if price is 10, goes up to 12 and then back to 10, my method will fail, or at least present the wrong from-date. Regarding date gaps, it looks on the test data that the dates are gap-free.
1 Like 1 ·
Phil Factor avatar image Phil Factor commented ·
You're not checking for a gap in dates. Your method will lose all record of price changes!
0 Likes 0 ·
Phil Factor avatar image Phil Factor commented ·
Sorry. I was quoting from siera_gid 'where there is not a gap in dates and have the same price'. I took this to mean that, if the price was different for a while, that was, effectively, a gap in the history of the price. Where, for example, an invoice has to be generated, or the value of stock at any particular moment needs to be preserved, then so does the history of the price. De-duplication should remove contiguous records where the item and the price are the same but it should not do so if there was a fluctuation in the price, causing a 'gap'.
0 Likes 0 ·
Phil Factor avatar image
Phil Factor answered
This isn't really an answer, more to show you where to find the answer! This is a variation on the 'Gaps and Islands' problem. Here is a modified test data where I've added a couple more records that will add some price changes that need to be preserved. Itzik Ben-Gan provides several solutions in his chapter in the 'deep dives' book. (SQL Server MVP Deep dives) The task is to consolidate those contiguous records that show the same price and item In this application, contiguous means DATEDIFF(day,current_one.END_DT, next_one.EFF_DT)=1 Create table #Price_Hist ( ITEM CHAR (4), PRICE MONEY, EFF_DT DATETIME, END_DT DATETIME, ACTIVE_IND CHAR (1) ) INSERT INTO #Price_Hist VALUES ('1234', 10.00, '2011-01-01', '2011-03-31', 'N') INSERT INTO #Price_Hist VALUES ('1234', 10.00, '2011-04-01', '2011-5-31', 'N') INSERT INTO #Price_Hist VALUES ('1234', 9.00, '2011-06-01', '2011-07-31', 'N') INSERT INTO #Price_Hist VALUES ('1234', 10.00, '2011-08-01', '2099-01-01', 'Y') INSERT INTO #Price_Hist VALUES ('5678', 20.00, '2011-01-01', '2011-03-31', 'N') INSERT INTO #Price_Hist VALUES ('5678', 20.00, '2011-04-01', '2011-05-31', 'N') INSERT INTO #Price_Hist VALUES ('5678', 21.00, '2011-06-01', '2011-07-31', 'N') INSERT INTO #Price_Hist VALUES ('5678', 20.00, '2011-08-01', '2099-01-01', 'Y') INSERT INTO #Price_Hist VALUES ('9876', 5.00, '2011-01-01', '2011-03-31', 'N') INSERT INTO #Price_Hist VALUES ('9876', 5.00, '2011-04-01', '2011-07-31', 'N') INSERT INTO #Price_Hist VALUES ('9876', 5.00, '2011-08-01', '2099-01-01', 'Y')
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

siera_gld avatar image siera_gld commented ·
very cool function but won't work in this case...
0 Likes 0 ·
Phil Factor avatar image Phil Factor commented ·
It isn't an answer. I've just added some more test data to illustrate that Magnus's answer will not preserve previous price changes for the same item
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.