question

steveA avatar image
steveA asked

Extracting latest date from field with multiple records

I have 1000's of rows where I want to extract the latest date from a field containing multiple records with a separator, 2 examples: ROW 1: I=1;A=43.21;D=20090115|I=2;A=146.01;D=20090210|I=3;A=0.02;D=20090309|I=4;A=110.76;D=20090420 ROW 2: I=68;A=730.71;D=20090610|I=71;A=0.01;D=20090617|I=69;A=2798.56;D=20090617|I=70;A=0.01;D=20090617|I=73; A=499.96;D=20090618|I=75;A=175.43;D=20090630|I=76;A=6033.06;D=20090717|I=77;A=2706.27;D=20090720 There can be an unlimited records within the field. The records are separated with a '|' and the values within the record are separated with ','. The 'D=' value is the date. I want to extract the latest date for each row and update a "smalldatetime" field with the result. The latest date is NOT necessarily the last record
record
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image
Oleg answered
I would like to join Timothy in raising concern about doing it with T-SQL. However, if for whatever reason T-SQL is to be used then here is the solution. I will explain how it works first. The assumption is that you have a table where each row is uniquely identified by some column. Each such row has a varchar(max) column holding the pipe-delimited data and it also has some column of a smalldatetime type, which needs to be updated with the value corresponding to the largest datetime value in the varchar(max) column. Let's create some sample table and populate it with a couple of records: create table dbo.Sample ( record_id int identity(1, 1) not null constraint PK_Sample_ID primary key clustered, data varchar(max) not null, date_time smalldatetime null ); go insert into dbo.Sample (data) values ('I=1;A=43.21;D=20090115|I=2;A=146.01;D=20090210|' + 'I=3;A=0.02;D=20090309|I=4;A=110.76;D=20090420'); insert into dbo.Sample (data) values ('I=68;A=730.71;D=20090610|I=71;A=0.01;D=20090617|' + 'I=69;A=2798.56;D=20090617|I=70;A=0.01;D=20090617|' + 'I=73;A=499.96;D=20090618|I=75;A=175.43;D=20090630|' + 'I=76;A=6033.06;D=20090717|I=77;A=2706.27;D=20090720'); If you can create a table-valued function which will take the value from **data** column, split it using pipe character as delimiter and extract just the datetime value from each split item then you are almost done. Here is the function: create function dbo.ExtractDatePart(@input varchar(max)) returns @t table (date_value smalldatetime) as begin declare @xml xml; set @xml = '' + replace(@input, '|', '') + ''; insert into @t select cast(right(item.value('text()[1]', 'varchar(max)'), 8) as smalldatetime) date_value from @xml.nodes('//r') R(item); return; end; go Now, if you call this function for, say, data value in the first record, we will get 4 records returned, and if you call it cross applied with the whole table, you will get 12 records because there are 4 datetime values buried in the first and 8 in the second record: select record_id, date_value from dbo.Sample cross apply [dbo].[ExtractDatePart](data) order by date_value; -- results: record_id date_value ----------- ----------------------- 1 2009-01-15 00:00:00 1 2009-02-10 00:00:00 1 2009-03-09 00:00:00 1 2009-04-20 00:00:00 2 2009-06-10 00:00:00 2 2009-06-17 00:00:00 2 2009-06-17 00:00:00 2 2009-06-17 00:00:00 2 2009-06-18 00:00:00 2 2009-06-30 00:00:00 2 2009-07-17 00:00:00 2 2009-07-20 00:00:00 Data from the snippet above grouped by record_id and joined to the table will allow you to update all your records in one set-based statement. Here is the statement: update dbo.Sample set date_time = t.date_value from dbo.Sample inner join ( select record_id, max(date_value) date_value from dbo.Sample cross apply [dbo].[ExtractDatePart](data) group by record_id ) t on dbo.Sample.record_id = t.record_id; While the script above is a working solution (all you need is a script to create function and the final update statement), I still do not agree with abusing T-SQL with tasks it does not deserve to perform. There are other languages which should handle string manipulations. Oleg
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

TimothyAWiseman avatar image
TimothyAWiseman answered
First, are you certain you want to do this in T-SQL? Doing this in C# is much easier, and doing it in Python is practically trivial. If you use a .NET language you can even import it into SQL Server and call it as an external stored procedure. While I am a huge fan of T-SQL it is not a general purpose programming language, it is a query language and does not handle strings nearly as well as more general purpose languages. If you really want to do it in T-SQL, you can use a function like this: create function LatestDate (@string nvarchar(4000)) returns smalldatetime begin declare @dates table (datecol smalldatetime) declare @maxdate smalldatetime declare @cur int select @cur = charindex('D=', @string) while @cur > 0 begin if isdate(substring(@string, @cur + 2, 8)) = 1 insert into @dates select substring(@string, @cur + 2, 8) select @cur = charindex('D=', @string, @cur+1) end select @maxdate = max(datecol) from @dates return @maxdate End I quickly tested this with your two example rows, but I have not done extensive testing and I have not done optimizations. In particular, if you have a tally table, then you can use that to do the split more effectively then using this loop. Jeff Moden describes using a tablly table this way at [ http://www.sqlservercentral.com/articles/T-SQL/62867/][1] . Depending on your exact use case you may also want to look at using a TVF instead of a scalar function. [1]: http://www.sqlservercentral.com/articles/T-SQL/62867/
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.