question

PGrey avatar image
PGrey asked

How to extract just the results with numbers excluding dates from a comments field

From a free text field I'm trying to count those with just numbers or number and dates but dates only. Hope the text code helps to explain what I'm trying to solve. DECLARE @TestData AS TABLE (ID INT, [Results] VARCHAR(50), [To Be Included] VARCHAR(10)) INSERT INTO @TestData(ID, [Results], [To Be Included]) VALUES(2222,'Date 16/11/2017 No results found ','No'),(7777,'DATE 03/01/2018 39mmol/mol','Yes'),(9999,'35mmol','Yes') SELECT td.ID, td.[Results], td.[To Be Included], CASE WHEN PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9]%',[Results])>0 OR ([Results] LIKE '%[0-9]%' AND PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9]%',[Results])>0) THEN 'Yes' WHEN PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9]%',[Results])>0 AND [Results] LIKE '%[0-9]%' THEN 'A' WHEN PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9]%',[Results])>0 AND [Results] NOT LIKE '%[0-9]%' THEN 'C' WHEN [Results] LIKE '%[0-9]%' THEN 'Yes' ELSE 'ZZ' END [My Answer 1], CASE WHEN (PATINDEX('%[0-9]%',[Results])>0 AND PATINDEX('%[0-9]/%',[Results])=0) THEN 'Yes' ELSE 'No' END [My Answer 2] FROM @TestData AS td
searchpatindex
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

1 Answer

· Write an Answer
Usman Butt avatar image
Usman Butt answered
I am not quite sure but assuming that the date format will be the same, then one way of handling it could be to remove all occurrences of dates with the specified format and then check if there is still a number in the string. A function which will do the removal process will make things slightly easier to understand. The following code is doing what is required at the moment. A function is to be created which takes two parameters 1. The input string 2. The pattern of the date (can be changed if needed) From the above requirement it is evident that the the date pattern is 10 characters long and hence have been hard coded in the function. use tempdb go CREATE FUNCTION [dbo].[RemoveDatePatternFromString] --===== Define I/O parameters ( @pString VARCHAR(8000), @pPattern VARCHAR(200) ) RETURNS TABLE WITH SCHEMABINDING AS RETURN --===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000... -- enough to cover VARCHAR(8000) WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), --10E+1 or 10 rows E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front -- for both a performance gain and prevention of accidental "overruns" SELECT TOP (ISNULL(LEN(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 ), cteDatePatternStartEnd(N1, N2) AS ( --== This provides the starting and Ending positions of Date string SELECT N, N + 9 --Assuming always 10 characters long pattern FROM cteTally WHERE PATINDEX(@pPattern, SUBSTRING(@pString COLLATE Latin1_General_BIN, N, 10)) > 0 ), cteWithoutDatePatternN AS( SELECT N FROM cteTally CROSS JOIN cteDatePatternStartEnd WHERE NOT(N BETWEEN N1 AND N2) ) SELECT StringWithoutDatePattern = (SELECT SUBSTRING(@pString , N, 1) FROM cteWithoutDatePatternN FOR XML PATH(''), TYPE).value('text()[1]', 'varchar(8000)'); go DECLARE @TestData AS TABLE (ID INT, [Results] VARCHAR(50), [To Be Included] VARCHAR(10)) INSERT INTO @TestData(ID, [Results], [To Be Included]) VALUES(2222,'Date 16/11/2017 No results found ','No'), (7777,'DATE 03/01/2018 39mmol/mol','Yes'), (9999,'35mmol','Yes') select *, case when ([Results] LIKE '%[0-9]%' AND StringWithoutDatePattern is null) or StringWithoutDatePattern like '%[0-9]%' then 'yes' else 'no' end from @TestData AS td OUTER APPLY [dbo].[RemoveDatePatternFromString](td.Results, '%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%') Last but not least, such problems can be solved more easily with CLR/.NET code. The above mentioned TSQL solution have room for improvement both in accuracy and performance. Since you did not mention the SQL version, I have tried to come up with a solution which should run on SQL2k5+ versions. Moreover, it may not be covering all the scenarios, so your testing is what is more important than anything else. Good luck with that.
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.