question

Phil avatar image
Phil asked

Counting matches and finding index

First off, this is my first time posting on this forum. I've got a database with a long list of strings, what I'd like to do is find how many of these strings contain other strings in the same table, and if they are sorted in descending order, what's it's index within that sub table?

For instance, if I have the strings a, await, wait, waiting, walk then the index for wait should be 2 (or 1 if we start counting from 0).

I was able to count how many occurrences there were (in this case 3).

Each name in the table is unique.

The query only needs to be run once every few months and has a list of about 2000 string.

aggregatessub-querystringrownum
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Phil avatar image
Phil answered

I thought up something that I've done before using SQL and Excel. I used the following query: Select B.name, A.name From types as A, types as B WHERE A.name LIKE '%' + B.name + '%' ORDER BY B.name, A.name Which gave me the list of As which matched something in B. Then I exported to Excel and computed a new column. This column would start at 0 if the row above it containing A was not equal to the current rows A. Otherwise, it would add one. I then create another column which is either 1 if A and B of the current row match, otherwise it is empty. By sorting by this new column, and deleting all other rows with no entries on this column, I am left with a list of search strings and indexes, which is exactly what I want. It's a bit of a hack but considering the infrequency I have to do this, I'm fine with this solution. If there's a better way through SQL only though, I'd love to hear it.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

nigelrivett avatar image
nigelrivett answered
select a.string, num = count(*)            
from tbl a            
join tbl b            
on b.string like '%' + a.string + '%'            

Shouldn't wait be 3 in your sample? await, wait, waiting all contain wait.

This won't be quick.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Benjamin avatar image
Benjamin answered

This answer requires SQL Server 2005 or above. I'm not sure of the exact output you were looking forward, but I'm sure you can tweak that having this solution.

I didn't know the structure of your table so I can only present an example solution and not the exact solution with your table name and field names. Skip down to the SELECT portion to see how this works.

--create a table to hold the terms to search

CREATE TABLE [dbo].[Terms] (            
    [Word] varchar(30) NOT NULL )            

--add a unique key that is clustered (for default ordering purposes)

CREATE UNIQUE CLUSTERED INDEX CX_Terms            
    ON [dbo].[Terms] ( [Word] )            

--add some values

INSERT INTO [dbo].[Terms] VALUES ('a')            
INSERT INTO [dbo].[Terms] VALUES ('await')            
INSERT INTO [dbo].[Terms] VALUES ('wait')            
INSERT INTO [dbo].[Terms] VALUES ('waiting')            
INSERT INTO [dbo].[Terms] VALUES ('walk')            

--view the list of terms

SELECT [Word] FROM [dbo].[Terms]            

--view the list of terms with the accompanying terms they are found in, the count and the

index position in the sub-table            
SELECT            
    a.[Word] AS [Search Word],            
    b.[Word] AS [Found in Word],            
    COUNT(b.[Word]) OVER (PARTITION BY a.[Word]) AS [Count], --count of word containing search text including the word you are searching            
    ROW_NUMBER() OVER (ORDER BY a.[Word]) - RANK() OVER (ORDER BY a.[Word]) AS [Index] --the index of the word in the ordered list, 0=the word itself 1=first word which contains the searched word, etc.            
FROM            
    [dbo].[Terms] a            
INNER JOIN            
    [dbo].[Terms] b            
ON            
    CHARINDEX(a.[Word], b.[Word]) > 0            
    --a.[Word] <> b.[Word] --don't match the same word to itself            
ORDER BY            
    a.[Word],            
    b.[Word]            

/*--results

Search Word	Found in Word	Count	Index            
a		a		5	0            
a		await		5	1            
a		wait		5	2            
a		waiting		5	3            
a		walk		5	4            
await		await		1	0            
wait		await		3	0            
wait		wait		3	1            
wait		waiting		3	2            
waiting		waiting		1	0            
walk		walk		1	0            

*/

--clean up after test

DROP TABLE  [dbo].[Terms]            
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.