x

Counting matches and finding index

First off, this is my first time posting on this forum. I've got a database with a long list of strings, what I'd like to do is find how many of these strings contain other strings in the same table, and if they are sorted in descending order, what's it's index within that sub table?

For instance, if I have the strings a, await, wait, waiting, walk then the index for wait should be 2 (or 1 if we start counting from 0).

I was able to count how many occurrences there were (in this case 3).

Each name in the table is unique.

The query only needs to be run once every few months and has a list of about 2000 string.

more ▼

asked Oct 12, 2009 at 01:06 AM in Default

Phil gravatar image

Phil
36 1 1 1

(comments are locked)
10|1200 characters needed characters left

3 answers: sort voted first

I thought up something that I've done before using SQL and Excel. I used the following query:
Select B.name, A.name From types as A, types as B WHERE A.name LIKE '%' + B.name + '%' ORDER BY B.name, A.name
Which gave me the list of As which matched something in B.
Then I exported to Excel and computed a new column. This column would start at 0 if the row above it containing A was not equal to the current rows A. Otherwise, it would add one.
I then create another column which is either 1 if A and B of the current row match, otherwise it is empty. By sorting by this new column, and deleting all other rows with no entries on this column, I am left with a list of search strings and indexes, which is exactly what I want. It's a bit of a hack but considering the infrequency I have to do this, I'm fine with this solution.
If there's a better way through SQL only though, I'd love to hear it.

more ▼

answered Oct 12, 2009 at 02:17 AM

Phil gravatar image

Phil
36 1 1 1

(comments are locked)
10|1200 characters needed characters left

This answer requires SQL Server 2005 or above. I'm not sure of the exact output you were looking forward, but I'm sure you can tweak that having this solution.

I didn't know the structure of your table so I can only present an example solution and not the exact solution with your table name and field names. Skip down to the SELECT portion to see how this works.

--create a table to hold the terms to search

CREATE TABLE [dbo].[Terms] (            
    [Word] varchar(30) NOT NULL )            

--add a unique key that is clustered (for default ordering purposes)

CREATE UNIQUE CLUSTERED INDEX CX_Terms            
    ON [dbo].[Terms] ( [Word] )            

--add some values

INSERT INTO [dbo].[Terms] VALUES ('a')            
INSERT INTO [dbo].[Terms] VALUES ('await')            
INSERT INTO [dbo].[Terms] VALUES ('wait')            
INSERT INTO [dbo].[Terms] VALUES ('waiting')            
INSERT INTO [dbo].[Terms] VALUES ('walk')            

--view the list of terms

SELECT [Word] FROM [dbo].[Terms]            

--view the list of terms with the accompanying terms they are found in, the count and the

index position in the sub-table            
SELECT            
    a.[Word] AS [Search Word],            
    b.[Word] AS [Found in Word],            
    COUNT(b.[Word]) OVER (PARTITION BY a.[Word]) AS [Count], --count of word containing search text including the word you are searching            
    ROW_NUMBER() OVER (ORDER BY a.[Word]) - RANK() OVER (ORDER BY a.[Word]) AS [Index] --the index of the word in the ordered list, 0=the word itself 1=first word which contains the searched word, etc.            
FROM            
    [dbo].[Terms] a            
INNER JOIN            
    [dbo].[Terms] b            
ON            
    CHARINDEX(a.[Word], b.[Word]) > 0            
    --a.[Word] <> b.[Word] --don't match the same word to itself            
ORDER BY            
    a.[Word],            
    b.[Word]            

/*--results

Search Word	Found in Word	Count	Index            
a		a		5	0            
a		await		5	1            
a		wait		5	2            
a		waiting		5	3            
a		walk		5	4            
await		await		1	0            
wait		await		3	0            
wait		wait		3	1            
wait		waiting		3	2            
waiting		waiting		1	0            
walk		walk		1	0            

*/

--clean up after test

DROP TABLE [dbo].[Terms]
more ▼

answered Oct 15, 2009 at 07:35 PM

Benjamin gravatar image

Benjamin
320 2 3 4

(comments are locked)
10|1200 characters needed characters left
select a.string, num = count(*)            
from tbl a            
join tbl b            
on b.string like '%' + a.string + '%'            

Shouldn't wait be 3 in your sample? await, wait, waiting all contain wait.

This won't be quick.

more ▼

answered Oct 12, 2009 at 04:36 AM

nigelrivett gravatar image

nigelrivett
92

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x61
x46
x34
x6

asked: Oct 12, 2009 at 01:06 AM

Seen: 1115 times

Last Updated: Oct 17, 2009 at 11:46 AM