question

Scot Hauder 1 avatar image
Scot Hauder 1 asked

The 'Word Permutation' SQL Problem

Hi all, while we're waiting for the next great Phil Factor Speed Phreak competition I would like to pose my own problem, no business value here, it's just for fun. Ten years ago when SQL Server 2000 was just coming out I wrote a query that I have not revisited since. If I recall, I wrote it to beat the Wordster game on the Megatouch consoles that were permeating the local bars. I am confident some of you can create a more eloquent solution (I don't want to influence your creativity so I will not reveal my old quirky solution until after some responses are given).

The Problem: given 8 or fewer letters, return all unique words of 3 or more letters that can be formed from the letters given, ordered by longest words first. You are free to create a word list table any way you want. The combinations of letters returned should be matched against this word list table so that only actual words are returned.

Cheers,

Scot

challengecombinationswordspermutationscombinatorial
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Bob Hovious avatar image
Bob Hovious answered

Found a substantial boost by losing the tally table.

--- improved CROSS JOIN, using 2008 syntax

DECLARE @Wordstring VARCHAR(8) 
set @wordstring = 'SUBTLEST'

declare @timer datetime
set @timer  = getdate()

;with cte1 as (
    select C,N
    from	(values		(1,substring(@wordstring,1,1)),
    					(2,substring(@wordstring,2,1)),
    					(3,substring(@wordstring,3,1)),
    					(4,substring(@wordstring,4,1)),
    					(5,substring(@wordstring,5,1)),
    					(6,substring(@wordstring,6,1)),
    					(7,substring(@wordstring,7,1)),
    					(8,substring(@wordstring,8,1))
    					) ca (N,C)
    )					


,cte2 (N1, C1, N2, C2) as (
    select c.N,c.C,c1.N,c1.C
    from cte1 c
    cross join cte1 c1
    where c.N <> c1.N
    )

,cte3 (N1,C1,N2,C2,N3,C3) as (
    select c.N1,c.C1,c.N2,c.C2,c1.N,c1.C
    from cte2 c
    cross join cte1 c1
    where c1.N not in (c.N1,c.N2)
    )

,cte4 (N1,C1,N2,C2,N3,C3,N4,C4) as (
    select c.N1,c.C1,c.N2,c.C2,c.N3,c.C3,c1.N,c1.C
    from cte3 c
    cross join cte1 c1
    where c1.N not in (c.N1,c.N2,c.N3)
    ) 

,cte5 (N1,C1,N2,C2,N3,C3,N4,C4,N5,C5) as (
    select c.N1,c.C1,c.N2,c.C2,c.N3,c.C3,c.N4,c.C4,c1.N,c1.C
    from cte4 c
    cross join cte1 c1
    where c1.N not in (c.N1,c.N2,c.N3,c.N4)
    ) 

,cte6 (N1,C1,N2,C2,N3,C3,N4,C4,N5,C5,N6,C6) as (
    select c.N1,c.C1,c.N2,c.C2,c.N3,c.C3,c.N4,c.C4,c.N5,c.C5,c1.N,c1.C
    from cte5 c
    cross join cte1 c1
    where c1.N not in (c.N1,c.N2,c.N3,c.N4,c.N5)
    ) 

,cte7 (N1,C1,N2,C2,N3,C3,N4,C4,N5,C5,N6,C6, N7, C7) as (
    select c.N1,c.C1,c.N2,c.C2,c.N3,c.C3,c.N4,c.C4,c.N5,c.C5,c.N6,c.C6,c1.N,c1.C
    from cte6 c
    cross join cte1 c1
    where c1.N not in (c.N1,c.N2,c.N3,c.N4,c.N5,c.N6)
    ) 

,cte8 (N1,C1,N2,C2,N3,C3,N4,C4,N5,C5,N6,C6, N7, C7, N8, C8) as (
    select c.N1,c.C1,c.N2,c.C2,c.N3,c.C3,c.N4,c.C4,c.N5,c.C5,c.N6,c.C6,c.N7,c.C7,c1.N,c1.C
    from cte7 c
    cross join cte1 c1
    where c1.N not in (c.N1,c.N2,c.N3,c.N4,c.N5,c.N6,c.N7)
    ) 

,combinations (combo) as (
    select C1+C2+C3 from cte3
    union all
    select C1+C2+C3+C4 from cte4
    union all
    select C1+C2+C3+C4+C5 from cte5 
    union all
    select C1+C2+C3+C4+C5+C6 from cte6
    union all
    select C1+C2+C3+C4+C5+C6+C7 from cte7
    union all
    select C1+C2+C3+C4+C5+C6+C7+C8 from cte8
    )

select combo
into #temp
from combinations
join dbo.WordList on word = combo 

select distinct combo from #temp order by combo
drop table #temp
select datediff(ms,@timer, getdate())
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Peso avatar image
Peso answered

I have tried both Bob's new code and my code. My code runs in 2.1 seconds and Bob's runs in 0.4 seconds. It's much faster, but also not so flexible as mine. With my suggestion, you only have to change the BETWEEN 3 AND LEN(@Wordstring) of you want to include shorter words than 3 letters. The upper limit is automatically included. With Bob's code, if you want to include shorter words, you have to alter the Combinations cte to include cte1 and cte2. It gets harder if you want to include longer words. You have to edit the code and add cte9, cte10 and so on cte's, and also change the Combinations cte to include the new cte's.

DECLARE @Wordstring VARCHAR(8) = 'Subtlest'

-- Peso
SELECT  	w.Word
FROM    	dbo.Words AS w
CROSS APPLY (
    		SELECT		SUBSTRING(w.Word, v.Number, 1) AS Letter,
    				COUNT(*) AS Items
    		FROM		master.dbo.spt_values AS v
    		WHERE		v.Type = 'P'
    				AND v.Number BETWEEN 1 AND LEN(w.Word)
    		GROUP BY	SUBSTRING(w.Word, v.Number, 1)
    	) AS q
INNER JOIN  (
    		SELECT		SUBSTRING(@Wordstring, Number, 1) AS Letter,
    				COUNT(*) AS Items
    		FROM		master.dbo.spt_values
    		WHERE		Type = 'P'
    				AND Number BETWEEN 1 AND LEN(@Wordstring)
    		GROUP BY	SUBSTRING(@Wordstring, Number, 1)
    	) AS t ON t.Letter = q.Letter
    		AND t.Items >= q.Items
WHERE   	w.Size BETWEEN 3 AND LEN(@Wordstring)
GROUP BY    w.Word,
    	w.Size
HAVING  	MIN(w.Size) = SUM(q.Items)
ORDER BY    w.Size DESC

-- Bob
;WITH cte1(chr1, num1)
AS  (
    	SELECT	chr,
    		num
    	FROM	(
    			VALUES	(1, SUBSTRING(@Wordstring, 1, 1)),
    				(2, SUBSTRING(@Wordstring, 2, 1)),
    				(3, SUBSTRING(@Wordstring, 3, 1)),
    				(4, SUBSTRING(@Wordstring, 4, 1)),
    				(5, SUBSTRING(@Wordstring, 5, 1)),
    				(6, SUBSTRING(@Wordstring, 6, 1)),
    				(7, SUBSTRING(@Wordstring, 7, 1)),
    				(8, SUBSTRING(@Wordstring, 8, 1))
    		) AS c(num, chr)
), cte2(num1, chr1, num2, chr2)
AS  (
    	SELECT		c.num1,
    			c.chr1,
    			c1.num1,
    			c1.chr1
    	FROM		cte1 AS c
    	INNER JOIN	cte1 AS c1 ON c1.num1 NOT IN (c.num1)
), cte3 (num1, chr1, num2, chr2, num3, chr3)
AS  (
    	SELECT		c2.num1,
    			c2.chr1,
    			c2.num2,
    			c2.chr2,
    			c1.num1,
    			c1.chr1
    	FROM		cte2 AS c2
    	INNER JOIN	cte1 AS c1 ON c1.num1 NOT IN (c2.num1, c2.num2)
), cte4 (num1, chr1, num2, chr2, num3, chr3, num4, chr4)
AS  (
    	SELECT		c3.num1,
    			c3.chr1,
    			c3.num2,
    			c3.chr2,
    			c3.num3,
    			c3.chr3,
    			c1.num1,
    			c1.chr1
    	FROM		cte3 AS c3
    	INNER JOIN	cte1 AS c1 ON c1.num1 NOT IN (c3.num1, c3.num2, c3.num3)
), cte5 (num1, chr1, num2, chr2, num3, chr3, num4, chr4, num5, chr5)
AS  (
    	select		c4.num1,
    			c4.chr1,
    			c4.num2,
    			c4.chr2,
    			c4.num3,
    			c4.chr3,
    			c4.num4,
    			c4.chr4,
    			c1.num1,
    			c1.chr1
    	FROM		cte4 AS c4
    	INNER JOIN	cte1 AS c1 ON c1.num1 NOT IN (c4.num1, c4.num2, c4.num3, c4.num4)
), cte6 (num1, chr1, num2, chr2, num3, chr3, num4, chr4, num5, chr5, num6, chr6)
AS  (
    	SELECT		c5.num1,
    			c5.chr1,
    			c5.num2,
    			c5.chr2,
    			c5.num3,
    			c5.chr3,
    			c5.num4,
    			c5.chr4,
    			c5.num5,
    			c5.chr5,
    			c1.num1,
    			c1.chr1
    	FROM		cte5 AS c5
    	INNER JOIN	cte1 AS c1 ON c1.num1 NOT IN (c5.num1, c5.num2, c5.num3, c5.num4, c5.num5)
), cte7 (num1, chr1, num2, chr2, num3, chr3, num4, chr4, num5, chr5, num6, chr6, num7, chr7)
AS  (
    	SELECT		c6.num1,
    			c6.chr1,
    			c6.num2,
    			c6.chr2,
    			c6.num3,
    			c6.chr3,
    			c6.num4,
    			c6.chr4,
    			c6.num5,
    			c6.chr5,
    			c6.num6,
    			c6.chr6,
    			c1.num1,
    			c1.chr1
    	FROM		cte6 AS c6
    	INNER JOIN	cte1 AS c1 ON c1.num1 NOT IN (c6.num1, c6.num2, c6.num3, c6.num4, c6.num5, c6.num6)
), cte8 (num1, chr1, num2, chr2, num3, chr3, num4, chr4, num5,chr5, num6, chr6, num7, chr7, num8, chr8)
AS  (
    	SELECT		c7.num1,
    			c7.chr1,
    			c7.num2,
    			c7.chr2,
    			c7.num3,
    			c7.chr3,
    			c7.num4,
    			c7.chr4,
    			c7.num5,
    			c7.chr5,
    			c7.num6,
    			c7.chr6,
    			c7.num7,
    			c7.chr7,
    			c1.num1,
    			c1.chr1
    	FROM		cte7 AS c7
    	INNER JOIN	cte1 AS c1 ON c1.num1 NOT IN (c7.num1, c7.num2, c7.num3, c7.num4, c7.num5, c7.num6, c7.num7)
), Combinations (s, Combo)
AS  (
    	--SELECT	1, chr1 FROM cte1 UNION ALL
    	--SELECT	2, chr1 + chr2 FROM cte2 UNION ALL
    	SELECT	3, chr1 + chr2 + chr3 FROM cte3 UNION ALL
    	SELECT	4, chr1 + chr2 + chr3 + chr4 FROM cte4 UNION ALL
    	SELECT	5, chr1 + chr2 + chr3 + chr4 + chr5 FROM cte5 UNION ALL
    	SELECT	6, chr1 + chr2 + chr3 + chr4 + chr5 + chr6 FROM cte6 UNION ALL
    	SELECT	7, chr1 + chr2 + chr3 + chr4 + chr5 + chr6 + chr7 FROM cte7 UNION ALL
    	SELECT	8, chr1 + chr2 + chr3 + chr4 + chr5 + chr6 + chr7 + chr8 FROM cte8
)

SELECT  	w.Word
INTO    	#Temp
FROM    	Combinations AS c
INNER JOIN  dbo.Words AS w ON w.Size = c.s
WHERE   	w.Word = c.Combo 

SELECT  	Word
FROM    	#Temp
GROUP BY    Word
ORDER BY    LEN(Word) DESC

DROP TABLE  #Temp
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Peso avatar image
Peso answered

If n denotes how many initial characters there are, this is how you calculate how many permutations there are for n letters to fill p positions in the final word

For example: You have 5 letters and want up to 8-letter words

SUM(5^p) where p between 1 and positions

EQUALS 488280. Then you substract the unwanted short words with letters up to 2 positions (5*5+5 = 30)

So with 5 letters to fill 8 positions, you have a total if 488,250 combination of 5-to-8 letter words.

And with SQL-code

DECLARE @Permutations NUMERIC(38, 0) = 0,
    @Letters TINYINT = 5,
    @MaxWordLength TINYINT = 8

WHILE @MaxWordLength >= 3
    SELECT	@Permutations += POWER(@Letters, @MaxWordLength),
    	@MaxWordLength -= 1

SELECT  @Permutations
2 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

For 8 initial characters to populate 8 positions, there are 19,173,888 permutations.
0 Likes 0 ·
Hi Peso, this is good if I want the number of combinations but I want the actual combinations displayed eg the string 'eri' should generate a result set: eri eir rei rie ier ire and after joining with a word list table it would only return ire.
0 Likes 0 ·
Bob Hovious avatar image
Bob Hovious answered

Here you go. You will need to come up with your own comprehensive list of 3-8 character words and build a table, but the #words table in the example is sufficient to demonstrate the code will work.

if object_id(N'tempdb..#cte1') is not null drop table #cte1;
if object_id(N'tempdb..#words') is not null drop table #words;

create table #words (word char(8) primary key)
insert into #words     -- subtle
select 'sub' union all
select 'tub' union all
select 'but' union all
select 'let' union all
select 'bet' union all
select 'set' union all
select 'lust' union all
select 'bust' union all
select 'blest'union all
select 'bless' union all
select 'tubes' union all
select 'subtle';

select * from #words order by word;


declare @word char(8)
set @word =  'SUBTLEST'

;with tally (N) as (select row_number() over(order by (select null)) from sys.syscolumns)
,cte1 (N,chr) as (select N,substring(@word,N,1) from tally where N <= len(@word))

select * into #cte1 from cte1  -- materialize the substring table

;with cte2 (N1,C1,N2,C2) as (
 select c.N,c.chr,c1.N,c1.chr
 from #cte1 c
 cross join #cte1 c1
 where c.N <> c1.N
 )
,cte3 (N1,C1,N2,C2,N3,C3) as (
 select c.N1,c.C1,c.N2,c.C2,c1.N,c1.chr
 from cte2 c
 cross join #cte1 c1
 where c1.N not in (c.N1,c.N2)
)

,cte4 (N1,C1,N2,C2,N3,C3,N4,C4) as (
 select c.N1,c.C1,c.N2,c.C2,c.N3,c.C3,c1.N,c1.chr
 from cte3 c
 cross join #cte1 c1
 where c1.N not in (c.N1,c.N2,c.N3)
) 

,cte5 (N1,C1,N2,C2,N3,C3,N4,C4,N5,C5) as (
 select c.N1,c.C1,c.N2,c.C2,c.N3,c.C3,c.N4,c.C4,c1.N,c1.chr
 from cte4 c
 cross join #cte1 c1
 where c1.N not in (c.N1,c.N2,c.N3,c.N4)
) 

,cte6 (N1,C1,N2,C2,N3,C3,N4,C4,N5,C5,N6,C6) as (
 select c.N1,c.C1,c.N2,c.C2,c.N3,c.C3,c.N4,c.C4,c.N5,c.C5,c1.N,c1.chr
 from cte5 c
 cross join #cte1 c1
 where c1.N not in (c.N1,c.N2,c.N3,c.N4,c.N5)
) 

,cte7 (N1,C1,N2,C2,N3,C3,N4,C4,N5,C5,N6,C6, N7, C7) as (
 select c.N1,c.C1,c.N2,c.C2,c.N3,c.C3,c.N4,c.C4,c.N5,c.C5,c.N6,c.C6,c1.N,c1.chr
 from cte6 c
 cross join #cte1 c1
 where c1.N not in (c.N1,c.N2,c.N3,c.N4,c.N5,c.N6)
) 

,cte8 (N1,C1,N2,C2,N3,C3,N4,C4,N5,C5,N6,C6, N7, C7, N8, C8) as (
 select c.N1,c.C1,c.N2,c.C2,c.N3,c.C3,c.N4,c.C4,c.N5,c.C5,c.N6,c.C6,c.N7,c.C7,c1.N,c1.chr
 from cte7 c
 cross join #cte1 c1
 where c1.N not in (c.N1,c.N2,c.N3,c.N4,c.N5,c.N6,c.N7)
) 

,combinations (combo) as (
    select C1+C2+C3 from cte3
     union all
    select C1+C2+C3+C4 from cte4
     union all
    select C1+C2+C3+C4+C5 from cte5 
     union all
    select C1+C2+C3+C4+C5+C6 from cte6
     union all
    select C1+C2+C3+C4+C5+C6+C7 from cte7
     union all
    select C1+C2+C3+C4+C5+C6+C7+C8 from cte8
)

select distinct combo
from combinations
join #words on word = combo
order by combo

drop table #words
drop table #cte1
7 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

+1 - top job! I got as far as CTE2 and got side-tracked by the kids... :)
1 Like 1 ·
If only you understood the power of cut-and-paste!!
0 Likes 0 ·
Now I'm going to critique my own solution. It's geared to words of a maximum eight characters. It takes an additional cross join for each additional character in length. So to handle 10-character words, it would require two additional ctes each with an additional cross join.
0 Likes 0 ·
Add a "Length" parameter/column to your CTE's and have that included to the INNER JOIN too. I think it will be somewhat faster.
0 Likes 0 ·
Yeah, it runs faster with shorter strings, even with all 8 ctes firing, and a length test in each would be an improvement. I suppose I could also shut off generating duplicate combinations when the base word has duplicate characters like the two S's and T's in "subtlest". That would eliminate the aggregation caused by using Select DISTINCT. Thanks, Peso. :-)
0 Likes 0 ·
Show more comments
Scot Hauder 2 avatar image
Scot Hauder 2 answered

Great solution Bob! Very similar to mine except 10 yrs ago I didn't know what a tally table was and instead of the ctes I accumulated the different lengths in a temp table. As an aside, I was running it on a 233MGHZ 512MB laptop at the time so it was taking over 3 minutes. Well Wordster only gave you about three minutes to find all the words you could so to speed it up I added a large where clause that excluded every prefix and suffix that was not a word in my 265,000 word list eg anything starting or ending with bb,bc,bd etc for every letter pair. In addition I had to narrow it to only 6,7 and 8 letter words (they were worth the most points anyway) If you want to give me your contact email I can send you the prize cash

4 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

There's a cash prize ??
0 Likes 0 ·
If you post an email address I will contact you to find out where you want it sent. If I post an email, 100 others will contact me saying they are you.
0 Likes 0 ·
Good point. I'll put my work email up on my profile for a while and take it down once you email me. Thanks, Scot.
0 Likes 0 ·
Done, I posted this anonymously so I cannot go in and mark your answer correct
0 Likes 0 ·
Peso avatar image
Peso answered
DECLARE @Letters TABLE
    (
    	Letter CHAR(1) PRIMARY KEY NONCLUSTERED
    )

INSERT  @Letters
VALUES  --('D'),
    --('C'),
    --('N'),
    ('I'),
    ('E'),
    --('A'),
    --('T'),
    ('P')

;WITH Yak(Word, CurrentLetters, MaxLetters)
AS  (
    	SELECT	CAST(Letter AS VARCHAR(MAX)),
    		1,
    		5
    	FROM	@Letters

    	UNION ALL

    	SELECT		y.Word + l.Letter,
    			y.CurrentLetters + 1,
    			y.MaxLetters
    	FROM		Yak AS y
    	CROSS JOIN	@Letters AS l
    	WHERE		LEN(y.Word) < y.MaxLetters
)
SELECT  Word
FROM    Yak
WHERE   CurrentLetters >= 3
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Peso avatar image
Peso answered

Here is another shot

DECLARE @Word VARCHAR(8) = 'SUBTLEST'

DECLARE @Words TABLE
    (
    	Word VARCHAR(8) PRIMARY KEY
    )

INSERT  @Words
VALUES  ('itt'),
    ('sub'),
    ('tub'),
    ('but'),
    ('let'),
    ('bet'),
    ('set'),
    ('lust'),
    ('peter'),
    ('blest'),
    ('bless'),
    ('tubes'),
    ('subtle')

SELECT  	w.Word,
    	CASE COUNT(w.Letter)
    		WHEN COUNT(q.Letter) THEN 'Match'
    		ELSE 'No match'
    	END AS Matching
FROM    	(
    		SELECT		w.Word,
    				SUBSTRING(w.Word, v.Number, 1) AS Letter,
    				ROW_NUMBER() OVER (PARTITION BY w.Word, SUBSTRING(w.Word, v.Number, 1) ORDER BY SUBSTRING(w.Word, v.Number, 1)) AS Ordinal
    		FROM		@Words AS w
    		INNER JOIN	master..spt_values AS v ON v.Type = 'P'
    		WHERE		v.Number BETWEEN 1 AND LEN(w.Word)
    	) AS w
LEFT JOIN   (
    		SELECT	SUBSTRING(@Word, Number, 1) AS Letter,
    			ROW_NUMBER() OVER (PARTITION BY SUBSTRING(@Word, Number, 1) ORDER BY SUBSTRING(@Word, Number, 1)) AS Ordinal
    		FROM	master..spt_values
    		WHERE	Type = 'P'
    			AND Number BETWEEN 1 AND LEN(@Word)
    	) AS q ON q.Letter = w.Letter
    		AND q.Ordinal = w.Ordinal
GROUP BY    w.Word
ORDER BY    w.Word
9 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Have you ever written a query that didn't rock?
1 Like 1 ·
Okay.... I'm agape. It took me 10 minutes to understand how this works. Peso, is this original thought from your fevered brain? If so, please share the thought processes that led to this elegant piece of work.
0 Likes 0 ·
The query is fast, isn't it? ;-)
0 Likes 0 ·
The idea was to get away to calculate all permutations/combinations to the later comparison. The only way (in my opinion) is to do a direct relation between the letters of word in table with the letters of wordstring. This underlying idea is surely used somewhere else, but I think this implementation is mine. With the direct relation in place, we now need a method to separate out duplicate letters, so that a specific word in table doesn't use same letter in wordstring twice. Here ROW_NUMBER is handy.
0 Likes 0 ·
For example BLESS in table has two "s" and sp does the wordstring. This gives four combinations (without the ordinal calculation). With the ordinal calculation (s1 and s2 in word) against s1 and s2 in wordstring, there is no mixup.
0 Likes 0 ·
Show more comments
Scot Hauder 2 avatar image
Scot Hauder 2 answered

Hi Peso, looks promising I will take a look at them after work. This is what I was looking for--something different than my bloated 2000 solution.

2 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

The "another" shot doesn't have to permutate all combinations. It just matches the present characters against the wordstring, letter by letter (in case of duplicates, the ordinal too, so that no letter in word matches same letter in wordstring). That's it. And of course the last check, COUNT vs COUNT, to see that all original letters are found.
0 Likes 0 ·
Scot, have you got a dictionary table with thousands of words in it? If so, how about doing some time trials? I am terribly impressed by Peso's approach. (It really makes you look at the problem in a different way.) But I am curious about how it will scale.
0 Likes 0 ·
Scot Hauder 3 avatar image
Scot Hauder 3 answered

Very nice Peso. I think you unwittingly found the fastest way to sequence the human genome. Which brings me to part 2 of my challenge...

Bob, yes I have a word list, the query needs to be modified to only return the matches, right now it returns all 265,000 words in about 17sec

Your query using subtlest returned 116 words in 685ms (I modified it to order them by longest first since that is what we care about) My original runs about 2 seconds slower than yours

9 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Thanks, Scot. I'm thinking maybe you can buy me a beer now. ;)
0 Likes 0 ·
Peso, I would like to try to apply your concept to another problem with string matching that I'm pondering. Your technique is nasty fast when the number of comparisons aren't gigantic. My hat's off to you.
0 Likes 0 ·
Of course. Keep me updated of your findings. You can find my email address at http://www.developerworkshop.net
0 Likes 0 ·
There is a part 2?
0 Likes 0 ·
Bob's query returns 116 words in 685 ms? And your original query takes about 3 seconds? How long time does my query take?
0 Likes 0 ·
Show more comments
Peso avatar image
Peso answered

Somewhat refined ;-)

DECLARE @Wordstring VARCHAR(8) = 'SUBTLEST'

CREATE TABLE    #Words
    	(
    		Word VARCHAR(8) PRIMARY KEY CLUSTERED
    	)

INSERT  #Words
VALUES  ('itt'),
    ('sub'),
    ('tub'),
    ('but'),
    ('let'),
    ('bet'),
    ('set'),
    ('lust'),
    ('peter'),
    ('blest'),
    ('bless'),
    ('tubes'),
    ('subtle')

-- Peso (CPU 0, Duration 2, Reads 44)
SELECT  	w.Word
FROM    	(
    		SELECT		w.Word,
    				SUBSTRING(w.Word, v.Number, 1) AS Letter,
    				ROW_NUMBER() OVER (PARTITION BY w.Word, SUBSTRING(w.Word, v.Number, 1) ORDER BY v.Number) AS Ordinal
    		FROM		#Words AS w
    		INNER JOIN	master..spt_values AS v ON v.Type = 'P'
    		WHERE		v.Number BETWEEN 1 AND LEN(w.Word)
    				AND LEN(w.Word) <= LEN(@Wordstring)
    	) AS w
LEFT JOIN   (
    		SELECT	SUBSTRING(@Wordstring, Number, 1) AS Letter,
    			ROW_NUMBER() OVER (PARTITION BY SUBSTRING(@Wordstring, Number, 1) ORDER BY Number) AS Ordinal
    		FROM	master..spt_values
    		WHERE	Type = 'P'
    			AND Number BETWEEN 1 AND LEN(@Wordstring)
    	) AS q ON q.Letter = w.Letter
    		AND q.Ordinal = w.Ordinal
GROUP BY    w.Word
HAVING  	COUNT(w.Letter) = COUNT(q.Letter)
ORDER BY    LEN(w.Word) DESC

-- Bob (CPU 1201, Duration 660, Reads 438583)
;WITH Tally(N)
AS  (
    	SELECT	ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    	FROM	sys.syscolumns
),cte1 (N, chr)
AS  (
    	SELECT	N,
    		SUBSTRING(@Wordstring, N, 1)
    	FROM	Tally
    	WHERE	N <= LEN(@Wordstring)
)

SELECT  *
INTO    #cte1
FROM    cte1 -- materialize the substring table

;WITH cte2 (N1, C1, N2, C2) as (
select c.N,c.chr,c1.N,c1.chr
from #cte1 c
cross join #cte1 c1
where c.N <> c1.N
)
,cte3 (N1,C1,N2,C2,N3,C3) as (
select c.N1,c.C1,c.N2,c.C2,c1.N,c1.chr
from cte2 c
cross join #cte1 c1
where c1.N not in (c.N1,c.N2)
)

,cte4 (N1,C1,N2,C2,N3,C3,N4,C4) as (
select c.N1,c.C1,c.N2,c.C2,c.N3,c.C3,c1.N,c1.chr
from cte3 c
cross join #cte1 c1
where c1.N not in (c.N1,c.N2,c.N3)
) 

,cte5 (N1,C1,N2,C2,N3,C3,N4,C4,N5,C5) as (
select c.N1,c.C1,c.N2,c.C2,c.N3,c.C3,c.N4,c.C4,c1.N,c1.chr
from cte4 c
cross join #cte1 c1
where c1.N not in (c.N1,c.N2,c.N3,c.N4)
) 

,cte6 (N1,C1,N2,C2,N3,C3,N4,C4,N5,C5,N6,C6) as (
select c.N1,c.C1,c.N2,c.C2,c.N3,c.C3,c.N4,c.C4,c.N5,c.C5,c1.N,c1.chr
from cte5 c
cross join #cte1 c1
where c1.N not in (c.N1,c.N2,c.N3,c.N4,c.N5)
) 

,cte7 (N1,C1,N2,C2,N3,C3,N4,C4,N5,C5,N6,C6, N7, C7) as (
select c.N1,c.C1,c.N2,c.C2,c.N3,c.C3,c.N4,c.C4,c.N5,c.C5,c.N6,c.C6,c1.N,c1.chr
from cte6 c
cross join #cte1 c1
where c1.N not in (c.N1,c.N2,c.N3,c.N4,c.N5,c.N6)
) 

,cte8 (N1,C1,N2,C2,N3,C3,N4,C4,N5,C5,N6,C6, N7, C7, N8, C8) as (
select c.N1,c.C1,c.N2,c.C2,c.N3,c.C3,c.N4,c.C4,c.N5,c.C5,c.N6,c.C6,c.N7,c.C7,c1.N,c1.chr
from cte7 c
cross join #cte1 c1
where c1.N not in (c.N1,c.N2,c.N3,c.N4,c.N5,c.N6,c.N7)
) 

,combinations (combo) as (
select C1+C2+C3 from cte3
union all
select C1+C2+C3+C4 from cte4
union all
select C1+C2+C3+C4+C5 from cte5 
union all
select C1+C2+C3+C4+C5+C6 from cte6
union all
select C1+C2+C3+C4+C5+C6+C7 from cte7
union all
select C1+C2+C3+C4+C5+C6+C7+C8 from cte8
)

select distinct combo
from combinations
join #words on word = combo 
order by combo

drop table #cte1

-- Clean up
DROP TABLE  #Words
2 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Peso, did you see my comment above about adding another column to the word table? I just need some ideas on how to (quickly) convert a text string to binary. Unfortunate that sql will not let us do bitwise manipulation on two binary types, only one can be binary?
0 Likes 0 ·
Peso, have you tried reading only the eligible words that begin with a character found in the @Wordstring variable? That might be a quicker comparison when you are running against a multithousand word table and the words start with letters A-Z.
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.