# question

## COUNT DISTINCT words in a string.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

·

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

If you have SQL Server 2016 and compat level of the database where you run the script set to 130 then you can simply use the [string_split][1], i.e. select ltrim(rtrim(value)) Word, count(1) WordCount from string_split(@TextDoc, ' ') group by ltrim(rtrim(value)) order by WordCount desc; -- results (abridged) Word WordCount -------------------- ----------- et 289 at 262 In 235 magna 210 Nunc 210 eget 209 ac 209 Donec 209 Pellentesque 183 quis 183 -- etc Hope this helps Oleg [1]: https://msdn.microsoft.com/en-us/library/mt684588.aspx

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Thanks guys. I managed to find a function do the splitting, and used a query to clean the data a bit: --Step 1: Create Function CREATE FUNCTION [dbo].[ufn_WordSplit] ( @string NVARCHAR(MAX) ) RETURNS @output TABLE(splitdata NVARCHAR(MAX) ) BEGIN --Change string to be seperated by commas SET @string = REPLACE(@string, ' ', ',') SET @string = REPLACE(@string, '.',',') --Eliminate double commas SET @string = REPLACE(@string, ',,', ',') DECLARE @start INT, @end INT SELECT @start = 1, @end = CHARINDEX(',',@string) WHILE @start < LEN(@string) + 1 BEGIN IF @end = 0 SET @end = LEN(@string) + 1 INSERT INTO @output (splitdata) VALUES(SUBSTRING(@string, @start, @end - @start)) SET @start = @end + 1 SET @end = CHARINDEX(',', @string, @start) END RETURN END --Step2: SELECT DECLARE @DocumentText NVARCHAR(MAX) EXEC dbo.pr_GetDocumentText @DocumentID = 0, @DocumentText = @DocumentText OUTPUT SELECT TOP 10 x.splitdata ,SUM(x.splitdatacount) AS splitdatacount FROM ( SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(LTRIM(RTRIM(splitdata)),'!',''),'?',''),' ',''),char(9),''),char(10),''),char(13),'') AS splitdata ,COUNT(*) AS splitdatacount FROM dbo.ufn_WordSplit (@DocumentText) --WHERE -- splitdata like '%pellentesque%' OR splitdata like '%magna%' GROUP BY splitdata ) x GROUP BY x.splitdata ORDER BY SUM(x.splitdatacount) DESC

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.