question

GPO avatar image
GPO asked

Looking for a needle in an SQL haystack

Hi All (SQL 2K5) I have to query a database which contains dozens of tables each of which has dozens of columns. All up about 13000 columns across 600 tables. About 5000 cols are of a text/char-related data type. The table and column names are all but meaningless. There are no foreign keys to give you much of a clue as to which fields to join on. There is no published schema or ER diags. Primary keys do exist ;-). In terms of the number of rows in the various tables, it's not huge. The biggest table has about 800,000 rows. Most are far smaller. When people send me a screenshot of the user interface, and ask me to query a highlighted field, I sometimes have no idea what table it came from. Even if I did, I'd then have to work out what column. All I know is the somewhere in the database, a column in a table will have rows whose value equals (say) "Follow up" or "Work completed" amongst other things. What would be really cool would be a way to loop through every char-based column of every table in a given database and return the table and column name where a given string of characters was found to exist in ANY row (like SELECT TOP 1 then move on). FWIW there is no full-text indexing in place and no prospect of it. Also FWIW I've been back cap-in-hand to the previous person to do this job so often that we're both tiring of it. A lot of what I want to know is stored in his head and more often than not he ends up drip-feeding me vague, incomplete and semi-accurate information. There has to be a better way.
t-sqldynamic-sqlloopinformation_schemafriday-question
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

@GPO - seriously, if the guy isn't playing ball you should get some sort of "assistance" from higher up. That is unprofessional behaviour if he only supplies vague information and can often lead to disciplinary action. Hoarding information for job security often has the inverse effect! I would still suggest that you start some sort of ER diagram - read out the tables, then start manually creating the connections yourself as the information drips in from your colleague. It may also be a good idea to then start moving towards creating those missing foreign keys. It will allow you to improve data quality and, in certain circumstances, query performance. The whole idea of a database is to use PKs, FKs and UCs to uphold referential integrity. If the current design comes from your colleague, he may not really be worth all the knowledge he has in his head.
3 Likes 3 ·
Much easier said than done. I'm new here. There's nobody "higher up" who even knows what SQL stands for, and the people "higher up" seem to quite like the previous incumbent. I agree with your idea of creating my own ER diags, but with thousands of cols it will take some time :-)
0 Likes 0 ·
KenJ avatar image
KenJ answered

This is a quick update of an [ugly query][1] I wrote a number of years ago for SQL 2000. It's not super beautiful, but it does almost exactly what you are looking for.

This updated version searches all tables and character based columns in a database and returns the schema.table_name, the search term, the number of matching rows in the table, and the search query for each table with matching rows. The search query lists all of that table's character columns in the WHERE clause in an OR list.

Once it gives you the table(s) with matching rows, you just copy the source query out of the results and change it to a SELECT * to get a quick look at matching rows, including the matched column(s).

I just ran the query below against AdventureWorks and it showed that Production.ProductDescription has 27 rows that contain the word 'aluminum' and Production.ProductReview has two rows with the same word. A quick conversion of the source query contained in the result set to SELECT * shows the 'descripton' and 'comments' are the matching columns.

**EDIT**

I added one more column containing another query to the result set. This new column is a query that lists all of the character based columns in the table along with the number of rows for which that columnsh contains a match for the search term. The results now look like this:

![alt text][2]

Just copy the entire `column_analysis_query_text` column and execute it from a query window. No more converting to a SELECT * and inspecting to find the matching columns. You'll now see your table name, search term, and a summary of which columns have how many hits, like this result of the aforementioned AdventureWorks results:

![alt text][3]

This is more along the lines of what I always wanted it to do. Thanks again for the impetus to make this change!!

**END EDIT**

I look forward to seeing other solutions posted here!

    DECLARE @searchString nvarchar(100)
    
    SET @searchString = '%aluminum%'
    
    SET NOCOUNT ON
    
    CREATE TABLE ##search_results (result_table nvarchar(1000), search_term NVARCHAR(100), result_count bigint)
    CREATE TABLE ##search_queries (result_table nvarchar(1000), query_text NVARCHAR(MAX), column_analysis_query_text nvarchar(max))
    
    DECLARE @tbl sysname, @schema sysname
    DECLARE @prevTbl sysname, @prevSchema sysname
    
    SELECT TOP 1 
        @tbl = so.name, 
        @schema = ss.name 
    FROM 
        sysobjects AS so 
        INNER JOIN sys.schemas AS ss 
            ON so.uid = ss.schema_id 
        INNER JOIN syscolumns AS sc 
            ON so.id = sc.id 
        INNER JOIN systypes AS st 
            ON sc.xtype = st.xtype
    WHERE 
        so.type = 'u'
        AND st.name IN ('char', 'varchar', 'nchar', 'nvarchar', 'text', 'ntext')
    ORDER BY 
        quotename(ss.name) + '.' + quotename(so.name) DESC
        --order by so.name desc
    
    DECLARE @sql nvarchar(max), @where nvarchar(max), @column_analysis_sql nvarchar(max), @column_analysis_list nvarchar(max)
    
    WHILE @tbl IS NOT NULL AND @schema IS NOT NULL
    
    BEGIN
    
        SET @where = ''
    	SET @column_analysis_list = ''
    	
        SELECT 
            @where = @where + ' OR ' + quotename(sc.name) + ' LIKE ''' + @searchString + '''',
            @column_analysis_list = @column_analysis_list + ', sum(CASE WHEN ' + quotename(sc.name) + ' LIKE ''' + @searchString + ''' THEN 1 ELSE 0 END) AS ' + quotename(sc.name) 
        FROM
            sysobjects AS so
            INNER JOIN sys.schemas AS ss
                ON so.uid = ss.schema_id
            INNER JOIN syscolumns AS sc 
                ON so.id = sc.id 
            INNER JOIN systypes AS st 
                ON sc.xtype = st.xtype
        WHERE so.type = 'u'
            AND st.name IN ('char', 'varchar', 'nchar', 'nvarchar', 'text', 'ntext')
            AND ss.name = @schema
            AND so.name = @tbl
    
        -- get rid of the initial ' OR ' and initial ','
    
        SET @where = substring(@where, 4, len(@where))
        SET @column_analysis_list = substring(@column_analysis_list, 2, len(@column_analysis_list))
        
        SET @sql = 'SELECT ' + quotename(quotename(@schema) + '.' + quotename(@tbl), '''') + ' AS resultsTable, ' + quotename(@searchString, '''') + ' AS searchTerm, count(*) AS resultsCount FROM ' + quotename(@schema) + '.' + quotename(@tbl) + ' WHERE '
    	SET @column_analysis_sql = 'SELECT ' + quotename(quotename(@schema) + '.' + quotename(@tbl), '''') + ' AS resultsTable, ' + quotename(@searchString, '''') + ' AS searchTerm, ' + @column_analysis_list + ' FROM ' + quotename(@schema) + '.' + quotename(@tbl) + ' WHERE '
        
        INSERT ##search_queries (result_table, query_text, column_analysis_query_text) VALUES (quotename(@schema) + '.' + quotename(@tbl), @sql + @where, @column_analysis_sql + @where)
    
        SET @sql = 'INSERT ##search_results ' + @sql + @where
    
        EXEC sp_executesql @statement = @sql
        SELECT @prevTbl = @tbl, @prevSchema = @schema
        SELECT TOP 1
            @tbl = so.name, 
            @schema = ss.name 
        FROM 
            sysobjects AS so 
            INNER JOIN sys.schemas AS ss 
                ON so.uid = ss.schema_id 
            INNER JOIN syscolumns AS sc 
                ON so.id = sc.id 
            INNER JOIN systypes AS st 
                ON sc.xtype = st.xtype
        WHERE
            so.type = 'u'
            AND st.name IN ('char', 'varchar', 'nchar', 'nvarchar', 'text', 'ntext')
            --AND so.name < @tbl
            AND quotename(ss.name) + '.' + quotename(so.name) < QUOTENAME(@schema) + '.' + QUOTENAME(@tbl)
        ORDER BY 
            quotename(ss.name) + '.' + quotename(so.name) DESC
            --order by so.name desc
    
        IF @prevTbl = @tbl AND @prevSchema = @schema
        BEGIN
            SELECT @tbl = NULL, @schema = NULL
        END
    END
    
    SELECT 
        sr.result_table, sr.search_term, sr.result_count, sq.query_text, sq.column_analysis_query_text
    FROM 
        ##search_results AS sr 
        INNER JOIN ##search_queries AS sq 
            ON sr.result_table = sq.result_table 
    WHERE
        sr.result_count > 0 
    ORDER BY 
        sr.result_count DESC, 
        sr.result_table DESC
    
    DROP TABLE ##search_results
    DROP TABLE ##search_queries

[1]: http://kenj.blogspot.com/2009/11/finding-something-somewhere-in-database.html95 [2]: /upfiles/full_database_search_results.png [3]: /upfiles/full_database_search_details.png

2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

+1, awesome answer...
0 Likes 0 ·
I have written something similar 2 year ago when we had to find some values in AS400 DB2 tables over linked server. There was more than 6000 tables and more than 480 000 fields. But we had to search only a subset of the tables.
0 Likes 0 ·
bonskijr avatar image
bonskijr answered
Is this done only ad-hoc(from the looks of it) then you can do yourself a favor and install [Sql Search (free)][1] saved me countless of hours searching for text. Let it index the db (first time) first though. In your situation, getting a screenshot of the application is to use Sql Profiler(filtering that app + hostname) from there you can gather what sps+tables being used. HTH [1]: http://www.red-gate.com/products/SQL_Search/index.htm
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Yup - profile it while the requester runs the app. If I was in Europe and up and about that's where I would have headed in my answer.
0 Likes 0 ·
I love SQL Search for searching schema. I've completely given up the script route for schema searches.
0 Likes 0 ·
GPO avatar image
GPO answered
@bonskijr: I'll have to look into SQL Search - thanks for the tip. Pretty sure I don't have permissions to use the profiler but I'll look into that too. Also I'm reporting off a replication of the live DB so I'm not sure what use the profiler will be. Meanwhile I cooked up the following that I'd love some feedback on. It seems fairly similar to KenJ's. Because these things are pretty slow on databases that are many times the size of adventureworks, I've put a few things in to help take shortcuts where possible. --======= Looking for a needle in a haystack? --======= This (rather untested) code looks for a given string of characters in every text col of every table in your db --======= In large databases this will be very slow but you can speed things up considerably --======= with even a little homeground knowledge of the cols you want to search by excluding cols --======= you know you won't need, that are slow to search (like a disused col which has a million rows of NULLs). declare @counter as int; declare @sql as nvarchar(1000),@sql_timer as nvarchar(1000), @search_string as nvarchar(500),@col_name as nvarchar(500),@table_name as nvarchar(500); set @search_string = '%aluminum%' --======= create a table of columns we want to search IF OBJECT_ID('tempdb..#inf_schema') IS NOT NULL DROP TABLE #inf_schema; select IDENTITY(INT,1,1) AS N ,cls.TABLE_CATALOG ,cls.TABLE_SCHEMA ,cls.TABLE_NAME ,cls.COLUMN_NAME ,cls.ORDINAL_POSITION ,cast(0 as tinyint) as string_found ,cast(NULL as datetime) as column_search_started --let's time it to find the bottlenecks into #inf_schema FROM INFORMATION_SCHEMA.COLUMNS cls JOIN INFORMATION_SCHEMA.TABLES tbs ON cls.TABLE_NAME = tbs.TABLE_NAME WHERE DATA_TYPE in('char','varchar','nchar','nvarchar') and CHARACTER_MAXIMUM_LENGTH > 2 --pretty unlikely that I'll want to search on cols whose max length is < 3. and TABLE_TYPE = 'BASE TABLE' --don't care about views --======= here you could speed things up if you knew of slow tables or columns you didn't want to search --and cls.TABLE_NAME not like '%' --and cls.TABLE_NAME not like '%' --and case when cls.TABLE_NAME like '%' and cls.COLUMN_NAME = '%' then 1 -- when cls.TABLE_NAME like '%' and cls.COLUMN_NAME = '%' then 1 -- else 0 end = 0 --and cls.TABLE_SCHEMA not like '%' --and cls.COLUMN_NAME not like '%' --you'd only use this where you were convinced that EVERY col with that name pattern could be excluded. ORDER BY newid() --randomise the order. This allows you to select random samples during development in a large database, --to help estimate how long it might take to run overall. CREATE unique clustered index ix_tmp_inf_schema on #inf_schema(N); -- SELECT * FROM #inf_schema SET @counter = 1 set @col_name = (select top 1 '[' + COLUMN_NAME + ']' FROM #inf_schema WHERE @counter = N) set @table_name = (select top 1 TABLE_SCHEMA + '.[' + TABLE_NAME + ']' FROM #inf_schema WHERE @counter = N) WHILE @col_name is not null --or if you're just picking a random sample of 100 cols, use WHILE @counter
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

I get an error: Msg 245, Level 16, State 1, Line 6 Conversion failed when converting the nvarchar value '%aluminum%' to data type tinyint. I think you need `WHERE string_found = 1` as your last line
1 Like 1 ·
@Fatherjack: You're absolutely right. I had all sorts of trouble pasting the code in - particularly the start and the finish. I ended up writing the last line or two in manually and got it wrong. I'll go back and edit. Sincere apologies. Is there a knack to pasting code into this? I've had trouble before.
0 Likes 0 ·
TimothyAWiseman avatar image
TimothyAWiseman answered

There are a couple of really good answers posted up here, but let me add one. I wrote an article about something very similar to this a while ago at: https://www.sqlservercentral.com/articles/finding-values-with-numerous-columns

1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Thanks Pavel. Guess I forgot to do that.
0 Likes 0 ·
Mister Magoo avatar image
Mister Magoo answered
You might try SSMS tools as well - it has a built in data search that has options for searching for data at the start/end/contained/equal and tells you how many times it has found the search data and in which column...aside from all it's other great features. And no, I am not associated with the software, I just use it every day (not for searching)... You will find it here : www.ssmstoolspack.com
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

I missed this tool last time I re-imaged my machine. I had forgotten about the data search. Outstanding!
0 Likes 0 ·
Yes, it is an excellent add-in. I also love the way it keeps a history of all my queries (And also now saves even the ones I don't run!) - which I also synchronise with other machines so that it matters not where I am - I always have my sql history right there with me...but this is not the place for this...sorry!
0 Likes 0 ·
dvroman avatar image
dvroman answered
I've found that looking through stored procedures and views gives a good indicator of the table and link structures. This won't give you the total answer, but it will give you some pointers and may save in your searches.
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.