On a client workstation, I have just extracted a page of text from a PDF via OCR and stored the results as a CSV. Parsed into words, the file contains roughly 300 to 400 words -- around 1,200 to 1,800 characters on average.
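To make the rest concrete, here is a minimal sketch of the parsing step in Python. The exact column layout of the OCR output is an assumption here; every cell is simply treated as free text, and the function name and splitting rule are my own choices, not anything prescribed by the OCR tool.

```python
import csv
import re

def extract_words(csv_file):
    """Flatten every cell of an OCR-result CSV into a single word list.

    `csv_file` is any iterable of CSV lines (an open file, io.StringIO, ...).
    """
    words = []
    for row in csv.reader(csv_file):
        for cell in row:
            # Split on runs of non-alphanumerics to shed OCR punctuation noise.
            words.extend(w for w in re.split(r"[^A-Za-z0-9]+", cell) if w)
    return words
```

Usage would be something like `with open("page.csv", newline="") as f: words = extract_words(f)`, after which `len(words)` lands in that 300-to-400 range.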
In an Internet-facing (non-local) SQL Server 2008 database I have an indexed column called name. A name can contain one or more words. There could be hundreds or thousands of unique names (probably no more than 1,000 in most cases).
I need the most efficient way to find all names that are contained in my CSV.
Since SQL Server is not local, my idea is to pass the extracted text to a stored procedure as a single text parameter. The procedure would split the text into words, check whether any name begins with a given word, and if so, retrieve the whole name and see whether the following words in the text match the rest of the name. This means running roughly 300 to 400 simple queries, which should be fast since they all execute on the server.
Another idea would be to insert the list of words from my CSV into a SQL Server table and then use a single query to join it against the names for matches.
Or I could pull all the names down to the workstation into an array and do the matching there.
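For that last option, the matching itself could look like the sketch below (in Python, though any client language would do). It treats each multi-word name as a sequence and looks for it as a contiguous run in the document's word list; the case-insensitive comparison is an assumption meant to mirror SQL Server's default collation, and the function name is mine.

```python
def find_names_in_text(names, text_words):
    """Return every name whose words appear consecutively in text_words."""
    # Lowercase both sides for case-insensitive matching, roughly matching
    # SQL Server's default case-insensitive collation (an assumption).
    words = [w.lower() for w in text_words]
    found = []
    for name in names:
        name_words = name.lower().split()
        n = len(name_words)
        # Slide a window of n words across the document and compare.
        if any(words[i:i + n] == name_words
               for i in range(len(words) - n + 1)):
            found.append(name)
    return found
```

With 300 to 400 words and at most a few thousand names this brute-force scan is trivially fast; the real cost of this option is pulling the name list over the wire, which is why it only pays off if the list is cached between pages.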
Is there a more elegant way than the ones I listed to accomplish my objective?