question

PCChick2011 avatar image
PCChick2011 asked

What is the best string manipulation software?

Do you think it's reasonable to use SQL to automate string manipulation when files coming in are never the same? I have used SAS, Excel, and FoxPro in the past to do this type data cleansing and just want to make sure I'm headed down the right path in learning the manipulation techniques in SQL. Things that would need to be performed: - cleansing & parsing addresses - calculations on dates/dates & times for employment records - merging of records - identifying duplicates - reporting a summary of data issues and resolutions. These are the most common things. Please any advice would be helpful on what would be a development software to use or if i should continue to learn the sql techniques necessary.
sql-server-2005
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Grant Fritchey avatar image
Grant Fritchey answered
If I had to work like that, I'd use PowerShell. It gives you a ton of flexibility and it works with SQL Server directly very well.
4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

PCChick2011 avatar image PCChick2011 commented ·
really? I'll look into that. Do you know any good tutorials?
0 Likes 0 ·
ThomasRushton avatar image ThomasRushton ♦♦ commented ·
Can you recommend a good book on Powershell? It's something that's missing from my armoury!
0 Likes 0 ·
Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
The best book I know is the one by Don Jones, Powershell 2.0: TFM. http://www.amazon.com/Windows-PowerShell-2-0-Don-Jones/dp/0982131429/ref=sr_1_2?ie=UTF8&s=books&qid=1294231389&sr=8-2 Also, Don offers classes all over the country. He's an excellent instructor.
0 Likes 0 ·
PCChick2011 avatar image PCChick2011 commented ·
Thank you.
0 Likes 0 ·
Kev Riley avatar image
Kev Riley answered
This sounds like a job for SQL Server Integration Services (SSIS)
4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Kev Riley avatar image Kev Riley ♦♦ commented ·
Get yourself a good book - I'm still at the bottom of the learning curve for SSIS myself! But running through the examples in a good text has taught me lots. The ones I have are the 2005 Wrox one , and
3 Likes 3 ·
PCChick2011 avatar image PCChick2011 commented ·
I have attempted this in the past and couldn't figure it out. I tried creating a SSIS package but I couldn't figure out how to edit it once i had created it. I would love to use this because it's kind of like SAS where you can visually see the steps taken with objects of sorts on the diagram. I originally used the import wizard to create a package, but could not find the package after creation for editing. I am probably missing some fundamental piece of knowledge to get me going or maybe several.
0 Likes 0 ·
Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
Or go to training. I'd recommend PragmaticWorks for the training. They're excellent.
0 Likes 0 ·
PCChick2011 avatar image PCChick2011 commented ·
thanks for the comments. I will take a look at the links.
0 Likes 0 ·
ThomasRushton avatar image
ThomasRushton answered
[Datenfabrik][1] produce what looks like a good data cleanser for [Address][2] information, among other SSIS plug-ins. I'm not a customer, but I did see a demo of it at [SQLBits 7][3] last year. [1]: http://www.datenfabrik.com/ [2]: http://www.datenfabrik.com/Produkte/datenfabrik-address.aspx [3]: http://www.sqlbits.com/events/event7/SQLBitsVII.aspx
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

PCChick2011 avatar image PCChick2011 commented ·
hmmm..never heard of that. I'll check into it. Thanks.
0 Likes 0 ·
Fatherjack avatar image
Fatherjack answered
Where is the data being collected? Its fine and dandy to have PoSh, SSIS, Excel Macros, Python, Perl, C# and any number of other methods to catch and cleanse data but where it is collected is the best place to fix the issue. Get the UI improved to 'encourage' the user to enter valid, well formatted values that then support the database functions rather than distract effort from other areas just to clean up a mess. I know its not easy but I would seriously be reporting the time/effort spent on sorting the mess so that someone can see that a bit of time from a developer could make things permanently a lot smoother later down the data life-cycle.
4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

ThomasRushton avatar image ThomasRushton ♦♦ commented ·
Good point. And fixing the front end will also prevent users from dirtying up data that you've just cleaned...
0 Likes 0 ·
Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
I agree. Fixing it at the source is the best approach, but speaking from experience, where I work, for example, we have 30 year old systems that are not going to get fixed, ever, but we're still using them as a data source. Sometimes, you have to fix it after the fact.
0 Likes 0 ·
Magnus Ahlkvist avatar image Magnus Ahlkvist commented ·
+1 Great answer. I was like others thinking "What should I recommend as a tool for *fixing* the data, while you come up with what everyone should be thinking in the first place: *Why does the data need fixing*.
0 Likes 0 ·
PCChick2011 avatar image PCChick2011 commented ·
I totally agree with your points on fixing the data on the front end but unfortunately our business entails received data in excel spreadsheets, csv's, payroll systems, etc. where we have no control over the entry first hand. We don't have access to the front end or the user entering the data. In the future we plan to have the ability to direct the person exporting the data, but as of right now my hands are tied with the fact that I have to scrub the data on receipt and integrate it into our database. make sense? Hopefully now you can see where I'm coming from. Thanks for your comments.
0 Likes 0 ·
TimothyAWiseman avatar image
TimothyAWiseman answered
As other answers have made clear, there are lots of options for this. I run into this regularly, and I personally use either [Python][1] or IronPython. I personally find Python to be more intuitive and easier to use than things like PowerShell, more flexible than SSIS, and skips the compile step for languages like C#. Now, with that said, PowerShell, SSIS, and C# are all fine tools that I have also used for this purpose depending on the details of the situation. And in particular, if you want to do substantially string manipulation inside of SQL Server for some reason, then most likely your best option is to create a CLR assembly in C# or VB.NET and attach it to SQL server. So, I do not think there is a best string manipulation software all around. All other things being equal, I use and recommend Python, but there are certainly times when SSIS, PowerShell, C#, etc. are very good candidates. One of the key questions is, which of the many tools available are you most comfortable with? [1]: http://www.simple-talk.com/sql/database-administration/python-for-the-sql-server-dba/
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.