I have a excel data source which has two coloumns (flg,div). I want to import data from excel data source, before importing i want to validate the coloumns. I have only one condition to validate the excel source. 1. It should have a DISTINCT div. flg div 1 sec a 0 sec b 1 sec c 1 sec d 1 sec a Suppose in the div coloumn there are two sec a, Instead of inserting 2 rows i want to insert only single row of sec a. Somthing like a DISTINCT functionality I guess we can achieve by using a script component to validate the excel sheet before loading. Can please anyone help me with this?? Or is there any other way ..please recommend.
You can use Open DataSource or OpenRowset SELECT DISTINCT * INTO XLImport3 FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0', 'Data Source=C:\test\xltest.xls;Extended Properties=Excel 8.0')...[Customers$] SELECT DISTINCT * INTO XLImport4 FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 'Excel 8.0;Database=C:\test\xltest.xls', [Customers$]) SELECT DISTINCT * INTO XLImport5 FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 'Excel 8.0;Database=C:\test\xltest.xls', 'SELECT * FROM [Customers$]')
If you want to insert distinct values, then validating the excel for distinct values is wasting of processing power if you really want to inset them. For that purposes you can use a **Aggregate Transformation** to group the input data to receive distinct values and then insert them. There is also a possibility to use a **Lookup Transformation with partial caching** for detection whether particular data were already inserted or not and in case of duplicity simply ignore the duplicate rows. In case you want only validate whether there are some duplicities, again you can use Aggregate Transformation, as you can calculate counts and distinct counts and then compare those values to detect whether there are duplicities or not.