x

Phil Factor SQL Speed Phreak Challenge #4 - The Log Parsing Problem

It must be that time again!

Phil Factor SQL Speed Phreak Challenge #4 is here, and our friends at Red Gate have offered some truly excellent prizes this time:

First Prize: $100 and a license for SQL Data Generator

There will also be three categorised runner-up prizes:

  • Best Cursor-based solution
  • Best CLR solution
  • Best Unusual/experimental attempt

The winners of these prizes will get a licensed copy of either SQL Prompt or SQL Refactor, whichever they prefer.

Of course, the first place entry will also have privilege and honour of displaying this rather nice trophy jpg on your blog / website / bedroom wall in perpetuity.

Speed Phreak Trophy

So, the task...

It turns out that Robert Bar has a younger brother. Reggie Bar was asked by his company to so some analytics work on their web logs. Reggie started by using the information in this article to use LogParser to import the data from his IIS logs into the database. So far, so good.

Reggie then got hold of a geolocation IP database from here. Again, so far, so good.

That’s where the good times ended, however. Reggie decided that the best way to approach this task was to use cursors, and even on a small amount of data the solution is taking longer and longer to execute. Management aren’t happy, they need a solution that is fast and won’t get hugely slower over time.

The data

The main data sources are two standard sources of information – the output of LogParser (in the table [dbo].[IISAudit]) and the geolocation IP address (in the table [dbo].[ip_group_country]). The data for both of these tables is kindly being hosted by Peso here. Also included in the data file are the DDL for the tables, the BCP commands to import the data, and Reggie’s solution to the problem.

The data (update)

There is now available another table IISAudit2, which is exactly the same as the old table, except for the fact that it adds a sequence number to the data, to aid sorting the rows in a consistent and repeatable manner (something you would probably do if loading this data in from your own IIS server anyway). Feel free to use this in your solutions if it makes it easier for you - the file is available (again kindly hosted by Peso) here.

The data (update 2)

Peso found a bug in Reggie's code, where the lookup on the ip_group_country got it wrong for one value because it used < rather than <=. New code, both data files, DDL & bcp commands, again kindly hosted by Peso, are here.

The output

The output required is three result sets:

  • Visitor summary by country by day
  • Visitor summary by country
  • Page summary by day

Note that in this challenge, the order of the data in the result sets does matter.

The visitor summaries include the following calculations:

  • VisitCount – this is the number of unique visits. A visit is one or more page requests from the same IP address within a set time limit. The time limit is configurable through the @SessionTimeout parameter in Reggie’s solution. Note that only pages with the extensions .asp, .aspx, .htm, .html and .php are considered as page views.
  • PageCount – this is the total number of pages viewed. Again note that only pages with the extensions .asp, .aspx, .htm, .html and .php are considered as page views.
  • MaxPagesPerVisit – this is the highest number of pages per visit.
  • AvgPagesPerVisit – this is the average number of pages per visit, to 2 decimal places.
  • MaxTimeOnSite – this is the maximum time that any particular visitor spent on the site. The time on site is defined as the difference between the first and last page load for a particular visit.
  • AvgTimeOnSite - this is the average time that any particular visitor spent on the site.
  • GoalCount – this is the number of times that any of a particular set of pages were loaded. A single visit may build up multiple goals, and viewing the same goal page multiple times counts as multiple goals.
  • BounceRate – this is the number of visits where only a single page was loaded, divided by the total number of visits, and expressed as a percentage to two decimal places.

The page summary includes the following calculations:

  • PageCount – this is the number of times that the page was loaded that day.
  • EntryRate – this is the number of times that the page was the first page of the visit, divided by PageCount, and expressed as a percentage to two decimal places.
  • ExitRate – this is the number of times that the page was the last page of the visit, divided by PageCount, and expressed as a percentage to two decimal places.
  • BounceRate – this is the number of times that the page was both the first and the last page of the visit, divided by PageCount, and expressed as a percentage to two decimal places.

Note that for the country summary by day report, the day to which a visit is added is the day on which the visit started.

To lookup the geolocation data for an IP A.B.C.D, you need to find the row in [dbo].[ip_group_country] that has the highest [ip_start] value that is less than or equal to:

(A * 16777216) + (B * 65536) + (C * 256) + D

The rules

The rules for the challenge are here.

The finishing date: Midnight GMT 28th February 2010

Submissions must be in by 28th Feb 2010

Bootnote

Reggie Bar's solution will only return the correct result from a table loaded directly from the source data. He really should have put an order by on the cursor select, shouldn't he? *cough*

Leaderboard

+---------------------------------+----------+----------+----------+----------+----------+----------+
| Name                            | Min      | Avg      | Med      | Max      | Total    | Runs     |
+---------------------------------+----------+----------+----------+----------+----------+----------+
| Matts CLR Not An Entry          |      292 |      376 |      301 |    1,807 |    7,522 |       20 |
| Peso 2c                         |    1,718 |    1,863 |    1,779 |    3,572 |   37,263 |       20 |
| Peso 2a (Invalid)               |    1,681 |    2,385 |    1,925 |    4,745 |   47,706 |       20 |
| Dave1b (Almost Valid!!)         |    2,259 |    2,466 |    2,269 |    5,967 |   49,334 |       20 |
| Dave1a (Invalid)                |    2,469 |    2,533 |    2,483 |    3,287 |   50,660 |       20 |
| Peso 3a (Invalid)               |    2,697 |    2,863 |    2,723 |    4,927 |   57,279 |       20 |
| Peso 1a 20100216 (Invalid)      |    2,855 |    3,020 |    2,866 |    4,037 |   60,417 |       20 |
| Phil 1b (Invalid)               |    3,743 |    4,013 |    3,865 |    6,788 |   80,275 |       20 |
| Phil 1a (Invalid)               |    5,040 |    5,422 |    5,172 |   10,274 |  108,451 |       20 |
+---------------------------------+----------+----------+----------+----------+----------+----------+

Result

Congratulations to Peso - his 2c entry is the winner!

more ▼

asked Feb 11, 2010 at 10:49 AM in Default

Matt Whitfield gravatar image

Matt Whitfield ♦♦
29.4k 61 65 87

A visit is one or more page requests from the same IP address within a set time limit...We should include POSTs as a page request, correct?
Feb 12, 2010 at 01:16 AM Scot Hauder
Scot - correct - any sort of activity from that same IP keeps the session going...
Feb 12, 2010 at 07:46 AM Matt Whitfield ♦♦
I wouldn't do this in reality, because an entire enterprise or ISP can share an IP address. You actually have to look closely at the UserAgent to work out individual visitors as well as the IP address, but I suspect that Reggie didn't know about that.
Feb 13, 2010 at 04:27 PM Phil Factor
Agreed - Ideally you'd get the session ID and use that...
Feb 13, 2010 at 06:40 PM Matt Whitfield ♦♦
Even looking at UserAgent isn't the ideal way Phil because in an enterprise they all tend to be on the same one. Session ID is a good alternative however if I tend to have a requirement for user tracking through a system I have some code that saves user details on a page load ( ASP.NET/C#) that stores details of the user logged in or otherwise which I can then correlate to the stats.
Feb 14, 2010 at 06:20 AM Ian Roke
(comments are locked)
10|1200 characters needed characters left

11 answers: sort voted first

Peso 2c 2010-02-23

Same setup and teardown as Peso 2a (single index).

The main code

SET NOCOUNT ON

DECLARE @SessionTimeout INT = 900

CREATE TABLE #Goal ( theURL VARCHAR(255), CONSTRAINT PK_Goal PRIMARY KEY CLUSTERED ( theURL ) )

INSERT #Goal ( theURL ) VALUES ('ASPNETPage29.aspx'), ('ASPNETPage9.aspx')

CREATE TABLE #Stats ( RowID INT IDENTITY(1, 1), CONSTRAINT PK_Stats PRIMARY KEY CLUSTERED ( RowID ), cIP VARCHAR(15) NOT NULL, DateStr CHAR(10) NOT NULL, PageTime DATETIME NOT NULL, isGoal TINYINT NOT NULL, theURL VARCHAR(255) NOT NULL, CountryCode CHAR(2) NOT NULL, theVisit INT, isBounce TINYINT, isEntry TINYINT, isExit TINYINT, [PageCount] INT, theTime INT )

INSERT #Stats ( cIP, DateStr, PageTime, isGoal, theURL, CountryCode ) SELECT i.cIP, i.DateStr, CAST(i.DateStr + ' ' + i.TimeStr AS DATETIME) AS PageTime, CASE WHEN g.theURL IS NULL THEN 0 ELSE 1 END AS isGoal, i.csUriStem AS theURL, igc.CountryCode FROM dbo.IISAudit2 AS i LEFT JOIN #Goal AS g ON g.theURL = i.csUriStem CROSS APPLY ( SELECT TOP(1) w.Country_Code FROM dbo.Ip_Group_Country AS w WHERE w.ip_start <= CAST(16777216 AS BIGINT) * PARSENAME(i.cIP, 4) + 65536 * PARSENAME(i.cIP, 3) + 256 * PARSENAME(i.cIP, 2) + PARSENAME(i.cIP, 1) ORDER BY w.ip_start DESC ) AS igc(CountryCode) WHERE PARSENAME(i.csUriStem, 1) IN ('asp', 'aspx', 'htm', 'html', 'php') ORDER BY i.cIP, i.Seq_No

DROP TABLE #Goal

DECLARE @IP VARCHAR(15), @PageTime DATETIME, @Visit INT = 0

UPDATE #Stats SET @Visit = theVisit = CASE WHEN cIP <> @IP THEN @Visit + 1 WHEN DATEDIFF(SECOND, @PageTime, PageTime) < @SessionTimeout THEN @Visit ELSE @Visit + 1 END, @IP = cIP, @PageTime = PageTime

UPDATE s SET s.isBounce = CASE WHEN q.minID = s.RowID THEN q.isBounce END, s.isEntry = CASE WHEN q.minID = s.RowID THEN 1 END, s.isExit = CASE WHEN q.maxID = s.RowID THEN 1 END, s.[PageCount] = CASE WHEN q.minID = s.RowID THEN q.[PageCount] END, s.theTime = CASE WHEN q.minID = s.RowID THEN q.theTime END, s.PageTime = q.theDate FROM #Stats AS s INNER JOIN ( SELECT w.theVisit, MIN(w.minID) AS minID, MAX(w.maxID) AS maxID, CASE MAX(CASE WHEN s.RowID = w.minID THEN s.theURL ELSE NULL END) WHEN MAX(CASE WHEN s.RowID = w.maxID THEN s.theURL ELSE NULL END) THEN 1 ELSE 0 END AS isBounce, COUNT(*) AS [PageCount], MAX(w.theTime) AS theTime, DATEDIFF(DAY, 0, MIN(theDate)) AS theDate FROM ( SELECT theVisit, MIN(RowID) AS minID, MAX(RowID) AS maxID, DATEDIFF(SECOND, MIN(PageTime), MAX(PageTime)) AS theTime, MIN(PageTime) AS theDate FROM #Stats GROUP BY theVisit ) AS w INNER JOIN #Stats AS s ON s.theVisit = w.theVisit GROUP BY w.theVisit ) AS q ON q.theVisit = s.theVisit

-- 1 SELECT CountryCode, PageTime AS DayOnly, COUNT(DISTINCT theVisit) AS VisitCount, SUM([PageCount]) AS [PageCount], MAX([PageCount]) AS MaxPagesPerVisit, CAST(1.0E * SUM([PageCount]) / COUNT(DISTINCT theVisit) AS DECIMAL(12, 2)) AS AvgPagesPerVisit, MAX(theTime) AS MaxTimeOnSite, SUM(theTime) / COUNT(DISTINCT theVisit) AS AvgTimeOnSite, SUM(isGoal) AS GoalCount, CAST(100.0E * SUM(CASE WHEN [PageCount] = 1 THEN 1 ELSE 0 END) / COUNT(DISTINCT theVisit) AS DECIMAL(12, 2)) AS BounceRate FROM #Stats GROUP BY CountryCode, PageTime ORDER BY PageTime, CountryCode

-- 2 SELECT CountryCode, COUNT(DISTINCT theVisit) AS VisitCount, SUM([PageCount]) AS [PageCount], MAX([PageCount]) AS MaxPagesPerVisit, CAST(1.0E * SUM([PageCount]) / COUNT(DISTINCT theVisit) AS DECIMAL(12, 2)) AS AvgPagesPerVisit, MAX(theTime) AS MaxTimeOnSite, SUM(theTime) / COUNT(DISTINCT theVisit) AS AvgTimeOnSite, SUM(isGoal) AS GoalCount, CAST(100.0E * SUM(CASE WHEN [PageCount] = 1 THEN 1 ELSE 0 END) / COUNT(DISTINCT theVisit) AS DECIMAL(12, 2)) AS BounceRate FROM #Stats GROUP BY CountryCode ORDER BY CountryCode

-- 3 SELECT uriStem, PageDay, SUM([PageCount]) AS [PageCount], CAST(100.0E * SUM(isEntry) / SUM([PageCount]) AS DECIMAL(12, 2)) AS EntryRate, CAST(100.0E * SUM(isExit) / SUM([PageCount]) AS DECIMAL(12, 2)) AS ExitRate, CAST(100.0E * SUM(isBounce) / SUM([PageCount]) AS DECIMAL(12, 2)) AS BounceRate FROM ( SELECT theURL AS uriStem, PageTime AS PageDay, 0 AS [PageCount], SUM(isEntry) AS isEntry, SUM(isExit) AS isExit, SUM(isBounce) AS isBounce FROM #Stats GROUP BY PageTime, theUrl

        UNION ALL

        SELECT      theURL AS uriStem,
                DateStr AS PageDay,
                COUNT(*) AS [PageCount],
                0 AS isEntry,
                0 AS isExit,
                0 AS isBounce
        FROM        #Stats
        GROUP BY    DateStr,
                theUrl
    ) AS d

GROUP BY PageDay, uriStem ORDER BY PageDay, uriStem

DROP TABLE #Stats
more ▼

answered Feb 23, 2010 at 09:24 AM

Peso gravatar image

Peso
1.6k 5 6 8

It's getting close now!!
Feb 23, 2010 at 06:34 PM Matt Whitfield ♦♦
I did compare with COMP and they are exact matches, using IISAudit2 table and the new code from Reggie with ORDER BY seq_no added to the CURSOR definition.
Feb 23, 2010 at 07:31 PM Peso
+1 Got this to work. Very fast good effort Peso. I can only dream of writing SQL like you! :-D
Feb 23, 2010 at 07:33 PM Ian Roke
(comments are locked)
10|1200 characters needed characters left

Ok, I'll go first then. I am sure this will give some ideas to other how to solve different tasks. This is "Peso 1a 20100216". Matt, start to time this to give some sort of baseline.

The setup is extensive

ALTER TABLE dbo.IISAudit
ALTER COLUMN    DateStr DATETIME

ALTER TABLE dbo.IISAudit
ALTER COLUMN    TimeStr DATETIME

ALTER TABLE dbo.IISAudit
ADD 	TimeSpan AS CAST(86400 AS BIGINT) * DATEDIFF(DAY, 0, DateStr) + DATEDIFF(SECOND, 0, TimeStr) PERSISTED

ALTER TABLE dbo.IISAudit
ADD 	pageTime AS DateStr + TimeStr PERSISTED

ALTER TABLE dbo.IISAudit
ADD 	ValidExt AS	CASE SUBSTRING(csUriStem, CHARINDEX('.', csUriStem), 5)
    				WHEN '.asp' THEN 1
    				WHEN '.aspx' THEN 1
    				WHEN '.htm' THEN 1
    				WHEN '.html' THEN 1
    				WHEN '.php' THEN 1
    				ELSE 0
    			END PERSISTED

CREATE INDEX    IX_ValidExt
ON  	dbo.IISAudit
    	(
    		ValidExt
    	)
INCLUDE 	(
    		cIP,
    		TimeSpan,
    		DateStr,
    		pageTime,
    		csUriStem
    	)

CREATE UNIQUE
INDEX   	UCX_IP
ON  	dbo.ip_group_country
    	(
    		ip_start
    	)
INCLUDE 	(
    		country_code
    	)

The teardown is just ridicolous

DROP INDEX  IX_ValidExt
ON  	dbo.IISAudit

ALTER TABLE dbo.IISAudit
DROP COLUMN ValidExt

ALTER TABLE dbo.IISAudit
DROP COLUMN pageTime

ALTER TABLE dbo.IISAudit
DROP COLUMN TimeSpan

DROP INDEX  UCX_IP
ON  	dbo.ip_group_country

ALTER TABLE dbo.IISAudit
ADD 	wTemp CHAR(10)

GO

UPDATE  dbo.IISAudit
SET wTemp = CONVERT(CHAR(10), DateStr, 120)

ALTER TABLE dbo.IISAudit
ALTER COLUMN    DateStr VARCHAR(255)

UPDATE  dbo.IISAudit
SET DateStr = wTemp

UPDATE  dbo.IISAudit
SET wTemp = CONVERT(CHAR(8), TimeStr, 8)

ALTER TABLE dbo.IISAudit
ALTER COLUMN    TimeStr VARCHAR(255)

UPDATE  dbo.IISAudit
SET TimeStr = wTemp

ALTER TABLE dbo.IISAudit
DROP COLUMN wTemp

And here is the main part

SET NOCOUNT ON

DECLARE @SessionTimeout INT = 900

CREATE TABLE #Goal ( theURL VARCHAR(255), CONSTRAINT PK_Goal PRIMARY KEY CLUSTERED ( theURL ) )

INSERT #Goal ( theURL ) VALUES ('ASPNETPage29.aspx'), ('ASPNETPage9.aspx')

CREATE TABLE #Stats ( theIP VARCHAR(15), visitID INT, pageDate DATETIME, pageTime DATETIME, theURL VARCHAR(255), isGoal TINYINT, Country_Code VARCHAR(2), theFirst INT, Bounce INT, MaxPages INT, theTime INT )

CREATE INDEX IX_Stats ON #Stats (theIP, visitID, pageTime)

INSERT #Stats ( theIP, visitID, pageDate, pageTime, theURL, isGoal, Bounce ) SELECT i.cIP AS theIP, i.TimeSpan / @SessionTimeout AS VisitID, i.DateStr AS pageDate, i.pageTime, i.csUriStem AS theURL, CASE WHEN g.theURL IS NULL THEN 0 ELSE 1 END AS isGoal, 0 AS Bounce FROM dbo.IISAudit AS i LEFT JOIN #Goal AS g ON g.theURL = i.csUriStem WHERE i.ValidExt = 1

DROP TABLE #Goal

DECLARE curYak CURSOR FOR SELECT l.theIP, l.visitID AS fromID, r.visitID AS toID FROM ( SELECT theIP, visitID, MAX(pageTime) AS pageTime FROM #Stats GROUP BY theIP, visitID ) AS l INNER JOIN ( SELECT theIP, visitID, MIN(pageTime) AS pageTime FROM #Stats GROUP BY theIP, visitID ) AS r ON r.theIP = l.theIP AND r.visitID = l.visitID + 1 WHERE DATEDIFF(SECOND, l.pageTime, r.pageTime) <= @SessionTimeout ORDER BY l.visitID

DECLARE @IP VARCHAR(15), @fromID INT, @toID INT

OPEN curYak

FETCH NEXT FROM curYak INTO @IP, @fromID, @toID

WHILE @@FETCH_STATUS = 0 BEGIN UPDATE #Stats SET visitID = @toID WHERE theIP = @IP AND visitID = @fromID

    FETCH  NEXT
    FROM   curYak
    INTO   @IP,
       @fromID,
       @toID
END

CLOSE curYak DEALLOCATE curYak

CREATE TABLE #Lookup ( theIP VARCHAR(15), Country_Code VARCHAR(2) )

INSERT #Lookup ( theIP ) SELECT theIP FROM #Stats GROUP BY theIP

UPDATE l SET l.Country_Code = igc.Country_Code FROM #Lookup AS l CROSS APPLY ( SELECT TOP(1) w.Country_Code FROM dbo.Ip_Group_Country AS w WHERE w.ip_start <= CAST(16777216 AS BIGINT) * PARSENAME(l.theIP, 4) + 65536 * PARSENAME(l.theIP, 3) + 256 * PARSENAME(l.theIP, 2) + PARSENAME(l.theIP, 1) ORDER BY w.ip_start DESC ) AS igc

UPDATE s SET s.Country_Code = l.Country_Code FROM #Stats AS s INNER JOIN #Lookup AS l ON l.theIP = s.theIP

DROP TABLE #Lookup

UPDATE f SET theFirst = a, MaxPages = t, theTime = DATEDIFF(SECOND, theMin, theMax) FROM ( SELECT theFirst, ROW_NUMBER() OVER (PARTITION BY theIP, VisitID ORDER BY PageTime) AS a, MaxPages, COUNT(theURL) OVER (PARTITION BY theIP, visitID) AS t, theTime, MIN(pageTime) OVER (PARTITION BY theIP, visitID) AS theMin, MAX(pageTime) OVER (PARTITION BY theIP, visitID) AS theMax FROM #Stats ) AS f

UPDATE s SET s.Bounce = 1 FROM #Stats AS s INNER JOIN ( SELECT theIP, visitID, MAX(CASE theFirst WHEN 1 THEN theURL ELSE NULL END) AS theURL FROM #Stats WHERE theFirst IN (1, MaxPages) GROUP BY theIP, visitID HAVING MIN(theURL) = MAX(theURL) ) AS d ON d.theIP = s.theIP AND d.visitID = s.visitid WHERE s.theURL = d.theURL

-- 1 SELECT Country_Code, PageDate AS DayOnly, COUNT(DISTINCT visitID) AS VisitCount, COUNT(theURL) AS [PageCount], MAX(MaxPages) AS MaxPagesPerVisit, CAST(AVG(1.0E * MaxPages) AS DECIMAL(12, 2)) AS AvgPagesPerVisit, MAX(theTime) AS MaxTimeOnSite, AVG(theTime) AS AvgTimeOnSite, SUM(isGoal) AS GoalCount, CAST(100.0E * SUM(Bounce) / COUNT(*) AS DECIMAL(12, 2)) AS BounceRate FROM #Stats GROUP BY Country_Code, PageDate ORDER BY PageDate, Country_Code

-- 2 SELECT Country_Code, COUNT(DISTINCT visitID) AS VisitCount, COUNT(theURL) AS [PageCount], MAX(MaxPages) AS MaxPagesPerVisit, CAST(AVG(1.0E * MaxPages) AS DECIMAL(12, 2)) AS AvgPagesPerVisit, MAX(theTime) AS MaxTimeOnSite, AVG(theTime) AS AvgTimeOnSite, SUM(isGoal) AS GoalCount, CAST(100.0E * SUM(Bounce) / COUNT(*) AS DECIMAL(12, 2)) AS BounceRate FROM #Stats GROUP BY Country_Code ORDER BY Country_Code

-- 3 SELECT theURL AS uriStem, pageDate AS PageDay, COUNT(*) AS [PageCount], CAST(100.0E * SUM(CASE WHEN theFirst = 1 THEN 1 ELSE 0 END) / COUNT(*) AS DECIMAL(12, 2)) AS EntryRate, CAST(100.0E * SUM(CASE WHEN theFirst = MaxPages THEN 1 ELSE 0 END) / COUNT(*) AS DECIMAL(12, 2)) AS ExitRate, CAST(100.0E * SUM(Bounce) / COUNT(*) AS DECIMAL(12, 2)) AS BounceRate FROM #Stats GROUP BY theURL, PageDate ORDER BY PageDate, theURL

DROP TABLE #Stats
more ▼

answered Feb 16, 2010 at 04:07 AM

Peso gravatar image

Peso
1.6k 5 6 8

Peso, nice job as always...are we allowed to update a table in the setup by adding columns and not have it count towards the execution time?
Feb 16, 2010 at 09:21 AM Scot Hauder
@Scot - I need to ask Phil to clarify the rules around that!
Feb 16, 2010 at 09:46 AM Matt Whitfield ♦♦
My feeling is that, ingenious though it is, Peso is bending the rules a little bit by doing so much computation in the set up section. The real-life example would include the import of the logs into the file, but we decided it was a bit too dull an activity to include in the competition. However, the time to load will be slower by the same amount as is gained by doing the computed columns. It will certainly shave a bit off his execution time. for this competition, I think we'll just carry on, because the rest of us can work the same tricks.
Feb 16, 2010 at 02:22 PM Phil Factor
I rather like experimenting with computed columns so it is nice to see it in the competition and it is part of the spirit of 'anything goes'. Would it be possible to run a timing with, and without the process of adding the two computed columns? I think it is a good idea in principle not to time the setup, but we're probably going to have to tighten up our definition of what is setup and what isn't for the next competition.
Feb 16, 2010 at 02:25 PM Phil Factor
The harness doesn't support adding timings for setup at the moment - and I probably won't have time to alter that because I'm on a contract with 7am starts... but we certainly could make two versions, one where that code is in the setup and one where it isn't...
Feb 16, 2010 at 02:37 PM Matt Whitfield ♦♦
(comments are locked)
10|1200 characters needed characters left

/*  Phil1a --runs in 3.5 secs on my machine not including set up and tear down.  Sorry, didn't spot the code for the goal stuff but as it seemed to be all  zeros in Reggie's results I just put a zero in. i'll add it for the next try  DECLARE @timer TABLE  (  Time_ID INT,  whatWasIt VARCHAR(30),  whenWasIt DATETIME DEFAULT GETDATE()  )*/

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#CountrySummary%' ) DROP TABLE #CountrySummary CREATE TABLE #CountrySummary ( CountryCode varchar, DayOnly [datetime], VisitCount [int], [PageCount] [int], MaxPagesPerVisit [int], AvgPagesPerVisit [decimal](12, 2), MaxTimeOnSite [int], AvgTimeOnSite [int], GoalCount [int], BounceRate [decimal](12, 2) )

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#CountryOverallSummary%' ) DROP TABLE #CountryOverallSummary CREATE TABLE #CountryOverallSummary ( CountryCode varchar, VisitCount [int], [PageCount] [int], MaxPagesPerVisit [int], AvgPagesPerVisit [decimal](12, 2), MaxTimeOnSite [int], AvgTimeOnSite [int], GoalCount [int], BounceRate [decimal](12, 2) )

----this is the main table used for processing the relevant log entries IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#session%' ) DROP TABLE #session CREATE TABLE #session ( Surrogate INT IDENTITY(1, 1), TheDateTime DATETIME, URIStem VARCHAR(20), TheIP VARCHAR(15), session INT DEFAULT 0, firstdot INT, Seconddot INT, Thirddot INT, BigIP BIGINT, CountryCode VARCHAR(2), IsLastHit INT DEFAULT 0, IsFirstHit INT )

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#countries%' ) DROP TABLE #countries CREATE TABLE #Countries ( surrogate_id INT IDENTITY(1, 1), Previous_IP BIGINT DEFAULT 0, IP_start BIGINT CONSTRAINT PK_Goal PRIMARY KEY CLUSTERED, countryCode VARCHAR(2) )

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#aggregate%' ) DROP TABLE #aggregate CREATE TABLE #aggregate ( [Pagecount] INT, [day] DATETIME, TheIP VARCHAR(15), Session INT, LastHit DATETIME, FirstHit DATETIME, CountryCode VARCHAR(2) )

-- INSERT INTO @timer (WhatWasIt) SELECT 'start'

DECLARE @SessionTimeout INT, --how many seconds to assume new session @Session INT, --session number @LastDateTime DATETIME, --the date/time of the previous hit @ThePreviousIP VARCHAR(15),--the IP of the previous hit @firstdot INT, --the place in the IP string of the first dot @Seconddot INT,--the place in the IP string of the Second dot @Thirddot INT,--the place in the IP string of the third dot @bigIP BIGINT,--the int rendering of an IP address @IsFirstHit INT

SELECT @SessionTimeout = 900, @Session = 0, @LastDateTime = '1 jan 2000', @ThePreviousIP = '' --firstly, we get the relevant parts of the log table in a lightweight --temporary table INSERT INTO #session (TheDateTime, TheIP, URIStem) SELECT TheDateTime, TheIP, csuriStem FROM (SELECT CONVERT(DATETIME, datestr + ' ' + Timestr, 120) AS TheDateTime, LEFT(cip, 15) AS TheIP, csuriStem FROM iisaudit WHERE RIGHT(csuriStem, 5) IN ('.aspx', '.html') OR RIGHT(csuriStem, 4) IN ( '.asp', '.htm', '.php') ) f ORDER BY TheIP, TheDateTime --and work out which seesion everything belongs to. This code relies --on a Quirky update

UPDATE #session SET @Session = session = CASE WHEN TheIP <> @ThePreviousIP OR DATEDIFF(Second, @LastDateTime, TheDateTime) > @SessionTimeout THEN @Session + 1 ELSE @Session END, @isFirstHit = isFirstHit = CASE WHEN TheIP <> @ThePreviousIP OR DATEDIFF(Second, @LastDateTime, TheDateTime) > @SessionTimeout THEN 1 ELSE 0 END, @LastDateTime = TheDateTime, @ThePreviousIP = TheIP, @firstdot = firstdot = CHARINDEX('.', TheIP, 1), @Seconddot = Seconddot = CHARINDEX('.', TheIP, @firstdot + 1), @Thirddot = Thirddot = CHARINDEX('.', TheIP, @SecondDot + 1), @BigIP = BigIP = CONVERT(BIGINT, 16777216) * (LEFT(TheIP, @FirstDot - 1)) + CONVERT(BIGINT, 65536) * (SUBSTRING(TheIP, @FirstDot + 1, @SecondDot - @FirstDot - 1)) + CONVERT(BIGINT, 256) * (SUBSTRING(TheIP, @SecondDot + 1, @ThirdDot - @SecondDot - 1)) + CONVERT(BIGINT, SUBSTRING(TheIP, @ThirdDot + 1, 20))

-- To lookup the geolocation data for an IP A.B.C.D, you need to find the row in [dbo].[ip_group_country] -- that has the highest [ip_start] value that is less than or equal to: -- (A * 16777216) + (B * 65536) + (C * 256) + D -- INSERT INTO @timer (WhatWasIt) SELECT 'start loading countries' --now we stock a country table in order to work out where the IP is from

INSERT INTO #Countries (IP_Start, CountryCode) SELECT ip_Start, Country_Code FROM ip_group_country ORDER BY ip_start --now we chamelessly do a quirky update to get the -- bigint version of the previous IP address -- INSERT INTO @timer (WhatWasIt) SELECT 'update countries'

SELECT @BigIP = 0 UPDATE #countries SET @BigIP = Previous_IP = @BigIP, @BigIP = IP_Start

--create index IPStartPreviousIndex on #countries(Previous_IP,IP_Start)-- --create index BigIP_CoverTheDateTime on #Session(BigIP,TheDateTime)

--we only want to do this once as it is costly so we put --it in a temporary table -- INSERT INTO @timer (WhatWasIt) SELECT 'do the aggregate'

INSERT INTO #aggregate SELECT COUNT(*), DATEADD(d, DATEDIFF(d, 0, [TheDateTime]), 0) AS day, TheIP, session, MAX(TheDateTime), MIN(TheDateTime), c.countryCode FROM #session s INNER JOIN #countries c ON BigIP BETWEEN Previous_IP AND IP_Start GROUP BY TheIP, session, DATEADD(d, DATEDIFF(d, 0, [TheDateTime]), 0), c.countryCode

-- INSERT INTO @timer (WhatWasIt) SELECT 'find last hits' UPDATE #session SET islasthit = 1 FROM #session s INNER JOIN #aggregate ON lastHit = theDateTime -- INSERT INTO @timer (WhatWasIt) SELECT 'report aggregates' --and now we create the result tables.

INSERT INTO [#CountrySummary] ([CountryCode], [DayOnly], [VisitCount], [PageCount], [MaxPagesPerVisit], [AvgPagesPerVisit], [MaxTimeOnSite], [AvgTimeOnSite], goalcount, [BounceRate]) SELECT countryCode, day, COUNT(*), SUM([PageCount]), MAX([PageCount]), AVG([PageCount] * 1.0), MAX(DATEDIFF(s, [FirstHit], [LastHit])), AVG(DATEDIFF(s, [FirstHit], [LastHit])), 0, SUM(CASE WHEN [PageCount] = 1 THEN 1 ELSE 0 END) * 100.0 / (COUNT(*) * 1.0) FROM #aggregate GROUP BY [CountryCode], day

INSERT INTO [#CountryOverallSummary] ([CountryCode], [VisitCount], [PageCount], [MaxPagesPerVisit], [AvgPagesPerVisit], [MaxTimeOnSite], [AvgTimeOnSite], [GoalCount], [BounceRate]) SELECT countryCode, COUNT(*), SUM([PageCount]), MAX([PageCount]), AVG([PageCount] * 1.0), MAX(DATEDIFF(s, [FirstHit], [LastHit])), AVG(DATEDIFF(s, [FirstHit], [LastHit])), 0, SUM(CASE WHEN [PageCount] = 1 THEN 1 ELSE 0 END) * 100.0 / (COUNT(*) * 1.0) FROM #aggregate GROUP BY [CountryCode]

Select * from [#CountryOverallSummary]

Select * from [#CountrySummary]

SELECT uriStem, DATEADD(d, DATEDIFF(d, 0, [TheDateTime]), 0) AS PageDay, COUNT(*) AS pagecount, CONVERT([decimal](12, 2), SUM(isFirstHit) * 100.0 / COUNT(*)) AS [EntryRate], CONVERT([decimal](12, 2), SUM(isLastHit) * 100.0 / COUNT(*)) AS [ExitRate], CONVERT([decimal](12, 2), SUM(CASE WHEN isFirstHit + isLastHit = 2 THEN 1 ELSE 0 END) * 100.0 / COUNT(*)) AS [BounceRate] FROM #session GROUP BY uriStem, DATEADD(d, DATEDIFF(d, 0, [TheDateTime]), 0) ORDER BY PageDay, uristem

/*-- INSERT INTO @timer (WhatWasIt) SELECT 'end' SELECT * FROM @timer*/

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#CountrySummary%' ) DROP TABLE #CountrySummary

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#CountryOverallSummary%' ) DROP TABLE #CountryOverallSummary

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#session%' ) DROP TABLE #session

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#countries%' ) DROP TABLE #countries

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#aggregate%' ) DROP TABLE #aggregate
more ▼

answered Feb 16, 2010 at 08:45 PM

Phil Factor gravatar image

Phil Factor
3.8k 8 9 16

This runs in about 5 seconds on my machine which is hugely impressive. The Goal URL score could be put in the aggregation stage so it wouldn't be too much of an additional overhead.
Feb 17, 2010 at 10:18 AM Ian Roke
Thanks for that. I haven't done any work in optimizing it yet so there is a potential for shaving quite a bit off the timings. Last time I checked, the main report was exactly the same as Reggie's but the country reports were quite different.
Feb 17, 2010 at 01:41 PM Phil Factor
Phil - do you want me to split this into setup, main & teardown and add deletes from the temporary table to the beginning of main? Also with the results - could well be the ordering issue, because there are multiple page requests from the same IP in the same second, which idiot me didn't notice previously... I thought of adding an IDENTITY to the set after bulk loading - what do you think?
Feb 17, 2010 at 02:19 PM Matt Whitfield ♦♦

As this is the first cut, I'd say do the timings on the code as it is if is less trouble. We only want rough timings at this stage.

I'd say leave the import table as is now. I reckon it is good practice to take data in as near as dammit to what it was like when it was in the import file. It is then part of the task to cope with things like this. This is why I put in the surrogate key into the session table. The problem is finding a primary key that will actually add value. It doesn't help to use the surrogate, of course.
Feb 17, 2010 at 03:45 PM Phil Factor
I did the algorithm to find the source of the IP address using excel and even a pocket calculator, and I still get different results to Reggie, so I must have got my understanding of the process entirely wrong.
Feb 17, 2010 at 03:45 PM Phil Factor
(comments are locked)
10|1200 characters needed characters left

/*Phil1b --runs in 2.5 secs on my machine not including set up and tear down. Again forgot the code for the goal stuff. It is a lot more accurate this time, and where I've checked it against pesos and Reggie's code, mine is more accurate than either.

DECLARE @timer TABLE ( Time_ID INT IDENTITY(1, 1), whatWasIt VARCHAR(30), whenWasIt DATETIME DEFAULT GETDATE() ) */

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#CountrySummary%' ) DROP TABLE #CountrySummary CREATE TABLE #CountrySummary ( CountryCode varchar, DayOnly [datetime], VisitCount [int], [PageCount] [int], MaxPagesPerVisit [int], AvgPagesPerVisit [decimal](12, 2), MaxTimeOnSite [int], AvgTimeOnSite [int], GoalCount [int], BounceRate [decimal](12, 2) )

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#CountryOverallSummary%' ) DROP TABLE #CountryOverallSummary CREATE TABLE #CountryOverallSummary ( CountryCode varchar, VisitCount [int], [PageCount] [int], MaxPagesPerVisit [int], AvgPagesPerVisit [decimal](12, 2), MaxTimeOnSite [int], AvgTimeOnSite [int], GoalCount [int], BounceRate [decimal](12, 2) )

----this is the main table used for processing the relevant log entries IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#session%' ) DROP TABLE #session /*CREATE TABLE #session ( Surrogate INT IDENTITY(1, 1), TheDay int, TheDateTime DATETIME, URIStem VARCHAR(20), TheIP VARCHAR(15), session INT DEFAULT 0, firstdot INT, Seconddot INT, Thirddot INT, BigIP BIGINT, IsLastHit INT DEFAULT 0, IsFirstHit INT )*/

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#countries%' ) DROP TABLE #countries CREATE TABLE #Countries ( surrogate_id INT IDENTITY(1, 1), Next_IP BIGINT DEFAULT 0, IP_start BIGINT, countryCode VARCHAR(2), CONSTRAINT idxClusterd PRIMARY KEY (IP_Start DESC) )

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#aggregate%' ) DROP TABLE #aggregate CREATE TABLE #aggregate ( [Pagecount] INT, [day] INT, TheIP VARCHAR(15), bigIP BIGINT, Session INT, LastHit DATETIME, FirstHit DATETIME, LastSessionID INT, CountryCode VARCHAR(2) )

-- INSERT INTO @timer (WhatWasIt) SELECT 'start'

DECLARE @SessionTimeout INT, --how many seconds to assume new session @Session INT, --session number @LastDateTime DATETIME, --the date/time of the previous hit @ThePreviousIP VARCHAR(15),--the IP of the previous hit @firstdot INT, --the place in the IP string of the first dot @Seconddot INT,--the place in the IP string of the Second dot @Thirddot INT,--the place in the IP string of the third dot @bigIP BIGINT,--the int rendering of an IP address @IsFirstHit INT, @NextCountry VARCHAR(2)

SELECT @SessionTimeout = 900, @Session = 0, @LastDateTime = '1 jan 2000', @ThePreviousIP = '' --firstly, we get the relevant parts of the log table in a lightweight --temporary table SELECT IDENTITY( INT,1, 1 ) AS Session_ID, CONVERT(INT, 0) AS TheDay, TheDateTime, CONVERT (VARCHAR(20), csURIStem) AS URIStem, CONVERT(VARCHAR(15), TheIP) AS TheIP, 0 AS session, 0 AS firstdot, 0 AS Seconddot, 0 AS Thirddot, CONVERT(BIGINT, 0) AS BigIP, 0 AS IsLastHit, 0 AS IsFirstHit, CONVERT(VARCHAR(2), ' ') AS countrycode INTO #session FROM (SELECT CONVERT(DATETIME, datestr + ' ' + Timestr, 120) AS TheDateTime, LEFT(cip, 15) AS TheIP, csuriStem FROM iisaudit WHERE RIGHT(csuriStem, 5) IN ('.aspx', '.html') OR RIGHT(csuriStem, 4) IN ( '.asp', '.htm', '.php') ) f ORDER BY TheIP, TheDateTime --and work out which seesion everything belongs to. This code relies --on a Quirky update

UPDATE #session SET @Session = session = CASE WHEN TheIP <> @ThePreviousIP OR DATEDIFF(Second, @LastDateTime, TheDateTime) > @SessionTimeout THEN @Session + 1 ELSE @Session END, TheDay = DATEDIFF(d, 0, [TheDateTime]), @isFirstHit = isFirstHit = CASE WHEN TheIP <> @ThePreviousIP OR DATEDIFF(Second, @LastDateTime, TheDateTime) > @SessionTimeout THEN 1 ELSE 0 END, @LastDateTime = TheDateTime, @ThePreviousIP = TheIP, @firstdot = firstdot = CHARINDEX('.', TheIP, 1), @Seconddot = Seconddot = CHARINDEX('.', TheIP, @firstdot + 1), @Thirddot = Thirddot = CHARINDEX('.', TheIP, @SecondDot + 1), @BigIP = BigIP = CONVERT(BIGINT, 16777216) * (LEFT(TheIP, @FirstDot - 1)) + CONVERT(BIGINT, 65536) * (SUBSTRING(TheIP, @FirstDot + 1, @SecondDot - @FirstDot - 1)) + CONVERT(BIGINT, 256) * (SUBSTRING(TheIP, @SecondDot + 1, @ThirdDot - @SecondDot - 1)) + CONVERT(BIGINT, SUBSTRING(TheIP, @ThirdDot + 1, 20))

-- To lookup the geolocation data for an IP A.B.C.D, you need to find the row in [dbo].[ip_group_country] -- that has the highest [ip_start] value that is less than or equal to: -- (A * 16777216) + (B * 65536) + (C * 256) + D -- INSERT INTO @timer (WhatWasIt) SELECT 'start loading countries' --now we stock a country table in order to work out where the IP is from

INSERT INTO #Countries (IP_Start, CountryCode) SELECT ip_Start, Country_Code FROM ip_group_country

--now we chamelessly do a quirky update to get the -- bigint version of the previous IP address -- INSERT INTO @timer (WhatWasIt) SELECT 'update countries'

SELECT @BigIP = 0, @NextCountry = '' UPDATE #countries SET @BigIP = Next_IP = @BigIP - 1, @BigIP = IP_Start

--we only want to do this once as it is costly so we put --it in a temporary table -- INSERT INTO @timer (WhatWasIt) SELECT 'do the aggregate' INSERT INTO #aggregate SELECT COUNT(*), Theday, MAX(TheIP), MAX(BigIP), session, MAX(TheDateTime), MIN(TheDateTime), MAX(Session_ID), 0--, countryCode FROM #session GROUP BY session, TheDay --, countryCode --now do the countrycode UPDATE #aggregate SET countrycode = c.countrycode FROM #aggregate s INNER JOIN #countries c ON BigIP BETWEEN IP_Start AND Next_IP

-- INSERT INTO @timer (WhatWasIt) SELECT 'find last hits' UPDATE #session SET islasthit = 1 FROM #session s INNER JOIN #aggregate ON session_ID = LastSessionID -- INSERT INTO @timer (WhatWasIt) SELECT 'report aggregates' --and now we create the result tables.

/*INSERT INTO [#CountrySummary] ([CountryCode], [DayOnly], [VisitCount], [PageCount], [MaxPagesPerVisit], [AvgPagesPerVisit], [MaxTimeOnSite], [AvgTimeOnSite], goalcount, [BounceRate])*/ SELECT countryCode, DATEADD(d, day, 0), COUNT(*), SUM([PageCount]), MAX([PageCount]), AVG([PageCount] * 1.0), MAX(DATEDIFF(s, [FirstHit], [LastHit])), AVG(DATEDIFF(s, [FirstHit], [LastHit])), 0, SUM(CASE WHEN [PageCount] = 1 THEN 1 ELSE 0 END) * 100.0 / (COUNT(*) * 1.0) FROM #aggregate GROUP BY [CountryCode], day ORDER BY countrycode, day

/*INSERT INTO [#CountryOverallSummary] ([CountryCode], [VisitCount], [PageCount], [MaxPagesPerVisit], [AvgPagesPerVisit], [MaxTimeOnSite], [AvgTimeOnSite], [GoalCount], [BounceRate])*/ SELECT countryCode, COUNT(*), SUM([PageCount]), MAX([PageCount]), AVG([PageCount] * 1.0), MAX(DATEDIFF(s, [FirstHit], [LastHit])), AVG(DATEDIFF(s, [FirstHit], [LastHit])), 0, SUM(CASE WHEN [PageCount] = 1 THEN 1 ELSE 0 END) * 100.0 / (COUNT(*) * 1.0) FROM #aggregate GROUP BY [CountryCode] ORDER BY countrycode

--Select * from [#CountrySummary]

SELECT uriStem, DATEADD(d, DATEDIFF(d, 0, [TheDateTime]), 0) AS PageDay, COUNT(*) AS pagecount, CONVERT([decimal](12, 2), SUM(isFirstHit) * 100.0 / COUNT(*)) AS [EntryRate], CONVERT([decimal](12, 2), SUM(isLastHit) * 100.0 / COUNT(*)) AS [ExitRate], CONVERT([decimal](12, 2), SUM(CASE WHEN isFirstHit + isLastHit = 2 THEN 1 ELSE 0 END) * 100.0 / COUNT(*)) AS [BounceRate] FROM #session GROUP BY uriStem, DATEADD(d, DATEDIFF(d, 0, [TheDateTime]), 0) ORDER BY PageDay, uristem

-- INSERT INTO @timer (WhatWasIt) SELECT 'end' /*SELECT t.WhatWasIt, DATEDIFF(ms, t.WhenWasIt, y.WhenWasIt) FROM @timer t INNER JOIN @timer y ON y.time_ID = t.time_ID + 1 UNION ALL SELECT 'total', DATEDIFF(ms, (SELECT WhenWasIt FROM @timer WHERE WhatWasIt = 'start' ), (SELECT WhenWasIt FROM @timer WHERE WhatWasIt = 'end' )) */

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#CountrySummary%' ) DROP TABLE #CountrySummary

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#CountryOverallSummary%' ) DROP TABLE #CountryOverallSummary

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#session%' ) DROP TABLE #session

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#countries%' ) DROP TABLE #countries

IF EXISTS ( SELECT * FROM tempDB.information_schema.tables WHERE Table_name LIKE '#aggregate%' ) DROP TABLE #aggregate
more ▼

answered Feb 20, 2010 at 08:16 PM

Phil Factor gravatar image

Phil Factor
3.8k 8 9 16

This is definitely closer - There's a few foibles now (for example, country 'BA' in the country summary shows 9 visits in your results, and it should be 8), but definitely much closer! Also the ordering in your country by day summary should be day, country and not country, day... But this is definitely the closest :)
Feb 21, 2010 at 12:50 PM Matt Whitfield ♦♦
But is Phil allowed to enter his own competition? ;-p
Feb 21, 2010 at 03:25 PM Ian Roke
Of course! :) At the rate he's going, he's going to be the winner!
Feb 21, 2010 at 04:23 PM Matt Whitfield ♦♦
What is the correct ORDER BY clause for Reggies code, so we can compare our results? I get parallellism all over the places so it's hard to know.
Feb 21, 2010 at 05:50 PM Peso
C'mon you others. I'm feeling lonely. Isn't anyone else going to enter the competition?
Feb 21, 2010 at 07:15 PM Phil Factor
(comments are locked)
10|1200 characters needed characters left
more ▼

answered Feb 21, 2010 at 06:32 PM

Matt Whitfield gravatar image

Matt Whitfield ♦♦
29.4k 61 65 87

+1 Why is this not an entry? This is by far the quickest thing I have managed to run on the code. Very impressive! :-)
Feb 23, 2010 at 07:32 PM Ian Roke
Matt is hosting this competition...
Feb 23, 2010 at 07:45 PM Peso
Yeah, it doesn't feel right entering a competition that I'm hosting. Plus the setup leans heavily on the computed column concept introduced by Peso... I just want to show people that the CLR definitely does have a place in solving problems like this one (and that you can come up with much quicker solutions in terms of the time taken to implement a very fast solution - i.e. no need to do multiple revisions, program it right first time, job done.)
Feb 23, 2010 at 08:06 PM Matt Whitfield ♦♦
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x985
x343
x14
x8
x7

asked: Feb 11, 2010 at 10:49 AM

Seen: 6485 times

Last Updated: Feb 27, 2010 at 09:00 PM