I have been running into something that seems like a limit on the number of transactional push subscriptions on one distributor. After reaching this 'magic' number, initializing publications will just hang.
We have 2 databases published on our OLTP server. The distributor is a stand-alone server where it's only job is being a distributor. We have 3 subscriber servers all at the same location as the publisher and distributor. All servers are running Windows 2008, SQL 2008 SP1 CU3. All servers have 64 GB memory and 4 proc quad core processors. All of the publications are transactional one-way only. No Merge replication.
All the tables we replicate are split up into 23 publications per published database. Each publication has 69 subscriptions (46 push, 23 pull). In total there are 92 push subscriptions and 46 pull subscriptions. The reason for the pull subscriptions is to workaround this 'issue' of the initialization of publications hanging during initial setup. The snapshot agent hangs after reintializing a subscription.
Our management has expressed that we will need a 3rd published database setup sometime in the near future. I am afraid that I will have to setup ALL subscriptions as pull subscriptions which I am hesitant to do. Has anyone experienced something like this before? If there is not a limitation on the distributor about a max number of push subscriptions, where is my bottleneck? Where should I look for problems? If this might be a scaling issue where I need to make the distributor more beefy, does anyone know of best practices to scale a distributor?
Thanks for any help! I'm out of ideas myself.
Brannon
-- 1/14/2010 I have a little more information about this.
To workaround the problem, we were creating pull subscriptions. I started moving the pull subscriptions back to push until I got the problem again.
After dropping the pull subscription and creating the push subscription and reinitializing the new subscription, the snapshot agent hangs in replication monitor. The first message for the snapshot agent is after 10 minutes when it says
"The replication agent has not logged a progress message in 10 minutes. This might indicate an unresponsive agent or high system activity. Verify that records are being replicated to the destination and that connections to the Subscriber, Publisher, and Distributor are still active."
The SQL Agent job history for the snapshot agent says this:
"Executed as user: Domain\srvacct. String or binary data would be truncated. [SQLSTATE 22001] (Error 8152). The step failed."
I'm wondering if this is a limit on the number of sql agent jobs allowed on the distributor? Or a limit on the number of replication agent jobs (distrib.exe) running on the distributor?