I have been running into something that seems like a limit on the number of transactional push subscriptions on one distributor. After reaching this 'magic' number, initializing publications will just hang.
We have 2 databases published on our OLTP server. The distributor is a stand-alone server where it's only job is being a distributor. We have 3 subscriber servers all at the same location as the publisher and distributor. All servers are running Windows 2008, SQL 2008 SP1 CU3. All servers have 64 GB memory and 4 proc quad core processors. All of the publications are transactional one-way only. No Merge replication.
All the tables we replicate are split up into 23 publications per published database. Each publication has 69 subscriptions (46 push, 23 pull). In total there are 92 push subscriptions and 46 pull subscriptions. The reason for the pull subscriptions is to workaround this 'issue' of the initialization of publications hanging during initial setup. The snapshot agent hangs after reintializing a subscription.
Our management has expressed that we will need a 3rd published database setup sometime in the near future. I am afraid that I will have to setup ALL subscriptions as pull subscriptions which I am hesitant to do. Has anyone experienced something like this before? If there is not a limitation on the distributor about a max number of push subscriptions, where is my bottleneck? Where should I look for problems? If this might be a scaling issue where I need to make the distributor more beefy, does anyone know of best practices to scale a distributor?
Thanks for any help! I'm out of ideas myself.
-- 1/14/2010 I have a little more information about this.
To workaround the problem, we were creating pull subscriptions. I started moving the pull subscriptions back to push until I got the problem again.
After dropping the pull subscription and creating the push subscription and reinitializing the new subscription, the snapshot agent hangs in replication monitor. The first message for the snapshot agent is after 10 minutes when it says
"The replication agent has not logged a progress message in 10 minutes. This might indicate an unresponsive agent or high system activity. Verify that records are being replicated to the destination and that connections to the Subscriber, Publisher, and Distributor are still active."
The SQL Agent job history for the snapshot agent says this:
"Executed as user: Domain\srvacct. String or binary data would be truncated. [SQLSTATE 22001] (Error 8152). The step failed."
I'm wondering if this is a limit on the number of sql agent jobs allowed on the distributor? Or a limit on the number of replication agent jobs (distrib.exe) running on the distributor?
It appears there is indeed an arbitrary limit on the number of replication jobs that can be currently running on the distributor server. Our threshold number was about 120 continuously running subscriptions.
This Microsoft Connect ticket https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=273892&wa=wsignin1.0, describes the same issue and some workarounds. We were able to create more push subscriptions if we put them on a schedule instead of running continuously. We were able to create more continuously running subscriptions if we changed the SQL Agent job type from Distribution to OS Command.
This appears to be a bug that Microsoft does not intend to fix although there are a few workarounds.
answered Jan 19, 2010 at 01:45 PM
Whats your through-put? I ask as the machine sounds like it should be able to handle the amount of subscriptions no problem. I have had over 1000 on a server before now. I would look to see if you have I/O queues or buffer issues. It may even be that the network card/line you are using isn't up to the job.
Without tracing the issues all the way through the server, it is hard to say what the issue is. Unless you are seeing RESOURCE_SEMAPHORE waits, then it is not memory and can probably count that out from the start seeing as you have plenty.
Get a trace running to see where your bottleneck is and work from there.
answered Dec 22, 2009 at 07:13 AM