Wednesday, 20 September 2017

Why Are My Integrations Running So Slow?

We've just resolved an issue with one of our production servers which is taking an extremely long time to integrate some very small, infrequent messages. The integration scenario is as follows:-

Source System
  • Exports .XML files of between 1k and 5k at a rate of about one file per minute.
  • The files contain all information needed in the destination system so no enrichment will be required.
  • The maximum node depth in the message is five deep.

Destination System
  • Inserts the data present in the XML files into two SQL Server 2005 database tables through a single stored procedure.
  • Although the tables have primary keys and other indexes and have around one million records the performance of the insert stored procedure indicates that insertion time should be around one second.

Integration
  • A file system receive port is monitoring for .XML files on the source server.
  • A WCF-Custom / SQL port is executing the stored procedure on the destination server.
  • The source system, destination system and BizTalk server are all on the same LAN which is performing well.
  • The source system and destination systems are performing well.
  • The integration writes the message to a file send location as an archive as well as processing the message through a simple orchestration to write to the destination system.
  • The integration normally executes as soon as the file is written and completes in one to two seconds.

Issue
  • The file is being created in the source system folder but is not read by BizTalk for up to twenty minutes.
  • No errors are reported in the application event log.
  • No running instances are created until the file is read.
  • No suspended instances are created.
  • Other than the time delay, no other problems exist.
So, how do we diagnose this?

First of all I needed to eliminate problems with the source folder. What if the file system event is not firing?

Well, the folder in question is a sub-folder of an overall "integration" share. Another BizTalk server is happily reading from other sub-folders in that share. Furthermore, the folder is also used by the source application to process incoming files. Files written to this folder are instantly read by the application, indicating that folder performance is unlikely to be an issue.

So, the next problem is, if the source file system is not at fault, why isn't BizTalk picking up the file and removing it?

My next thought is that maybe the source application is holding a lock on the file so BizTalk can't access the file until the lock has been released. It seems unlikely - BizTalk is logging no errors or warnings - but it's something I am lucky enough to be able to check by dropping my own .XML file into the folder. Sure enough it is not picked up by BizTalk either. It looks like the source application is off the hook on this one.

So, next I move my accusing finger to point to the destination application. If BizTalk is reporting no errors, perhaps the message is being delivered successfully but is not visible in the destination application. That this is not the case can be show in a number of ways...

1) The file hasn't left the source directory
2) The archive version of the file is not created
3) No records exist when querying the database using tSQL
4) A SQL profiler trace shows no activity at all from BizTalk

Yup, the destination system is off the hook too. This is a BizTalk issue. So why isn't BizTalk logging the problem?

Well, maybe BizTalk isn't logging the problem because it can't. Is our server out of disk space? Does it have any other issues?

A quick trawl through the event logs, through Windows Explorer and Task Manager show the server to be in tip-top condition. Checking BizTalk Admin console and the event logs also shows that it's not throttling either. It's just running very, very, very slowly.

So we've eliminated the source, the destination and the BizTalk server. What's left? The back end database server. Maybe it's out of space, is running slow, has a problem?

Again, a quick trawl through the event logs, through Windows Explorer and Task Manager show the server to be in tip-top condition. Opening SQL Server Management Studio takes a matter of moments and everything looks to be performing well.

Our mystery is almost at an end folks. We're just a couple of clicks away from the solution.

SQL Agent. Responsible for running the archiving, cleaning up and general performance enhancing jobs for the SQL back end of BizTalk. It's running OK, the service is started. It's a shame that the jobs themselves are all owned by the guy that built the server. Who left a while ago. Who had his AD account deleted a while ago!

Fixing the jobs took a matter of seconds, changing the owner to a much more suitable setting. Each job runs successfully as I watch. And flipping back to the source system I notice the backlog of files has already been cleared.

The lesson? What we do is hard. But we can make it much, much harder by doing the basics badly!!

No comments:

Post a Comment