Why is it considered a best practice to avoid inline SQL?
I have seen in a number of discussions the idea that it is a best practice to avoid inline SQL. Some seem to be of the opinion that inline SQL should be absolutely forbidden. However, most places that make this statement either contain no explanation for why or they have a simplistic statement that it makes the code brittle. So, is it truly a best practice to avoid inline SQL in all circumstances? If so, why? And if not, why is it so widely believed? --------------------------------- **Some Background** I know of discussions like the one at [StackOverflow]. But there many people seem to believe that it is acceptable without providing the other part of the story about why the belief is common. The ones that seem to consider it unacceptable generally use some form of the "brittle" statement without much explanation or a generalization that it is too much like hard coding (which is almost the same thing). I thought a group of SQL Experts might have a more detailed answer than the ones that were there. I also know quite well that there are times that ORMs make sense and times that stored procedures absolutely make sense. But it doesn't explain why so many developers consider inline SQL to be absolutely anathema. Thank you all for any insights. __________________________________________ **Conclusions** I gave this a few days and read over the answers as well as a few more [articles] on the topic. It seems the consensus is that there is no reason, in general, to consider inline SQL forbidden. There are of course reasons to make sure that any inline SQL is well written and [paramaterized]. There are also certain times in a project where you may want to enforce use of a ORM or force use of stored procedures, but that seems to be more of a project by project determination rather than a generalized best practice of always avoiding inline SQL. I accepted the highest voted answer, but all the answers were useful and informative. Thank you again. :
There are probably at least a dozen arguments on either side of this. I'll touch on a few that come right to mind after first dragging ORMs into it: In general, I prefer to develop with an ORM over hand-crafted SQL whether it's in-line or in a stored procedure. It sucks to write boilerplate data access code - I hope you generate yours either dynamically or up front. It's a pain to have to get SQL debugged and running in SSMS before pasting it into an application (the inefficiency/inconvenience of in-line SQL development argument). Where an ORM falls down - on very complex or large SQL statements - I fall back to the stored procedure approach both to reduce network traffic for very large queries and to hand craft code that is better tuned to the physical model for very complex queries. The very-long-SQL-statement-that-impacts-network-utilization argument against in-line SQL is only valid as an edge-case argument. Use stored procedures if this is really affecting you. The SQL-injection avoidance fallacy: this argument often works its way into in-line vs. stored procedure discussions but is really a red herring because if it's not parameterized, it's injectable, stored procedure or in-line. It even turns out that a stored procedure is injectable even when it **IS** parameterized if you have written dynamic SQL within the procedure. Amazingly enough, this means you could legitimately argue that in-line SQL is **LESS** injectable than a stored procedure. There is also the anti-in-line argument that, when the SQL needs tweaked, you need to rebuild/retest/and redeploy the application rather than just making the adjustment to a stored procedure. I tend to favor this argument because the shop in which I currently work makes these stored procedure-only adjustments several times per month across the dozens of internal systems they have built over the years. If you deliver software that is unlikely to need this level of frequent updates, the counter-argument would apply: if you change a query, you'll need to change the application too, so why bother with stored procedures? Looking forward to other answers!
I worked in a shop that went through a very rigorous PCI thing. One of the first requirements was that there was no in-line code. Everything was done through predefined stored procedures. All calls to those procedures were parameterized. Nowhere in the system was dynamic SQL allowed. The reason the PCI was over the top was that the database was compromised and many thousands of credit cards got out. The primary reason for the methods required was to eliminate SQL injection as much as possible.
edit: I got some information today which invalidates most of the arguments which I was given and which I was compelled to abide by. "Catch All Queries" by Gail Shaw [
http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/] shows an example of a fully parameterized query which is also dynamic SQL. :
As far as the server is concerned, SQL is SQL. A query is going to go through the optimization process, get stored in cache (depending), and then perform the actions it's defined to do. So that's the same with inline SQL or not. The real differences come down to three things and they all depend on how the inline SQL is built, not on whether or not you do it. First, are you parameterizing your queries or not. Yes, this is applicable to stored procedures, but that's not the issue. Most of the time when you see inline SQL it's a string compiled through code that is then sent to the server. The Bobby Tables issue is immediately in front of you. But, if you parameterize your queries, I'm fine. Second, how are you parameterizing your queries? I've mainly seen this from ORM tools, but I've seen other code do it to. If someone passes the string 'dog' the code sets up a parameter sized for three. Then, same query, but the string for the parameter is 'horse' and now a new value that is five is passed as the parameter. If the column is a varchar(50) you could have fifty seperate queries generated, and that's if we're only talking about a single parameter. More parameters, more plans generated (a very expensive operation in the server) and none reused. Third, let's ignore parameters and talk about how the code is written. Depending on how the strings are compiled, I've seen situations where, because of a different filter or two, the compiled strings have tables in different orders within the query, parameters in different orders, pretty much a mess. This has lead to horrific memory management issues in and around cache. Also, because of the difference in order in the tables, something that in a perfect world never makes a difference, but in the real world, it does, different plans for similar queries can lead to deadlocks. In short, I don't have a problem with inline code. I have a problem with how code is generated. Generate it correctly, and we largely don't have an issue. Oh, but a 3am when your queries are going all wonky, you get called, not me, or maybe, in addition to me. Because all the solutions are in the code for any type of query issues, that's just how that has to be.
Inline sql vs stored procedures Based on modern day databases and real world usage scenarios, I believe that stored procedure architects have serious downsides and little practical benefit. Thus using inline sql is of better benefit due to ways of manipulating code/data
The alternative to inline sql is **not** an ORM. If you want to use an ORM, away you go. There are valid reasons for preferring sql, and if you're going to use sql, by far the most civilized way to do this is to put the sql in its own file, and compile this file into the application. The sad truth is that this best practice is perhaps almost never used, because, in C# for example, you've got to grub into the properties to find the dropdown to change the build action, then look up the docs to remind yourself how to call GetManifestResourceStream(), then worry about the path, then remember how to access a stream etc. etc. But this could all be automated with a few hundred lines of code. So I automated it :-) in a Visual Studio extension, QueryFirst. You write your queries in their own file, which gives you intellisense for columns and tables, syntax validation, test execution, execution plans and the rest. When you save the file, my extension generates c# wrapper classes, so you never write a line of connection code, or command code, or parameter or reader code (or GetManifestResourceStream()). Your queries are all parameterized because there's no other way, and you have generated repositories and POCOs for unit testing, with intellisense for your input parameters and your results. And I've thrown in free steak knives... When an expert needs to come and rework your developer's sql, she's looking at a real sql file, and doesn't need to touch any C#. When she goes back and tidies up the DB, deleting some columns, references to missing columns jump straight out as compile errors in the c#. The database becomes just like another project in your solution that you can hack into, it's interface discoverable in code. If the solution compiles, you know that all your queries are working. There are no runtime errors due to invalid casts or invalid column names, or invalid queries. Martians can't believe that we construct SQL with string methods. We started making html like that, in 1994 or so. That lasted, what, a year? It's symptomatic of the incredibly low regard we have for the most successful computer language ever, now 40 years old and not a wrinkle :-) So there you go. If you've the cojones to look at an unkown data-access technology from an unheard of developer working in his spare time (with a young child and lots of other hobbies), [it's over here](