The Artful Dodgery of Estimation

Of the many necessary evils of development, I suspect I like estimation the least. Providing estimates of the number of hours to complete a task, or some set of tasks, is far more art than science. There is also a considerable amount of CYA involved: it’s a far safer bet to overestimate, and complete the project sooner, than to underestimate, and find oneself in the uncomfortable position of going cap-in-hand to the Project Manager to beg more hours.

It seems to me that most of the problem of estimation lies in methodology, or more accurately, the lack thereof. Not only is estimation mostly an art, but it’s mostly an art without ground rules. Tarot reading has more basis in science than most of the estimates I’ve provided. The majority of developers don’t keep detailed statistics on how long it takes to perform a particular task (writing a table script, say) and aren’t likely to start any time soon. The best efforts of management to force us down those channels (checklists, various integrated task generators, and so on) wind up being short-circuited, under-utilized, or flat-out ignored.

To compound matters still, there are estimates, and there are estimates. There’s something called a “Rough Order of Magnitude” estimate, in which one is given a fifty-thousand foot view of a fair guess as to the likely requirements, and expected to provide a fifty-thousand foot view of a fair guess as to how many hours it’s likely to take to satisfy same. Then, when the requirements are nailed down – or at least temporarily stabilized – we generate another estimate, which we estimate to be a better guess than the previous estimate, but not by much. Finally, when the technical specifications are complete, we can look at things at the task level, and provide what ought to be a reasonably accurate estimation of the time required for development.

Except it seldom is… is it?

And why is this? Much of the blame lies with the nature of development itself. It’s never – despite the best efforts of many – matured into anything like an assembly-line process, and all the frameworks in the world won’t ever make it so. Yes, there are tasks common to every project (tables, constraints, indexes, stored procedures), but the commonality stops there. Every project brings its own devils into the details. It took an hour to write that stored procedure, which is what I estimated, but it turned out to be a performance dog, and so I had to spend four hours tuning it to some semblance of decent execution time. How do we account for those four hours? Do we just rely on the old rule of thumb that says take your best guess, then double it? That’s not much of a methodology, is it?

Ah, there’s that word again: methodology. A methodology, simply enough, is a systematic approach intended to produce consistent results. When you follow the instructions in a recipe, you’re following a methodology, and you’re assured a consistent result each time you do so. But how many of us have a methodology for producing estimates? Not many, I’d guess. I don’t think I do: I think what I have is a grab-bag of rules of thumb, previous experiences, and stab-in-the-dark guesstimates. Sounds like the inmates running the asylum, non?

So my plan is to come up with one, and to document that process in these pages. Maybe I’ll come up with something good; maybe I’ll fall back on old ways. We’ll see. Meantime, if you’ve a methodology you’d like to share, or an idea, or just a comment, feel free.

On reusability and SQL Server

In my past life as a VB developer, I, like my peers, followed the mantra of reusability. Generic, we said, was good. Loosely coupled, parameter-driven code that made sound use of control flow was our goal.

When I crossed the Rubicon and became an SQL developer, I brought that mindset along with me, as did many (if not most) of my peers who likewise made the transition from the procedural to the declarative. Many of us are still motivated by that reusability carrot, as evinced by the repeated use of (for example) optional parameters and the COALESCE keyword. The front-end folks say, “Make us a procedure that will take a date range, but let us pass empty parameters, and deliver to us ALL the rows if we pass NULL values.”

Easy for them to say. And easy to implement as well: just add that handy-dandy COALESCE in there (or CASE if you’re an SQL Server developer and you like to translate your own ISO keywords, thanks) and voila! We’re done. Next?

Not so fast. Sure, COALESCE will let you do this:

WHERE last_update >= COALESCE(@my_date, last_update)

but at what cost? A clustered index scan, that’s what, regardless of the number of rows you expect to return. Yes, the optimizer will take one look at this arrangement and elect to punt, and never mind what down it is. Two million rows in the table? Not my problem, bub.

That’s because in the declarative world, generic does not necessarily equal good. SQL Server wants code that is specific to the task, and not particularly good for anything other than that task. The optimizer loves code tailored to an index. May I sniff your parameters, sir? You bet you can! And remember them the next time I want this data, will you?
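
To make the point concrete, here’s a sketch of the catch-all pattern next to one common alternative. (The table, column, and procedure names here are hypothetical, and the RECOMPILE trick is only one of several ways to skin this particular cat.) The second version hands the optimizer the actual parameter value on every execution, at the cost of a compile each time:

-- The catch-all version: the predicate depends on the runtime value
-- of @my_date, so the optimizer gives up and scans.
CREATE PROCEDURE dbo.GetRows_CatchAll
    @my_date datetime = NULL
AS
BEGIN
    SELECT row_id, last_update
    FROM dbo.MyTable
    WHERE last_update >= COALESCE(@my_date, last_update);
END;
GO

-- One alternative: recompile per execution, so the optimizer sees the
-- actual value of @my_date and can choose an index seek when a date
-- is supplied, or a scan when it genuinely needs all the rows.
CREATE PROCEDURE dbo.GetRows_Recompile
    @my_date datetime = NULL
AS
BEGIN
    SELECT row_id, last_update
    FROM dbo.MyTable
    WHERE (@my_date IS NULL OR last_update >= @my_date)
    OPTION (RECOMPILE);
END;
GO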

We tweak our indexes, and ponder our primary keys, and agonize over the right way to cluster a table. But indexes and keys are not generic: they are specific to the table, and to the data, and ultimately to the goal of extracting information from that data. The same ought to be true of our queries and stored procedures: not to aim at reusability, or portability, or other wills-o’-the-wisp, but to craft our code carefully, to tailor it to the task, to remember that performance is our real goal, the real measure of our success.

On theory and practice

A conversation with a data modeler and a comment left on another blog led to this week’s commentary.

The former revolved around how a PK should be defined on a child table. The data model called for Parent_ID to be constrained as an FK in the child, and Child_ID to be the PK of the child table. I explained that from the POV of the SQL Server optimizer, the child PK ought to be a compound of Parent_ID and Child_ID, in that order. This would permit the optimizer to choose a merge join for any query involving both tables, a join type that is fast and efficient (for the most part – there are exceptions). The data modeler agreed this was a reasonable idea, but we then wandered off into a discussion about this and that “best practice”, and whether such “best practices” were in fact practical at all, or whether they were based purely on philosophical principles that might or might not have any close relation to reality.
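
For the record, here’s roughly what that arrangement looks like in DDL (the names are illustrative, not from the actual model):

CREATE TABLE dbo.Parent (
    Parent_ID int NOT NULL,
    CONSTRAINT PK_Parent PRIMARY KEY CLUSTERED (Parent_ID)
);

CREATE TABLE dbo.Child (
    Parent_ID int NOT NULL,
    Child_ID  int NOT NULL,
    CONSTRAINT PK_Child PRIMARY KEY CLUSTERED (Parent_ID, Child_ID),
    CONSTRAINT FK_Child_Parent FOREIGN KEY (Parent_ID)
        REFERENCES dbo.Parent (Parent_ID)
);

Because both clustered indexes lead with Parent_ID, a join between the two tables presents both inputs already sorted on the join key, and the optimizer can elect a merge join without an intervening sort.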

That same day, I came across a comment on a forum left by a well-known database developer, author of many books on the subject and quite the stickler for rules and standards. So much so, in fact, that I privately refer to this individual as “the ISO Nazi”, on account of the constant references to the ISO SQL standards in this individual’s works and communications, and the habit of berating and mocking those who do not adhere religiously to the ISO party line.

There is, no doubt, a place in the world for SQL standards. It’s good to know that there is a set of keywords and concepts that appears in every flavor of SQL. At the same time, I think it crucial to acknowledge that, with some exceptions, most of us work with the same RDBMS day in and day out, and we have learned the idiosyncrasies of our tools, often to our advantage. Writing code that complies strictly with what the ISO considers the way to do things may or may not produce the best results, and given that many of us spend our entire careers working with one vendor’s product, portability is usually not a driving consideration.

My interest in the ISO standard is, frankly, purely academic. When it comes to writing my code, and designing my objects, I consider that first and foremost I am an SQL Server developer, and I do my best to think like the optimizer. Doing so more often than not ensures that I am coding to get the best performance out of the RDBMS, even if I am violating international standards and “best practice” monographs. It is performance and reliability that I owe to my users, not conformity to an ideal.

When standards become straitjackets

I am not a fan of the proliferation of coding standards. Don’t get me wrong: they have their uses. Database collation, ANSI settings, file locations, security practices: these are all sensible candidates for enterprise-level rules. But telling me that I must – not should, but must – capitalize my SQL keywords based on nothing more than someone’s unsupported (and unsupportable) contention that doing so somehow makes SQL code more readable is going beyond the pale. Likewise the assertion, once put forth by a colleague, that that which is not in the standard is eo ipso proscribed, and has no business appearing in one’s code.

All this is not to say that there are not preferred ways of doing things. Rather, if you’re going to say, “Do this, and don’t do that”, then you ought to have good reasons for doing so, and those reasons ought to be enumerated, and backed up by demonstrable fact and reproducible evidence. Otherwise the standards seem capricious: do it this way because management says so.

I’ve heard the argument made that capriciousness is a necessary evil: that developers, particularly those whose careers are in their infancy, need the guidance of hard and fast rules so that they can contribute quickly to the team. Poppycock. We’re not talking about high school grads signing up for an apprenticeship: we’re talking about people who have completed four-year degrees at accredited institutions of higher learning, or at the very least a two-year or three-year technical program. We’re talking about people with a demonstrated capacity to learn, and at least an above-average interest in exploring technical subjects. It’s condescending to assume they can’t understand why specifying the column names in a SELECT list is better than using the wildcard “*”, or why parameter sniffing can sometimes get you in trouble. Demonstrate, illustrate, illuminate: in a word, teach; don’t just dictate.
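
The SELECT-list point, for instance, takes all of five minutes to demonstrate rather than decree (the table and index here are hypothetical):

CREATE TABLE dbo.Employee (
    Employee_ID int NOT NULL PRIMARY KEY CLUSTERED,
    Last_Name   varchar(50) NOT NULL,
    First_Name  varchar(50) NOT NULL,
    Resume      varbinary(max) NULL
);

CREATE NONCLUSTERED INDEX IX_Employee_Last_Name
    ON dbo.Employee (Last_Name) INCLUDE (First_Name);

-- Covered by IX_Employee_Last_Name: a simple index seek.
SELECT Last_Name, First_Name
FROM dbo.Employee
WHERE Last_Name = 'Smith';

-- Not covered: the wildcard drags in Resume, forcing a key lookup
-- (or a scan) to fetch a column the caller may not even need.
SELECT *
FROM dbo.Employee
WHERE Last_Name = 'Smith';

Show a junior developer those two execution plans side by side, and the rule explains itself.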

The worst offenders, I think, are standards that dictate not what your code does, but how it looks. Stylistic variations are condemned as apostasy; conformity for conformity’s sake is the goal. Do I care if you prefer “SELECT” to “select”, or vice versa? I do not. Nor do I care if you place your commas in front of or behind the column names, nor if you use “AS” before your aliases, nor if you put the joined table name on the same line as your join or on the next. I care that you know what an execution plan is, and how to read it, not that your code is symmetrical and aesthetically pleasing. I care that you put some effort into crafting that index, not what you named it.

Standards, carefully considered and properly administered, ensure compatibility between databases, servers, and applications. Created reflexively and indiscriminately, they are the development world’s equivalent of teaching a pig to sing: you will only frustrate yourself, and piss off the pig.

On getting better every day

I once had a conversation with a colleague regarding a script she had written for a one-time UPDATE and SELECT. I suggested some changes that I felt would improve the performance noticeably and save the DBA who would be running her code a few minutes of his/her life. To my great surprise (though in retrospect I should not have been), she demurred, and said that performance didn’t matter because it was a “one-off” and therefore not worthy of tuning.

I admit that I see her point. Why invest time and effort on a throw-away script? And I know many who would agree with her. But to me, it’s passing up an opportunity to get better. Every script or object I deploy gets scrutinized, because I never know when I’m going to see something that will lead me down the rabbit hole of discovery, into places I’ve never been, to execution plans I’ve never seen. I still remember the magical moment when I “got” merge joins, and understood the insight they provided into the structure and design of my tables. Making that connection – learning to think like the optimizer does, or at least get close – is my daily goal. I try to learn something every day, and the only way to do that is to question everything I do, and to never be satisfied with “it works”.

Not everyone shares that passion, I know. It’s disappointing sometimes to find that many just want their code “to work”, just want to move on to the next work order or project. I wonder whether that’s something in their character, or more a matter of not knowing where to begin critiquing one’s own work. It’s hard to be ruthless with yourself, but it’s ultimately rewarding, and the satisfaction of knowing that you’ve learned something new – that you’re a little better at the end of the day than you were at the beginning – makes it all worthwhile.
