Dumb, Dumber and Dumbest - 3 Stupid Database Things
Last Friday Pete and I were talking about database stupidity. Not programming stupidity, though we weren’t wanting for a good example, but the woeful stupidity of the underlying database system. That conversation has started me off on a bit of a rant, which begins with the stupidy of Microsoft, and then heads through Oracle to finally show that MySql is Teh Stupidest Database Ever.
Contender #1: SQL Server
It begins when Pete & I learned that SQL Server will allow you do this:
CREATE UNIQUE INDEX test_idx ON test ( id ) IGNORE_DUP_KEY;
Wow. Yes, you read it right – SQL Server allows you to create a unique index with duplicate values. “There are no duplicates, except for the ones that are.” Jeez. Surely that’s the stupidest database ever, right? Surely there couldn’t be something dumber than that?
Actually, it can get worse. The house may be shoddy, but the real rot starts in the foundations…
Interlude – What IS NULL?
Normally, logical statements have 2 values: TRUE or FALSE. Databases use tri-state logic that allows a third value, NULL. NULL is a place-holder for, “we don’t know what value it is.”
Databases go a little further than that in fact, allowing both definite and NULL values for different types of data. So you can have a specific age, like 28, or a NULL age, meaning, “We don’t (yet) know this person’s age.”
But, but, but – NULL is not a real value. NULL isn’t like true or false or 9 or “Bobby McGee” – it’s a placeholder for the unknown. (Much like the bottom of someone’s handbag.) It doesn’t represent something specific. It represents the lack of specificity1.
So now we’re clear, on with the stupidity.
Contender #2: Oracle
Imagine I’m writing a military security database. One table records all the details of people allowed access to the secure information. So the Captain phones me up and specifies that, “We must know the rank of every person with access.” Fair enough says I, there’s a table called PERSON, and it will have a column called RANK. And the value for rank must be known. Rather handily, my database will support this demand:
CREATE TABLE person (
...
rank CHAR NOT NULL
...
);
See that ‘NOT NULL’ instruction there? Bob’s my uncle, and the database will automatically guarantee that we can’t create a PERSON record unless we specify their RANK. Step forward Oracle to bugger it all up.
The Captain, being an old pal of mine, gives me access to the secure information. I phone up his rather dishy secretary to get my details put in the database, and she asks me what my rank is. That’s easy – I don’t have a rank; I’m a civilian.
Here’s where Oracle fluffs it.
You see my rank, the rank of Kris Jenkins, is, was and probably always will be none. No rank. Rank = “”. It’s not that we don’t know my rank – my rank is not a matter for question or further investigation – we know it very clearly. I don’t have one.
Do you see the difference? Oracle doesn’t. The secretary dutifully puts in “” as my rank, and Oracle baulks because it thinks “” IS NULL. It’s not. “” is not an unknown value. It’s a very clearly-defined empty string. This is a fundamental flaw in Oracle.
I could bang my head against the table all day, but hopefully you see more clearly than the big O, and you’re ready to move on to the stupidest database in the world, ever.
Contender #3: MySql
Code says it all here. Just look at this lunacy:
mysql> create table test (
-> id integer not null,
-> description varchar(200)
-> );
Query OK, 0 rows affected (0.00 sec)
mysql> insert into test ( description ) values ( ‘Is this a load of rubbish?’ );
Query OK, 1 row affected (0.01 sec)
mysql> select * from test;
————————————————+
| id | description |
————————————————+
| 0 | Is this a load of rubbish? |
————————————————+
1 row in set (0.00 sec)Oh. My. God. It’s a database that actually says, “Damn! ID is a NOT NULL column. Let’s silently make something up!”
Make it up?!? You’re inventing data for me?! Why in the name of Alan Turing are you doing that2?
Mysql – kindly bugger off and leave us alone. You are Teh Stupidest Database Ever.
1 Incidentally, this is why your database will baulk if you say, WHERE NULL = NULL. We can’t say that NULL equals NULL because we don’t know what value the first NULL is, and we don’t know what value the second NULL is. It might turn out that the first is “5” and the second is “5”; or that the first is “5” and the second is “Zebra”. We simply don’t know, and until we find out we can’t say that this NULL = that NULL, or even this NULL != that NULL, but only that a particular NULL is null.
2 For the record, here’s how a real database reacts:
postgres=> create table test (
postgres(> id integer not null,
postgres(> description varchar(200)
postgres(> );
CREATE TABLE
postgres=> insert into test ( description ) values ( ‘Is this a load of rubbish?’ );
ERROR: null value in column “id” violates not-null constraintpostgres=> create table test2 (
postgres(> rank varchar(20) not null,
postgres(> description varchar(200)
postgres(> );
CREATE TABLE
postgres=> insert into test2 ( rank, description ) values ( ‘’, ‘My rank is well defined.’ );
INSERT 12389614 1
Heh. That NULL, eh?
...although now I'm not sure whether I mean 'this NULL'.
In which case you mean NULL NULL.![[grin] [grin]](/sites/all/modules/smileys/packs/jenkster/biggrin.gif)
I didn't realise Oracle was so stoopid and I worked on Oracle for four years!
Flat files it is, then
Erm...that wasn't quite the message we wanted you to take from this, Pete.![[tongue] [tongue]](/sites/all/modules/smileys/packs/jenkster/tongue.gif)
See, I just don't go near the stuff. It's explosive.
Hmm - I am afraid I must disagree a bit. Once you set the NOT NULL constraint, Oracle is handling the situation correctly. Your rank is not blank, it is CIVILIAN or UNCLEARED or BIN_LADEN_FAMILY_MEMBER or some other actual, defined value that correctly describes you to that attribute.
In other words, the business rule is: people accessing the database must have a rank. A perhaps unstated correlary of that is "there must be a rank that corresponds to every type of person in the world". You haven't created all the members of that set, so the application rightly barfs.
sPh
I think you're just coding around the fundamental problem. And it's a good workaround - exactly the same one I'd use.
Try this for a different example: We're writing a Kevin Smith database. We need a column that holds everything Silent Bob says in a given scene. We already know what he says, beyond doubt. It's NOT NULL, because it's perfectly well known, and in most scenes it's, "".
First of all, I agree with sPh on the Oracle. Your rank is known, and its CIVILIAN (or something).
Furthermore, if your rank is legitimately NULL, well then your data model is wrong if you don't allow for that, right?
You didn't mention the MS SQL/Sybase blend. I love the way they have a configuration variable "ansinull" so you can actually decide to make your database non-ANSI-Compliant with regards to NULL.
http://sybasease.blogspot.com/2005/05/nulls-in-sybase-ase.html
Sorry Robert, but I can't agree. CIVILIAN isn't a rank. You can't get promoted or demoted to the rank of CIVILIAN. No-one in the history of the military has been a CIVILIAN. You're using a magic value to work around the Oracle problem. And it's a good workaround when you're using an Oracle database, but it's not really modelling reality.
Don't get me wrong - I'd do exactly what you're suggesting. And I think Oracle's a great database. But it fluffs it up at this one point and it's always irked me.
Nope. If you accept that NULL means, "unknown", then my rank is not legitimately unknown. As I said in the post, it's perfectly well known. It's empty. No-thing is not the same as unknown-thing.
Put it this way: "" is the string equivalent of 0.
How many times have you and I met face-to-face? It's not unknown, it's 0.
And what have we said during those meetings? It's not unknown, it's "".
"" is an *empty* string and so contains nothing.
NULL is nothing
so "" = NULL.
You guys are attaching too many symantics to NULL. I don't really think it means "I don't know". I think it means there is NOTHING THERE.
Need to store "CIVILIAN" in your military PERSON database table? Redesign it. Sheesh.
Take a field with 1 sheep in it. There is 1 sheep in the field. Take that sheep away. There are now 0 sheep in the field. It's not NULL; nothing magical has happened; nothing has gone into an unknown state. There are zero sheep.
Take a string with 1 character in it. There is 1 character in the string. Take that character away. There are now 0 characters in the string. It's not NULL; nothing magical has happened; nothing has gone into an unknown state. There are zero characters.
Frank, I completely agree with you.
Kris, NULL is NOT NOTHING, it is unknown.
That's why NULL != NULL, and: if NULL=NULL evaluates to NULL (in Oracle, that is), and not to TRUE.
As for your field: the number of hedgehogs is NULL.
Because I cannot see your field, I cannot positively state the number of hedgehogs (frogs, people, horses) is 0. That would require knowledge. NULL is unknown, "don't know", "no knowledge of it".
And Oracle does flaw the ANSI standards with regard to empty strings being evaluated as NULL, indeed.
Guess it breaks too much fixing it. That means you will have to redesign your implementation of the business model, due to restrictions in the engine. Done at more than one occasion (know any database that supports M:M relations?).
Suppose instead of counting sheep, you store a row for every sheep in a SHEEP_IN_THE_FIELD table. You start out with 1 row/sheep. Take that sheep away. There are now no sheep in the field, and the SHEEP_IN_THE_FIELD table has no rows: is that a count of ZERO or NULL?
Also, do no records in SHEEP_IN_THE_FIELD mean you haven't bothered to count any sheep yet, or there are none?
And regarding DBMS quirks -- if the rules are known, then they can be dealt with. It's when you don't know them, assume it's behavior you are familiar with, that you get that fabulous school-o-hard-knocks experience.
Interesting question. Assuming the database is kept up-to-date, then the empty table would be 0 sheep. If the table hasn't been created, then the number of sheep is unknown by the system, and hence NULL. And if you haven't bothered to keep the database up-to-date, then that's a whole different issue.
As for the DBMS quirks argument, I don't really buy it. It's great to have tolerance and understanding in the fields of religion, music and art, and even in software design, but the rules of tri-state logic are as defined and rigorous as the rules of addition and multiplication. You can bend them for preference. Mathematics is one area of study where we can still say, "This is absolutely correct. And that is demonstrably wrong."
Oracle got it wrong. I can deal with that. I deal with it every day and it very rarely bothers me. And I'm not proposing we change it because it would be impractical. But that doesn't make Oracle quirky or characterful. It remains logically incorrect.
I thought of the SHEEP_IN_THE_FIELD table when considering the statement "There are now 0 characters in the string. It's not NULL; nothing magical has happened; nothing has gone into an unknown state. There are zero characters." -- specifically I wondered what would happen if the string had to be stored character-by-character in a separate table.
Would one ever store a NULL sheep row in the table?
I'm not sure how you concluded that "Oracle got it wrong" from the statement that "Mathematics is one area of study where we can still say, 'This is absolutely correct. And that is demonstrably wrong.'". Mathematics can make a number of definition statements. Neither Euclidean nor non-Euclidean geometry is wrong, but they have different starting assumptions/definitions and areas of modeling applicability.
Similarly, I'm not convinced that "Oracle got it wrong", rather they defined things a bit differently than the ANSI committee. Is this really a violation of 3-value logic or just a convention? Having worked with Oracle for 8 years now, having a zero-length string being automatically called NULL seems reasonable, and having to test for the condition of either zero-length strings Or NULLs (as I did recently for a DB2 query) irksome. I really don't know which came first: Oracle's implementation or the ANSI standard.
Actually, I think your table design was flawed. You insisted in storing a rank with a NULL value, which means that the rank is unknown.
If you would be able to store a zero length string, than that would be ‘the rank without a name’... Which seems like a rank to me![[wink] [wink]](/sites/all/modules/smileys/packs/jenkster/wink.gif)
If you really want to satisfy the requirement, you would need two tables – one for PERSON and another for RANK, (related to PERSON): The fact that you don’t have a rank, should be deduced from the absence of a row in the ranks table.
Cheers.
That’s an interesting approach Mel, but I’m not sure how far you’d take it. Would you do away will all NULLable columns? Or if not, when would you allow them, and what would they mean?
To be honest, I just wanted to show you how to meet this specific requirement. I think that in most cases, you can get away using nullable columns.
Most of these ambiguity problems can be solved by sticking to a convention. Any convention will do, as long as everybody on your team agrees on using the same convention. Make sure you include conventions in your documentation and everybody will be all right.
Oracle’s behavior is documented, so developers know what to expect. You are correct that this behavior differs from the ANSI standard – conflicting conventions, which is where the discussion got started.
Personally, I think that Oracle’s implementation requires me to write less code to maintain data integrity. With less code, there’s less chance for bugs creeping in.
In the end, it all depends on the requirements.
So Oracle RANK NOT NULL DEFAULT ‘0’ —this covers you as a civilian :)
Postgres what’s in the NOT NULL rank field for empty string ‘’ an unprintable char ascii 0?
SQLServer that’s just ridiculous not forcing elimination of non-unique rows first….
Sorry Dave, but you’re wrong. ‘0’ isn’t a rank, it’s an arbitrary magic value. Having a magic value to mean “no rank” is just a workaround. Perfectly practical, but not perfectly correct.
The empty string can be represented several ways. Two that spring to mind are:
If memory serves, the first approach is used in C, the second in Pro*C. So, yes, Postgres probably does do it the way you suggest.
And you’re right, SQLServer’s nuts.![[smile] [smile]](/sites/all/modules/smileys/packs/jenkster/smile.gif)
I in my opinion the article author wins in this competition hands down (with MySql as the pretender).
For MS-SQL – this feature reflects very common real-life business requirement “we have some dirty data (e..g from a legacy system)that we are unable to clean right now, but do not allow them to be made even dirtier”. BTW, Oracle has the similar (even more advanced) feature, while better worded: alter table … add constraint … ENABLE NOVALIDATE.
As for Oracle – yes, it is bad that empty string is indistinguishable from NULL (and violates SQL standard BTW), but your explanation is completely wrong – empty string is a magical value for “no rank” situation – neither worse, nor better then any other magical value (e.g. “0”). The real cause of the problem is known deficiency of the SQL standard, namely 2 special values shall be defined – “no value” and “value unknown”. They are completely different. IF you consider your rank example – first reflect you (it is definitely known that you have no rank) the second reflect caught military spy – we definitely know that he has some rank, but we do not know which one.
Kris,
Your assumptions are simply inaccurate as to meaning of null. NULL in Mathematics (which is what RDBMS, and the System-R paper are based upon), and in Oracle, and in the ANSII board’s definition for databases, is defined as the “absence of value”. It is NOT nothing, or “we don’t know what value it is”, or “unknown”, or “I don’t know”, or “hasn’t been determined yet”, or “the lack of specificity”, or “an unknown value”, or even “an empty string”, if empty string is one or more blank spaces, as in ‘ ‘ or ASCII code 32. And, NULL is not zero as in your sheep example, since zero is a value, as in 1-1=0. There is no logical reason that “absence of value” can not be known in advance, as in your Silent Bob example. But, it doesn’t change the fact that NULL the “absence of value”. Which is not true of the example you offered for no rows —- the number of rows in a table at any given time is a precise numeric value with a minimum value of zero.
NULL in Mathematics (and database definitions) is not the same as NULL in programming definitions —- and this goes back long before ANSII boards were formed, much less issued standards. Is this an example of semantics? Of course it is, especially since we have many ways to “say” what we mean.
NULL columns in Oracle do not necessarily consume any storage if there are no more columns with values after them. Example: If you have a 10 column table with the first four columns being defined as NOT NULL, and you inserted a row with values for only the first four defined columns, then Oracle would consume zero storage for columns 5 through 10 and the RDBMS would know this when reading the row because after column 4 there would only be an “end-of-row” indicator.
Respectfully,
RB
Post new comment