A comparison of two relational databases from the point of view of a data analyst

I work as a data analyst in a global professional services firm (one you have certainly heard of). I have been doing this for about a decade. I have spent that decade dealing with data, database software, database hardware, database users, database programmers and data analysis methods, so I know a fair bit about these things. I frequently come into contact with people who know very little about these things – although some of them don't realise it. Over the years I have discussed the issue of PostgreSQL vs. MS SQL Server. A well-known principle in IT says: if you're going to do it more than once, automate it. This document is my way of automating that conversation.

Unless otherwise stated I am referring to PostgreSQL 9.3 and MS SQL Server 2014, even though my experience with MS SQL Server is with earlier versions such as 2008 R2 – for the sake of fairness and relevance I want to compare the latest version of PostgreSQL to the latest version of MS SQL Server. Where I have made claims about MS SQL Server I have done my best to check that they apply to version 2014 by consulting Microsoft's own documentation – although, for reasons I will get to, I have also had to rely largely on Google, Stack Overflow and the users of the internet.

I know it's not scientifically rigorous to do a comparison like this when I don't have equal experience with both databases, but this is not an academic exercise – it's a real-world comparison. I have done my honest best to get my facts about MS SQL Server right – we all know it is impossible to bullshit the whole internet. If I find out that I've got something wrong, I'll fix it.

I am comparing the two databases from the point of view of a data analyst. Maybe MS SQL Server kicks PostgreSQL's arse as an OLTP backend (although I doubt it), but that's not what I'm writing about here, because I'm not an OLTP developer/DBA/sysadmin.

Finally, there is an email address at top right. Do please use it if you wish; I will do my best to respond.

DISCLAIMER: all the subjective opinions in here are strictly my own.

Why PostgreSQL is way, way better than MS SQL Server

This section is a comparison of the two databases in terms of features relevant to data analytics.

CSV support

CSV is the de facto standard way of moving structured data between systems. All RDBMSes can dump data into proprietary formats that nothing else can read, which is fine for backups, replication and the like, but no use at all for migrating data from system X to system Y.

A data analytics platform has to be able to look at data from a wide variety of systems and produce outputs that can be read by a wide variety of systems. In practice, this means that it needs to be able to ingest and excrete CSV quickly, reliably, repeatably and painlessly. Let's not understate this: a data analytics platform which cannot handle CSV robustly is a broken, useless liability.

PostgreSQL's COPY TO and COPY FROM commands support the spec outlined in RFC4180 (which is the closest thing there is to an official CSV standard) as well as a multitude of common and not-so-common variants and dialects. When an error occurs, they give helpful error messages. Importantly, they will not silently corrupt, misunderstand or alter data. If PostgreSQL says your import worked, then it worked properly. The slightest whiff of a problem and it abandons the import and throws a helpful error message. (This may sound fussy or inconvenient, but it is actually an example of a well-established design principle. It makes sense: would you rather find out your import went wrong now, or a month from now when your client complains that your results are off?)

MS SQL Server can neither import nor export CSV. Most people don't believe me when I tell them this. Then, at some point, they see for themselves:

MS SQL Server silently truncating a text field.
MS SQL Server's text encoding handling going wrong.
MS SQL Server throwing an error message because it doesn't understand quoting or escaping (contrary to popular belief, quoting and escaping are not exotic extensions to CSV. They are fundamental concepts in literally every human-readable data serialisation specification. Don't trust anyone who doesn't know what these things are).
MS SQL Server exporting broken, useless CSV.

How did they manage to overcomplicate something as simple as CSV? This is especially baffling because CSV parsers are trivially easy to write (I wrote one in C and plumbed it into PHP a year or two ago, because I wasn't happy with its native CSV-handling functions. The whole thing took perhaps 100 lines of code and three hours – two of which were spent getting to grips with SWIG, which was new to me at the time).

If you don't believe me, download this correctly-formatted, standards-compliant UTF-8 CSV file and use MS SQL Server to calculate the average string length.
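To make the point about quoting and escaping concrete, here is a minimal sketch of an RFC 4180-style CSV parser in Python. It is not the C parser mentioned in this article, and the function name parse_csv and its parameters are purely illustrative. It assumes the usual conventions: fields may be wrapped in double quotes, a literal quote inside a quoted field is written as two quotes, and quoted fields may contain delimiters and line breaks. In the fail-early spirit described here, it raises an error on an unterminated quoted field rather than guessing.

```python
def parse_csv(text, delimiter=",", quote='"'):
    """Parse CSV text into a list of rows, each a list of strings.

    Illustrative sketch only: it is lenient about stray quotes in the
    middle of an unquoted field and is not a strict validator.
    """
    rows, row, field = [], [], []
    in_quotes = False
    started = False  # True once the current record has consumed any character
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            started = True
            if ch == quote:
                if i + 1 < n and text[i + 1] == quote:
                    field.append(quote)  # doubled quote is an escaped literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)  # delimiters and newlines are plain data here
        elif ch == "\r" or ch == "\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # treat CRLF as a single line break
            row.append("".join(field))
            rows.append(row)
            row, field, started = [], [], False
        else:
            started = True
            if ch == quote:
                in_quotes = True  # opening quote
            elif ch == delimiter:
                row.append("".join(field))
                field = []
            else:
                field.append(ch)
        i += 1
    if in_quotes:
        # Fail loudly instead of silently returning damaged data.
        raise ValueError("unterminated quoted field at end of input")
    if started:
        # Final record had no trailing line break.
        row.append("".join(field))
        rows.append(row)
    return rows


sample = 'id,comment\r\n1,"He said ""hi"", then left"\r\n2,"line one\nline two"\r\n'
parsed = parse_csv(sample)
```

With the sample above, the second row's comment comes back as the single string He said "hi", then left, and the third row keeps its embedded line break inside one field. A production import would normally use an existing, well-tested parser; the sketch only shows how little machinery quoting and escaping actually require.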