I’ve spent the last 20 years working with data and databases, more specifically relational databases. I even taught database development for a short while, covering the whole lot: what normalisation is, the work of Codd (1970), and getting students to develop a database to third normal form [link].
I now do normalisation without even thinking about it – not saying I’m great at it, but it’s pretty efficient.
So every time I’ve looked across at NoSQL databases they just frustrate me – it seems to be the exact opposite of how my brain thinks about storing data.
But recently I’ve been forced into using NoSQL for a personal project I am working on (I’ll share more soon). In short, I had two options: create my own backend for my project OR use Firebase, including Firestore (its NoSQL document database).
A few takeaways and lessons so far:
- No tables; instead, it’s one big document store, with documents grouped into collections and everything about a record held in each document (JSON style)
- Firestore has a 1 MiB document size limit, so you can’t put everything in one document
- On the free tier (Spark plan), Firestore allows 50k document reads per day, which might not go as far as you’d expect
- Documents can include arrays – which have been pretty useful
- Querying seems easier in SQL, and at first it looked like you couldn’t filter on one field and sort on another (i.e. WHERE name = ‘bob’ ORDER BY age). As far as I can tell, it does work, but only once you create a composite index covering both fields [maybe I missed something else]
- You really have to plan your data and storage structure carefully to minimise document reads (no joining of records)
- The transition from SQL to NoSQL isn’t as straightforward as I would have expected – but once you get there it’s pretty good.
- I’d love to see a good example of Twitter or Instagram done with NoSQL – the ones I’ve seen don’t seem to scale very well. I’m sure you can do it – I’d just like to see the examples.
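To make the document-versus-table difference and the two Firestore limits above a little more concrete, here is a minimal Python sketch (standard library only). It denormalises two hypothetical "tables" (users and posts) into one JSON-style document with an embedded array, roughly checks it against the 1 MiB limit, and works out how far 50k daily reads go. The sample data, helper names and the JSON-byte size estimate are all assumptions for illustration; Firestore measures document size with its own rules, so treat the estimate as a rough guide only.

```python
import json

# Firestore documents the per-document limit as 1 MiB (1,048,576 bytes).
MAX_DOC_BYTES = 1_048_576
# Free-tier (Spark plan) daily document-read quota.
FREE_DAILY_READS = 50_000

# Relational habit: two normalised "tables" you would JOIN on user_id.
# (Hypothetical sample data.)
users = [{"user_id": 1, "name": "bob", "age": 42}]
posts = [
    {"post_id": 10, "user_id": 1, "text": "hello"},
    {"post_id": 11, "user_id": 1, "text": "second post"},
]


def to_document(user, all_posts):
    """Denormalise one user and their posts into a single JSON-style document.

    The posts are embedded as an array, so one document read returns the
    profile and the posts together, with no join.
    """
    return {
        "name": user["name"],
        "age": user["age"],
        "posts": [
            {"post_id": p["post_id"], "text": p["text"]}
            for p in all_posts
            if p["user_id"] == user["user_id"]
        ],
    }


def approx_size_bytes(doc):
    """Rough size estimate: length of the compact UTF-8 JSON encoding.

    Assumption: this is only a proxy; Firestore counts field names, values and
    the document name with its own rules, so real sizes will differ.
    """
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def fits_in_one_doc(doc, limit=MAX_DOC_BYTES):
    """True if the estimated size is within the per-document limit."""
    return approx_size_bytes(doc) <= limit


def screen_loads_per_day(docs_read_per_load, daily_reads=FREE_DAILY_READS):
    """How many times a screen can load before the daily read quota runs out."""
    return daily_reads // docs_read_per_load


bob = to_document(users[0], posts)
print(json.dumps(bob, indent=2))
print("approx bytes:", approx_size_bytes(bob), "fits:", fits_in_one_doc(bob))
print(screen_loads_per_day(25))  # 2000
```

The catch is the embedded `posts` array: it grows with every post, so for anything Twitter- or Instagram-shaped you would probably keep posts as separate documents and pay for more reads instead, which is exactly the trade-off in the list above.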
More recently (today) I installed MongoDB and used it to store my financial data and some analysis (which I used earlier in this newsletter). To be fair, it was a smooth process, and you can use MongoDB Compass as a GUI.
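For comparison, the filter-plus-sort query that felt awkward in Firestore is straightforward in MongoDB. Below is a minimal sketch using the `pymongo` driver, assuming a MongoDB server on `localhost:27017`; the `finance` database, `transactions` collection, field names and sample rows are hypothetical stand-ins, not the actual financial data. The helper functions run without a server; call `main()` once MongoDB is running and `pymongo` is installed.

```python
from datetime import datetime


def sample_transactions():
    """Hypothetical rows standing in for real financial data."""
    return [
        {"date": datetime(2023, 1, 5), "category": "groceries", "amount": 54.20},
        {"date": datetime(2023, 1, 2), "category": "groceries", "amount": 31.10},
        {"date": datetime(2023, 1, 3), "category": "rent", "amount": 950.00},
    ]


def category_by_date(category):
    """Filter and sort spec for: WHERE category = ? ORDER BY date ASC.

    1 means ascending (the value of pymongo.ASCENDING).
    """
    return {"category": category}, [("date", 1)]


def main():
    # Imported inside main() so the helpers above work without pymongo installed.
    from pymongo import ASCENDING, MongoClient

    # Assumes a local MongoDB server on the default port.
    client = MongoClient("mongodb://localhost:27017")
    coll = client["finance"]["transactions"]  # hypothetical names
    coll.insert_many(sample_transactions())

    # Optional: a compound index lets MongoDB answer the filter + sort from the
    # index instead of sorting in memory. The query also works without it.
    coll.create_index([("category", ASCENDING), ("date", ASCENDING)])

    flt, sort_spec = category_by_date("groceries")
    for doc in coll.find(flt).sort(sort_spec):
        print(doc["date"].date(), doc["amount"])
    client.close()
```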
Anyway, back to NoSQL for a Data Scientist: none of my previous companies have used NoSQL. Everything has been SQL, usually MySQL (with some MSSQL or Oracle). For large offline datasets, I have typically used Hive and PrestoDB.
So you can probably get away with not learning NoSQL but at the same time, it might be a good opportunity to learn something new.