What is the role of a DW in a NoSQL world? [closed]

Why would anyone keep both of these systems at the same time?

The biggest problem with a DW is the expensive start-up cost. It requires a good understanding of your data and business domain before you can break them up into facts and dimensions. If at any point along the way your assumptions turn out to be wrong, you're stuck either leaving things as they are or going through another arduous maintenance cycle. I have seen DW processes never go anywhere because of this high overhead. Not to mention that if your DW guy leaves, it's extremely hard to train a replacement, because the domain knowledge leaves with him. Yes, it's like the classic waterfall process: rigid, brittle, and usually unable to cope with changing requirements or a changing business landscape.
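To make "facts and dimensions" concrete: in a dimensional model, numeric measures live in a fact table whose rows hold only keys into dimension tables, and queries roll those measures up by dimension attributes. The following is a minimal Python sketch under that assumption; the table names, keys, and figures are all hypothetical stand-ins for real database tables.

```python
from collections import defaultdict

# Hypothetical dimension tables: surrogate key -> descriptive attributes.
dim_department = {
    1: {"name": "Sales", "region": "EMEA"},
    2: {"name": "Engineering", "region": "APAC"},
}
dim_date = {
    20130901: {"year": 2013, "month": 9},
    20131001: {"year": 2013, "month": 10},
}

# Hypothetical fact table: each row holds only dimension keys plus a numeric measure.
fact_cost = [
    {"department_key": 1, "date_key": 20130901, "cost": 1200.0},
    {"department_key": 1, "date_key": 20131001, "cost": 800.0},
    {"department_key": 2, "date_key": 20130901, "cost": 5000.0},
]


def roll_up(dimension: dict, key_column: str, attribute: str) -> dict:
    """Sum the cost measure grouped by one attribute of the given dimension."""
    totals = defaultdict(float)
    for row in fact_cost:
        # Look up the dimension row referenced by this fact row's key.
        attrs = dimension[row[key_column]]
        totals[attrs[attribute]] += row["cost"]
    return dict(totals)


print(roll_up(dim_department, "department_key", "region"))  # {'EMEA': 2000.0, 'APAC': 5000.0}
print(roll_up(dim_date, "date_key", "month"))  # {9: 6200.0, 10: 800.0}
```

The same fact rows can be sliced by any dimension; deciding which attributes go into which dimension is the up-front modelling work the question is talking about.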

NoSQL, on the other hand, is agile. You can create your indexes on the fly, as needed, in an ad hoc manner. There is almost never any need to understand your data before you store it. And as your understanding improves, NoSQL solutions tend to scale well.
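The "indexes on the fly" idea can be sketched with a toy in-memory document store: documents of any shape go in first, and an index on a field can be created later by scanning what is already stored. This is a hypothetical Python illustration, not the API of any real NoSQL product; `DocumentStore` and its methods are made-up names.

```python
from collections import defaultdict
from typing import Any


class DocumentStore:
    """Toy schemaless store: documents are plain dicts with no fixed set of fields."""

    def __init__(self) -> None:
        self.docs: list[dict[str, Any]] = []
        # field name -> (field value -> positions of matching documents)
        self.indexes: dict[str, dict[Any, list[int]]] = {}

    def insert(self, doc: dict[str, Any]) -> None:
        position = len(self.docs)
        self.docs.append(doc)
        # Keep any indexes created earlier up to date (indexed values must be hashable).
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].append(position)

    def create_index(self, field: str) -> None:
        """Build an index on a field after the data is already stored."""
        index: dict[Any, list[int]] = defaultdict(list)
        for position, doc in enumerate(self.docs):
            if field in doc:
                index[doc[field]].append(position)
        self.indexes[field] = index

    def find(self, field: str, value: Any) -> list[dict[str, Any]]:
        if field in self.indexes:
            return [self.docs[i] for i in self.indexes[field].get(value, [])]
        # No index on this field yet: fall back to a full scan.
        return [doc for doc in self.docs if doc.get(field) == value]


store = DocumentStore()
store.insert({"type": "invoice", "customer": "ACME", "amount": 120})
store.insert({"type": "employee", "name": "Ana", "department": "Sales"})
store.create_index("customer")  # added after data is already stored
store.insert({"type": "invoice", "customer": "ACME", "amount": 80})
print(len(store.find("customer", "ACME")))  # 2
```

An index created late is back-filled by one scan over existing documents and then maintained on every later insert.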

Given how easily NoSQL can assume the role of a DW, but not the other way around, why bother with a DW at all? How do you justify the expensive existence of a DW system when you already have a NoSQL solution? Is there a place for the two to coexist?

asked Sep 12 '13 at 00:09

StackOverflow isn't for opinion-based questions like this. You make lots of assumptions in your question about what data warehouses are and how they work. Each of your points can be refuted and counter-examples provided, but SO is not the appropriate forum for discussing this. -

It's an honest question. I have seen a lot of SQL vs. NoSQL comparisons, even on SO, but I haven't seen one for DW vs. NoSQL. Everyone makes assumptions; I tried to make mine as academic as possible. I know of only two kinds of DW: tabular and dimensional. Tabular ones are usually in memory, and when dealing with in-memory data, objects and OOP are by far the dominant and most successful paradigm. If so, most NoSQL solutions let you run queries in memory with minimal object-relational impedance. I covered the dimensional aspect previously. Again, what's the benefit of having both in your ecosystem? -

DW does not have to follow a waterfall methodology any more than NoSQL must follow the tenets of Agile. See Agile Data Warehousing Project Management: Business Intelligence Systems Using Scrum, or any of a host of other books on the subject. -

Please see the SO "about" page: "Avoid questions that are primarily opinion-based, or that are likely to generate discussion rather than answers." -

3 Answers

NoSQL is just the name of a set of database technologies and products, and it is of course one possible platform for data warehouse and decision-support applications. So yes, NoSQL and DW can and certainly do coexist.

However, you seem to be equating a data warehouse solution with dimensional modelling, which is simply one technique that can be used to create data marts or an OLAP presentation tier. DW and dimensional models are not the same thing.

I don't think there's much, if anything, about NoSQL that makes NoSQL systems more suitable for agile projects than other technologies. I'm sure there are many more people and teams using an agile approach with SQL DBMSs than with NoSQL!

On the other hand if you don't have or aren't able to retain business domain knowledge in your development teams then you have a management issue that no technology or project delivery approach is going to solve for you.

answered Sep 12 '13 at 05:09

I'd like to frame the last paragraph and upvote it a few times - billinkc

If you are building a decision support system for the business, you need to understand the business process, regardless of what platform you build it on.

It's an investment like anything else. You get a return on your investment if it is planned and executed properly.

I'm not a NoSQL expert, but surely it answers different questions than a DW does. For example, how do you generate a Cost/Headcount figure out of a NoSQL database? How do you load boring row-based cost info from the ERP and row-based headcount info from the HR system into NoSQL and have it perform?
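The Cost/Headcount question comes down to aggregating two row-based sources to a shared grain and then joining them. A minimal Python sketch, assuming hypothetical ERP and HR extracts keyed by department and period (the field names and figures are illustrative only, not taken from any real ERP or HR system):

```python
from collections import defaultdict

# Hypothetical row-based extracts: cost lines from an ERP, one row per employee from HR.
erp_costs = [
    {"department": "Sales", "period": "2013-09", "amount": 30000.0},
    {"department": "Sales", "period": "2013-09", "amount": 10000.0},
    {"department": "Engineering", "period": "2013-09", "amount": 90000.0},
]
hr_headcount = [
    {"department": "Sales", "period": "2013-09", "employee_id": "E1"},
    {"department": "Sales", "period": "2013-09", "employee_id": "E2"},
    {"department": "Engineering", "period": "2013-09", "employee_id": "E3"},
    {"department": "Engineering", "period": "2013-09", "employee_id": "E4"},
    {"department": "Engineering", "period": "2013-09", "employee_id": "E5"},
]


def cost_per_headcount(costs: list[dict], people: list[dict]) -> dict:
    # Aggregate each source to the shared grain (department, period) first...
    total_cost: dict = defaultdict(float)
    for row in costs:
        total_cost[(row["department"], row["period"])] += row["amount"]
    headcount: dict = defaultdict(set)
    for row in people:
        headcount[(row["department"], row["period"])].add(row["employee_id"])
    # ...then join the two aggregates on that grain and divide.
    return {
        key: total_cost[key] / len(headcount[key])
        for key in total_cost
        if headcount.get(key)  # skip groups with no headcount instead of dividing by zero
    }


print(cost_per_headcount(erp_costs, hr_headcount))
```

Headcount is counted as distinct employee IDs so duplicate HR rows don't inflate the denominator.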

answered Sep 12 '13 at 03:09

You think of an object model, push the objects into the NoSQL DB, and let it build the indexes in the background. It's actually easier, since you can use the object model as an intermediary and don't have to transform the objects to fit a different persistence model, e.g. SQL, DW, etc. Multiple query methods are also supported using familiar notations, e.g. SQL and LINQ, which are easier to integrate into services than either DAX or MDX. NoSQL also lets you shard your data very easily, so loading usually takes seconds rather than minutes, depending on how much hardware is dedicated to it; the same goes for performance. - Alwyn
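Hash-based sharding, one common form of the partitioning this comment refers to, routes each record to a shard by hashing a shard key, so each shard's batch can be loaded independently. A minimal Python sketch under that assumption; `shard_for` and `distribute` are hypothetical helpers, and real systems add rebalancing and replication on top:

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a shard key to a shard number using a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    digest is used instead to keep routing consistent across loads.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribute(records: list[dict], key_field: str, shard_count: int) -> list[list[dict]]:
    """Split records into per-shard batches that could be loaded in parallel."""
    shards: list[list[dict]] = [[] for _ in range(shard_count)]
    for record in records:
        shards[shard_for(record[key_field], shard_count)].append(record)
    return shards


records = [{"customer": f"C{i % 10}", "amount": i} for i in range(100)]
shards = distribute(records, "customer", shard_count=4)
print([len(batch) for batch in shards])
```

A lookup by `customer` touches exactly one shard, but a query that groups or joins on any other field has to visit every shard and combine the results, which is where the cross-server join concern discussed in this thread comes from.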

Offline indexes, sharding: these are all just NoSQL terms for things that have existed in relational databases for a long time. I agree with the comment below. Design issues and technology are only part of the problem. I find I waste a LOT of time on business politics and management issues. - Nick.McDermaid

No, an RDBMS is a poor choice for sharding because it is so easy to introduce cross-server joins, which actually hurt performance more than they help. An RDBMS is also slow for reporting and slow for inserts, hence we have HOLAP/MOLAP, but NoSQL can encompass all of this at the cost of ACID, which we don't care about in a DW anyway. I can see SQL vs. NoSQL, but what would be the benefit of having a DW in addition to NoSQL? - Alwyn

Federated queries are a way of improving joins between distributed data, but I see your point. I just can't see a 'Cost per Headcount' type KPI working very well when you have millions of records stored in an EAV model (I may be showing my ignorance here). - Nick.McDermaid
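In an EAV (entity-attribute-value) layout, each row stores a single attribute of an entity, so a ratio KPI such as Cost per Headcount first has to pivot rows back into one record per entity. A minimal Python sketch of that extra step, using hypothetical entity and attribute names:

```python
from collections import defaultdict

# Hypothetical EAV rows: (entity, attribute, value), one attribute per row.
eav_rows = [
    ("dept:Sales", "cost", 40000.0),
    ("dept:Sales", "headcount", 2),
    ("dept:Engineering", "cost", 90000.0),
    ("dept:Engineering", "headcount", 3),
    ("dept:Legal", "cost", 15000.0),  # no headcount recorded for this entity
]


def pivot(rows: list[tuple]) -> dict:
    """Rebuild one record per entity from EAV rows (a column layout has this already)."""
    entities: dict = defaultdict(dict)
    for entity, attribute, value in rows:
        entities[entity][attribute] = value
    return dict(entities)


def cost_per_headcount_eav(entities: dict) -> dict:
    # Only entities that have both attributes, and a non-zero headcount, produce a KPI.
    return {
        entity: attrs["cost"] / attrs["headcount"]
        for entity, attrs in entities.items()
        if "cost" in attrs and attrs.get("headcount")
    }


print(cost_per_headcount_eav(pivot(eav_rows)))
```

With millions of rows, that pivot is essentially a full grouping pass over the data before any KPI can be computed.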

In short, for data warehousing, I think that the relational / OLAP world has significant advantages, mostly because in many BI scenarios, you want to allow the users to explore the data, which is easy with the SQL toolset, and harder with NoSQL solutions. But when you get too large (and large in OLAP scenarios is really large), you might want to consider limiting the users' options and going with a NoSQL solution tailored to what they need.

From: http://ayende.com/blog/4552/nosql-and-data-warehousing

I guess BI integration for end users would be the primary driver. From an ease-of-integration/services/scalability point of view, NoSQL takes the crown easily.

answered Sep 12 '13 at 20:09

BI and the users' needs must indeed be the driver. You need to consider the whole BI stack, not just the DBMS. SQL DW solutions can and do scale to hundreds of terabytes while providing the full range of self-service analytics. If you need petabyte scale and/or don't need the same level of analytical and presentation tools, then you might find some advantages in NoSQL. - nvogel
