Sunday, October 10, 2010

new trends in data modelling...

This is a continuation of the earlier post. Although I haven't got the feedback and the answers I was seeking, the journey continues. I came across an enlightening post from ThoughtWorks about the current state of data modelling. It is a great read and recommended to anybody interested in the topic.
Following are the points I have gleaned from it and from other reading:
1. These are interesting times: the case for a polyglot DB architecture is slowly but surely breaking onto the scene. The days of the RDBMS as a one-size-fits-all solution are over. It is back to the future, where each piece of data that an application interacts with has to be understood in the context of its probable reads, writes, performance requirements, functional requirements and so on. Given the newer technologies that are maturing, there may well be cases where something other than an RDBMS is the better fit.
As the article points out, a few of these cases are:
i. Static/lookup data (read-only) - this is config data that changes only at deploy time. Currently two solutions are employed: a) put it in a .properties file and read it with Java, caching it in the JVM at load time, or b) keep it in the DB if there is a use case for configuration management software that updates such data on the fly. So the data is mostly "read-only" or sometimes "mostly-read-only-rare-writes". Storing it in an RDBMS just to accommodate the rare writes is overkill.
ii. Semi-structured documents - This is a niche case, but my doubt here is a universal one: how do you work with semi-structured data (undefined schema) in an object-oriented language such as Java? I mean, to ask a very dumb question, how do you write the freaking models? And how do you write the logic? Do you assume a schema that covers all the possibilities?
A related area is structured documents like XML, which come with the additional facility of being queryable. It is once again a judgement call how much information can be left in the document itself and how much of it should be extracted to a separate DB.
iii. Object store - This is the romance of ORM: all you see are objects, by some magic they are persisted, and whenever you ask for them you get those shiny objects back. This is perfect if you never need to dissect the magic underneath by querying the data directly.
iv. Metadata, or data that is to be heavily mined to glean information for reporting and other non-realtime requirements. The rigour of relational data proves to be its undoing here, as the zillions of joins create serious performance issues. Normalization, which guards against update anomalies, also suited an age when storage was expensive. Today storage comes rather cheap and it is the response time that holds the key. Denormalized reporting tables are a traditional solution, and as the article describes, a graph DB is suited to niche cases like social networking. I am sure there are other scenarios that can also be served by that approach.
The bottom line here is that the store for such "non-realtime data" has to be segregated from the main transactional data so that these solutions can be introduced later.
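
Coming back to the question in item ii about how to write models for semi-structured data: one possible answer, and only a sketch, is to give real fields to the handful of attributes you assume every document has and keep everything else in an open-ended map with a typed lookup. The ProductDocument class, its id and name fields and the attribute keys used here are all made up for illustration; they are not from the article.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model for a semi-structured "product" document.
// Fields that every document is assumed to have get real accessors;
// everything else stays in an open-ended attribute map.
final class ProductDocument {
    private final String id;                       // assumed always present
    private final String name;                     // assumed always present
    private final Map<String, Object> attributes;  // whatever else the document carries

    ProductDocument(String id, String name, Map<String, Object> attributes) {
        this.id = id;
        this.name = name;
        // Defensive copy so later changes to the caller's map do not leak in.
        this.attributes = Collections.unmodifiableMap(new LinkedHashMap<>(attributes));
    }

    String getId() { return id; }

    String getName() { return name; }

    // Typed lookup of an optional attribute: empty if the key is missing
    // or the stored value is not of the expected type.
    <T> Optional<T> attribute(String key, Class<T> type) {
        Object value = attributes.get(key);
        return type.isInstance(value) ? Optional.of(type.cast(value)) : Optional.empty();
    }

    // Read-only view of all the extra attributes.
    Map<String, Object> attributes() { return attributes; }
}
```

Logic then has to say what happens when a value is absent, for example doc.attribute("weightKg", Double.class).orElse(0.0), instead of assuming a schema that covers every possibility.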

This vision of the future does have one caveat as I see it: the data of an application is distributed over many different systems, and apart from very smart and versatile developers, you need a polished IT and deployment strategy to hold it all together.

And finally, to harness the power of these different technologies from an OO world, new paradigms have to be invented, and all these technologies have to come up with query language support and portability to survive and make it big.
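
Until such paradigms arrive, one modest way to keep the OO side independent of the store is to hide each kind of data behind a small interface. The sketch below does this for the lookup data from item i. Every name in it (LookupStore, PropertiesLookupStore, MapLookupStore, CurrencyFormatter and the currency.symbol.* keys) is hypothetical, and the map-backed class merely stands in for whatever other backend might be chosen.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

// Hypothetical interface: application code asks for lookup values
// without knowing which kind of store answers.
interface LookupStore {
    Optional<String> get(String key);
}

// Backed by .properties content, read once and cached in the JVM.
final class PropertiesLookupStore implements LookupStore {
    private final Map<String, String> cache = new HashMap<>();

    PropertiesLookupStore(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Copy into a plain map so reads never touch the source again.
        for (String name : props.stringPropertyNames()) {
            cache.put(name, props.getProperty(name));
        }
    }

    @Override
    public Optional<String> get(String key) {
        return Optional.ofNullable(cache.get(key));
    }
}

// Stand-in for any other backend (a table, a key-value store and so on);
// here it is just a map so the sketch runs without external systems.
final class MapLookupStore implements LookupStore {
    private final Map<String, String> rows;

    MapLookupStore(Map<String, String> rows) {
        this.rows = new HashMap<>(rows);
    }

    @Override
    public Optional<String> get(String key) {
        return Optional.ofNullable(rows.get(key));
    }
}

// The caller depends only on the interface, so the store can be swapped.
final class CurrencyFormatter {
    private final LookupStore lookups;

    CurrencyFormatter(LookupStore lookups) {
        this.lookups = lookups;
    }

    // Falls back to the currency code itself when no symbol is configured.
    String symbolFor(String currencyCode) {
        return lookups.get("currency.symbol." + currencyCode).orElse(currencyCode);
    }
}
```

Swapping new PropertiesLookupStore(reader) for new MapLookupStore(rows) leaves CurrencyFormatter untouched, which is the kind of portability the application code would want from any of these stores.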