Thursday, 8 December 2016

Embedded vs Reference approach in MongoDB

In general, there are two ways to maintain a relation between entities in MongoDB:

1. Embedded approach (strong association, the default): The child document is stored inside its parent document, in the same collection. Reads perform better than in a relational database because no join is needed, but this approach should be avoided in case of:

⦁    Weak association
⦁    High data growth, i.e. when the document is likely to keep growing through frequent inserts, such as pushing elements to an array or adding new fields
⦁    Many-to-many association

However, embed as much as possible while keeping the document size in mind. MongoDB imposes a 16MB size limit on a single document (4MB before version 1.8). In general, any data that is not useful apart from its parent document should be part of the same document.
Another piece of good news is that we can also load the data lazily.
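
As a rough illustration of the embedded shape, here is a minimal sketch in which plain Java maps stand in for MongoDB documents (no driver is involved). The class name, field names and values are all made up for the example; the point is only that one read of the parent brings the children along with it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain maps stand in for MongoDB documents; this is not driver code.
class EmbeddedMerchantSketch {

    // Builds one merchant document with its locations nested inside it.
    static Map<String, Object> buildMerchant() {
        Map<String, Object> merchant = new LinkedHashMap<>();
        merchant.put("_id", "merchant-1");    // hypothetical id value
        merchant.put("name", "Corner Shop");  // hypothetical field

        List<Map<String, Object>> locations = new ArrayList<>();
        locations.add(location("Pune"));
        locations.add(location("Mumbai"));
        merchant.put("locations", locations); // embedded 1:M, with a small M
        return merchant;
    }

    static Map<String, Object> location(String city) {
        Map<String, Object> loc = new LinkedHashMap<>();
        loc.put("city", city);
        return loc;
    }

    // Reading the parent returns the children too: no join, no second query.
    @SuppressWarnings("unchecked")
    static List<String> cities(Map<String, Object> merchant) {
        List<String> result = new ArrayList<>();
        for (Map<String, Object> loc : (List<Map<String, Object>>) merchant.get("locations")) {
            result.add((String) loc.get("city"));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(cities(buildMerchant())); // prints [Pune, Mumbai]
    }
}
```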

One more disadvantage of the embedded approach is that a unique ID (_id) is not set on nested subdocuments by default; it is set only on root documents.

MongoDB CRUD operations (insert, update, find, remove) all operate on top-level documents exclusively -- although of course you can filter by fields in embedded documents. Embedded documents are always returned within the parent document.

The _id field is a required field of the parent document, and is typically not necessary or present in embedded documents. If you require a unique identifier, you can certainly create them, and you may use the _id field to store them if that is convenient for your code or your mental model; more typically, they are named after what they represent (e.g. "username", "otherSystemKey", etc). Neither MongoDB itself, nor any of the drivers will automatically populate an _id field except on the top-level document.

Specifically in Java, if you wish to generate ObjectId values for the _id field in embedded documents, you can do so with:

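// someEmbeddedDoc is an org.bson.Document (or BasicDBObject); ObjectId is org.bson.types.ObjectId
someEmbeddedDoc.put("_id", new ObjectId());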

Therefore, while performing CRUD operations we have to take care of generating the unique id for the subdocuments ourselves.
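
What "taking care of the ids ourselves" could look like is sketched below. This is only an illustration under assumptions: plain Java maps stand in for documents, the helper name assignIds is made up, and UUID is used as a stand-in id generator so the example runs without the driver (with the MongoDB Java driver one would put a new ObjectId() instead).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

class SubdocumentIdSketch {

    // Gives the document and every nested subdocument an "_id" if it lacks one.
    // UUID is a stand-in for ObjectId so that the sketch needs no driver.
    static void assignIds(Map<String, Object> doc) {
        doc.putIfAbsent("_id", UUID.randomUUID().toString());
        // Iterate over a copy of the values, since nested maps are modified.
        for (Object value : new ArrayList<>(doc.values())) {
            visit(value);
        }
    }

    @SuppressWarnings("unchecked")
    private static void visit(Object value) {
        if (value instanceof Map) {
            assignIds((Map<String, Object>) value);
        } else if (value instanceof List) {
            for (Object item : (List<Object>) value) {
                visit(item);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> comment = new LinkedHashMap<>();
        comment.put("text", "Nice post");

        List<Object> comments = new ArrayList<>();
        comments.add(comment);

        Map<String, Object> post = new LinkedHashMap<>();
        post.put("_id", "post-1"); // an existing id is left untouched
        post.put("comments", comments);

        assignIds(post);
        System.out.println(comment.containsKey("_id")); // prints true
    }
}
```

Such a helper would have to be called before every insert or update that adds subdocuments, which is exactly the extra bookkeeping the embedded approach brings with it.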

Therefore, it is better to use the embedded approach for 1:1 relationships, or for 1:M relationships only when M is not too large.

2. Reference approach (weak association): Using the @DBRef annotation (Spring Data MongoDB) persists the referenced class in a separate collection. Data that can be referred to from multiple places is thus separated into its own collection. If many records refer to the same data, it is more efficient and less error-prone to update a single record and keep references to it in other places. The use of DBRef requires an extra query for a read operation and hence affects performance.
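
The trade-off can be sketched without any driver: below, two plain Java maps stand in for two collections, and a post stores only the id of its author. Everything here (class name, the authorId field, the lookups counter) is hypothetical and only meant to show the two effects described above: an update in a single place, and one extra lookup per read.

```java
import java.util.HashMap;
import java.util.Map;

// Two maps keyed by _id stand in for two MongoDB collections.
class ReferenceSketch {
    final Map<String, Map<String, Object>> users = new HashMap<>();
    final Map<String, Map<String, Object>> posts = new HashMap<>();
    int lookups = 0; // counts simulated queries

    void addUser(String id, String name) {
        Map<String, Object> user = new HashMap<>();
        user.put("_id", id);
        user.put("name", name);
        users.put(id, user);
    }

    // The post keeps only a reference (the author's id), not a copy of the user.
    void addPost(String id, String authorId, String title) {
        Map<String, Object> post = new HashMap<>();
        post.put("_id", id);
        post.put("title", title);
        post.put("authorId", authorId);
        posts.put(id, post);
    }

    // The shared data lives in one place, so one update is enough.
    void renameUser(String id, String newName) {
        users.get(id).put("name", newName);
    }

    // Resolving the reference costs a second lookup on every read.
    String authorNameOf(String postId) {
        Map<String, Object> post = posts.get(postId);
        lookups++;
        Map<String, Object> author = users.get((String) post.get("authorId"));
        lookups++;
        return (String) author.get("name");
    }

    public static void main(String[] args) {
        ReferenceSketch db = new ReferenceSketch();
        db.addUser("u1", "Asha");
        db.addPost("p1", "u1", "First post");
        db.addPost("p2", "u1", "Second post");

        System.out.println(db.authorNameOf("p1")); // prints Asha
        db.renameUser("u1", "Asha K");
        System.out.println(db.authorNameOf("p2")); // prints Asha K
        System.out.println(db.lookups);            // prints 4
    }
}
```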

Why we shouldn't use the embedded approach in case of data growth:
Let's say an entity is associated with 1000 documents.

Consider an example: let's say a user document has a strong association with posts, and each post itself has a strong association with comments. If there are 1000 comments per post, and the user commented on 3 posts, you’ll have to wade through 3000 comments to find a specific comment matching your query (a word or phrase). A plain find() has no simple way to return only the matched comments.

Reason: a query returns whole documents, so picking out a subset of the elements embedded in a document is awkward. Projection operators such as $elemMatch return only the first matching array element, and returning every match requires an aggregation pipeline. If you need to pick and choose a few bits of each document, it will be easier to separate them out.
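
A minimal sketch of the scenario above, again with plain Java collections standing in for a user document fetched from the database (the class and field names are made up). Once the whole document comes back, it is the application that has to inspect every embedded comment to find the one it wants.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CommentSearchSketch {
    int scanned = 0; // how many embedded comments the application had to inspect

    // Builds one user document with posts embedded in it, and comments embedded in each post.
    static Map<String, Object> buildUser(int postCount, int commentsPerPost) {
        List<Map<String, Object>> posts = new ArrayList<>();
        for (int p = 0; p < postCount; p++) {
            List<String> comments = new ArrayList<>();
            for (int c = 0; c < commentsPerPost; c++) {
                comments.add("comment " + c + " on post " + p);
            }
            Map<String, Object> post = new LinkedHashMap<>();
            post.put("title", "post " + p);
            post.put("comments", comments);
            posts.add(post);
        }
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("_id", "user-1");
        user.put("posts", posts);
        return user;
    }

    // Walks through every embedded comment and keeps those containing the phrase.
    @SuppressWarnings("unchecked")
    List<String> findComments(Map<String, Object> user, String phrase) {
        List<String> matches = new ArrayList<>();
        for (Map<String, Object> post : (List<Map<String, Object>>) user.get("posts")) {
            for (String comment : (List<String>) post.get("comments")) {
                scanned++;
                if (comment.contains(phrase)) {
                    matches.add(comment);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, Object> user = buildUser(3, 1000);
        CommentSearchSketch search = new CommentSearchSketch();
        List<String> hits = search.findComments(user, "comment 999 on post 2");
        // prints "1 match after scanning 3000 comments"
        System.out.println(hits.size() + " match after scanning " + search.scanned + " comments");
    }
}
```

With comments in their own collection, the same question becomes an ordinary query on that collection, and only the matching comment documents come back.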

Summary:

As a general rule, if a parent would hold a lot of child "documents", or if they are large, a separate collection might be best.

Smaller and/or fewer documents tend to be a natural fit for embedding.

So we need to focus on the entities that have many relationships with other entities: for those, we keep only a reference to the child entity in the parent. If we know that, say, a merchant has only a few locations, it is better to embed them.

While designing the model schema around the business needs, we must keep all these factors in mind.

I am fairly sure about the behavior described above, but not sure which entities are more likely to grow in the future according to the business needs. Knowing that would help us make the best use of MongoDB and compare it better with SQL.
Please suggest and advise.
