In general, there are two ways to maintain a relationship between
entities in MongoDB:
1. Embedded approach (strong association, the default): the child
document is persisted inside its parent document, in the same
collection. Reads generally perform better than in a relational
database because no join operation is required, but this approach
should be avoided in case of:
⦁ weak association
⦁ high data growth, i.e. when the document is likely to keep growing
through frequent inserts such as pushing elements to an array or
adding new fields
⦁ many-to-many association
Otherwise, embed as much as possible while keeping the document size
in mind: MongoDB imposes a 4MB (16MB since 1.8) size limit on a single
document. In general, any data that is not useful apart from its
parent document should be part of the same document.
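As a rough illustration of that rule, the sketch below models a parent
document with its children embedded in it. It uses plain Java maps
rather than the MongoDB driver, and the merchant/locations shape and
all method names are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EmbeddedMerchantSketch {

    // Builds one merchant document with its locations embedded as an array.
    static Map<String, Object> buildMerchant(String name, List<String> cities) {
        List<Map<String, Object>> locations = new ArrayList<>();
        for (String city : cities) {
            Map<String, Object> location = new LinkedHashMap<>();
            location.put("city", city);
            locations.add(location);
        }
        Map<String, Object> merchant = new LinkedHashMap<>();
        merchant.put("name", name);
        merchant.put("locations", locations); // stored inside the parent
        return merchant;
    }

    // Reads the embedded data straight from the parent: no second lookup.
    @SuppressWarnings("unchecked")
    static List<String> cityNames(Map<String, Object> merchant) {
        List<String> names = new ArrayList<>();
        List<Map<String, Object>> locations =
                (List<Map<String, Object>>) merchant.get("locations");
        for (Map<String, Object> location : locations) {
            names.add((String) location.get("city"));
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, Object> merchant =
                buildMerchant("Acme", Arrays.asList("Pune", "Delhi"));
        System.out.println(cityNames(merchant)); // prints [Pune, Delhi]
    }
}
```

The locations have no meaning apart from their merchant, so loading
the parent is enough to get them.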
Another piece of good news is that the data can also be loaded lazily.
One further disadvantage of the embedded approach is that a unique ID
(_id) is not set on nested subdocuments by default; it is set only on
root documents.
MongoDB CRUD operations (insert, update, find, remove) all operate
on top-level documents exclusively -- although of course you can
filter by fields in embedded documents. Embedded documents are
always returned within the parent document.
The _id field is a required field of the parent document, and is
typically not necessary or present in embedded documents. If you
require a unique identifier, you can certainly create them, and you
may use the _id field to store them if that is convenient for your
code or your mental model; more typically, they are named after what
they represent (e.g. "username", "otherSystemKey", etc). Neither
MongoDB itself, nor any of the drivers will automatically populate
an _id field except on the top-level document.
Specifically in Java, if you wish to generate ObjectId values for
the _id field in embedded documents, you can do so with:
someEmbeddedDoc._id = new ObjectId();
Therefore, while performing any CRUD operation, we have to explicitly
take care of generating the unique ID for all the subdocuments.
http://stackoverflow.com/questions/11255100/mongodb-embedded-objects-have-no-id-null-value/11263912#11263912
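A minimal sketch of what "taking care of it explicitly" could look
like before saving a parent document. To stay runnable without the
MongoDB driver it uses plain maps and java.util.UUID as a stand-in for
ObjectId; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class SubdocumentIdSketch {

    // Adds an "_id" to every embedded document that lacks one and
    // returns how many ids were generated.
    // UUID is only a stand-in here; with the driver on the classpath
    // one would typically store a new ObjectId() instead.
    static int assignMissingIds(List<Map<String, Object>> subdocuments) {
        int assigned = 0;
        for (Map<String, Object> sub : subdocuments) {
            if (!sub.containsKey("_id")) {
                sub.put("_id", UUID.randomUUID().toString());
                assigned++;
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> comments = new ArrayList<>();

        Map<String, Object> alreadyIdentified = new LinkedHashMap<>();
        alreadyIdentified.put("_id", "existing-id");
        alreadyIdentified.put("text", "first");
        comments.add(alreadyIdentified);

        Map<String, Object> fresh = new LinkedHashMap<>();
        fresh.put("text", "second");
        comments.add(fresh);

        // Only the second comment needs an id.
        System.out.println(assignMissingIds(comments)); // prints 1
    }
}
```

Existing ids are left alone, so the helper can be called on every
save without changing identifiers that other code already relies on.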
Therefore it is better to use the embedded approach for 1:1
relationships, or for 1:M relationships only when M is not large.
2. Reference approach (weak association): using the @DBRef
annotation persists the referenced class in a separate collection,
so data that can be referred to from multiple places is separated
into its own collection. If many records refer to the same data, it
is more efficient and less error-prone to update a single record and
keep references to it in other places. Using a DBRef requires an
extra query on reads and hence affects performance.
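The trade-off can be sketched without Spring or the driver. Below, two
in-memory maps stand in for two collections, and each Map.get stands
in for one query; the field names (authorId, name) and the helper
methods are assumptions for illustration, not the @DBRef API.

```java
import java.util.HashMap;
import java.util.Map;

public class ReferenceLookupSketch {

    // Resolves a post's author by following the stored reference.
    static String authorNameOf(String postId,
                               Map<String, Map<String, Object>> posts,
                               Map<String, Map<String, Object>> users) {
        Map<String, Object> post = posts.get(postId);      // query 1: the post
        String authorId = (String) post.get("authorId");   // the reference
        Map<String, Object> author = users.get(authorId);  // query 2: the user
        return (String) author.get("name");
    }

    // Builds a one-field document (hypothetical helper, for brevity).
    static Map<String, Object> doc(String key, Object value) {
        Map<String, Object> d = new HashMap<>();
        d.put(key, value);
        return d;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> users = new HashMap<>();
        Map<String, Map<String, Object>> posts = new HashMap<>();
        users.put("u1", doc("name", "Alice"));
        posts.put("p1", doc("authorId", "u1"));
        posts.put("p2", doc("authorId", "u1"));

        // One update to the referenced record is seen by every post.
        users.get("u1").put("name", "Alicia");
        System.out.println(authorNameOf("p1", posts, users)); // prints Alicia
        System.out.println(authorNameOf("p2", posts, users)); // prints Alicia
    }
}
```

The single update is the benefit; the second lookup on every read is
the cost.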
Why we shouldn't use the embedded approach in case of data growth:
Let's say a document has an association with 1000 other documents.
Consider an example: a User collection has a strong association with
posts documents, and each post in turn has a strong association with
comments. If there are 1000 comments per post and the user commented
on 3 posts, you'll have to wade through 3000 comments to find a
specific comment matching your query (a word or phrase). A plain
query returns the whole parent documents rather than only the matched
comment.
Reason: a query matches and returns whole documents; by itself it
does not return just a subset of their embedded elements. If you need
to pick and choose a few bits of each document, it will be easier to
separate them out.
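The arithmetic of the example can be simulated in a few lines. This is
only an in-memory sketch of the client-side filtering described above,
not driver code; the class, the method names and the "needle" phrase
are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class EmbeddedCommentScanSketch {

    // Builds posts whose comments are embedded; exactly one comment
    // (in the second post, when it exists) contains the word "needle".
    static List<List<String>> buildPosts(int postCount, int commentsPerPost) {
        List<List<String>> posts = new ArrayList<>();
        for (int p = 0; p < postCount; p++) {
            List<String> comments = new ArrayList<>();
            for (int i = 0; i < commentsPerPost; i++) {
                boolean special = (p == 1 && i == commentsPerPost / 2);
                comments.add(special ? "the needle is here"
                                     : "comment " + p + "-" + i);
            }
            posts.add(comments);
        }
        return posts;
    }

    // Whole posts have come back, so the application filters them itself.
    // scanned[0] counts how many comments had to be inspected.
    static List<String> matchClientSide(List<List<String>> posts,
                                        String phrase, int[] scanned) {
        List<String> matches = new ArrayList<>();
        for (List<String> comments : posts) {
            for (String comment : comments) {
                scanned[0]++;
                if (comment.contains(phrase)) {
                    matches.add(comment);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] scanned = new int[1];
        List<String> matches =
                matchClientSide(buildPosts(3, 1000), "needle", scanned);
        System.out.println(scanned[0]);     // prints 3000
        System.out.println(matches.size()); // prints 1
    }
}
```

With comments in their own collection, the same search could be
expressed as a query on that collection, so only matching comments
would need to reach the application.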
Summary :
As a general rule, if you have a lot of "documents" in a collection
or if they are large, a separate collection might be best.
Smaller and/or fewer documents tend to be a natural fit for
embedding.
So we need to focus on the entities that have many relationships
with other entities, and keep only a reference to the child entity in
the parent. If we know that, say, a merchant has only a few
locations, it would be better to embed them.
While designing the model schema around the business needs, we must
keep all these factors in mind.
I am fairly sure about the above behavior, but I am not sure which
entities have a greater chance of growth in the future according to
the business needs. Knowing that would help us make the best use of
MongoDB and compare it better with SQL.
Please suggest and advise.