General mailing list for discussions and development of PeerLibrary and related software.



Re: [PeerLibrary dev] PeerDB performance and fields selector


  • From: Mitar < >
  • To:
  • Subject: Re: [PeerLibrary dev] PeerDB performance and fields selector
  • Date: Fri, 22 Aug 2014 14:09:09 -0700

Hi!

> I recently stumbled upon an article by Sarah Mei [1] where she explains
> why you should never use MongoDB (worth reading!).

I think for social networks neo4j is showing promise:

http://www.neo4j.org/

Interesting is also:

https://code.google.com/p/leveldb/

> In the paragraph titled 'is there no hope?' she explains that
> fetching related documents is bad because you have to do it
> 'manually'.

I had a presentation on this topic recently. :-) I talk about it here as
well:

https://github.com/peerlibrary/meteor-peerdb#references

So for me MongoDB is interesting because it allows you to build things
on top of it. If you want something more complicated, you can do that.
But of course it takes more work, and there are performance and latency
overheads (you still have to communicate between your server and the
MongoDB instance, instead of the work being done directly inside the
database).

For now we are using it because it is what Meteor provides and it is
good enough for our current scale. We might have to move in the future,
though.

> With PeerDB, these relations can be modeled nicely and fetching the
> related documents takes place on the server.

I think you are misunderstanding what PeerDB is doing. PeerDB does not
fetch related documents at read time at all. It fetches them at *write*
time and syncs them as subdocuments into the main document. Then when
you read, you don't have to do anything more: all the data you need is
already there. And you can use normal MongoDB queries on whole
documents, with the subdocuments already in them.
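
To make the write-time idea concrete, here is a minimal sketch in plain
JavaScript with in-memory maps standing in for collections. This is not
PeerDB's actual code or API; the collection names, SYNCED_FIELDS, and
all of the functions are hypothetical and only illustrate the pattern
of copying selected fields into a subdocument on write.

```javascript
// In-memory stand-ins for two collections (hypothetical names).
const persons = new Map(); // _id -> {_id, name, email, ...}
const posts = new Map();   // _id -> {_id, title, author: {_id, name}}

// Assumption: only these person fields are synced into post.author.
const SYNCED_FIELDS = ['name'];

// Copy _id plus the synced fields out of a document.
function pick(doc, fields) {
  const out = {_id: doc._id};
  for (const f of fields) {
    if (f in doc) out[f] = doc[f];
  }
  return out;
}

// Write path: embed the author's synced fields when the post is stored.
function insertPost(post) {
  const author = persons.get(post.author._id);
  if (!author) throw new Error('unknown author ' + post.author._id);
  posts.set(post._id, {...post, author: pick(author, SYNCED_FIELDS)});
}

// Write path: when a person changes, refresh every post embedding it.
// Returns the number of posts that were rewritten.
function updatePerson(id, changes) {
  const person = {...persons.get(id), ...changes};
  persons.set(id, person);
  // Nothing to sync if no synced field was touched.
  if (!SYNCED_FIELDS.some((f) => f in changes)) return 0;
  let updated = 0;
  for (const post of posts.values()) {
    if (post.author._id === id) {
      post.author = pick(person, SYNCED_FIELDS);
      updated++;
    }
  }
  return updated;
}

// Read path: a single lookup, no fetching of related documents.
function readPost(id) {
  return posts.get(id);
}
```

All the extra work sits in insertPost and updatePerson, while readPost
stays a single lookup.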

meteor-related does fetching:

https://github.com/peerlibrary/meteor-related

This is for use cases when data is not the same for all users, but
should depend on the user accessing it. There you have to do it at
runtime. And no, we don't cache anything there. Mostly because in our
case the whole point is to have reactivity and reactive changes being
pushed to the client as they come, so having a caching layer would
somehow defeat this purpose.

It is also a different workflow model. Instead of the client making
requests to the server, you subscribe to all the data you need, and then
the question is just how to figure out which data you have to subscribe
to. So I would not say that meteor-related is fetching related data, but
that it just helps you compute everything you have to subscribe to (at
some point we could use information from PeerDB to help there as well).
So you say I want the document with this ID, and then meteor-related can
also send you some other documents along with it, because it knows you
will need them as well, because they are related.

Mostly we use this for permissions, where you want to limit access to
publications to those who are in some group.
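
As a rough illustration of "computing what to subscribe to" for the
permissions case, here is a small plain-JavaScript sketch. It is not
the meteor-related API; the data shapes (readGroups, members) and the
function documentsToPublish are assumptions made up for this example.

```javascript
// Hypothetical helper: given a user and a requested publication, work
// out which documents should be sent along. The result depends on the
// user, so it cannot be precomputed into the document at write time.
function documentsToPublish(userId, publicationId, groups, publications) {
  const publication = publications.find((p) => p._id === publicationId);
  if (!publication) return [];
  // Groups that both grant read access and contain this user.
  const granting = groups.filter(
    (g) => publication.readGroups.includes(g._id) && g.members.includes(userId)
  );
  // No granting group: the user gets nothing.
  if (granting.length === 0) return [];
  // The requested document plus the related documents that justify access.
  return [
    {collection: 'publications', _id: publication._id},
    ...granting.map((g) => ({collection: 'groups', _id: g._id})),
  ];
}
```

In a reactive setup this computation would have to be rerun whenever
group membership changes, which is why a caching layer does not fit
well here.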

> 1. Have you ever measured the performance of PeerDB? Sarah raises
> several concerns in her article, one of which addresses caching. Are
> PeerDB's results cached in any way?

In some way, PeerDB is a cache. :-) It caches data directly into
subdocuments of the main document.

It trades write-time performance for read-time performance.

So read-time performance should not be impacted at all; it should be
exactly as fast as reading one document.

The issue is only at write time. Writes are slowed down quite a bit,
but it is nothing that horizontal scaling to multiple instances could
not solve (because documents are still independent). The overhead is
still there, though, and the current performance impact on one instance
is definitely super-linear. You can observe this if you trigger
populating with sample data: initially it adds documents very fast, but
then it gets slower and slower.

The current PeerDB code is completely unoptimized. It does very
straightforward things and does not try to be smart in any way. So I
think some optimizations are possible.

The only real worry I have is that we are doing a Meteor observe on
whole collections: collection.find({}, {fields: {which we are interested
in to sync}}). Meteor then caches all those documents internally so that
it can send them to you as changes come. So you end up with a copy of
the whole database in memory. That might not be a problem as long as it
is only a few GBs, because it makes things fast.

But we should entertain the idea of modifying Meteor a bit so that it
does not cache anything, but refetches the data every time. So MongoDB
informs Meteor about a change in a field, and then Meteor fetches that
document and sends it to PeerLibrary. It would make things a bit slower,
but it would mean that it would scale.
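
The difference between caching and refetching can be sketched in plain
JavaScript. This is not how Meteor's observe is implemented; FakeDb,
observeWithCache and observeWithRefetch are hypothetical names, and the
change notification format is an assumption made for the example.

```javascript
// Hypothetical in-memory database that notifies listeners on each write.
class FakeDb {
  constructor() {
    this.docs = new Map();
    this.listeners = [];
    this.reads = 0; // counts findOne calls, to show the cost of refetching
  }
  onChange(listener) {
    this.listeners.push(listener);
  }
  // Return only the requested fields of one document.
  findOne(id, fields) {
    this.reads++;
    const doc = this.docs.get(id);
    if (!doc) return undefined;
    const out = {_id: id};
    for (const f of fields) {
      if (f in doc) out[f] = doc[f];
    }
    return out;
  }
  write(id, changes) {
    this.docs.set(id, {...(this.docs.get(id) || {_id: id}), ...changes});
    for (const listener of this.listeners) listener(id, changes);
  }
}

// Strategy A: keep a copy of the observed fields of every document.
// No extra reads, but memory grows with the size of the collection.
function observeWithCache(db, fields, callback) {
  const cache = new Map();
  db.onChange((id, changes) => {
    const relevant = fields.filter((f) => f in changes);
    if (relevant.length === 0) return;
    const cached = cache.get(id) || {_id: id};
    for (const f of relevant) cached[f] = changes[f];
    cache.set(id, cached);
    callback({...cached});
  });
  return cache;
}

// Strategy B: keep nothing; on a relevant change, refetch the document.
// One extra read per change, but no copy of the collection in memory.
function observeWithRefetch(db, fields, callback) {
  db.onChange((id, changes) => {
    if (!fields.some((f) => f in changes)) return;
    callback(db.findOne(id, fields));
  });
}
```

Both observers deliver the same documents to the callback; they differ
only in whether the cost is paid in memory or in extra reads.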

> 2. Does PeerDB respect the 'fields' option of 'find' in an efficient
> way? E.g., does it only fetch a related document if it is necessary?

This question is not really relevant, as discussed above. But yes,
limiting the fields is the most important part, and PeerDB does this
when reading the related documents to sync: it simply uses Meteor
reactivity to keep track of all changes to those fields all the time, to
keep them in sync. And when you are then using normal MongoDB queries,
you can also limit the data coming from the database using fields,
because this is what it is meant for. So if you are not interested in
subdocuments, you just don't request them.


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m


