Tutorial: Sync MongoDB with Elasticsearch

Dish M. Eldishnawy
Apr 4, 2018

Elastic-MongoDB Credit: Medium

The ELK stack by Elastic is great! Many companies nowadays rely on Elasticsearch to search and analyse their data in real time, on Kibana to visualise that analysis or search, and on Logstash as a data processing pipeline that takes different types of data as input, filters it, and then sends it to whatever “stash” you prefer.

The ELK plugin library is rich, but so far there is only one stable MongoDB plugin, and it only outputs (writes) to MongoDB. In February 2018 a new plugin for reading input from MongoDB came out, but it depends on Java's JDBC, and Elastic does not provide the MongoDB JDBC driver it needs! So you end up hunting for a driver that both works and is cheap, which can be difficult and expensive, not to mention the Java-related bugs you might run into. That leaves us with two other options:

1. Mongo-connector:

A decent Python-based library, but unfortunately it has not been updated for almost a year, and it does not support Elasticsearch version 6+ out of the box.

2. Transporter:

A great and frequently updated library, but it does not actually sync the data between MongoDB and Elasticsearch in real time! Its job is finished once the sync is done. The only workaround would be to run it repeatedly, which creates unnecessary load on your servers. So use Transporter only if you are looking for a one-time sync!

Realtime sync with Monstache:

Monstache is a sync daemon written in Go that continuously indexes your MongoDB collections into Elasticsearch. Here is how to set up Monstache in a few steps:

Note: These steps are for Ubuntu-based servers. The same steps can be followed on any other server, but make sure you use the right commands for your server type.

Step 1:

Make sure you have Go on your server. You can install Go by running

sudo apt-get install golang-go

Step 2:

Download the latest Monstache release from here and unzip it. Inside the unzipped folder you will find a folder for each machine type; in our case linux-amd64 is the one for Ubuntu. Remember the path to that folder. If you do not want to keep files you don’t need, move the contents of the folder for your server type (linux-amd64 for Ubuntu) to any other folder on your server, using FTP or any other method, and remember that path (we will need it in Step 3).
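The move described above can be sketched as a small shell function. This is only an illustration: the function name and the example paths are assumptions (the destination matches the path used in the next step), so adjust them to your own server.

```shell
#!/bin/sh
# Sketch: copy the platform folder from the unzipped release to a chosen
# location. Function name and example paths are illustrative assumptions.

# install_platform_dir <unzipped_release_dir> <platform> <dest_parent>
install_platform_dir() {
    src="$1/$2"
    dest="$3/$2"
    if [ ! -d "$src" ]; then
        echo "platform folder not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # Copy the folder's contents (the monstache binary lives in here).
    cp -R "$src"/. "$dest"/ || return 1
    # Print the final path; this is the path to remember for Step 3.
    echo "$dest"
}

# Example (both paths are assumptions):
#   install_platform_dir ~/Downloads/monstache linux-amd64 /home/ubuntu/build
```

The function prints the destination path so it can be pasted straight into the PATH line later.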

Step 3:

Add that path to the PATH variable in your shell profile file. On Ubuntu, ~/.bashrc is the file to edit (it belongs to your user, so sudo is not needed).

vi ~/.bashrc

Add the following line to the file, using the path I asked you to remember in Step 2

export PATH="/home/ubuntu/build/linux-amd64:$PATH"

Notice that the path ends with the folder name for your server type.

Remember to source your profile file so the change takes effect in the current shell

source ~/.bashrc
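If you script this setup, it helps to add the line only once so that re-running the script does not pile up duplicate entries. A minimal sketch, assuming a helper name (add_path_once) that is not part of Monstache or Ubuntu:

```shell
#!/bin/sh
# Sketch: append the PATH export to a profile file only if it is not
# already there. The function name is an illustrative assumption.

# add_path_once <profile_file> <directory>
add_path_once() {
    line="export PATH=\"$2:\$PATH\""
    # -F: fixed string, -x: match the whole line, -q: quiet
    if grep -Fxq "$line" "$1" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$line" >> "$1"
}

# Example (the directory is the one used in this tutorial):
#   add_path_once ~/.bashrc /home/ubuntu/build/linux-amd64
#   . ~/.bashrc
```

Note that the line is written with plain straight quotes; curly quotes copied from a web page would become part of the path and break it.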

Run the following command to make sure everything is set up correctly. If it is, it should print the version of your Monstache installation

monstache -v

Step 4:

Monstache works by monitoring the MongoDB oplog. A full explanation of the oplog is beyond the purpose of this tutorial, but in plain English: the oplog keeps a record of every change made to a MongoDB object, and stores with each change the _id of the object that was modified. I recommend learning more about the oplog.

To make sure MongoDB starts with the oplog active, run it with the master option set to true (this is the legacy master/slave mode, which was removed in MongoDB 4.0; newer versions enable the oplog by running as a replica set instead). You can set the option in the mongodb.conf file, or simply pass --master to mongod when starting it

mongod --master
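Because --master belongs to the legacy master/slave mode that MongoDB removed in version 4.0, the right option depends on your server version. The sketch below picks one based on the major version; the function name and the replica set name rs0 are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: pick the mongod option that enables the oplog for a given
# MongoDB version. The function name and the replica set name "rs0"
# are illustrative assumptions.

# oplog_flag <mongodb_version>   e.g. oplog_flag 3.4.10
oplog_flag() {
    major=${1%%.*}            # "3.4.10" -> "3"
    if [ "$major" -ge 4 ]; then
        echo "--replSet rs0"
    else
        echo "--master"
    fi
}

# Example:
#   mongod $(oplog_flag 3.4.10)     # runs: mongod --master
#   mongod $(oplog_flag 4.0.0)      # runs: mongod --replSet rs0
# With --replSet, run rs.initiate() once in the mongo shell afterwards;
# the oplog is only created once the replica set has been initiated.
```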

Step 5 (last):

Run Monstache by simply typing in the terminal

monstache

By default monstache will connect to Elasticsearch and MongoDB on localhost on the default ports and begin tailing the MongoDB oplog.
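Since Monstache expects both services on localhost, a quick check that something is listening on the default ports (9200 for Elasticsearch, 27017 for MongoDB) can save some debugging. This is a rough sketch that relies on bash's /dev/tcp feature; the function names are assumptions.

```shell
#!/bin/bash
# Sketch: check that something is listening on the default ports before
# starting monstache. Relies on bash's /dev/tcp; in other shells the
# check simply reports "NOT reachable". Function names are assumptions.

# port_open <host> <port>: succeeds if a TCP connection can be opened.
port_open() {
    ( exec 3<>"/dev/tcp/$1/$2" ) 2>/dev/null
}

# check_service <label> <host> <port>: print the result, return the status.
check_service() {
    if port_open "$2" "$3"; then
        echo "$1: reachable"
    else
        echo "$1: NOT reachable on $2:$3"
        return 1
    fi
}

# Example:
#   check_service Elasticsearch localhost 9200
#   check_service MongoDB localhost 27017
```

An open port only shows that some process is listening there, not that it is the right service, so treat this as a first sanity check.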

Advanced options:

Both the MongoDB oplog and Monstache come with several options. One change you might consider is the oplog size, especially if you intend to use this setup for a data-intensive task. Clearing the oplog frequently is also advised to save disk space. Please note that clearing the oplog does not mean that Elasticsearch will remove the indexed objects from its index.

Monstache can also be driven by a configuration file, which you pass with the -f flag

monstache -f /path/to/config.toml

You can check the meaning of each configuration option in the Monstache documentation. For example, namespace-regex and namespace-exclude-regex can be useful if you wish to filter what gets indexed in Elasticsearch.

That’s it, you are done! By default, Monstache creates an index for each collection, named with the following pattern
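As a rough illustration, the sketch below writes a small config file that restricts indexing to a single namespace and shows how it would be passed to Monstache. The file name, the function name, and the regex value are assumptions; check the option names against the Monstache documentation for your version.

```shell
#!/bin/sh
# Sketch: write a small monstache config file. The function name, file
# name, and regex value are illustrative assumptions.

# write_config <path>
write_config() {
    cat > "$1" <<'EOF'
# Connection settings (these mirror the localhost defaults)
mongo-url = "mongodb://localhost:27017"
elasticsearch-urls = ["http://localhost:9200"]

# Only index the "data" collection of the "test" database
namespace-regex = '^test\.data$'
EOF
}

# Example:
#   write_config ./config.toml
#   monstache -f ./config.toml
```

The regex is written as a single-quoted TOML string so the backslash reaches Monstache unchanged.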

db.collection

So if your database name is test and your collection is data, the index would be test.data.
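To check the result, you can build the index name from the database and collection and ask Elasticsearch how many documents it holds. A minimal sketch, assuming Elasticsearch on localhost:9200 and a helper name (index_name) chosen for this example:

```shell
#!/bin/sh
# Sketch: build the default index name described above.
# The function name is an illustrative assumption.

# index_name <database> <collection>
index_name() {
    printf '%s.%s\n' "$1" "$2"
}

# Example, assuming Elasticsearch is running on localhost:9200:
#   curl "http://localhost:9200/$(index_name test data)/_count"
```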

I hope that was helpful, feel free to ask any questions, or share any other approach you have with the community. And finally, thanks to Ryan Wynn for making Monstache.
