MongoDB is a cross-platform, open-source, document-oriented NoSQL database, with an Enterprise distribution that adds security and management features. MongoDB is developed and supported by the company MongoDB Inc.

What do we mean by NoSQL?

The NoSQL database category loosely covers databases that do not impose a rigid schema and are not queried with a structured query language such as SQL. They are typically designed to scale out easily and support very large datasets.

What is a document based database?

Instead of storing information as a table with a series of records or rows, MongoDB stores data as searchable documents. A document is analogous to an object in object-oriented programming languages.

How does MongoDB represent documents?

MongoDB uses JSON to represent its documents. JSON is a simple key-value textual serialization of an object; for example, a bunch of apples could be represented as the following JSON string:
  {
    "Name": "Bunch of apples",
    "Origin": "UK",
    "Colour": "Red",
    "Weight": 0.15,
    "PickDate": "2014-06-10T12:20:12Z",
    "Apples": [
      { "Name": "Apple", "Weight": 0.00068 },
      { "Name": "Apple", "Weight": 0.00073 }
    ]
  }
JSON supports value types such as strings, numbers, booleans and arrays, and objects can be nested to arbitrary depth. MongoDB stores documents on disk in BSON, a binary serialization of JSON that adds extra types (such as dates) and is optimised for traversal and retrieval.
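To make this concrete, the bunch-of-apples document above maps directly onto native data structures in any driver language. A minimal sketch using Python's standard json module (the real drivers perform the JSON/BSON conversion for you):

```python
import json

# The "bunch of apples" document as a native Python dict.
bunch = {
    "Name": "Bunch of apples",
    "Origin": "UK",
    "Colour": "Red",
    "Weight": 0.15,
    "PickDate": "2014-06-10T12:20:12Z",  # plain JSON has no date type, so a string here
    "Apples": [
        {"Name": "Apple", "Weight": 0.00068},
        {"Name": "Apple", "Weight": 0.00073},
    ],
}

text = json.dumps(bunch)      # serialize to a JSON string
restored = json.loads(text)   # parse it back into nested dicts/lists

# Nested values survive the round trip intact.
assert restored["Apples"][1]["Weight"] == 0.00073
```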

Document-based databases are schema-less in the sense that there is no requirement for any object stored in the database to have the same, or similar, structure or document keys.
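For instance (a sketch, using plain Python dicts to stand in for stored documents), two documents in the same hypothetical collection can have entirely different keys:

```python
# Two documents in the same (hypothetical) "things" collection with
# completely different shapes: MongoDB does not require them to match.
things = [
    {"_id": 1, "Name": "Bunch of apples", "Weight": 0.15},
    {"_id": 2, "Title": "Shopping list", "Items": ["milk", "bread"]},
]

# Each document carries only the keys it actually has.
assert set(things[0]) == {"_id", "Name", "Weight"}
assert set(things[1]) == {"_id", "Title", "Items"}
```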

How does MongoDB efficiently search for data?

A user can programmatically specify indexes on document keys. Just like indexes in relational databases, they shorten query times by enabling MongoDB to locate documents matching the query constraints more rapidly. A collection can have up to 64 indexes, and these can be compound, i.e. indexes that incorporate multiple document keys into a single index. Additionally, MongoDB can scale horizontally, distributing its data and the query load across multiple servers ('nodes'). MongoDB calls this process 'sharding'.
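The speed-up from an index can be pictured with a toy sketch (plain Python, not MongoDB internals): an index maps a key value straight to the matching documents, avoiding a full scan of the collection.

```python
# A tiny "collection" of documents.
docs = [
    {"_id": 1, "origin": "UK", "name": "Bunch of apples"},
    {"_id": 2, "origin": "FR", "name": "Bunch of grapes"},
    {"_id": 3, "origin": "UK", "name": "Bunch of pears"},
]

# Build an "index" on the origin key: key value -> list of document ids.
index = {}
for doc in docs:
    index.setdefault(doc["origin"], []).append(doc["_id"])

# An indexed query is a single dictionary lookup...
uk_ids = index.get("UK", [])

# ...instead of scanning every document in the collection:
scanned_ids = [d["_id"] for d in docs if d["origin"] == "UK"]

assert uk_ids == scanned_ids == [1, 3]
```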

What is Sharding?

To support large data sets efficiently, MongoDB allows collections to be physically partitioned and spread across multiple hosts. Each collection can specify a set of keys, which each document in that collection must have, that is used to determine how the documents are distributed across the nodes. This shard key defines an index, of which each host in the cluster is allocated a portion. MongoDB supports up to 1024 shards in a single cluster, and new shards can be added over time as the data grows (without taking the database offline). Shard members should be co-located, and the shard key should be chosen carefully, for optimum performance.
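As a toy sketch (not MongoDB internals), range-based sharding can be pictured as splitting the shard-key space into contiguous ranges, each owned by one shard. The shard names and key ranges below are hypothetical:

```python
# Each shard owns a half-open range of the shard key (here, a user name).
shard_ranges = [
    ("shard0", "a", "h"),   # keys in [a, h)
    ("shard1", "h", "q"),   # keys in [h, q)
    ("shard2", "q", "{"),   # keys in [q, end); '{' sorts just after 'z'
]

def shard_for(key):
    """Return the shard that owns the given shard-key value."""
    for shard, lo, hi in shard_ranges:
        if lo <= key < hi:
            return shard
    raise ValueError("no shard owns key %r" % key)

assert shard_for("alice") == "shard0"
assert shard_for("mallory") == "shard1"
assert shard_for("trent") == "shard2"
```

Adding a new shard then amounts to splitting one of these ranges and migrating the affected documents, which is why MongoDB can grow the cluster without downtime.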

How is the cluster accessed?

MongoDB ships with a small routing process (called mongos) that sits between the cluster and the client application. Its job is to route client requests into the cluster and then manage the streaming of the response back to the client. If a query is predicated on the shard key, mongos can direct it to the particular node responsible for that part of the shard key index; otherwise it must send the query to all nodes in the cluster and aggregate the responses for the client. A cluster can have multiple routing processes, and a typical deployment will have one or two per application process for redundancy.
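The routing decision can be sketched as follows (a toy model, not mongos internals; the shard layout and documents are hypothetical): a query that includes the shard key is targeted at one shard, while any other query is scattered to every shard and the results gathered.

```python
# Hypothetical cluster: shard name -> documents it holds.
shards = {
    "shard0": [{"user": "alice", "age": 30}],
    "shard1": [{"user": "mallory", "age": 41}],
    "shard2": [{"user": "trent", "age": 27}],
}

def shard_for(user):
    # Hypothetical shard-key ranges over the "user" key.
    return "shard0" if user < "h" else "shard1" if user < "q" else "shard2"

def route(query):
    """Return (shards queried, matching documents) for a query dict."""
    if "user" in query:                       # shard key present: targeted query
        targets = [shard_for(query["user"])]
    else:                                     # no shard key: scatter-gather
        targets = list(shards)
    results = []
    for name in targets:
        results += [d for d in shards[name]
                    if all(d.get(k) == v for k, v in query.items())]
    return targets, results

targets, hits = route({"user": "trent"})      # hits exactly one shard
assert targets == ["shard2"] and hits == [{"user": "trent", "age": 27}]

targets, hits = route({"age": 41})            # must ask every shard
assert len(targets) == 3 and hits == [{"user": "mallory", "age": 41}]
```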

How can I secure MongoDB?

Security can be applied at the database level. Accounts can be created within each database to give users read-only or full read-write access. An upcoming version of MongoDB will add Active Directory integration. Inter-process communication can be secured using SSL.
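As an illustrative fragment (assuming the YAML configuration file format introduced in MongoDB 2.6; the certificate path is a placeholder to adapt), authentication and SSL can be switched on in the mongod config:

```yaml
# Sketch of a mongod configuration enabling auth and SSL.
security:
  authorization: enabled          # require clients to authenticate
net:
  ssl:
    mode: requireSSL              # encrypt all connections
    PEMKeyFile: /etc/ssl/mongodb.pem   # placeholder certificate path
```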

How does MongoDB deal with redundancy and backup?

Each shard in a MongoDB cluster can be backed by a replica set. A replica set is a group of machines which, under normal load conditions, are kept in sync. Consistency across the entire replica set is not guaranteed, since writes are not (by default) synchronous across all members, but members become consistent over time. Consistent updates are implemented by nominating a single primary member in each set, to which all writes must be made and from which (by default) all reads are served. For applications where read consistency is less important, secondary members of the set can serve reads, easing the load on the primary. If the primary fails, a new primary is elected from among the survivors.

Replica set members need not be located in the same physical site, and a set can contain up to 12 members (seven of which may vote in elections). Members can also be marked as delayed members, to which writes are only applied after a specified time period.
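Failover can be sketched with a toy model (not MongoDB's actual election protocol; member names and the tie-break rule are made up): when the primary dies, the survivors elect a new primary, provided they still form a majority of the original set.

```python
# Hypothetical three-member replica set.
members = ["nodeA", "nodeB", "nodeC"]

def elect(survivors, set_size):
    """Return the new primary, or None if no majority remains."""
    if len(survivors) * 2 <= set_size:
        return None               # no majority: the set becomes read-only
    return min(survivors)         # stand-in tie-break: lowest name wins

# nodeA (the primary) fails; B and C still form a majority of 3.
assert elect(["nodeB", "nodeC"], len(members)) == "nodeB"

# A second failure leaves a single survivor, no majority, no primary.
assert elect(["nodeC"], len(members)) is None
```

The majority requirement is why production replica sets have an odd number of voting members: it prevents two halves of a partitioned set from both electing a primary.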

Snapshot backups of the database can be taken by temporarily taking secondary replica set members offline and copying their database files.

What APIs are available to use with MongoDB?

Client APIs are available for all popular programming languages, including Java, C#, C++, C and Python. These are open-source and maintained by MongoDB Inc. (formerly 10gen), the company behind MongoDB.


Download MongoDB distribution:

  • Windows:
    • I tested with a 2.x version, installed in C:\work\mongoDB.
    • Create a C:\work\mongoDB\dbpath directory for database storage; it is passed as the --dbpath argument when starting the MongoDB server.
    • Run C:\work\mongoDB\bin\mongod.exe --dbpath C:\work\mongoDB\dbpath
  • Linux:
    • Download the latest stable distribution and unpack it to /work/mongodb/ (the MongoDB home folder).
    • Create a /work/mongodb/dbpath directory for its data storage.
    • Run MongoDB:
      cd /work/mongodb
      ./bin/mongod --dbpath /work/mongodb/dbpath
    • If you are short of disk space, you can run the MongoDB instance with the --smallfiles option:
      ./bin/mongod --dbpath /work/mongodb/dbpath --smallfiles

MongoDB clients

On both Windows and Linux you can use the default mongo JavaScript shell that comes with the distribution.
By default it connects to the test database on localhost.
After connecting, you can run commands to explore the database.
Help is available by typing help.

MongoDB also comes with a variety of tools that can be used to administer or monitor MongoDB instances. Basic operations include connecting to a local or remote MongoDB instance, viewing collections (the MongoDB term for tables), running queries, creating indexes, dropping collections, etc.


MongoDB has drivers for many different programming languages. For Java you can use the MongoDB Java Driver; its documentation shows how to get started writing simple apps.
There is also a Spring Data MongoDB project, with a POJO-centric model for interacting with a MongoDB DBCollection and easily writing a Repository-style data access layer.

