MongoDB is a NoSQL, document-oriented database, and its data model fundamentally differs from traditional relational databases. Instead of using JOIN operations to combine data across separate tables, MongoDB employs an aggregation pipeline with the $lookup stage to perform a left outer join. However, joining in MongoDB is also possible through application-level joins and for specialized use cases, recursive searches with $graphLookup.
The aggregation pipeline with $lookup
The most common method for performing a join-like operation in MongoDB is using the $lookup stage within the aggregation framework. The aggregation pipeline processes a stream of documents through a series of stages. The $lookup stage adds a new array field to each document from the input collection. This array contains documents from a "foreign" collection that match a specified join condition.
$lookup with a single join condition (basic syntax)
This performs a simple left outer join. For each document in the input collection, the $lookup stage finds matching documents in the specified "from" collection and adds them to a new array field.
Syntax:
{
$lookup: {
from: "<collection to join>",
localField: "<field from the input documents>",
foreignField: "<field from the documents of the 'from' collection>",
as: "<output array field>"
}
}
Use code with caution.
**Example:**Imagine two collections, orders and customers.
ordershas a fieldcustomer_id.customershas a field_id.
To retrieve a list of orders with the corresponding customer details:
db.orders.aggregate([
{
$lookup: {
from: "customers",
localField: "customer_id",
foreignField: "_id",
as: "customerDetails"
}
}
])
Use code with caution.
The output will add a customerDetails array to each order document. If a match is found, the array will contain the customer's document. If no match is found, the array will be empty.
**Simulating an inner join:**To achieve the equivalent of an inner join (only returning documents that have a match), you can add a $match stage after the $lookup to filter out any documents where the joined array is empty.
db.orders.aggregate([
{
$lookup: {
from: "customers",
localField: "customer_id",
foreignField: "_id",
as: "customerDetails"
}
},
{
$match: {
"customerDetails": { $ne: [] }
}
}
])
Use code with caution.
**Flattening the results with $unwind:**The $lookup stage always returns the joined documents in an array. If you know there will only be one matching document, or you simply want to "flatten" the results, you can use the $unwind stage.
db.orders.aggregate([
{
$lookup: {
from: "customers",
localField: "customer_id",
foreignField: "_id",
as: "customerDetails"
}
},
{
$unwind: "$customerDetails"
}
])
Use code with caution.
The $unwind stage deconstructs the customerDetails array field and outputs a document for each element. This produces a result where the fields from both collections are at the top level of the document, making the output more closely resemble a traditional joined table.
$lookup with a pipeline (advanced joins)
For more complex join conditions that go beyond a simple equality match, you can use a sub-pipeline within $lookup. This allows you to perform operations like filtering, calculating, or sorting the joined documents before they are added to the output.
Syntax:
{
$lookup: {
from: "<collection to join>",
let: { <variable_1>: <expression>, ..., <variable_n>: <expression> },
pipeline: [ <pipeline to run on joined collection> ],
as: "<output array field>"
}
}
Use code with caution.
**Example:**Imagine you want to join a list of products with orders, but only retrieve the orders placed in the last year.
db.products.aggregate([
{
$lookup: {
from: "orders",
let: { productId: "$_id" },
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ["$product_id", "$$productId"] },
{ $gte: ["$order_date", new Date("2024-01-01T00:00:00Z")] }
]
}
}
},
{ $sort: { order_date: -1 } }
],
as: "recentOrders"
}
}
])
Use code with caution.
- The
letfield defines a variable ($$productId) that holds the_idof the document from theproductscollection. - The
pipelineis an array of aggregation stages executed on theorderscollection. - The
$matchstage uses the$$productIdvariable to join on theproduct_idfield and applies an additional filter on theorder_date. - The
$sortstage sorts the orders before they are embedded.
Recursive joins with $graphLookup
For data with hierarchical or graph-like relationships, such as an organizational chart or social network connections, the $graphLookup stage is the most powerful tool. It performs a recursive search to traverse a specified path within a single collection or across collections.
Syntax:
{
$graphLookup: {
from: "<collection to join>",
startWith: "<expression that specifies the starting point for the recursion>",
connectFromField: "<field in the documents of the 'from' collection>",
connectToField: "<field in the documents of the 'from' collection>",
as: "<output array field>",
maxDepth: <integer> // Optional, limits recursion depth
}
}
Use code with caution.
**Example:**To find all the employees who report up the hierarchy to a specific manager:
db.employees.aggregate([
{ $match: { _id: "manager123" } },
{
$graphLookup: {
from: "employees",
startWith: "$_id",
connectFromField: "_id",
connectToField: "reportsTo",
as: "subordinates"
}
}
])
Use code with caution.
This query starts with the manager's document (startWith: "$_id") and recursively finds all documents where the reportsTo field matches the _id of a superior, populating the subordinates array with the result.
Alternative methods for joining data
While $lookup is the official and most robust method, other techniques are available depending on your application's architecture.
1. Application-level joins
Instead of performing the join within MongoDB, you can perform two separate queries from your application and join the results in memory.
- Query the primary collection to get a list of foreign key references (e.g., all
customer_ids from theorderscollection). - Use these IDs to query the foreign collection (e.g.,
customers) and retrieve the necessary data. - Use your application code to combine the data.
This approach gives you maximum flexibility and control but can be less performant for large datasets as it requires multiple round trips to the database.
2. Denormalization (embedding)
For fields that are frequently accessed together and have a one-to-one or one-to-few relationship, a better approach may be to embed the related data directly within the document. This eliminates the need for any join operation at query time, significantly improving performance.
**Example:**Instead of a separate address collection, the address details can be stored directly within the customer document.
{
"_id": 123,
"name": "John Doe",
"address": {
"street": "123 Main St",
"city": "Anytown"
}
}
Use code with caution.
Conclusion: When to use each method
| Join Method | Best for... | Performance | Drawbacks |
|---|---|---|---|
$lookup Stage |
Left outer joins, complex conditions, filtering joined documents. | Good, as the join is performed natively by the database engine. | Can be less performant than a single-document retrieval. |
$graphLookup Stage |
Recursive searches, traversing graph-like or hierarchical data. | Optimized for graph traversal but can have higher memory usage. | Can be less performant than simple $lookup for straightforward joins. |
| Application-Level Join | Maximum control over the join logic, simple relationships, small datasets. | Dependent on network latency and application logic. | Multiple database queries required, slower for large datasets. |
| Denormalization / Embedding | One-to-one or one-to-few relationships, frequently accessed data. | Excellent, as it avoids joins entirely by having all data in one document. | Increased storage size and potential data redundancy. Harder to update embedded data. |
For most join requirements, the aggregation pipeline with $lookup is the recommended and most flexible approach within MongoDB. However, understanding the alternatives and the database's document-oriented nature will help you design the most efficient and scalable solution for your specific use case.