Introduction:
The specific functions (aggregate) that perform calculations on a set of values and return a single meaningful value. We can also say that these are the mathematical functions that process multiple values to create a single summary statistic. These functions are used in many programming languages, spreadsheets and relational algebra, common in SQL and now in cosmos database.
Why we need Aggregate Functions:
Aggregate functions allow us to summarize large datasets into easily understandable results, allowing us to quickly retrieve desired values from millions of records. They eliminate the need for complex queries or procedural code to generate summaries. These summarized values are crucial for business reporting, facilitating report generation that plays a key role in decision-making. Ultimately, aggregate functions are essential for efficiently analyzing and interpreting extensive datasets.
Dataset and Json Format:
ID | First_Name | Last_Name | Phone | Address | Points |
1 | John | Doe | 555-123-4567 | 123 Main | 1500 |
2 | Jane | Smith | 555-987-6543 | 456 Elm | 2300 |
3 | Alice | Johnson | 555-543-2345 | 789 Oak | 1100 |
4 | Bob | Brown | 555-321-9876 | 321 Pine | 1750 |
5 | Sarah | Davis | 555-654-7890 | 654 Maple | 2000 |
6 | Michael | Taylor | 555-765-4321 | 987 Cedar | 1850 |
7 | Emily | Martinez | 555-246-1357 | 123 Birch | 2400 |
8 | David | Lee | 555-654-3210 | 741 Willow | 900 |
9 | Laura | Wilson | 555-987-1234 | 258 Cherry | 2100 |
10 | Chris | Evans | 555-321-6547 | 369 Ash | 1750 |
Json format file of customers is attached.
Supported Aggregate Functions in Cosmos DB
Cosmos DB offers a range of aggregate functions that enable calculations across multiple documents in a collection, allowing for efficient data summarization and analysis in a distributed database setting. The aggregate functions supported by Azure Cosmos DB are: AVG, COUNT, Maximum, Minimum, and SUM. Here is the description of each function with query and result using above dataset.
1. AVERAGE - AVG():
This function calculates the average of given numeric values column from a set of documents and returns the average value. It operates in the same way as calculating a mathematical average. The syntax for calling this function is `AVG(Column_Name)`, which sums the values in a specified column and then divides the total by the number of rows in the dataset. Here is the example query and result using above dataset.
Query:
SELECT AVG(c.Points) 'Avg Points'
FROM c
WHERE c.entity_type = 'Customer'
Result:
[
{
"Avg Points": 1765
}
]
2. COUNT - COUNT():
The Count function returns the number of items/documents that match a specific condition. This function is often used to count documents in a collection based on some filtering condition. Here is the example query and result using above dataset.
Query:
SELECT COUNT (1) 'Total Customers'
FROM c
WHERE c.entity_type = 'Customer'
Result:
[
{
"Total Customers": 10
}
]
3. SUM - SUM():
This function is used to calculate the sum of all -non-null values from the set of items/documents. This can be used to calculate the total of numeric fields across multiple documents and only works with numeric fields. Here is the example query and result using above dataset.
Query:
<code>SELECT SUM(c.Points) 'Total Points'
FROM c
WHERE c.entity_type = 'Customer'</code>
Result:
<code>[
{
"Total Points": 17650
}
]</code>
4. MAXIMUM - MAX():
This function returns the maximum value from a set of documents. This is used to find the highest value in a collection of documents for a numeric or comparable field. Here is the example query and result using above dataset.
Query:
SELECT MAX(c.Points) 'Max Point'
FROM c
WHERE c.entity_type = 'Customer'
Result:
[
{
"Max Point": 2400
}
]
5. MINIMUM - MIN():
This function returns the minimum value from a set of documents. This is used to find the lowest value in a collection of documents for a numeric or comparable field. Here is the example query and result using above dataset.
Query:
SELECT MIN(c.Points) 'Min Point'
FROM c
WHERE c.entity_type = 'Customer'
Result:
[
{
"Min Point": 900
}
]
Conclusion
The aggregate functions in Cosmos DB are highly effective for calculating unique values within a database or data in the container. We have discussed the importance of these functions, and the ones supported by Cosmos DB, which perform similar roles to their counterparts in MS Excel on datasets. In this article, we explored various examples of using aggregate functions through Cosmos queries. We can apply filters as extensively as needed, based on the specific scenario or criteria with query in where clause.