Aggregation in Pandas
Summarize or compute statistics on a dataset in Pandas
Aggregation in Pandas refers to applying functions that reduce many values to a single summary statistic. Common aggregations include sum, mean, count, minimum, maximum, and standard deviation.
Aggregating a Single Column
You can apply aggregation functions directly to a specific column using .agg() or built-in methods such as .sum() and .mean().
import pandas as pd
# Sample Data
data = {
    "Department": ["HR", "IT", "IT", "HR", "Finance", "Finance", "HR"],
    "Salary": [50000, 60000, 70000, 55000, 65000, 72000, 53000],
    "Experience": [5, 7, 10, 3, 8, 12, 2]
}
df = pd.DataFrame(data)
print(df)
- Using Built-in Aggregation Functions
# Sum of salaries
total_salary = df["Salary"].sum()
print(total_salary) # Output: 425000
# Mean experience
average_experience = df["Experience"].mean()
print(average_experience) # Output: 6.714285714285714 (47 / 7)
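The same results can be produced with .agg() on a single column. The sketch below rebuilds the sample data so it runs on its own. It assumes that passing a single function name returns a scalar, while passing a list of names returns a Series indexed by those names.

```python
import pandas as pd

# Same sample data as above
data = {
    "Department": ["HR", "IT", "IT", "HR", "Finance", "Finance", "HR"],
    "Salary": [50000, 60000, 70000, 55000, 65000, 72000, 53000],
    "Experience": [5, 7, 10, 3, 8, 12, 2],
}
df = pd.DataFrame(data)

# A single function name returns a scalar, equivalent to df["Salary"].sum()
total_salary = df["Salary"].agg("sum")
print(total_salary)  # 425000

# A list of function names returns a Series indexed by function name
salary_stats = df["Salary"].agg(["min", "max", "mean"])
print(salary_stats)
```

The list form is convenient when you want several statistics for one column in a single call.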
Aggregating Multiple Columns
You can pass .agg() a dictionary that maps each column to a list of functions, applying different aggregations to different columns.
result = df.agg({
    "Salary": ["sum", "mean", "max"],
    "Experience": ["min", "std"]
})
print(result)
Output (values rounded)
        Salary  Experience
sum   425000.0         NaN
mean   60714.29        NaN
max    72000.0         NaN
min        NaN         2.0
std        NaN        3.64
A cell is NaN when that function was not requested for that column.
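Two related details are shown in the sketch below, which rebuilds the sample data so it is self-contained. Passing a list instead of a dictionary applies every function to every selected column. In addition, "std" is the sample standard deviation (ddof=1) by default, which is why Experience gives about 3.64. Passing ddof=0 gives the population standard deviation instead.

```python
import pandas as pd

# Same sample data as above
data = {
    "Department": ["HR", "IT", "IT", "HR", "Finance", "Finance", "HR"],
    "Salary": [50000, 60000, 70000, 55000, 65000, 72000, 53000],
    "Experience": [5, 7, 10, 3, 8, 12, 2],
}
df = pd.DataFrame(data)

# A list (instead of a dict) applies every function to every selected column
summary = df[["Salary", "Experience"]].agg(["sum", "mean", "std"])
print(summary)

# Sample standard deviation (default ddof=1) vs population standard deviation (ddof=0)
sample_std = df["Experience"].std()
population_std = df["Experience"].std(ddof=0)
print(round(sample_std, 2), round(population_std, 2))  # 3.64 3.37
```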
Pivot Table Aggregation
A pivot table provides another way to aggregate and summarize data. It groups rows by the index column and applies aggfunc to the values column within each group.
pivot = df.pivot_table(values="Salary", index="Department", aggfunc=["sum", "mean"])
print(pivot)
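Output (values rounded)
               sum      mean
            Salary    Salary
Department
Finance     137000  68500.00
HR          158000  52666.67
IT          130000  65000.00
Passing a list to aggfunc produces two column levels: the function name on top and the aggregated column below it.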
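The pivot table above can be compared with a groupby aggregation, which is a closely related (but separate) technique. The sketch below rebuilds the sample data and computes the per-department sum and mean with groupby. It then drops the "Salary" column level from the pivot result so the two tables can be compared directly.

```python
import pandas as pd

# Same sample data as above
data = {
    "Department": ["HR", "IT", "IT", "HR", "Finance", "Finance", "HR"],
    "Salary": [50000, 60000, 70000, 55000, 65000, 72000, 53000],
    "Experience": [5, 7, 10, 3, 8, 12, 2],
}
df = pd.DataFrame(data)

# Group rows by Department, then aggregate the Salary column of each group
grouped = df.groupby("Department")["Salary"].agg(["sum", "mean"])
print(grouped)

# pivot_table with a list aggfunc returns two column levels, e.g. ("sum", "Salary")
pivot = df.pivot_table(values="Salary", index="Department", aggfunc=["sum", "mean"])

# Drop the inner "Salary" level so the columns line up with the groupby result
pivot.columns = pivot.columns.droplevel(1)
print(pivot.equals(grouped))
```

groupby is often simpler when you only need one grouping key. pivot_table becomes more useful when you also want a columns= dimension.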
Summary Table
Method | Description |
---|---|
df["col"].sum() | Sum of a column |
df.agg({"col": ["sum", "mean"]}) | Apply multiple aggregation functions |
df.pivot_table(values="col2", index="col1", aggfunc="sum") | Pivot table for aggregation |