Aggregation in Pandas

Summarize or compute statistics on a dataset in Pandas

Aggregation in Pandas refers to applying functions that reduce many values (for example, an entire column or a group of rows) to a single summary statistic. Common aggregations include sum, mean, count, min, max, and standard deviation.

Aggregating a Single Column

You can apply aggregation functions directly to a specific column using .agg() or built-in functions like .sum(), .mean(), etc.

import pandas as pd

# Sample Data
data = {
    "Department": ["HR", "IT", "IT", "HR", "Finance", "Finance", "HR"],
    "Salary": [50000, 60000, 70000, 55000, 65000, 72000, 53000],
    "Experience": [5, 7, 10, 3, 8, 12, 2]
}

df = pd.DataFrame(data)
print(df)

Using Built-in Aggregation Functions

# Sum of salaries
total_salary = df["Salary"].sum()
print(total_salary)  # Output: 425000

# Mean experience
average_experience = df["Experience"].mean()
print(round(average_experience, 2))  # Output: 6.71
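The .agg() method also works on a single column. As a minimal sketch reusing the same sample data, passing one function name behaves like calling the method directly, and passing a list of function names returns a Series indexed by those names. The variable names below are illustrative only.

```python
import pandas as pd

# Same sample data as above
data = {
    "Department": ["HR", "IT", "IT", "HR", "Finance", "Finance", "HR"],
    "Salary": [50000, 60000, 70000, 55000, 65000, 72000, 53000],
    "Experience": [5, 7, 10, 3, 8, 12, 2],
}
df = pd.DataFrame(data)

# One function name: equivalent to df["Salary"].sum()
total_via_agg = df["Salary"].agg("sum")
print(total_via_agg)  # 425000

# A list of function names: one result per function, indexed by name
salary_stats = df["Salary"].agg(["count", "min", "max", "mean"])
print(salary_stats)
```

Using string names keeps the call short and lets pandas dispatch to its optimized implementations of each statistic.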

Aggregating Multiple Columns

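Passing a dictionary to .agg() lets you apply different functions to different columns: each key is a column name and each value is a list of function names. The result has one row per function, and a cell is NaN wherever that function was not requested for that column. Note that "std" computes the sample standard deviation (ddof=1).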

result = df.agg({
    "Salary": ["sum", "mean", "max"],
    "Experience": ["min", "std"]
})
print(result)

Output

             Salary  Experience
sum   425000.000000         NaN
mean   60714.285714         NaN
max    72000.000000         NaN
min             NaN    2.000000
std             NaN    3.638419
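When every column should receive the same functions, a plain list can be passed to .agg() instead of a dict. This sketch restricts the call to the two numeric columns (an assumption made here, because a mean is not meaningful for the text Department column), so every cell of the result is filled and no NaN gaps appear.

```python
import pandas as pd

# Same sample data as above
data = {
    "Department": ["HR", "IT", "IT", "HR", "Finance", "Finance", "HR"],
    "Salary": [50000, 60000, 70000, 55000, 65000, 72000, 53000],
    "Experience": [5, 7, 10, 3, 8, 12, 2],
}
df = pd.DataFrame(data)

# Keep only the numeric columns; "mean" would fail on the text column
numeric = df[["Salary", "Experience"]]

# A list applies every function to every selected column
summary = numeric.agg(["min", "max", "mean"])
print(summary)
```

The rows follow the order of the list, and the columns follow the order of the selected DataFrame.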

Pivot Table Aggregation

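A pivot table offers another way to aggregate and summarize data: pivot_table() groups the rows by the index column and applies aggfunc to the values column within each group. Passing a list of functions produces one column per function, labeled with a two-level header (function name, then value column).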

pivot = df.pivot_table(values="Salary", index="Department", aggfunc=["sum", "mean"])
print(pivot)

Output

               sum          mean
            Salary        Salary
Department
Finance     137000  68500.000000
HR          158000  52666.666667
IT          130000  65000.000000
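The pivot above aggregates a single column. pivot_table() also accepts several values columns together with a dict for aggfunc that assigns a function to each column. The sketch below reuses the sample data, picks mean salary and maximum experience purely for illustration, and cross-checks the per-department sums and means with groupby(), which performs the same group-then-aggregate step.

```python
import pandas as pd

# Same sample data as above
data = {
    "Department": ["HR", "IT", "IT", "HR", "Finance", "Finance", "HR"],
    "Salary": [50000, 60000, 70000, 55000, 65000, 72000, 53000],
    "Experience": [5, 7, 10, 3, 8, 12, 2],
}
df = pd.DataFrame(data)

# A dict for aggfunc chooses a different function for each values column
per_column = df.pivot_table(
    values=["Salary", "Experience"],
    index="Department",
    aggfunc={"Salary": "mean", "Experience": "max"},
)
print(per_column)

# Cross-check: grouping by Department gives the same sums and means
# as the pivot table with aggfunc=["sum", "mean"]
by_group = df.groupby("Department")["Salary"].agg(["sum", "mean"])
print(by_group)
```

Both approaches sort the departments alphabetically by default, so their rows line up directly.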

Summary Table

Method                                                        Description
df["col"].sum()                                               Sum of a column
df.agg({"col": ["sum", "mean"]})                              Apply multiple aggregation functions
df.pivot_table(values="col2", index="col1", aggfunc="sum")    Pivot table for aggregation