Group By in Pandas

Group data based on one or more columns in Pandas

The groupby() function in Pandas is used to group data based on one or more columns and then apply an aggregation function to summarize or analyze the grouped data.

Grouping by a Column and Aggregating

You can group by a column and apply an aggregation function (e.g., sum, mean, count).

import pandas as pd

# Sample Data
data = {
    "Department": ["HR", "IT", "IT", "HR", "Finance", "Finance", "HR"],
    "Salary": [50000, 60000, 70000, 55000, 65000, 72000, 53000],
    "Experience": [5, 7, 10, 3, 8, 12, 2]
}

df = pd.DataFrame(data)
print(df)

# Group by "Department" and Sum Salaries
df.groupby("Department")["Salary"].sum()

Output

Department
Finance    137000
HR         158000
IT         130000
Name: Salary, dtype: int64

Using Multiple Aggregation Functions

You can apply multiple aggregation functions to grouped data.

df.groupby("Department")["Salary"].agg(["sum", "mean", "max"])

Output

              sum    mean   max
Department                     
Finance    137000  68500.0  72000
HR         158000  52666.7  55000
IT         130000  65000.0  70000

Grouping by Multiple Columns

You can group by multiple columns.

df.groupby(["Department", "Experience"])["Salary"].sum()

Output

Department  Experience
Finance     8             65000
            12            72000
HR          2             53000
            3             55000
            5             50000
IT          7             60000
            10            70000
Name: Salary, dtype: int64

Applying Multiple Aggregations to Multiple Columns

You can use .agg() to apply different aggregation functions to multiple columns.

df.groupby("Department").agg({
    "Salary": ["sum", "mean"],
    "Experience": ["max", "min"]
})

Output

               Salary           Experience     
                 sum    mean    max    min
Department                                 
Finance      137000  68500.0     12      8
HR          158000  52666.7      5      2
IT          130000  65000.0     10      7

Using `apply()` for Custom Functions

If you need custom aggregation, you can use .apply().

df.groupby("Department")["Salary"].apply(lambda x: x.max() - x.min())

Output

Department
Finance     7000
HR         5000
IT        10000
Name: Salary, dtype: int64

This calculates the salary range for each department.

Resetting Index

By default, groupby() creates a multi-indexed DataFrame. You can use .reset_index() to convert it back to a normal DataFrame.

df.groupby("Department")["Salary"].sum().reset_index()

Output

  Department  Salary
0   Finance  137000
1        HR  158000
2        IT  130000

Using `transform()` for Group-wise Operations

The .transform() method returns values of the same size as the original DataFrame.

df["Average_Salary"] = df.groupby("Department")["Salary"].transform("mean")
print(df)

Output

  Department  Salary  Experience  Average_Salary
0        HR   50000          5    52666.67
1        IT   60000          7    65000.00
2        IT   70000         10    65000.00
3        HR   55000          3    52666.67
4   Finance   65000          8    68500.00
5   Finance   72000         12    68500.00
6        HR   53000          2    52666.67

This adds a column with the average salary per department.

Using `filter()` to Filter Groups

The .filter() method keeps only groups that satisfy a condition.

df_filtered = df.groupby("Department").filter(lambda x: x["Salary"].mean() > 60000)
print(df_filtered)

Output

  Department  Salary  Experience
4   Finance   65000          8
5   Finance   72000         12
1        IT   60000          7
2        IT   70000         10

This keeps only Finance and IT departments where the average salary is greater than 60,000.

Using `ngroup()` to Assign Group Numbers

ngroup() assigns a unique number to each group.

df["Group_Number"] = df.groupby("Department").ngroup()
print(df)

Output

  Department  Salary  Experience  Group_Number
0        HR   50000          5             1
1        IT   60000          7             2
2        IT   70000         10             2
3        HR   55000          3             1
4   Finance   65000          8             0
5   Finance   72000         12             0
6        HR   53000          2             1

Each department is assigned a unique group number.

Summary Table

Method	Description
`df.groupby("col")["col2"].sum()`	Aggregate values grouped by a column
`df.groupby(["col1", "col2"])["col3"].mean()`	Group by multiple columns
`df.groupby("col").agg({"col2": ["sum", "mean"]})`	Multiple aggregations per group
`df.groupby("col")["col2"].apply(lambda x: x.max() - x.min())`	Custom aggregation
`df.groupby("col").transform("mean")`	Group-wise operation returning original size
`df.groupby("col").filter(lambda x: condition)`	Keep only specific groups
`df.groupby("col").ngroup()`	Assign group numbers

No questions available.

Aggregation

Joining Data

Basics

Data Structure

File Handling

Pandas

NumPy

Data Visualization

Handling Datetime

Handling Text data

Group By in Pandas

Grouping by a Column and Aggregating

Using Multiple Aggregation Functions

Grouping by Multiple Columns

Applying Multiple Aggregations to Multiple Columns

Using `apply()` for Custom Functions

Resetting Index

Using `transform()` for Group-wise Operations

Using `filter()` to Filter Groups

Using `ngroup()` to Assign Group Numbers

Summary Table

On this page

Basics

Data Structure

File Handling

Pandas

NumPy

Data Visualization

Handling Datetime

Handling Text data

Group By in Pandas

Grouping by a Column and Aggregating

Using Multiple Aggregation Functions

Grouping by Multiple Columns

Applying Multiple Aggregations to Multiple Columns

Using apply() for Custom Functions

Resetting Index

Using transform() for Group-wise Operations

Using filter() to Filter Groups

Using ngroup() to Assign Group Numbers

Summary Table

On this page

Using `apply()` for Custom Functions

Using `transform()` for Group-wise Operations

Using `filter()` to Filter Groups

Using `ngroup()` to Assign Group Numbers