Merging Data in Pandad

Join DataFrames in Pandas

The .merge() function is the most powerful way to join DataFrames in Pandas, similar to SQL joins.

import pandas as pd

# Employees Data
df1 = pd.DataFrame({
    'Employee_ID': [101, 102, 103, 104],
    'Name': ['Jasmeet', 'Bob', 'Charlie', 'David'],
    'Dept_ID': [1, 2, 1, 3]
})

# Departments Data
df2 = pd.DataFrame({
    'Dept_ID': [1, 2, 3, 4],
    'Dept_Name': ['HR', 'IT', 'Finance', 'Marketing']
})

print(df1)
print(df2)

Inner Join (how="inner")

Keeps only matching rows from both DataFrames.

df_inner = df1.merge(df2, on="Dept_ID", how="inner")
print(df_inner)

Output

   Employee_ID     Name       Dept_ID   Dept_Name
0         101      Jasmeet    1         HR
1         103      Charlie    1         HR
2         102      Bob        2         IT
3         104      David      3         Finance

Only matching Dept_IDs are included.

Left Join (how="left")

Keeps all rows from the left DataFrame (df1) and matches data from the right (df2).

df_left = df1.merge(df2, on="Dept_ID", how="left")
print(df_left)

Output

   Employee_ID     Name      Dept_ID     Dept_Name
0         101      Jasmeet   1           HR
1         102      Bob       2           IT
2         103      Charlie   1           HR
3         104      David     3           Finance

All employees are retained, even if their department is missing in df2.

Right Join (how="right")

Keeps all rows from the right DataFrame (df2) and matches data from the left (df1).

df_right = df1.merge(df2, on="Dept_ID", how="right")
print(df_right)

Output

   Employee_ID   Name      Dept_ID   Dept_Name
0       101.0    Jasmeet   1         HR
1       103.0    Charlie   1         HR
2       102.0    Bob       2         IT
3       104.0    David     3         Finance
4       NaN      NaN       4         Marketing

The Marketing department is included, even though no employee belongs to it.

Outer Join (how="outer")

Keeps all rows from both DataFrames (fills missing values with NaN).

df_outer = df1.merge(df2, on="Dept_ID", how="outer")
print(df_outer)

Output

   Employee_ID   Name          Dept_ID   Dept_Name
0       101.0    Jasmeet       1         HR
1       103.0    Charlie       1         HR
2       102.0    Bob           2         IT
3       104.0    David         3         Finance
4         NaN    NaN           4         Marketing

Includes all departments and employees, even if they don’t have a match.

No questions available.