Merging Data in Pandad
Join DataFrames in Pandas
The .merge()
function is the most powerful way to join DataFrames in Pandas, similar to SQL joins.
import pandas as pd
# Employees Data
df1 = pd.DataFrame({
'Employee_ID': [101, 102, 103, 104],
'Name': ['Jasmeet', 'Bob', 'Charlie', 'David'],
'Dept_ID': [1, 2, 1, 3]
})
# Departments Data
df2 = pd.DataFrame({
'Dept_ID': [1, 2, 3, 4],
'Dept_Name': ['HR', 'IT', 'Finance', 'Marketing']
})
print(df1)
print(df2)
how="inner"
)
Inner Join (Keeps only matching rows from both DataFrames.
df_inner = df1.merge(df2, on="Dept_ID", how="inner")
print(df_inner)
Output
Employee_ID Name Dept_ID Dept_Name
0 101 Jasmeet 1 HR
1 103 Charlie 1 HR
2 102 Bob 2 IT
3 104 David 3 Finance
Only matching Dept_IDs are included.
how="left"
)
Left Join (Keeps all rows from the left DataFrame (df1
) and matches data from the right (df2
).
df_left = df1.merge(df2, on="Dept_ID", how="left")
print(df_left)
Output
Employee_ID Name Dept_ID Dept_Name
0 101 Jasmeet 1 HR
1 102 Bob 2 IT
2 103 Charlie 1 HR
3 104 David 3 Finance
All employees are retained, even if their department is missing in df2
.
how="right"
)
Right Join (Keeps all rows from the right DataFrame (df2
) and matches data from the left (df1
).
df_right = df1.merge(df2, on="Dept_ID", how="right")
print(df_right)
Output
Employee_ID Name Dept_ID Dept_Name
0 101.0 Jasmeet 1 HR
1 103.0 Charlie 1 HR
2 102.0 Bob 2 IT
3 104.0 David 3 Finance
4 NaN NaN 4 Marketing
The Marketing department is included, even though no employee belongs to it.
how="outer"
)
Outer Join (Keeps all rows from both DataFrames (fills missing values with NaN
).
df_outer = df1.merge(df2, on="Dept_ID", how="outer")
print(df_outer)
Output
Employee_ID Name Dept_ID Dept_Name
0 101.0 Jasmeet 1 HR
1 103.0 Charlie 1 HR
2 102.0 Bob 2 IT
3 104.0 David 3 Finance
4 NaN NaN 4 Marketing
Includes all departments and employees, even if they don’t have a match.
No questions available.