Sorting Data in Pandas
Sorting in Pandas lets you rearrange the rows of a DataFrame by column values or by index labels.
Sorting DataFrame by a Single Column
Use sort_values() to sort by a specific column.
import pandas as pd
# Sample DataFrame
data = {
    "Name": ["Jasmeet", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 29],
    "Salary": [50000, 60000, 55000, 70000, 65000]
}
df = pd.DataFrame(data)
# Sort by Age (default: ascending order)
sorted_df = df.sort_values(by="Age")
print(sorted_df)
Output:
Name Age Salary
0 Jasmeet 25 50000
4 Emma 29 65000
1 Bob 30 60000
2 Charlie 35 55000
3 David 40 70000
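The output above shows each row keeping its original index label (0, 4, 1, 2, 3). The following is a minimal sketch of that behaviour, reusing the sample data without the Salary column: sort_values() returns a new DataFrame and leaves df untouched, and passing ignore_index=True (assuming pandas 1.0 or later) renumbers the result instead.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 29],
})

# sort_values() returns a new DataFrame; df itself is not modified
sorted_df = df.sort_values(by="Age")
original_order = df["Name"].tolist()

# Each row carries its original index label into the sorted result
sorted_labels = sorted_df.index.tolist()

# ignore_index=True replaces the labels with a fresh 0..n-1 range
renumbered = df.sort_values(by="Age", ignore_index=True)
renumbered_labels = renumbered.index.tolist()

print(original_order)     # ['Jasmeet', 'Bob', 'Charlie', 'David', 'Emma']
print(sorted_labels)      # [0, 4, 1, 2, 3]
print(renumbered_labels)  # [0, 1, 2, 3, 4]
```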
Sorting in Descending Order
Use ascending=False to sort in descending order.
sorted_df = df.sort_values(by="Age", ascending=False)
print(sorted_df)
Output:
Name Age Salary
3 David 40 70000
2 Charlie 35 55000
1 Bob 30 60000
4 Emma 29 65000
0 Jasmeet 25 50000
Sorting by Multiple Columns
Sort using multiple columns by passing a list of column names.
# Sort by Age (Ascending), then by Salary (Descending)
sorted_df = df.sort_values(by=["Age", "Salary"], ascending=[True, False])
print(sorted_df)
Output:
Name Age Salary
0 Jasmeet 25 50000
4 Emma 29 65000
1 Bob 30 60000
2 Charlie 35 55000
3 David 40 70000
- The DataFrame is sorted by Age (ascending) first; rows that share the same Age are then ordered by Salary (descending). In this sample every Age is unique, so the Salary key never changes the order.
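The secondary key only matters when the first key has ties. The sketch below uses a small hypothetical DataFrame (the names and values are made up for illustration and are not part of the sample data above) in which two pairs of rows share an Age, so the descending Salary order becomes visible.

```python
import pandas as pd

# Hypothetical data: Ages 25 and 30 each appear twice
tied = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cara", "Dev"],
    "Age": [30, 25, 30, 25],
    "Salary": [60000, 50000, 65000, 52000],
})

# Age ascending first; within equal Ages, Salary descending
result = tied.sort_values(by=["Age", "Salary"], ascending=[True, False])
names_in_order = result["Name"].tolist()

print(names_in_order)  # ['Dev', 'Ben', 'Cara', 'Ann']
```

Within Age 25, Dev (52000) comes before Ben (50000); within Age 30, Cara (65000) comes before Ann (60000).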
Sorting by Index
Use sort_index() to sort based on the index.
# Sort by Index (Descending)
sorted_df = df.sort_index(ascending=False)
print(sorted_df)
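One common use of sort_index() is to restore the original row order after sorting by values. The following is a minimal sketch of that idea, again reusing the sample data without the Salary column; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 29],
})

# Sorting by values shuffles the index labels to 0, 4, 1, 2, 3
by_age = df.sort_values(by="Age")

# Sorting by index puts the rows back in their original order
restored = by_age.sort_index()

# Descending index order reverses the original rows
descending = df.sort_index(ascending=False)

print(restored.index.tolist())     # [0, 1, 2, 3, 4]
print(descending["Name"].tolist())  # ['Emma', 'David', 'Charlie', 'Bob', 'Jasmeet']
```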
Sorting with Missing Values
Missing values (NaN) are placed at the end by default.
# Sorting with NaN Values
data = {
    "Name": ["Jasmeet", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, None, 35, 40, 29]
}
df = pd.DataFrame(data)
# Sort by Age
sorted_df = df.sort_values(by="Age")
print(sorted_df)
Output:
Name Age
0 Jasmeet 25.0
4 Emma 29.0
2 Charlie 35.0
3 David 40.0
1 Bob NaN
- To place NaN at the beginning, use na_position="first".
sorted_df = df.sort_values(by="Age", na_position="first")
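To make the effect of na_position concrete, the sketch below rebuilds the same DataFrame and compares the result with a descending sort that keeps the default. It also illustrates that the default placement of NaN at the end applies in descending order too.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, None, 35, 40, 29],
})

# NaN first, remaining rows in ascending Age order
nan_first = df.sort_values(by="Age", na_position="first")

# Descending sort with the default na_position="last": NaN still ends up last
desc_default = df.sort_values(by="Age", ascending=False)

print(nan_first["Name"].tolist())     # ['Bob', 'Jasmeet', 'Emma', 'Charlie', 'David']
print(desc_default["Name"].tolist())  # ['David', 'Charlie', 'Emma', 'Jasmeet', 'Bob']
```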
Sorting Methods
Method | Description |
---|---|
df.sort_values(by="col") | Sort by a column (ascending). |
df.sort_values(by="col", ascending=False) | Sort by a column (descending). |
df.sort_values(by=["col1", "col2"], ascending=[True, False]) | Sort by multiple columns. |
df.sort_index() | Sort by index. |
df.sort_values(by="col", na_position="first") | Place NaN values first. |