Filtering Data
Filtering Data in Pandas
Filtering in Pandas allows you to extract specific rows based on conditions applied to the DataFrame.
Filtering Rows Based on a Condition
You can filter rows where a column meets a specific condition.
# Filter Rows Where Age > 30
import pandas as pd
# Sample DataFrame
data = {
"Name": ["Jasmeet", "Bob", "Charlie", "David", "Emma"],
"Age": [25, 30, 35, 40, 29],
"City": ["New York", "Los Angeles", "Chicago", "Houston", "Seattle"]
}
df = pd.DataFrame(data)
# Filter rows where Age > 30
filtered_df = df[df["Age"] > 30]
print(filtered_df)
Output:
Name Age City
2 Charlie 35 Chicago
3 David 40 Houston
Filtering Using Multiple Conditions
# Age > 25 AND City is 'Chicago
filtered_df = df[(df["Age"] > 25) & (df["City"] == "Chicago")]
print(filtered_df)
Output:
Name Age City
2 Charlie 35 Chicago
- Use
&
for AND conditions. - Use
|
for OR conditions.
Filtering Using OR Condition
# Age > 35 OR City is 'New York'
filtered_df = df[(df["Age"] > 35) | (df["City"] == "New York")]
print(filtered_df)
Output:
Name Age City
0 Jasmeet 25 New York
3 David 40 Houston
Filtering Rows with Specific Values
Use isin()
to filter rows based on multiple values in a column.
# Filter Rows Where City is 'New York' or 'Seattle'
filtered_df = df[df["City"].isin(["New York", "Seattle"])]
print(filtered_df)
Output:
Name Age City
0 Jasmeet 25 New York
4 Emma 29 Seattle
Filtering Rows Based on Text
Use str.contains()
to filter rows that contain a specific substring.
# Filter Rows Where City Contains 'o'
filtered_df = df[df["City"].str.contains("o", case=False)]
print(filtered_df)
Output:
Name Age City
3 David 40 Houston
- Set
case=False
to make it case-insensitive.
Filtering Null or Non-Null Values
To filter rows with missing values, use isnull()
or notnull()
.
# Filter Rows Where Age is NOT Null
filtered_df = df[df["Age"].notnull()]
print(filtered_df)
- Similarly, use
df[df["Age"].isnull()]
to filter rows with missing values.
Filtering Rows Using Query Method
The query()
function provides an easy way to filter.
# Filter Rows Where Age > 30
filtered_df = df.query("Age > 30")
print(filtered_df)
Filtering Methods
Method | Description |
---|---|
df[df["col"] > value] | Filter based on a condition. |
df[(df["col1"] > value1) & (df["col2"] == "value2")] | Filter using multiple conditions (AND). |
df[(df["col1"] > value1) | (df["col2"] == "value2")]` |
df[df["col"].isin([val1, val2])] | Filter rows where a column matches multiple values. |
df[df["col"].str.contains("text", case=False)] | Filter rows where column contains a substring. |
df[df["col"].isnull()] | Filter rows with missing values. |
df[df["col"].notnull()] | Filter rows without missing values. |
df.query("col > value") | Use SQL-like syntax for filtering. |
No questions available.