Filtering Data

Filtering Data in Pandas

Filtering in Pandas allows you to extract specific rows based on conditions applied to the DataFrame.

Filtering Rows Based on a Condition

You can filter rows where a column meets a specific condition.

# Filter Rows Where Age > 30

import pandas as pd

# Sample DataFrame
data = {
    "Name": ["Jasmeet", "Bob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 29],
    "City": ["New York", "Los Angeles", "Chicago", "Houston", "Seattle"]
}

df = pd.DataFrame(data)

# Filter rows where Age > 30
filtered_df = df[df["Age"] > 30]
print(filtered_df)

Output:

    Name      Age      City
2   Charlie   35       Chicago
3   David     40       Houston

Filtering Using Multiple Conditions

# Age > 25 AND City is 'Chicago
filtered_df = df[(df["Age"] > 25) & (df["City"] == "Chicago")]
print(filtered_df)

Output:

     Name      Age   City
2    Charlie   35    Chicago
  • Use & for AND conditions.
  • Use | for OR conditions.

Filtering Using OR Condition

# Age > 35 OR City is 'New York'
filtered_df = df[(df["Age"] > 35) | (df["City"] == "New York")]
print(filtered_df)

Output:

    Name      Age   City
0   Jasmeet   25    New York
3   David     40    Houston

Filtering Rows with Specific Values

Use isin() to filter rows based on multiple values in a column.

# Filter Rows Where City is 'New York' or 'Seattle'
filtered_df = df[df["City"].isin(["New York", "Seattle"])]
print(filtered_df)

Output:

   Name      Age   City
0  Jasmeet   25    New York
4  Emma      29    Seattle

Filtering Rows Based on Text

Use str.contains() to filter rows that contain a specific substring.

# Filter Rows Where City Contains 'o'
filtered_df = df[df["City"].str.contains("o", case=False)]
print(filtered_df)

Output:

    Name   Age  City
3   David  40   Houston
  • Set case=False to make it case-insensitive.

Filtering Null or Non-Null Values

To filter rows with missing values, use isnull() or notnull().

# Filter Rows Where Age is NOT Null
filtered_df = df[df["Age"].notnull()]
print(filtered_df)
  • Similarly, use df[df["Age"].isnull()] to filter rows with missing values.

Filtering Rows Using Query Method

The query() function provides an easy way to filter.

# Filter Rows Where Age > 30
filtered_df = df.query("Age > 30")
print(filtered_df)

Filtering Methods

MethodDescription
df[df["col"] > value]Filter based on a condition.
df[(df["col1"] > value1) & (df["col2"] == "value2")]Filter using multiple conditions (AND).
df[(df["col1"] > value1)(df["col2"] == "value2")]`
df[df["col"].isin([val1, val2])]Filter rows where a column matches multiple values.
df[df["col"].str.contains("text", case=False)]Filter rows where column contains a substring.
df[df["col"].isnull()]Filter rows with missing values.
df[df["col"].notnull()]Filter rows without missing values.
df.query("col > value")Use SQL-like syntax for filtering.
No questions available.