Data Frames in Pandas
Pandas DataFrame Explained.
A Pandas DataFrame is a two-dimensional, labeled data structure similar to an Excel spreadsheet or SQL table. It consists of rows and columns, where:
- Rows represent individual records (indexed by default with numbers 0, 1, 2, …)
- Columns represent attributes/features, each with a column label
Creating a DataFrame
1. Creating a DataFrame from a Dictionary
import pandas as pd
data = {
"Name": ["Jasmeet", "Chris", "Charlie"],
"Age": [25, 30, 35],
"City": ["New York", "Los Angeles", "Chicago"]
}
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age         City
0  Jasmeet   25     New York
1    Chris   30  Los Angeles
2  Charlie   35      Chicago
- Each column represents an attribute (Name, Age, City).
- Each row represents a record (Jasmeet, Chris, Charlie).
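The row and column structure described above can be inspected directly. This sketch rebuilds the same DataFrame so it runs on its own and reads its shape, column labels and default row labels (the index).

```python
import pandas as pd

# Rebuild the example DataFrame so this snippet is self-contained
data = {
    "Name": ["Jasmeet", "Chris", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
}
df = pd.DataFrame(data)

print(df.shape)          # (rows, columns) -> (3, 3)
print(list(df.columns))  # column labels -> ['Name', 'Age', 'City']
print(list(df.index))    # default row labels -> [0, 1, 2]
print(df.dtypes)         # one dtype per column
```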
2. Creating a DataFrame from a List of Lists
data = [
["Jasmeet", 25, "New York"],
["Chris", 30, "Los Angeles"],
["Charlie", 35, "Chicago"]
]
df = pd.DataFrame(data, columns=["Name", "Age", "City"])
print(df)
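The columns argument is optional. A minimal sketch of what happens without it, and of the index argument, which replaces the default 0, 1, 2 row labels. The row IDs "a1", "b2", "c3" are made up for illustration.

```python
import pandas as pd

rows = [
    ["Jasmeet", 25, "New York"],
    ["Chris", 30, "Los Angeles"],
    ["Charlie", 35, "Chicago"],
]

# Without columns=, pandas labels the columns 0, 1, 2
unlabeled = pd.DataFrame(rows)
print(list(unlabeled.columns))  # [0, 1, 2]

# index= supplies custom row labels (hypothetical IDs)
labeled = pd.DataFrame(rows, columns=["Name", "Age", "City"], index=["a1", "b2", "c3"])
print(labeled.loc["b2", "Name"])  # Chris
```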
3. Creating a DataFrame from a CSV File
df = pd.read_csv("data.csv") # Read a CSV file into a DataFrame
print(df.head()) # Display the first 5 rows
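Since data.csv is not included here, this sketch feeds read_csv an in-memory CSV through io.StringIO instead of a file path. The CSV contents are illustrative and simply mirror the example data. It also shows that head(n) takes an optional row count.

```python
import io

import pandas as pd

# Stand-in for data.csv: an in-memory CSV with illustrative contents
csv_text = """Name,Age,City
Jasmeet,25,New York
Chris,30,Los Angeles
Charlie,35,Chicago
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head(2))   # head(n) shows the first n rows (n defaults to 5)
print(df.dtypes)    # Age is parsed as an integer column
```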
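Accessing Data in a DataFrame
The examples in this section and the next assume the three-row Name/Age/City DataFrame from examples 1 and 2.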
1. Accessing Columns
print(df["Name"]) # Access 'Name' column
print(df.Age) # Attribute access: works only if the label is a valid identifier that doesn't clash with a DataFrame method
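Bracket and attribute access return the same column when the label is a plain identifier, but attribute access breaks in two common cases. In this sketch the "Home City" and "count" columns are hypothetical, added only to trigger those cases.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Chris", "Charlie"],
    "Age": [25, 30, 35],
    "Home City": ["New York", "Los Angeles", "Chicago"],  # label contains a space
    "count": [3, 1, 2],                                   # label clashes with DataFrame.count
})

# Same Series either way for a simple label
print(df["Age"].equals(df.Age))  # True

# A label with a space can only be reached with brackets
print(df["Home City"].tolist())

# df.count is the DataFrame.count method, not the "count" column
print(callable(df.count))        # True
print(df["count"].tolist())      # [3, 1, 2]
```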
2. Accessing Rows
print(df.loc[0]) # Row whose index label is 0 (the first row with the default index)
print(df.iloc[1]) # Row at integer position 1 (the second row)
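loc and iloc only agree when the index is the default 0, 1, 2. This sketch uses hypothetical row labels 10, 20, 30 to show the difference, including how slicing treats the end point.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Jasmeet", "Chris", "Charlie"], "Age": [25, 30, 35]},
    index=[10, 20, 30],  # hypothetical non-default row labels
)

print(df.loc[20, "Name"])   # by label 20 -> Chris
print(df.iloc[0]["Name"])   # by position 0 -> Jasmeet
print(len(df.loc[10:20]))   # label slices include the end label -> 2 rows
print(len(df.iloc[0:2]))    # position slices exclude the end -> 2 rows
# df.loc[0] would raise KeyError here, because no row has the label 0
```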
3. Accessing Multiple Columns
print(df[["Name", "Age"]]) # Select multiple columns
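The number of brackets decides the result type: a single label returns a Series, while a list of labels (even a one-element list) returns a DataFrame whose columns follow the order of the list. A short sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Chris", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

one = df["Name"]                  # single label -> Series
subset = df[["Name"]]             # list with one label -> DataFrame
reordered = df[["City", "Name"]]  # columns come back in the order listed

print(type(one).__name__)         # Series
print(type(subset).__name__)      # DataFrame
print(list(reordered.columns))    # ['City', 'Name']
```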
4. Filtering Data
print(df[df["Age"] > 30]) # Get all rows where Age > 30
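Conditions can be combined with & (and) and | (or); each condition needs its own parentheses, and Python's `and`/`or` keywords do not work on a Series. A sketch on the same example data, also showing isin() for membership tests:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Chris", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Age at least 30 AND not living in New York
mask = (df["Age"] >= 30) & (df["City"] != "New York")
print(df[mask]["Name"].tolist())      # ['Chris', 'Charlie']

# Rows whose City is one of the listed values
selected = df[df["City"].isin(["New York", "Los Angeles"])]
print(selected["Name"].tolist())      # ['Jasmeet', 'Chris']
```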
Modifying a DataFrame
1. Adding a New Column
df["Salary"] = [50000, 60000, 70000] # Add a new column
print(df)
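A list assigned as a new column must have exactly one value per row. A new column can also be computed from existing ones or set to a single scalar for every row. In this sketch the "Monthly", "Department" and "Bonus" columns and their values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Chris", "Charlie"],
    "Age": [25, 30, 35],
})
df["Salary"] = [50000, 60000, 70000]

# Computed element-wise from an existing column
df["Monthly"] = df["Salary"] / 12

# A scalar is repeated for every row
df["Department"] = "Sales"

# Wrong length: 2 values for 3 rows raises ValueError and adds nothing
try:
    df["Bonus"] = [1000, 2000]
except ValueError as err:
    print("ValueError:", err)

print(df)
```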
2. Updating Column Values
df["Age"] = df["Age"] + 1 # Increase all ages by 1
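To change only some rows, a common pattern is .loc[row_mask, column]. A sketch that increments Age only for rows where City is "Chicago", then replaces one specific value in a column (the "LA" abbreviation is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Chris", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Update only the rows that match the condition
in_chicago = df["City"] == "Chicago"
df.loc[in_chicago, "Age"] = df.loc[in_chicago, "Age"] + 1

# Replace a specific value within one column
df["City"] = df["City"].replace({"Los Angeles": "LA"})

print(df)
```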
3. Deleting a Column
df.drop("Salary", axis=1, inplace=True) # Remove 'Salary' column
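Without inplace=True, drop() returns a new DataFrame and leaves the original unchanged. The columns= keyword is equivalent to passing axis=1. A sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Chris", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

# Returns a copy without Salary; df itself is untouched
trimmed = df.drop(columns=["Salary"])  # same as df.drop("Salary", axis=1)
print(list(trimmed.columns))  # ['Name', 'Age']
print(list(df.columns))       # ['Name', 'Age', 'Salary']

# Reassigning the result is a common alternative to inplace=True
df = df.drop(columns=["Salary"])
```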
4. Deleting a Row
df.drop(1, axis=0, inplace=True) # Remove the row labelled 1 (the second row with the default index)
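Dropping a row leaves a gap in the index, because the remaining rows keep their original labels. A sketch showing the gap and renumbering with reset_index(drop=True), which discards the old labels instead of keeping them as a column:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Chris", "Charlie"],
    "Age": [25, 30, 35],
})

df = df.drop(1)          # axis=0 is the default, so this drops the row labelled 1
print(list(df.index))    # [0, 2]

df = df.reset_index(drop=True)
print(list(df.index))    # [0, 1]
print(df.loc[1, "Name"]) # Charlie
```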
DataFrame vs. Series
Feature | Pandas Series | Pandas DataFrame |
---|---|---|
Structure | 1D (Single Column) | 2D (Multiple Columns) |
Data Type | One dtype for all values | One dtype per column (columns can differ) |
Indexing | Row index only (one label per value) | Row index plus column labels |
Example | pd.Series([1, 2, 3]) | pd.DataFrame({"A": [1, 2], "B": [3, 4]}) |
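The comparison in the table can be checked in code. This sketch varies the table's DataFrame example by making column B hold strings, so the per-column dtypes differ. It also shows that selecting one column of a DataFrame yields a Series sharing the row index.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})  # B changed to strings for contrast

print(s.ndim, df.ndim)  # 1 2
print(s.dtype)          # a single dtype for the whole Series
print(df.dtypes)        # one dtype per column; A and B differ

col = df["A"]
print(type(col).__name__, list(col.index))  # Series [0, 1]
```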