Topic 3.6 - Pandas 中的数据筛选¶
在本节开始前,我们先导入一下 Pandas 库:
import pandas as pd
1. 基于条件的数据筛选¶
在前面的章节中,我们介绍了按行和列的索引进行数据筛选,在实际的数据分析中,更常见的是基于条件进行数据筛选。
(1) 基于单个条件的筛选¶
在 Pandas 中,可以使用布尔索引来实现基于单个条件的筛选
- 格式是
df[条件],其中条件的格式是:df["列名"]后面跟上比较运算符和具体的值,注意这里不能只跟列名 - 例如,在学生成绩表中筛选出数学成绩大于 90 的记录:
df1[df1["math"] > 90]
df1 = pd.read_csv("../data/students.csv")
df1
| name | class | gender | age | chinese | math | english | |
|---|---|---|---|---|---|---|---|
| 0 | Alice Johnson | 1 | Female | 18 | 85 | 88 | 90 |
| 1 | Bob Smith | 2 | Male | 19 | 78 | 75 | 80 |
| 2 | Charlie Brown | 3 | Male | 18 | 92 | 90 | 94 |
| 3 | David Miller | 1 | Male | 20 | 88 | 85 | 87 |
| 4 | Emma Wilson | 2 | Female | 19 | 76 | 78 | 74 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 86 | Jonah Larson | 3 | Male | 21 | 78 | 76 | 77 |
| 87 | Kylie Newman | 1 | Female | 18 | 85 | 83 | 84 |
| 88 | Lucas Porter | 2 | Male | 22 | 91 | 92 | 93 |
| 89 | Megan Arnold | 3 | Female | 19 | 79 | 77 | 78 |
| 90 | Nick Keller | 1 | Male | 20 | 87 | 85 | 88 |
91 rows × 7 columns
df1[df1["math"] > 90]
| name | class | gender | age | chinese | math | english | |
|---|---|---|---|---|---|---|---|
| 5 | Frank Taylor | 3 | Male | 21 | 90 | 92 | 89 |
| 8 | Ivy Moore | 3 | Female | 22 | 91 | 93 | 90 |
| 11 | Leo Walker | 3 | Male | 21 | 95 | 94 | 96 |
| 17 | Rachel Hill | 3 | Female | 18 | 90 | 92 | 91 |
| 21 | Vera Baker | 1 | Female | 21 | 92 | 94 | 93 |
| 27 | Brian Turner | 1 | Male | 18 | 93 | 95 | 94 |
| 30 | Ella Parker | 1 | Female | 20 | 90 | 92 | 91 |
| 35 | Jason Sanchez | 3 | Male | 20 | 91 | 93 | 92 |
| 40 | Ophelia Morgan | 2 | Female | 20 | 92 | 94 | 93 |
| 43 | Ryan Howard | 2 | Male | 22 | 90 | 92 | 91 |
| 48 | Wendy Wood | 1 | Female | 22 | 93 | 95 | 94 |
| 51 | Zoe Kelly | 1 | Female | 21 | 91 | 92 | 93 |
| 55 | Daisy Ross | 2 | Female | 20 | 90 | 92 | 91 |
| 61 | Julia Long | 2 | Female | 21 | 92 | 94 | 93 |
| 65 | Nina Washington | 3 | Female | 20 | 91 | 92 | 93 |
| 69 | Sara Alexander | 1 | Female | 19 | 90 | 92 | 91 |
| 71 | Ursula Griffin | 3 | Female | 21 | 93 | 95 | 94 |
| 77 | Abby Sullivan | 3 | Female | 18 | 91 | 92 | 93 |
| 82 | Felix Stone | 2 | Male | 18 | 92 | 94 | 93 |
| 85 | Irene Olson | 2 | Female | 20 | 90 | 92 | 91 |
| 88 | Lucas Porter | 2 | Male | 22 | 91 | 92 | 93 |
事实上:
df1["math"] > 90会创建一个新的 Series,是布尔类型的,表示每一行的数学成绩是否大于 90- 将这个布尔 Series 再传入
df1[...]中,就会只保留对应为True的那些行
df1["math"] > 90
0 False
1 False
2 False
3 False
4 False
...
86 False
87 False
88 True
89 False
90 False
Name: math, Length: 91, dtype: bool
(2) 基于多个条件的筛选¶
如果基于多个条件进行筛选,可以使用逻辑运算符将多个条件组合起来:
-
Pandas 中的逻辑运算符和 Python 中的有所不同:
-
逻辑与(AND):
& - 逻辑或(OR):
| -
逻辑非(NOT):
~ -
并且,每个条件都需要用括号括起来
例如:
- 我们想筛选出数学成绩大于 90 且英语成绩大于 85 的记录:
df1[(df1["math"] > 90) & (df1["english"] >= 85)]
| name | class | gender | age | chinese | math | english | |
|---|---|---|---|---|---|---|---|
| 5 | Frank Taylor | 3 | Male | 21 | 90 | 92 | 89 |
| 8 | Ivy Moore | 3 | Female | 22 | 91 | 93 | 90 |
| 11 | Leo Walker | 3 | Male | 21 | 95 | 94 | 96 |
| 17 | Rachel Hill | 3 | Female | 18 | 90 | 92 | 91 |
| 21 | Vera Baker | 1 | Female | 21 | 92 | 94 | 93 |
| 27 | Brian Turner | 1 | Male | 18 | 93 | 95 | 94 |
| 30 | Ella Parker | 1 | Female | 20 | 90 | 92 | 91 |
| 35 | Jason Sanchez | 3 | Male | 20 | 91 | 93 | 92 |
| 40 | Ophelia Morgan | 2 | Female | 20 | 92 | 94 | 93 |
| 43 | Ryan Howard | 2 | Male | 22 | 90 | 92 | 91 |
| 48 | Wendy Wood | 1 | Female | 22 | 93 | 95 | 94 |
| 51 | Zoe Kelly | 1 | Female | 21 | 91 | 92 | 93 |
| 55 | Daisy Ross | 2 | Female | 20 | 90 | 92 | 91 |
| 61 | Julia Long | 2 | Female | 21 | 92 | 94 | 93 |
| 65 | Nina Washington | 3 | Female | 20 | 91 | 92 | 93 |
| 69 | Sara Alexander | 1 | Female | 19 | 90 | 92 | 91 |
| 71 | Ursula Griffin | 3 | Female | 21 | 93 | 95 | 94 |
| 77 | Abby Sullivan | 3 | Female | 18 | 91 | 92 | 93 |
| 82 | Felix Stone | 2 | Male | 18 | 92 | 94 | 93 |
| 85 | Irene Olson | 2 | Female | 20 | 90 | 92 | 91 |
| 88 | Lucas Porter | 2 | Male | 22 | 91 | 92 | 93 |
- 数学成绩大于 90 且英语和语文至少一个大于 70 的记录:
df1[(df1["math"] > 90) & ( (df1["english"] >= 70) | (df1["chinese"] >= 70) )]
| name | class | gender | age | chinese | math | english | |
|---|---|---|---|---|---|---|---|
| 5 | Frank Taylor | 3 | Male | 21 | 90 | 92 | 89 |
| 8 | Ivy Moore | 3 | Female | 22 | 91 | 93 | 90 |
| 11 | Leo Walker | 3 | Male | 21 | 95 | 94 | 96 |
| 17 | Rachel Hill | 3 | Female | 18 | 90 | 92 | 91 |
| 21 | Vera Baker | 1 | Female | 21 | 92 | 94 | 93 |
| 27 | Brian Turner | 1 | Male | 18 | 93 | 95 | 94 |
| 30 | Ella Parker | 1 | Female | 20 | 90 | 92 | 91 |
| 35 | Jason Sanchez | 3 | Male | 20 | 91 | 93 | 92 |
| 40 | Ophelia Morgan | 2 | Female | 20 | 92 | 94 | 93 |
| 43 | Ryan Howard | 2 | Male | 22 | 90 | 92 | 91 |
| 48 | Wendy Wood | 1 | Female | 22 | 93 | 95 | 94 |
| 51 | Zoe Kelly | 1 | Female | 21 | 91 | 92 | 93 |
| 55 | Daisy Ross | 2 | Female | 20 | 90 | 92 | 91 |
| 61 | Julia Long | 2 | Female | 21 | 92 | 94 | 93 |
| 65 | Nina Washington | 3 | Female | 20 | 91 | 92 | 93 |
| 69 | Sara Alexander | 1 | Female | 19 | 90 | 92 | 91 |
| 71 | Ursula Griffin | 3 | Female | 21 | 93 | 95 | 94 |
| 77 | Abby Sullivan | 3 | Female | 18 | 91 | 92 | 93 |
| 82 | Felix Stone | 2 | Male | 18 | 92 | 94 | 93 |
| 85 | Irene Olson | 2 | Female | 20 | 90 | 92 | 91 |
| 88 | Lucas Porter | 2 | Male | 22 | 91 | 92 | 93 |
(3) 高级筛选方法¶
除了常见的数值判断之外,Pandas 还支持一些更高级的筛选方法:
- 范围筛选:使用
between()方法筛选在某个范围内的值,例如筛选数学成绩在 80 到 90之 间的记录(包括边界值 80 和 90):
df1[df1["math"].between(80, 90)]
| name | class | gender | age | chinese | math | english | |
|---|---|---|---|---|---|---|---|
| 0 | Alice Johnson | 1 | Female | 18 | 85 | 88 | 90 |
| 2 | Charlie Brown | 3 | Male | 18 | 92 | 90 | 94 |
| 3 | David Miller | 1 | Male | 20 | 88 | 85 | 87 |
| 6 | Grace Anderson | 1 | Female | 20 | 84 | 82 | 86 |
| 9 | Jack Martin | 1 | Male | 19 | 87 | 85 | 88 |
| 12 | Mia Hall | 1 | Female | 18 | 82 | 80 | 84 |
| 13 | Nathan Young | 2 | Male | 22 | 88 | 86 | 87 |
| 15 | Paul Wright | 1 | Male | 20 | 85 | 83 | 84 |
| 18 | Sam Scott | 1 | Male | 22 | 83 | 80 | 82 |
| 19 | Tina Green | 2 | Female | 19 | 86 | 88 | 85 |
| 23 | Xander Carter | 3 | Male | 22 | 89 | 88 | 90 |
| 24 | Yara Mitchell | 1 | Female | 19 | 84 | 82 | 83 |
| 25 | Zack Perez | 2 | Male | 20 | 88 | 90 | 87 |
| 26 | Amy Roberts | 3 | Female | 21 | 81 | 80 | 82 |
| 29 | Dylan Campbell | 3 | Male | 19 | 85 | 83 | 86 |
| 32 | Gina Edwards | 3 | Female | 18 | 82 | 80 | 81 |
| 33 | Harry Collins | 1 | Male | 22 | 87 | 85 | 88 |
| 36 | Kelly Morris | 1 | Female | 21 | 83 | 82 | 84 |
| 37 | Liam Rogers | 2 | Male | 18 | 88 | 87 | 89 |
| 39 | Noah Cook | 1 | Male | 19 | 86 | 85 | 87 |
| 42 | Queen Foster | 1 | Female | 18 | 84 | 82 | 83 |
| 45 | Tom Cox | 1 | Male | 20 | 85 | 83 | 84 |
| 46 | Uma Diaz | 2 | Female | 21 | 89 | 88 | 90 |
| 47 | Victor Richardson | 3 | Male | 18 | 81 | 80 | 82 |
| 50 | Yusuf Brooks | 3 | Male | 20 | 86 | 85 | 87 |
| 53 | Bella Bennett | 3 | Female | 22 | 84 | 82 | 83 |
| 54 | Chris Barnes | 1 | Male | 19 | 88 | 87 | 89 |
| 57 | Fiona Coleman | 1 | Female | 18 | 82 | 80 | 81 |
| 58 | George Jenkins | 2 | Male | 22 | 87 | 86 | 88 |
| 60 | Ian Powell | 1 | Male | 20 | 85 | 83 | 84 |
| 63 | Laura Hughes | 1 | Female | 22 | 89 | 88 | 90 |
| 64 | Mike Flores | 2 | Male | 19 | 83 | 82 | 84 |
| 67 | Paula Simmons | 2 | Female | 18 | 88 | 87 | 89 |
| 68 | Ray Bryant | 3 | Male | 22 | 84 | 82 | 83 |
| 70 | Tony Russell | 2 | Male | 20 | 81 | 80 | 82 |
| 73 | Whitney Myers | 2 | Female | 22 | 86 | 85 | 87 |
| 74 | Xavier Ford | 3 | Male | 19 | 82 | 80 | 81 |
| 75 | Yvonne Hamilton | 1 | Female | 20 | 88 | 87 | 89 |
| 76 | Zane Graham | 2 | Male | 21 | 85 | 83 | 84 |
| 79 | Clara West | 2 | Female | 19 | 84 | 82 | 83 |
| 80 | Daniel Jordan | 3 | Male | 20 | 89 | 88 | 90 |
| 83 | Gloria Hawkins | 3 | Female | 22 | 83 | 82 | 84 |
| 84 | Howard Dunn | 1 | Male | 19 | 86 | 85 | 87 |
| 87 | Kylie Newman | 1 | Female | 18 | 85 | 83 | 84 |
| 90 | Nick Keller | 1 | Male | 20 | 87 | 85 | 88 |
- 集合筛选:使用
isin()方法筛选某列中值属于指定集合的记录,例如筛选一班和二班的学生记录:
df1[df1["class"].isin([1, 2])]
| name | class | gender | age | chinese | math | english | |
|---|---|---|---|---|---|---|---|
| 0 | Alice Johnson | 1 | Female | 18 | 85 | 88 | 90 |
| 1 | Bob Smith | 2 | Male | 19 | 78 | 75 | 80 |
| 3 | David Miller | 1 | Male | 20 | 88 | 85 | 87 |
| 4 | Emma Wilson | 2 | Female | 19 | 76 | 78 | 74 |
| 6 | Grace Anderson | 1 | Female | 20 | 84 | 82 | 86 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 84 | Howard Dunn | 1 | Male | 19 | 86 | 85 | 87 |
| 85 | Irene Olson | 2 | Female | 20 | 90 | 92 | 91 |
| 87 | Kylie Newman | 1 | Female | 18 | 85 | 83 | 84 |
| 88 | Lucas Porter | 2 | Male | 22 | 91 | 92 | 93 |
| 90 | Nick Keller | 1 | Male | 20 | 87 | 85 | 88 |
61 rows × 7 columns
-
字符串筛选:
str.contains()筛选包含特定子字符串的记录str.startswith()和str.endswith()分别筛选以特定子字符串开头或结尾的记录
df1[df1["name"].str.contains("A")]
| name | class | gender | age | chinese | math | english | |
|---|---|---|---|---|---|---|---|
| 0 | Alice Johnson | 1 | Female | 18 | 85 | 88 | 90 |
| 6 | Grace Anderson | 1 | Female | 20 | 84 | 82 | 86 |
| 20 | Ulysses Adams | 3 | Male | 20 | 79 | 77 | 78 |
| 26 | Amy Roberts | 3 | Female | 21 | 81 | 80 | 82 |
| 52 | Aaron Price | 2 | Male | 18 | 79 | 77 | 78 |
| 69 | Sara Alexander | 1 | Female | 19 | 90 | 92 | 91 |
| 77 | Abby Sullivan | 3 | Female | 18 | 91 | 92 | 93 |
| 89 | Megan Arnold | 3 | Female | 19 | 79 | 77 | 78 |
df1[df1["name"].str.startswith("A")]
| name | class | gender | age | chinese | math | english | |
|---|---|---|---|---|---|---|---|
| 0 | Alice Johnson | 1 | Female | 18 | 85 | 88 | 90 |
| 26 | Amy Roberts | 3 | Female | 21 | 81 | 80 | 82 |
| 52 | Aaron Price | 2 | Male | 18 | 79 | 77 | 78 |
| 77 | Abby Sullivan | 3 | Female | 18 | 91 | 92 | 93 |
df1[df1["name"].str.endswith("Taylor")]
| name | class | gender | age | chinese | math | english | |
|---|---|---|---|---|---|---|---|
| 5 | Frank Taylor | 3 | Male | 21 | 90 | 92 | 89 |
2. 基于索引的数据筛选¶
事实上,我们之前讲过的 .loc 方法同样可以用于基于条件的筛选:
- 语法格式是:
df.loc[条件, 列名列表],这里的条件和前面介绍的布尔索引是一样的,需要写成df["列名"]后面跟上比较运算符和具体的值 - 这样做的好处是,可以同时指定要筛选的列,而不是将所有列都保留下来
我们来看以下例子:
- 筛选出数学成绩大于 90 的学生的姓名和数学成绩:
df1.loc[df1["math"] > 90, ["name", "math"]]
| name | math | |
|---|---|---|
| 5 | Frank Taylor | 92 |
| 8 | Ivy Moore | 93 |
| 11 | Leo Walker | 94 |
| 17 | Rachel Hill | 92 |
| 21 | Vera Baker | 94 |
| 27 | Brian Turner | 95 |
| 30 | Ella Parker | 92 |
| 35 | Jason Sanchez | 93 |
| 40 | Ophelia Morgan | 94 |
| 43 | Ryan Howard | 92 |
| 48 | Wendy Wood | 95 |
| 51 | Zoe Kelly | 92 |
| 55 | Daisy Ross | 92 |
| 61 | Julia Long | 94 |
| 65 | Nina Washington | 92 |
| 69 | Sara Alexander | 92 |
| 71 | Ursula Griffin | 95 |
| 77 | Abby Sullivan | 92 |
| 82 | Felix Stone | 94 |
| 85 | Irene Olson | 92 |
| 88 | Lucas Porter | 92 |
- 筛选出英数学成绩大于 85 且语文成绩大于 80 的学生的姓名、语文、数学、英语成绩:
df1.loc[(df1["english"] > 85) & (df1["chinese"] > 80), ["name", "chinese", "math", "english"]]
| name | chinese | math | english | |
|---|---|---|---|---|
| 0 | Alice Johnson | 85 | 88 | 90 |
| 2 | Charlie Brown | 92 | 90 | 94 |
| 3 | David Miller | 88 | 85 | 87 |
| 5 | Frank Taylor | 90 | 92 | 89 |
| 6 | Grace Anderson | 84 | 82 | 86 |
| 8 | Ivy Moore | 91 | 93 | 90 |
| 9 | Jack Martin | 87 | 85 | 88 |
| 11 | Leo Walker | 95 | 94 | 96 |
| 13 | Nathan Young | 88 | 86 | 87 |
| 17 | Rachel Hill | 90 | 92 | 91 |
| 21 | Vera Baker | 92 | 94 | 93 |
| 23 | Xander Carter | 89 | 88 | 90 |
| 25 | Zack Perez | 88 | 90 | 87 |
| 27 | Brian Turner | 93 | 95 | 94 |
| 29 | Dylan Campbell | 85 | 83 | 86 |
| 30 | Ella Parker | 90 | 92 | 91 |
| 33 | Harry Collins | 87 | 85 | 88 |
| 35 | Jason Sanchez | 91 | 93 | 92 |
| 37 | Liam Rogers | 88 | 87 | 89 |
| 39 | Noah Cook | 86 | 85 | 87 |
| 40 | Ophelia Morgan | 92 | 94 | 93 |
| 43 | Ryan Howard | 90 | 92 | 91 |
| 46 | Uma Diaz | 89 | 88 | 90 |
| 48 | Wendy Wood | 93 | 95 | 94 |
| 50 | Yusuf Brooks | 86 | 85 | 87 |
| 51 | Zoe Kelly | 91 | 92 | 93 |
| 54 | Chris Barnes | 88 | 87 | 89 |
| 55 | Daisy Ross | 90 | 92 | 91 |
| 58 | George Jenkins | 87 | 86 | 88 |
| 61 | Julia Long | 92 | 94 | 93 |
| 63 | Laura Hughes | 89 | 88 | 90 |
| 65 | Nina Washington | 91 | 92 | 93 |
| 67 | Paula Simmons | 88 | 87 | 89 |
| 69 | Sara Alexander | 90 | 92 | 91 |
| 71 | Ursula Griffin | 93 | 95 | 94 |
| 73 | Whitney Myers | 86 | 85 | 87 |
| 75 | Yvonne Hamilton | 88 | 87 | 89 |
| 77 | Abby Sullivan | 91 | 92 | 93 |
| 80 | Daniel Jordan | 89 | 88 | 90 |
| 82 | Felix Stone | 92 | 94 | 93 |
| 84 | Howard Dunn | 86 | 85 | 87 |
| 85 | Irene Olson | 90 | 92 | 91 |
| 88 | Lucas Porter | 91 | 92 | 93 |
| 90 | Nick Keller | 87 | 85 | 88 |
- 筛选出数学成绩大于 90 且语文或英语至少一科成绩大于 85 的学生的姓名、语文、数学、英语成绩:
df1.loc[(df1["math"] > 90) & ( (df1["english"] > 85) | (df1["chinese"] > 85) ), ["name", "chinese", "math", "english"]]
| name | chinese | math | english | |
|---|---|---|---|---|
| 5 | Frank Taylor | 90 | 92 | 89 |
| 8 | Ivy Moore | 91 | 93 | 90 |
| 11 | Leo Walker | 95 | 94 | 96 |
| 17 | Rachel Hill | 90 | 92 | 91 |
| 21 | Vera Baker | 92 | 94 | 93 |
| 27 | Brian Turner | 93 | 95 | 94 |
| 30 | Ella Parker | 90 | 92 | 91 |
| 35 | Jason Sanchez | 91 | 93 | 92 |
| 40 | Ophelia Morgan | 92 | 94 | 93 |
| 43 | Ryan Howard | 90 | 92 | 91 |
| 48 | Wendy Wood | 93 | 95 | 94 |
| 51 | Zoe Kelly | 91 | 92 | 93 |
| 55 | Daisy Ross | 90 | 92 | 91 |
| 61 | Julia Long | 92 | 94 | 93 |
| 65 | Nina Washington | 91 | 92 | 93 |
| 69 | Sara Alexander | 90 | 92 | 91 |
| 71 | Ursula Griffin | 93 | 95 | 94 |
| 77 | Abby Sullivan | 91 | 92 | 93 |
| 82 | Felix Stone | 92 | 94 | 93 |
| 85 | Irene Olson | 90 | 92 | 91 |
| 88 | Lucas Porter | 91 | 92 | 93 |
3. 基于查询的数据筛选¶
Pandas 还提供了一种更简洁的基于查询的方法 query() 来实现数据筛选:
- 语法格式是:
df.query("条件表达式"),其中条件表达式是一个字符串,直接写列名和比较运算符 - 大家如果学过 SQL,会发现这种语法和 SQL 的
WHERE子句非常类似
在这个条件表达式中:
- 列名可以直接使用,不需要加引号或方括号
- 逻辑运算符和前面介绍的布尔索引中使用的是一样的:
&表示逻辑与,|表示逻辑或,~表示逻辑非 - 条件和条件之间不强制使用括号,括号的使用主要是表明优先级的
我们来看以下例子,和上面的例子其实是对应的:
- 筛选出数学成绩大于 90 的记录:
df1.query("math > 90")
| name | class | gender | age | chinese | math | english | |
|---|---|---|---|---|---|---|---|
| 5 | Frank Taylor | 3 | Male | 21 | 90 | 92 | 89 |
| 8 | Ivy Moore | 3 | Female | 22 | 91 | 93 | 90 |
| 11 | Leo Walker | 3 | Male | 21 | 95 | 94 | 96 |
| 17 | Rachel Hill | 3 | Female | 18 | 90 | 92 | 91 |
| 21 | Vera Baker | 1 | Female | 21 | 92 | 94 | 93 |
| 27 | Brian Turner | 1 | Male | 18 | 93 | 95 | 94 |
| 30 | Ella Parker | 1 | Female | 20 | 90 | 92 | 91 |
| 35 | Jason Sanchez | 3 | Male | 20 | 91 | 93 | 92 |
| 40 | Ophelia Morgan | 2 | Female | 20 | 92 | 94 | 93 |
| 43 | Ryan Howard | 2 | Male | 22 | 90 | 92 | 91 |
| 48 | Wendy Wood | 1 | Female | 22 | 93 | 95 | 94 |
| 51 | Zoe Kelly | 1 | Female | 21 | 91 | 92 | 93 |
| 55 | Daisy Ross | 2 | Female | 20 | 90 | 92 | 91 |
| 61 | Julia Long | 2 | Female | 21 | 92 | 94 | 93 |
| 65 | Nina Washington | 3 | Female | 20 | 91 | 92 | 93 |
| 69 | Sara Alexander | 1 | Female | 19 | 90 | 92 | 91 |
| 71 | Ursula Griffin | 3 | Female | 21 | 93 | 95 | 94 |
| 77 | Abby Sullivan | 3 | Female | 18 | 91 | 92 | 93 |
| 82 | Felix Stone | 2 | Male | 18 | 92 | 94 | 93 |
| 85 | Irene Olson | 2 | Female | 20 | 90 | 92 | 91 |
| 88 | Lucas Porter | 2 | Male | 22 | 91 | 92 | 93 |
- 筛选出数学成绩大于 90 且英语成绩大于 85 的记录:
df1.query("math > 90 & english > 85")
| name | class | gender | age | chinese | math | english | |
|---|---|---|---|---|---|---|---|
| 5 | Frank Taylor | 3 | Male | 21 | 90 | 92 | 89 |
| 8 | Ivy Moore | 3 | Female | 22 | 91 | 93 | 90 |
| 11 | Leo Walker | 3 | Male | 21 | 95 | 94 | 96 |
| 17 | Rachel Hill | 3 | Female | 18 | 90 | 92 | 91 |
| 21 | Vera Baker | 1 | Female | 21 | 92 | 94 | 93 |
| 27 | Brian Turner | 1 | Male | 18 | 93 | 95 | 94 |
| 30 | Ella Parker | 1 | Female | 20 | 90 | 92 | 91 |
| 35 | Jason Sanchez | 3 | Male | 20 | 91 | 93 | 92 |
| 40 | Ophelia Morgan | 2 | Female | 20 | 92 | 94 | 93 |
| 43 | Ryan Howard | 2 | Male | 22 | 90 | 92 | 91 |
| 48 | Wendy Wood | 1 | Female | 22 | 93 | 95 | 94 |
| 51 | Zoe Kelly | 1 | Female | 21 | 91 | 92 | 93 |
| 55 | Daisy Ross | 2 | Female | 20 | 90 | 92 | 91 |
| 61 | Julia Long | 2 | Female | 21 | 92 | 94 | 93 |
| 65 | Nina Washington | 3 | Female | 20 | 91 | 92 | 93 |
| 69 | Sara Alexander | 1 | Female | 19 | 90 | 92 | 91 |
| 71 | Ursula Griffin | 3 | Female | 21 | 93 | 95 | 94 |
| 77 | Abby Sullivan | 3 | Female | 18 | 91 | 92 | 93 |
| 82 | Felix Stone | 2 | Male | 18 | 92 | 94 | 93 |
| 85 | Irene Olson | 2 | Female | 20 | 90 | 92 | 91 |
| 88 | Lucas Porter | 2 | Male | 22 | 91 | 92 | 93 |
- 筛选出数学成绩大于 90 且语文和英语至少一科成绩大于 85 的记录:
df1.query("math > 90 & (english > 85 | chinese > 85)")
| name | class | gender | age | chinese | math | english | |
|---|---|---|---|---|---|---|---|
| 5 | Frank Taylor | 3 | Male | 21 | 90 | 92 | 89 |
| 8 | Ivy Moore | 3 | Female | 22 | 91 | 93 | 90 |
| 11 | Leo Walker | 3 | Male | 21 | 95 | 94 | 96 |
| 17 | Rachel Hill | 3 | Female | 18 | 90 | 92 | 91 |
| 21 | Vera Baker | 1 | Female | 21 | 92 | 94 | 93 |
| 27 | Brian Turner | 1 | Male | 18 | 93 | 95 | 94 |
| 30 | Ella Parker | 1 | Female | 20 | 90 | 92 | 91 |
| 35 | Jason Sanchez | 3 | Male | 20 | 91 | 93 | 92 |
| 40 | Ophelia Morgan | 2 | Female | 20 | 92 | 94 | 93 |
| 43 | Ryan Howard | 2 | Male | 22 | 90 | 92 | 91 |
| 48 | Wendy Wood | 1 | Female | 22 | 93 | 95 | 94 |
| 51 | Zoe Kelly | 1 | Female | 21 | 91 | 92 | 93 |
| 55 | Daisy Ross | 2 | Female | 20 | 90 | 92 | 91 |
| 61 | Julia Long | 2 | Female | 21 | 92 | 94 | 93 |
| 65 | Nina Washington | 3 | Female | 20 | 91 | 92 | 93 |
| 69 | Sara Alexander | 1 | Female | 19 | 90 | 92 | 91 |
| 71 | Ursula Griffin | 3 | Female | 21 | 93 | 95 | 94 |
| 77 | Abby Sullivan | 3 | Female | 18 | 91 | 92 | 93 |
| 82 | Felix Stone | 2 | Male | 18 | 92 | 94 | 93 |
| 85 | Irene Olson | 2 | Female | 20 | 90 | 92 | 91 |
| 88 | Lucas Porter | 2 | Male | 22 | 91 | 92 | 93 |
注意:
query()方法只能用于筛选行,不能像.loc那样同时指定要筛选的列- 但是要想筛选列,直接在
query()之后再使用列索引来筛选即可 - 例如,筛选出数学成绩大于 90 的学生的姓名和数学成绩:
df1.query("math > 90")[["name", "math"]]
| name | math | |
|---|---|---|
| 5 | Frank Taylor | 92 |
| 8 | Ivy Moore | 93 |
| 11 | Leo Walker | 94 |
| 17 | Rachel Hill | 92 |
| 21 | Vera Baker | 94 |
| 27 | Brian Turner | 95 |
| 30 | Ella Parker | 92 |
| 35 | Jason Sanchez | 93 |
| 40 | Ophelia Morgan | 94 |
| 43 | Ryan Howard | 92 |
| 48 | Wendy Wood | 95 |
| 51 | Zoe Kelly | 92 |
| 55 | Daisy Ross | 92 |
| 61 | Julia Long | 94 |
| 65 | Nina Washington | 92 |
| 69 | Sara Alexander | 92 |
| 71 | Ursula Griffin | 95 |
| 77 | Abby Sullivan | 92 |
| 82 | Felix Stone | 94 |
| 85 | Irene Olson | 92 |
| 88 | Lucas Porter | 92 |