pandas read


2024-07-12 02:52 | Source: compiled from the web

The read_csv function

    import pandas as pd

Data files used in this article

head.csv (has a string header line; the id column can also be used to experiment with the index)

    id,shuju,label
    1,3,postive
    2,7,negative
    5,7,postive
    6,8,postive
    3,5,negative

fff.csv

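    9,6
    1,3
    2,4
    3,5
    4,6
    5,7

The header parameter in detail

Case 1: the header line has a different type from the data below it (for example, a string header over numeric data)

With header left unset (the default), the line of strings is used as the column labels:

    a = pd.read_csv("head.csv")
    a

       id  shuju     label
    0   1      3   postive
    1   2      7  negative
    2   5      7   postive
    3   6      8   postive
    4   3      5  negative

With header=None, even that line of strings is no longer treated as a header; it becomes the first data row:

    a = pd.read_csv("head.csv", header=None)
    a

        0      1         2
    0  id  shuju     label
    1   1      3   postive
    2   2      7  negative
    3   5      7   postive
    4   6      8   postive
    5   3      5  negative

Case 2: the file has no header line, or its first line has the same type as the data

With header left unset, the first line is taken as the header:

    a = pd.read_csv("fff.csv")
    a

       9  6
    0  1  3
    1  2  4
    2  3  5
    3  4  6
    4  5  7

With header=None, the first line stays in the data and the columns are labeled 0 and 1:

    a = pd.read_csv("fff.csv", header=None)
    a

       0  1
    0  9  6
    1  1  3
    2  2  4
    3  3  5
    4  4  6
    5  5  7

As the outputs show, when the first line has the same type as the rest of the file, a plain read still takes that first line as the header. In fact the default never looks at types: unless names is passed, pandas always uses the first line as the header. Adding header=None stops the first line from being used as the header and labels the columns 0, 1, and so on. In other words, header=None takes the result you would get without it, puts the header line back into the data as the first row, and replaces the column labels with integers starting at 0.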
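The rule above can be checked without any files on disk. This is a minimal sketch that assumes the two sample files are held in memory as strings; the `read` helper and the variable names are made up for the illustration and are not part of pandas.

```python
import io
import pandas as pd

# In-memory copies of the article's two sample files
HEAD_CSV = "id,shuju,label\n1,3,postive\n2,7,negative\n5,7,postive\n6,8,postive\n3,5,negative\n"
FFF_CSV = "9,6\n1,3\n2,4\n3,5\n4,6\n5,7\n"


def read(text, **kwargs):
    """Hypothetical helper: parse CSV text as if it were a file path."""
    return pd.read_csv(io.StringIO(text), **kwargs)


default_head = read(HEAD_CSV)               # first line becomes the column labels
default_fff = read(FFF_CSV)                 # same rule: "9,6" becomes the labels
no_header_fff = read(FFF_CSV, header=None)  # every line is data, labels are 0 and 1

print(list(default_head.columns))    # ['id', 'shuju', 'label']
print(list(default_fff.columns))     # ['9', '6']
print(list(no_header_fff.columns))   # [0, 1]
print(len(default_fff), len(no_header_fff))  # 5 6
```

Note that the labels taken from fff.csv are the strings '9' and '6', not integers, which is one way to spot that a data line was accidentally consumed as a header.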

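The names parameter

The two calls below show the effect of names. When names is set and header is left unset, pandas behaves as if header=None, so the first line stays in the data. The exception is passing header=0 explicitly together with names: the first line is then consumed as a header and replaced by names.

    a = pd.read_csv("fff.csv", header=None)
    a

       0  1
    0  9  6
    1  1  3
    2  2  4
    3  3  5
    4  4  6
    5  5  7

    a = pd.read_csv("fff.csv", header=None, names=['a', 'b'])
    a

       a  b
    0  9  6
    1  1  3
    2  2  4
    3  3  5
    4  4  6
    5  5  7

The skiprows parameter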
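As a complement to the author's two calls, here is a small sketch of how names interacts with header. It assumes fff.csv is held in memory as a string; the variable names are illustrative only.

```python
import io
import pandas as pd

FFF_CSV = "9,6\n1,3\n2,4\n3,5\n4,6\n5,7\n"

# names only: header is treated as None, so the "9,6" line stays in the data
names_only = pd.read_csv(io.StringIO(FFF_CSV), names=["a", "b"])

# names plus explicit header=0: the first line is consumed as a header,
# then its labels are replaced by names
names_replace = pd.read_csv(io.StringIO(FFF_CSV), names=["a", "b"], header=0)

print(len(names_only), names_only["a"].tolist())        # 6 [9, 1, 2, 3, 4, 5]
print(len(names_replace), names_replace["a"].tolist())  # 5 [1, 2, 3, 4, 5]
```

The second form is the one to reach for when a file already has a header line that you want to rename rather than keep as data.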

These examples use the same head.csv and fff.csv as above.

    pd.read_csv("head.csv", skiprows=2, header=None)

       0  1         2
    0  2  7  negative
    1  5  7   postive
    2  6  8   postive
    3  3  5  negative

    pd.read_csv("head.csv", skiprows=2, header=None, names=['a', 'b', 'c'])

       a  b         c
    0  2  7  negative
    1  5  7   postive
    2  6  8   postive
    3  3  5  negative

    pd.read_csv("fff.csv", skiprows=2, header=None)

       0  1
    0  2  4
    1  3  5
    2  4  6
    3  5  7

Comparing the outputs above, skiprows=2 has the same effect whether or not the file has a header: reading starts at the third line, that is, two lines are skipped. For a file with a header, this works because the header line id,shuju,label is counted as a line like any other.
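Because an integer skiprows counts the header line too, it can help to compare it with the list form of skiprows, which names the 0-indexed lines to drop and so can leave the header alone. A minimal sketch, assuming head.csv is held in memory as a string:

```python
import io
import pandas as pd

HEAD_CSV = "id,shuju,label\n1,3,postive\n2,7,negative\n5,7,postive\n6,8,postive\n3,5,negative\n"

# Integer form: drop the first 2 physical lines, header line included
by_count = pd.read_csv(io.StringIO(HEAD_CSV), skiprows=2, header=None)

# List form: drop lines 1 and 2 (0-indexed), so line 0 survives as the header
by_index = pd.read_csv(io.StringIO(HEAD_CSV), skiprows=[1, 2])

print(by_count[0].tolist())     # [2, 5, 6, 3]
print(list(by_index.columns))   # ['id', 'shuju', 'label']
print(by_index["id"].tolist())  # [5, 6, 3]
```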

The nrows parameter

This parameter is very practical: when a file is very large, it lets you read just the first few rows. The examples again use head.csv and fff.csv from above.

With a string header line:

    pd.read_csv("head.csv", nrows=2, header=None)

        0      1        2
    0  id  shuju    label
    1   1      3  postive

With header=None, the header line is read as an ordinary row and counts toward nrows.
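The header line only counts toward nrows because header=None turns it into data. A short sketch of the contrast with the default header handling, assuming head.csv is held in memory as a string:

```python
import io
import pandas as pd

HEAD_CSV = "id,shuju,label\n1,3,postive\n2,7,negative\n5,7,postive\n6,8,postive\n3,5,negative\n"

# Default header: the header line is consumed first, then 2 data rows are read
with_header = pd.read_csv(io.StringIO(HEAD_CSV), nrows=2)

# header=None: the header line is itself one of the 2 rows
no_header = pd.read_csv(io.StringIO(HEAD_CSV), nrows=2, header=None)

print(with_header["id"].tolist())  # [1, 2]
print(no_header[0].tolist())       # ['id', '1']
```

In the second result the first column holds strings, because the header text forces the whole column to a non-numeric type.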

Without a string header line:

    pd.read_csv("fff.csv", nrows=2, header=None)

       0  1
    0  9  6
    1  1  3

Combining nrows and skiprows

Using the same head.csv as above:

    pd.read_csv("head.csv", nrows=2, skiprows=3, header=None)

       0  1        2
    0  5  7  postive
    1  6  8  postive

So pandas first skips the three lines id,shuju,label / 1,3,postive / 2,7,negative and only then applies nrows. This matters when a file has a header and you want to skip the first 500 content rows (not counting the header) and then read 5000 rows: remember that skiprows counts the header as a line too.

One last point: header and names are applied after the other parameters. For example, skiprows skips its lines first, and the header is then determined from whatever data remains.
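One way to skip content rows while keeping the header is to pass skiprows a range of line indices that starts at 1. The sketch below scales the 500 and 5000 from the text down to 5 and 3, and the generated data and the names `SKIP`, `TAKE`, and `big_csv` are assumptions made for the illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for a large file: a header plus 20 content rows
lines = ["id,value"] + [f"{i},{i * 10}" for i in range(1, 21)]
big_csv = "\n".join(lines) + "\n"

SKIP, TAKE = 5, 3  # scaled-down versions of 500 and 5000

# range(1, SKIP + 1) names lines 1..SKIP, so line 0 (the header) is kept
window = pd.read_csv(
    io.StringIO(big_csv),
    skiprows=range(1, SKIP + 1),
    nrows=TAKE,
)

print(list(window.columns))   # ['id', 'value']
print(window["id"].tolist())  # [6, 7, 8]
```

With an integer skiprows the equivalent would be skiprows=SKIP + 1 together with header=None and names, since the header line would be skipped along with the data.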
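The ordering can be seen directly: after skiprows has dropped lines, the default header is taken from the first line that is left. A minimal sketch, assuming head.csv is held in memory as a string:

```python
import io
import pandas as pd

HEAD_CSV = "id,shuju,label\n1,3,postive\n2,7,negative\n5,7,postive\n6,8,postive\n3,5,negative\n"

# Two lines are skipped first; the third line "2,7,negative" then becomes the header
after_skip = pd.read_csv(io.StringIO(HEAD_CSV), skiprows=2)

# Skipping only the original header and supplying names keeps all 5 data rows
renamed = pd.read_csv(io.StringIO(HEAD_CSV), skiprows=1, names=["a", "b", "c"])

print(list(after_skip.columns))  # ['2', '7', 'negative']
print(len(after_skip))           # 3
print(renamed["a"].tolist())     # [1, 2, 5, 6, 3]
```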

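The chunksize parameter

With this parameter set, read_csv returns an iterator instead of a DataFrame, which lets you read the data in batches. Each batch holds the next chunksize rows of the file's content (the header is not counted).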
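A minimal sketch of batch reading with chunksize, assuming head.csv is held in memory as a string. The per-chunk work here (collecting sizes and summing a column) is only a placeholder for real processing.

```python
import io
import pandas as pd

HEAD_CSV = "id,shuju,label\n1,3,postive\n2,7,negative\n5,7,postive\n6,8,postive\n3,5,negative\n"

# With chunksize set, read_csv returns an iterator of DataFrames
reader = pd.read_csv(io.StringIO(HEAD_CSV), chunksize=2)

sizes = []
total = 0
for chunk in reader:
    # Every chunk carries the same column labels taken from the header line
    sizes.append(len(chunk))
    total += int(chunk["shuju"].sum())

print(sizes)  # [2, 2, 1]
print(total)  # 30
```

The last chunk is shorter when the number of content rows is not a multiple of chunksize.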
