How to read and parse a CSV file in Java


2024-07-06 12:55 | Source: compiled from the web


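A comma-separated values (CSV) file is a plain text file that stores data column by column, with the columns separated by a delimiter (usually a comma ",").

For example:

1,US,United States
2,MY,Malaysia
3,AU,Australia

or

"1","US","United States"
"2","MY","Malaysia"
"3","AU","Australia"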

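Note: Read the RFC 4180 document for the definition of the comma-separated values (CSV) format.

CSV files usually raise two problems:

1. A field contains the separator. For example, the separator is a comma and a field contains a comma: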

"aaa","b,bb","ccc"

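2. Double quotes enclose the fields, and a field itself contains a double quote. To handle this, a double quote that appears inside a field must be escaped by preceding it with another double quote (RFC 4180):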

"aaa","b""bb","ccc"

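In this tutorial, we show three examples that read, parse, and print the values of a CSV file:

1. A simple solution for a simply formatted CSV file.
2. An advanced solution for an oddly formatted CSV file (fields containing the separator or double quotes).
3. A third-party solution, using OpenCSV.

1. Simple solution

If you are sure that no field in the CSV file contains the separator or a double quote, just use the standard split() to parse it.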
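Two properties of split() are worth keeping in mind before relying on it. The sketch below (the class name and the sample row `1,US,,` are made up for illustration) shows that split() also cuts at a comma inside an enclosed field, and that its single-argument form drops trailing empty strings unless a negative limit is passed.

```java
import java.util.Arrays;

public class SplitPitfallsDemo {

    // Plain split: every comma is a separator, trailing empty fields are dropped.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    // A negative limit keeps trailing empty fields.
    static String[] splitKeepEmpty(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        // The comma inside "b,bb" is treated as a separator: 4 pieces, not 3.
        String[] quoted = naiveSplit("\"aaa\",\"b,bb\",\"ccc\"");
        System.out.println(quoted.length + " pieces: " + Arrays.toString(quoted));

        // Hypothetical row whose last two columns are empty.
        String row = "1,US,,";
        System.out.println(naiveSplit(row).length);     // prints 2
        System.out.println(splitKeepEmpty(row).length); // prints 4
    }
}
```

If either case can occur in your data, the simple solution is not a safe choice.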

1.1 Review a simple CSV file.

/Users/mkyong/csv/country.csv

"1.0.0.0","1.0.0.255","16777216","16777471","AU","Australia"
"1.0.1.0","1.0.3.255","16777472","16778239","CN","China"
"1.0.4.0","1.0.7.255","16778240","16779263","AU","Australia"
"1.0.8.0","1.0.15.255","16779264","16781311","CN","China"
"1.0.16.0","1.0.31.255","16781312","16785407","JP","Japan"
"1.0.32.0","1.0.63.255","16785408","16793599","CN","China"
"1.0.64.0","1.0.127.255","16793600","16809983","JP","Japan"
"1.0.128.0","1.0.255.255","16809984","16842751","TH","Thailand"

1.2 No magic here: read the text file above line by line and split each line on the comma separator.

CSVReader.java

package com.mkyong.csv;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class CSVReader {

    public static void main(String[] args) {

        String csvFile = "/Users/mkyong/csv/country.csv";
        BufferedReader br = null;
        String line = "";
        String cvsSplitBy = ",";

        try {

            br = new BufferedReader(new FileReader(csvFile));
            while ((line = br.readLine()) != null) {

                // use comma as separator
                String[] country = line.split(cvsSplitBy);

                System.out.println("Country [code= " + country[4] + " , name=" + country[5] + "]");

            }

        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // close the reader manually (pre-JDK 7 style)
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

    }

}

1.3 For JDK 7 and above, use try-with-resources.

CSVReader.java

package com.mkyong.csv;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CSVReader {

    public static void main(String[] args) {

        String csvFile = "/Users/mkyong/csv/country.csv";
        String line = "";
        String cvsSplitBy = ",";

        // the reader is closed automatically when the try block exits
        try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {

            while ((line = br.readLine()) != null) {

                // use comma as separator
                String[] country = line.split(cvsSplitBy);

                System.out.println("Country [code= " + country[4] + " , name=" + country[5] + "]");

            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

}

Output

Country [code= "AU" , name="Australia"]
Country [code= "CN" , name="China"]
Country [code= "AU" , name="Australia"]
Country [code= "CN" , name="China"]
Country [code= "JP" , name="Japan"]
Country [code= "CN" , name="China"]
Country [code= "JP" , name="Japan"]
Country [code= "TH" , name="Thailand"]

The enclosing double quotes remain in the values because split() does not strip them.

2. Advanced solution

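This solution handles fields that contain the separator or double quotes, and it also supports a custom separator and a custom quote character. Review the following CSV parsing example and the JUnit test cases to see how it works.

Note: Again, a double quote that must appear inside a field is escaped by preceding it with another double quote, for example: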

"aaa","b""bb","ccc"

2.1 Review another CSV file.

/Users/mkyong/csv/country2.csv

10,AU,Australia
11,AU,Aus""tralia
"12","AU","Australia"
"13","AU","Aus""tralia"
"14","AU","Aus,tralia"
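Rows 12 to 14 of this file follow the enclosing and doubling rules from the note above. As a minimal sketch of how a writer could produce such fields (the class and both helpers are hypothetical and not part of the tutorial's classes), the code below wraps a value in double quotes and doubles any embedded double quote; `unescape` reverses the step and assumes its input really is an enclosed field.

```java
public class CsvQuoteDemo {

    // Encloses a value in double quotes and doubles every embedded double quote.
    static String escape(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Reverses escape(): strips the enclosing quotes, then collapses "" to ".
    // Assumes the field starts and ends with a double quote.
    static String unescape(String field) {
        String inner = field.substring(1, field.length() - 1);
        return inner.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        System.out.println(escape("Aus\"tralia"));            // "Aus""tralia"
        System.out.println(escape("Aus,tralia"));             // "Aus,tralia"
        System.out.println(unescape("\"Aus\"\"tralia\""));    // Aus"tralia
    }
}
```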

2.2 The following example is inspired by the article "The only class you need for CSV files" (with some fixes to support additional features; see the "Fixed" comments in the code) and by the third-party OpenCSV library.

CSVUtils.java

package com.mkyong.utils;

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CSVUtils {

    private static final char DEFAULT_SEPARATOR = ',';
    private static final char DEFAULT_QUOTE = '"';

    public static void main(String[] args) throws Exception {

        String csvFile = "/Users/mkyong/csv/country2.csv";

        // try-with-resources closes the scanner even if parsing fails
        try (Scanner scanner = new Scanner(new File(csvFile))) {
            while (scanner.hasNext()) {
                List<String> line = parseLine(scanner.nextLine());
                System.out.println("Country [id= " + line.get(0) + ", code= " + line.get(1) + " , name=" + line.get(2) + "]");
            }
        }

    }

    public static List<String> parseLine(String cvsLine) {
        return parseLine(cvsLine, DEFAULT_SEPARATOR, DEFAULT_QUOTE);
    }

    public static List<String> parseLine(String cvsLine, char separators) {
        return parseLine(cvsLine, separators, DEFAULT_QUOTE);
    }

    public static List<String> parseLine(String cvsLine, char separators, char customQuote) {

        List<String> result = new ArrayList<>();

        // if null or empty, return an empty list
        // (the original condition used &&, which throws a NullPointerException on null input)
        if (cvsLine == null || cvsLine.isEmpty()) {
            return result;
        }

        // a space means "not set": fall back to the defaults
        if (customQuote == ' ') {
            customQuote = DEFAULT_QUOTE;
        }

        if (separators == ' ') {
            separators = DEFAULT_SEPARATOR;
        }

        StringBuilder curVal = new StringBuilder();
        boolean inQuotes = false;
        boolean startCollectChar = false;
        boolean doubleQuotesInColumn = false;

        char[] chars = cvsLine.toCharArray();

        for (char ch : chars) {

            if (inQuotes) {
                startCollectChar = true;
                if (ch == customQuote) {
                    inQuotes = false;
                    doubleQuotesInColumn = false;
                } else {

                    // Fixed : allow "" in custom quote enclosed
                    if (ch == '\"') {
                        if (!doubleQuotesInColumn) {
                            curVal.append(ch);
                            doubleQuotesInColumn = true;
                        }
                    } else {
                        curVal.append(ch);
                    }

                }
            } else {
                if (ch == customQuote) {

                    inQuotes = true;

                    // Fixed : allow "" in empty quote enclosed
                    if (chars[0] != '"' && customQuote == '\"') {
                        curVal.append('"');
                    }

                    // double quotes in column will hit this!
                    if (startCollectChar) {
                        curVal.append('"');
                    }

                } else if (ch == separators) {

                    result.add(curVal.toString());

                    curVal = new StringBuilder();
                    startCollectChar = false;

                } else if (ch == '\r') {
                    // ignore CR characters
                    continue;
                } else if (ch == '\n') {
                    // the end, break!
                    break;
                } else {
                    curVal.append(ch);
                }
            }

        }

        result.add(curVal.toString());

        return result;
    }

}

Output

Country [id= 10, code= AU , name=Australia]
Country [id= 11, code= AU , name=Aus"tralia]
Country [id= 12, code= AU , name=Australia]
Country [id= 13, code= AU , name=Aus"tralia]
Country [id= 14, code= AU , name=Aus,tralia]
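For comparison, here is a minimal sketch of a stricter line parser that applies the RFC 4180 rule directly: inside an enclosed field, a quote is literal only when it is immediately followed by another quote. It is an alternative written for illustration, not the CSVUtils class above, and the class and method names are made up. It also behaves differently on row 11: a `""` pair in a field that is not enclosed is kept as two literal characters, since the doubling rule is applied only inside enclosed fields.

```java
import java.util.ArrayList;
import java.util.List;

public class Rfc4180LineParser {

    // Parses one CSV record (without embedded line breaks) into its fields.
    static List<String> parse(String line, char separator, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (inQuotes) {
                if (ch == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        current.append(quote); // doubled quote: one literal quote
                        i++;                   // skip the second quote of the pair
                    } else {
                        inQuotes = false;      // closing quote
                    }
                } else {
                    current.append(ch);        // separators are literal in here
                }
            } else if (ch == quote && current.length() == 0) {
                inQuotes = true;               // opening quote at the start of a field
            } else if (ch == separator) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("\"13\",\"AU\",\"Aus\"\"tralia\"", ',', '"')); // [13, AU, Aus"tralia]
        System.out.println(parse("\"14\",\"AU\",\"Aus,tralia\"", ',', '"'));    // [14, AU, Aus,tralia]
        System.out.println(parse("'10'|'AU'|'Aus|tralia'", '|', '\''));         // [10, AU, Aus|tralia]
    }
}
```

Looking one character ahead replaces the startCollectChar and doubleQuotesInColumn flags with a single inQuotes state, at the cost of not accepting the unenclosed `Aus""tralia` form as an escaped quote.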

2.3 Review the following unit test, which covers the comma and double-quote cases.

CSVUtilsTest.java

package com.mkyong.csv;

import com.mkyong.utils.CSVUtils;
import org.hamcrest.core.IsNull;
import org.junit.Test;

import java.util.List;

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.core.Is.is;

public class CSVUtilsTest {

    @Test
    public void test_no_quote() {

        String line = "10,AU,Australia";
        List<String> result = CSVUtils.parseLine(line);

        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Australia"));

    }

    @Test
    public void test_no_quote_but_double_quotes_in_column() throws Exception {

        String line = "10,AU,Aus\"\"tralia";
        List<String> result = CSVUtils.parseLine(line);

        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Aus\"tralia"));

    }

    @Test
    public void test_double_quotes() {

        String line = "\"10\",\"AU\",\"Australia\"";
        List<String> result = CSVUtils.parseLine(line);

        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Australia"));

    }

    @Test
    public void test_double_quotes_but_double_quotes_in_column() {

        String line = "\"10\",\"AU\",\"Aus\"\"tralia\"";
        List<String> result = CSVUtils.parseLine(line);

        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Aus\"tralia"));

    }

    @Test
    public void test_double_quotes_but_comma_in_column() {

        String line = "\"10\",\"AU\",\"Aus,tralia\"";
        List<String> result = CSVUtils.parseLine(line);

        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Aus,tralia"));

    }

}

2.4 Review another unit test, which covers a custom separator and a custom quote character.

CSVUtilsTestCustom.java

package com.mkyong.csv;

import com.mkyong.utils.CSVUtils;
import org.hamcrest.core.IsNull;
import org.junit.Test;

import java.util.List;

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.core.Is.is;

public class CSVUtilsTestCustom {

    @Test
    public void test_custom_separator() {

        String line = "10|AU|Australia";
        List<String> result = CSVUtils.parseLine(line, '|');

        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Australia"));

    }

    @Test
    public void test_custom_separator_and_quote() {

        String line = "'10'|'AU'|'Australia'";
        List<String> result = CSVUtils.parseLine(line, '|', '\'');

        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Australia"));

    }

    // the enclosed field contains the custom separator '|'
    @Test
    public void test_custom_separator_and_quote_but_custom_quote_in_column() {

        String line = "'10'|'AU'|'Aus|tralia'";
        List<String> result = CSVUtils.parseLine(line, '|', '\'');

        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Aus|tralia"));

    }

    @Test
    public void test_custom_separator_and_quote_but_double_quotes_in_column() {

        String line = "'10'|'AU'|'Aus\"\"tralia'";
        List<String> result = CSVUtils.parseLine(line, '|', '\'');

        assertThat(result, IsNull.notNullValue());
        assertThat(result.size(), is(3));
        assertThat(result.get(0), is("10"));
        assertThat(result.get(1), is("AU"));
        assertThat(result.get(2), is("Aus\"tralia"));

    }

}

3. OpenCSV example

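If the simple and advanced solutions above are not enough, try a third-party CSV library, OpenCSV.

3.1 Maven.

pom.xml

<dependency>
    <groupId>com.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>3.8</version>
</dependency>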

3.2 Review a CSV file.

/Users/mkyong/csv/country3.csv

10,AU,Australia
11,AU,Aus""tralia
"12","AU","Australia"
"13","AU","Aus""tralia"
"14","AU","Aus,tralia"

3.3 Parse the CSV file above with OpenCSV.

CSVReaderExample.java

package com.mkyong.csv;

import com.opencsv.CSVReader;

import java.io.FileReader;
import java.io.IOException;

public class CSVReaderExample {

    public static void main(String[] args) {

        String csvFile = "/Users/mkyong/csv/country3.csv";

        // CSVReader is Closeable, so try-with-resources closes it
        // (the original version never closed the reader)
        try (CSVReader reader = new CSVReader(new FileReader(csvFile))) {
            String[] line;
            while ((line = reader.readNext()) != null) {
                System.out.println("Country [id= " + line[0] + ", code= " + line[1] + " , name=" + line[2] + "]");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

}

Output

Country [id= 10, code= AU , name=Australia]
Country [id= 11, code= AU , name=Aus"tralia]
Country [id= 12, code= AU , name=Australia]
Country [id= 13, code= AU , name=Aus"tralia]
Country [id= 14, code= AU , name=Aus,tralia]

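Note: For more examples, refer to the official OpenCSV documentation.

Done.

References

1. The only class you need for CSV files
2. CSVHelper example
3. Ostermiller Java Utilities: Comma Separated Values (CSV)
4. RFC 4180: Format of comma-separated values (CSV)
5. OpenCSV website
6. Java: How to export data to a CSV file

Translated from: https://mkyong.com/java/how-to-read-and-parse-csv-file-in-java/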


