Determining the encoding of a string read from an input stream (UTF

2024-07-13 03:17 | Source: compiled from the web

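When you read a file of unknown encoding through an input stream, what do you do if the content comes out garbled?

If you do not know the string's encoding, about all you can do is try the common encodings one at a time until the text decodes correctly.

For example, "#鍑借喘鍚岃櫣娆惧紡f" is a line read from some file. How should it be handled? By trying encodings one by one? Consider instead letting a program determine the encoding and convert the text to UTF-8 at the same time.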
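To make the example concrete, here is a minimal sketch of how such garbled text can arise and be undone, assuming the line was originally UTF-8 and was decoded as GBK by mistake. That assumption is consistent with the test string "#函购同虹款式f" used later in this post. The class and method names (`MojibakeDemo`, `garble`, `repair`) are made up for illustration, and the sample is written with Unicode escapes so the snippet does not depend on the source file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Decode the UTF-8 bytes of the text with the wrong charset, producing garbled text.
    static String garble(String original, Charset wrong) {
        return new String(original.getBytes(StandardCharsets.UTF_8), wrong);
    }

    // Undo garble(): re-encode with the wrong charset to get the raw bytes back,
    // then decode those bytes as UTF-8.
    static String repair(String garbled, Charset wrong) {
        return new String(garbled.getBytes(wrong), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        // The sample string from this post, spelled with Unicode escapes.
        String original = "#\u51FD\u8D2D\u540C\u8679\u6B3E\u5F0Ff";
        String garbled = garble(original, gbk);
        System.out.println(garbled);
        System.out.println(repair(garbled, gbk));
    }
}
```

The repair step only works when the wrong decode lost no information. If the wrong charset replaced some bytes with a substitute character, the original bytes cannot be recovered this way.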

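Notes:
(1) The code below only works in a UTF-8 build environment, that is, when the Java source files are saved as UTF-8.
(2) The code lists only a few common encodings; add others yourself if you are interested.
(3) Converting some of these encodings to UTF-8 does not succeed, and I do not yet know how to fix that. If anyone who understands this can explain, I would be very grateful.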
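Part of the dependency in note (1) likely comes from calling `getBytes()` with no argument, which uses the platform default charset. A small sketch of the explicit alternative follows; the class name `ExplicitCharsetDemo` and the helper `utf8Bytes` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetDemo {
    // Encode the text as UTF-8 regardless of the platform default charset.
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The default varies by JDK version, operating system, and locale.
        System.out.println("default charset: " + Charset.defaultCharset());
        // U+51FD is the second character of the sample string in this post.
        System.out.println(utf8Bytes("\u51FD").length);
    }
}
```

Naming the charset does not remove the need to compile the source files with the right encoding when they contain non-ASCII string literals.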

1. First, write an enum class

/**
 * Enum of encoding signatures.
 * Note: only suitable when the source files are compiled as UTF-8.
 * @author WolfShadow
 * @date 2018-11-28
 */
public enum UnicodeEnum {

    // Each constant pairs an encoding name with three signature bytes.
    // The triples appear to be the first three UTF-8 bytes of the test string
    // used later in this post, after its bytes were decoded with that charset.
    UTF_8("UTF-8", (byte) 35, (byte) -27, (byte) -121),
    UTF_16("UTF-16", (byte) -30, (byte) -113, (byte) -91),
    GBK("GBK", (byte) 35, (byte) -23, (byte) -115),
    GB2312("GB2312", (byte) 35, (byte) -17, (byte) -65),
    ISO_8859_1("ISO-8859-1", (byte) 35, (byte) -61, (byte) -91),
    NULL("Unknown encoding", (byte) -1, (byte) -1, (byte) -1);

    // Fields are final: enum constants are shared, so they should not be mutable.
    private final String encoding; // encoding name
    private final byte byte1;      // first byte
    private final byte byte2;      // second byte
    private final byte byte3;      // third byte

    private UnicodeEnum(String encoding, byte byte1, byte byte2, byte byte3) {
        this.encoding = encoding;
        this.byte1 = byte1;
        this.byte2 = byte2;
        this.byte3 = byte3;
    }

    /** Returns the constant whose three bytes match, or NULL if none does. */
    public static UnicodeEnum getUnicodeEnum(byte byte1, byte byte2, byte byte3) {
        for (UnicodeEnum candidate : UnicodeEnum.values()) {
            if (candidate.getByte1() == byte1
                    && candidate.getByte2() == byte2
                    && candidate.getByte3() == byte3) {
                return candidate;
            }
        }
        return NULL;
    }

    public String getEncoding() {
        return encoding;
    }

    public byte getByte1() {
        return byte1;
    }

    public byte getByte2() {
        return byte2;
    }

    public byte getByte3() {
        return byte3;
    }
}
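The byte triples in the enum are not explained in the post. The sketch below shows one way they could have been derived, on the assumption that they come from the sample string "#函购同虹款式f": take its UTF-8 bytes, decode them with the candidate charset, re-encode the result as UTF-8, and keep the first three bytes. At least the UTF-8, UTF-16, and ISO-8859-1 entries match this derivation. `SignatureDemo` and `signature` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SignatureDemo {
    // First three UTF-8 bytes of the sample after its UTF-8 bytes
    // have been decoded with the charset `decodedAs`.
    static byte[] signature(String sample, Charset decodedAs) {
        String misread = new String(sample.getBytes(StandardCharsets.UTF_8), decodedAs);
        return Arrays.copyOf(misread.getBytes(StandardCharsets.UTF_8), 3);
    }

    public static void main(String[] args) {
        // The sample string from this post, spelled with Unicode escapes.
        String sample = "#\u51FD\u8D2D\u540C\u8679\u6B3E\u5F0Ff";
        for (String name : new String[] {"UTF-8", "UTF-16", "GBK", "GB2312", "ISO-8859-1"}) {
            byte[] sig = signature(sample, Charset.forName(name));
            System.out.println(name + " -> " + Arrays.toString(sig));
        }
    }
}
```

If this reading is right, the signatures describe this one sample rather than the encodings themselves, so a file that starts with different characters would not match any entry.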

2. Next, add a utility class

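import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

/**
 * String encoding utility class.
 * (1) Detects the encoding of a string
 * (2) Converts between encodings (extend as needed)
 * (3) UTF-8, UTF-16, GBK, GB2312, ISO-8859-1, etc.
 * @author WolfShadow
 * @date 2018-11-28
 */
public class UnicodeUtil {

    /**
     * Returns the name of the encoding the string appears to have been decoded with.
     * @param str the string to inspect
     * @return the encoding name, or null if the string is null or empty
     * @author WolfShadow
     * @date 2018-11-28
     */
    public static String getUnicode(String str) {
        if (str == null || str.isEmpty()) {
            return null;
        }
        // The signatures in UnicodeEnum are UTF-8 bytes, so encode explicitly as UTF-8
        // instead of relying on the platform default charset.
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        if (bytes.length < 3) {
            // Too short to compare against a three-byte signature.
            return UnicodeEnum.NULL.getEncoding();
        }
        // getUnicodeEnum never returns null; it falls back to UnicodeEnum.NULL.
        return UnicodeEnum.getUnicodeEnum(bytes[0], bytes[1], bytes[2]).getEncoding();
    }

    /**
     * Converts the string to UTF-8.
     * @param str the string to convert
     * @return the converted string, or null if the encoding is unknown
     * @throws UnsupportedEncodingException if the detected encoding is not supported
     * @author WolfShadow
     * @date 2018-11-28
     */
    public static String getUTF_8(String str) throws UnsupportedEncodingException {
        String unicode = getUnicode(str);
        if (unicode == null || unicode.equals(UnicodeEnum.NULL.getEncoding())) {
            return null;
        }
        // Re-encode with the detected charset to recover the raw bytes, then decode as UTF-8.
        return new String(str.getBytes(unicode), UnicodeEnum.UTF_8.getEncoding());
    }
}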
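The utility class above inspects a string that has already been decoded. A different technique, not the author's, is to work on the raw bytes from the input stream and test each candidate charset with a strict decoder that reports errors instead of substituting characters. The sketch below uses the standard `CharsetDecoder` API; `StrictDecodeSniffer`, `tryDecode`, and `firstMatch` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.List;
import java.util.Optional;

final class StrictDecodeSniffer {
    // Returns the decoded text if every byte is valid in `charset`, otherwise empty.
    static Optional<String> tryDecode(byte[] data, Charset charset) {
        try {
            return Optional.of(charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data))
                    .toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    // Returns the first candidate, in list order, that decodes the bytes without error.
    static Optional<Charset> firstMatch(byte[] data, List<Charset> candidates) {
        for (Charset candidate : candidates) {
            if (tryDecode(data, candidate).isPresent()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

A match here is still only a guess. ISO-8859-1 accepts every byte sequence, so it belongs at the end of the candidate list, and strict charsets such as UTF-8 belong first.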

3. Write a test method (or create a new test class)

The main method is:

public static void main(String[] args) throws UnsupportedEncodingException {
    String test = "#函购同虹款式f";
    String str1 = new String(test.getBytes(), "UTF-8");
    String str2 = new String(test.getBytes(), "GBK");
    String str3 = new String(test.getBytes(), "ISO-8859-1");
    String str4 = new String(test.getBytes(), "UTF-16");
    String str5 = new String(test.getBytes(), "GB2312");
    String str6 = new String(test.getBytes(), "Unicode");

    System.out.println(getUnicode(str1));
    System.out.println(getUnicode(str2));
    System.out.println(getUnicode(str3));
    System.out.println(getUnicode(str4));
    System.out.println(getUnicode(str5));
    System.out.println(getUnicode(str6));

    System.out.println(getUTF_8(str6));
    System.out.println(getUTF_8(str5));
    System.out.println(getUTF_8(str4));
    System.out.println(getUTF_8(str3));
    System.out.println(getUTF_8(str2));
    System.out.println(getUTF_8(str1));
}

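4. Output

Clearly, the method that converts to UTF-8 has a bug. Also, why do UTF-16 and Unicode give the same result for the string? If any kind expert can point me in the right direction, thank you!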
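A possible explanation for both questions, offered as a sketch because the actual output is not shown in the post. On a standard JDK, "Unicode" is an alias of the UTF-16 charset, so both names select the same decoder. Java's UTF-16 encoder also writes a byte order mark (FE FF) before the data, so re-encoding a string with "UTF-16" returns two extra bytes that are not valid UTF-8. Using UTF-16BE, which writes no byte order mark, avoids that for this sample. The class name `Utf16BomDemo` and the helper `repair` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Utf16BomDemo {
    // Undo a wrong decode: re-encode with `wrong` to get bytes, then decode them as UTF-8.
    static String repair(String garbled, Charset wrong) {
        return new String(garbled.getBytes(wrong), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Unicode" resolves to the same Charset object as UTF-16.
        System.out.println(Charset.forName("Unicode").equals(StandardCharsets.UTF_16));

        // Encoding with UTF-16 prepends a byte order mark; UTF-16BE does not.
        System.out.println("a".getBytes(StandardCharsets.UTF_16).length);
        System.out.println("a".getBytes(StandardCharsets.UTF_16BE).length);

        // The sample string from this post, spelled with Unicode escapes.
        String original = "#\u51FD\u8D2D\u540C\u8679\u6B3E\u5F0Ff";
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_16);
        System.out.println(repair(garbled, StandardCharsets.UTF_16).equals(original));
        System.out.println(repair(garbled, StandardCharsets.UTF_16BE).equals(original));
    }
}
```

The GB2312 case is probably a different problem: bytes that GB2312 cannot map are replaced during decoding, and the original bytes are lost, so no re-encoding can restore them.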
