Detecting the encoding of a string in Python

Posted 2019-03-09, updated 2021-11-23

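chardet can detect the encoding of a byte string, and it is simple to use. chardet.detect() takes a bytes object and returns its best guess as a dictionary with the keys encoding, confidence and language.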

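```python
import chardet

# Detected encoding: ascii
print(chardet.detect(b'Hello, world!'))

# Detected encoding: GB2312. GBK is a superset of GB2312 (they are not the same
# encoding), but text that uses only GB2312 characters, like this line, is
# encoded to identical bytes by both, so chardet reports GB2312.
data1 = '离离原上草，一岁一枯荣'.encode('gbk')
print(chardet.detect(data1))

# Detected encoding: utf-8
data2 = '离离原上草，一岁一枯荣'.encode('utf-8')
print(chardet.detect(data2))

# Detected encoding: EUC-JP
data3 = '最新の主要ニュース'.encode('euc-jp')
print(chardet.detect(data3))
```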
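Detection is usually only the first step; the guess is then used to decode the bytes. The following is a minimal sketch of how the dictionary returned by chardet.detect() might be used for that. The helper name decode_bytes, the min_confidence threshold, the UTF-8 fallback and the substitution of GB18030 when GB2312 or GBK is reported are all assumptions made for illustration, not part of chardet. The GB18030 substitution relies on GB18030 being a superset of GBK, which in turn is a superset of GB2312, so text that uses GBK-only characters still decodes when chardet reports GB2312.

```python
import codecs

import chardet

# Hypothetical substitution table (an assumption, not chardet behaviour):
# GB18030 is a superset of GBK and GB2312, so decoding with it also handles
# GBK-only characters when chardet reports GB2312.
_SUPERSETS = {"gb2312": "gb18030", "gbk": "gb18030"}


def decode_bytes(data: bytes, min_confidence: float = 0.5, fallback: str = "utf-8") -> str:
    """Decode data using chardet's guess (hypothetical helper for illustration)."""
    result = chardet.detect(data)  # {'encoding': ..., 'confidence': ..., 'language': ...}
    encoding = result["encoding"]
    # Use the fallback when chardet has no answer or is not confident enough.
    if encoding is None or result["confidence"] < min_confidence:
        encoding = fallback
    encoding = _SUPERSETS.get(encoding.lower(), encoding)
    try:
        codecs.lookup(encoding)  # make sure Python recognises the codec name
    except LookupError:
        encoding = fallback
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    return data.decode(encoding, errors="replace")


if __name__ == "__main__":
    print(decode_bytes('离离原上草，一岁一枯荣'.encode('gbk')))
    print(decode_bytes('最新の主要ニュース'.encode('euc-jp')))
```

The errors="replace" choice trades strictness for robustness: a wrong guess produces replacement characters rather than a UnicodeDecodeError. Passing errors="strict" to data.decode() instead would make a bad guess fail loudly.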