Character encodings

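It's a bit late for this, but my understanding here was somewhat fuzzy, so I'm writing up a summary.
I work on Windows and constantly go back and forth between SJIS and UTF-8, so I've limited the summary to those two.
First, create two files, utf8.txt and cp932.txt (each one saved in the encoding its file name indicates).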
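
One way to create the two files, as a minimal sketch. The helper `write_samples` and its sample text are hypothetical names of mine, not part of the original setup. It writes raw bytes with `File.binwrite`, so the result should not depend on the default encoding or on newline conversion.

```ruby
# Hypothetical helper: writes the same text once as UTF-8 and once as CP932.
# The default text is the word "nihongo" (Japanese) in kanji, given as \u escapes
# so that the literal is UTF-8 whatever the source encoding is.
def write_samples(dir = ".", text = "\u65E5\u672C\u8A9E")
  File.binwrite(File.join(dir, "utf8.txt"), text.encode("UTF-8"))
  File.binwrite(File.join(dir, "cp932.txt"), text.encode("Windows-31J"))
end

write_samples
puts File.binread("utf8.txt").bytesize   # 9 (3 bytes per kanji)
puts File.binread("cp932.txt").bytesize  # 6 (2 bytes per kanji)
```

The two byte counts differ because the same three characters are stored differently in each encoding.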

  • utf8.rb
# coding:utf-8
s = open("utf8.txt"){|fp| fp.read}
puts "utf8.txt = #{s.encoding}"

s = open("utf8.txt", "r:utf-8"){|fp| fp.read}
puts "utf8.txt r:utf-8 = #{s.encoding}"

s = open("utf8.txt", "r:utf-8:utf-8"){|fp| fp.read}
puts "utf8.txt r:utf-8:utf-8 = #{s.encoding}"

s = open("cp932.txt", "r:cp932"){|fp| fp.read}
puts "cp932.txt r:cp932 = #{s.encoding}"

s = open("cp932.txt", "r:cp932:utf-8"){|fp| fp.read}
puts "cp932.txt r:cp932:utf-8 = #{s.encoding}"

# Probably the simplest way to just read a UTF-8 file
s = IO.read("utf8.txt", :encoding => "utf-8")

# The output is as follows
# (with no encoding given, the string gets Encoding.default_external,
#  which is Windows-31J here; the magic comment does not affect file reads)
# utf8.txt = Windows-31J
# utf8.txt r:utf-8 = UTF-8
# utf8.txt r:utf-8:utf-8 = UTF-8
# cp932.txt r:cp932 = Windows-31J
# cp932.txt r:cp932:utf-8 = UTF-8
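
The difference between the two-part and three-part mode strings shows up in the bytes: `r:cp932` only labels what was read as CP932, while `r:cp932:utf-8` also converts it to UTF-8. A rough sketch of that, where `read_info` is a hypothetical helper and the sample text is my own (it assumes Encoding.default_internal is unset, which is the usual case):

```ruby
require "tmpdir"

# Hypothetical helper: read `path` with `mode` and report
# [encoding name, byte count] of the resulting string.
def read_info(path, mode)
  s = File.open(path, mode) { |fp| fp.read }
  [s.encoding.name, s.bytesize]
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "cp932.txt")
  # Three kanji stored as CP932: 2 bytes each, 6 bytes on disk.
  File.binwrite(path, "\u65E5\u672C\u8A9E".encode("Windows-31J"))

  p read_info(path, "r:cp932")        # ["Windows-31J", 6]  labelled only
  p read_info(path, "r:cp932:utf-8")  # ["UTF-8", 9]        converted to UTF-8
end
```
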
  • cp932.rb
# coding:cp932

s = open("cp932.txt"){|fp| fp.read}
puts "cp932.txt = #{s.encoding}"

s = open("cp932.txt", "r:cp932"){|fp| fp.read}
puts "cp932.txt r:cp932 = #{s.encoding}"

s = open("cp932.txt", "r:cp932:cp932"){|fp| fp.read}
puts "cp932.txt r:cp932:cp932 = #{s.encoding}"

s = open("utf8.txt", "r:utf-8"){|fp| fp.read}
puts "utf8.txt r:utf-8 = #{s.encoding}"

s = open("utf8.txt", "r:utf-8:cp932"){|fp| fp.read}
puts "utf8.txt r:utf-8:cp932 = #{s.encoding}"

# The output is as follows
# cp932.txt = Windows-31J
# cp932.txt r:cp932 = Windows-31J
# cp932.txt r:cp932:cp932 = Windows-31J
# utf8.txt r:utf-8 = UTF-8
# utf8.txt r:utf-8:cp932 = Windows-31J
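
When the external and internal encodings differ, as in `r:utf-8:cp932`, the read behaves like taking the raw bytes and then calling `String#encode`. A small sketch to check that equivalence, assuming hypothetical helpers `read_transcoded` and `read_then_encode` and a sample text of my own:

```ruby
require "tmpdir"

# Hypothetical helper: let the IO layer convert from ext to int while reading.
def read_transcoded(path, ext, int)
  File.open(path, "r:#{ext}:#{int}") { |fp| fp.read }
end

# Hypothetical helper: read raw bytes, label them as ext, then convert to int.
def read_then_encode(path, ext, int)
  File.binread(path).force_encoding(ext).encode(int)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "utf8.txt")
  File.binwrite(path, "\u65E5\u672C\u8A9E")  # UTF-8 bytes on disk

  a = read_transcoded(path, "utf-8", "cp932")
  b = read_then_encode(path, "utf-8", "cp932")
  puts a.encoding  # Windows-31J
  puts a == b      # true
end
```

The sample has no line breaks. On Windows, text-mode reads convert CRLF to LF, so for files with line breaks the two helpers can return different bytes.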

Reference URL: http://d.hatena.ne.jp/Gimite/20080101/1199199332