Character encodings
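It's a bit late for this, but my understanding of the topic was fuzzy, so here is a summary.
I'm a Windows user and go back and forth between SJIS and UTF-8 all the time, so I've limited this summary to just those two.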
First, create two files named utf8.txt and cp932.txt (assume the contents of each are in the encoding its file name says).
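If you don't have such files at hand, they can be generated from Ruby itself. This is only a minimal sketch: the sample text is an assumption (any string with Japanese characters would do), and the files are written in binary mode so that no newline conversion gets in the way on Windows.

```ruby
# coding: utf-8
# Hypothetical sample text; replace it with anything containing non-ASCII characters.
text = "日本語のテキスト\n"

# The string literal is already UTF-8, so its bytes can be written unchanged.
File.binwrite("utf8.txt", text)

# String#encode converts the bytes to CP932 (Ruby calls it Windows-31J).
File.binwrite("cp932.txt", text.encode("Windows-31J"))

utf8_size  = File.binread("utf8.txt").bytesize
cp932_size = File.binread("cp932.txt").bytesize
puts "utf8.txt: #{utf8_size} bytes, cp932.txt: #{cp932_size} bytes"
# prints "utf8.txt: 25 bytes, cp932.txt: 17 bytes"
```

The same eight characters take 3 bytes each in UTF-8 and 2 bytes each in CP932, which is a quick way to check that the two files really differ.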
- utf8.rb
# coding:utf-8

# No encoding given: the string gets the default external encoding
s = open("utf8.txt") { |fp| fp.read }
puts "utf8.txt = #{s.encoding}"

s = open("utf8.txt", "r:utf-8") { |fp| fp.read }
puts "utf8.txt r:utf-8 = #{s.encoding}"

s = open("utf8.txt", "r:utf-8:utf-8") { |fp| fp.read }
puts "utf8.txt r:utf-8:utf-8 = #{s.encoding}"

s = open("cp932.txt", "r:cp932") { |fp| fp.read }
puts "cp932.txt r:cp932 = #{s.encoding}"

# Read as CP932 and convert to UTF-8 while reading
s = open("cp932.txt", "r:cp932:utf-8") { |fp| fp.read }
puts "cp932.txt r:cp932:utf-8 = #{s.encoding}"

# If you just want to read a UTF-8 file, this is probably the simplest way
s = IO.read("utf8.txt", :encoding => "utf-8")

# The results are as follows (on Windows, where the default external encoding was Windows-31J)
# utf8.txt = Windows-31J
# utf8.txt r:utf-8 = UTF-8
# utf8.txt r:utf-8:utf-8 = UTF-8
# cp932.txt r:cp932 = Windows-31J
# cp932.txt r:cp932:utf-8 = UTF-8
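The mode string in these calls has the form "r:external:internal": the first encoding says what the file's bytes are, and the optional second one says what the resulting string should be converted to. The sketch below tries to make that difference visible by printing the actual bytes. It writes its own temporary file, so it does not depend on the files above, and it assumes Encoding.default_internal has not been set.

```ruby
# coding: utf-8
require "tmpdir"

tagged = converted = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "cp932.txt")
  # Write known CP932 bytes so the example is self-contained.
  File.binwrite(path, "あ".encode("Windows-31J"))

  # External encoding only: the bytes are left as they are and just labelled.
  # (Assumes Encoding.default_internal is nil, which is the default.)
  tagged = File.open(path, "r:cp932") { |fp| fp.read }

  # External and internal: the bytes are converted from CP932 to UTF-8 while reading.
  converted = File.open(path, "r:cp932:utf-8") { |fp| fp.read }
end

puts "#{tagged.encoding} #{tagged.bytes.inspect}"       # prints "Windows-31J [130, 160]"
puts "#{converted.encoding} #{converted.bytes.inspect}" # prints "UTF-8 [227, 129, 130]"
```

So "r:cp932" alone never changes the data, while "r:cp932:utf-8" hands back a string that is safe to mix with other UTF-8 strings.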
- cp932.rb
# coding:cp932

# No encoding given: the string gets the default external encoding
s = open("cp932.txt") { |fp| fp.read }
puts "cp932.txt = #{s.encoding}"

s = open("cp932.txt", "r:cp932") { |fp| fp.read }
puts "cp932.txt r:cp932 = #{s.encoding}"

s = open("cp932.txt", "r:cp932:cp932") { |fp| fp.read }
puts "cp932.txt r:cp932:cp932 = #{s.encoding}"

s = open("utf8.txt", "r:utf-8") { |fp| fp.read }
puts "utf8.txt r:utf-8 = #{s.encoding}"

# Read as UTF-8 and convert to CP932 while reading
s = open("utf8.txt", "r:utf-8:cp932") { |fp| fp.read }
puts "utf8.txt r:utf-8:cp932 = #{s.encoding}"

# The results are as follows
# cp932.txt = Windows-31J
# cp932.txt r:cp932 = Windows-31J
# cp932.txt r:cp932:cp932 = Windows-31J
# utf8.txt r:utf-8 = UTF-8
# utf8.txt r:utf-8:cp932 = Windows-31J
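One thing the results suggest is that the first line of each script does not follow the "# coding:" magic comment: utf8.rb also reports Windows-31J when no encoding is given. The magic comment only sets the encoding of the source file itself, while a file opened without an encoding is tagged with Encoding.default_external. The following is a rough sketch of that behaviour; it switches the default external encoding at run time purely for demonstration (not something I'd do in real code) and restores it afterwards.

```ruby
require "tmpdir"

original = Encoding.default_external
results = {}

begin
  Dir.mktmpdir do |dir|
    path = File.join(dir, "sample.txt")
    File.binwrite(path, "abc")

    [Encoding::Windows_31J, Encoding::UTF_8].each do |enc|
      # Demonstration only: pretend the environment's default is `enc`.
      Encoding.default_external = enc
      s = File.open(path) { |fp| fp.read }
      results[enc] = s.encoding
    end
  end
ensure
  # Put the real default back so the rest of the program is unaffected.
  Encoding.default_external = original
end

results.each do |default, actual|
  puts "default_external=#{default} -> #{actual}"
end
```

In other words, passing the encoding explicitly ("r:utf-8", or :encoding => "utf-8") is what makes the result independent of the machine the script runs on.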