html_unicoder
html_unicoder
Convert html page to utf-8 for Crystal language.
Features
- Encoding name parsed from http headers
- Encoding name parsed from page meta tag
- Encoding name normalized to be used in internal Crystal decoder.
- Correctly handle many edge cases
- Convert page as IO to IO
- Result page should be safe utf-8 to use in Crystal
Installation
Add this to your application's shard.yml
:
dependencies:
html_unicoder:
github: kostya/html_unicoder
Usage
require "html_unicoder"
# basic usage, encoding only from meta tag, or use UTF-8//ignore, by default
page = HtmlUnicoder.new(page, default_encoding: "UTF-8").to_s
# use headers Array(String)
page = HtmlUnicoder.new(page, headers: ["Content-type: text/html; charset=Windows-1251"]).to_s
# use custom encoding
page = HtmlUnicoder.new(page, encoding: "CP1251").to_s
# set use default encoding
HtmlUnicoder.default_encoding = "CP1251"
page = HtmlUnicoder.new(page).to_s
# with http client
require "http/client"
page = ""
HTTP::Client.get("http://www.example.com") do |response|
page = HtmlUnicoder.new(response.body_io, response.headers).to_s
end
# io -> io
io = HtmlUnicoder.new(io).io
# streaming http client body_io
# steps to find encoding
# 1) finding encoding in headers
# 2) finding encoding in response.body meta tag
# 3) consider io as CP1251
body_io = HtmlUnicoder.new(response.body_io, response.headers, default_encoding: "CP1251").io
Repository
html_unicoder
Owner
Statistic
- 2
- 0
- 0
- 0
- 1
- about 6 years ago
- September 9, 2016
License
MIT License
Links
Synced at
Thu, 07 Nov 2024 02:09:32 GMT
Languages