I parse my request with Cheerio like this:

```javascript
var request = require("request");
var cheerio = require("cheerio");

var url = "http://shop.nag.ru/catalog/16939.IP-videonablyudenie-OMNY/16944.IP-kamery-OMNY-c-vario-obektivom/16704.OMNY-1000-PRO";

request.get(url, function (err, response, body) {
  if (err) throw err;
  console.log(body);
  var $ = cheerio.load(body);
  console.log($(".description").html());
});
```
The output contains the content, but in a strange, unreadable encoding:
```
// Plain body, console.log(body) (Russian characters display correctly):
<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше</span></h1><p style

// Cheerio's output, console.log($(".description").html()):
<h1><span style="font-size: 16px;">&#x423;&#x43B;&#x438;&#x447;&#x43D;&#x430;&#x44F; 3&#x41C;&#x43F; IP HD &#x43A;&#x430;&#x43C;&#x435;&#x440;&#x430; OMNY
```
The target page is encoded in UTF-8. So why does Cheerio break my encoding?
I tried using iconv to decode the response body:

```javascript
var body1 = iconv.decode(body, "utf-8");
```

but `console.log($(".description").html());` still returns the same weird text.
Answer
Cheerio hasn't broken anything. Any browser renders the HTML it outputs exactly the same as the input HTML. Take a look at this snippet:
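```html
<h1><span style="font-size: 16px;">&#x423;&#x43B;&#x438;&#x447;&#x43D;&#x430;&#x44F; 3&#x41C;&#x43F; IP HD &#x43A;&#x430;&#x43C;&#x435;&#x440;&#x430; OMNY - &#x43F;&#x43E;&#x43F;&#x440;&#x43E;&#x431;&#x443;&#x439;&#x442;&#x435; &#x43D;&#x430;&#x439;&#x442;&#x438; &#x43B;&#x443;&#x447;&#x448;&#x435;</span></h1>
<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше</span></h1>
```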
It's merely the case that `&#x423;` is the HTML "entity" (a numeric character reference) for the character `У` (Unicode code point U+0423), in the same way that the entity `&gt;` represents `>`.
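To make the entity point concrete, here is a minimal sketch in plain Node.js (no Cheerio needed). The helper `escapeNonAscii` is hypothetical: it only approximates the escaping seen in the question's output and is not Cheerio's actual serializer. It shows that each `&#xHHHH;` reference is simply the character's Unicode code point written in hexadecimal, so nothing about the underlying text has been lost.

```javascript
// Hypothetical helper (not part of Cheerio): escapes every non-ASCII
// character as a hexadecimal numeric character reference, approximating
// the output shown in the question.
function escapeNonAscii(text) {
  let out = "";
  // for...of iterates by code point, so characters outside the BMP stay intact
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    out += cp > 0x7f ? "&#x" + cp.toString(16).toUpperCase() + ";" : ch;
  }
  return out;
}

const escaped = escapeNonAscii("Уличная 3Мп");
console.log(escaped);
// &#x423;&#x43B;&#x438;&#x447;&#x43D;&#x430;&#x44F; 3&#x41C;&#x43F;

// Each reference is just a code point written in hex:
console.log(String.fromCodePoint(0x423)); // У
console.log("У".codePointAt(0).toString(16)); // 423
```

ASCII characters such as `3`, spaces and `IP HD` pass through untouched, which matches the mix of plain text and references in Cheerio's output.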
However, if you want the unencoded text, you can set the `decodeEntities` option to `false`:
```javascript
const cheerio = require('cheerio');

const $ = cheerio.load(
  `<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше</span></h1>`,
  { decodeEntities: false }
);
console.log($('span').html());
// => Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше
```
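If you already have entity-escaped HTML (for example, output captured before changing the load options), a minimal sketch of decoding it after the fact follows. The helper name `decodeNonAsciiReferences` is hypothetical and not a Cheerio API; it assumes only numeric references (`&#xHEX;` or `&#DEC;`) need decoding and deliberately leaves named entities like `&lt;` alone.

```javascript
// Hypothetical post-processing helper (not a Cheerio API): turns numeric
// character references such as &#x423; or &#1059; back into characters.
// References to ASCII code points (e.g. &#x3C; for "<") stay escaped,
// so the markup remains well-formed.
function decodeNonAsciiReferences(html) {
  return html.replace(/&#(?:x([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const cp = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Keep ASCII references and out-of-range code points untouched
    if (cp <= 0x7f || cp > 0x10ffff) return match;
    return String.fromCodePoint(cp);
  });
}

const escaped =
  '<span style="font-size: 16px;">&#x423;&#x43B;&#x438;&#x447;&#x43D;&#x430;&#x44F; 3&#x41C;&#x43F; &lt;&#x3C;</span>';
console.log(decodeNonAsciiReferences(escaped));
// <span style="font-size: 16px;">Уличная 3Мп &lt;&#x3C;</span>
```

Stopping at 0x7F is the key design choice: decoding `&#x3C;` into a literal `<` would change how the HTML parses, whereas turning `&#x423;` into `У` changes only how the source looks, not how it renders.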