YQL html extracted output is garbled
reedom
post Aug 20 2009, 07:23 PM
Post #1
Group: Members
Posts: 1



The YQL html extractor does not seem to honor the encoding declared in the target document.

Here is a query against a blog site written in the 'euc-jp' encoding:
CODE
select * from html 
where url="http://blog.livedoor.jp/dankogai/"
and xpath='//div[contains(concat(" ", @class, " "), " plugin-monthly ")]//a'

The result document contains garbled characters.
Could you fix this?
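
For anyone unsure what "garbled" means here, this is a minimal Python sketch of one common cause: bytes from an EUC-JP page decoded with the wrong charset. The sample text is made up, and using Latin-1 as the wrong charset is only an assumption for illustration. The thread does not say what YQL actually falls back to.

```python
# Illustration only: the sample text is invented, and Latin-1 is an assumed
# "wrong" charset. This does not reproduce what YQL does internally.
text = "アーカイブ"  # stand-in for Japanese link text on the page

# The bytes an EUC-JP encoded page would serve for that text.
raw = text.encode("euc_jp")

# Decoding with the wrong charset succeeds but yields garbled characters.
garbled = raw.decode("latin-1")

# Decoding with the charset the page was written in restores the text.
correct = raw.decode("euc_jp")

print(repr(garbled))
print(correct)
```

The garbled string has one character per byte, so it is twice as long as the original five-character text.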
Nagesh Susarla
post Aug 24 2009, 12:44 PM
Post #2
Group: Yahoos
Posts: 79



Hi,

You can use the 'charset' parameter available on the html table to pass in the correct encoding, and the query should then work as desired.

CODE
select * from html
where url="http://blog.livedoor.jp/dankogai/"
and xpath='//div[contains(concat(" ", @class, " "), " plugin-monthly ")]//a'
and charset='euc-jp'


-- Nagesh
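
If the statement is assembled in a script, the charset clause can be appended conditionally. The following is a rough Python sketch, not an official client: `build_html_query` is a hypothetical helper, its quoting is naive, and passing the statement as a `q` request parameter is an assumption about how it would be sent. No request is made.

```python
from urllib.parse import urlencode


def build_html_query(url, xpath, charset=None):
    """Build a YQL statement against the html table (hypothetical helper)."""
    # Naive quoting: assumes url contains no double quotes and that
    # xpath and charset contain no single quotes.
    query = 'select * from html where url="{}" and xpath=\'{}\''.format(url, xpath)
    if charset:
        # Append the charset clause suggested in the reply above.
        query += " and charset='{}'".format(charset)
    return query


query = build_html_query(
    "http://blog.livedoor.jp/dankogai/",
    '//div[contains(concat(" ", @class, " "), " plugin-monthly ")]//a',
    charset="euc-jp",
)

# URL-encode the statement; the parameter name q is an assumption.
params = urlencode({"q": query})

print(query)
print(params)
```

Leaving `charset` as `None` produces the original statement without the extra clause.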

YDN Content Copyright © 2010 Yahoo! Inc. All rights reserved. Copyright | Privacy Policy
