Python: lxml xpath to extract content
The code below is able to extract the P/E from the first Reuters link below. However, the method is not robust: the webpage for another stock has two fewer lines, which shifts the data. How can I handle this problem? I would like to point straight at the P/E part to extract the data, but I do not know how to do it.
Link 1: http://www.reuters.com/finance/stocks/financialhighlights?symbol=myeg.kl
Link 2: http://www.reuters.com/finance/stocks/financialhighlights?symbol=annj.kl
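import requests
from lxml import html

page2 = requests.get('http://www.reuters.com/finance/stocks/financialhighlights?symbol=myeg.kl')
treea = html.fromstring(page2.content)
# Text of every td that has a class attribute; the P/E value is only at
# index 37 when the page has exactly the expected number of rows.
tree4 = treea.xpath('//td[@class]/text()')
pe = tree4[37]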
This is the part I wish the code could extract directly, so that changes elsewhere on the webpage do not affect it.
<tr class="stripe">
    <td>p/e ratio (ttm)</td>
    <td class="data">36.79</td>
    <td class="data">25.99</td>
    <td class="data">21.70</td>
</tr>
Use the label text to find the first td, then extract its sibling td's:
treea.xpath('//td[contains(.,"p/e ratio")]/following-sibling::td/text()')
That will work regardless of where the row sits on the page:
In [8]: page2 = requests.get('http://www.reuters.com/finance/stocks/financialhighlights?symbol=myeg.kl')
In [9]: treea = html.fromstring(page2.content)
In [10]: tree4 = treea.xpath('//td[contains(.,"p/e ratio")]/following-sibling::td/text()')
In [11]: print(tree4)
['36.79', '25.99', '21.41']

In [12]: page2 = requests.get('http://www.reuters.com/finance/stocks/financialhighlights?symbol=annj.kl')
In [13]: treea = html.fromstring(page2.content)
In [14]: tree4 = treea.xpath('//td[contains(.,"p/e ratio")]/following-sibling::td/text()')
In [15]: print(tree4)
['--', '25.49', '17.30']
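The same idea can be wrapped in a small helper. What follows is only a sketch, run against an inline copy of the row from the question so that it works without network access. The names SAMPLE_HTML, row_values and to_float are hypothetical, the extra "example row" is made up to show that the row's position no longer matters, and treating '--' as a missing value is an assumption based on the second output above. The label is passed as an XPath variable ($label), which lxml accepts as a keyword argument to xpath().

```python
from lxml import html

# Inline stand-in for the page: the P/E row from the question, preceded by
# a made-up row (hypothetical) so the target is not the first row.
SAMPLE_HTML = """
<html><body><table>
  <tr><td>example row</td><td class="data">--</td></tr>
  <tr class="stripe">
    <td>p/e ratio (ttm)</td>
    <td class="data">36.79</td>
    <td class="data">25.99</td>
    <td class="data">21.70</td>
  </tr>
</table></body></html>
"""


def row_values(page_source, label):
    """Return the text of the td cells that follow the td containing label."""
    tree = html.fromstring(page_source)
    cells = tree.xpath(
        '//td[contains(., $label)]/following-sibling::td/text()',
        label=label,  # bound to $label in the expression
    )
    return [cell.strip() for cell in cells]


def to_float(cell):
    """Convert a cell to float; return None for non-numeric text such as '--'."""
    try:
        return float(cell)
    except ValueError:
        return None


values = row_values(SAMPLE_HTML, "p/e ratio")
print(values)
print([to_float(v) for v in values])
```

Note that contains() is case-sensitive, so the label passed in has to match the capitalisation used in the page source.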