I'm trying to extract the contents of the elements with the class name 'st'. How do I extract all of the contents, including the text inside the 'em' tags and the text after them?

I tried the following, with these results:

Trial 1:

    from parsel import Selector
    from selenium import webdriver

    options = webdriver.ChromeOptions()  # the original options setup was not shown
    driver = webdriver.Chrome(options=options)
    sel = Selector(text=driver.page_source)
    sel.xpath("//*[@class='st']").extract()

Output 1:

    >> <span class="st"><span class="f">Nov 26, 2018 - </span>First #<em>GDPR fine</em> awarded in Germany. 330,000 user data stolen. Usernames and passwords stored in plaintext. €20,000 <em>fine</em>. Why "so low"?</span>

Trial 2:

    driver = webdriver.Chrome(options=options)
    sel = Selector(text=driver.page_source)
    sel.xpath("//*[@class='st']/text()").extract()

Output 2:

    >> First #

Ideally, the output I want is:

    >> Nov 26, 2018 - First #GDPR fine awarded in Germany. 330,000 user data stolen. Usernames and passwords stored in plaintext. €20,000 fine. Why "so low"?


I don't know Parsel, but have you tried something like //*[@class='st']::text, or a CSS selector such as span.st::text? See the docs.

April 25, 2019

@JeffC Those expressions throw errors. So far I've tried sel.xpath("string(//span[@class='st'])").getall() (see link). This extracts the full text, but only for the first matching element; ideally I want a list containing the full text of every element with that class name on the page.

April 25, 2019