Scraping LinkedIn public profiles with Selenium and PhantomJS in Python
I am trying to get the body of https://www.linkedin.com/pub/dir/paul.
Locally, on Ubuntu 14.04 (x86_64 GNU/Linux) with PhantomJS 2.1.1, Selenium 2.53, and Python 2.7, the script works fine.
When I run the script on a server running Debian Jessie (x86_64 GNU/Linux), with the same versions of PhantomJS, Selenium, and Python as on my local machine, there is nothing under the body tag of the document.
This is the code of main.py:
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    import sys


    def initial(nom):
        # Browse to the LinkedIn home page
        url = "https://www.linkedin.com"
        browser = webdriver.PhantomJS(service_args=['--ssl-protocol=any',
                                                    '--load-images=no',
                                                    '--ignore-ssl-errors=true'])
        browser.get(url)

        # Set the first name input value
        inpt = WebDriverWait(browser, 10).until(
            EC.visibility_of_element_located((By.NAME, "first"))
        )
        inpt.send_keys(nom)

        # Submit the form
        submit = WebDriverWait(browser, 10).until(
            EC.element_to_be_clickable((By.NAME, "search"))
        )
        submit.click()

        # Wait for the result page, then print its URL
        WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "professionals"))
        )
        print(browser.current_url)


    def main(args):
        nom = args[1]
        initial(nom)


    if __name__ == '__main__':
        main(sys.argv)
This is the output of the request when running on the server:
    <html><head>
    <script type="text/javascript">
    window.onload = function() {
      // Parse the tracking code from cookies.
      var trk = "sentinel_org_block";
      var cookies = document.cookie.split("; ");
      for (var i = 0; i < cookies.length; ++i) {
        if ((cookies[i].indexOf("trkcode=") == 0) && (cookies[i].length > 8)) {
          trk = cookies[i].substring(8);
        }
      }

      // Protocol for the redirect URL.
      var protocol = "http:";
      if (window.location.protocol == "https:") {
        protocol = "https:";
      } else {
        // If the "sl" cookie is set, redirect to https.
        for (var i = 0; i < cookies.length; ++i) {
          if ((cookies[i].indexOf("sl=") == 0) && (cookies[i].length > 3)) {
            window.location.href = "https:" + window.location.href.substring(window.location.protocol.length);
            return;
          }
        }
      }

      // Get the new domain. For touch.www.linkedin.com or tablet.www.linkedin.com,
      // strip "touch." and "tablet.". For international domains such as
      // fr.linkedin.com, convert to www.linkedin.com.
      var domain = location.host;
      if (domain.substr(0, 6) == "touch.") {
        domain = domain.substr(6);
      } else if (domain.substr(0, 7) == "tablet.") {
        domain = domain.substr(7);
      } else if (domain.charAt(2) == ".") {
        domain = "www" + domain.substr(2);
      }

      window.location.href = "https://" + domain + "/uas/login?trk=" + trk + "&session_redirect=" + encodeURIComponent(protocol + "//" + domain + window.location.href.substr(window.location.href.search(window.location.host) + window.location.host.length));
    }
    </script>
    </head><body>
    </body></html>
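The page returned on the server is not the search result but a stub whose onload handler sends the browser to the login page. To make it easier to see where that handler points, here is a rough Python 3 sketch of the URL it appears to compute. The function names normalize_domain and login_redirect are hypothetical, and urllib's quote() with the safe set below is only an approximation of JavaScript's encodeURIComponent.

```python
from urllib.parse import quote


def normalize_domain(host):
    """Mirror the page script's domain handling (hypothetical helper name)."""
    if host.startswith("touch."):
        return host[6:]
    if host.startswith("tablet."):
        return host[7:]
    # International hosts such as fr.linkedin.com have a dot at index 2.
    if len(host) > 2 and host[2] == ".":
        return "www" + host[2:]
    return host


def login_redirect(protocol, host, path, trk="sentinel_org_block"):
    """Build the /uas/login URL the onload handler would navigate to.

    protocol is "http:" or "https:"; path is everything after the host.
    The default trk is the value used when no tracking cookie is present.
    """
    domain = normalize_domain(host)
    target = protocol + "//" + domain + path
    # Assumption: this safe set approximates encodeURIComponent's behaviour.
    encoded = quote(target, safe="!~*'()")
    return "https://" + domain + "/uas/login?trk=" + trk + "&session_redirect=" + encoded


if __name__ == "__main__":
    print(login_redirect("https:", "www.linkedin.com", "/pub/dir/paul"))
```

If the URL reported by the browser after loading the page contains /uas/login, the session was redirected to the login form rather than served the directory page.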
This is the log output when I run the script on the server:
    [INFO - 2016-09-06T12:57:11.706Z] GhostDriver - Main - running on port 39646
    [INFO - 2016-09-06T12:57:12.467Z] Session [6dbb4db0-7431-11e6-85da-e9ae670edee0] - _decorateNewWindow - page.settings: {"XSSAuditingEnabled":false,"javascriptCanCloseWindows":true,"javascriptCanOpenWindows":true,"javascriptEnabled":true,"loadImages":false,"localToRemoteUrlAccessEnabled":false,"userAgent":"Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) PhantomJS/1.9.2 Safari/534.34","webSecurityEnabled":true}
    [INFO - 2016-09-06T12:57:12.467Z] Session [6dbb4db0-7431-11e6-85da-e9ae670edee0] - page.customHeaders: - {}
    [INFO - 2016-09-06T12:57:12.467Z] Session [6dbb4db0-7431-11e6-85da-e9ae670edee0] - CONSTRUCTOR - Desired Capabilities: {"platform":"ANY","browserName":"phantomjs","version":"","javascriptEnabled":true}
    [INFO - 2016-09-06T12:57:12.467Z] Session [6dbb4db0-7431-11e6-85da-e9ae670edee0] - CONSTRUCTOR - Negotiated Capabilities: {"browserName":"phantomjs","version":"1.9.2","driverName":"ghostdriver","driverVersion":"1.0.4","platform":"linux-unknown-64bit","javascriptEnabled":true,"takesScreenshot":true,"handlesAlerts":false,"databaseEnabled":false,"locationContextEnabled":false,"applicationCacheEnabled":false,"browserConnectionEnabled":false,"cssSelectorsEnabled":true,"webStorageEnabled":false,"rotatable":false,"acceptSslCerts":false,"nativeEvents":true,"proxy":{"proxyType":"direct"}}
    [INFO - 2016-09-06T12:57:12.468Z] SessionManagerReqHand - _postNewSessionCommand - New Session Created: 6dbb4db0-7431-11e6-85da-e9ae670edee0
    [ERROR - 2016-09-06T12:57:13.436Z] Session [6dbb4db0-7431-11e6-85da-e9ae670edee0] - page @ 'https://www.linkedin.com/' - console error (msg): TypeError: 'undefined' is not a function (evaluating 'a.bind(a,!1)')
    [ERROR - 2016-09-06T12:57:13.437Z] Session [6dbb4db0-7431-11e6-85da-e9ae670edee0] - page @ 'https://www.linkedin.com/' - console error (stack): [ { "file": "https://static.licdn.com/scds/concat/common/js?h=69w33ou4umkyupw2uqgn7za7w", "line": 2, "function": "" } ]
    [ERROR - 2016-09-06T12:57:13.941Z] WebElementLocator - _handleLocateCommand - Element(s) NOT Found: GAVE UP. Search Stop Time: 1473166633906
    [ERROR - 2016-09-06T12:57:14.545Z] WebElementLocator - _handleLocateCommand - Element(s) NOT Found: GAVE UP. Search Stop Time: 1473166634498
    [ERROR - 2016-09-06T12:57:15.280Z] WebElementLocator - _handleLocateCommand - Element(s) NOT Found: GAVE UP. Search Stop Time: 1473166635239
    [ERROR - 2016-09-06T12:57:15.998Z] WebElementLocator - _handleLocateCommand - Element(s) NOT Found: GAVE UP. Search Stop Time: 1473166635959
    [ERROR - 2016-09-06T12:57:16.706Z] WebElementLocator - _handleLocateCommand - Element(s) NOT Found: GAVE UP. Search Stop Time: 1473166636661
    [ERROR - 2016-09-06T12:57:17.406Z] WebElementLocator - _handleLocateCommand - Element(s) NOT Found: GAVE UP. Search Stop Time: 1473166637366
    [ERROR - 2016-09-06T12:57:18.104Z] WebElementLocator - _handleLocateCommand - Element(s) NOT Found: GAVE UP. Search Stop Time: 1473166638067
    [ERROR - 2016-09-06T12:57:18.826Z] WebElementLocator - _handleLocateCommand - Element(s) NOT Found: GAVE UP. Search Stop Time: 1473166638783
    [ERROR - 2016-09-06T12:57:19.529Z] WebElementLocator - _handleLocateCommand - Element(s) NOT Found: GAVE UP. Search Stop Time: 1473166639491
    [ERROR - 2016-09-06T12:57:20.224Z] WebElementLocator - _handleLocateCommand - Element(s) NOT Found: GAVE UP. Search Stop Time: 1473166640189
    [ERROR - 2016-09-06T12:57:20.936Z] WebElementLocator - _handleLocateCommand - Element(s) NOT Found: GAVE UP. Search Stop Time: 1473166640888
    [ERROR - 2016-09-06T12:57:21.635Z] WebElementLocator - _handleLocateCommand - Element(s) NOT Found: GAVE UP. Search Stop Time: 1473166641597
    [ERROR - 2016-09-06T12:57:22.344Z] WebElementLocator - _handleLocateCommand - Element(s) NOT Found: GAVE UP. Search Stop Time: 1473166642298
    [ERROR - 2016-09-06T12:57:23.039Z] WebElementLocator - _handleLocateCommand - Element(s) NOT Found: GAVE UP. Search Stop Time: 1473166643004
    [ERROR - 2016-09-06T12:57:23.744Z] WebElementLocator - _handleLocateCommand - Element(s) NOT Found: GAVE UP. Search Stop Time: 1473166643702