Scraping LinkedIn public profile with Selenium and PhantomJS in Python


I am trying to get the body of https://www.linkedin.com/pub/dir/paul.
On my local machine (Ubuntu 14.04, x86_64 GNU/Linux) with PhantomJS 2.1.1, Selenium 2.53 and Python 2.7, the script works fine.
When I send the script to the server (Debian Jessie, x86_64 GNU/Linux, with the same versions of PhantomJS, Selenium and Python as on the local machine), there is nothing under the body tag of the document.

This is the code of main.py:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    import sys


    def initial(nom):
        # browse LinkedIn url
        url = "https://www.linkedin.com"
        browser = webdriver.PhantomJS(service_args=['--ssl-protocol=any', '--load-images=no', '--ignore-ssl-errors=true'])
        browser.get(url)

        # set first name input value
        inpt = WebDriverWait(browser, 10).until(
            EC.visibility_of_element_located((By.NAME, "first"))
        )
        inpt.send_keys(nom)

        # submit form
        submit = WebDriverWait(browser, 10).until(
            EC.element_to_be_clickable((By.NAME, "search"))
        )
        submit.click()

        # print result content page
        WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "professionals"))
        )
        print(browser.current_url)


    def main(args):
        nom = args[1]
        initial(nom)


    if __name__ == '__main__':
        main(sys.argv)


This is the page source returned when running on the server:

    <html><head>
    <script type="text/javascript">
    window.onload = function() {
      // parse tracking code cookies.
      var trk = "sentinel_org_block";
      var cookies = document.cookie.split("; ");
      for (var i = 0; i < cookies.length; ++i) {
        if ((cookies[i].indexOf("trkcode=") == 0) && (cookies[i].length > 8)) {
          trk = cookies[i].substring(8);
        }
      }

      // protocol redirect url.
      var protocol = "http:";
      if (window.location.protocol == "https:") {
        protocol = "https:";
      } else {
        // if "sl" cookie set, redirect https.
        for (var i = 0; i < cookies.length; ++i) {
          if ((cookies[i].indexOf("sl=") == 0) && (cookies[i].length > 3)) {
            window.location.href = "https:" + window.location.href.substring(window.location.protocol.length);
            return;
          }
        }
      }

      // new domain. touch.www.linkedin.com or tablet.www.linkedin.com
      // strip "touch." , "tablet.". international domains such
      // fr.linkedin.com, convert www.linkedin.com
      var domain = location.host;
      if (domain.substr(0, 6) == "touch.") {
        domain = domain.substr(6);
      } else if (domain.substr(0, 7) == "tablet.") {
        domain = domain.substr(7);
      } else if (domain.charAt(2) == ".") {
        domain = "www" + domain.substr(2);
      }

      window.location.href = "https://" + domain + "/uas/login?trk=" + trk + "&session_redirect=" +
          encodeURIComponent(protocol + "//" + domain +
          window.location.href.substr(window.location.href.search(window.location.host) +
                                      window.location.host.length));
    }
    </script>
    </head><body> </body></html>
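The page above has an empty body and a script that redirects to `/uas/login`. A minimal sketch of how a script might recognise such a response in `browser.page_source` before waiting for form elements is shown here. The helper name `is_login_redirect` and the shortened `sample` string are hypothetical, and the check is only a heuristic based on the output quoted in this post, not a documented LinkedIn behaviour.

```python
import re


def is_login_redirect(page_source):
    """Heuristic: True if the HTML has an empty <body> and mentions /uas/login."""
    body = re.search(r"<body[^>]*>(.*?)</body>", page_source,
                     re.IGNORECASE | re.DOTALL)
    body_is_empty = body is not None and body.group(1).strip() == ""
    return body_is_empty and "/uas/login" in page_source


# Shortened, hypothetical version of the page source quoted above.
sample = ('<html><head><script type="text/javascript">'
          'window.location.href = "https://" + domain + "/uas/login?trk=" + trk;'
          '</script></head><body> </body></html>')

print(is_login_redirect(sample))
print(is_login_redirect("<html><body><p>results</p></body></html>"))
```

Calling it right after `browser.get(url)` would let the script fail fast with a clear message instead of timing out in `WebDriverWait`.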


This is the log output when the script runs on the server:

    [info  - 2016-09-06t12:57:11.706z] ghostdriver - main - running on port 39646
    [info  - 2016-09-06t12:57:12.467z] session [6dbb4db0-7431-11e6-85da-e9ae670edee0] - _decoratenewwindow - page.settings: {"xssauditingenabled":false,"javascriptcanclosewindows":true,"javascriptcanopenwindows":true,"javascriptenabled":true,"loadimages":false,"localtoremoteurlaccessenabled":false,"useragent":"mozilla/5.0 (unknown; linux x86_64) applewebkit/534.34 (khtml, gecko) phantomjs/1.9.2 safari/534.34","websecurityenabled":true}
    [info  - 2016-09-06t12:57:12.467z] session [6dbb4db0-7431-11e6-85da-e9ae670edee0] - page.customheaders:  - {}
    [info  - 2016-09-06t12:57:12.467z] session [6dbb4db0-7431-11e6-85da-e9ae670edee0] - constructor - desired capabilities: {"platform":"any","browsername":"phantomjs","version":"","javascriptenabled":true}
    [info  - 2016-09-06t12:57:12.467z] session [6dbb4db0-7431-11e6-85da-e9ae670edee0] - constructor - negotiated capabilities: {"browsername":"phantomjs","version":"1.9.2","drivername":"ghostdriver","driverversion":"1.0.4","platform":"linux-unknown-64bit","javascriptenabled":true,"takesscreenshot":true,"handlesalerts":false,"databaseenabled":false,"locationcontextenabled":false,"applicationcacheenabled":false,"browserconnectionenabled":false,"cssselectorsenabled":true,"webstorageenabled":false,"rotatable":false,"acceptsslcerts":false,"nativeevents":true,"proxy":{"proxytype":"direct"}}
    [info  - 2016-09-06t12:57:12.468z] sessionmanagerreqhand - _postnewsessioncommand - new session created: 6dbb4db0-7431-11e6-85da-e9ae670edee0
    [error - 2016-09-06t12:57:13.436z] session [6dbb4db0-7431-11e6-85da-e9ae670edee0] - page @ 'https://www.linkedin.com/' - console error (msg): typeerror: 'undefined' not function (evaluating 'a.bind(a,!1)')
    [error - 2016-09-06t12:57:13.437z] session [6dbb4db0-7431-11e6-85da-e9ae670edee0] - page @ 'https://www.linkedin.com/' - console error (stack): [
      {
        "file": "https://static.licdn.com/scds/concat/common/js?h=69w33ou4umkyupw2uqgn7za7w",
        "line": 2,
        "function": ""
      }
    ]
    [error - 2016-09-06t12:57:13.941z] webelementlocator - _handlelocatecommand - element(s) not found: gave up. search stop time: 1473166633906
    [error - 2016-09-06t12:57:14.545z] webelementlocator - _handlelocatecommand - element(s) not found: gave up. search stop time: 1473166634498
    [error - 2016-09-06t12:57:15.280z] webelementlocator - _handlelocatecommand - element(s) not found: gave up. search stop time: 1473166635239
    [error - 2016-09-06t12:57:15.998z] webelementlocator - _handlelocatecommand - element(s) not found: gave up. search stop time: 1473166635959
    [error - 2016-09-06t12:57:16.706z] webelementlocator - _handlelocatecommand - element(s) not found: gave up. search stop time: 1473166636661
    [error - 2016-09-06t12:57:17.406z] webelementlocator - _handlelocatecommand - element(s) not found: gave up. search stop time: 1473166637366
    [error - 2016-09-06t12:57:18.104z] webelementlocator - _handlelocatecommand - element(s) not found: gave up. search stop time: 1473166638067
    [error - 2016-09-06t12:57:18.826z] webelementlocator - _handlelocatecommand - element(s) not found: gave up. search stop time: 1473166638783
    [error - 2016-09-06t12:57:19.529z] webelementlocator - _handlelocatecommand - element(s) not found: gave up. search stop time: 1473166639491
    [error - 2016-09-06t12:57:20.224z] webelementlocator - _handlelocatecommand - element(s) not found: gave up. search stop time: 1473166640189
    [error - 2016-09-06t12:57:20.936z] webelementlocator - _handlelocatecommand - element(s) not found: gave up. search stop time: 1473166640888
    [error - 2016-09-06t12:57:21.635z] webelementlocator - _handlelocatecommand - element(s) not found: gave up. search stop time: 1473166641597
    [error - 2016-09-06t12:57:22.344z] webelementlocator - _handlelocatecommand - element(s) not found: gave up. search stop time: 1473166642298
    [error - 2016-09-06t12:57:23.039z] webelementlocator - _handlelocatecommand - element(s) not found: gave up. search stop time: 1473166643004
    [error - 2016-09-06t12:57:23.744z] webelementlocator - _handlelocatecommand - element(s) not found: gave up. search stop time: 1473166643702
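In the log above, the negotiated capabilities report version 1.9.2, while the post states PhantomJS 2.1.1 on both machines. A small helper along these lines might make it easy to pull that field out of a GhostDriver log and compare the two environments. The function name `negotiated_version` and the shortened `sample` entry are hypothetical; the sketch assumes the capabilities are logged as a single JSON object right after the text "negotiated capabilities:", as they are in this log.

```python
import json
import re

# Marker text as it appears in the log above (matched case-insensitively).
_MARKER = re.compile(r"negotiated capabilities:\s*", re.IGNORECASE)


def negotiated_version(log_text):
    """Return the 'version' of the first negotiated-capabilities entry, or None."""
    match = _MARKER.search(log_text)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value and ignores whatever follows it.
        caps, _ = json.JSONDecoder().raw_decode(log_text, match.end())
    except ValueError:
        return None
    if not isinstance(caps, dict):
        return None
    return caps.get("version")


# Shortened, hypothetical copy of the relevant log entry.
sample = ('[info  - 2016-09-06t12:57:12.467z] session '
          '[6dbb4db0-7431-11e6-85da-e9ae670edee0] - constructor - '
          'negotiated capabilities: '
          '{"browsername":"phantomjs","version":"1.9.2","drivername":"ghostdriver"} '
          '[info  - 2016-09-06t12:57:12.468z] next entry')

print(negotiated_version(sample))
```

Running the same check against a log captured on the local machine would show whether the server really starts the same PhantomJS binary.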

