A Selenium-based crawler (chromedriver) in a Java environment

      Reader submission 842 2025-04-03

      First, the use case:


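      Selenium is a widely used framework for automated web testing. My use case was this: the project's crawl scope was extended to a new site. Login still goes through the existing single sign-on, but when this site later checks whether a request is logged in and authorized, it relies on cookies that front-end JavaScript adds dynamically. That logic generates cookies such as sessionId and uses mechanisms like https://github.com/broofa/node-uuid. From what I observed, this part could not be reproduced in the short term in a back-end crawler that does not depend on a browser engine.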

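      As a result, follow-up crawl requests that used only the cookies obtained from the original single sign-on failed authentication. Copying the cookie straight from a browser, including sessionId and the other values, made the follow-up requests work, but a copied cookie obviously expires after a while.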

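      My solution was to obtain the generated cookie dynamically through selenium. Because the original project was already rather bloated, I used spring + cxf + selenium and exposed the cookie the crawler needs as a REST service, to be consumed by the original httpclient + jsoup crawler.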
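      On the consuming side, the crawler receives the cookies as a single document.cookie-style string and has to split it into name/value pairs before attaching them to its own requests. The following is a minimal sketch of that step using only the JDK; the class name CookieHeaderParser and the sample values are hypothetical and not taken from the original project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CookieHeaderParser {

    /** Splits a document.cookie-style string ("a=1; b=2") into name/value pairs. */
    static Map<String, String> parse(String cookieString) {
        Map<String, String> cookies = new LinkedHashMap<>();
        if (cookieString == null || cookieString.trim().isEmpty()) {
            return cookies;
        }
        for (String pair : cookieString.split(";\\s*")) {
            // Limit 2 keeps any '=' inside the value (e.g. Base64 padding) intact.
            String[] parts = pair.split("=", 2);
            if (parts.length == 2 && !parts[0].trim().isEmpty()) {
                cookies.put(parts[0].trim(), parts[1]);
            }
        }
        return cookies;
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration.
        Map<String, String> cookies = parse("sessionId=abc123; PLM_REMOTE_USER=alice");
        System.out.println(cookies);
    }
}
```

      With jsoup, a map like this can be handed to Connection.cookies(Map<String, String>) before executing the request.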

      The final result met my expectations. Note that when combining spring with selenium, one guava library has to be removed from the pom.
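      The post does not say which dependency pulled in the conflicting Guava. One way to find out which jar actually supplies Guava at runtime is to ask the class loader where a Guava class came from. The sketch below does that with the JDK only; the helper name ClassOrigin is hypothetical, and com.google.common.collect.ImmutableList is used simply as a well-known Guava class.

```java
import java.security.CodeSource;
import java.util.Optional;

final class ClassOrigin {

    /** Returns the jar or directory a class was loaded from, if it can be determined. */
    static Optional<String> locate(String className) {
        try {
            // 'false' avoids running static initializers just to inspect the class.
            Class<?> type = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return Optional.empty(); // e.g. classes provided by the JDK itself
            }
            return Optional.of(source.getLocation().toString());
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(locate("com.google.common.collect.ImmutableList")
                .orElse("Guava is not on the classpath"));
    }
}
```

      If the printed jar is not the Guava version selenium expects, that is the artifact to exclude in the pom.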

      Microservice example based on Apache CXF

      Service interface:

      import javax.ws.rs.GET;
      import javax.ws.rs.POST;
      import javax.ws.rs.Path;
      import javax.ws.rs.PathParam;
      import javax.ws.rs.Produces;

      @Path("/r")
      @Produces("application/json")
      public interface XXXService {

          @GET
          @Path("/{uid}/{password}/{headless}")
          XXXModel get(@PathParam("uid") String uid,
                       @PathParam("password") String password,
                       @PathParam("headless") Integer headless);

          @POST
          void post(XXXModel xxxModel);
      }

      Service implementation:

      import java.nio.charset.StandardCharsets;
      import java.util.Arrays;
      import java.util.Base64;
      import java.util.HashSet;
      import java.util.Set;

      import org.apache.commons.lang3.StringUtils; // assumed; the original does not show which StringUtils it uses
      import org.openqa.selenium.By;
      import org.openqa.selenium.Cookie;
      import org.openqa.selenium.JavascriptExecutor;
      import org.openqa.selenium.WebDriver;
      import org.openqa.selenium.WebElement;
      import org.openqa.selenium.chrome.ChromeDriver;
      import org.openqa.selenium.chrome.ChromeOptions;
      import org.openqa.selenium.support.ui.ExpectedCondition;
      import org.openqa.selenium.support.ui.ExpectedConditions;
      import org.openqa.selenium.support.ui.WebDriverWait;
      import org.springframework.stereotype.Service;

      @Service("xxxService")
      public class XXXServiceImpl implements XXXService {

          private String testUrl;
          // Note: one shared driver field means concurrent requests would interfere with each other.
          private WebDriver driver;

          @Override
          public XXXModel get(String uid, String password, Integer headless) {
              String cookies = "";
              try {
                  prepare(headless);
                  // The original's Base64 import is not shown. java.util.Base64's URL-safe
                  // decoder is assumed here because the value travels in a URL path.
                  String plainPassword = new String(
                          Base64.getUrlDecoder().decode(password), StandardCharsets.UTF_8);
                  cookies = getCookies(uid, plainPassword);
              } catch (Exception e) {
                  e.printStackTrace();
              } finally {
                  itsdown();
              }
              return new XXXModel(cookies);
          }

          @Override
          public void post(XXXModel xxxModel) {
          }

          /** Starts Chrome (headless unless headless == 0) and opens the login page. */
          public void prepare(Integer headless) {
              // Backslashes in Windows paths must be escaped in Java string literals.
              System.setProperty("webdriver.chrome.driver",
                      "D:\\XXX\\chrome\\Chrome-bin\\chromedriver.exe");
              testUrl = "https://xxxx/login";
              ChromeOptions options = new ChromeOptions();
              options.setBinary("D:\\XXX\\chrome\\Chrome-bin\\chrome.exe");
              options.setHeadless(headless != 0);
              driver = new ChromeDriver(options);
              driver.get(testUrl);
          }

          /** Logs in through the form and returns document.cookie once the session cookie exists. */
          public String getCookies(String uid, String password) {
              // Selenium 3 constructor (timeout in seconds); Selenium 4 takes a java.time.Duration.
              new WebDriverWait(driver, 5).until(
                      ExpectedConditions.visibilityOfElementLocated(By.id("password")));
              WebElement passwordE = driver.findElement(By.id("password"));
              final JavascriptExecutor jsExecutor = (JavascriptExecutor) driver;

              passwordE.sendKeys(password);
              // Pass uid as a script argument instead of concatenating it into the JavaScript source.
              jsExecutor.executeScript(
                      "document.getElementById('uid').setAttribute('value', arguments[0])", uid);
              jsExecutor.executeScript("submitForm()");

              // Wait until the page shown after login has rendered.
              new WebDriverWait(driver, 5).until(
                      ExpectedConditions.visibilityOfElementLocated(By.className("head_searchBtn")));

              driver.navigate().to("http://xxx.com");
              // Wait until the front-end JavaScript has written the cookie we need.
              new WebDriverWait(driver, 5).until(new ExpectedCondition<Boolean>() {
                  @Override
                  public Boolean apply(WebDriver d) {
                      return ((String) jsExecutor.executeScript("return document.cookie"))
                              .contains("PLM_REMOTE_USER");
                  }
              });
              return (String) jsExecutor.executeScript("return document.cookie");
          }

          // Not called by get(); kept from the original as a helper for turning the string into Cookie objects.
          private Set<Cookie> parseBrowserCookies(String cookiesString) {
              Set<Cookie> cookies = new HashSet<>();
              if (StringUtils.isBlank(cookiesString)) {
                  return cookies;
              }
              Arrays.asList(cookiesString.split("; ")).forEach(cookie -> {
                  String[] splitCookie = cookie.split("=", 2);
                  if (splitCookie.length == 2) { // skip fragments without '='
                      cookies.add(new Cookie(splitCookie[0], splitCookie[1], "/"));
                  }
              });
              return cookies;
          }

          /** Closes the browser; safe to call even if the driver never started. */
          public void itsdown() {
              if (driver != null) {
                  driver.quit();
                  driver = null;
              }
          }
      }
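      Because the GET endpoint takes the password as a Base64 value inside the path /r/{uid}/{password}/{headless}, a caller has to encode it before building the URL. Here is a minimal sketch of that caller side, assuming the service decodes the URL-safe Base64 variant (so the value can never contain "/" or "+") and that uid needs no escaping; the class name CookieServiceUrl and the base URL are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class CookieServiceUrl {

    /** Builds the GET URL for the cookie service; headless is sent as 1 or 0. */
    static String build(String baseUrl, String uid, String password, boolean headless) {
        // URL-safe alphabet, no '=' padding, so the value is a valid single path segment.
        String encoded = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(password.getBytes(StandardCharsets.UTF_8));
        return baseUrl + "/r/" + uid + "/" + encoded + "/" + (headless ? 1 : 0);
    }

    /** The matching decode step the service side would need under this assumption. */
    static String decodePassword(String pathSegment) {
        return new String(Base64.getUrlDecoder().decode(pathSegment), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical host and credentials, for illustration only.
        System.out.println(build("http://localhost:8080", "alice", "pw", true));
    }
}
```

      The resulting URL can then be fetched with the crawler's existing httpclient, and the cookie string read from the JSON response.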


      References:

      https://stackoverflow.com/questions/35776826/how-to-specify-the-chrome-binary-location-via-the-selenium-server-standalone-com

      https://stackoverflow.com/questions/45500606/set-chrome-browser-binary-through-chromedriver-in-python

      https://stackoverflow.com/questions/47396547/how-to-set-the-geo-location-through-code

      https://stackoverflow.com/questions/22130109/cant-use-chrome-driver-for-selenium

      https://stackoverflow.com/questions/20349844/how-chromedriverservice-is-useful-in-selenium-automation

      https://webcache.googleusercontent.com/search?q=cache:9Q8V7fW2DrUJ:https://xiaojingjing.iteye.com/blog/2382701+&cd=1&hl=en&ct=clnk&gl=sg

      https://webcache.googleusercontent.com/search?q=cache:rjkU_qxcMkQJ:https://zhuanlan.zhihu.com/p/30644530+&cd=10&hl=en&ct=clnk&gl=sg

      https://stackoverflow.com/questions/49788257/what-is-default-location-of-chromedriver-and-for-installing-chrome-on-windows

      https://stackoverflow.com/questions/16689426/how-to-set-google-chrome-in-webdriver

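      Copyright notice: This article was submitted by a web user and the copyright belongs to the original author. This site does not own the copyright and assumes no corresponding legal liability. If you find content on this site that appears to be plagiarized or inaccurate, please contact us at jiasou666@gmail.com; once verified, the infringing content will be removed within 24 hours.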
