How to Scrape Google Scholar

Take a look at the process of getting titles, authors, and citations from Google Scholar using Oxylabs SERP Scraper API (a part of Web Scraper API) and Python. You can get a 1-week free trial by registering on the dashboard.

For a detailed walkthrough with explanations and visuals, check our blog post. Also, take a look at this Best SERP APIs list.

The complete code

import requests
from bs4 import BeautifulSoup


USERNAME = "USERNAME"
PASSWORD = "PASSWORD"


def get_html_for_page(url):
    payload = {
        "url": url,
        "source": "google",
    }
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        auth=(USERNAME, PASSWORD),
        json=payload,
    )
    response.raise_for_status()
    return response.json()["results"][0]["content"]


def get_citations(article_id):
    url = f"https://scholar.google.com/scholar?q=info:{article_id}:scholar.google.com&output=cite"
    html = get_html_for_page(url)
    soup = BeautifulSoup(html, "html.parser")
    data = []
    for citation in soup.find_all("tr"):
        title = citation.find("th", {"class": "gs_cith"}).get_text(strip=True)
        content = citation.find("div", {"class": "gs_citr"}).get_text(strip=True)
        entry = {
            "title": title,
            "content": content,
        }
        data.append(entry)

    return data


def parse_data_from_article(article):
    title_elem = article.find("h3", {"class": "gs_rt"})
    title = title_elem.get_text()
    title_anchor_elem = article.select("a")[0]
    url = title_anchor_elem["href"]
    article_id = title_anchor_elem["id"]
    authors = article.find("div", {"class": "gs_a"}).get_text()
    return {
        "title": title,
        "authors": authors,
        "url": url,
        "citations": get_citations(article_id),
    }


def get_url_for_page(url, page_index):
    return url + f"&start={page_index}"


def get_data_from_page(url):
    html = get_html_for_page(url)
    soup = BeautifulSoup(html, "html.parser")
    articles = soup.find_all("div", {"class": "gs_ri"})
    return [parse_data_from_article(article) for article in articles]


data = []
url = "https://scholar.google.com/scholar?q=global+warming+&hl=en&as_sdt=0,5"

NUM_OF_PAGES = 1

page_index = 0
for _ in range(NUM_OF_PAGES):
    page_url = get_url_for_page(url, page_index)
    entries = get_data_from_page(page_url)
    data.extend(entries)
    page_index += 10

print(data)
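
The script above pages through results by appending `&start=` and stepping the offset by 10. As a minimal sketch of that same idea, the helper below builds the full list of page URLs up front using only the standard library. The names `build_page_urls` and `RESULTS_PER_PAGE` are hypothetical and not part of the original code, and the step of 10 is an assumption taken from the `page_index += 10` line.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: 10 results per page, mirroring `page_index += 10` in the script.
RESULTS_PER_PAGE = 10


def build_page_urls(base_url, num_pages, results_per_page=RESULTS_PER_PAGE):
    """Return one URL per results page, with `start` set to 0, 10, 20, ..."""
    parts = urlsplit(base_url)
    # Drop any existing `start` parameter so it is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "start"
    ]
    urls = []
    for page in range(num_pages):
        page_query = query + [("start", str(page * results_per_page))]
        # safe="," keeps values such as `as_sdt=0,5` readable in the output.
        new_query = urlencode(page_query, safe=",")
        urls.append(urlunsplit(parts._replace(query=new_query)))
    return urls


if __name__ == "__main__":
    base = "https://scholar.google.com/scholar?q=global+warming+&hl=en&as_sdt=0,5"
    for page_url in build_page_urls(base, 3):
        print(page_url)
```

Each URL produced this way could be passed to `get_data_from_page` in place of the manual `page_index` counter. Unlike plain string concatenation, this version also replaces a `start` parameter that is already present in the base URL.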

Final word

Check our documentation for more API parameters and variables found in this tutorial.

If you have any questions, feel free to contact us at support@oxylabs.io.

Read More Google Scraping Related Repositories: Google Sheets for Basic Web Scraping, Google Play Scraper, How To Scrape Google Jobs, Google News Scraper, How to Scrape Google Flights with Python, How To Scrape Google Images, Scrape Google Search Results, Scrape Google Trends
