How to Scrape & Analyze Google Search Results with Python


Ever spent hours analyzing Google search results and ended up more frustrated and confused than before?

Python hasn’t.

In this article, we’ll explore why Python is an ideal choice for Google search analysis and how it simplifies and automates an otherwise time-consuming task.

We’ll also perform an SEO analysis in Python from start to finish. And provide code for you to copy and use.

But first, some background.

Why Use Python for Google Search and Analysis

Python is known as a versatile, easy-to-learn programming language. And it really shines at working with Google search data.

Why?

Here are a few key reasons that point to Python as a top choice for scraping and analyzing Google search results:

Python Is Easy to Learn and Use

Python is designed with simplicity in mind. So you can focus on analyzing Google search results instead of getting twisted up in complicated coding syntax.

It follows an easy-to-understand syntax and style, which allows developers to write fewer lines of code than in many other languages.

an infographic showing the same example using Python, Java and C++

Python Has Well-Equipped Libraries

A Python library is a reusable chunk of code created by other developers that you can reference in your scripts to gain extra functionality without having to write it from scratch.

And Python has a wealth of libraries like:

  • Googlesearch, Requests, and Beautiful Soup for web scraping
  • Pandas and Matplotlib for data analysis

These libraries are powerful tools that make scraping and analyzing data from Google searches efficient.

Python Offers Support from a Large Community and ChatGPT

You’ll be well-supported in any Python project you undertake, including Google search analysis.

That’s because Python’s popularity has led to a large, active community of developers. And a wealth of tutorials, forums, guides, and third-party tools.

And when you can’t find pre-existing Python code for your search analysis project, chances are that ChatGPT will be able to help.

When using ChatGPT, we recommend prompting it to:

  • Act as a Python expert and
  • Help with a problem

Then, state:

  • The goal (“to query Google”) and
  • The desired output (“the simplest version of a query”)
an example of Python-based query in ChatGPT

Setting Up Your Python Environment

You’ll need to set up your Python environment before you can scrape and analyze Google search results using Python.

There are many ways to get Python up and running. But one of the quickest ways to start analyzing Google search engine results pages (SERPs) with Python is Google’s own notebook environment: Google Colab.

Here’s how easy it is to get started with Google Colab:

1. Access Google Colab: Open your web browser and go to Google Colab. If you have a Google account, sign in. If not, create a new account.

2. Create a new notebook: In Google Colab, click “File” > “New Notebook” to create a new Python notebook.

open new notebook in Google Colab

3. Check the installation: To make sure that Python is working correctly, run a simple test by entering and executing the code below. Google Colab will show you the Python version that’s currently installed:

import sys
sys.version

Wasn’t that easy?

There’s just one more step before you can perform an actual Google search.

Importing the Python Googlesearch Module

Use the googlesearch-python package to scrape and analyze Google search results with Python. It provides a convenient way to perform Google searches programmatically.

Just run the following code in a code cell to access this Python Google search module:

from googlesearch import search
print("Googlesearch package installed successfully!")

One benefit of using Google Colab is that the googlesearch-python package comes pre-installed, so there’s no need to do that first.

It’s ready to go when you see the message “Googlesearch package installed successfully!”

Now, we’ll explore how to use the module to perform Google searches. And extract valuable information from the search results.

How to Perform a Google Search with Python

To perform a Google search, write and run a few lines of code that specify your search query, how many results to display, and a few other details (more on this in the next section).

# set query to search for in Google
query = "long winter coat"
# execute query and store search results
results = search(query, tld="com", lang="en", stop=3, pause=2)
# iterate over all search results and print them
for result in results:
    print(result)

You’ll then see the top three Google search results for the query “long winter coat.”

Here’s what it looks like in the notebook:

top three Google search results for the query "long winter coat" in the notebook

To verify that the results are accurate, you can use Keyword Overview.

Open the tool, enter “long winter coat” into the search box, and make sure the location is set to “U.S.” Then click “Search.”

search for "long winter coat" in the US in Keyword Overview tool

Scroll down to the “SERP Analysis” table. And you should see the same (or very similar) URLs in the top three spots.

"SERP Analysis" table

Keyword Overview also shows you a lot of helpful data that Python has no access to. Like monthly search volume (globally and in your chosen location), Keyword Difficulty (a score that indicates how hard it is to rank in the top 10 results for a given term), search intent (the reason behind a user’s query), and much more.

Understanding Your Google Search with Python

Let’s go through the code we just ran. So you can understand what each part means and how to make adjustments for your needs.

We’ll go over each part highlighted in the image below:

an image of the code ran above in Keyword Overview
  1. Query variable: The query variable stores the search query you want to execute on Google
  2. Search function: The search function provides various parameters that allow you to customize your search and retrieve specific results:
    1. Query: Tells the search function what word or phrase to search for. This is the only required parameter (the search function will return an error without it); all the following ones are optional.
    2. Tld (short for top-level domain): Lets you determine which version of Google’s website you want to search. Setting this to “com” will search google.com; setting it to “fr” will search google.fr.
    3. Lang: Allows you to specify the language of the search results. It accepts a two-letter language code (e.g., “en” for English).
    4. Stop: Sets the number of search results to retrieve. We’ve limited our search to the top three results, but you might want to set the value to “10.”
    5. Pause: Specifies the time delay (in seconds) between consecutive requests sent to Google. Setting an appropriate pause value (we recommend at least 10) can help you avoid being blocked by Google for sending too many requests too quickly.
  3. For loop sequence: This line of code tells the loop to iterate through each search result in the “results” collection one by one, assigning each search result URL to the variable “result”
  4. For loop action: This code block follows the for loop sequence (it’s indented) and contains the actions to be performed on each search result URL. In this case, each URL is printed to the output area in Google Colab.
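To see how these parameters fit together, here is a small sketch that collects one hypothetical configuration: a French-language search on google.fr that keeps the top ten results with the recommended ten-second pause. The specific values are illustrative, not requirements:

```python
# Illustrative keyword arguments for the search() call described above.
# tld="fr" searches google.fr, lang="fr" requests French results,
# stop=10 keeps the top ten URLs, and pause=10 follows the
# recommendation of at least ten seconds between requests.
search_params = {
    "tld": "fr",
    "lang": "fr",
    "stop": 10,
    "pause": 10,
}

# The equivalent call would then be:
# results = search("long winter coat", **search_params)
print(search_params)
```

Only the query itself is required; everything else falls back to the module’s defaults if you leave it out.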

How to Analyze Google Search Results with Python

Once you’ve scraped Google search results using Python, you can analyze the data to extract valuable insights.

For example, you can determine which keywords’ SERPs are similar enough to be targeted with a single page. Which means Python is doing the heavy lifting involved in keyword clustering.

Let’s stick with our query “long winter coat” as a starting point. Plugging it into Keyword Overview reveals over 3,000 keyword variations.

"Keyword Variations" table for "long winter coat" query shows 3.0K results

For the sake of simplicity, we’ll stick with the five keywords seen above. And have Python analyze and cluster them by creating and executing this code in a new code cell in our Google Colab notebook:

import pandas as pd
# Define the main query and the list of secondary queries
main_query = "long winter coat"
secondary_queries = ["long winter coat women", "womens long winter coats", "long winter coats for women", "long winter coats"]
# Execute the main query and store search results
main_results = search(main_query, tld="com", lang="en", stop=3, pause=2)
main_urls = set(main_results)
# Dictionary to store URL percentages for each query
url_percentages = {}
# Iterate over the secondary queries
for secondary_query in secondary_queries:
    # Execute the query and store search results
    secondary_results = search(secondary_query, tld="com", lang="en", stop=3, pause=2)
    secondary_urls = set(secondary_results)
    # Compute the percentage of URLs that appear in the main query results
    percentage = (len(main_urls.intersection(secondary_urls)) / len(main_urls)) * 100
    url_percentages[secondary_query] = percentage
# Create a dataframe from the url_percentages dictionary
df_url_percentages = pd.DataFrame(url_percentages.items(), columns=['Secondary Query', 'Percentage'])
# Sort the dataframe by percentage in descending order
df_url_percentages = df_url_percentages.sort_values(by='Percentage', ascending=False)
# Display the sorted dataframe
df_url_percentages

With 14 lines of code and a dozen or so seconds of waiting for it to execute, we can now see that the top three results are the same for these queries:

  • “long winter coat”
  • “long winter coat women”
  • “womens long winter coats”
  • “long winter coats for women”
  • “long winter coats”

So, these queries can be targeted with the same page.

Also, you shouldn’t try to rank for “long winter coat” or “long winter coats” with a page offering coats for men.
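If you want Python to make that call for you, one option is a simple threshold rule on the overlap percentages. Here is a sketch under assumed data (the url_percentages values below are made up); a two-thirds cutoff means a secondary query must share at least two of the main query’s top three URLs:

```python
# Hypothetical overlap percentages, shaped like the url_percentages
# dictionary built by the clustering code above.
url_percentages = {
    "long winter coat women": 100.0,
    "womens long winter coats": 100.0,
    "mens parka": 0.0,
}

THRESHOLD = 66.0  # share at least 2 of the top 3 URLs

# Queries above the threshold can share the main query's page;
# the rest likely need pages of their own.
same_page = [q for q, pct in url_percentages.items() if pct >= THRESHOLD]
separate_page = [q for q, pct in url_percentages.items() if pct < THRESHOLD]

print(same_page)
print(separate_page)
```

The exact cutoff is a judgment call; with only three URLs per SERP, the meaningful levels are 0%, 33%, 67%, and 100%.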

Understanding Your Google Search Analysis with Python

Once again, let’s go through the code we’ve just executed. It’s a little more complex this time, but the insights we’ve generated are far more useful, too.

an image of the code ran above in Keyword Overview

1. import pandas as pd: Imports the Pandas library and makes it callable by the abbreviation “pd.” We’ll use the Pandas library to create a “DataFrame,” which is essentially a table inside the Python output area.

2. main_query = "long winter coat": Defines the main query to search for on Google

3. secondary_queries = ["long winter coat women", "womens long winter coats", "long winter coats for women", "long winter coats"]: Creates a list of secondary queries to be executed on Google. You can paste in many more queries and have Python cluster hundreds of them for you.

4. main_results = search(main_query, tld="com", lang="en", stop=3, pause=2): Executes the main query and stores the search results in main_results. We limited the number of results to three (stop=3), because the top three URLs in Google’s search results generally do the best job of satisfying users’ search intent.

5. main_urls = set(main_results): Converts the search results of the main query into a set of URLs and stores them in main_urls

6. url_percentages = {}: Initializes an empty dictionary (a collection of key-value pairs) to store the URL percentages for each query

an image of the code described in this section

7. for secondary_query in secondary_queries:: Starts a loop that iterates over each secondary query in the secondary_queries list

8. secondary_results = search(secondary_query, tld="com", lang="en", stop=3, pause=2): Executes the current secondary query and stores the search results in secondary_results. We limited the number of results to three (stop=3) for the same reason mentioned earlier.

9. secondary_urls = set(secondary_results): Converts the search results of the current secondary query into a set of URLs and stores them in secondary_urls

10. percentage = (len(main_urls.intersection(secondary_urls)) / len(main_urls)) * 100: Calculates the percentage of URLs that appear in both the main query results and the current secondary query results. The result is stored in the variable percentage.

11. url_percentages[secondary_query] = percentage: Stores the computed URL percentage in the url_percentages dictionary, with the current secondary query as the key
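The overlap formula in steps 10 and 11 is easy to check by hand. Here it is in isolation, with made-up URL sets standing in for live search results:

```python
# Two of the three main-query URLs also appear for the secondary
# query, so the overlap works out to roughly 66.7%.
main_urls = {"https://a.example", "https://b.example", "https://c.example"}
secondary_urls = {"https://a.example", "https://b.example", "https://d.example"}

percentage = (len(main_urls.intersection(secondary_urls)) / len(main_urls)) * 100
print(round(percentage, 1))  # 66.7
```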

an image of the code described in this section

12. df_url_percentages = pd.DataFrame(url_percentages.items(), columns=['Secondary Query', 'Percentage']): Creates a Pandas DataFrame that holds the secondary queries in the first column and their overlap with the main query in the second column. The columns argument specifies the column names for the DataFrame.

13. df_url_percentages = df_url_percentages.sort_values(by='Percentage', ascending=False): Sorts the DataFrame df_url_percentages by the values in the Percentage column. Setting ascending=False sorts it from the highest to the lowest values.

14. df_url_percentages: Shows the sorted DataFrame in the Google Colab output area. In most other Python environments you would have to use the print() function to display the DataFrame. But not in Google Colab. Plus, the table is interactive.

In short, this code performs a series of Google searches and shows the overlap between the top three search results for each secondary query and the main query.

The larger the overlap, the more likely you can rank for a main and secondary query with the same page.

Visualizing Your Google Search Analysis Results

Visualizing the results of a Google search analysis provides a clear and intuitive representation of the data. And helps you easily interpret and communicate the findings.

Visualization comes in handy when we apply our keyword clustering code to no more than 20 or 30 queries.

Note: For larger query samples, the query labels in the bar chart we’re about to create will bleed into each other. Which makes the DataFrame created above more useful for clustering.

You can visualize your URL percentages as a bar chart using Python and Matplotlib with this code:

import matplotlib.pyplot as plt
# Sort the URL percentages in descending order
sorted_percentages = sorted(url_percentages.items(), key=lambda x: x[1], reverse=True)
sorted_queries, sorted_percentages = zip(*sorted_percentages)
# Plot the URL percentages with a sorted x-axis
plt.bar(sorted_queries, sorted_percentages)
plt.xlabel("Queries")
plt.ylabel("URL Percentage")
plt.title("URL Percentage in Search Results")
plt.xticks(rotation=45)
plt.ylim(0, 100)
plt.tight_layout()
plt.show()

We’ll quickly run through the code again:

an image of the code described in this section

1. sorted_percentages = sorted(url_percentages.items(), key=lambda x: x[1], reverse=True): Sorts the URL percentages dictionary (url_percentages) by value in descending order using the sorted() function. This creates a list of tuples (key-value pairs) sorted by the URL percentages.

2. sorted_queries, sorted_percentages = zip(*sorted_percentages): Unpacks the sorted list of tuples into two separate tuples (sorted_queries and sorted_percentages) using the zip() function and the * operator. The * operator in Python breaks a collection down into its individual items.
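Steps 1 and 2 can be tried on their own with a small made-up dictionary:

```python
# Sort a toy url_percentages dictionary by value, descending,
# then unpack the (query, percentage) pairs into two tuples.
url_percentages = {"query a": 33.3, "query b": 100.0, "query c": 66.7}

sorted_pairs = sorted(url_percentages.items(), key=lambda x: x[1], reverse=True)
sorted_queries, sorted_percentages = zip(*sorted_pairs)

print(sorted_queries)      # ('query b', 'query c', 'query a')
print(sorted_percentages)  # (100.0, 66.7, 33.3)
```

The two tuples line up index by index, which is exactly the shape plt.bar() expects for its x and y arguments.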

an image of the code section described above

3. plt.bar(sorted_queries, sorted_percentages): Creates a bar chart using plt.bar() from Matplotlib. The sorted queries are assigned to the x-axis (sorted_queries), and the corresponding URL percentages are assigned to the y-axis (sorted_percentages).

4. plt.xlabel("Queries"): Sets the label “Queries” for the x-axis

5. plt.ylabel("URL Percentage"): Sets the label “URL Percentage” for the y-axis

6. plt.title("URL Percentage in Search Results"): Sets the chart title to “URL Percentage in Search Results”

7. plt.xticks(rotation=45): Rotates the x-axis tick labels by 45 degrees using plt.xticks() for better readability

8. plt.ylim(0, 100): Sets the y-axis limits from 0 to 100 using plt.ylim() so all percentages are displayed on a consistent scale

9. plt.tight_layout(): Adjusts the padding and spacing between subplots to improve the chart’s layout

10. plt.show(): Displays the bar chart that visualizes your Google search results analysis

And here’s what the output looks like:

"URL percentage in search results" graph

Master Google Search Using Python’s Analytical Power

Python offers incredible analytical capabilities that can be harnessed to scrape and analyze Google search results effectively.

We’ve looked at how to cluster keywords, but there are virtually limitless applications for Google search analysis using Python.

Even just to extend the keyword clustering we’ve performed here, you could:

  • Scrape the SERPs for all queries you plan to target with one page and extract all the featured snippet text to optimize for it
  • Scrape the questions and answers inside the People Also Ask box to adjust your content so it shows up there

For that, you’d need something more robust than the Googlesearch module. There are some great SERP application programming interfaces (APIs) out there that provide virtually all the information found on a Google SERP, but you might find it simpler to get started with Keyword Overview.

This tool shows you all the SERP features for your target keywords. So you can examine them and start optimizing your content.
