How SEO and Content Teams Can Work Together, Smarter
The Google Webmaster Central Blog recently published an article about how Google Search determines the best date for your webpage and how it selects the date to populate in the search engine results page.
This tutorial walks through how to quickly crawl your top-performing organic landing pages, find the XPath of the date in your posts, write the rules for extracting that date in the Screaming Frog SEO Spider, and combine the data in Excel or Google Sheets so you can present your editorial/content team with a list of key pages to update.
You shouldn't need any prior knowledge of XPath to keep up with this tutorial, but at the end of this post there are a few recommended resources for continuing your education on the subject.
The tools you need:
Screaming Frog SEO Spider
Google Search Console
Google Chrome (with the XPath Helper extension)
Excel or Google Sheets
STEP 1
Identify the top pages over a one-, two-, or three-month timeframe in Google Search Console. Once you have filtered, export the report into Excel or Google Sheets. I generally export the last three months of landing pages.
Note: If your website keeps its blog pages in a separate subdirectory (www.website.com/blog), you can add a page filter in Google Search Console to pull only those top pages. In this example, our site's article pages live in the /magazine/ subdirectory.
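If you prefer to script this step, here is a minimal sketch in Python using pandas, assuming the Google Search Console pages report was exported to a file called gsc_pages.csv with "Page" and "Clicks" columns (the filename and column names are assumptions; adjust them to match your export).

import pandas as pd

# Load the Google Search Console export (filename and column names are assumptions).
gsc = pd.read_csv("gsc_pages.csv")

# Keep only article pages, e.g. everything under the /magazine/ subdirectory.
articles = gsc[gsc["Page"].str.contains("/magazine/")]

# Take the top 100 landing pages by clicks for the crawl in the later steps.
top_pages = articles.sort_values("Clicks", ascending=False).head(100)
top_pages.to_csv("top_pages.csv", index=False)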
STEP 2
Open a page from the top-pages export in your Google Chrome browser.
Now locate the date on the article. When you find it, right-click it and click “INSPECT”. This will bring up the source code for the webpage, and your time entry should be highlighted.
STEP 3
Right-click the highlighted entry in the source code and navigate to “COPY”, then “COPY XPATH”. This XPath will most likely be different for every website, because every website is built differently, by different developers, on different content management systems. My XPath for the article date is:
//*[@id="container-scroll"]/div/div[2]/div[2]/div[1]/div/span[2]/time
STEP 4
Paste your XPath somewhere safe; we'll need it in a little bit. Close the source code of the page and open the XPath Helper Google Chrome extension. It looks like a black box with a white "X".
XPath Helper opens a black bar across the top of your screen with two sections, “QUERY” and “RESULTS”.
Paste your XPath into the Query section, and ensure it returns your date field in the results.
Because I’m familiar with XPath, I know I can also shorten the expression to the one below and still get the same result:
//time
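If you want to sanity-check an XPath outside the browser, here is a rough Python sketch using requests and lxml (the URL is a placeholder, and the assumption is that <time> appears once on the page, so both expressions match the same element):

import requests
from lxml import html

# Fetch one of your top article pages (placeholder URL).
page = requests.get("https://www.example.com/magazine/sample-article/")
tree = html.fromstring(page.content)

# The long XPath copied from Chrome and the shortened //time version
# should return the same text if <time> appears once on the page.
long_xpath = '//*[@id="container-scroll"]/div/div[2]/div[2]/div[1]/div/span[2]/time'
print(tree.xpath(long_xpath + "/text()"))
print(tree.xpath("//time/text()"))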
STEP 5
Open Screaming Frog. Navigate to CONFIGURATION > CUSTOM > EXTRACTION in the main navigation. You’ll see 10 Extractor areas (you can extract 10 XPaths per crawl). From the INACTIVE drop-down, select XPath. Name your Extractor whatever you like; the name will show up as the column header when you export the data. Paste the XPath you copied from the date's source code. When you see a green checkmark, you should be good to go. For this example, we are extracting INNER HTML.
Hit OK.
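For context on the INNER HTML choice: with a simple <time> element, the inner HTML is just the visible date text, while the datetime attribute (if your theme provides one) holds a machine-readable version. A small illustration in Python with lxml, using made-up markup:

from lxml import html

# Made-up date markup, roughly what a CMS might output.
fragment = html.fromstring('<p><time datetime="2018-06-01">June 1, 2018</time></p>')
time_el = fragment.xpath("//time")[0]

# Inner HTML of a simple <time> element is just its visible text...
print(time_el.text)             # June 1, 2018
# ...while the machine-readable datetime attribute, if present, looks like this.
print(time_el.get("datetime"))  # 2018-06-01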
STEP 6
Go back to the top URLs you pulled from Google Search Console, and copy the first 100 or so URLs. Paste them into Screaming Frog, via MODE > LIST > UPLOAD > PASTE. This will only crawl your pasted URLs, and will not navigate through the entire site.
Once you click OK, the crawl should start running and complete fairly quickly. Navigate to the CUSTOM tab, and select EXTRACTION from the Filter drop-down.
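For a sense of what the list-mode crawl with a custom extraction is doing under the hood, here is a rough Python sketch using requests and lxml (the filenames, the "Page" column, and the //time XPath are assumptions carried over from the earlier sketches; Screaming Frog handles all of this for you):

import csv
import requests
from lxml import html

# Read the same list of top landing pages used for list mode
# (filename and "Page" column are assumptions from the earlier sketch).
with open("top_pages.csv", newline="") as f:
    urls = [row["Page"] for row in csv.DictReader(f)]

rows = []
for url in urls:
    response = requests.get(url, timeout=10)
    tree = html.fromstring(response.content)
    dates = tree.xpath("//time/text()")
    rows.append({"Address": url, "Published Date": dates[0].strip() if dates else ""})

# Write an export similar in shape to Screaming Frog's custom extraction export.
with open("extraction.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Address", "Published Date"])
    writer.writeheader()
    writer.writerows(rows)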
STEP 7
Make sure your XPath is populating accurately in the columns. If not, you may have to do some additional customization to your XPath. Screaming Frog offers a great XPath resource, and their support team is helpful if you run into trouble. Richard Baxter from Builtvisible also has a great guide to XPath for SEOs.
If everything looks good, click the Export button.
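If the extraction column comes back empty, a few other common places a published date can live are worth trying in the same way; these are general patterns, not guarantees for any particular CMS, and the URL below is a placeholder:

import requests
from lxml import html

tree = html.fromstring(requests.get("https://www.example.com/magazine/sample-article/").content)

# Candidate XPaths worth trying if //time returns nothing (general patterns, not guarantees).
candidates = [
    "//time/@datetime",                                      # machine-readable date attribute
    "//meta[@property='article:published_time']/@content",   # Open Graph article date
    "//*[contains(@class, 'published')]//text()",            # theme-specific 'published' class
]
for xpath in candidates:
    print(xpath, tree.xpath(xpath)[:1])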
STEP 8
Now it’s time to combine the data. Paste your Custom Extraction export into a separate tab of your Google Search Console export.
In your Google Search Console tab, in cell F2, insert a VLOOKUP function to find the Published date on your XPath tab. It should look something like:
=VLOOKUP(A2,Xpath!A1:D101,4,0)
However, I clean up the formula by adding "$" to make sure my ranges don’t change when I copy the formula down the column. I change my formula to:
=VLOOKUP($A2,Xpath!$A$1:$D$101,4,0)
This is helpful if you are copying and pasting your formula across multiple columns and only changing the column_index_number to pull different extracted items.
Double-click the bottom right-hand corner of F2 to copy the formula down the rest of the column.
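If you would rather do this join outside of a spreadsheet, here is a minimal pandas sketch equivalent to the VLOOKUP, assuming the two exports were saved as CSVs with the filenames and column names used below (those names are assumptions; adjust them to your files):

import pandas as pd

# Google Search Console export and the custom extraction export
# (filenames and column names are assumptions; match them to your files).
gsc = pd.read_csv("top_pages.csv")
extraction = pd.read_csv("extraction.csv")

# Equivalent of the VLOOKUP: match each GSC landing page to its extracted date.
combined = gsc.merge(
    extraction[["Address", "Published Date"]],
    left_on="Page",
    right_on="Address",
    how="left",
)
combined.to_csv("pages_with_dates.csv", index=False)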
STEP 9
Highlight any dates that may seem outdated.
The travel industry, like many others, is very competitive, so we focus on keeping content updated for the articles that are competing for valuable SERP space and profitable keywords. That could mean an article written two to six months ago should be updated to help increase CTR from the SERPs.
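If you would rather flag these programmatically than by eye, here is a rough sketch continuing from the combined file above; the six-month cutoff is only an example threshold, not a rule, so pick one that fits your vertical:

import pandas as pd

combined = pd.read_csv("pages_with_dates.csv")

# Parse whatever date text was extracted; anything unparseable becomes NaT.
combined["Published Date"] = pd.to_datetime(combined["Published Date"], errors="coerce")

# Flag anything older than roughly six months (example threshold, not a rule).
cutoff = pd.Timestamp.today() - pd.DateOffset(months=6)
outdated = combined[combined["Published Date"] < cutoff]
outdated.to_csv("pages_to_update.csv", index=False)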
Please note, if you go through this process, you or your content team must actually CHANGE/UPDATE your articles. Do not simply swap a few words around and republish the article. Google can see this as a black-hat technique.
Once you republish your article updates, the date will change in the Google SERPs within a couple of days, and users will be more inclined to click your updated result!
Additional Resources on Web Scraping and XPath
https://www.screamingfrog.co.uk/web-scraping/
https://builtvisible.com/seo-guide-to-xpath/