Quality of Old Pages
Q. When assessing new pages on a website, the quality of older pages matters too
- (15:38) When Google tries to understand the quality of a website overall, it's a process that takes a lot of time. If 5 new pages are added to a website that already has 10,000 pages, the assessment will focus on the bulk of the site first. Over time, Google will see how things settle down with the new content there as well, but the quality of the rest of the website matters in the context of the new pages.
Link Equity and Backlinks
Q. Links from high-value pages carry a little more weight than links from a random page on the Internet
- (17:02) When assessing the value of a link from another website, Google doesn't look at referral traffic or at the propensity to click on that link. John explains that Google doesn't expect people to click on every link on a page – people don't follow a link just because some expert said to, or to confirm that what is written about the linked website is true. They follow a link when they want to find out more about the product. Therefore, things like referral traffic or the propensity to click are not taken into account when evaluating a link.
When evaluating a link, Google looks at page-level factors and the quality of the linking website. It works much like PageRank – each individual page holds a certain value, and a fraction of that value is passed on through its links. So if a page is of high value, links from that page are treated with more weight than links from random pages on the Internet. It doesn't work exactly the way it did in the beginning, but it's a good way of thinking about these things.
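As a rough illustration of this intuition (not Google's actual implementation), here is a minimal Python sketch of the classic PageRank idea, where each page's value is split among its outgoing links; the toy link graph and damping factor are hypothetical.

```python
# Minimal sketch of the classic PageRank idea: each page's value is
# split evenly among its outgoing links, so links from high-value
# pages pass on more weight. Hypothetical toy graph, not Google's
# actual (and far more complex) implementation.

links = {
    "high_value_page": ["your_page", "other_page"],
    "random_page": ["your_page"],
    "your_page": ["other_page"],
    "other_page": ["high_value_page"],
}

damping = 0.85                      # classic damping factor from the original paper
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}

for _ in range(50):                 # iterate until the values settle
    new_rank = {p: (1 - damping) / len(pages) for p in pages}
    for page, outgoing in links.items():
        share = damping * rank[page] / len(outgoing)
        for target in outgoing:     # a fraction of the page's value flows through each link
            new_rank[target] += share
    rank = new_rank

for page, value in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {value:.3f}")
```

In this toy graph, the link from high_value_page contributes more to your_page than the link from random_page, even though high_value_page splits its value between two links, simply because the linking page itself holds more value.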
Other questions from the floor:
Category Page Dropping in the Rankings for One Keyword
Q. If a page tries to cover more than one search intent, it might drop in the rankings
- (00:34) The person asking the question describes a category page on his website that carries a large block of advisory text and has dropped in the rankings for a particular keyword in its plural form. John says that on this particular category page there is so much textual information that it's hard to tell whether the page is meant for people who want to buy the products or for those who want to learn more about them – whether it's for someone looking for one product or for an overview of the whole space. He therefore suggests splitting it into two pages with different purposes instead of one page that tries to cover everything.
As for the singular and plural forms of the keyword, he thinks that when someone searches for the plural they probably want to see different kinds of products, while someone searching for the singular might want a product landing page. Over time the systems try to figure out the intent behind these queries and how that intent changes. For example, "Christmas tree" might be an informational query for most of the year and become more transactional around December. If a category page covers both, that's good in the sense that both sides are covered, but Google might see only the transactional side and ignore the informational one. So having the two intents on separate pages is a reasonable thing to do, and there are different ways to do that: some people have completely separate pages, others write informational blog posts from which they link to category pages or individual products.
Q. Even though this drop might happen only for certain keywords, Google doesn't keep a list of keywords for it
- (04:52) The person asking the question wonders why this happens only for some keywords, and whether there is a list of specific keywords it applies to. John says it's doubtful Google would manually create a list of keywords for this – it's something the systems try to pick up automatically, and they might pick it up one way for certain pages and differently for others, and that can change over time.
Content Word Limit
Q. There is no set number of words that the content on a category page should have
- (09:05) There needs to be some information about the products on the page so that Google understands what the topic is, but generally very little is required. In many cases Google can work it out from the products listed on the page if their names are clear enough – for example "running shoes", "Nike running shoes" and running shoes by other brands. In that case it's clear that the page is about running shoes, and there is no need to add extra text. Sometimes, though, product names are a little hard to understand, and then it makes sense to add some text – John suggests around 2-3 sentences.
Q. The same chunks of text can be used on category pages and blog posts
- (10:19) Having a small amount of text duplicated is not a problem. For example, using a few sentences of text from a blog post in category pages is fine.
Merging Websites
Q. There is no fixed timeline for when the pages of merged websites are recrawled and the results of that become visible
- (10:56) Pages across a website get crawled at different speeds: some pages are recrawled every day, others once a month or once every few months. If the content is on a page that rarely gets recrawled, it will take a long time for changes to be reflected, whereas if the content is crawled very actively, the changes should be visible within a week or so.
Index Coverage Report
Q. After merging websites, if traffic is going to the right pages and the shift is visible in the Performance report, there is no need to watch the Index Coverage report closely
- (13:25) The person asking the question is concerned that after merging two websites they are not seeing any difference in the Index Coverage report. John says that when pages are merged, Google's systems first need to pick a new canonical for each page, and it takes a little time for that to be reflected in the reporting. When everything is simply moved, it's just a transfer and there is no canonicalisation to figure out; a merge takes more time.
301 Redirect, Validation Error
Q. The Change of Address tool is not necessary for a migration; checking the redirects is a higher priority
- (20:53) John says that although some people use the Change of Address tool when migrating a website, it's just an extra signal and not a requirement – as long as the redirects are set up properly, Change of Address doesn't really matter. If there are things like 301 redirect errors, the redirects need to be re-checked, but it's hard to say what the issue is without looking at it case by case. John suggests that if, for example, there is a www version and a non-www version of the website, the redirects should be checked step by step: the site might be redirecting to the non-www version and then redirecting to the new domain, while the Change of Address was submitted for the version of the site that is not the primary version – that's one of the things to double-check. Basically, verify whether the version being submitted in Search Console is the one that is (or was) actually indexed, or an alternate version.
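To make that redirect check concrete, here is a small Python sketch (using the requests library) that follows a URL's redirect chain hop by hop, so you can see whether an old www or non-www URL reaches the new domain in one clean 301 or goes through unnecessary intermediate hops; the domains are placeholders.

```python
# Sketch: follow a redirect chain hop by hop to check that old URLs
# reach the new domain with clean 301s (and not via extra hops such as
# www -> non-www -> new domain). Placeholder URLs; requires `requests`.
from urllib.parse import urljoin
import requests

def trace_redirects(url: str, max_hops: int = 10) -> None:
    for _ in range(max_hops):
        response = requests.get(url, allow_redirects=False, timeout=10)
        status = response.status_code
        if status in (301, 302, 307, 308):
            target = response.headers.get("Location", "")
            print(f"{status}  {url}  ->  {target}")
            url = urljoin(url, target)  # handle relative Location headers
        else:
            print(f"{status}  {url}  (final)")
            return
    print("Stopped: possible redirect loop or overly long chain")

# Hypothetical example: old domain with and without www
trace_redirects("http://www.old-example.com/page")
trace_redirects("http://old-example.com/page")
```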
Several Schemas on a Page
Q. Any number of structured data types can be used on a page, but typically only one kind will be shown as a rich result
- (23:36) There can be any number of schema types on one page, but John points out that in most cases, when it comes to the rich results Google shows in the search results, only one kind of structured data will be picked. If a page has multiple types of structured data, there is a very high chance Google will pick just one of them to show as a rich result. So if one particular type needs to be featured, and there is no combined display of those types in the search results, it's better to focus on the one type of structured data that should appear as a rich result.
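As a quick way to see which types are currently competing on a page, here is a small Python sketch (using requests and BeautifulSoup, with a placeholder URL) that lists the @type values of all JSON-LD blocks found on a page.

```python
# Sketch: list the @type values of all JSON-LD structured data blocks
# on a page, to see which types compete for a rich result.
# Requires `requests` and `beautifulsoup4`; placeholder URL.
import json
import requests
from bs4 import BeautifulSoup

def jsonld_types(url: str) -> list[str]:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    types = []
    for script in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(script.string or "")
        except json.JSONDecodeError:
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and "@type" in item:
                types.append(str(item["@type"]))
    return types

print(jsonld_types("https://www.example.com/some-product-page"))
# e.g. ['Product', 'FAQPage', 'BreadcrumbList'] - typically only one of
# these will be used for the rich result shown in Search.
```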
Random 404 Pages
Q. Random 404 URLs on a website don't really affect anything
- (24:39) The person asking the question is concerned about a steady increase of 404 URLs on his website that are not part of the site, which now make up over 40% of the crawl responses. John argues that there is nothing to worry about: these are probably random URLs picked up from some scraper site that is scraping things in a bad way – a very common situation. When Google tries to crawl them they return 404, so the crawlers learn to ignore them, since those pages don't really exist. When looking at a website, Google works out which URLs it needs to crawl and how frequently. Once that is covered, it looks at what it can do additionally and starts trying to crawl a lower-priority set of URLs, which sometimes includes URLs from scraper sites. So if these random URLs are being crawled, it means the most important URLs have already been crawled and there is time and capacity left to crawl more. In that sense the 404s are not an issue but a sign that there is enough capacity – if there were more content linked within the website, Google would probably crawl and index that too. It's a good sign, and these 404 pages don't need to be blocked with robots.txt or suppressed in any other way.
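For anyone who wants to quantify this on their own site, here is a rough Python sketch that estimates what share of Googlebot requests in a server access log returned 404; the log path, the user-agent filter, and the common/combined log format are assumptions about the setup.

```python
# Sketch: estimate the share of Googlebot requests that returned 404,
# from a server access log in common/combined log format.
# The file path, UA filter, and log format are assumptions.
import re
from collections import Counter

LOG_PATH = "access.log"   # hypothetical path
# Matches: ... "GET /some/url HTTP/1.1" 404 ...
line_re = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[^"]+" (?P<status>\d{3})')

status_counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        if "Googlebot" not in line:      # crude user-agent filter
            continue
        match = line_re.search(line)
        if match:
            status_counts[match.group("status")] += 1

total = sum(status_counts.values())
if total:
    share_404 = 100 * status_counts.get("404", 0) / total
    print(f"Googlebot requests: {total}, of which 404: {share_404:.1f}%")
```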
Blocking Traffic from Other Countries
Q. If you block undesired traffic from other countries, don't block the U.S., since websites are crawled from there
- (27:34) The person asking the question is concerned that their Core Web Vitals scores are going down because the site originates in France and gets a lot of traffic from other countries with poor bandwidth. John advises against blocking traffic from other countries, especially the U.S.: Google's crawlers crawl from the U.S., and if it is blocked, the website's pages won't be crawled and indexed. So if the website owner does block other countries, he should at least keep the U.S. open.
Q. Blocking content for users while showing it to Googlebot is against the guidelines
- (28:47) The guidelines are clear that a website should show the crawlers what it shows users from the crawler's country. John says one way to handle unwanted traffic from countries the website doesn't serve is to use paywall structured data: once the content is marked up as paywalled, users with access can log in to get the content, and the page can still be crawled.
Another option John suggests is providing some level of information that can be shown in the U.S. For example, casino content is illegal in the U.S., so some websites serve a simplified version for U.S. visitors that is more like descriptive information about the content. Similarly, if there are movies that can't be offered in the U.S., a description of the movie can still be served there, even if the movie itself can't.
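To keep the paywall-markup idea concrete, here is a minimal Python sketch that builds the kind of paywalled-content JSON-LD Google documents (isAccessibleForFree plus a hasPart element pointing at the gated section); the CSS selector and article details are placeholders.

```python
# Sketch: build paywalled-content JSON-LD that marks part of a page as
# not freely accessible, along the lines of Google's paywall markup docs.
# The CSS selector and article details are placeholders.
import json

paywall_markup = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example article behind a login",
    "isAccessibleForFree": False,
    "hasPart": {
        "@type": "WebPageElement",
        "isAccessibleForFree": False,
        "cssSelector": ".paywalled-content",   # element holding the gated content
    },
}

# Embed the output inside a <script type="application/ld+json"> tag on the page.
print(json.dumps(paywall_markup, indent=2))
```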
Page Grouping
Q. There is no exact definition of how Google groups pages, because the grouping evolves over time depending on the amount of data Google has for a website
- (35:36) The first thing John highlights about grouping is that if there is a lot of data for many different pages on a website, it's easier for Google to group them in a more fine-grained way; if there isn't a lot of data, it might end up treating the whole website as one group.
The second thing John points out is that the data is field data, the same data the website owner sees in Search Console. That means Google is not so much taking the value of each individual page and dividing by the number of pages, but rather doing something like a traffic-weighted average. Some pages have a lot more traffic and therefore more data; others have less traffic and less data. So if a lot of people go to the home page and not many visit individual product pages, the home page may weigh a little more simply because it has more data. Therefore it makes sense to look at Google Analytics (or any other analytics), figure out which pages get a lot of traffic, and improve the user experience on those pages, since that is what counts towards Core Web Vitals. Essentially, it is less an average across the number of pages and more an average across the traffic – across what people actually experience when they come to the website.
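As a rough illustration of the difference, here is a small Python sketch comparing a plain per-page average with a traffic-weighted average of a Core Web Vitals metric (LCP, in seconds); the page names and numbers are made up.

```python
# Sketch: plain per-page average vs. traffic-weighted average of a
# Core Web Vitals metric (LCP in seconds). Made-up pages and numbers.
pages = [
    # (page, monthly visits, LCP in seconds)
    ("/",            50_000, 1.8),
    ("/category",    10_000, 2.6),
    ("/product-123",  1_000, 4.5),
    ("/product-456",    500, 4.9),
]

plain_average = sum(lcp for _, _, lcp in pages) / len(pages)

total_visits = sum(visits for _, visits, _ in pages)
weighted_average = sum(visits * lcp for _, visits, lcp in pages) / total_visits

print(f"Plain per-page average LCP:  {plain_average:.2f}s")
print(f"Traffic-weighted average:    {weighted_average:.2f}s")
# The weighted figure sits much closer to the home page's LCP because
# that is where most of the field data comes from.
```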
Subdomains and Folders
Q. Subdomains and subdirectories are essentially equivalent in terms of where content can live, but there are differences in other aspects
- (45:25) According to the Search Quality Team, subdomains and subdirectories are essentially equivalent – the content can be put in either. Some people in SEO might think otherwise.
John adds that there are a few aspects where the choice plays a role, and they are less about SEO and more about reporting – for example, whether the performance of these sections should be tracked separately on separate host names or together on one host name. For some websites Google might also treat things on a subdomain slightly differently because it looks more like a separate website. Even so, John suggests focusing on the website's infrastructure first and seeing what makes sense for the particular case.
Gambling Website Promotion
Q. From the SEO side, there is no problem with publishing gambling-related content
- (48:59) John is not sure about the policies for publishers when it comes to gambling content, but he says that, SEO-wise, people can publish whatever they want.
Removing Old Content
Q. Blindly removing old content from a website is not the best strategy
- (49:42) John says he wouldn't recommend removing old content from a website just for the sake of removing it – old content might still be useful. He doesn't see any SEO value in doing so, but archiving old pages for usability or maintenance reasons is fine.
Duplicate Content
Q. Duplicate content is not penalised, with a few exceptions to this rule
- (55:05) John says that the only time Google would apply something like a penalty – an algorithmic or manual action – is when the whole website is purely duplicated content, i.e. one website scraping another. If, for example, an ecommerce website uses the same product descriptions as another site but the rest of the website is different, that's perfectly fine – it doesn't lead to any kind of demotion or drop in rankings.
With duplicate content there are two things Google looks at. The first is whether the whole page is the same – that includes everything: the header, the footer, the address of the store and so on. So if only the description on an ecommerce page matches the manufacturer's description, but everything around it is different, it's fine.
The second thing plays a role when Google shows a snippet in the search results. Essentially, Google tries to avoid creating search result pages where several results have exactly the same snippet. So if someone searches for something generic that appears only in the product description, and the snippet Google would show for the website and for the manufacturer's website is exactly the same, Google tries to pick just one of those pages – the one it considers best among the pages with the identical description.
Sign up for our Webmaster Hangouts today!