Which number is correct, PageSpeed Insights or Search Console?
Q: (00:30) Starting off, I have one topic that has come up repeatedly recently, and I thought I would try to answer it in the form of a question while we’re at it here. So, first of all, when I check my PageSpeed Insights score for my website, I see a simple number. Why doesn’t this match what I see in Search Console and the Core Web Vitals report? Which one of these numbers is correct?
- (01:02) I think, first of all, to get the obvious answer out of the way: there is no single correct number when it comes to speed, or to understanding how your website is performing for your users. In PageSpeed Insights, by default, I believe we show a single number, a score from 0 to 100, something like that, which is based on a number of assumptions where we assume that different things are a little bit faster or slower for users, and based on that, we calculate a score. In Search Console, we have the Core Web Vitals information, which is based on three numbers covering loading, interactivity, and visual stability. And these numbers are slightly different because it’s three numbers, not just one.

But there’s also a big difference in the way these numbers are determined, namely the difference between so-called field data and lab data. Field data is what users actually see when they go to your website, and this is what we use in Search Console. That’s what we use for Search, as well. Lab data, on the other hand, is a theoretical view of your website, where our systems make certain assumptions: the average user is probably like this, using this kind of device, with this kind of a connection, perhaps. Based on those assumptions, we estimate what those numbers might be for an average user. And you can imagine those estimations will never be 100% correct. Similarly, the data that users have seen will change over time, as well: some users might have a really fast connection or a fast device, and everything goes really fast when they visit your website, and others might not have that. Because of that, this variation can always result in different numbers.

Our recommendation is generally to use the field data, the data you would see in Search Console, as a way of understanding the current situation for your website, and then to use the lab data, namely the individual tests that you can run yourself directly, to optimise your website and try to improve things. And when you are pretty happy with the lab data you’re getting with the new version of your website, then over time, you can collect the field data, which happens automatically, and double-check that users see it as being faster or more responsive, as well.

So, in short, again, there is no absolutely correct number when it comes to any of these metrics. There is no absolutely correct answer where you’d say this is what it should be. But instead, there are different assumptions and ways of collecting data, and each is subtly different.
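If you want to see both views side by side, the public PageSpeed Insights v5 API returns the lab (Lighthouse) score and the field (Chrome UX Report) data for a URL in a single response. The following is a minimal, hedged TypeScript sketch of that comparison; the response field names reflect my reading of the v5 API and should be double-checked against the API reference before relying on them.

```typescript
// Minimal sketch (not an official client): compare the lab score with the
// field (CrUX) data returned by the PageSpeed Insights v5 API for one URL.
// Response field names are my assumption; verify against the API reference.
const PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed";

async function compareLabAndField(url: string, apiKey?: string): Promise<void> {
  const params = new URLSearchParams({ url, strategy: "mobile" });
  if (apiKey) params.set("key", apiKey); // optional API key for higher quotas

  const response = await fetch(`${PSI_ENDPOINT}?${params.toString()}`);
  const data = await response.json();

  // Lab data: the single 0–100 score shown by PageSpeed Insights.
  const labScore = data.lighthouseResult?.categories?.performance?.score;
  console.log("Lab (Lighthouse) performance score:", labScore != null ? labScore * 100 : "n/a");

  // Field data: what real Chrome users experienced, the same source that
  // feeds the Core Web Vitals report in Search Console.
  const fieldMetrics = data.loadingExperience?.metrics ?? {};
  for (const [name, metric] of Object.entries<{ percentile: number; category: string }>(fieldMetrics)) {
    console.log(`Field ${name}: p75 = ${metric.percentile} (${metric.category})`);
  }
}

compareLabAndField("https://example.com/").catch(console.error);
```

Because the two halves of the response come from different sources (a simulated test run versus aggregated real-user data), it is normal for them to disagree, which is exactly the point made above.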
How can our JavaScript site get indexed better?
Q: (04:20) So, first up, we have a few custom pages using Next.js without a robots.txt or a sitemap file. Simplified, theoretically, Googlebot can reach all of these pages, but why is only the homepage getting indexed? There are no errors or warnings in Search Console. Why doesn’t Googlebot find the other pages?
- (04:40) So, maybe taking a step back: Next.js is a JavaScript framework, meaning the whole page is generated with JavaScript. But as a general answer to all of these questions along the lines of “why is Google not indexing everything?”, it’s important to say first that Googlebot will never index everything across a website. I don’t think there is any non-trivially-sized website where Google completely indexes everything. From a practical point of view, it’s impossible to index everything across the web. So I would leave aside the assumption that the ideal situation is to have everything indexed, and instead say you want Googlebot to focus on the important pages.

The other thing, though, which became a little bit clearer when, I think, the person contacted me on Twitter and gave me a little bit more information about their website, was that the website was generating links to the other pages in a way that Google was not able to pick up. In particular, with JavaScript you can take any element on an HTML page and say: if someone clicks on this, then execute this piece of JavaScript. And that piece of JavaScript can be used to navigate to a different page, for example. But Googlebot does not click on all elements to see what happens. Instead, we go off and look for normal HTML links, which is the traditional way you would link to individual pages on a website. And, with this framework as it was set up, it didn’t generate these normal HTML links, so we could not recognise that there was more to crawl and more pages to look at. This is something that you can fix in how you implement your JavaScript site; there’s a sketch of the difference below.

We have a tonne of information on the Search Developer Documentation site around JavaScript and SEO, particularly on the topic of links, because that comes up now and then. There are many creative ways to create links, but Googlebot needs to find those HTML links to make them work. Additionally, we have a bunch of videos on our YouTube channel. And if you’re watching this, you must be on the YouTube channel, since nobody is here live. So go and check out those JavaScript SEO videos on our channel to get a sense of what else to watch out for when it comes to JavaScript-based websites. We can process most kinds of JavaScript-based websites normally, but there are some things you still have to watch out for, like these links.
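To make the link point concrete, here is a hypothetical Next.js component (TypeScript) contrasting a JavaScript-only click handler, which Googlebot will not follow, with a normal anchor rendered via the framework’s Link component. The component and route names are made up for illustration, and older Next.js versions need an `<a>` element nested inside `<Link>`.

```tsx
// Hypothetical Next.js component illustrating why one navigation is
// discoverable by Googlebot and the other is not. Route names are made up.
import Link from "next/link";
import { useRouter } from "next/router";

export function ProductTeaser() {
  const router = useRouter();

  return (
    <div>
      {/* Not crawlable: Googlebot does not click elements to see what happens,
          so a JavaScript-only navigation like this is invisible to it. */}
      <span onClick={() => router.push("/products/blue-widget")}>Blue widget</span>

      {/* Crawlable: <Link> renders a normal <a href="..."> element, which
          Googlebot can find and follow. */}
      <Link href="/products/blue-widget">Blue widget</Link>
    </div>
  );
}
```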
Does it affect my SEO score negatively if I link to HTTP pages?
Q: (07:35) Next up, does it affect my SEO score negatively if my page links to an external, insecure website?
- (07:44) So on HTTP, not HTTPS. First off, we don’t have a notion of an SEO score, so you don’t have to worry about any kind of SEO score. But, regardless, I understand the question to be: is it bad if I link to an HTTP page instead of an HTTPS page? And, from our point of view, it’s perfectly fine. If these pages are on HTTP, then that’s what you would link to, and that’s what users would expect to find. There’s nothing against linking to sites like that, and there is no downside for your website, so you don’t need to avoid linking to HTTP pages just because they’re kind of old or crusty and not as cool as HTTPS. I would not worry about that.
Should I write how people speak?
Q: (08:39) With semantic and voice search, is it better to use proper grammar or to write how people actually speak? For example, it’s grammatically correct to write, “more than X years,” but people actually say, “over X years,” or to write a list beginning with, “such as X, Y, and Z,” but people actually say, “like X, Y, and Z.”
- (09:04) Good question. The simple answer is: you can write however you want. There’s nothing holding you back from just writing naturally, and essentially our systems try to work with the natural content found on your pages. So if we can crawl and index those pages with your content, we’ll try to work with that, and there’s nothing special that you need to do there. The one thing I would watch out for, with regards to how you write your content, is just to make sure that you’re writing for your audience. For example, if you have some very technical content but want to reach people who are non-technical, then write in non-technical language, not in a way that is only understandable to people who are deep into that kind of technical information. So, essentially, the traditional marketing approach of writing for your audience. Our systems are usually able to deal with that perfectly fine.
Should I delete my disavow file?
Q: (10:20) Next up, a question about links and disavows. Over the last 15 years, I’ve disavowed over 11,000 links in total. I never bought a link or did anything that isn’t allowed, like link sharing. The links that I disavowed may have been from hacked sites or from nonsense, auto-generated content. Since Google now claims to have better tools to avoid factoring these types of hacked or spammy links into its algorithms, should I just delete my disavow file? Is there any risk, upside, or downside to just deleting it?
- (10:54) So this is a good question. It comes up now and then, and disavowing links is always one of those tricky topics, because it feels like Google is probably not telling you the complete information. But, from our point of view, we do work hard to avoid taking this kind of link into account. And we do that because we know that the disavow links tool is a niche tool: SEOs know about it, but the average person who runs a website doesn’t. The links you mentioned are the kind of links that any website picks up over the years, and our systems understand that these are not things you’re doing to try to game our algorithms.

So, from that point of view, if you’re sure that there’s no manual action that you had to resolve with regards to these links, I would just delete the disavow file, move on with life, and leave all of that aside. I would personally download it and make a copy first, so that you have a record of what you deleted. But, otherwise, if you’re sure these are just the normal, crusty things from the internet, I would delete it and move on. There’s much more to spend your time on when it comes to websites than disavowing these random things that happen to any website on the web.
Can I add structured data with Google Tag Manager?
Q: (12:30) Adding schema markup with Google Tag Manager: is that good or bad for SEO? Does it affect ranking?
- (12:33) So, first of all, you can add structured data with Google Tag Manager; that’s an option. Google Tag Manager is a piece of JavaScript you add to your pages, which you then configure remotely in the Tag Manager interface, and it can modify your pages slightly using JavaScript. For the most part, we’re able to process this normally, and structured data added that way can generally be picked up just like any other structured data on your web pages. From our point of view, structured data, at least the types that we have documented, is primarily used to help generate what we call rich results, which are these fancy search results with a little bit more information, a little bit more colour or detail around your pages. And if you add your structured data with Tag Manager, that’s perfectly fine.

From a practical point of view, though, I prefer to have the structured data on the page directly, generated on your server, so that you know exactly what is happening. It makes it a little bit easier to debug things, and it makes it easier to test things. So trying it out with Tag Manager is, from my point of view, legitimate; it’s an easy way to try things out. But, in the long run, I would try to make sure that your structured data is on your site directly, so that it’s easier to process for anyone who comes by to process your structured data, and easier for you to track, debug, and maintain over time, as well, without having to check all of these separate sources.
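As a hedged illustration of the “directly on the page” option, here is a minimal Next.js sketch that renders JSON-LD server-side so the markup is part of the initial HTML. The page, type, and property values are placeholders, not anything from the question.

```tsx
// Hypothetical Next.js page that renders JSON-LD directly into the HTML,
// instead of injecting it later via Tag Manager. All values are placeholders.
import Head from "next/head";

export default function ArticlePage() {
  const structuredData = {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: "Example article headline",
    author: { "@type": "Person", name: "Jane Doe" },
    datePublished: "2022-01-01",
  };

  return (
    <>
      <Head>
        {/* Part of the server-rendered HTML, so it is easy to view, test, and debug. */}
        <script
          type="application/ld+json"
          dangerouslySetInnerHTML={{ __html: JSON.stringify(structuredData) }}
        />
      </Head>
      <article>Article content goes here.</article>
    </>
  );
}
```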
Is it better to block by robots.txt or with the robots meta tag?
Q: (14:20) Simplifying a question a little bit, which is better, blocking with robots.txt or using the robots meta tag on the page? How do we best prevent crawling?
- (14:32) So this also comes up from time to time. We did a podcast episode about this recently, as well, so I would check that out. The podcasts are also on the YouTube channel, so you can click around a little bit and you’ll probably find them quickly. In practice, there is a subtle difference here. If you’re in SEO and you’ve worked with search engines, then you probably understand it already, but for people who are new to the area, it’s sometimes unclear exactly where the lines are.

With robots.txt, which is the first option mentioned in the question, you can essentially block crawling, so you can prevent Googlebot from even looking at your pages. With the robots meta tag, you can do things like blocking indexing, once Googlebot looks at your pages and sees that robots meta tag. In practice, both of these result in your pages not appearing in the search results, but they’re subtly different. If we can’t crawl, we don’t know what we’re missing, and it might be that we say, well, there are many references to this page, maybe it is useful for something, we just don’t know. And then that URL could appear in the search results without any of its content, because we can’t look at it. Whereas with the robots meta tag, if we can look at the page, then we can look at the meta tag and see if there’s a noindex there, for example. Then we stop indexing that page and drop it completely from the search results.

So if you’re trying to block crawling, then robots.txt is definitely the way to go. If you just don’t want the page to appear in the search results, I would pick whichever is easier for you to implement. On some sites, it’s easier to set a checkbox saying “I don’t want this page found in Search”, and then it adds a noindex meta tag. For others, maybe editing the robots.txt file is easier. It kind of depends on what you have there.
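To make the two mechanisms concrete, here is a small sketch with hypothetical paths and page names: a robots.txt rule that blocks crawling (shown as a comment), and a Next.js page that allows crawling but asks not to be indexed via a robots meta tag.

```tsx
// Two different mechanisms, with hypothetical paths:
//
// 1) Block crawling via robots.txt – Googlebot never fetches the page, but the
//    bare URL can still show up in Search without its content:
//
//      User-agent: *
//      Disallow: /internal-search/
//
// 2) Allow crawling but block indexing with a robots meta tag – the page is
//    fetched, the noindex is seen, and the page is dropped from Search:
import Head from "next/head";

export default function InternalSearchResults() {
  return (
    <>
      <Head>
        <meta name="robots" content="noindex" />
      </Head>
      <main>Internal search results page.</main>
    </>
  );
}
```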
Can I list the same URL in multiple sitemap files?
Q: (16:38) Are there any negative implications to having duplicate URLs with different attributes in your XML sitemaps? For example, one URL in one sitemap with an hreflang annotation and the same URL in another sitemap without that annotation.
- (16:55) So maybe, first of all, from our point of view, this is perfectly fine. This happens now and then. Some people have their hreflang annotations in separate sitemap files and then have a normal sitemap file for everything, and there is some overlap there. From our point of view, we process these sitemap files as we can, and we take all of that information into account. There is no downside to having the same URL in multiple sitemap files.

The only thing I would watch out for is that you don’t have conflicting information in these sitemap files. For example, if with the hreflang annotations you’re saying this page is for Germany, and then in the other sitemap file you’re saying, well, actually this page is also for France or in French, then our systems might be like, well, what is happening here? We don’t know what to do with this mix of annotations, and then we may pick one or the other. Similarly, if you say this page was last changed 20 years ago, which doesn’t make much sense, but say you do, and in the other sitemap file you say, well, actually, it was five minutes ago, then our systems might look at that and say, well, one of these is wrong, we don’t know which one. Maybe we’ll follow one or the other, maybe we’ll just ignore that last modification date completely. So that’s the thing to watch out for. But otherwise, if a URL is just mentioned in multiple sitemap files and the information is either consistent or works together, in that maybe one has the last modification date and the other has the hreflang annotations, that’s perfectly fine.
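One way to keep the two files from contradicting each other is to generate both from a single source of truth. The sketch below is a hypothetical TypeScript example of that idea (URLs, dates, and file layout are made up); the xhtml:link hreflang syntax follows the sitemap format documented by Google, but verify it against the current reference.

```typescript
// Hypothetical sketch: build both sitemap files from one list of pages, so the
// same URL never carries conflicting lastmod or hreflang information.
interface PageEntry {
  loc: string;
  lastmod: string; // e.g. "2022-06-01"
  alternates?: { hreflang: string; href: string }[];
}

const pages: PageEntry[] = [
  {
    loc: "https://example.com/de/widgets",
    lastmod: "2022-06-01",
    alternates: [
      { hreflang: "de", href: "https://example.com/de/widgets" },
      { hreflang: "fr", href: "https://example.com/fr/widgets" },
    ],
  },
];

// Plain sitemap: loc and lastmod only.
const plainSitemap =
  `<?xml version="1.0" encoding="UTF-8"?>\n` +
  `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
  pages.map((p) => `  <url><loc>${p.loc}</loc><lastmod>${p.lastmod}</lastmod></url>`).join("\n") +
  `\n</urlset>\n`;

// hreflang sitemap: the same loc and lastmod, plus xhtml:link alternates.
const hreflangSitemap =
  `<?xml version="1.0" encoding="UTF-8"?>\n` +
  `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">\n` +
  pages
    .map((p) => {
      const links = (p.alternates ?? [])
        .map((a) => `<xhtml:link rel="alternate" hreflang="${a.hreflang}" href="${a.href}"/>`)
        .join("");
      return `  <url><loc>${p.loc}</loc><lastmod>${p.lastmod}</lastmod>${links}</url>`;
    })
    .join("\n") +
  `\n</urlset>\n`;

console.log(plainSitemap, hreflangSitemap);
```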
How can I block embedded video pages from getting indexed?
Q: (19:00) I’m in charge of a video replay platform, and simplified, our embeds are sometimes indexed individually. How can we prevent that?
- (19:10) So by embeds, I looked at the website, and basically these are iframes that include a simplified HTML page with a video player embedded. From a technical point of view, if a page has iframe content, then we see two HTML pages, and it is possible that our systems index both of them, because they are separate: one is included in the other, but they could theoretically stand on their own, as well.

There’s one way to prevent that, which is a fairly new robots meta tag combination: the indexifembedded robots meta tag together with a noindex robots meta tag. On the embedded version, so the HTML file with the video directly in it, you would add the combination of noindex plus indexifembedded robots meta tags. That means that, if we find that page individually, we see, oh, there’s a noindex, we don’t have to index this. But the indexifembedded part essentially tells us that if we find this page embedded within the general website, then we can index that video content. The result is that the individual HTML page would not be indexed, but the page that embeds the video would be indexed normally, together with the video information. So that’s the setup that I would use there. This is a fairly new robots meta tag, and it’s something that not everyone needs, because this combination of iframe content or embedded content is kind of rare. But, for some sites, it just makes sense to do it like that.
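For illustration, here is a hypothetical sketch of the embed page (the document loaded inside the iframe) carrying that combination of rules. The file name, video path, and the exact meta tag name are assumptions; Google’s robots meta tag documentation describes this combination for Google’s crawler, so check the current reference for the precise supported syntax.

```tsx
// Hypothetical embed page (the HTML document loaded inside the <iframe>).
// It combines noindex with indexifembedded so the standalone URL stays out of
// Search while its content can still be indexed as part of the embedding page.
import Head from "next/head";

export default function VideoEmbedPage() {
  return (
    <>
      <Head>
        {/* Assumed syntax based on Google's published examples; verify the
            rule names against the current robots meta tag reference. */}
        <meta name="googlebot" content="noindex, indexifembedded" />
      </Head>
      <video src="/media/clip.mp4" controls />
    </>
  );
}
```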
Is it a problem if I can’t get listed in the HSTS preload list?
Q: (21:15) Another question about HTTPS, maybe. I have a question around preloading SSL via HSTS. We are running into an issue when implementing HSTS and getting onto the Google Chrome preload list. And the question goes on with a lot of details, but essentially: what should we watch out for?
- (21:40) So maybe take a step back. When you have an HTTPS version and an HTTP version of your pages, usually you would redirect from the HTTP version to HTTPS, and the HTTPS version would then be the secure version, because it has all of the properties of the secure URLs, whereas the HTTP version would be the open one, a little bit vulnerable. If you have this redirect, theoretically an attacker could take that into account and mess with that redirect. With HSTS, you’re telling the browser that once it has seen this redirect, it should always expect it, and it shouldn’t even try the HTTP version of that URL. For users, that has the advantage that nobody even goes to the HTTP version of that page anymore, making it a little more secure.

The preload list for Google Chrome is a static list that is included, I believe, in Chrome, probably with all of the updates, or I don’t know if it’s downloaded separately, I’m not completely sure. But, essentially, it’s a list of all of the sites where we have confirmed that HSTS is set up properly and that the redirect to the secure page exists, so that no user ever needs to go to the HTTP version of the page, which makes it a little bit more secure. From a practical point of view, this difference is very minimal, and I would expect that most sites on the internet just use HTTPS without worrying about the preload list. Setting up HSTS is always a good practice, and it’s something that you can do on your server. As soon as a user sees that header, their Chrome version keeps it in mind automatically anyway. So, in general, I think using the preload list is a good idea if you can do that. But if there are practical reasons why that isn’t feasible or possible, then, purely looking at the SEO side of things, I would not worry about it.

When it comes to SEO, for Google, what matters is essentially the URL that is picked as the canonical. And, for that, it doesn’t need HSTS, and it doesn’t need the preload list; those don’t affect how we pick the canonical at all. Rather, for the canonical, the important part is that we see the redirect from HTTP to HTTPS, and that we get confirmation within your website, through the sitemap file, the internal linking, all of that, that the HTTPS version is the one that should be used in Search. And if we use the HTTPS version in Search, that automatically gets all of those subtle ranking bonuses from Search. The preload list and HSTS are not necessary there. So that’s the part that I would focus on.
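As a rough illustration of the redirect-plus-HSTS setup described above (not anything specific to the site in the question), here is a minimal Node/TypeScript sketch. The hostname, max-age, and the decision to include the preload token are placeholders; submitting the site to hstspreload.org is a separate, optional step.

```typescript
// Minimal sketch with Node's built-in http module (hostname and max-age are
// placeholders): permanently redirect HTTP to HTTPS, and send an HSTS header
// on HTTPS responses so browsers remember to skip the insecure version.
import { createServer, ServerResponse } from "node:http";

// Port 80: redirect every request to the HTTPS version of the same URL.
createServer((req, res) => {
  res.writeHead(301, { Location: `https://example.com${req.url ?? "/"}` });
  res.end();
}).listen(80);

// On the HTTPS side (terminated here or at a proxy/CDN), add the HSTS header.
// The "preload" token only matters if you also submit the site to the Chrome
// preload list; HSTS itself works without it.
export function addHstsHeader(res: ServerResponse): void {
  res.setHeader(
    "Strict-Transport-Security",
    "max-age=31536000; includeSubDomains; preload"
  );
}
```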
How can I analyse why my site dropped in ranking for its brand name?
Q: (25:05) I don’t really have a great answer here, but I think it’s important to at least mention it: what are the possible steps for investigation if website owners find that their website is not ranking for its brand term anymore, they’ve checked all of the usual things, and it doesn’t seem to be related to any of them?
- (25:24) So, from my point of view, I would primarily go to the Search Console or Search Central Help Community and post all of your details there, because this is where these kinds of escalations go and where the product experts in the Help forum can take a look. They can give you a little bit more information, and they can also give you their personal opinion on some of these topics, which might not match 100% what Google would say, but maybe it’s a little bit more practical. For example, and this is probably not relevant to this site, you might post something and say, well, my site is technically correct, and post all of your details, and one of the product experts looks at it and says: it might be technically correct, but it’s still a terrible website, and you need to get your act together and create better content. From our side, we would focus on the technical correctness, and you need someone to give you that, I don’t know, personal feedback.

But anyway, in the Help forums, if you post the details of your website with everything that you’ve seen, the product experts are often able to take a look and give you advice specific to your website and the situation it’s in. And if they’re not able to figure out what is happening, they have the ability to escalate these kinds of topics to the community manager of the Help forums, and the community manager can bring things back to the Google Search team. So if something really weird is going on, and now and then something really weird does happen with regards to Search (it’s a complex computer system; anything can break), the community managers and the product experts can bring that back to the Search team. Then we can look to see if there is something that we need to fix, or something that we need to tell the site owner, or whether this is just the way that it is, which, sometimes, it is. But that’s generally the direction I would go with these questions.

The other thing subtly mentioned here is that the site does not rank for its brand name. One of the things to watch out for, especially with regards to brand names, is that it can happen that you say something is your brand name, but it’s not a recognised term for users. For example, you might, I don’t know, call your website bestcomputermouse.com, and for you, that might be what you call your business or your website: Best Computer Mouse. But when a user goes to Google and enters “best computer mouse,” that doesn’t necessarily mean they want to go directly to your website; it might be that they’re simply looking for a computer mouse. In cases like that, there might be a mismatch between what we show in the search results and what you think should be shown for those queries, if it’s more of a generic term. These kinds of things also play into the search results overall. The product experts see this all the time, as well, and they can recognise it and say: just because you call your website bestcomputermouse.com (I hope that site doesn’t exist), that doesn’t necessarily mean it will always show on top of the search results when someone enters that query. So that’s something to watch out for. But, in general, I would go to the Help forums here and include all of the information you know that might play a role.
So even if there was a manual action involved and you’re kind of, I don’t know, ashamed of that, which is normal, include that information as well. All of this information helps the product experts better understand your situation and give you something actionable to take as a next step, or at least to understand the situation a little bit better. The more information you can give them from the beginning, the more likely they’ll be able to help you with your problem.
Sign up for our Webmaster Hangouts today!