If you follow me on Twitter, then you know I sometimes complain about the current state of the industry – most notably centered around what passes for research and discussion these days. It feels like people want to be handed the fish – with little interest in learning how the person with the fish caught it. Seeking out debates and experience seems to have been replaced by wanting to be spoon-fed blog posts – often laced with assumptions and misinformation hidden within a single-case graph or a slick graphic coupled with an impressive looking byline.
Separating fact from fiction becomes increasingly hard as the next generation of our industry raises themselves up by following rather than exploring.
At SMX West, Google engineer Paul Haahr gave a presentation offering some insight into how Google works. In my opinion, it's the most transparent and useful information we've gotten from Google in years.
I spent this morning taking a full set of notes on it – which I always do with something like this because I believe it helps me better retain the information. Within my notes, I make notations about questions I have, theories I come up with and conclusions I draw – right or wrong. This isn't the sexy work, but it's the necessary work to be a formidable opponent in the game.
As I looked at the notes, I realized I missed the discussions, debates, and sharing of experience that used to surround analyzing information like this.
Some of what I feel limits that kind of discussion these days is needing to be seen as an infallible expert on all things Google. The industry has become so consumed with being an expert that it's afraid to ask questions or challenge assumptions for fear of being proven wrong. Unlike many of the names within this industry, I'm not afraid to be wrong. I welcome it. Being proven wrong means I have one more piece of concrete knowledge needed to win the game. Being questioned or challenged on a theory I hold gives me another theory to test.
So I'm publishing my notes – and personal notations – and am making a call for a real, exploratory – fuck it if I'm right or wrong – search discussion. Whether you're old school with massive experience or new school with untested theories and ideas – bring it to the table and let's see what we can all walk away from it with.
Notes from the How Google Works presentation – SMX West 16
To be clear, these are my notes from the presentation and not a transcription (notations in orange are comments made by me and not the speaker).
General opening remarks
- Google is all about the mobile first web
- Your location matters a lot when searching on mobile
- Auto complete plays bigger role
- His presentation centers mostly around classic search
Life of a query
Timestamp: 3:38 – Link to timestamp
Haahr frames this next bit of information as a 20-minute, secret-sauce-stripped version of the half-day class attended by every new Google engineer.
He starts by explaining the two main parts of the search engine:
1. What happens ahead of time (before query):
- Analyzing crawled pages: links, render contents, annotating semantics
- Build the index: think of it like the index of a book
- Made up of Shards. Shards segment groups of millions of pages.
- There are thousands of Shards in the Google index
- Per-document metadata
2. And query processing:
- Query understanding – What does the query mean: are there known entities? Useful synonyms? Specifies that context matters for queries.
- Retrieval and scoring
- Send the query to all the shards
- Find matching pages within each Shard
- Compute a score for A. the query (relevance) and B. the page (quality)
- Send back the top pages from each Shard by score
- Combine all the top pages from each Shard
- Sort combined top Shard results by score
- Post-retrieval adjustments
- Host clustering (notation – does this mean using a dedicated server can be a bonus? Should you check shared hosts for sites with similar topics? Confirms need for separate hosts for networked or related sites? This has been clarified by a former Googler – see this comment for more detail. The tl;dr is that host clustering is synonymous with domain clustering (the more widely used term in the industry) and site clustering, and does not refer to host as in hosting.)
- Are sitelinks appropriate?
- Is there too much duplication?
- Spam demotions and manual actions get applied
- Snippets get pulled
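To make the retrieval-and-scoring flow above concrete, here is a minimal sketch of the scatter-gather pattern Haahr describes: the query goes to every shard, each shard scores its own matches and returns its top pages, and those are combined and re-sorted by score. All names and the scoring function here are mine for illustration – they're not Google's.

```python
# Illustrative sketch of sharded retrieval: query every shard, take each
# shard's top pages by score, then combine and re-sort. Not Google's code.

class Shard:
    """One segment of the index, holding a group of pages."""
    def __init__(self, pages):
        self.pages = pages  # {url: page text}

    def find_matching(self, query):
        return [(url, text) for url, text in self.pages.items()
                if query in text]

def score(query, text):
    # Toy stand-in for the real combined relevance + quality scoring.
    return text.count(query)

def search(query, shards, k=3):
    candidates = []
    for shard in shards:  # "send the query to all the shards"
        matches = shard.find_matching(query)
        matches.sort(key=lambda m: score(query, m[1]), reverse=True)
        candidates.extend(matches[:k])  # top pages from each shard
    candidates.sort(key=lambda m: score(query, m[1]), reverse=True)
    return [url for url, _ in candidates[:k]]  # combined, sorted top results
```

The point of the sketch is the shape of the system – per-shard top-k followed by a global merge – not the scoring itself, which the rest of the talk is about.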
What engineers do
Timestamp: 8:49 – Link to timestamp
- Write code
- Write formulas to compute scoring numbers to find the best match between a query and a page based on scoring signals
- Query independent scoring factors – Features of the page alone, like PageRank, language, mobile friendliness
- Query dependent scoring factors – Features of the page and the query, such as keyword hits, synonyms, proximity, etc. (notation – in relation to the proximity of the keyword within the page or of the user locale or of the site's presumed locale?)
- Combine signals to produce new algorithms or filters and improve results
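As a toy illustration of those two signal families, a combined score might look like the sketch below. The specific features, formulas, and weights are my own invention – nothing here is disclosed by Haahr beyond the split into query-independent and query-dependent factors.

```python
# Hypothetical combination of query-independent and query-dependent
# signals. Features and weights are invented for illustration only.

def query_independent_score(page):
    # Features of the page alone, e.g. PageRank-style authority and
    # mobile friendliness (both assumed pre-computed, before the query).
    return 0.7 * page["authority"] + 0.3 * page["mobile_friendly"]

def query_dependent_score(query, page):
    # Features of the page *and* the query, e.g. keyword hits.
    terms = query.lower().split()
    return sum(page["text"].lower().count(t) for t in terms) / len(terms)

def combined_score(query, page, w=0.5):
    # "Combine signals" – here just a weighted sum of the two families.
    return (w * query_independent_score(page)
            + (1 - w) * query_dependent_score(query, page))
```

The useful takeaway is structural: one family of numbers can be computed ahead of time per page, the other only once the query arrives.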
Key metrics for rankings
Timestamp: 10:10 – Link to timestamp
- Relevance – Does the page answer the user query in context – this is the front of the line metric
- Quality – How good are the results they show in regard to answering the user query? How good are the individual pages? (notation – Emphasis on individual is mine)
- Time to result (faster is better) (notation – Time for site to render? Or for the user to be able to find the answer on the ranking page? Or a combination? Site render time could be a subfactor of time for the user to be able to find the answer on the ranking page? Edit > Asked Haahr for clarification on Twitter – he is unable to elaborate. However, there is some probable elaboration found via Amit Singhal in this comment.)
- More metrics not listed
- Offers that he “should mention” the metrics are based on looking at the SERP as a whole and not at one result at a time.
- Uses the convention that higher results matter
- Positions are weighted
- Reciprocally ranked metrics
- Position 1 is worth the most, position 2 is worth half of what number 1 is, position 3 is worth 1/3 of number 1, etc. (notation – The premise of reciprocally ranked metrics went over my head and I welcome simplified clarifications on what he's talking about here.)
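As best I can tell, the reciprocal weighting works like this – a minimal sketch of my own interpretation, not Google's actual formula: a result's contribution to a whole-SERP metric is divided by its position, so position 1 gets weight 1, position 2 gets 1/2, position 3 gets 1/3, and so on.

```python
# My interpretation of reciprocally ranked metrics: each result's rating
# is weighted by 1/position, so improvements near the top of the SERP
# move the metric far more than the same improvement lower down.
# The per-result scores are hypothetical ratings, not real data.

def reciprocal_rank_metric(result_scores):
    """result_scores[i] is a rating for the result at position i + 1."""
    return sum(s / (i + 1) for i, s in enumerate(result_scores))

# Moving a good result (rated 1.0) from position 3 up to position 1
# triples its contribution to the SERP metric: 1/3 -> 1.0.
```

If that reading is right, it explains why the metrics reward pushing good pages toward the top rather than merely getting them onto page one.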
Timestamp: 12:00 – Link to timestamp
Metric optimization ideas and strategies are developed through an internal evaluation process that analyzes results from various experiments:
Timestamp: 12:33 – Link to timestamp
- Split testing experiments on real traffic
- Looking for changes in click patterns (notation – There has been a long-time debate as to whether click through rates are counted or taken into account in the rankings. I took his comments here to mean that he is asserting that click through rates are analyzed from a perspective of the quality of the SERP as a whole and judging context for the query vs. benefitting a specific site getting more clicks. Whether or not I agree with that I'm still arguing internally about.)
- Google runs a lot of experiments
- Almost all queries are in at least one live experiment
- Example experiment – Google tested 41 hues of blue for their result links trying to determine which one performed best
Example given for interpreting live experiments: Page 1 vs. Page 2
- Both pages P1 and P2 answer the user's need
- For P1 the answer only appears on the page
- For P2 the answer appears both on the page and in the snippet (pulled by the snippeting algorithm – resource on the snippet algorithm)
- Algorithm A puts P1 before P2; the user clicks on P1; from an algorithmic standpoint this looks like a “good” result in their live experiment analysis
- Algorithm B puts P2 before P1; but no click is generated because the user sees answer in the snippet; purely from an algorithmic standpoint this looks like a “bad” result
But in that scenario, was Algorithm A better than Algorithm B? The second scenario should be a “good” result because the user got a good answer – faster – from the snippet. But it's hard for the algorithm to evaluate whether the user left the SERP because the answer they needed wasn't there or because they got their answer from the snippet.
This scenario is one of the reasons they also use human quality-raters.
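A toy simulation makes the trap concrete – all data here is invented, purely to show why a naive click metric misreads the snippet case:

```python
# Toy illustration of the evaluation trap above: a naive click-through
# metric scores the snippet-answering ranking as worse, even though its
# users were satisfied faster. All sessions are invented.

def click_rate(sessions):
    # Naive live-experiment metric: fraction of sessions with a click.
    return sum(1 for s in sessions if s["clicked"]) / len(sessions)

# Algorithm A ranks P1 first: the answer is only on the page, so every
# satisfied user clicks.
algo_a_sessions = [{"clicked": True, "satisfied": True}] * 10
# Algorithm B ranks P2 first: the answer appears in the snippet, so
# satisfied users leave without clicking at all.
algo_b_sessions = [{"clicked": False, "satisfied": True}] * 10

# click_rate(algo_a_sessions) -> 1.0; click_rate(algo_b_sessions) -> 0.0,
# yet every user in both groups got their answer – hence human raters.
```

Click data alone can't distinguish "left satisfied via the snippet" from "left because the results were bad", which is exactly the gap the rater program fills.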
Human quality-rater experiments
Timestamp: 15:21 – Link to timestamp
- Show real people experimental search results
- Ask them to rate how good the results are
- Human ratings averaged across raters
- Published guidelines explaining criteria for quality-raters to use when rating a site
- Tools support doing this in an automated way
- States they do human quality-rater experiments on large query sets to obtain statistical significance and cites the process as being similar to Mechanical Turk-like processes
- Mentions that the published rater guidelines are Google's intentions for the types of results they want to produce (notation – this is very different from a user rating a query based on personal satisfaction – instead, raters are told to identify whether the results for the query meet Google's satisfaction requirements and include the kind of results Google believes should be included – or not included. The quality rater guidelines are the results produced by the Google dream algorithm.)
- He says if you're ever wondering why Google is doing something, it is most often them trying to make their results look more like the rater guidelines. (notation – Haahr reiterated to me on Twitter how important he believes reading the guidelines is for SEOs.)
- Slides showing human rater tools: slides 33, 34
- Re mobile first – more mobile queries in samples (2x)
- Raters are told to pay attention to the user's location when assessing results
- Tools display mobile user experience
- Raters visit websites on smartphones, not on a desktop computer
Timestamp: 19:04 – Link to timestamp
Are the needs as defined by Google met?
- Instructions tell raters to think about mobile user needs and think about how satisfying the result is for mobile user
- Rater scales include: fully meets, highly meets, moderately meets, slightly meets, fails to meet
- Slider bars are available to further sub classify a “meets” level
- Example: a result can be classified highly meets and the slider bar allows the rater to subclassify that “highly meets” result as very highly meets, more highly meets, etc.
- There are two sliders for rating results – one for the “needs met” (relevancy) rating and one for the “page quality”
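One way to picture how the labels, sliders, and cross-rater averaging might combine numerically – the numeric scale below is entirely my own invention, used only to make the mechanics concrete:

```python
# Hypothetical numeric encoding of the "needs met" labels plus the
# slider sub-classification, averaged across raters as described.
# The numbers assigned to each label are my own invention.

NEEDS_MET = {
    "fails to meet": 0,
    "slightly meets": 1,
    "moderately meets": 2,
    "highly meets": 3,
    "fully meets": 4,
}

def rating_value(label, slider=0.0):
    # slider in [0, 1) nudges a rating within its "meets" band,
    # e.g. "very highly meets" vs. plain "highly meets".
    return NEEDS_MET[label] + slider

def average_rating(ratings):
    """ratings: list of (label, slider) pairs, one per rater."""
    return sum(rating_value(l, s) for l, s in ratings) / len(ratings)
```

However Google actually encodes it, the mechanism Haahr describes is the same shape: a coarse label, a fine-grained slider within that label, and an average across many raters.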
- Examples of fully meets in slides – slide 41:
- Query CNN – cnn.com result – fully meets
- Search for yelp and you have yelp app installed on phone so google will serve the app – fully meets
- To be rated fully meets, they want an unambiguous query and a result that wholly satisfies the user's needs for that query
- Examples of highly meets in slides – slides 42 – 44 showing varying subclassifications of highly meets queries
- Informational query and the result is a great source of information
- Site is authoritative
- Author has expertise on the topic being discussed
- Comprehensive for the query in question
- Showing pictures where the user is likely looking for pictures
- Examples of moderately meets in slides – slide 45
- Result has good information
- Interesting and useful information, though not all encompassing for the query or super authoritative
- Not worthy of being a number one answer, but might be good to have on the first page of results
- Slightly meets
- Result contains less good information
- Example: a search for Honda Odyssey might bring up the page for the 2010 Odyssey on KBB. It slightly meets because the topic is correct and there is good information, but the ranking page is outdated. The user didn't specify the 2010 model, so the user is likely looking for newer models. He cites this result as “acceptable but not great”
- Fails to meet
- Example: A search for german cars that returns the Subaru website (Subarus are manufactured in Japan)
- Example: A search for rodent removal company brings up a result half a world away (notation – They want to geo-locate specific query types that are likely to be geo-centric in need – ex. Local service businesses. Using quality raters can help them identify what these service types are and add to the standard geo-need list like plumbers, electricians, etc.)
Assessing page quality:
Timestamp: 23:58 – Link to timestamp
The three most important concepts for quality:
- Is the author an expert on the topic?
- Is the webpage authoritative about the topic?
- Can you trust it?
- Gives example categories where trustworthiness would be most important to assessing the overall page quality – medical, financial, buying a product
The rating scale is high quality to low quality:
- Does the page exhibit signals of high quality as defined in part by:
- Satisfying amount of high quality main content
- The website shows expertise, authority and trustworthiness for the topic of the page
- The website has a good reputation for the topic of the page
- Does the page exhibit signals of low quality as defined in part by:
- The quality of content is low
- Unsatisfactory amount of main content
- Author does not have expertise or is not authoritative or trustworthy on the topic – “on the topic” is bolded in his presentation (notation – The concept behind Author rank lives on, in my opinion. We are the ones who taught them how to connect the dots with Authorship markup. They can no doubt now do this algorithmically and no longer need us manually connecting those dots.)
- The website has an explicit negative reputation
- The secondary content is unhelpful – ads, etc. (notation – Human input giving them a roadmap to how they're calculating and shaping the Above the Fold algorithm? Likely also refers to the affiliate notations in search rater guidelines starting on page 10 of the Google quality rater guidelines.)
Optimizing the metrics – the experiments
Timestamp: 25:28 – Link to timestamp
- Someone has an idea for how to improve the results via metrics and signals or solve a problem in the results
- Repeat development of and testing on the idea until the feature is ready: code, data, experiments, analyzing results of experiments – which can take weeks or months
- If the idea pans out, some final experiments are run and a launch report is written and undergoes a quantitative analysis
- He feels this process is objective because the review comes from outside the team that was working on – and is emotionally invested in – the idea
- Launch review process is held
- Every Thursday morning there is a meeting where the leads in the area hear about project ideas, summaries, reports or experiments, etc.
- Debates surround whether it's good for users and for the system architecture, and whether the system can continue to be improved if this change is made. (notation – He makes a reference to them having published a launch review meeting a few years ago. I believe he is referring to this.)
- If approved it goes into production
- Might ship same week
- Sometimes it takes a long time in rewriting code to make it fast enough, clean enough, suitable for their architecture, etc. and can take months
- One time it took almost two years to ship something
The primary goal for all features and experiments is to move pages with good ratings up and pages with bad ratings down. (notation – I believe he means human ratings, but that was not clarified.)
Two of the core problems they face in building the algorithm
Timestamp: 28:50 – Link to timestamp
Systematically bad ratings:
- Gives bad rating example, texas farm fertilizer
- User is looking for a brand of fertilizer
- Showed a 3 pack of local results and a map at the top position
- It's unlikely the user doing the search wants to go to the company's headquarters since the product is sold in local home improvement stores
- But raters on average cited the result with the map of the headquarters as almost highly meets
- Looked successful due to raters ratings
- But in reality they noted what Google describes as a pattern of losses
- In a series of experiments that were increasing the triggering of maps, human raters were rating them highly
- Google disagreed, so they amended their rater guidelines, creating more examples of these queries and explaining that such results should be rated fails to meet – see slide 61 of the presentation
- The new examples told raters that if they didn't think the user would go there, maps are a bad result for the query, citing examples like:
- radio stations
- lottery office
- When Google sees patterns of losses, they look for things that are bad in results and create examples for rater guidelines to correct them
Metrics don't capture things they care about AKA missing metrics
- Shows Salon.com article on slide with the headline Google News Gets Gamed by a Crappy Content Farm
- From 2009–2011 they received lots of complaints about low quality content
- But human ratings were going up
- Sometimes low quality content can be very relevant
- He cites this as an example of what they consider content farms
- They weren't measuring what they needed to
- So they defined an explicit quality metric – which is not the same as relevance – and this is why relevance and quality each have their own sliders for human raters now
- Determined quality is not the same as relevant
- They were able to develop quality signals separate of relevance signals
- Now they can work on improving the definitions of both separately in the algorithm
Quality signals became separate from relevancy signals (notation – Emphasis is mine. I think most of the search industry sees these as one metric, and it is important to emphasize that they are not and have not been for a long while now.)
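A minimal sketch of that separation – all scores and features invented by me: keeping relevance and quality as distinct axes means a page can score high on one and low on the other, which is exactly the content-farm failure mode above.

```python
# Illustrative only: relevance and quality as two separate scores, so a
# page can be highly relevant yet low quality (the content-farm case).
# All page data and the "expertise" field are invented.

def relevance(query, page):
    # Does the page answer this query at all?
    return 1.0 if query.lower() in page["text"].lower() else 0.0

def quality(page):
    # Separate axis: a stand-in for expertise/authority/trust.
    return page["expertise"]

content_farm = {"text": "How to tie a tie fast", "expertise": 0.1}
expert_page  = {"text": "How to tie a tie properly", "expertise": 0.9}

# Both pages are equally relevant to "tie a tie" (both score 1.0),
# but they differ sharply on the quality axis.
```

With one blended metric the content farm looks fine; with two separate signals it can be demoted on quality without pretending it isn't relevant.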
So what now?
Contribute. What insights did you take away from the presentation? What were your thoughts on the things I notated? Were there things I didn't notate that you have a comment on or had a theory spurred from? Do you disagree with any of Haahr's assertions? Do you disagree with mine? Did anything in his presentation surprise you? Did anything get confirmed for you? Whatever thoughts you had on his presentation, drop them in the comments below.