John Mueller from Google gave one of the clearest and easiest-to-understand explanations of how Google uses machine learning in web search. He said Google uses it for "specific problems" where automation and machine learning can help improve the outcome. The example he gave was canonicalization, and it clears things up.

This is from the Google webmaster hangout, starting at the 37:47 mark. The example is this: "So, for example, we use machine learning for canonicalization. So what that kind of means is we have all of those factors that we talked about before, and we give them individual weights. That's kind of the traditional way to do it. And we say, well, rel canonical has this much weight and redirect has this much weight and internal linking has this much weight. And the traditional approach would be to say, well, we will just make up those weights, set those numbers, and see if it works out. And if we see that things don't work out, we will tweak those numbers a little bit. And with machine learning, what we can essentially do is say, well, this is the outcome that we want to achieve, and the machine learning algorithms should figure out these weights on their own."

This was the first part of the answer around how Google debugs its search algorithm.

Here is the full transcript of this part.

The question:

Machine learning has been a part of Google's search algorithm, and I can imagine it's getting smarter every day. Do you, as an employee with access to the secret files, know the exact reason why some pages rank better than others, or is the algorithm now making decisions and evolving in a way that makes it impossible for humans to understand?

John's full answer:

We get this question every now and then, and we're not allowed to provide an answer because the machines are telling us not to talk about this topic. So I really can't answer. No, just kidding.

It's something where we use machine learning in lots of ways to help us understand things better. But machine learning isn't just this one black box that does everything for you, like you feed the internet in on one side and search results come out the other side. It's a tool for us. It's essentially a way of testing things out a lot faster, trying things out and figuring out what the right solution there is.

So, for example, we use machine learning for canonicalization. So what that kind of means is we have all of those factors that we talked about before, and we give them individual weights. That's kind of the traditional way to do it. And we say, well, rel canonical has this much weight and redirect has this much weight and internal linking has this much weight. And the traditional approach would be to say, well, we will just make up those weights, set those numbers, and see if it works out. And if we see that things don't work out, we will tweak those numbers a little bit. And with machine learning, what we can essentially do is say, well, this is the outcome that we want to achieve, and the machine learning algorithms should figure out these weights on their own.

So it's not so much that machine learning does everything with canonicalization on its own, but rather it has this well-defined problem. It's working out, like, what are these numbers that we should have there as weights, and kind of repeatedly trying to relearn that system and understanding, like, on the web this is how people do it and this is where things go wrong, and that's why we should choose these numbers.

So when it comes to debugging that, we still have those numbers, we still have those weights there. It's just that they're determined by machine learning algorithms. And if we see that things go wrong, then we need to find a way, like, how could we tell the machine learning algorithm, actually, in this case we should have taken into account, I don't know, phone numbers on a page more, rather than just the pure content, to kind of separate, like, local versions for example. And that's something that we can do when we train these algorithms.

So with all of these machine learning things, it's not that there's one black box and it just does everything and nobody knows why it does things. But rather, we try to apply it to specific problems where it makes sense to automate things a little bit, in a way that saves us time and helps to pull out patterns that maybe we wouldn't have recognized manually if we looked at it.
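To make the weight-learning idea concrete, here is a minimal, purely illustrative sketch (not Google's system): given labeled examples of which URL turned out to be the canonical one, a simple logistic-regression loop learns weights for signals like rel=canonical, redirects and internal linking instead of having a human hand-tune them. The signal names and training data here are invented.

```python
import math

# Hypothetical canonicalization signals; Google's real signal set is not public.
SIGNALS = ["rel_canonical", "redirect", "internal_linking"]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=500, lr=0.5):
    """examples: list of (signal_vector, label) pairs, label 1 = correct canonical."""
    weights = [0.0] * len(SIGNALS)
    for _ in range(epochs):
        for x, y in examples:
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            err = y - pred
            # Nudge each weight toward the desired outcome.
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
    return weights

# Invented training data: rel=canonical agreement predicts the right outcome.
data = [
    ([1, 1, 1], 1), ([1, 0, 1], 1), ([1, 1, 0], 1),
    ([0, 0, 1], 0), ([0, 1, 0], 0), ([0, 0, 0], 0),
]
weights = train(data)
```

With this toy data the learned weight for rel=canonical dominates the others, which is exactly the "figure out these weights on their own" behavior Mueller describes, just at a trivial scale.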


Here is how Glenn Gabe summed it up on Twitter:

Glenn Gabe @glenngabe

More from @johnmu: Machine learning helps us pull out patterns we might have missed. And for debugging, Google can see those weights which are determined by ML algos. If there is something that needs to be improved, Google can work to train the algorithms: https://www.youtube.com/watch?v=5QxYWMEZT3A&t=38m53s 

[Source: This article was published in seroundtable.com By Barry Schwartz - Uploaded by the Association Member: Robert Hensonw]

Categorized in Search Engine

Source: This article was published internetofbusiness.com By Malek Murison - Contributed by Member: Carol R. Venuti

Facebook has announced a raft of measures to prevent the spread of false information on its platform.

Writing in a company blog post on Friday, product manager Tessa Lyons said that Facebook’s fight against fake news has been ongoing through a combination of technology and human review.

However, she also wrote that, given the determination of people seeking to abuse the social network’s algorithms for political and other gains, “This effort will never be finished and we have a lot more to do.”

Lyons went on to announce several updates and enhancements as part of Facebook’s battle to control the veracity of content on its platform. New measures include expanding its fact-checking programme to new countries and developing systems to monitor the authenticity of photos and videos.

Both are significant in the wake of the Cambridge Analytica fiasco. While fake news stories are widely acknowledged or alleged to exist on either side of the left/right political divide, concerns are also growing about the fast-emerging ability to fake videos.

Meanwhile, numerous reports surfaced last year documenting the problem of teenagers in Macedonia producing some of the most successful viral pro-Trump content during the US presidential election.

Other measures outlined by Lyons include increasing the impact of fact-checking, taking action against repeat offenders, and extending partnerships with academic institutions to improve fact-checking results.

Machine learning to improve fact-checking

Facebook already applies machine learning algorithms to detect sensitive content. Though fallible, this software goes a long way toward ensuring that photos and videos containing violence and sexual content are flagged and removed as swiftly as possible.

Now, the company is set to use similar technologies to identify false news and take action on a bigger scale.

In part, that’s because Facebook has become a victim of its own success. With close to two billion registered users, one billion regularly active ones, and over a billion pieces of content posted every day, it’s impossible for human fact-checkers to review stories on an individual basis without Facebook employing vast teams of people to monitor citizen behavior.

Lyons explained how machine learning is being used, not only to detect false stories but also to detect duplicates of stories that have already been classed as false. “Machine learning helps us identify duplicates of debunked stories,” she wrote.

“For example, a fact-checker in France debunked the claim that you can save a person having a stroke by using a needle to prick their finger and draw blood. This allowed us to identify over 20 domains and over 1,400 links spreading that same claim.”
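As an illustration of the duplicate-detection idea (this is not Facebook's actual system), near-duplicates of an already-debunked claim can be flagged by comparing word-shingle overlap between texts; the claim text and threshold below are just examples.

```python
def shingles(text, n=3):
    """Break text into overlapping n-word 'shingles'."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Overlap between two shingle sets, 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

# The stroke/needle claim mentioned in the article, used as the debunked reference.
DEBUNKED = "you can save a person having a stroke by using a needle to prick their finger"

def is_duplicate(candidate, threshold=0.5):
    return jaccard(shingles(DEBUNKED), shingles(candidate)) >= threshold
```

Real systems use far more robust representations (hashing, embeddings), but the principle is the same: one fact-check, applied automatically to every sufficiently similar retelling.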

The big-picture challenge, of course, is that real science is constantly advancing alongside pseudoscience, and new or competing theories constantly emerge, while others are still being tested.

Facebook is also working on technology that can sift through the metadata of published images to check their background information against the context in which they are used. This is because while fake news is a widely known problem, the cynical deployment of genuine content, such as photos, in false or deceptive contexts can be a more insidious problem.

Machine learning is also being deployed to recognise where false claims may be emanating from. Facebook filters are now actively attempting to predict which pages are more likely to share false content, based on the profile of page administrators, the behavior of the page, and its geographical location.
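A toy sketch of how such page-level signals might combine into a risk score. All feature names, weights and thresholds here are invented for illustration; Facebook has not published its model.

```python
# Hypothetical page-risk scorer combining the kinds of signals the article
# mentions: admin location vs. audience, page behavior, and past offenses.
def risk_score(page):
    score = 0
    if page["admin_country"] != page["audience_country"]:
        score += 2  # admins posting to users on the other side of the world
    if page["account_age_days"] < 30:
        score += 1  # very new page
    score += 3 * page["prior_false_ratings"]  # repeat offenders weigh heavily
    return score

def needs_review(page, threshold=3):
    """Flag the page for fact-checker attention when the score is high."""
    return risk_score(page) >= threshold
```

A production system would learn these weights from labeled data rather than hard-code them, but the shape of the decision, features in, review flag out, is the same.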

Internet of Business says

Facebook’s moves are welcome and, many would argue, long overdue. However, in a world of conspiracy theories – many spun on social media – it’s inevitable that some will see the evidenced, fact-checked flagging-up of false content as itself being indicative of bias or media manipulation.

In a sense, Facebook is engaged in an age-old battle, belief versus evidence, which is now spreading into more and more areas of our lives. Experts are now routinely vilified by politicians, even as we still trust experts to keep planes in the sky, feed us, teach us, clothe us, treat our illnesses, and power our homes.

Many false stories are posted on social platforms to generate clicks and advertising revenues through controversy – hardly a revelation. However, red flags can automatically be raised when, for example, page admins live in one country but post content to users on the other side of the world.

“These admins often have suspicious accounts that are not fake, but are identified in our system as having suspicious activity,” Lyons told Buzzfeed.

An excellent point. But some media magnates also live on the other side of the world, including – for anyone outside of the US – Mark Zuckerberg.

Categorized in Social

Source: This article was published phys.org - Contributed by Member: Logan Hochstetler

As scientific datasets increase in both size and complexity, the ability to label, filter and search this deluge of information has become a laborious, time-consuming and sometimes impossible task, without the help of automated tools.

With this in mind, a team of researchers from Lawrence Berkeley National Laboratory (Berkeley Lab) and UC Berkeley is developing innovative machine learning tools to pull contextual information from scientific datasets and automatically generate metadata tags for each file. Scientists can then search these files via a web-based search engine for scientific data, called Science Search, that the Berkeley team is building.

As a proof-of-concept, the team is working with staff at the Department of Energy's (DOE) Molecular Foundry, located at Berkeley Lab, to demonstrate the concepts of Science Search on the images captured by the facility's instruments. A beta version of the platform has been made available to Foundry researchers.

"A tool like Science Search has the potential to revolutionize our research," says Colin Ophus, a Molecular Foundry research scientist within the National Center for Electron Microscopy (NCEM) and Science Search Collaborator. "We are a taxpayer-funded National User Facility, and we would like to make all of the data widely available, rather than the small number of images chosen for publication. However, today, most of the data that is collected here only really gets looked at by a handful of people—the data producers, including the PI (principal investigator), their postdocs or graduate students—because there is currently no easy way to sift through and share the data. By making this raw data easily searchable and shareable, via the Internet, Science Search could open this reservoir of 'dark data' to all scientists and maximize our facility's scientific impact."

The Challenges of Searching Science Data

Today, search engines are ubiquitously used to find information on the Internet, but searching scientific data presents a different set of challenges. For example, Google's algorithm relies on more than 200 clues to achieve an effective search. These clues can come in the form of keywords on a webpage, metadata in images or audience feedback from billions of people when they click on the information they are looking for. In contrast, scientific data comes in many forms that are radically different from an average web page, requires context that is specific to the science, and often lacks the metadata needed to provide that context for effective searches.

At National User Facilities like the Molecular Foundry, researchers from all over the world apply for time and then travel to Berkeley to use extremely specialized instruments free of charge. Ophus notes that the current cameras on microscopes at the Foundry can collect up to a terabyte of data in under 10 minutes. Users then need to manually sift through this data to find quality images with "good resolution" and save that information on a secure shared file system, like Dropbox, or on an external hard drive that they eventually take home with them to analyze.

Oftentimes, the researchers that come to the Molecular Foundry only have a couple of days to collect their data. Because it is very tedious and time-consuming to manually add notes to terabytes of scientific data and there is no standard for doing it, most researchers just type shorthand descriptions in the filename. This might make sense to the person saving the file but often doesn't make much sense to anyone else.

"The lack of real metadata labels eventually causes problems when the scientist tries to find the data later or attempts to share it with others," says Lavanya Ramakrishnan, a staff scientist in Berkeley Lab's Computational Research Division (CRD) and co-principal investigator of the Science Search project. "But with machine-learning techniques, we can have computers help with what is laborious for the users, including adding tags to the data. Then we can use those tags to effectively search the data."

To address the metadata issue, the Berkeley Lab team uses machine-learning techniques to mine the "science ecosystem"—including instrument timestamps, facility user logs, scientific proposals, publications and file system structures—for contextual information. The collective information from these sources, including the timestamp of the experiment, notes about the resolution and filter used, and the user's request for time, all provides critical context. The Berkeley Lab team has put together an innovative software stack that uses machine-learning techniques, including natural language processing, to pull contextual keywords about the scientific experiment and automatically create metadata tags for the data.
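A minimal sketch of the keyword-extraction step (not Berkeley Lab's actual software stack): candidate metadata tags for a file can be ranked by TF-IDF, so words that are distinctive for this file relative to other documents from the surrounding ecosystem score highest. The example documents below are invented.

```python
import math
from collections import Counter

def tfidf_keywords(doc, corpus, top_k=3):
    """Rank words in `doc` by TF-IDF against a corpus of ecosystem documents."""
    words = doc.lower().split()
    tf = Counter(words)
    n_docs = len(corpus)
    scores = {}
    for w, count in tf.items():
        # Document frequency: how many ecosystem documents contain this word.
        df = sum(1 for d in corpus if w in d.lower().split())
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[w] = (count / len(words)) * idf
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]
```

Run against a handful of invented proposal/log snippets, common instrument words score low while the sample-specific term rises to the top, which is the behavior a metadata tagger wants.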

For the proof-of-concept, Ophus shared with the Science Search team data from the Molecular Foundry's TEAM 1 electron microscope at NCEM that had recently been collected by the facility staff. He also volunteered to label a few thousand images to give the machine-learning tools some labels from which to start learning. While this is a good start, Science Search co-principal investigator Gunther Weber notes that most successful machine-learning applications typically require significantly more data and feedback to deliver better results. For example, in the case of search engines like Google, Weber notes that training datasets are created and machine-learning techniques are validated when billions of people around the world verify their identity by clicking on all the images with street signs or storefronts after typing in their passwords, or on Facebook when they tag their friends in an image.

Berkeley Lab researchers use machine learning to search science data
This screen capture of the Science Search interface shows how users can easily validate metadata tags that have been generated via machine learning or add information that hasn't already been captured. Credit: Gonzalo Rodrigo, Berkeley Lab

"In the case of science data only a handful of domain experts can create training sets and validate machine-learning techniques, so one of the big ongoing problems we face is an extremely small number of training sets," says Weber, who is also a staff scientist in Berkeley Lab's CRD.

To overcome this challenge, the Berkeley Lab researchers used transfer learning to limit the degrees of freedom, or parameter counts, on their convolutional neural networks (CNNs). Transfer learning is a machine learning method in which a model developed for one task is reused as the starting point for a model on a second task, which allows the user to get more accurate results from a smaller training set. In the case of the TEAM I microscope, the data produced contains information about which operation mode the instrument was in at the time of collection. With that information, Weber was able to train the neural network on that classification so it could generate the operation-mode label automatically. He then froze that convolutional layer of the network, which meant he only had to retrain the densely connected layers. This approach effectively reduces the number of parameters in the CNN, allowing the team to get meaningful results from their limited training data.
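The parameter-count argument can be sketched schematically. The layer shapes below are invented, not the TEAM I network's, but they show how freezing the convolutional layers shrinks the trainable-parameter budget that a small labeled set has to cover.

```python
# Invented CNN layer inventory: (name, parameter count, kind).
# Conv layers: kernel_h * kernel_w * in_channels * out_channels + biases.
# Dense layers: in_features * out_features + biases.
LAYERS = [
    ("conv1", 3 * 3 * 1 * 32 + 32, "conv"),
    ("conv2", 3 * 3 * 32 * 64 + 64, "conv"),
    ("dense1", 64 * 128 + 128, "dense"),
    ("dense2", 128 * 4 + 4, "dense"),  # e.g. 4 operation-mode classes
]

def trainable_params(frozen_kinds=()):
    """Count parameters that still need training once some kinds are frozen."""
    return sum(n for _, n, kind in LAYERS if kind not in frozen_kinds)

total = trainable_params()
after_freeze = trainable_params(frozen_kinds=("conv",))
```

Here freezing the two convolutional layers drops the trainable count from 27,652 to 8,836, roughly a two-thirds reduction, which is why a few thousand labeled images can go much further.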

Machine Learning to Mine the Scientific Ecosystem

In addition to generating metadata tags through training datasets, the Berkeley Lab team also developed tools that use machine-learning techniques to mine the science ecosystem for data context. For example, the data-ingest module can look at a multitude of information sources from the scientific ecosystem—including instrument timestamps, user logs, proposals, and publications—and identify commonalities. Tools developed at Berkeley Lab that use natural language processing methods can then identify and rank words that give context to the data and facilitate meaningful results for users later on. The user will see something similar to the results page of an Internet search, where content with the most text matching the user's search words appears higher on the page. The system also learns from user queries and the search results they click on.
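A minimal sketch of that ranking behavior (illustrative only, not the Science Search implementation): results that match more of the user's query terms rank higher.

```python
def rank_results(query, documents):
    """Order documents by how many query terms each one contains."""
    terms = set(query.lower().split())
    scored = [(sum(1 for w in doc.lower().split() if w in terms), doc)
              for doc in documents]
    return [doc for score, doc in sorted(scored, key=lambda s: -s[0])]
```

Learning from click-throughs, as the article describes, would then adjust this ordering over time; the sketch covers only the text-matching baseline.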

Because scientific instruments are generating an ever-growing body of data, all aspects of the Berkeley team's science search engine needed to be scalable to keep pace with the rate and scale of the data volumes being produced. The team achieved this by setting up their system in a Spin instance on the Cori supercomputer at the National Energy Research Scientific Computing Center (NERSC). Spin is a Docker-based edge-services technology developed at NERSC that can access the facility's high-performance computing systems and storage on the back end.

"One of the reasons it is possible for us to build a tool like Science Search is our access to resources at NERSC," says Gonzalo Rodrigo, a Berkeley Lab postdoctoral researcher who is working on the natural language processing and infrastructure challenges in Science Search. "We have to store, analyze and retrieve really large datasets, and it is useful to have access to a supercomputing facility to do the heavy lifting for these tasks. NERSC's Spin is a great platform to run our search engine that is a user-facing application that requires access to large datasets and analytical data that can only be stored on large supercomputing storage systems."

An Interface for Validating and Searching Data

When the Berkeley Lab team developed the interface for users to interact with their system, they knew that it would have to accomplish a couple of objectives, including effective search and allowing human input to the machine learning models. Because the system relies on domain experts to help generate the training data and validate the machine-learning model output, the interface needed to facilitate that.

"The tagging interface that we developed displays the original data and metadata available, as well as any machine-generated tags we have so far. Expert users then can browse the data and create new tags and review any machine-generated tags for accuracy," says Matt Henderson, who is a Computer Systems Engineer in CRD and leads the user interface development effort.

To facilitate an effective search for users based on available information, the team's search interface provides a query mechanism for available files, proposals and papers that the Berkeley-developed machine-learning tools have parsed and extracted tags from. Each listed search result item represents a summary of that data, with a more detailed secondary view available, including information on tags that matched this item. The team is currently exploring how to best incorporate user feedback to improve the models and tags.

"Having the ability to explore datasets is important for scientific breakthroughs, and this is the first time that anything like Science Search has been attempted," says Ramakrishnan. "Our ultimate vision is to build the foundation that will eventually support a 'Google' for scientific data, where researchers can even search distributed datasets. Our current work provides the foundation needed to get to that ambitious vision."

"Berkeley Lab is really an ideal place to build a tool like Science Search because we have a number of user facilities, like the Molecular Foundry, that have decades' worth of data that would provide even more value to the scientific community if the data could be searched and shared," adds Katie Antypas, who is the principal investigator of Science Search and head of NERSC's Data Department. "Plus we have great access to machine-learning expertise in the Berkeley Lab Computing Sciences Area as well as HPC resources at NERSC in order to build these capabilities."

Categorized in Online Research

Understanding the impact of machine learning will be crucial to adjusting our search marketing strategies -- but probably not in the way you think. Columnist Dave Davies explains.

There are many uses for machine learning and AI in the world around us, but today I’m going to talk about search. So, assuming you’re a business owner with a website or an SEO, the big question you’re probably asking is: what is machine learning and how will it impact my rankings?

The problem with this question is that it relies on a couple of assumptions that may or may not be correct: First, that machine learning is something you can optimize for, and second, that there will be rankings in any traditional sense.

So before we get to work trying to understand machine learning and its impact on search, let’s stop and ask ourselves the real question that needs to be answered:

What is Google trying to accomplish?

It is by answering this one seemingly simple question that we gain our greatest insights into what the future holds and why machine learning is part of it. And the answer to this question is also quite simple. It’s the same as what you and I both do every day: try to earn more money.

This, and this alone, is the objective — and with shareholders, it is a responsibility. So, while it may not be the feel-good answer you were hoping for, it is accurate.

Author:  Dave Davies

Source:  http://searchengineland.com/heck-machine-learning-care-265511

Categorized in Others

Dr. Chris Brauer will tell the Globes Israel Business Conference that computers will free us to be more creative, but warns that machines are making unexplained decisions.

"Sorry I'm late," Dr. Chris Brauer apologizes. "I was preparing a bot for a bank, and it got a little crazy. We had to correct it."

"Globes": How does a bot go crazy?

Brauer: "When you give a learning machine too much freedom, or when you let the wrong people work on it, you get an unpredictable, inefficient machine that is sometimes racist."

This statement began the meeting with Brauer, who owns a creative media consulting firm and founded the Centre for Creative and Social Technologies at Goldsmiths, University of London. He will address next week's Globes Israel Business Conference in Tel Aviv. He immediately explains: "A bot is actually software that learns how to respond through interactions with its surroundings. We teach it how to respond to a given number of situations, and it is then supposed to make deductions from these examples and respond to new situations. It receives feedback on its decisions - whether it was right - and improves its decision the next time according to the feedback."

This is similar to how a child is taught to recognize a dog, so that the definition will include all types of dogs, but not all the other animals having four legs and a tail. First he is shown a dog and told, "This is a dog," and then he is allowed to point to dogs, cats, and ferrets in the street. Only when he correctly points to a dog is he told that he was right, and his ability to identify a dog improves.

When the pound fell with no real reason

"Every bot has different degrees of freedom," Brauer says. "It can be restricted by setting many hard-and-fast rules in advance about what it is and isn't allowed to do, but then you get a rather hidebound bot that does not benefit from all the advantages of machine learning. On the other hand, if you allow it too free a hand, the bot is liable to make generalizations that we don't like." One example is Google's bot, which mistakenly labeled certain people as animals.

"We also have to decide who is entitled to teach the bot," Brauer continues. "If we let an entire community of participants prepare the bot and give it feedback, we get a very effective and powerful bot within a short time and with little effort. This, however, is like sending a child to a school where you know nothing about the teachers or the study plan. Sometimes the community will teach the bot undesirable things, and sometimes it does this deliberately. That's what happened, for example, when Microsoft wanted to teach its bot to speak like a 10-year-old child. Microsoft sent it to talk with real little girls, but someone took advantage of this by deliberately teaching the bot terrible words that destroyed it rather quickly."

People are dangerous to machines, and machines are dangerous to people.

"Absolutely. Machines were responsible, for example, for the drop in the pound following the Brexit events, and the process by which they did this is not completely clear to all those involved to this day. It is clear, however, that the pound fell sharply without people having made an active decision that this should be the pound's response to Brexit. It simply happened one day all of a sudden because of the machines. Only when they investigated this did they discover that the fall had occurred right around the time when a certain report was published in the "Financial Times." No one thought that this report said anything previously unknown, but for some reason, it was news to this machine.

"The mystery is that we don't know what in this report caused the machines to sell the pound at the same moment, what information was in the report, or what was the wording that drove the machines to sell. In a world in which machines are responsible for 90% of trading, they don't wait for approval from us, the human beings. They act first, and don't even explain afterwards."

New experts on relations

Brauer says that such incidents created a need for "machine relations experts," people whose job is to try to predict how certain actions by a person will affect how the machines making decisions about him or her will act.

For example, Brauer now works with a public relations company. The job of such a company is to issue announcements to newspapers written in a manner that will grab the attention of human readers, and especially the attention of journalists whose job is to process these reports and use them as a basis for newspaper stories. This, however, is changing. Today, a sizeable proportion of press releases pass through some kind of machine filter before they get to the journalists. In the future, this will be the norm. "Because of the large amount of information and the need to process it at super-human speed, we have to delegate a large proportion of our decisions to machines," Brauer explains. "A journalist who doesn't let a machine sort press releases for him, and insists on sorting them by himself, will not produce the output required of him."

The public relations firms will therefore have to write the press release so that it will catch the attention of a machine, not a journalist. In the near future, people will jump through hoops trying to understand the machine reading the press releases in order to tailor the press releases to it. Later, the machine will also be doing the writing, or at least will process the press release into a language that other machines like. Bit by bit, people are losing control of the process, and even losing the understanding of it - the transparency.

This is true not only for journalists. For example, take profiles on dating websites. Machines are already deciding which profiles will see which people surfing the site. In the future, there will be advisors for helping you devise a profile for the website that a computer, not necessarily your prince charming, will like, because if you don't do this, the computer won't send prince charming the profile. You can hope as much as you want that your special beauty will shine out, and the right man will spot it, but not in the new era. If it doesn't pass the machine, it won't get you a date.

That's also how it will be in looking for a job when a machine is the one matching employers and CVs, or between entrepreneurs and investors. Even today, when you throw a joke out into space (Facebook or Twitter space, of course), and you want someone to hear it, it first has to please a machine.

"We're talking about a search engine optimization (SEO) world. Up until now, we have improved our websites for their benefit. Tomorrow, it will be the entire world," Brauer declares.

To get back to the "Financial Times" Brexit story, public relations firms also have to speak with machines, and journalists also have to realize that they're talking with machines, and that the stories they write activate many machines whose actions are not necessarily rational.

"That's right. A reporter must know that what he writes can directly set machines in motion, in contrast with human readers, who are supposed to exercise some kind of judgment. The press may be more influential in such a world, if that's what it wants."

That sounds frightening.

"I'd like people to begin designing the machines so that we will at least be able to retrospectively understand what led them to make a given decision. There should be documentation for every machine, an 'anti-machine' that will follow and report what's happening in real time, so that people can intervene and tell the algorithm, 'I saw you, algorithm! I know what you tried to do!' I want to believe that in 2025, there will be no more events like Brexit, in which months afterwards, we still haven't understood why the machines acted the way they did."

People are superior to machines

The world of 2025 will be the subject of one of the sessions at the Israel Business Conference taking place in December, in which Brauer will take part. A former developing-technologies consultant for PricewaterhouseCoopers who now owns his own consulting company (and is director of Creative Industries at investment bank Clarity Capital), Brauer counts the need to flatter machines as only one of his technological predictions.

"The Internet of Things is expected to substantially alter the energy industry," Brauer says. "We are seeing a change in the direction of much better adaptation of energy consumption to the consumers' real needs, and differential energy pricing at times when it is in short supply. For example, the dishwasher will operate itself at times when energy is cheap and available, and will be aware of availability in real time, because all the devices will be connected, not just for the user's convenience in a smart home. When it is working, this dishwasher will also be able to consume energy from 15 different suppliers, and to automatically switch between them. It will change the energy companies' market from top to bottom, because like all of us, they too will be marketing to your machine, not to you."

Will the machines leave work for people, other than as machine relations managers, of course?

"We have always known that technology increases output. This happens mainly in places where decisions are deterministic, for example in medicine, where treatment takes place in the framework of a clear protocol. In such a world, there is ostensibly no need for a doctor, or at least, not for many doctors. The few that will remain will be the technology controllers, or will be consulted only in the difficult cases that a machine can't solve. You can see that the new technology improves employees' output. Instagram has attained a billion dollar value with only 12 employees, and they reach the same number of people as 'The New York Times'.

"People see this, and are fearful, but I say, 'Let's regard this period as our emergence from slavery.' You could say that up until now, because we didn't have the technology we really wanted, too many people worked in imitation of machines, and that detracted from their ability to be human beings. Now we can let the machines be machines, and people will prosper in all sorts of places where creativity is needed that is beyond a machine's capability. People will flourish when they are able to think critically, with values and nuances, about every good database they get from machines. People will do what they were always meant to do."

Is everyone cut out for these professions? Will they have enough work?

"I don't believe that any person is only capable of thinking like a machine. Our society has made them develop this way by pushing people into doing a machine's work. We're now learning how to change education and the enterprise environment so that all people will be able to do the work of people."

In order to prove his point, Brauer examined an algorithm that writes newspaper stories with a senior "Financial Times" journalist (not the one that pushed down the pound; a different journalist named Sara O'Connor). "The machine issued quite a good summary that included all the important facts, but it missed the real story, what all the human readers agreed was the story after reading Sara's story. That's what a good reporter does: sees the contexts that are not immediately accessible, asks the right questions, and fills in what's missing. This, at least, will characterize the reporter of the future, and it will be the same with all the other professions. Anyone who rises above all of them today in professional creativity, whether it's a politician or an accountant, will be the model for how this profession will appear in the future."

And all humans will have to work with machines in order to achieve the output expected of them.

"Anyone who doesn't will be useless. They will have no place in the hyper-productive future."

Author:  Gali Weinreb

Source:  http://www.globes.co.il/


One of the biggest buzzwords around Google and the overall technology market is machine learning. Google uses it with RankBrain for search and in other ways. We asked Gary Illyes from Google in part two of our interview how Google uses machine learning with search.

Illyes said that Google uses it mostly for “coming up with new signals and signal aggregations.” So they may look at two or more different existing non-machine-learning signals and see if adding machine learning to the aggregation of them can help improve search rankings and quality.

He also said, “RankBrain, where … which re-ranks based on historical signals,” is another way they use machine learning, and later explained how RankBrain works and that Penguin doesn’t really use machine learning.
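The signal-aggregation idea Illyes describes, letting a learning algorithm find the weights that combine existing signals into one composite score instead of hand-tuning them, can be sketched as a toy example. Everything below is illustrative: the signal names, the training data, and the plain linear model are assumptions for the sketch, not anything Google has published.

```python
# Toy sketch: learn weights that combine existing ranking signals
# into a single composite signal, instead of tuning them by hand.
# Signal names and training data are made up for illustration.

def learn_weights(examples, lr=0.1, epochs=500):
    """Each example is (signal_vector, target_relevance).
    Plain stochastic gradient descent on squared error for a
    linear combination of the signals."""
    n = len(examples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for x, y in examples:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            # Nudge each weight against the error gradient.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical data: [pagerank, panda, token_frequency] -> relevance.
examples = [
    ([0.9, 0.8, 0.3], 0.95),
    ([0.2, 0.9, 0.7], 0.60),
    ([0.5, 0.1, 0.9], 0.40),
    ([0.1, 0.2, 0.1], 0.05),
]

weights = learn_weights(examples)
# The learned composite score for the first page.
composite = sum(w * x for w, x in zip(weights, examples[0][0]))
```

The point of the sketch is the workflow, not the model: engineers pick the target outcome, and the algorithm finds the weights, which matches the hand-tuning-versus-learning contrast Mueller draws in the canonicalization example above.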

Danny Sullivan: These days it seems like it’s really cool for people to just say machine learning is being used in everything.

Gary Illyes: And then people freak out.

Danny Sullivan: Yeah. What is it, what are you doing with machine learning? Like, so when you say it’s not being used in the core algorithm. So no one’s getting fired. The machines haven’t taken over the algorithm, you guys are still using an algorithm. You still have people trying to figure out the best way to process signals, and then what do you do with the machine learning; is [it] part of that?

Gary Illyes: They are typically used for coming up with new signals and signal aggregations. So basically, let’s say (this is a random example, and I don’t know if this is real) that I would want to see if combining PageRank with Panda and whatever else, I don’t know, token frequency.

If combining those three in some way would result in better ranking, then for that, for example, we could easily use machine learning, and then create the new composite signal. That would be one example.

The other example would be RankBrain, where… which re-ranks based on historical signals.

But that also is, if you, if you think about it, it’s also a composite signal.

It’s using several signals to come up with a new multiplier for the results that are already ranked by the core algorithm.

What else?

Barry Schwartz: Didn’t you first use it as a query refinement? Right? That’s the main thing?

Gary Illyes: I don’t know that … ?

Barry Schwartz: Wasn’t RankBrain all about some type of query understanding and…

Gary Illyes: Well, making sure that for the query we are the best possible result, basically, it is re-ranking in a way.

Barry Schwartz: Danny, did you understand RankBrain to mean, maybe it was just me, to mean, alright someone searched for X, but RankBrain really makes [it] into Xish? And then the queries would be the results.

Danny Sullivan: When it first came out, my understanding was [that] RankBrain was being used for long-tail queries, to correspond them to short answers. So somebody comes along and says, why is the tide super-high sometimes, when I don’t understand — the moon seemed to be very big, and that’s a very unusual query, right? And Google might be going, OK, there’s a lot going on here. How do we unpack this, and then getting the confidence and using typical things where you’d be like, OK, we’ll see if we have all these words, you have a link to whatever. Meanwhile, really what the person is saying is why is the tide high when the moon is full. And that is a more common query. And Google probably has much more confidence in what it’s ranking when it deals with that, and my understanding [is that] RankBrain helped Google better understand that these longer queries corresponded basically to the shorter queries where it had a lot of confidence about the answers.

That was then, that was like what, a year ago or so? At this point, Gary, when you start talking that re-ranking, is that the kind of the re-ranking you’re talking about?

Gary Illyes: Yeah.

Danny Sullivan: OK.

Barry Schwartz: All right. So we shouldn’t be classifying all these things as RankBrain, or should we? Like it could be other machine learning.

Gary Illyes: RankBrain is one component in our ranking system. There are over 200 signals that we use, as we said in the beginning, and each of them might become machine learning-based.

But I don’t expect that any time soon, or in the foreseeable future, all of them would become machine learning-based, or that what we call the core algorithm would become machine learning-based. The main reason for that is that debugging machine learning decisions, or AI decisions if you like, is incredibly hard, especially when you have multiple layers of neural networks. It becomes close to impossible to debug a decision, and that’s very bad for us. For that reason we try to develop new ways to track back decisions. But machine learning can easily obfuscate issues, and that would limit our ability to improve search in general.

Barry Schwartz: So when people say Penguin is now an old machine learning-based…

Gary Illyes: Penguin is not ML.

Barry Schwartz: OK, there’s a lot of people saying that Penguin [is] machine learning-based.

Gary Illyes: Of course they do. I mean if you think about it, it’s a very sexy word. Right. And if you publish it…

Danny Sullivan: People use it in bars and online all the time. Like hey, machine learning. Oh yeah.

Gary Illyes: But basically, if you publish an article with a title like “machine learning is now in Penguin” or “Penguin generated by machine learning,” it’s much more likely that people will click on that title. Well, they’ll probably also come up with the idea that you are insane or something like that, but it’s much more likely they would visit your site than if you publish something with the title “Penguin has launched.”

Source: searchengineland


Google has produced a car that drives itself and an Android operating system that has remarkably good speech recognition. Yes, Google has begun to master machine intelligence. So it should be no surprise that Google has finally started to figure out how to stop bad actors from gaming its crown jewel – the Google search engine. We say finally because it’s something Google has always talked about, but, until recently, has never actually been able to do.

With the improved search engine, SEO experts will have to learn a new playbook if they want to stay in the game.

SEO Wars

In January 2011, there was a groundswell of user complaints, kicked off by Vivek Wadwa, about Google’s search results being subpar and gamed by black hat SEO experts, people who use questionable techniques to improve search-engine results. By exploiting weaknesses in Google’s search algorithms, these characters made search less helpful for all of us.

We have been tracking the issue for a while. Back in 2007, we wrote about Americans experiencing “search engine fatigue,” as advertisers found ways to “game the system” so that their content appeared first in search results (read more here). And in 2009, we wrote about Google’s shift to providing “answers,” such as maps results and weather above search results.

Even the shift to answers was not enough to end Google’s ongoing war with SEO experts. As we describe in this CNET article from early 2012, it turns out that answers were even easier to monetize than ads. This was one of the reasons Google has increasingly turned to socially curated links.

In the past couple of years, Google has deployed a wave of algorithm updates, including Panda and Panda 2, Penguin, as well as updates to existing mechanisms such as Quality Deserved Freshness. In addition, Google made it harder to figure out what keywords people are using when they search.

The onslaught of algorithm updates has effectively made it increasingly difficult for a host of black hat SEO techniques — such as duplicative content, link farming and keyword stuffing — to work. This doesn’t mean those techniques never work; one look at a query like “payday loans” or “viagra” proves they still do. But these techniques are now more query-dependent, meaning that Google has essentially given a pass to certain verticals that are naturally more overwhelmed with spam. But for the most part, using “SEO magic” to build a content site is no longer a viable long-term strategy.

The New Rules Of SEO

So is SEO over? Far from it. SEO is as important as ever. Understanding Google’s policies and not running afoul of them is critical to maintaining placement on Google search results.

With these latest changes, SEO experts will now need to have a deep understanding of the various reasons a site can inadvertently be punished by Google and how best to create solutions needed to fix the issues, or avoid them altogether.

Here’s what SEO experts need to focus on now:

Clean, well-structured site architecture. Sites should be easy to use and navigate, employ clean URL structures that make hierarchical sense, properly link internally, and have all pages, sections and categories properly labeled and tagged.

Usable Pages. Pages should be simple, clear, provide unique value, and meet the average user’s reason for coming to the page. Google wants to serve up results that will satisfy a user’s search intent. It does not want to serve up results that users will visit, click the back button, and select the next result.

Interesting content. Pages need to have more than straight facts that Google can answer above the search results, so a page needs to show more than the weather or a sports score.

No hidden content. Google sometimes thinks that hidden content is meant to game the system, so be very careful about how you handle hidden items that users can toggle on and off, and about creative pagination.

Good mobile experience. Google now penalizes sites that do not have a clean, speedy and presentable mobile experience. Sites need to stop delivering desktop web pages to mobile devices.

Duplicate content. When you think of duplicate content you probably think of content copied from one page or site to another, but that’s not the only form. Things like a URL resolving using various parameters, printable pages, and canonical issues can often create duplicate content issues that harm a site.

Markup. Rich snippets and structured data markup will help Google better understand content, as well as help users understand what’s on a page and why it’s relevant to their query, which can result in higher click-through rates.
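The duplicate-content item above mentions URLs that resolve with various parameters. One common way to tackle that is to map every variant onto a single canonical form before comparing or linking. This is a minimal sketch, assuming a hypothetical list of tracking parameters; real sites would pair it with rel=canonical tags and their own parameter list.

```python
# Sketch: collapse URL variants that serve the same content onto one
# canonical form, so they are not treated as duplicate pages.
# The set of "tracking" parameters below is an illustrative assumption.
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref", "sessionid"}

def canonical_url(url):
    parts = urlsplit(url)
    # Drop tracking parameters and sort the rest for a stable order.
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS
    )
    # Normalize trailing slashes; scheme and host are case-insensitive.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )

a = canonical_url("https://example.com/shoes/?utm_source=mail&color=red")
b = canonical_url("https://example.com/shoes?color=red&sessionid=42")
# a and b resolve to the same canonical URL
```

Both example URLs, which a crawler would otherwise see as distinct pages, normalize to the same address, which is the behavior the duplicate-content advice is after.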

Google chasing down and excluding content from bad actors is a huge opportunity for web content creators. Creating great content and working with SEO professionals from inception through maintenance can produce amazing results. Some of our sites have even doubled in Google traffic over the past 12 months.

So don’t think of Google’s changes as another offensive in the ongoing SEO battles. If played correctly, everyone will be better off now.



