As Google Scholar approaches its 10th anniversary, Nature spoke to its co-creator Anurag Acharya

Google Scholar, the free search engine for scholarly literature, turns ten years old on November 18. By 'crawling' over the text of millions of academic papers, including those behind publishers' paywalls, it has transformed the way that researchers consult the literature online. In a Nature survey this year, some 60% of scientists said that they use the service regularly. Nature spoke with Anurag Acharya, who co-created the service and still runs it, about Google Scholar's history and what he sees for its future.

How do you know what literature to index?

'Scholarly' is what everybody else in the scholarly field considers scholarly. It sounds like a recursive definition but it does settle down. We crawl the whole web, and for a new blog, for example, you see what the connections are to the rest of scholarship that you already know about. If many people cite it, or if it cites many people, it is probably scholarly. There is no one magic formula: you bring evidence to bear from many features.

Where did the idea for Google Scholar come from?

I came to Google in 2000, as a year off from my academic job at the University of California, Santa Barbara. It was pretty clear that I was unlikely to have a larger impact [in academia] than at Google — making it possible for people everywhere to be able to find information. So I gave up on academia and ran Google’s web-indexing team for four years. It was a very hectic time, and basically, I burnt out.

Alex Verstak [Acharya’s colleague on the web-indexing team] and I decided to take a six-month sabbatical to try to make finding scholarly articles easier and faster. The idea wasn’t to produce Google Scholar, it was to improve our ranking of scholarly documents in web search. But the problem with trying to do that is figuring out the intent of the searcher. Do they want scholarly results or are they a layperson? We said, “Suppose you didn’t have to solve that hard a problem; suppose you knew the searcher had a scholarly intent.” We built an internal prototype, and people said: “Hey, this is good by itself. You don’t have to solve another problem — let’s go!” Then Scholar clearly seemed to be very useful and very important, so I ended up staying with it.

Was it an instant success?

It was very popular. Once we launched it, usage grew exponentially. One big difference was that we were relevance-ranking [sorting results by relevance to the user’s request], which scholarly search services had not done previously. They were reverse-chronological [providing the newest results first]. And we crawled the full text of research articles, though we did not include the full text from all the publishers when we started.
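The difference between the two orderings Acharya describes can be illustrated with a minimal sketch. It is not Google Scholar's ranking method: the `Paper` class, the word-overlap `relevance` score and the sample titles are all hypothetical, chosen only to show how the same result set comes out in a different order.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    year: int


def relevance(paper, query):
    """Toy score (an assumption): count of query words found in the title."""
    title_words = set(paper.title.lower().split())
    return sum(1 for word in query.lower().split() if word in title_words)


def reverse_chronological(papers):
    """Newest results first, regardless of the query."""
    return sorted(papers, key=lambda p: p.year, reverse=True)


def relevance_ranked(papers, query):
    """Best-matching results first; ties go to the newer paper."""
    return sorted(papers, key=lambda p: (relevance(p, query), p.year), reverse=True)


# Hypothetical sample data for illustration only.
papers = [
    Paper("Protein folding dynamics", 2001),
    Paper("Galaxy formation surveys", 2014),
    Paper("Machine learning for protein folding", 2009),
]
query = "protein folding"

for paper in relevance_ranked(papers, query):
    print(paper.year, paper.title)
```

With a reverse-chronological sort the unrelated 2014 paper leads the list simply because it is newest; with the toy relevance score it drops to the bottom.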

It took years in some cases to convince publishers to let you crawl their full text. Was that hard?

It depends. You have to think back to a decade ago, when web search was considered lightweight — what people would use to find pictures of Britney Spears, not scholarly articles. But we knew people were sending us purely academic queries. We just had to persuade publishers that our service would be used and would bring them more traffic. We were working with many of them already before Google Scholar launched, of course.

In 2012 Google Scholar was removed from the drop-down menu of search options on Google’s home page. Do you worry that Google Scholar might be downgraded or killed?

No. Our team is continually growing, from two people at the start to nine now. People may have treated that menu removal as a demotion, but it wasn’t really. Those menu links are to help users get from the home page to another service, so they emphasize the most-used transitions. If users already know to start with Google Scholar, they don’t need that transition. That’s all it was.

How does Google Scholar make money?

Google Scholar does not currently make money. There are many Google services that do not make a significant amount of money. The primary role of Scholar is to give back to the research community, and we are able to do so because it is not very expensive, from Google’s point of view. In terms of volume of queries, Google Scholar is small compared to many Google services, so opportunities for advertisement monetization are relatively small. There’s not been pressure to monetize. The benefits that Scholar provides, given the number of people who are working on it, are very significant. People like it internally — we are all, in part, ex-academics.

How many queries does Google Scholar get every day, and how much literature does the service track? (Estimates place it anywhere from 100 million to 160 million scholarly items.)

I’m unable to tell you, beyond a very, very large number. The same answer for the literature, except that the number of items indexed has grown about an order of magnitude since we launched. A lot of people wonder about the size. But this kind of discussion is not useful: it’s just 'bike-shedding'. Our challenge is to see how often people are able to find the articles they need. The index size might be a concern here if it was too small. But we are clearly large enough.

Google Scholar has introduced extra services: author profile pages and a recommendations engine, for instance. Is this changing it from a search engine to something closer to a bibliometrics tool?

Yes and no. A significant purpose of profiles is to help you to find the articles you need. Often you don’t remember exactly how to find an article, but you might pivot from a paper you do remember to an author and to their other papers. And you can follow other people’s work — another crucial way of finding articles. Profiles have other uses, of course. Once we know your papers, we can track how your discipline has evolved over time, the other people in the scholarly world that you are linked to, and can even recommend other topics that people in your field are interested in. This helps the recommendations engine, which is a step beyond [a search engine].

Are you worried about the practice known as gaming — people creating fake papers, getting them indexed by Google, and gaining fake citations?

Not really. Yes, you can add any papers you want. But everything is completely visible — articles in your profiles, articles citing yours, where they are hosted, and so on. Anyone in the world can call you on it, basically killing your career. We don’t see spam for that very reason. I have a lot of experience dealing with spam because I used to work on web search. Spam is easier when people are anonymous. If I am trying to build a publication history for my public reputation, I will be relatively cautious. 

What features would you like to see in the future?

We are very good at helping people to find the articles they are looking for and can describe. But the next big thing we would like to do is to get you the articles that you need, but that you don’t know to search for. Can we make serendipity easier? How can we help everyone to operate at the research frontier without them having to scan over hundreds of papers — a very inefficient way of finding things — and do nothing else all day long?

I don’t know how we will make this happen. We have some initial efforts on this (such as the recommendations engine), but it is far from what it needs to be. There is an inherent problem to giving you information that you weren’t actively searching for. It has to be relevant — so that we are not wasting your time — but not too relevant, because you already know about those articles. And it has to avoid short-term interests that come and go: you look up something but you don’t want to get spammed about it for the rest of your life. I don’t think getting our users to ‘train’ a recommendations model will work — that is too much effort.

(For more on recommendation services, see 'How to tame the flood of literature', in Nature's Toolbox section.)

What about helping people search directly for scientific data, not papers?

That is an interesting idea. It is feasible to crawl over data buried inside paywalled papers, as we do with full text. But then if we link the user to the paywalled article, they don’t see this data — just the paper’s abstract. For indexing full-text articles, we depend on that abstract to let users estimate the probable utility of the article. For data we don't have anything similar. So as a field of scholarly communication, we haven’t yet developed a model that would allow for a useful data-search service.

Many people would like to have an API (Application Programming Interface) in Google Scholar, so that they could write programs that automatically make searches or retrieve profile information, and build services on top of the tool. Is that possible?

I can’t do that. Our indexing arrangements with publishers preclude it. We are allowed to scan all the articles, but not to distribute this information to others in bulk. It is important to be able to work with publishers so we can continue to build a comprehensive search service that is free to everybody. That is our primary function, and everything else is in addition to this.

Do you see yourself working at Google Scholar for the next decade?

I didn’t expect to work on Google Scholar for ten years in the first place! My wife reminds me it was supposed to be five, then seven years — and now I’m still not leaving. But this is the most important thing I know I can do. We are basically making the smartest people on the planet more effective. That’s a very attractive proposition, and I don’t foresee moving away from Google Scholar any time soon, or any time easily.

Does your desire for a free, effective search engine go back to your time as a student at the Indian Institute of Technology Kharagpur?

It influenced the problems that appealed to me. For example, there is no other service that indexes the full texts of papers even when the user can see only the abstract. The reason I thought this was an important direction to go in was that I realised users needed to know the information was there. If you know the information is in a paywalled paper, and it is important to you, you will find a way in: you can write to the author, for instance. I did that in Kharagpur; it was really ineffective and slow! So my experiences informed the approach I took. But at this point, Google Scholar has a life of its own.

Should people who use Google Scholar have concerns about data privacy?

We use the standard Google data-collection policies — there is nothing different for Scholar. My role at Google is focused on Google Scholar. So I am not going to be able to say more about broader issues.

Source: This article was published on scientificamerican.com by Richard Van Noorden

Published in Search Engine

The academic world is supposed to be a bright-lit landscape of independent research pushing back the frontiers of knowledge to benefit humanity.

Years of fingernail-flicking test tubes have paid off by finding the elixir of life. Now comes the hard stuff: telling the world through a respected international journal staffed by sceptics.

After drafting and deleting, adding and revising, the precious discovery has to undergo the ritual of peer review. Only then may your wisdom arouse gasps of envy and nods of respect in the world’s labs and lecture theatres.

The goal is to score hits on the international SCOPUS database (69 million records, 36,000 titles – and rising as you read) of peer-reviewed journals. If the paper is much cited, the author’s CV and job prospects should glow.

SCOPUS is run by Dutch publisher Elsevier for profit.

It’s a tough track up the academic mountain; surely there are easier paths paved by publishers keen to help?

Indeed – but beware. The 148-year-old British multidisciplinary weekly Nature calls them “predatory journals” luring naive young graduates desperate for recognition.

‘Careful checking’

“These journals say: ‘Give us your money and we’ll publish your paper’,” says Professor David Robie of New Zealand’s Auckland University of Technology. “They’ve eroded the trust and credibility of the established journals. Although easily picked by careful checking, new academics should still be wary.”

Shams have been exposed by getting journals to print gobbledygook papers by fictitious authors. One famous sting reported by Nature had a Dr. Anna O Szust being offered journal space if she paid. “Oszust” is Polish for “a fraud”.

Dr. Robie heads AUT’s Pacific Media Centre, which publishes the Pacific Journalism Review, now in its 23rd year. During November he was at Gadjah Mada University (UGM) in Yogyakarta, Central Java, helping his Indonesian colleagues boost their skills and lift their university’s reputation.

The quality of Indonesian learning at all levels is embarrassingly poor for a nation of 260 million spending 20 percent of its budget on education.

The international ranking systems are a dog’s breakfast, but only UGM, the University of Indonesia and the Bandung Institute of Technology make even the tail end of the Times Higher Education world top 1000.

There are around 3500 “universities” in Indonesia; most are private. UGM is public.

UGM has been trying to better itself by sending staff to Auckland, New Zealand, and Munich, Germany, to look at vocational education and master new teaching strategies.

Investigative journalism

Dr. Robie was invited to Yogyakarta through the World Class Professor (WCP) programme, an Indonesian government initiative to raise standards by learning from the best.

Dr. Robie lectured on “developing investigative journalism in the post-truth era,” researching marine disasters and climate change. He also ran workshops on managing international journals.

During a break at UGM, he told Strategic Review that open access – meaning no charges made to authors and readers – was a tool to break the user-pays model.

AUT is one of several universities to start bucking the international trend to corral knowledge and muster millions. The big publishers reportedly make up to 40 percent profit – much of it from library subscriptions.

Pacific Journalism Review’s Dr. David Robie being presented with a model of Universitas Gadjah Mada’s historic main building for the Pacific Media Centre at the editor's workshop in Yogyakarta, Indonesia.

According to a report by AUT digital librarians Luqman Hayes and Shari Hearne, there are now more than 100,000 scholarly journals in the world put out by 3000 publishers; the number is rocketing so fast library budgets have been swept away in the slipstream.

In 2016, Hayes and his colleagues established Tuwhera (Māori for “be open”) to help graduates and academics liberate their work by hosting accredited and refereed journals at no cost.

The service includes training on editing, presentation and creating websites, which look modern and appealing. Tuwhera is now being offered to UGM – but Indonesian universities have to lift their game.

Language an issue

Language is a persistent problem, according to Dr. Vissia Ita Yulianto, a researcher at UGM’s Southeast Asian Social Studies Centre (CESASS) and a co-editor of the IKAT research journal. Educated in Germany, she has been working with Dr. Robie to develop journals and ensure they are top quality.

“We have very intelligent scholars in Indonesia but they may not be able to always meet the presentation levels required,” she said.

“In the future, I hope we’ll be able to publish in Indonesian; I wish it wasn’t so, but right now we ask for papers in English.”

Bahasa Indonesia, originally trade Malay, is the official language. It was introduced to unify the archipelagic nation with more than 300 indigenous tongues. Outside Indonesia and Malaysia it is rarely heard.

English is widely taught, although not always well. Adrian Vickers, professor of Southeast Asian Studies at Sydney University, has written that “the low standard of English remains one of the biggest barriers against Indonesia being internationally competitive.

“… in academia, few lecturers, let alone students, can communicate effectively in English, meaning that writing of books and journal articles for international audiences is almost impossible.”

Though the commercial publishers still dominate, there are now almost 10,000 open-access peer-reviewed journals on the internet.

“Tuwhera has enhanced global access to specialist research in ways that could not previously have happened,” says Dr. Robie. “We can also learn much from Indonesia and one of the best ways is through exchange programmes.”

This article was first published in Strategic Review and is republished with the author Duncan Graham’s permission. Graham blogs at indonesianow.blogspot.co.nz

Published in How to

When social media site Academia.edu hit the scene in 2008, it was hailed by some scholars as an alternative to pricey academic publishing and peer-review models. Professors and researchers could use the social platform to share their work, findings and ideas, most importantly at no cost. For some users who have recently visited Academia.edu, however, things are looking different.

http://Academia.edu now asking $ for ability to search within full-text. This is crazy... My guess they just shot themselves in the foot

The website has set up a paywalled “advanced search” option, accessible only through Academia.edu’s new premium subscription, which costs $9 per month or $99 per year. The feature allows paying users to find exact keyword matches within the full-text of papers on Academia. Without a subscription, the website’s search engine still retrieves articles with headlines containing the searched keywords.

While the website’s existing features—including uploading and downloading articles, peer-review sessions and recommendations—remain free, the company has received backlash for its new pricing model.

“Open access to an ocean of articles without the ability to search through them is meaningless,” a self-described academic mathematician wrote in a recent Hacker News thread started by Ben Lund, chief technology officer at Academia.edu. The discussion began in response to an article in Diggit Magazine by a university lecturer (and the publication's editor-in-chief) who discovered the blocked feature while searching for materials for his students.

Academia.edu first introduced Academia Premium and its accompanying paid search features last December. In a blog post on Medium, the company wrote that the purpose of the subscription was to “help make all scholarship and science easily and freely accessible to everyone, not just those affiliated with well-endowed institutions.” Establishing a premium account was also part of the company’s effort to become a more “sustainable operation,” CEO Richard Price wrote in a subsequent blog post in March.

“Running any site at scale has costs and you have to figure out how to pay for that,” CEO and founder Richard Price tells EdSurge. He adds that nearly 19 million papers exist and are free on the platform, and that 40 percent of users are from developing countries with limited access to research.

Academia Premium has yet to roll out to all users. According to Price, only a “random” selection of users are currently offered the service, which also includes alerts when a subscriber’s name is mentioned in an article, analytics to see who is reading an author’s work, and a personal website option that creates a unique URL for someone’s Academia profile. 

The CEO also says the premium subscription will eventually be available to all users, but is targeted at “academics themselves.”

“Our most avid users are authors, the people who are building their careers in academia,” says Price. “The premium features we have built so far are more useful to them.”

But skeptical scholars say the perks are just another way of monetizing academic information for an elite few. “The new [premium] feature is academic class politics to a new level—and it only promotes the further stratification of the academy,” Sarah Bond, an assistant professor at the University of Iowa, wrote for Forbes in January.

Critics have pointed out holes in the business plan, too. “This feature isn't going to be effective for you without some rethinking. People can simply search on Google: site:academia.edu "Potterheads" and retrieve all the results,” one user shared in the Hacker News thread.
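The workaround the commenter describes relies on a search engine's `site:` operator combined with a quoted phrase. As a rough illustration, the sketch below assembles such a query into a search URL. The helper name `site_search_url` and the base URL are assumptions made for this example; nothing here is part of Academia.edu or of any official Google API.

```python
from urllib.parse import urlencode, parse_qs


def site_search_url(site, phrase, base="https://www.google.com/search"):
    """Build a web-search URL restricted to one site (hypothetical helper)."""
    # The site: operator limits results to one domain; the quotes ask for an exact phrase.
    query = f'site:{site} "{phrase}"'
    params = urlencode({"q": query})  # percent-encodes the colon and quotes
    return base + "?" + params


url = site_search_url("academia.edu", "Potterheads")
print(url)
```

Decoding the query string with `parse_qs` recovers the original search text, `site:academia.edu "Potterheads"`, which is the query quoted in the thread.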

Academia Premium isn’t the first cost to be introduced on Academia.edu, which claims 35 million people visit the site each month. Last January, the company proposed allowing authors to pay a fee that would boost article recommendations from the website’s editors. That move also garnered a slew of criticism, and the company eventually killed the idea.

This article was published on edsurge.com by Sydney Johnson

If you’re performing work that requires in-depth sources, such as academic studies or a job that requires heavy research, finding quality sources can be hard. Using bad or shaky sources to prove points can cause a lot of trouble: it brings down the strength of the work as a whole and makes it harder to prove its point. Fortunately, we live in an age of easy-access information and education, and with that comes education search engines.

These specialist search engines focus less on providing general results to a search query and more on articles from academia and news. This makes them perfect choices for someone who needs solid, citable sources without much hassle. While there’s nothing particularly “incorrect” about using a search engine like Google or Bing to perform research, using education search engines will make sure to bring up dependable, informative articles that you can cite with confidence in your work.

What kind of education search engines are out there? Let’s take a look at five examples, each with their own fortes and ways of helping you perform top-quality research for your projects.

1. Google Scholar

Don’t be mistaken; this isn’t just regular Google! This is a branch off of “regular” Google searches, called Google Scholar. Instead of a general search, you can use it to search books, studies, and even court cases.

On the main page, simply enter the search terms that you’re interested in looking up. Google Scholar will then go through its database and pick out relevant examples. If your research is very time-sensitive (such as technology), you can select options on the left to change how recent you want your sources to be, up to and including the current year.

If you’re writing a piece that has a strict sourcing style, Google Scholar gives you template cites for its sources. Find the template that suits the style standard, then simply copy it directly into your citations to save yourself some time.

2. RefSeek

Currently in a public beta, RefSeek is a pretty solid choice for general research. It takes a more website-based approach, bringing up relevant and highly dependable websites for whatever you want to research. It’s a great way to pull up multiple articles relating to a specific subject. For example, if you wanted to learn about computer processors, a search brings up lots of great articles.

RefSeek does more than just searching, however; if you’re studying in a specific field, RefSeek also has a “directory” page that lists useful education-related websites by category. Once you choose the category you’d like to browse, RefSeek brings up a list of productive sites to help you with your studies.

3. Citeulike

Citeulike is one of the more powerful education search engines if you’re looking for papers and studies specifically. After entering a search term, Citeulike brings up all the studies it has on the topic. If an article is regarded as “trusted” by Citeulike, it will have a tick-mark next to it. You can also see groups that are interested in your search term, see quick abstracts for each article before checking the full version, and hide all the details for quicker browsing.

Once you’ve found a paper you think you’d like, clicking on it will bring you to its page. Here, you can see all the websites the paper can be found on, export the article to different formats, and generate a citation template for that paper. This makes Citeulike highly useful if you want solid, dependable studies to read through and cite in your work.

4. iSeek

iSeek is a powerful tool for finding studies in your area of interest. Don’t be fooled by the seemingly small results list – iSeek displays results in pages of 10, and if you searched for something scientifically popular, there are going to be a lot of pages on the topic. If the sheer number of results overwhelms you, you have a selection of filters to apply on the left.

Each result comes with a direct link to the source, as well as an option to email results to people. The sources can also be rated out of five stars by other users, which can help you locate the more important sources for your research.

5. Virtual LRC

Virtual LRC is an interesting website for research. While it operates mostly like any other engine, the real key to working with Virtual LRC is its filtering ability. There are a few categories at the top of the page after you search; by clicking these, you can filter the results using the category you selected. For example, if you search for “coffee,” you can click on “News/Opinion” for general news articles about coffee, “Health/Medicine” to read about the current positive and negative health effects of coffee, or “History” to learn about how coffee came to be. This makes it quite a diverse engine that can be used to display topics from specific viewpoints.

Study Well, Not Hard

No matter how much you love or hate researching facts, making it an easier task is always welcome. If you’re an avid fact-hunter, hopefully these education search engines will serve you well in your studies.

Author: Simon Batt
Source: https://www.maketecheasier.com/best-education-search-engines

Look out Google Scholar—there’s a new kid on the block. Semantic Scholar, a free, online tool developed under the guidance of Microsoft cofounder Paul Allen, is using machine learning and other aspects of artificial intelligence (AI) to make the monumental task of parsing the scientific literature less onerous. Launched last year, Semantic Scholar can now comb through 10 million published research papers, its creators announced last week (November 11). “This is a game changer,” Andrew Huberman, a neurobiologist at Stanford University not involved in the project, told Nature. “It leads you through what is otherwise a pretty dense jungle of information.”

When the nonprofit Allen Institute for Artificial Intelligence (AI2) launched Semantic Scholar last November, the search engine indexed 3 million published research articles in the field of computer science. The service now searches 10 million papers in both computer science and neuroscience. “Semantic Scholar puts AI at the service of the scientific community,” Oren Etzioni, chief executive officer of AI2, said in the statement. “The brain continues to mystify the scientific and medical research community and harbors some of the diseases that are the most challenging to cure. Our hope is that the field of neuroscience can benefit from AI methods to ensure the best and most relevant studies are easily queried so medical research can move with maximum speed and efficiency.”

The main benefit of Semantic Scholar, which its creators say will soon be expanded to include the full biomedical literature, is that the AI-driven engine is able to understand the content and context of scientific papers, searching figures within an article, for example, rather than just listing its abstract and raw bibliographic data.

But early reports suggest that Semantic Scholar isn’t yet 100 percent debugged. “Looking at ‘most influential publications’ sometimes gives strange results,” Sam Gershman, a Harvard University computational neuroscientist, told ScienceInsider. “For example, none of the most influential articles listed for [University of California, Berkeley, psychologist] Thomas Griffiths fall into his top five most cited articles.”

Warts and all, ScienceInsider recently took the search engine for a spin, instructing it to rank the top 10 most influential neuroscientists based on an analysis of their citation histories.

Source: the-scientist.com
