Wikipedia developers have sketched out designs for a Wikipedia Search Engine, which would give users a one-click replacement for Google search. The search engine could also be embedded in devices such as the Kindle, or smartphones.

It’s a fascinating strategic option, and an aggressive one. Google’s site-scraping algorithms and front-page Info Box have made visiting Wikipedia’s pages superfluous, if all the user wants is a quick fact or a factoid. Instead of finding Wikipedia through Google, you could bypass Google completely.

The concepts were revealed after much sleuthing by Andreas Kolbe, board member of Wikipedia’s Signpost and occasional Reg contributor.

Most of the staff employed by the cash-rich Wikimedia Foundation work in software development, a fact acknowledged by the appointment of an experienced software exec, Lila Tretikov, to run the non-profit outfit. In recent years, the unpaid Wikipedia volunteers who create the content have complained about the tools WMF produced for them, even going so far as to reject them.

One of WMF’s responses is controversial: the Knowledge Engine project, described by Kolbe. It’s shrouded in secrecy, and Knowledge Engine was cited by former community board member James Heilman as the cause of disagreements between himself and the WMF board. Heilman was dismissed from his post during the Christmas holidays. It has also caused disquiet because it was funded not from donations, but by a restricted grant from the Knight Foundation. WMF has not published the grant application in full, only excerpts.

From these excerpts, some “deliverables” have been made public, among them “an improved search engine and API for Wikipedia searches” – but a more ambitious search engine is explicitly denied:

Are you building a new search engine?

We are not building Google. We are improving the existing CirrusSearch infrastructure with better relevance, multi language, multi projects search and incorporating new data sources for our projects. We want a relevant and consistent experience for users across searches for both wikipedia.org and our project sites.

In 2008, Wikipedia co-founder Jimmy Wales attempted to create a for-profit Google rival that cashed in on Wikipedia’s brand – called Wikia Search – but it failed to achieve scale, and was shut down after a year.

The new designs show how Wikipedia.org could be "reimagined", incorporating the Knowledge Engine, to provide a Google-style search engine.

But what would Wikipedia actually search?

With more than five million articles, WMF developers have a wealth of content. So would a Google-style Wikipedia search page or app need to index anything else?

Perhaps not. Since all the world is contained in Wikipedia (or a peculiarly warped representation of the world, at least) then its map is as good as the territory.

And there’s more on Wikipedia than many people think.

Wales is currently debating with contributors the merits of embedding the entire porn movie Debbie Does Dallas in the Wikipedia entry for the film.

Jimbo isn’t keen.

He warns: "It is very easy to imagine a really stupid press story or campaign against us about this. 'Wikipedia embeds porn movies in article content' gives people entirely the wrong impression of what we are about. Why invite that?"

But he hasn’t been paying close attention, it turns out. Wikipedia already does.

"The movie was embedded in Debbie Does Dallas so that readers could choose to play it right in the Wikipedia article. For reasons I do not understand, an edit war broke out...,” explains contributor ‘Right Hand Drive’.

"Readers of an article about a pornograohic [sic] movie should not be surprised to see a pornographic movie on Wikipedia,” he continues. "Did you take a look at A Free Ride which has included a pornographic movie in the article since 2012? Can you explain why Debbie Does Dallas is any different?"

So why go outside Wikipedia for any of your needs? It’s all there. With porn on tap, Wikipedia Search could be a winner. ®

Author : Andrew Orlowski

Source : http://www.theregister.co.uk/2016/02/11/wikipedia_search_engine/

Categorized in Search Engine


In the course ‘Methodology for Urbanism’ we discuss why Wikipedia cannot be considered a reliable academic source. This is because Wikipedia is not “peer reviewed”. Peer reviewing means that to be accepted as authoritative, a text must be reviewed by a team of recognized specialists in that specific field of studies.

Wikipedia is, in a sense, “peer reviewed” – but the problem here is that the people contributing to Wikipedia are not backed by any scientific institution that guarantees their credentials (even though some of them are true authorities in their fields).

This generates all kinds of uncertainties. But does this mean that we should avoid Wikipedia at all costs? Not at all.

Wikipedia is great to find FACTUAL INFORMATION that can be quickly TRIANGULATED. The kinds of verification and review mechanisms put in place by the Wikimedia Foundation are generally effective (but not always) and also generally result in reliable information (but again, not always). The primary questions answered with factual information are WHAT?, WHERE?, HOW MANY? and WHO? (but again, this is disputable, as even these questions may result in different answers according to different sources and world views).

WIKIPEDIA cannot be used to gather ANALYTICAL INFORMATION, in which someone “analyses and interprets facts to form an opinion or come to a conclusion. The primary questions answered with analytical information are WHY? or HOW?”, according to the ODU Library Services Website.

WIKIPEDIA is a tremendous SOCIAL EXPERIENCE, where thousands of people contribute to a collective description and understanding of different issues. Besides, the comprehensiveness of the information contained in Wikipedia is impressive. The reality is, students make use of Wikipedia all the time.

However, we want to encourage you to go beyond Wikipedia and use other, more authoritative sources. If you want to be scientific, you must then go further and TRIANGULATE your information. You also need to look for data in authoritative sources, which have been checked by people working in recognized education or research institutions. A good place to start is GOOGLE SCHOLAR. It will lead you to scientific papers published by responsible editors. You should also look into the TU DELFT INSTITUTIONAL REPOSITORY of theses and reports. And of course, you should look into the collection of SCIENTIFIC JOURNALS at the TU DELFT LIBRARY. These are BY FAR the best sources of reliable, relevant analytical information!



(ISNS) -- Wikipedia isn't just a website that helps students with their homework and settles debates between friends. It can also help researchers track influenza in real time.

A new study released in April in the journal PLOS Computational Biology showcased an algorithm that uses the number of page views of select Wikipedia articles to predict the real-time rates of influenza-like illness in the American population.

Influenza-like illness is an umbrella term for illnesses that present with symptoms like those of influenza, such as a fever. These illnesses may be caused by the influenza virus, but they can have other causes as well. The Centers for Disease Control and Prevention publishes data on the prevalence of influenza-like illness based on a number of factors, such as hospital visits, but the data takes two weeks to come out, so it's of little use to governments and hospitals that want to prepare for influenza outbreaks.

The researchers compared the results from their algorithm to past data from the CDC and found that it predicted the incidence of influenza-like illness in America within 1 percent of the CDC data from 2007 to 2013.

The algorithm monitored page views from 35 different Wikipedia articles, including "influenza" and "common cold."

"We also included a few things such as 'CDC' and the Wikipedia main page so we could glean the background level of Wikipedia usage," said David McIver, one of the authors of the study and a researcher at Harvard Medical School. Those terms helped make the algorithm more accurate, even during the 2009 swine flu pandemic.
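The approach described above can be sketched roughly as a regression problem: weekly page-view counts for a handful of flu-related articles (plus "background" pages like the Main Page) are the predictors, and the CDC's influenza-like-illness rate is the target. The following is a minimal illustrative sketch on simulated data, not the authors' actual open-source model; the article names, view counts, and the use of plain least squares are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
weeks = 52

# Simulated weekly page views for articles used as predictors,
# including background-usage terms like "CDC" and the Main Page.
articles = ["Influenza", "Common_cold", "Fever", "CDC", "Main_Page"]
views = rng.poisson(lam=[5000, 3000, 2000, 800, 90000],
                    size=(weeks, len(articles)))

# Simulated CDC ILI rate (percent of patient visits) that we pretend
# to have observed, generated from the views plus noise.
true_w = np.array([4e-4, 2e-4, 1e-4, 0.0, -1e-5])
ili = views @ true_w + rng.normal(0, 0.1, weeks) + 1.5

# Fit ordinary least squares with an intercept: ili ≈ X w.
X = np.column_stack([views, np.ones(weeks)])
w, *_ = np.linalg.lstsq(X, ili, rcond=None)

predicted = X @ w
print("mean absolute error: %.3f" % np.abs(predicted - ili).mean())
```

A fitted model like this can then be run on each new week's page views as they arrive, which is what makes the approach "real time" compared with the CDC's two-week reporting lag.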

Google Flu Trends, a similar tool for tracking influenza developed by Google, came under criticism recently when it overestimated illnesses during the swine flu pandemic and the 2012-2013 flu season. Scientific experts and journalists attributed the miscalculation to increased media coverage of the flu during those periods. Google's tool, which uses Internet search terms to monitor influenza's spread, did not account for increased web searches by healthy individuals that may have been prompted by the increased media coverage.

McIver's model attempts to account for this by assessing the background usage of Wikipedia. Additionally, a recent paper in Science suggests that Google Flu Trends could become more accurate over time with more data.

Some also lobbed criticism at Google for keeping their algorithms for Google Flu Trends a trade secret. McIver and his colleague, John Brownstein, wanted their algorithm to be all open-source.

"We initially decided to go with Wikipedia because all of their data is open and free for everyone to use. We really wanted to make a model where everyone could look at the data going in and change it as they saw fit for other applications," McIver said.

The benefits of tracking influenza-like illness in real time are huge, McIver added.

"The idea is the quicker we can get the information out, the easier it is for officials to make choices about all the resources they have to handle," he said.

Such choices involve increasing vaccine production and distribution, increasing hospital staff, and general readiness "so we can be prepared for when the epidemic does hit," McIver said.

The Wikipedia model is one of many such tools, but is not without its limitations. Firstly, it can only track illness at the national level because Wikipedia only provides page views by nation.

The model also assumes that one visitor will not make multiple visits to one Wikipedia article. There is also no way to be sure that someone is not visiting the article for their general education, or if they really have the flu.

Nonetheless, the model still matches past CDC data on the prevalence of influenza-like illness in the U.S.

"This is another example of these types of algorithms that are trying to glean signals from using social media," said Jeffrey Shaman, professor of environmental health sciences at Columbia University, in New York. "There are all these ways that we might get some lines on what's going on."

He said he was interested to see how well the model would do to predict future flu seasons, especially compared to Google.

Shaman and his colleagues use data from past influenza seasons to try to predict future ones, using models similar to those used by weather forecasters.

"They're not any sort of replacement for the basic surveillance that needs to be done," he said of the Wikipedia model, Google Flu Trends, and similar tools. "I like them and they're great tools and I use them all the time, but we still don't have a gold standard of monitoring influenza."

"Right now the attitude is the more the merrier so long as they're done well," Shaman said.

McIver echoed similar sentiments: "People need to remember that these sorts of technologies are not designed to be replacements for the traditional methods. We're designing them to work together – we'd rather combine all the information."

Cynthia McKelvey is a science writer based in Santa Cruz, California. She tweets at @NotesofRanvier. 



