
Just about every player in the Big Data and analytics game was in New York last week at the Strata + Hadoop World conference to showcase its latest technologies.

Over 7,000 people attended the event where keynote speakers, including White House chief data scientist DJ Patil, laid out their visions for where machine learning, analytics, the Internet of Things, autonomous vehicles and smart cities will be taking us in the near future.

Here are a few of my highlights from the event and some of the major announcements from key players.

DJ Patil spoke about how during his time so far as the U.S.’s first chief data scientist, his mission has been to “responsibly unleash the power of data to benefit all Americans.”

He spoke about how Big Data and analytics are helping to reduce the damage caused by opioid abuse, and about the importance of openness.

“When the president first started in office there was about 10 [open] data sets put out there, now there are about 2,000,” he said. Whatever you think of Barack Obama’s presidency, that is an impressive achievement: it means that anyone from major corporations to armchair data scientists can now develop new strategies and technologies to harness that data.

Martin Hall, chief data scientist for Big Data solutions at Intel, told his audience that the explosion of interest and activity in Big Data means that “we now have the data, the analytics and the compute power to deliver more than insights – we can enable intelligence.”

The arrival of personalized medicine, autonomous cars and smart connected devices means that we are now entering the age of AI; rather than a simple Internet of Things, we are heading toward an Internet of Intelligent Things, with ever-increasing levels of automation. This vision undoubtedly has huge implications for just about everything: how we live, how we work, and how we interact and communicate with the world and each other.


This year’s conference served as confirmation that real-time streaming analytics has moved firmly into the mainstream of data science and, rather than being a pipe dream or distant goal, is fast becoming a reality. The open source Apache Kafka engine is seen as the driving force enabling this shift, and big players were keen to show their support for this particular piece of technology.

Cloudera – one of the biggest distributors of open source platforms – announced the upcoming release of version 5.9 of its Hadoop distribution, which will ship with Spark 2.0 for the first time, as well as the latest release of Apache Kudu, which is specifically tooled for real-time analytics. Cloudera also announced that its distribution will run on Microsoft’s Azure cloud infrastructure for the first time (alongside Amazon Web Services and Google’s Cloud Platform), and unveiled a new pay-as-you-go pricing model that it will offer alongside its existing annual subscription model.

IBM’s rock-paper-scissors-playing robot Marvin was let loose to entertain the crowds, taking on all comers and appearing to show, through its rising win rate, that computers are getting better at predicting our behavior. Marvin is powered by Apache Spark. IBM also announced a new initiative, Project Dataworks, which aims to use Spark to make more data available for processing through its Watson cognitive computing engine.

SAP spoke about how its recently announced acquisition of Californian Big Data startup Altiscale will boost its HANA cloud service offerings, including access to cloud-based Spark services. It also showcased its Vora query engine, which uses machine learning to enhance the contextual awareness of AI-driven Big Data operations.

These are just a few of the highlights from this year’s event – expect more from me soon. In the meantime, if you were at Strata + Hadoop World, why not let me know what excited you most?

Bernard Marr is a best-selling author and keynote speaker. His new book is 'Big Data in Practice: How 45 Successful Companies Used Big Data Analytics to Deliver Extraordinary Results'.

The original source of this article is Forbes.

Sunday, 25 September 2016 12:36

Search customization comes with consequences

Google is an amazingly powerful tool for finding information online.

Many of us use it daily in our personal and professional lives for all kinds of purposes. In its speedy and seamless way, Google retrieves web material based on keywords entered in its search box.

Although its relevancy ranking algorithm is a closely guarded trade secret and a big reason for Google’s success as the world’s most popular search engine, we know basically how it works. You simply type in words or phrases and Google will retrieve web sources that match those terms.

The ranking of those sources is based on such things as how many times your search-term words appear, where they appear (e.g. title), and how many other websites link to those sources.
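As a rough illustration of how those signals might combine, here is a toy scoring function in Python. The weights, documents and field names are invented purely for illustration; Google's actual algorithm is, as noted, a closely guarded trade secret and vastly more sophisticated.

```python
# Toy sketch of keyword-based relevance ranking, combining the three
# signals described above: term frequency, term location (title vs. body),
# and inbound links. All weights and documents are made up.

def relevance_score(doc, query_terms):
    """Score a document against query terms using three simple signals."""
    text = doc["body"].lower()
    title = doc["title"].lower()
    score = 0.0
    for term in (t.lower() for t in query_terms):
        score += text.count(term) * 1.0    # how often the term appears
        score += title.count(term) * 5.0   # terms in the title count more
    score += doc["inbound_links"] * 0.1    # popularity via links from other sites
    return score

docs = [
    {"title": "ISIS explained", "body": "ISIS is ...", "inbound_links": 120},
    {"title": "World news", "body": "A report on ISIS and ISIS tactics.", "inbound_links": 10},
]

# Rank the documents for the query "ISIS", highest score first.
ranked = sorted(docs, key=lambda d: relevance_score(d, ["ISIS"]), reverse=True)
print([d["title"] for d in ranked])
```

Even in this crude sketch, changing the weights changes the ordering, which hints at how an extra, per-user signal (a personalization term) could shift results from one person to the next.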

Since Google is simply yet precisely executing a series of steps matching and weighting those terms, Donald Trump and Hillary Clinton should get identical search results if each entered the same words or phrases such as “ISIS” or “Black Lives Matter.” Or so we would think.

In 2011, political activist and web organizer Eli Pariser wrote the book "The Filter Bubble: How the New Personalized Web is Changing What We Read and How We Think." In it, he revealed that Google search results may in fact vary widely from user to user.

Why? Because in an effort to personalize your search results, Google will feed you sources that match your interests. And just how does Google know your interests? Because it maintains a log of all of your past Google searches and sites viewed, that’s how.

This kind of personalization of the web is widespread. Anyone who shops on Amazon or uses Netflix knows that those services review the items you have purchased or just simply browsed and then offer recommendations for other similar books or movies.

In the case of Google, most users realize that the ads that appear with their search results are connected to their own search history. But personalization also affects which sources are retrieved and the order in which they are displayed. While it may be seen as a benefit to have Google customize your search results, it comes with some serious consequences.

In its early days, the internet was seen as a marvelous way to broaden one’s world by making it easier to disseminate and retrieve information. The web seemed to embody the true spirit of democracy by providing free and equal access to information for all. And though that is still largely true, the filter bubble has had a substantial narrowing effect on the information we receive through web services.

A recent study by the Pew Research Center showed that 62 percent of adults in the U.S. get their news from social media sites, and that 18 percent do so often. The leader of the social media pack is, as you might guess, Facebook. Earlier this year, a former Facebook employee charged that Facebook suppressed conservative stories in its news feed. After much media attention and a denial, CEO Mark Zuckerberg convened a group of conservatives to discuss the issue and build trust between them and Facebook.

Facebook recently made the news again when the New York Times reported last month that Facebook profiles its users by their political leanings, among other things. Like Google, Facebook knows every post or site you read or liked, every ad you followed, every Facebook friend you have, and categorizes you accordingly. To find out how Facebook has labeled you politically, go to http://nyti.ms/2bfm2gU.

Also significant is the amount of political information produced and shared exclusively within Facebook. There are numerous political organizations, ranging from Occupy Democrats to The Angry Patriot, that host Facebook pages where they post their views. These posts may be shared, liked and thus circulated to a large readership. Taken all together, these sites reach a combined audience of tens of millions of people, comparable in size to those of CNN and the New York Times, which also reported this story in August.

The moral to this story is “user beware.” If you can spare nine minutes, watch Eli Pariser’s TED Talk on the filter bubble. It will forever change your view of the neutrality of the web and make you more aware of the type of information you are fed online.

Source : http://www.pressrepublican.com

Bing's making it easier to find song titles, as well as add Netflix and Amazon movie titles to your watchlist on mobile.

Bing has released new updates to its search app for iOS and Android, adding new music and video features, along with more map options.

The updated version 6.7.2 Bing search app will now play a video without sound directly in search results with the lyrics listed below the video — a feature that has been available on Bing desktop. The app also includes a new Music page listing trending songs and artists.

Bing search app update

According to a report on the Android Community website, Bing’s search app can also give the title to a song being played: “In case you’re looking for a specific song, you can use the Bing Search app by typing the name of the song or if something is playing, it will tell you what it is.”

In addition to updates around music, users can add movie titles to their Netflix and Amazon Prime watch lists from the app’s Movies page.

Bing has also added the option for users to pick their preferred map app for directions.

Other updates include a refreshed “reading mode” to make news pages “more enjoyable” and the ability to sign in with your Microsoft account to see search history on other logged-in devices.

Source : http://searchengineland.com/bing-search-app-ios-android-gets-new-music-video-map-features-257647

One of the most important laws protecting online speech is also one of the worst. You’ve probably heard of it. In 1998, President Bill Clinton signed the Digital Millennium Copyright Act, or DMCA, into law. It’s the law that, for example, makes it all too easy for companies to have embarrassing content removed from sites like YouTube by issuing bogus takedown requests, claiming that the content violates their copyright—no presumption of innocence required. But the DMCA also contains one incredibly important section: the so-called safe harbor provision. Thanks to safe harbor, companies can’t be held liable for copyright violations committed by their users, so long as the companies take reasonable steps to ensure that repeat offenders are banned from their services. Post a pirated copy of Ghostbusters to YouTube via your Comcast Internet connection? That’s on you, the DMCA says, not on YouTube or Comcast.

Companies fearing they’ll lose their safe harbor might start policing the content posted by their users.

But after a recent court decision, that safe harbor doesn’t look so safe anymore.

Last week a federal judge ruled that cable Internet provider Cox Communications must pay $25 million in damages to BMG Rights Management, which controls the rights to the music of some of the world’s most popular artists. The court found that Cox was liable for the alleged copyright infringement carried out by its customers, safe harbor or not. The decision might not rattle the giants of the Internet business, like Comcast, Verizon, Google and Facebook–at least not yet. But it could be bad news for smaller companies that can’t afford such costly legal battles. And if companies start fearing they’ll lose their safe harbor, they might have to start more carefully policing the content posted by their users.

Turning Off Notifications

It’s hard to overstate the importance of the DMCA’s safe harbor provision to the growth of the early Internet. Had providers and platforms faced liability for what users published, far fewer social networks and web hosts would have existed because of the legal risk. Those that did exist would have had to carefully screen what users posted to ensure no copyright violations were taking place. In short, the DMCA, for all its problems, enabled the explosion of online speech over the past two decades.

But that explosion has not been kind to some businesses, such as the music industry, which has seen its margins erode since the 1990s due to peer-to-peer file sharing. To fight back, BMG in 2011 hired a company called Rightscorp to monitor file sharing networks and catch people illegally sharing music that belonged to BMG. Whenever Rightscorp believed it had detected a copyright violation, it would forward notifications to the offending user’s Internet provider. The twist was that Rightscorp added a bit of language to its letters offering to settle the copyright dispute if the user was willing to pay a fee of around $20 to $30 per infraction. Cox refused to forward these letters on to its users because it believed the settlement offers were misleading, arguing the notifications of infringement were not in and of themselves proof that a user had actually broken the law.

Rightscorp refused to alter the language of the letters, so Cox refused to process any further notifications from the company. In 2014, BMG sued Cox.

Last year, US District Court Judge Liam O’Grady found that by refusing to process Rightscorp’s requests, Cox had failed to live up to its responsibilities under the safe harbor provision and therefore was not eligible for its protections. A jury found Cox liable for $25 million in damages. Cox filed for a new trial, but O’Grady denied the request last week, allowing the previous decision to stand.

Just a Pipe

While the decision does not set a binding precedent, some open Internet advocates worry the decision could embolden copyright holders to sue smaller companies. A company like Google can afford expensive lawyers. It can invest in multi-million-dollar digital rights management software to keep offending content off its sites. But smaller ISPs or web sites can’t. “If safe harbor is for anyone, it’s for Internet service providers that do nothing but carry information from sites to specific homes,” says Charles Duan, staff attorney at Public Knowledge.

Safe harbor issues aside, BMG’s argument also depends on the idea that users should be denied Internet access because of the mere accusation of copyright infringement, even if the accuser has never proven in court that those users had actually broken the law.

“It doesn’t take into account all the things people use the Internet for,” says Mitch Stolz, a staff attorney with the Electronic Frontier Foundation. “People use it for their jobs, to interact with government. The circumstances in which it’s reasonable to cut someone off are narrower now than 20 years ago.”

However flawed it is, the DMCA enables online speech to flourish. But if the BMG case does become a precedent, online service providers of all types will have to crack down on their users—even if no one has proven in court that those users committed a crime. If you don’t like what someone has to say, you could accuse them of copyright violations and not only have a video banned from YouTube, but have that person kicked off the Internet entirely. That’s not a future in which the Internet flourishes.

Source : http://www.wired.com/2016/08/internets-safe-harbor-just-got-little-less-safe/

Calling it an “un-pivot,” Biz Stone is bringing back Jelly, the Q&A app he created in 2013. Launching today, the new and improved Jelly remains close to its roots, but with an added twist. This time, everything is anonymous so you can ask what you really want to know.

Referring to the new Jelly as an “on-demand search engine,” Stone said that one lesson he learned from the original Jelly was that people didn’t necessarily want to ask questions of their social network. “Would you want your Googles to be your tweets?”

Some might draw similarities to Quora or Yahoo Answers, but Stone is hoping that this Jelly will be an alternative to Google.

“We think the future of search engines is just ask a question, get the answer,” he added. It’s “ten or 15 minutes you didn’t have to spend looking around on links.”

Users can sign up to answer questions on Jelly. People can rate whether responses were helpful. If someone receives a lot of positive feedback on a certain topic, they are more likely to be selected to answer future similar questions.
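That routing loop can be sketched in a few lines of Python: answerers accumulate per-topic reputation from helpful/unhelpful ratings, and new questions go to whoever has the best record on that topic. The class, names and scoring rule below are my own illustrative guesses, not Jelly's actual system.

```python
# Minimal sketch of feedback-driven answerer routing: rate() records
# helpful/unhelpful votes per (topic, user), and pick_answerer() selects
# the user with the best helpfulness ratio for a topic.

from collections import defaultdict

class AnswerRouter:
    def __init__(self):
        # reputation[topic][user] = [helpful_votes, total_votes]
        self.reputation = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def rate(self, user, topic, helpful):
        stats = self.reputation[topic][user]
        if helpful:
            stats[0] += 1
        stats[1] += 1

    def pick_answerer(self, topic):
        candidates = self.reputation[topic]
        if not candidates:
            return None
        # Rank by helpful ratio, breaking ties by volume of feedback.
        return max(candidates,
                   key=lambda u: (candidates[u][0] / candidates[u][1],
                                  candidates[u][1]))

router = AnswerRouter()
router.rate("alice", "travel", helpful=True)
router.rate("alice", "travel", helpful=True)
router.rate("bob", "travel", helpful=False)
print(router.pick_answerer("travel"))  # alice has the better travel record
```

A real system would of course decay old feedback and consider availability, but the core incentive is visible: good answers on a topic earn you more questions on that topic.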

The new Jelly is optimized for mobile, but will also be available for desktop searches.

Jelly was founded by Biz Stone and Ben Finkel in 2013. Backers included Spark Capital, Greylock, Jack Dorsey and Bono.

Says Stone of the change of plans: “We made a rookie mistake. We got talked into pivoting” Jelly into an opinion-sharing app called Super. So the Jelly co-founders decided to go back to “our original dream, our original vision.”


More than 484,000 Google keyword searches a month from around the world, including at least 54,000 searches in the UK, return results dominated by Islamist extremist material, a report into the online presence of jihadism has revealed.

The study found that of the extremist content accessible through these specific keyword searches, 44% was explicitly violent, 36% was non-violent and 20% was political Islamist in content, the last being non-violent but disseminated by known Islamist groups with political ambitions.

The study is one of the first to expose the role of the search engine rather than social media in drawing people to extremist jihadi material on the web. It argues the role of the search engine – a field dominated by Google – has been a blind spot that has been missed by those seeking to measure and counter extremist messages on the internet.

Although the UK government’s Prevent strategy states that the internet must not be an ungoverned space for Islamist extremism, and British diplomats have taken the lead in the global communications fight against Islamic State online, the study suggests government agencies are only at the beginning of a “labyrinthine challenge”. So-called counter-narrative initiatives led by governments and civil society groups are “under-resourced and not achieving sufficient natural interest”, suggesting the battle of ideas is not even being engaged, let alone won.

The study, undertaken jointly by Digitalis and the Centre on Religion and Geopolitics, will be challenged by those who claim it advocates censorship, has blurred the lines between political Islam and violent extremism and cannot validly quantify the presence of extremism.

But the findings come in a week in which there has been a spate of terrorist attacks in Germany and France, some undertaken by young people either radicalised on the internet, or using it to feed their obsession with violence. Many of the jihadist foreign fighters in Syria were radicalised online as “the search engine gradually overtakes the library and the classroom as a source of information”.

The study, entitled A War of Keywords: how extremists are exploiting the internet and what to do about it, argues “many of the legitimate mainstream Islamic scholarly websites host extremist material, including jihadi material, often without any warning or safeguards in place”.

It also argues non-violent Islamist organisations, such as Hizb ut-Tahrir, have a very strong online presence and dominate the results for some keyword searches. Some of the most popular search words used were crusader, martyr, kafir (non-believer), khilafa (a pan-Islamic state) or apostate.

In a condemnation of government efforts, the study finds that very little of this content is challenged online. Analysing 47 relevant keywords, the search-engine analysis found that counter-narrative content outperformed extremist content in only 11% of the results generated. For the search term khilafah, which has 10,000 global monthly searches, the ratio of extremist content to counter-narrative content is nine to one.

This is partly because counter-narrative sites lack search engine optimisation, so they do not rank high enough in searches. By contrast, Khilafa.com, the English website of Hizb ut-Tahrir, had more than 100,000 links into it.

The study also warns some of the most-used Muslim websites such as Kalmullah.com and WorldofIslam.info “host traditional Islamic content alongside extremist material” so are knowingly or unknowingly abusing the trust of their readers.

The study also claims a user can come across extremist content relatively easily while browsing for Islamic literature. Few effective restrictions apply to accessing Islamic State English-language magazine Dabiq or Inspire magazine, which is linked to al-Qaeda in the Arabian peninsula. Both are readily available to browse and download through clearing sites.

The study produced its headline numbers by looking at the average monthly number of global searches conducted in Google for 287 extremist-related keywords – 143 in English and 144 in Arabic. It then looked at two samples totalling 47 keywords, the first sample focused on the most-used words and the second sample on the keywords deemed to be most extremist. The research then analysed the first two pages thrown up by the search for these keywords.

The authors acknowledge the difficulties technology companies face in policing the results of their search engines. Google is responsible for 40,000 searches a second, 2.5 billion a day and 1.2 trillion a year worldwide. Facebook boasts more than one and a half billion users who create 5 billion likes a day.

Dave King, chief executive of Digitalis, argues: “While the company’s advertising model is based on automatically mining the content its users create, their ability to distinguish a single credible kill threat from the plethora who have threatened to kill in jest is highly limited.”

The study recommends governments, the United Nations, technology companies, civil society groups and religious organisations together establish a charter setting out a common definition of extremism and pledge to make the internet a safer place.

Technology companies, the report says, could work with governments to shift the balance of the online space, as well as share analytical data and trending information to bolster counter-efforts. It suggests search engine companies have been reluctant to or unable to alter the search algorithms that are responsible for search page rankings.

The authors also call for a debate on “the murky dividing line between violent and non-violent extremist material online”, arguing such legal definitions have been achieved over “copyrighted material, child pornography and hate speech all of which have been subject to removal requests.”

Existing content control software that prevents access to graphic or age-restricted material could be used, and warning signals put on sites.

A Google spokesperson said: “We take this issue very seriously and have processes in place for removing illegal content from all our platforms, including search. We are committed to showing leadership in this area – and have been hosting counterspeech events across the globe for several years. We are also working with organisations around the world on how best to promote their work on counter-radicalisation online.”


Tuesday, 26 July 2016 08:48

Life Sciences Search Engine

A startup aims to make doing research easier by mining publications for research products, protocols, and potential collaborators.

A good reagent can be hard to find. It typically takes wading through journal articles and published protocols to determine how best to set up an experiment. But a new search engine, Bioz, hopes to streamline that process. The Palo Alto startup this week (June 20) announced $3 million in seed funding to improve its venture, which is currently in beta.

The site works by using natural language processing and machine learning to mine papers for a treasure trove of information. A user can enter a method (“PCR”) or a tool (“DNA polymerase”), and Bioz identifies reagents, ranking them according to how many times they’ve been used in experiments, the impact factor of the journal in which the referenced papers were published, and how recently a product was used.
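The kind of composite ranking described there can be sketched as a weighted score. The weights, field names and reagent data below are invented for illustration; Bioz's actual scoring model is its own and is not public.

```python
# Hedged sketch of a composite reagent score: usage count, average journal
# impact factor, and recency of last use, combined with made-up weights.

def reagent_score(r, current_year=2016,
                  w_uses=1.0, w_impact=2.0, w_recency=3.0):
    # Recency decays hyperbolically: a reagent used this year scores 1.0,
    # one last used four years ago scores 0.2.
    recency = 1.0 / (1 + current_year - r["last_used"])
    return (w_uses * r["times_used"]
            + w_impact * r["avg_impact_factor"]
            + w_recency * recency)

reagents = [
    {"name": "Polymerase A", "times_used": 40, "avg_impact_factor": 3.2, "last_used": 2012},
    {"name": "Polymerase B", "times_used": 35, "avg_impact_factor": 9.1, "last_used": 2016},
]

for r in sorted(reagents, key=reagent_score, reverse=True):
    print(r["name"], round(reagent_score(r), 2))
```

Note how the weighting matters: Polymerase A has been used more often, but Polymerase B's higher-impact, more recent citations can still rank it first under these (hypothetical) weights.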

Each result links to the vendor’s webpage. Bioz receives a lead referral fee, making the service free for users.

In addition, the search engine suggests relevant assays and collaborators, and shows the article context that describes how certain reagents were used. Founder Karin Lachmi of Stanford explained the value of this feature: “Ok, I know what experiment I’m trying to do and I know the product, but now I want to go further,” she told Bio-IT World. “Should I use it at room temperature? Should I use a 1:1,000 dilution or a 1:200 dilution?”

More than 11,000 people across 40 countries are now using the search engine. “The business model for Bioz is around things you can buy, but there’s a subtext that you should be paying attention to everything,” Esther Dyson, who has invested in the company, told Tech Crunch. “And Bioz can help you find all those external factors you may not be noticing.”


Wednesday, 18 May 2016 01:28

Privacy Laws and Social Media Sites

Social media sites and privacy are somewhat inherently at odds. After all, the point of social media is to share your life with the world; the very opposite of maintaining your privacy. Still, there is a difference between sharing parts of your life and all of it. Thus, a number of legal lines have been drawn in the sand regarding privacy on social media sites.


While the sharing of social media may help us to feel closer with friends and family, even if they are far away, social media can create a number of problems, too. While pictures from a drunken night out with friends or soaking up sun in a skimpy bikini on the beach might be totally fine to share with your friends, you may not want employers or coworkers finding them. Similarly, you almost certainly do not want the world knowing your passwords or private messages with other people.


Until recently, there has been very little to protect those who either intentionally or accidentally share too much on social media. Prior to 2013, lawmakers were more concerned with gaining access to information on social media than with protecting it from others. Other nations around the world, on the other hand, recognized the potential risks of social media much earlier than the US did and began enacting laws to protect privacy much sooner. Even today, in the United States, only certain classes of information enjoy any sort of protection under federal law. They generally relate to things like financial transactions, health care records, and information about children under the age of 13. Nearly everything else remains fair game, provided it is obtained through legally acceptable means (i.e., not by virtue of a hacking attack, fraud, or other illegal activity).


Traditionally, two bodies have acted to protect the rights of those online: the Federal Trade Commission (FTC) and state attorneys general. However, throughout the development and rising popularity of social media sites, these bodies have acted only to enforce published privacy policies. If a site either claimed not to collect certain information or merely omitted it from disclosures, the site itself might be subject to prosecution, but third parties gaining access to that information legally generally were not. Social media sites with vague privacy policies that did not clearly disclose which information they gathered and whether they sold it, or sites that disclosed their practices of gathering and selling information (even if the disclosure was hard to find), were generally not subject to any sort of enforcement action.


Recently, though, the FTC has changed its philosophy on these matters, using its power to enforce privacy policies on social media sites to force many of them into both monetary settlements and long-term consent orders permitting the FTC to exercise greater control over their policies.


States have had somewhat different experiences with social media laws. Attorneys general have had mixed results trying to enforce privacy policies, and even less success when trying to strong-arm social media sites into offering tighter protections of user information. More than 45 jurisdictions around the US have some sort of data breach notice law requiring companies to disclose intentional or accidental disclosures of information. While these laws would generally encompass social media sites as well, such sites are often excluded by special provisions because they are specifically designed to let users share personal information with the larger public. Thus, many state laws are largely ineffectual when it comes to protecting one’s privacy rights on social media sites.


As social media sites grow in popularity and become increasingly central to the lives of Americans who use them, privacy intrusions have similarly grown increasingly common. Unfortunately, as is often the case with new technologies, the laws relating to those technologies lag years if not decades behind the developments themselves. States, with smaller legislatures and more agile means of enacting laws, are leading the way in creating new regulations, but many of these may suffer under the scrutiny of judicial review (particularly if they contradict existing federal laws). Additional legal changes will likely take place in the coming months and years, but true privacy on social media is likely not going to occur in the near future.


In the meantime, the best way to avoid privacy concerns through social media sites is to avoid using them. Of course, that is rather like suggesting that the best way to avoid a wiretap is to not speak on the phone, so odds are good that you will continue using social media and accepting the risk of somewhat eroded privacy. However, if you do feel that you have experienced a breach of your privacy in violation of a site’s privacy policy, consider speaking with your state’s attorney general or reporting the situation to the FTC. You may also want to consult with an attorney. You can find a lawyer experienced in internet privacy laws by visiting HG.org’s attorney search feature.


Source:  https://www.hg.org/article.asp?id=36795


It has been two years since the Court of Justice of the European Union established the “Right to be forgotten” (RTBF). Reputation VIP subsequently launched Forget.me as one way for consumers in Europe to submit RTBF requests to Bing and Google.

The company has periodically used consumer submissions through the site (130,000 URLs) to compile and publish aggregate data on RTBF trends. A new report looks at two years’ worth of cumulative data on the nature, geographic location and success rates of RTBF requests.

The top three countries from which RTBF requests originate are Germany, the UK and France. In fact, more than half of all requests have come from Germany and the UK.



RTBF top countries



Google refuses roughly 70 percent to 75 percent of requests, according to the data. The chart below reflects the most common categories or justifications for URL removal requests, on the left. On the right are the reasons that Google typically denies RTBF requests.

Google most frequently denies removal requests that concern professional activity. Following that, Google often denies requests where the individual involved is the source of the content sought to be removed.



RTBF data Reputation VIP



The following list shows the breakdown of URLs submitted by site category. According to the data, Europeans request more link removals from social sites than from any other category, followed by directories, blogs and so on:

Social networks/communities
Directories/Content aggregators
Press sites
Others (real estate, e-commerce, adverts, events, etc.)
By comparison, the links that are actually removed are more often from directories (not clearly defined here) than other site categories. Social site link removals are granted much less often than they’re requested.



[Chart: RTBF removals granted by site category, Reputation VIP]


Reputation VIP also points out in the report that Google’s processing time has improved in the two years since RTBF was announced. It has cut time from 49 days per request to 20 days (or less), according to the report.


Source:  http://searchengineland.com/report-2-years-75-percent-right-forgotten-asks-denied-google-249424





This article attempts to convey the joys and frustrations of skimming the Internet trying to find relevant
information concerning an academic’s work as a scientist, a student or an instructor. A brief overview of
the Internet and the “do’s and don’ts” for the neophyte as well as for the more seasoned “navigator” are given.
Some guidelines of “what works and what does not” and “what is out there” are provided for the scientist
with specific emphasis for biologists, as well as for all others having an interest in science but with little
interest in spending countless hours “surfing the net”. An extensive but not exhaustive list of related
websites is provided.


In the past few years the Internet has expanded to every aspect of human endeavor, especially since the
appearance of user-friendly browsers such as Netscape, Microsoft Internet Explorer and others. Browsers
allow easy access from anywhere in the world to the World Wide Web (WWW), which is a collection of
electronic files that form the fastest-growing segment of the Internet. As a result, we are drowning in
a sea of information while starving for knowledge. Can we distill this wealth of information into digestible
knowledge? Yes, with help and perseverance. However, given the magnitude and rate at which the Internet
changes, this article cannot provide a comprehensive guide to available resources; rather, it serves primarily
as a starting-point in the individual quest for knowledge.


The Internet is a worldwide computer network started by the US government primarily to support education
and research. Many books and reviews exist that detail the Internet in almost every aspect. Among these,
“The World Wide Web–Beneath the Surf” by Handley and Crowcroft (1) gives basic information and
history. A succinct overview in a tutorial format has been set up by the University of California at Berkeley
Library (2). It provides a quick start to finding information through the Internet. Information about teaching
and learning through the “Web” can also be found in study modules set up by Widener University’s
Wolfgram Memorial Library (3). For the science aficionado, a concise primer to the Internet for the biotechnologist can be found in a recent review by Lee et al., 1998 (4).

For more in-depth knowledge, two books of interest to the biologist are Swindell et al., 1996 (5) and Peruski and Peruski,
1997 (6). However, given the scope and the rate of growth of the Internet, estimated at 40 million servers
and predicted to reach over 100 million servers by the year 2000 (7), any review can become obsolete within
months of publication. (Table 1 illustrates growth estimates of the Internet).


What are URLs?

URL stands for Universal (or Uniform) Resource Locator and is analogous to the address protocol used
in sending and receiving regular mail. The first portion usually refers to the protocol type, for example:

• HTTP (hypertext transfer protocol) allows users to access the information in hypertext format, namely
clickable sites and multimedia (sound, graphics, video).
• FTP (file transfer protocol) permits transfer of files, whether these are text files, image files or
software programs.
• GOPHER is an obsolete text transfer protocol without multimedia access that preceded HTTP.
The next portion of the URL is a set of letters or numbers that indicates the website address and file location. For a more
detailed explanation see “Understanding and decoding URLs” by Kirk, 1997 (8).
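The decomposition described above can be illustrated with Python's standard library, which splits a URL into exactly these pieces. The example address is invented for illustration:

```python
from urllib.parse import urlparse

# Decompose a URL into the parts described above: the protocol
# (scheme), the website address (host), and the file location (path).
url = "http://www.example.edu/biology/syllabus.html"
parts = urlparse(url)

print(parts.scheme)  # → "http" (the protocol type)
print(parts.netloc)  # → "www.example.edu" (the website address)
print(parts.path)    # → "/biology/syllabus.html" (the file location)
```

The same call handles `ftp://` and other schemes, since the leading protocol label is parsed uniformly.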


Due to the size of the Internet, one needs to rely on various software, called search engines, to find
appropriate information. A common start-up site that can provide quick subject catalogs by topic area is
Yahoo (11). Many single or multiple database search engines perform broad searches on a topic by keyword. Links to these can be found through the Internet Public Library (IPL) (12). The most popular engines include: Lycos (13), Excite (14), Infoseek (15), Dogpile (16), and Metacrawler (17). A recent addition that allows for one-step searching of web-pages and full-text journals is Northern Light (18). This engine is recommended for scientists, but access to its full text articles requires payment. A comparison of various search engines’ performance with overall tips for Internet searching can be found at the Okanagan University College Library (19).

Other sites containing links to sites of scientific relevance include SciCentral (20), SciWeb (21), BioMed
Net (22) and Science Channel (23), among others. A comprehensive list cataloguing selected sites for
biomedical sciences can be found at Biosites (24) and at the IPL Biological Sciences Reference (25).
Timely topics in science are provided by Scientific American (26). Abstracts of scientific articles catalogued
by the National Library of Medicine can be searched for free using Medline (27), and those catalogued by
the National Agricultural Library, using Agricola (28). Some sites allow for free perusal of full text but few
such journals exist. A good site for development, cell science and experimental biology can be found at the
Company of Biologists (29). Some free online magazines that may be of interest include: In Scight (30)
produced by Academic Press in partnership with Science Magazine, ScienceNow (31) sponsored by the
American Association for the Advancement of Science, UniSci (32), HMSBeagle (33) from BioMedNet, as well as Network Science (34).

Despite the abundance of websites, effective and efficient searching can be frustrating when a query results
in over 100,000 hits. Successful search strategies are typically developed through experience and discipline, although
following the guidelines indicated by (2, 3) and the comprehensive basic guide for general researching and
writing from the IPL (35), can be most helpful. Nonetheless, as searching the Internet has become
common and convenient, each WWW site should be approached with caution. Some
guidelines are given below.


The Internet changes daily as resources are added, changed, moved or deleted. Millions of people, young
and old, as individuals or within organizations create resources ranging from basic information about
themselves, their interests or their products, to complex lists of funding resources, multimedia textbooks,
full-text journals, clinical information systems, epidemiological and statistical databases, and the like. One
of the most pressing needs is to evaluate these resources for accuracy and completeness. All information
should be received with skepticism, unless an evaluation of a site can be performed.

Relevant questions in evaluating a site include the following: Is the site affiliated with a reputable institution
or organization such as a university, government or research institution? URLs may reveal this information:
“edu” includes most educational institutions, “gov” indicates government affiliated sites, and “com” refers
to commercial enterprises, while “org” suffixes are used by many non-profit organizations. The two-letter
suffix on non-USA sites indicates the country of origin (8). Is there a tilde (~) in the site address? Usually
personal webpages are indicated with a tilde, and although not necessarily bad, one should be particularly
careful when evaluating such sites. Other questions to keep in mind: Is there a particular bias? Who is the
author? What are their credentials? How current is this site? Many sites have been abandoned and sit as
“junkyards” of old information. How stable is the site? Is the general style of the site reliable? Consider grammar and spelling.
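The suffix and tilde heuristics above can be sketched as a first-pass screen. This is a simplification under assumed conventions (the suffix table and the tilde check only flag a site for closer scrutiny; they are not verdicts on quality), and the example URL is invented:

```python
from urllib.parse import urlparse

# Rough notes keyed to the suffix categories described above.
SUFFIX_NOTES = {
    "edu": "educational institution",
    "gov": "government-affiliated site",
    "com": "commercial enterprise",
    "org": "often a non-profit organization",
}

def screen_url(url):
    """Return first-pass evaluation notes for a URL (heuristic only)."""
    parsed = urlparse(url)
    suffix = parsed.netloc.rsplit(".", 1)[-1]
    notes = [SUFFIX_NOTES.get(suffix, f"country or other suffix: .{suffix}")]
    # A tilde usually marks a personal page, which warrants extra care.
    if "~" in parsed.path:
        notes.append("tilde in path: likely a personal page, evaluate carefully")
    return notes

print(screen_url("http://www.somewhere.edu/~smith/notes.html"))
```

The point is not automation but a checklist made explicit: affiliation first, then authorship signals, before trusting content.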

Critical evaluation of websites

Many websites provide strategies for the critical evaluation of webpages. The University of Florida offers
a list of short tips (36), Purdue University provides a step-by-step checklist (37), and Widener University
has page-specific checklists (38). Another list of evaluating resources posted by many librarians can be
found through the University of Washington Libraries site (39).

The following are some points to consider when visiting sites:
1. Content: is real information provided? Is content original or does it contain just links? Is the
information unique or is it a review? How accurate is it? What is the depth of content?
2. Authority: who or what is the source of the information? What are the qualifications?
3. Organization: how is the site organized? Can you move easily through the site? Is the information
presented logically? Is the coverage adequate? Can you explore the links easily? Is there a search
engine for the site?
4. Accessibility: can you access the server dependably? Does the site require registration? If so, is it
billed? Can it be accessed through a variety of connections and browsers? Is it friendly for text
viewers? How current is it? Is it updated regularly?
5. Ratings: is the site rated? By whom? Using what criteria? How objective is it? If the site is a rating
service itself, does it state its criteria?


Information from any source should be properly referenced whenever possible as intellectual property and
copyright laws usually apply. Electronically stored information presents new challenges since no method
exists to easily monitor this vast “global library”. However, scholarly activity should maintain a high
standard of conduct by following appropriate citation protocols.

Several citation formats exist for referencing webpages. Two common citing conventions are the MLA style
from the Modern Language Association of America (40), and the APA style from the American
Psychological Association (41). The latter acknowledges a guide by Li and Crane, 1996 (42) to its style
for citing electronic documents. Slight variations exist, depending on whether the citation is from individual
works, parts of works, electronic journal articles, magazine articles, or discussion list messages. Detailed
information for these can be found in Crane’s webpages (43), for APA style and for the MLA style (44).
A proposed Web extension to the APA style has recently been reported by Land (45). Consider however,
that there are many citation style guides for electronic sources. Some of these sites are listed at the
University of Alberta Libraries (46).

All references should generally contain the same information that would be provided from a printed source
(or as much of that information as possible). If the author of the site is given, their last name and initials are placed first, followed by the date the file was created or modified (day/month/year, or year, month/day if feasible) and the title of the site in quotations. If an affiliation to an organization is known, this should be indicated. The date the resource was accessed is placed next, and finally the complete URL within angle brackets. Care should be taken not to give

authorship to webmasters who are responsible for posting or maintaining information on webpages and are
not the originators of the contents. However, they can be referenced as editors with the generic Ed.
abbreviation. Finally, some Internet resources are also published in hard copy; in those cases, the appropriate print citation format should be followed and the URL should also be indicated.
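The element order described above (author, date of creation, title in quotations, optional affiliation, access date, URL in angle brackets) can be made concrete with a small helper. This is a hypothetical sketch, not the actual MLA or APA specification, and the citation data is invented; real styles include punctuation and ordering details this simplification omits:

```python
def web_citation(author, created, title, accessed, url, affiliation=None):
    """Assemble a web citation in the element order sketched above."""
    parts = [author, f"({created}).", f'"{title}."']
    if affiliation:
        # Include the organizational affiliation when it is known.
        parts.append(f"{affiliation}.")
    parts.append(f"Accessed {accessed}.")
    parts.append(f"<{url}>")
    return " ".join(parts)

print(web_citation("Smith, J.", "12/3/1998", "Guide to Cell Biology",
                   "5/6/1999", "http://www.example.edu/~smith/guide.html"))
# → Smith, J. (12/3/1998). "Guide to Cell Biology." Accessed 5/6/1999.
#   <http://www.example.edu/~smith/guide.html>
```

Missing elements (no author, no date) are simply omitted, matching the guidance above that a reference with gaps is still citable as long as the URL is given.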

Organization of Bibliography

Bibliographic format varies according to the preference of the publisher, institution, or journal. In general, list authors alphabetically or in numerical order of appearance. Some prefer separate bibliographies for paper-based “hardcopy” references and for “softcopy” electronic sources. Others permit intermixing (as in the present article). If the author is unknown, site names are listed in the appropriate order. Should some information be missing, it is acceptable to omit it and still cite the reference. For example, some sites may not show authors or dates or have any indication of affiliation. However, the URL should always be indicated.


The Internet holds vast and exciting possibilities for the scientific community and for society as a whole.
The power of the individual can be multiplied by the “click of a mouse” as new capabilities are provided
by linking various computing systems to the global village. Nevertheless, the Internet as seen through the
WWW can be addictive. One “clicks” effortlessly from one site to another in a seemingly endless and aimless
loop. Enjoy or despair, at your own risk!

Written By: E. Misser




