Just about every player in the Big Data and analytics game was in New York last week at the Strata + Hadoop World conference, to showcase their latest technologies.

Over 7,000 people attended the event where keynote speakers, including White House chief data scientist DJ Patil, laid out their visions for where machine learning, analytics, the Internet of Things, autonomous vehicles and smart cities will be taking us in the near future.

Here are a few of my highlights from the event and some of the major announcements from key players.

DJ Patil spoke about how during his time so far as the U.S.’s first chief data scientist, his mission has been to “responsibly unleash the power of data to benefit all Americans.”

He spoke about how Big Data and analytics are helping to reduce the damage caused by opioid abuse, and about the importance of openness.

“When the president first started in office there was about 10 [open] data sets put out there, now there are about 2,000,” he said. Whatever you think of Barack Obama’s presidency, that is an impressive achievement: anyone from major corporations to armchair data scientists can now use that data to develop new strategies and technologies.

Martin Hall, chief data scientist for Big Data solutions at Intel, told his audience that the explosion of interest and activity in Big Data means that “we now have the data, the analytics and the compute power to deliver more than insights – we can enable intelligence.”

The arrival of personalized medicine, autonomous cars and smart connected devices means that we are now entering the age of AI, and rather than a simple Internet of Things, we are heading towards an Internet of Intelligent Things, with ever-increasing levels of automation. This vision undoubtedly has huge implications for just about everything: how we live our lives, how we work, and how we interact and communicate with the world and each other.


This year’s conference confirmed that real-time streaming analytics has moved firmly into the mainstream of data science and, rather than being a pipe dream or end goal, is fast becoming a reality. Apache’s open source Kafka engine is seen as the driving force behind this shift, and big players were keen to show their support for it.

Cloudera, one of the biggest distributors of open source platforms, announced the upcoming release of version 5.9 of its Hadoop distribution, which will ship with Spark 2.0 for the first time, as well as the latest release of Apache Kudu, which is specifically geared toward real-time analytics. The distribution will also run on Microsoft’s Azure cloud infrastructure for the first time (in addition to Amazon Web Services and Google’s Cloud Platform), and the company is introducing a pay-as-you-go pricing model alongside its existing annual subscription model.

IBM’s rock, paper, scissors-playing robot Marvin was let loose to entertain the crowds, taking on all comers and appearing to show, through his increasing win rate, that computers are getting better at predicting our behavior. Marvin is powered by Apache Spark, and a brief video of him in action is available online. IBM also announced a new initiative known as Project Dataworks, which aims to use Spark to make more data available for processing through its Watson cognitive computing engine.

SAP spoke about how its recently announced acquisition of Californian Big Data startup Altiscale will boost its HANA cloud service offerings, for example by providing access to its cloud-based Spark services. It also showcased its Vora query engine, which deploys machine learning to enhance contextual awareness of AI-driven Big Data operations.

These are just a few of the highlights from this year’s event. Expect more from me soon. In the meantime, if you were at Strata + Hadoop World, why not let me know what you were most excited by?

Bernard Marr is a best-selling author and keynote speaker. His new book is 'Big Data in Practice: How 45 Successful Companies Used Big Data Analytics to Deliver Extraordinary Results'.

Original source of this article is Forbes


The new version of a SQL query engine for Hadoop scheduled for release later this month adds “auto-cubes,” a type of multi-dimensional dataset that, in this implementation, consists of aggregated “micro-cubes” that are generated based on usage patterns learned by the SQL accelerator.

New York-based Jethro (formerly JethroData) said it plans to release its 2.0 version during Strata + Hadoop World later this month. Jethro’s platform is a combination of two engines: a columnar SQL database and search indexing.

The startup said the new auto-cubes feature complements its full-indexing and intelligent caching approaches designed to accelerate SQL query performance in business intelligence and dashboards. Auto-cubes “are generated based on user activity and are maintained and updated transparently,” Jethro CTO Boaz Raufman noted in a statement.

The startup further claims that dynamically aggregated micro-cubes reduce the need for complex data cube designs, boost coverage via “hundreds” of small cubes, and support incremental data loads. The combination of dynamic aggregation of auto-cubes and indexing is intended to further accelerate SQL queries to handle a broader range of big data use cases. It also provides interactive response times for business intelligence applications in more user scenarios, Raufman added.
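The idea of small aggregates that are built from observed usage and kept current across incremental loads can be illustrated with a toy cache. This is only a sketch under stated assumptions, not Jethro's implementation: the class name `AutoCubeCache`, the `threshold` parameter, and the restriction to SUM aggregates are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


class AutoCubeCache:
    """Toy usage-driven aggregate cache (hypothetical, not Jethro's design)."""

    def __init__(self, rows, threshold=2):
        self.rows = list(rows)      # the raw "fact table" as a list of dicts
        self.threshold = threshold  # assumed rule: build a cube after this many requests
        self.usage = Counter()      # how often each query shape has been requested
        self.cubes = {}             # (dims, measure) -> {dimension values: running sum}

    def _aggregate(self, dims, measure):
        # Full pass over the raw rows: SUM(measure) GROUP BY dims.
        totals = defaultdict(float)
        for row in self.rows:
            totals[tuple(row[d] for d in dims)] += row[measure]
        return dict(totals)

    def query(self, dims, measure):
        """Return (result, source), where source is 'cube' or 'scan'."""
        key = (tuple(dims), measure)
        if key in self.cubes:
            return dict(self.cubes[key]), "cube"
        self.usage[key] += 1
        result = self._aggregate(key[0], measure)
        if self.usage[key] >= self.threshold:
            # The query shape is popular enough: keep a small cube for it.
            self.cubes[key] = dict(result)
        return result, "scan"

    def load(self, new_rows):
        """Incremental load: fold new rows into existing cubes without a rebuild."""
        new_rows = list(new_rows)
        self.rows.extend(new_rows)
        for (dims, measure), cube in self.cubes.items():
            for row in new_rows:
                group = tuple(row[d] for d in dims)
                cube[group] = cube.get(group, 0.0) + row[measure]
```

In this sketch the first requests for a given grouping scan the raw rows; once the same shape has been requested `threshold` times, later requests are served from the stored cube, and `load` updates that cube in place when new rows arrive.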

Along with auto-cubes, Jethro 2.0 boosts support for QlikView and Qlik Sense and broadens SQL coverage with expanded math functions. The point is to make big data analytics work in real time.

The startup differentiates its acceleration engine technology by allowing users to keep data on Hadoop while retaining the performance of an enterprise data warehouse engine. The engine is “sandwiched” between a BI tool and existing data sources. Jethro is intended to accelerate BI tool reporting and visualizations without overtaxing a Hadoop cluster.

Jethro essentially takes a column-oriented database (like Vertica or Impala) and combines it with a search engine indexing tool. The resulting columnar database is fully indexed, with every column of data treated as its own index.

The index-based SQL engine for Hadoop seeks to enable organizations to use their BI tools with large datasets while maintaining interactive speed. It works by fully indexing select datasets in Hadoop. BI queries use indexes to access only the data they need instead of scanning an entire dataset. The result is supposed to be increased speed and less stress on computing resources.
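The difference between answering a query from indexes and scanning the whole dataset can be sketched in a few lines. This is a minimal illustration, assuming in-memory rows and equality predicates only; the function names are hypothetical and say nothing about how Jethro actually stores or intersects its indexes.

```python
from collections import defaultdict


def build_indexes(rows):
    """Index every column: column -> value -> set of matching row ids."""
    indexes = defaultdict(lambda: defaultdict(set))
    for row_id, row in enumerate(rows):
        for column, value in row.items():
            indexes[column][value].add(row_id)
    return indexes


def indexed_lookup(rows, indexes, predicates):
    """Answer equality predicates by intersecting index entries.

    Only rows whose ids survive the intersection are ever read.
    """
    matching = None
    for column, value in predicates.items():
        ids = indexes[column].get(value, set())
        matching = ids if matching is None else matching & ids
    if matching is None:  # no predicates, so every row qualifies
        return list(rows)
    return [rows[i] for i in sorted(matching)]


def full_scan(rows, predicates):
    """Baseline: test every row against every predicate."""
    return [r for r in rows if all(r[c] == v for c, v in predicates.items())]
```

Both functions return the same rows; the indexed version simply avoids touching rows that cannot match, which is the source of the claimed savings in speed and computing resources.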

Jethro has scheduled a webinar for Sept. 15 to demonstrate how new “auto microcubes” work with indexing and smart caching to accelerate interactive business intelligence applications.

The startup has so far raised $12.6 million in two funding rounds, including an $8.1 million Series B funding round in March 2015.

Source: https://www.datanami.com/2016/09/13/jethro-indexer-adds-auto-cubes/



The local library is a great place to fill your summer reading list for free. But it also remains a vital source for research. And in an era of online searches, librarians at the New York Public Library are still the most trusted source.

They have been called the "Human Google," said CBSN's Elaine Quijano. And though they may not be as fast as your favorite Internet search engine, they're as reliable as ever.

The Fifth Avenue branch of the New York Public Library attracts about 2.5 million visitors each year.

Many pose with the lions named Patience and Fortitude ... snap pictures in the grand entry hall ... and pass through the reading rooms without cracking a book.

But the tables are full here. "Shushing" happens as much as you may remember. And the phones keep ringing for researchers.

"One of the number one comments that we get from callers is, 'Thank God I've reached a human being,'" said Rosa Li, who manages the library's "Ask Desk." "Even on chat sometimes people will say, 'Is this a robot or a person?' We have to laugh and say, 'I'm a real person.'"

The Ask Desk receives about 300 inquiries a day -- via telephone, email, chat or text message. "Facebook, Twitter, and even snail mail queries from New Yorkers and even people from around the world," Li said.

Researchers here can access materials not available to the general public, but Google -- and even Wikipedia -- are not off-limits.

"We love the fact that more and more things are online," Li said. "The computer is a tool for us, so the faster we can find an answer for somebody, the better."

While the average Google search takes 0.2 seconds, this human search engine is a bit slower. Five minutes per call is typical.

Quijano asked, "Is there such a thing as a typical question?"

"Uhm, no, not when you work in reference."

Here are some recent questions they've received:

"I need to know the exterior dimensions of Radio City Music Hall."
"I am looking for a New York City law that prohibits solicitation by monkeys."
"I'm looking for information on the history of black lipstick."
"Are the lions Patience and Fortitude in front of the Library life-size or larger-than-life?"

Researcher Bernard van Maarseveen keeps a file card archive on hand for the queries best defined as random.

Such as the definition of Lobro: "Well, I guess this is a city nickname, a neighborhood nickname, that didn't quite pan out ... Lobro borders NoHo, SoHo and Little Italy." Yet, unlike those well-known neighborhoods, "it didn't quite catch on."

Quijano asked, "What is the most interesting [question] that you've ever received?"

"Well, it's usually the last one that I've gotten," van Maarseveen replied. "There is one that I've been working on about Manhattan. There is this one caller who found that their street -- East 84th Street -- is wider than the ordinary street. I didn't quite believe them at first, so I actually went up to their block and I measured it out. And it's true. It's about seven feet wider than the standard block!"

"Wow! So Bernard, you are awfully dedicated."

"You know, I'm glad that I'm able to do this job," he said. "Don't tell the management, but it's kind of like I'm always amazed that I get paid to do this work!"

Surprising as it may sound, that sentiment is shared on this floor, where people proudly answer whatever's on your mind.

Quijano asked Li, "What is it that you are able to discern after you've answered a question?"

"Gratitude," she replied. "Also, that moment -- that 'A-ha!,' that 'A-ha!' moment is great to listen to. Hearing that joy in their voice. It's almost like a little checkmark goes off and it's like, OK, I've managed to accomplish that!"

Source: http://www.cbsnews.com/news/meet-the-human-google-at-the-new-york-public-library/



