Oracle’s Jump Into the Big Data Realm

Many of you may have seen that Oracle officially announced their new Big Data product offerings today. Included in that list are the Apache distribution of Hadoop, Oracle Loader for Hadoop, Oracle Data Integrator Adapter for Hadoop, the Oracle NoSQL Database, and Oracle R Enterprise. The Oracle Big Data pages leave some confusion as to whether R will run as a custom Oracle version or as the open source distribution. Knowing Oracle, it may be both. They have been working on a version of R within the Oracle database and might be augmenting it with an open source version on the Big Data appliance. Let's cover what is known so far about the different components.

Apache Distribution of Hadoop
I'm actually really surprised here.  Larry Ellison has acquired a Mike Olson company before, Sleepycat (a.k.a. Berkeley DB), and Mike now runs Cloudera.  I would have bet good money that a partnership would have been formed.  My guess is that Oracle looked at the management tools Cloudera provides and determined that it would be too hard to integrate into their Enterprise Manager product, or that Cloudera's price was too high.  Cloudera is leading the Hadoop support market right now and has a really good future.  Hadoop comes with a very large open source ecosystem around it. This solution should be great for both Hadoop and Oracle.

Oracle Loader for Hadoop
From what I have heard, this software is a MapReduce job that will format the resulting data set into an Oracle Data Pump file to be loaded directly into the database.  There have been tools like Sqoop to pull data out of the database, but now we have the other side of the coin.  Hadoop is great, but joins are still a bit problematic and the BI tools around it don't match what can be done in the traditional database world.  The loader should help companies figure out what the balance should be.

Oracle Data Integrator Adapter for Hadoop
The Data Integrator Adapter is similar to the Oracle Loader for Hadoop, but it extends Oracle’s Data Integrator product to be able to execute and manage Hadoop jobs as part of an ETL process. It is well known that Hadoop can crunch and count numbers faster than the Oracle Database in many cases.  This adapter allows the ETL processes to offload the heavy number crunching and then use the Loader to put the data into the Oracle database when complete.
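The offload-then-load pattern the adapter enables can be sketched in plain Python. This is a conceptual illustration only, assuming a toy per-customer aggregation: in reality the map and reduce phases would run as a Hadoop job, and the output file would be a Data Pump file rather than CSV.

```python
import csv
import io
from collections import defaultdict

# Conceptual sketch: do the heavy aggregation outside the database
# (a Hadoop job, in reality), then emit a flat file the loader can
# bulk-insert into Oracle.

def map_phase(records):
    # Emit (key, value) pairs, as a Hadoop mapper would.
    for customer_id, amount in records:
        yield customer_id, amount

def reduce_phase(pairs):
    # Sum the values per key, as a Hadoop reducer would.
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return totals

def write_load_file(totals, out):
    # Write a flat file suitable for a bulk loader.
    writer = csv.writer(out)
    writer.writerow(["customer_id", "total_spend"])
    for key in sorted(totals):
        writer.writerow([key, totals[key]])

records = [("c1", 10.0), ("c2", 5.0), ("c1", 2.5)]
totals = reduce_phase(map_phase(records))
buf = io.StringIO()
write_load_file(totals, buf)
print(buf.getvalue())
```

The point of the pattern is that the database only ever sees the small, pre-aggregated result, not the raw data Hadoop churned through.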

Oracle NoSQL Database
For a long time it has seemed like Oracle was neglecting the Berkeley DB product and not making giant leaps forward.  Berkeley DB has always been a fantastic product for key-value stores; in fact, many of the major key-value stores today are underpinned by it.  It looks like Oracle has updated the product to bring many of the missing distributed features into the new offering.  It will be interesting to see how the TimesTen database, NoSQL Database, Hadoop, and Exadata components work together in the Oracle BI tools over time.
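To make the key-value model, and the "distributed" part, concrete, here is a toy sketch in Python. The class and method names are illustrative, not the Oracle NoSQL Database API: keys are hashed to a fixed set of partitions, each of which could live on a different node in a real system.

```python
import hashlib

class PartitionedKVStore:
    """Toy sketch of a distributed key-value store: opaque keys map
    to opaque values via get/put/delete rather than SQL, and each key
    is hashed to a partition.  Illustrative only."""

    def __init__(self, num_partitions=4):
        # One dict per partition; in a real system each partition
        # would be a separate storage node.
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition(self, key):
        # Hash the key to pick a partition deterministically.
        digest = hashlib.md5(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._partition(key)][key] = value

    def get(self, key, default=None):
        return self.partitions[self._partition(key)].get(key, default)

    def delete(self, key):
        self.partitions[self._partition(key)].pop(key, None)

store = PartitionedKVStore()
store.put("user:42", {"name": "alice"})
print(store.get("user:42"))
```

Because any client can compute the partition from the key alone, reads and writes route directly to the right node without a central lookup, which is what makes this model scale horizontally.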

Oracle R Enterprise
R has long been the language of choice for the statistics community.  It isn’t clear if Oracle will be using the open source distribution or will release their own.  My guess is both.  R-project will be deployed with the Big Data appliance and Oracle’s R within the database.  This should make a bunch of the big data number crunchers from the SAS world happy.

I have been told that Larry and TK have given the green light to go full force into NoSQL.  If that isn't justification that it's a "real thing" and that it is here to stay, I don't know what is.  Oracle has invested, and will continue to invest, significant resources into Big Data and bring new capabilities to market.  Ultimately this is great for the NoSQL community.

It makes the team here at UberEther happy to see the largest software vendor come to the table and support the Hadoop community.  It relieves some of our worries in bringing our new log aggregation and risk-adjustable access control product to market. We know our product will be able to run on predefined hardware platforms for our largest customers, and we can easily load the data into their legacy tools to reuse their existing investments.  We still have a lot of work to do, but if you're interested in hearing more while we're out here at OpenWorld, contact us and we'd be more than happy to show you what we've been working on.
